# PoliciesMultiPlayers.EstimateM module

EstimateM: generic wrapper on a multi-player decentralized learning policy, to learn the number of players on the fly, adapted from rhoEst from [Distributed Algorithms for Learning…, Anandkumar et al., 2010](http://ieeexplore.ieee.org/document/5462144/).

• The procedure to estimate $$\hat{M}_i(t)$$ is not so simple: basically, every player starts with $$\hat{M}_i(0) = 1$$, and on a collision sets $$\hat{M}_i(t+1) = \hat{M}_i(t) + 1$$, for some time (governed by a threshold function).

• My choice of threshold function (see threshold_on_t()) does not need the horizon $$T$$ either, and uses the current time $$t$$ instead.
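The estimation rule sketched in the two bullets above can be written as a small self-contained loop. Everything here is illustrative, not the library's API: the names `threshold` and `PlayerEstimate` are made up, and the placeholder threshold is an assumed slowly-growing function.

```python
import math

def threshold(t, m_hat):
    # Placeholder threshold xi(t, k): an assumed slowly-growing
    # function of t, not necessarily the one the module uses.
    return math.log(1 + t) ** 2

class PlayerEstimate:
    """Hypothetical per-player state for estimating M on the run."""

    def __init__(self):
        self.m_hat = 1       # optimistic start: assume we are alone
        self.collisions = 0  # collisions since the last increase of m_hat
        self.t = 0           # internal time

    def step(self):
        self.t += 1

    def on_collision(self):
        self.collisions += 1
        # Once collisions accumulate past the threshold, one more
        # player is assumed present and the counter is reset.
        if self.collisions > threshold(self.t, self.m_hat):
            self.m_hat += 1
            self.collisions = 0
```

Each player runs this loop independently, which is what makes the scheme fully decentralized: no communication and no shared horizon are needed.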

Note

This is fully decentralized: each child player does NOT need to know the number of players and does NOT require the horizon $$T$$.

Warning

This is still very experimental!

Note

For a less generic approach, see the policies defined in rhoEst.rhoEst (generalizing rhoRand.rhoRand) and RandTopMEst.RandTopMEst (generalizing RandTopM.RandTopM).

PoliciesMultiPlayers.EstimateM.threshold_on_t_with_horizon(t, nbPlayersEstimate, horizon=None)[source]

Function $$\xi(T, k)$$ used as a threshold in rhoEstPlus.

Warning

It requires the horizon $$T$$ rather than the current time $$t$$ (when horizon is not given, as in the first example below, $$t$$ is used in its place).

Example:

>>> threshold_on_t_with_horizon(1000, 3)
14.287...
>>> threshold_on_t_with_horizon(1000, 3, horizon=2000)
16.357...

PoliciesMultiPlayers.EstimateM.threshold_on_t_doubling_trick(t, nbPlayersEstimate, horizon=None, base=2, min_fake_horizon=1000, T0=1)[source]

A trick to have a threshold depending on a growing horizon (doubling-trick).

• Instead of using $$t$$ or $$T$$, a fake horizon $$T_t$$ is used, corresponding to the horizon a doubling-trick algorithm would be using at time $$t$$.

• $$T_t = T_0 b^{\lceil \log_b(t) \rceil}$$ is the default choice, with $$b=2$$ and $$T_0 = 1$$ (the defaults of base and T0 above).

• If $$T_t$$ is too small, min_fake_horizon is used instead.
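The fake horizon described by the three bullets above can be sketched directly (the function name `fake_horizon` is illustrative; it assumes $$t \geq 1$$):

```python
import math

def fake_horizon(t, base=2, T0=1, min_fake_horizon=1000):
    # T_t = T0 * base^ceil(log_base(t)): the horizon a doubling-trick
    # algorithm would be using at time t (assumes t >= 1).
    T_t = T0 * base ** math.ceil(math.log(t, base))
    # If T_t is too small, min_fake_horizon is used instead.
    return max(T_t, min_fake_horizon)
```

For instance, at $$t = 1000$$ with the defaults this gives $$T_t = 2^{10} = 1024$$, which explains why the two doctest calls below return the same value: the given horizon is ignored in favor of $$T_t$$.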

Warning

This is ongoing research!

Example:

>>> threshold_on_t_doubling_trick(1000, 3)
14.356...
>>> threshold_on_t_doubling_trick(1000, 3, horizon=2000)
14.356...

PoliciesMultiPlayers.EstimateM.threshold_on_t(t, nbPlayersEstimate, horizon=None)[source]

Function $$\xi(t, k)$$ used as a threshold in rhoEst.

• 0 if nbPlayersEstimate is 0,

• 1 if nbPlayersEstimate is 1,

• My heuristic to be any-time (i.e., without needing to know the horizon) is to use a function of $$t$$ (current time) and not $$T$$ (horizon).

• The choice which seemed to perform the best in practice was $$\xi(t, k) = c t$$ for a small constant $$c$$ (like 5 or 10).
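A literal reading of the bullets above gives the following sketch. The constant $$c$$ and the linear shape follow the prose; note that the doctest values below suggest the shipped implementation grows more slowly in $$t$$, so treat this purely as an illustration of the piecewise structure:

```python
def threshold_sketch(t, nb_players_estimate, c=10):
    # Piecewise threshold xi(t, k) following the description above.
    if nb_players_estimate == 0:
        return 0
    if nb_players_estimate == 1:
        return 1
    return c * t  # linear in t, for a small constant c (like 5 or 10)
```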

Example:

>>> threshold_on_t(1000, 3)
47.730...
>>> threshold_on_t(1000, 3, horizon=2000)
47.730...

class PoliciesMultiPlayers.EstimateM.oneEstimateM(nbArms, playerAlgo, threshold, decentralizedPolicy, *args, lower=0.0, amplitude=1.0, horizon=None, args_decentralizedPolicy=None, kwargs_decentralizedPolicy=None, **kwargs)[source]

Class that acts as a child policy, but in fact it passes all its method calls to the mother class, which passes them to its i-th player.

• The procedure to estimate $$\hat{M}_i(t)$$ is not so simple, but basically everyone starts with $$\hat{M}_i(0) = 1$$, and when colliding $$\hat{M}_i(t+1) = \hat{M}_i(t) + 1$$, for some time (with a complicated threshold).

__init__(nbArms, playerAlgo, threshold, decentralizedPolicy, *args, lower=0.0, amplitude=1.0, horizon=None, args_decentralizedPolicy=None, kwargs_decentralizedPolicy=None, **kwargs)[source]

Initialize self. See help(type(self)) for accurate signature.

threshold = None

Threshold function

nbPlayersEstimate = None

Estimated number of players. Optimistic: start by assuming it is alone!

collisionCount = None

Count collisions on each arm, since last increase of nbPlayersEstimate

timeSinceLastCollision = None

Time since the last collision. I don’t remember why I thought using this could be useful… but it’s not!

t = None

Internal time

__str__()[source]

Return str(self).

updateNbPlayers(nbPlayers=None)[source]

Change the value of nbPlayersEstimate, and propagate the change to the underlying policy, for parameters called maxRank or nbPlayers.
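Such propagation can be sketched as below. The attribute names `maxRank` and `nbPlayers` come from the description above; the helper function itself is hypothetical:

```python
def propagate_nb_players(policy, nb_players):
    # Forward the new estimate to the wrapped policy, under
    # whichever parameter name that policy understands.
    for attr in ("maxRank", "nbPlayers"):
        if hasattr(policy, attr):
            setattr(policy, attr, nb_players)
```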

startGame()[source]

Start game.

handleCollision(arm, reward=None)[source]

Select a new rank, and maybe update nbPlayersEstimate.
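The rank re-selection can be sketched rhoRand-style: after a collision, draw a fresh rank uniformly among the currently estimated number of players. This is a sketch of the idea, not the module's exact logic:

```python
import random

def new_rank_on_collision(m_hat):
    # rhoRand-style: pick a new rank uniformly in {1, ..., m_hat},
    # where m_hat is the current estimate of the number of players.
    return random.randint(1, m_hat)
```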

getReward(arm, reward)[source]

One transmission without collision.

choice()[source]

Pass the call to self._policy.choice() with the player’s ID number.

choiceWithRank(rank=1)[source]

Pass the call to self._policy.choiceWithRank() with the player’s ID number.

choiceFromSubSet(availableArms='all')[source]

Pass the call to self._policy.choiceFromSubSet() with the player’s ID number.

choiceMultiple(nb=1)[source]

Pass the call to self._policy.choiceMultiple() with the player’s ID number.

choiceIMP(nb=1)[source]

Pass the call to self._policy.choiceIMP() with the player’s ID number.

estimatedOrder()[source]

Pass the call to self._policy.estimatedOrder() with the player’s ID number.

estimatedBestArms(M=1)[source]

Pass the call to self._policy.estimatedBestArms() with the player’s ID number.

__module__ = 'PoliciesMultiPlayers.EstimateM'
class PoliciesMultiPlayers.EstimateM.EstimateM(nbPlayers, nbArms, decentralizedPolicy, playerAlgo, policyArgs=None, horizon=None, threshold=<function threshold_on_t_doubling_trick>, lower=0.0, amplitude=1.0, *args, **kwargs)[source]

EstimateM: a generic wrapper for an efficient multi-player learning policy, with no prior knowledge of the number of players, and using any other MP policy.

__init__(nbPlayers, nbArms, decentralizedPolicy, playerAlgo, policyArgs=None, horizon=None, threshold=<function threshold_on_t_doubling_trick>, lower=0.0, amplitude=1.0, *args, **kwargs)[source]

Example:

>>> from Policies import *; from PoliciesMultiPlayers import *
>>> import random; random.seed(0); import numpy as np; np.random.seed(0)
>>> nbArms = 4
>>> nbPlayers = 2
>>> s = EstimateM(nbPlayers, nbArms, rhoRand, UCBalpha, alpha=0.5)
>>> [ child.choice() for child in s.children ]
[0, 3]

• To get a list of usable players, use s.children.

Warning

s._players is for internal use ONLY!

nbPlayers = None

Number of players

children = None

List of children, fake algorithms

nbArms = None

Number of arms

__str__()[source]

Return str(self).

__module__ = 'PoliciesMultiPlayers.EstimateM'