PoliciesMultiPlayers.rhoEst module

rhoEst: implementation of the 2nd multi-player policy from [Distributed Algorithms for Learning…, Anandkumar et al., 2010](http://ieeexplore.ieee.org/document/5462144/).

  • Each child player is selfish and plays according to an index policy (any index policy, e.g., UCB, Thompson, KL-UCB, BayesUCB, etc.),

  • But instead of aiming at the best (the 1-st best) arm, player i aims at the rank_i-th best arm,

  • At first, every player has a random rank_i from 1 to M, and when a collision occurs, rank_i is sampled from a uniform distribution on \([1, \dots, \hat{M}_i(t)]\), where \(\hat{M}_i(t)\) is player i's current estimate of the number of players,

  • The procedure to estimate \(\hat{M}_i(t)\) is not so simple, but the idea is: every player starts with \(\hat{M}_i(0) = 1\), and the estimate is increased, \(\hat{M}_i(t+1) = \hat{M}_i(t) + 1\), after enough collisions have been observed (as decided by a threshold function).

  • My choice of threshold function, see threshold_on_t(), does not need the horizon either and uses the current time \(t\) instead (a rough sketch of the whole procedure is given after this list).
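To make this concrete, here is a minimal, self-contained sketch of the two mechanisms: choosing the rank-th best arm from a vector of indexes, and updating the estimate and the rank after a collision. This is not the library code; the function names and the assumed calling convention threshold(t, nbPlayersEstimate) are illustrative simplifications.

import numpy as np

rng = np.random.default_rng(0)

def choose_arm(indexes, rank):
    """Pick the rank-th best arm according to the current indexes (rank = 1 means the best arm)."""
    order = np.argsort(indexes)[::-1]   # arm identifiers sorted by decreasing index value
    return int(order[rank - 1])

def after_collision(rank, nbPlayersEstimate, collisionCount, t, threshold):
    """Hypothetical update after a collision: maybe increase the estimate, then resample the rank."""
    collisionCount += 1
    # Increase the estimated number of players once enough collisions have been seen,
    # where "enough" is decided by the threshold function.
    if collisionCount > threshold(t, nbPlayersEstimate):
        nbPlayersEstimate += 1
        collisionCount = 0
    # Resample the rank uniformly in {1, ..., nbPlayersEstimate}.
    rank = int(rng.integers(1, nbPlayersEstimate + 1))
    return rank, nbPlayersEstimate, collisionCount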

Note

This is fully decentralized: each child player does NOT need to know the number of players and does NOT require the horizon \(T\).

Note

For a more generic approach, see the wrapper defined in EstimateM.EstimateM.

class PoliciesMultiPlayers.rhoEst.oneRhoEst(threshold, *args, **kwargs)[source]

Bases: PoliciesMultiPlayers.rhoRand.oneRhoRand

Class that acts as a child policy, but in fact it passes all its method calls to the mother class, which forwards them to its i-th player.

  • Except for the handleCollision method: a new random rank is sampled after observing a collision,

  • And the player does not aim at the best arm, but at the rank-th best arm, based on her index policy,

  • The rhoEst policy is used to keep an estimate of the total number of players, \(\hat{M}_i(t)\).

  • The procedure to estimate \(\hat{M}_i(t)\) is not so simple, but the idea is: every player starts with \(\hat{M}_i(0) = 1\), and the estimate is increased, \(\hat{M}_i(t+1) = \hat{M}_i(t) + 1\), after enough collisions have been observed (as decided by a threshold function); see the sketch below.
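As a rough illustration of this wrapper pattern (and not the real oneRhoEst class, which forwards everything through its mother policy), the following hypothetical stand-in assumes the wrapped index policy exposes choiceWithRank(rank) and getReward(arm, reward) methods, and that the threshold has the signature threshold(t, nbPlayersEstimate):

import numpy as np

class OneRhoEstSketch:
    """Illustrative stand-in for oneRhoEst (not the real class): decisions and rewards are
    forwarded to the wrapped index policy, while collisions update the rank and the estimate."""

    def __init__(self, policy, threshold, seed=None):
        self.policy = policy              # any index policy with choiceWithRank() and getReward()
        self.threshold = threshold        # assumed signature: threshold(t, nbPlayersEstimate)
        self.rng = np.random.default_rng(seed)
        self.nbPlayersEstimate = 1        # optimistic start: assume being alone
        self.rank = 1                     # current rank, starting at 1
        self.collisionCount = 0           # simplification: the real class counts collisions per arm
        self.t = 0                        # internal time

    def choice(self):
        self.t += 1
        return self.policy.choiceWithRank(self.rank)   # aim at the rank-th best arm

    def getReward(self, arm, reward):
        self.policy.getReward(arm, reward)             # one transmission without collision

    def handleCollision(self, arm, reward=None):
        self.collisionCount += 1
        # Increase the estimate once enough collisions have accumulated, as decided by the threshold.
        if self.collisionCount > self.threshold(self.t, self.nbPlayersEstimate):
            self.nbPlayersEstimate += 1
            self.collisionCount = 0
        # Sample a new random rank uniformly in {1, ..., nbPlayersEstimate}.
        self.rank = int(self.rng.integers(1, self.nbPlayersEstimate + 1))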

__init__(threshold, *args, **kwargs)[source]

Initialize self. See help(type(self)) for accurate signature.

threshold = None

Threshold function, used to decide when to increase nbPlayersEstimate

nbPlayersEstimate = None

Current estimate of the number of players. Optimistic: the player starts by assuming she is alone!

rank = None

Current rank, starting at 1

collisionCount = None

Number of collisions seen on each arm since the last increase of nbPlayersEstimate

timeSinceLastCollision = None

Time since the last collision. (I don’t remember why I thought this could be useful… it is not!)

t = None

Internal time

__str__()[source]

Return str(self).

startGame()[source]

Start game.

handleCollision(arm, reward=None)[source]

Select a new rank, and maybe update nbPlayersEstimate.

getReward(arm, reward)[source]

One transmission without collision.

__module__ = 'PoliciesMultiPlayers.rhoEst'

class PoliciesMultiPlayers.rhoEst.rhoEst(nbPlayers, nbArms, playerAlgo, threshold=<function threshold_on_t_doubling_trick>, lower=0.0, amplitude=1.0, *args, **kwargs)[source]

Bases: PoliciesMultiPlayers.rhoRand.rhoRand

rhoEst: implementation of the 2nd multi-player policy from [Distributed Algorithms for Learning…, Anandkumar et al., 2010](http://ieeexplore.ieee.org/document/5462144/).

__init__(nbPlayers, nbArms, playerAlgo, threshold=<function threshold_on_t_doubling_trick>, lower=0.0, amplitude=1.0, *args, **kwargs)[source]
  • nbPlayers: number of players to create (in self._players).

  • playerAlgo: class to use for every player.

  • nbArms: number of arms, given as first argument to playerAlgo.

  • threshold: the threshold function to use; see EstimateM.threshold_on_t_with_horizon(), EstimateM.threshold_on_t_doubling_trick(), or EstimateM.threshold_on_t().

  • *args, **kwargs: positional and keyword arguments, passed to playerAlgo.

Example:

>>> from Policies import *
>>> import random; random.seed(0); import numpy as np; np.random.seed(0)
>>> nbArms = 17
>>> nbPlayers = 6
>>> s = rhoEst(nbPlayers, nbArms, UCB, threshold=threshold_on_t)
>>> [ child.choice() for child in s.children ]
[12, 15, 0, 3, 3, 7]
>>> [ child.choice() for child in s.children ]
[9, 4, 6, 12, 1, 6]
  • To get a list of usable players, use s.children (a sketch of a hand-written simulation loop using them is given below).

  • Warning: s._players is for internal use ONLY!
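Continuing this example, a hand-written simulation loop could drive the children directly, as sketched below. This is only a hedged illustration: the Bernoulli arms and the collision rule (a player observes a collision whenever at least one other player chose the same arm, and gets no reward) are simplified assumptions, not the library's environment classes.

>>> means = np.linspace(0.1, 0.9, nbArms)             # hypothetical Bernoulli arm means
>>> for t in range(100):
...     choices = [ child.choice() for child in s.children ]
...     for player, arm in enumerate(choices):
...         if choices.count(arm) > 1:                # at least two players on this arm: collision
...             s.children[player].handleCollision(arm)
...         else:                                     # otherwise, a normal (possibly zero) reward
...             s.children[player].getReward(arm, float(np.random.rand() < means[arm]))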

nbPlayers = None

Number of players

children = None

List of children, fake algorithms

nbArms = None

Number of arms

__str__()[source]

Return str(self).

__module__ = 'PoliciesMultiPlayers.rhoEst'

class PoliciesMultiPlayers.rhoEst.rhoEstPlus(nbPlayers, nbArms, playerAlgo, horizon, lower=0.0, amplitude=1.0, *args, **kwargs)[source]

Bases: PoliciesMultiPlayers.rhoRand.rhoRand

rhoEstPlus: implementation of the 2nd multi-player policy from [Distributed Algorithms for Learning…, Anandkumar et al., 2010](http://ieeexplore.ieee.org/document/5462144/).

__init__(nbPlayers, nbArms, playerAlgo, horizon, lower=0.0, amplitude=1.0, *args, **kwargs)[source]
  • nbPlayers: number of players to create (in self._players).

  • playerAlgo: class to use for every player.

  • nbArms: number of arms, given as first argument to playerAlgo.

  • horizon: the horizon \(T\), which this variant needs to know.

  • *args, **kwargs: positional and keyword arguments, passed to playerAlgo.

Example:

>>> from Policies import *
>>> import random; random.seed(0); import numpy as np; np.random.seed(0)
>>> nbArms = 17
>>> nbPlayers = 6
>>> horizon = 1000
>>> s = rhoEstPlus(nbPlayers, nbArms, UCB, horizon=horizon)
>>> [ child.choice() for child in s.children ]
[12, 15, 0, 3, 3, 7]
>>> [ child.choice() for child in s.children ]
[9, 4, 6, 12, 1, 6]
  • To get a list of usable players, use s.children.

  • Warning: s._players is for internal use ONLY!

nbPlayers = None

Number of players

children = None

List of children, fake algorithms

nbArms = None

Number of arms

__module__ = 'PoliciesMultiPlayers.rhoEst'

__str__()[source]

Return str(self).