PoliciesMultiPlayers.rhoRandSticky module

rhoRandSticky: implementation of a variant of the multi-player policy rhoRand from [Distributed Algorithms for Learning…, Anandkumar et al., 2010](http://ieeexplore.ieee.org/document/5462144/).

  • Each child player is selfish, and plays according to an index policy (any index policy, e.g., UCB, Thompson, KL-UCB, BayesUCB, etc.),

  • But instead of aiming at the best (the 1-st best) arm, player i aims at the rank_i-th best arm,

  • At first, every player has a random rank_i from 1 to M, and when a collision occurs, rank_i is resampled uniformly from {1, ..., M}, where M is the number of players.

  • The only difference with rhoRand is that once a player has selected a rank and has not encountered a collision for STICKY_TIME time steps, that player never changes rank again. rhoRand corresponds to STICKY_TIME = +oo, MusicalChair is something like STICKY_TIME = 1, and this variant, rhoRandSticky, takes STICKY_TIME as a parameter.
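
The rank dynamics described above can be sketched with a toy simulation. This is a minimal illustration, not the module's implementation: it assumes that two players collide exactly when they hold the same rank (as if every player ordered the arms identically), and the name simulate_ranks and its parameters are hypothetical.

```python
import random


def simulate_ranks(nb_players, sticky_time, horizon, seed=0):
    """Toy model of the sticky rank dynamics (hypothetical helper).

    Assumption: players collide exactly when they share a rank.
    Returns the final ranks and which players have sat.
    """
    rng = random.Random(seed)
    # Every player starts with a uniformly random rank in {1, ..., M}.
    ranks = [rng.randint(1, nb_players) for _ in range(nb_players)]
    quiet_steps = [0] * nb_players  # collision-free steps at the current rank
    sat = [False] * nb_players      # True once a player keeps its rank forever

    for _ in range(horizon):
        # Count how many players hold each rank at this time step.
        counts = {}
        for r in ranks:
            counts[r] = counts.get(r, 0) + 1

        for i in range(nb_players):
            if counts[ranks[i]] > 1:
                # Collision: a player that has not sat resamples its rank.
                if not sat[i]:
                    ranks[i] = rng.randint(1, nb_players)
                    quiet_steps[i] = 0
            else:
                quiet_steps[i] += 1
                # Sit once the counter gets greater than sticky_time.
                if quiet_steps[i] > sticky_time:
                    sat[i] = True
    return ranks, sat


ranks, sat = simulate_ranks(nb_players=4, sticky_time=5, horizon=2000)
print(ranks, sat)
```

In this toy model, two players can never sit on the same rank: a player sits only after being alone on its rank, and a player that has sat never leaves.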

Note

This is not fully decentralized, as each child player needs to know the (fixed) number of players.

PoliciesMultiPlayers.rhoRandSticky.STICKY_TIME = 10

Default value for STICKY_TIME

class PoliciesMultiPlayers.rhoRandSticky.oneRhoRandSticky(maxRank, stickyTime, *args, **kwargs)[source]

Bases: PoliciesMultiPlayers.rhoRand.oneRhoRand

Class that acts as a child policy, but in fact it passes all its method calls to the mother class, which passes them to its i-th player.

  • Except for the handleCollision method: a new random rank is sampled after observing a collision,

  • And the player does not aim at the best arm, but at the rank-th best arm, based on her index policy.
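
Aiming at the rank-th best arm can be illustrated as follows. This is a sketch under stated assumptions: the function name rank_th_best_arm is hypothetical, the indexes are assumed to be given as one number per arm, and ties are broken in favor of the lowest arm number, which may differ from what the index policy actually does.

```python
import numpy as np


def rank_th_best_arm(indexes, rank):
    """Return the arm whose index is the rank-th largest (rank=1 is the best).

    Hypothetical helper: ties are broken by lowest arm number.
    """
    indexes = np.asarray(indexes, dtype=float)
    if not 1 <= rank <= indexes.size:
        raise ValueError("rank must be between 1 and the number of arms")
    # Stable sort of the negated indexes gives arms by decreasing index.
    order = np.argsort(-indexes, kind="stable")
    return int(order[rank - 1])


indexes = [0.2, 0.9, 0.5]
print(rank_th_best_arm(indexes, 1))  # arm 1 has the largest index
print(rank_th_best_arm(indexes, 2))  # arm 2 has the second largest index
```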

__init__(maxRank, stickyTime, *args, **kwargs)[source]

Initialize self. See help(type(self)) for accurate signature.

maxRank = None

Max rank, usually nbPlayers but can be different

stickyTime = None

Number of time steps needed without collisions before sitting (never changing rank again)

rank = None

Current rank, starting at 1 by default

sitted = None

Whether the player is sitted; not yet sitted at first. After stickyTime steps without collisions, the player sits and never changes rank again.

stepsWithoutCollisions = None

Number of steps since this rank was chosen without seeing any collision. As soon as this gets greater than stickyTime, the player sits.

__str__()[source]

Return str(self).

startGame()[source]

Start game.

handleCollision(arm, reward=None)[source]

Get a new fully random rank, and give the reward to the algorithm if it is not None.

getReward(arm, reward)[source]

Pass the call to self.mother._getReward_one(playerId, arm, reward) with the player’s ID number.

  • Additionally, if the current rank was good enough to not bring any collision during the last stickyTime time steps, the player “sits” on that rank.
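
The per-player bookkeeping behind "sitting" can be sketched as a small state object. This is an illustration only: the class StickyRankState and its method names are hypothetical, it does not call the mother class, and it assumes that a player who has sat ignores later collisions, as described in the module introduction.

```python
import random


class StickyRankState:
    """Hypothetical stand-alone model of one player's rank state."""

    def __init__(self, max_rank, sticky_time, seed=None):
        self._rng = random.Random(seed)
        self.max_rank = max_rank
        self.sticky_time = sticky_time
        self.rank = 1                      # current rank, starting at 1
        self.sitted = False                # not yet sitted
        self.steps_without_collisions = 0

    def on_reward(self):
        """A reward was observed, so no collision happened at this step."""
        self.steps_without_collisions += 1
        # Sit as soon as the counter gets greater than sticky_time.
        if not self.sitted and self.steps_without_collisions > self.sticky_time:
            self.sitted = True

    def on_collision(self):
        """A collision happened: resample the rank unless already sitted."""
        if not self.sitted:
            self.rank = self._rng.randint(1, self.max_rank)
            self.steps_without_collisions = 0


state = StickyRankState(max_rank=4, sticky_time=3, seed=0)
for _ in range(4):
    state.on_reward()
print(state.sitted)  # True: 4 collision-free steps exceed sticky_time = 3
```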

__module__ = 'PoliciesMultiPlayers.rhoRandSticky'

class PoliciesMultiPlayers.rhoRandSticky.rhoRandSticky(nbPlayers, nbArms, playerAlgo, stickyTime=10, maxRank=None, lower=0.0, amplitude=1.0, *args, **kwargs)[source]

Bases: PoliciesMultiPlayers.rhoRand.rhoRand

rhoRandSticky: implementation of a variant of the multi-player policy rhoRand from [Distributed Algorithms for Learning…, Anandkumar et al., 2010](http://ieeexplore.ieee.org/document/5462144/).

__init__(nbPlayers, nbArms, playerAlgo, stickyTime=10, maxRank=None, lower=0.0, amplitude=1.0, *args, **kwargs)[source]

  • nbPlayers: number of players to create (in self._players).

  • playerAlgo: class to use for every player.

  • nbArms: number of arms, given as first argument to playerAlgo.

  • stickyTime: given to the oneRhoRandSticky objects (see above).

  • maxRank: maximum rank allowed by the rhoRandSticky child (defaults to nbPlayers, but for instance if there are 2 × rhoRandSticky[UCB] + 2 × rhoRandSticky[klUCB], maxRank should be 4, not 2).

  • *args, **kwargs: arguments, named arguments, given to playerAlgo.

Example:

>>> from Policies import *
>>> import random; random.seed(0); import numpy as np; np.random.seed(0)
>>> nbArms = 17
>>> nbPlayers = 6
>>> stickyTime = 5
>>> s = rhoRandSticky(nbPlayers, nbArms, UCB, stickyTime=stickyTime)
>>> [ child.choice() for child in s.children ]
[12, 15, 0, 3, 3, 7]
>>> [ child.choice() for child in s.children ]
[9, 4, 6, 12, 1, 6]

  • To get a list of usable players, use s.children.

Warning

s._players is for internal use ONLY!

maxRank = None

Max rank, usually nbPlayers but can be different

stickyTime = None

Number of time steps needed without collisions before sitting (never changing rank again)

nbPlayers = None

Number of players

children = None

List of children, fake algorithms

nbArms = None

Number of arms

__str__()[source]

Return str(self).

__module__ = 'PoliciesMultiPlayers.rhoRandSticky'