PoliciesMultiPlayers.rhoRandALOHA module

rhoRandALOHA: implementation of a variant of the multi-player policy rhoRand from [Distributed Algorithms for Learning…, Anandkumar et al., 2010](http://ieeexplore.ieee.org/document/5462144/).

  • Each child player is selfish and plays according to an index policy (any index policy, e.g., UCB, Thompson sampling, KL-UCB, BayesUCB, etc.),

  • But instead of aiming at the best (the 1st best) arm, player i aims at the rank_i-th best arm,

  • At first, every player has a random rank_i from 1 to M, and when a collision occurs, rank_i is sampled from a uniform distribution on \(\{1, \dots, M\}\), where M is the number of players.

  • The only difference with rhoRand is that, upon a collision, a user has a small chance of keeping the same rank, following a Bernoulli trial: with probability \(p(t)\) it keeps the same rank, and with probability \(1 - p(t)\) it draws a new rank (uniformly in \(\{1,\dots,M\}\), so it may draw the same rank again, unless forceChange is used to force a different one).

  • There is also a variant where, as in MEGA (an ALOHA-like protocol), the probability changes over time: \(p(t+1) = \alpha p(t) + (1 - \alpha)\) (see the sketch after the note below).

Note

This is not fully decentralized: each child player needs to know the (fixed) number of players.
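
As a rough illustration of the collision rule described in the list above, here is a minimal sketch in Python; the helper name update_rank_on_collision and the exact time step at which \(p(t)\) is updated are assumptions, not the library's code:

import random

def update_rank_on_collision(rank, M, p, alpha, forceChange=False):
    """Hypothetical helper: one ALOHA-like collision step, as described above."""
    if random.random() >= p:                 # with probability 1 - p(t), change the rank
        newRank = random.randint(1, M)       # uniform draw on {1, ..., M}
        while forceChange and newRank == rank:
            newRank = random.randint(1, M)   # optionally reject draws equal to the old rank
        rank = newRank
    p = alpha * p + (1 - alpha)              # one possible placement of p(t+1) = alpha p(t) + (1 - alpha)
    return rank, p

Since the fixed point of \(p(t+1) = \alpha p(t) + (1 - \alpha)\) is 1, keeping the current rank becomes more and more likely over time when \(\alpha\) is close to 1 (as with the default ALPHA_P0 below).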

PoliciesMultiPlayers.rhoRandALOHA.random() → x in the interval [0, 1).
PoliciesMultiPlayers.rhoRandALOHA.new_rank(rank, maxRank, forceChange=False)[source]

Return a new rank, drawn uniformly from \(\{1, \dots, \mathrm{maxRank}\}\); if forceChange is True, the new rank is guaranteed to be different from rank.

  • Internally, it uses simple rejection sampling when forceChange is True: keep drawing a new rank \(\sim U(\{1, \dots, \mathrm{maxRank}\})\) until it differs from rank (not the most efficient way to do it, but simpler); see the sketch after the examples below.

Example:

>>> from random import seed; seed(0)  # reproducibility
>>> [ new_rank(1, 8, False) for _ in range(10) ]
[7, 7, 1, 5, 9, 8, 7, 5, 8, 6]
>>> [ new_rank(8, 8, False) for _ in range(10) ]
[4, 9, 3, 5, 3, 2, 5, 9, 3, 5]

Example with forceChange=True, where the new rank is guaranteed to be different from the current one:

>>> [ new_rank(1, 8, True) for _ in range(10) ]
[2, 2, 6, 8, 9, 2, 6, 7, 6, 4]
>>> [ new_rank(5, 8, True) for _ in range(10) ]
[9, 8, 8, 9, 1, 9, 1, 2, 7, 1]
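
A minimal sketch of this rejection-sampling idea (the name new_rank_sketch is hypothetical, and this is not the library's exact implementation, so it will not reproduce the outputs above):

from random import randint

def new_rank_sketch(rank, maxRank, forceChange=False):
    """Uniform draw on {1, ..., maxRank}; if forceChange, reject draws equal to rank."""
    r = randint(1, maxRank)
    while forceChange and r == rank:
        r = randint(1, maxRank)
    return r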
class PoliciesMultiPlayers.rhoRandALOHA.oneRhoRandALOHA(maxRank, p0, alpha_p0, forceChange, *args, **kwargs)[source]

Bases: PoliciesMultiPlayers.rhoRand.oneRhoRand

Class that acts as a child policy, but in fact it passes all its method calls to the mother class, which forwards them to its i-th player.

  • Except for the handleCollision method: a new random rank is sampled after observing a collision,

  • And the player does not aim at the best arm, but at the rank-th best arm, based on her index policy.
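
A rough sketch of this forwarding pattern (the class ChildSketch and its attribute layout are hypothetical; the real forwarding is inherited from PoliciesMultiPlayers.rhoRand.oneRhoRand):

class ChildSketch(object):
    """Hypothetical child policy: it only stores its player ID and its rank,
    and forwards calls to the mother class, which owns the real index policies."""

    def __init__(self, mother, playerId, maxRank):
        self.mother = mother
        self.playerId = playerId
        self.maxRank = maxRank
        self.rank = 1  # the real class also starts with rank 1 by default

    def getReward(self, arm, reward):
        # forwarded to the mother, which dispatches to its playerId-th underlying player
        return self.mother._getReward_one(self.playerId, arm, reward)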

__init__(maxRank, p0, alpha_p0, forceChange, *args, **kwargs)[source]

Initialize self. See help(type(self)) for accurate signature.

maxRank = None

Max rank, usually nbPlayers but can be different

p0 = None

Initial probability, should not be modified.

p = None

Current probability of staying with the current rank after a collision. If 0, then it is like the initial rhoRand policy.

alpha_p0 = None

Parameter alpha for the recurrence equation for probability p(t)

rank = None

Current rank, starting at 1 by default

forceChange = None

Whether the new rank drawn after a collision must be different from the current one.

__str__()[source]

Return str(self).

startGame()[source]

Start game.

handleCollision(arm, reward=None)[source]

Get a new fully random rank, and give the reward to the algorithm if it is not None.

getReward(arm, reward)[source]

Pass the call to self.mother._getReward_one(playerId, arm, reward) with the player’s ID number.

  • Additionally, if the current rank was good enough to not bring any collision during the last p0 time steps, the player “sits” on that rank.

__module__ = 'PoliciesMultiPlayers.rhoRandALOHA'
PoliciesMultiPlayers.rhoRandALOHA.P0 = 0.0

Default value for P0; ideally it should be 1/(K*M), where K is the number of arms and M the number of players.

PoliciesMultiPlayers.rhoRandALOHA.ALPHA_P0 = 0.9999

Default value for ALPHA_P0. FIXME: I have no idea what the best possible choice can be!

PoliciesMultiPlayers.rhoRandALOHA.FORCE_CHANGE = False

Default value for forceChange. Logically, it should be True.

class PoliciesMultiPlayers.rhoRandALOHA.rhoRandALOHA(nbPlayers, nbArms, playerAlgo, p0=None, alpha_p0=0.9999, forceChange=False, maxRank=None, lower=0.0, amplitude=1.0, *args, **kwargs)[source]

Bases: PoliciesMultiPlayers.rhoRand.rhoRand

rhoRandALOHA: implementation of a variant of the multi-player policy rhoRand from [Distributed Algorithms for Learning…, Anandkumar et al., 2010](http://ieeexplore.ieee.org/document/5462144/).

__init__(nbPlayers, nbArms, playerAlgo, p0=None, alpha_p0=0.9999, forceChange=False, maxRank=None, lower=0.0, amplitude=1.0, *args, **kwargs)[source]
  • nbPlayers: number of players to create (in self._players).

  • playerAlgo: class to use for every player.

  • nbArms: number of arms, given as first argument to playerAlgo.

  • p0: given to the oneRhoRandALOHA objects (see above).

  • alpha_p0: given to the oneRhoRandALOHA objects (see above).

  • forceChange: given to the oneRhoRandALOHA objects (see above).

  • maxRank: maximum rank allowed by the rhoRandALOHA children (defaults to nbPlayers; but, for instance, if there are 2 × rhoRandALOHA[UCB] + 2 × rhoRandALOHA[klUCB], maxRank should be 4, not 2).

  • *args, **kwargs: arguments, named arguments, given to playerAlgo.

Example:

>>> from Policies import *
>>> import random; random.seed(0); import numpy as np; np.random.seed(0)
>>> nbArms = 17
>>> nbPlayers = 6
>>> p0, alpha_p0, forceChange = 0.6, 0.5, True
>>> s = rhoRandALOHA(nbPlayers, nbArms, UCB, p0, alpha_p0, forceChange)
>>> [ child.choice() for child in s.children ]
[12, 15, 0, 3, 3, 7]
>>> [ child.choice() for child in s.children ]
[9, 4, 6, 12, 1, 6]
  • To get a list of usable players, use s.children.

  • Warning: s._players is for internal use ONLY!
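
Continuing the example, a collision can be signalled by hand to the children that picked the same arm; this is only a usage sketch (no output is shown), as in a full simulation the collision model of the environment would typically call handleCollision itself:

>>> choices = [ child.choice() for child in s.children ]
>>> collided = { arm for arm in choices if choices.count(arm) > 1 }
>>> for child, arm in zip(s.children, choices):
...     if arm in collided:
...         child.handleCollision(arm)  # the child may then draw a new rank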

maxRank = None

Max rank, usually nbPlayers but can be different

p0 = None

Initial value for p, the probability of staying with the current rank after a collision.

alpha_p0 = None

Parameter alpha for the recurrence equation for probability p(t)

forceChange = None

Whether the new rank drawn after a collision must be different from the current one.

nbPlayers = None

Number of players

children = None

List of children, fake algorithms

nbArms = None

Number of arms

__str__()[source]

Return str(self).

__module__ = 'PoliciesMultiPlayers.rhoRandALOHA'