PoliciesMultiPlayers.rhoLearn module

rhoLearn: implementation of the multi-player policy from [Distributed Algorithms for Learning…, Anandkumar et al., 2010](http://ieeexplore.ieee.org/document/5462144/), using a learning algorithm instead of a random exploration for choosing the rank.

  • Each child player is selfish, and plays according to an index policy (any index policy, e.g., UCB, Thompson, KL-UCB, BayesUCB etc),

  • But instead of aiming at the best (the 1-st best) arm, player i aims at the rank_i-th best arm,

  • At first, every player has a random rank_i from 1 to M, and when a collision occurs, rank_i is given by a second learning algorithm, playing on arms = ranks from [1, .., M], where M is the number of player.

  • If rankSelection = Uniform, this is like rhoRand, but if it is a smarter policy, it might be better! Warning: no theoretical guarantees exist!

  • Reference: [Proof-of-Concept System for Opportunistic Spectrum Access in Multi-user Decentralized Networks, S.J.Darak, C.Moy, J.Palicot, EAI 2016](https://doi.org/10.4108/eai.5-9-2016.151647), algorithm 2. (for BayesUCB only)

Note

This is not fully decentralized: as each child player needs to know the (fixed) number of players.

PoliciesMultiPlayers.rhoLearn.CHANGE_RANK_EACH_STEP = False

Should oneRhoLearn players select a (possibly new) rank at each step ? The algorithm P2 from https://doi.org/10.4108/eai.5-9-2016.151647 suggests to do so. But I found it works better without this trick.

class PoliciesMultiPlayers.rhoLearn.oneRhoLearn(maxRank, rankSelectionAlgo, change_rank_each_step, *args, **kwargs)[source]

Bases: PoliciesMultiPlayers.rhoRand.oneRhoRand

Class that acts as a child policy, but in fact it pass all its method calls to the mother class, who passes it to its i-th player.

  • Except for the handleCollision method: a (possibly new) rank is sampled after observing a collision, from the rankSelection algorithm.

  • When no collision is observed on a arm, a small reward is given to the rank used for this play, in order to learn the best ranks with rankSelection.

  • And the player does not aim at the best arm, but at the rank-th best arm, based on her index policy.

__init__(maxRank, rankSelectionAlgo, change_rank_each_step, *args, **kwargs)[source]

Initialize self. See help(type(self)) for accurate signature.

maxRank = None

Max rank, usually nbPlayers but can be different

rank = None

Current rank, starting to 1

change_rank_each_step = None

Change rank at each step?

__str__()[source]

Return str(self).

startGame()[source]

Initialize both rank and arm selection algorithms.

getReward(arm, reward)[source]

Give a 1 reward to the rank selection algorithm (no collision), give reward to the arm selection algorithm, and if self.change_rank_each_step, select a (possibly new) rank.

handleCollision(arm, reward=None)[source]

Give a 0 reward to the rank selection algorithm, and select a (possibly new) rank.

__module__ = 'PoliciesMultiPlayers.rhoLearn'
class PoliciesMultiPlayers.rhoLearn.rhoLearn(nbPlayers, nbArms, playerAlgo, rankSelectionAlgo=<class 'Policies.Uniform.Uniform'>, lower=0.0, amplitude=1.0, maxRank=None, change_rank_each_step=False, *args, **kwargs)[source]

Bases: PoliciesMultiPlayers.rhoRand.rhoRand

rhoLearn: implementation of the multi-player policy from [Distributed Algorithms for Learning…, Anandkumar et al., 2010](http://ieeexplore.ieee.org/document/5462144/), using a learning algorithm instead of a random exploration for choosing the rank.

__init__(nbPlayers, nbArms, playerAlgo, rankSelectionAlgo=<class 'Policies.Uniform.Uniform'>, lower=0.0, amplitude=1.0, maxRank=None, change_rank_each_step=False, *args, **kwargs)[source]
  • nbPlayers: number of players to create (in self._players).

  • playerAlgo: class to use for every players.

  • nbArms: number of arms, given as first argument to playerAlgo.

  • rankSelectionAlgo: algorithm to use for selecting the ranks.

  • maxRank: maximum rank allowed by the rhoRand child (default to nbPlayers, but for instance if there is 2 × rhoRand[UCB] + 2 × rhoRand[klUCB], maxRank should be 4 not 2).

  • *args, **kwargs: arguments, named arguments, given to playerAlgo.

Example:

>>> from Policies import *
>>> import random; random.seed(0); import numpy as np; np.random.seed(0)
>>> nbArms = 17
>>> nbPlayers = 6
>>> stickyTime = 5
>>> s = rhoLearn(nbPlayers, nbArms, UCB, UCB)
>>> [ child.choice() for child in s.children ]
[12, 15, 0, 3, 3, 7]
>>> [ child.choice() for child in s.children ]
[9, 4, 6, 12, 1, 6]
  • To get a list of usable players, use s.children.

  • Warning: s._players is for internal use ONLY!

maxRank = None

Max rank, usually nbPlayers but can be different

nbPlayers = None

Number of players

children = None

List of children, fake algorithms

rankSelectionAlgo = None

Policy to use to chose the ranks

nbArms = None

Number of arms

change_rank_each_step = None

Change rank at every steps?

__module__ = 'PoliciesMultiPlayers.rhoLearn'
__str__()[source]

Return str(self).