PoliciesMultiPlayers.rhoLearnEst module
rhoLearnEst: implementation of the multi-player policy from [Distributed Algorithms for Learning…, Anandkumar et al., 2010](http://ieeexplore.ieee.org/document/5462144/), using a learning algorithm instead of a random exploration for choosing the rank, and without knowing the number of users.
It generalizes
PoliciesMultiPlayers.rhoLearn.rhoLearn
simply by letting the ranks be in \(\{1,\dots,K\}\) instead of \(\{1,\dots,M\}\), hoping that the learning algorithm will be “smart enough” to learn by itself that ranks should be \(\leq M\).

Each child player is selfish and plays according to an index policy (any index policy, e.g., UCB, Thompson, KL-UCB, BayesUCB, etc.), but instead of aiming at the best (the 1-st best) arm, player i aims at the rank_i-th best arm.

At first, every player has a random rank_i from 1 to M, and when a collision occurs, rank_i is given by a second learning algorithm, playing on arms = ranks from [1, .., M], where M is the number of players (see the sketch below).

If rankSelection = Uniform, this is like rhoRand, but if it is a smarter policy, it might be better! Warning: no theoretical guarantees exist!
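To make the rank-learning mechanism concrete, here is a minimal, self-contained sketch of the idea. This is not the library's implementation: the class name OneRhoLearnEstSketch, the reward convention for the rank learner (1 on no collision, 0 on collision), and the assumed startGame()/choice()/getReward()/choiceWithRank() interfaces of the two underlying policies are illustrative assumptions.

```python
# Minimal sketch (NOT the library's code) of the rhoLearnEst idea:
# each player runs an index policy over the K arms, plus a second learner
# over the ranks {1, ..., K}; on a collision the rank learner is penalized
# and a new rank is drawn.

class OneRhoLearnEstSketch:
    """Hypothetical single player following the rhoLearnEst scheme."""

    def __init__(self, nbArms, ArmPolicy, RankPolicy):
        # Assumed interfaces: both policies expose startGame(), choice(),
        # getReward(arm, reward); the arm policy also exposes
        # choiceWithRank(rank), as the SMPyBandits index policies do.
        self.arm_policy = ArmPolicy(nbArms)    # index policy on the K arms
        self.rank_policy = RankPolicy(nbArms)  # learner on ranks 1..K (M is unknown)
        self.rank = 1                          # current rank (1-based)

    def startGame(self):
        self.arm_policy.startGame()
        self.rank_policy.startGame()
        self.rank = 1 + self.rank_policy.choice()

    def choice(self):
        # Aim at the rank-th best arm according to the arm policy's indexes.
        return self.arm_policy.choiceWithRank(self.rank)

    def getReward(self, arm, reward):
        # No collision: update the arm policy, and reward the current rank.
        self.arm_policy.getReward(arm, reward)
        self.rank_policy.getReward(self.rank - 1, 1)

    def handleCollision(self, arm):
        # Collision: the rank learner gets reward 0, and a new rank is drawn.
        self.rank_policy.getReward(self.rank - 1, 0)
        self.rank = 1 + self.rank_policy.choice()
```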
Reference: [Proof-of-Concept System for Opportunistic Spectrum Access in Multi-user Decentralized Networks, S.J.Darak, C.Moy, J.Palicot, EAI 2016](https://doi.org/10.4108/eai.5-9-2016.151647), Algorithm 2 (for BayesUCB only).
Note
This is fully decentralized: each child player does not need to know the (fixed) number of players; it will learn to select ranks only in \(\{1,\dots,M\}\) instead of \(\{1,\dots,K\}\).
Warning
This policy does not work very well!
- class PoliciesMultiPlayers.rhoLearnEst.oneRhoLearnEst(maxRank, rankSelectionAlgo, change_rank_each_step, *args, **kwargs)[source]
  Bases: PoliciesMultiPlayers.rhoLearn.oneRhoLearn
  - __module__ = 'PoliciesMultiPlayers.rhoLearnEst'
- class PoliciesMultiPlayers.rhoLearnEst.rhoLearnEst(nbPlayers, nbArms, playerAlgo, rankSelectionAlgo=<class 'Policies.Uniform.Uniform'>, lower=0.0, amplitude=1.0, change_rank_each_step=False, *args, **kwargs)[source]
  Bases: PoliciesMultiPlayers.rhoLearn.rhoLearn
rhoLearnEst: implementation of the multi-player policy from [Distributed Algorithms for Learning…, Anandkumar et al., 2010](http://ieeexplore.ieee.org/document/5462144/), using a learning algorithm instead of a random exploration for choosing the rank, and without knowing the number of users.
  - __init__(nbPlayers, nbArms, playerAlgo, rankSelectionAlgo=<class 'Policies.Uniform.Uniform'>, lower=0.0, amplitude=1.0, change_rank_each_step=False, *args, **kwargs)[source]
    - nbPlayers: number of players to create (in self._players).
    - playerAlgo: class to use for every player.
    - nbArms: number of arms, given as first argument to playerAlgo.
    - rankSelectionAlgo: algorithm to use for selecting the ranks.
    - *args, **kwargs: positional and keyword arguments given to playerAlgo.
    Difference with PoliciesMultiPlayers.rhoLearn.rhoLearn: maxRank, the maximum rank allowed by the rhoRand child, is not an argument here, as it is always nbArms (= K).
    Example:

    >>> from Policies import *
    >>> import random; random.seed(0); import numpy as np; np.random.seed(0)
    >>> nbArms = 17
    >>> nbPlayers = 6
    >>> s = rhoLearnEst(nbPlayers, nbArms, UCB, UCB)
    >>> [ child.choice() for child in s.children ]
    [12, 15, 0, 3, 3, 7]
    >>> [ child.choice() for child in s.children ]
    [9, 4, 6, 12, 1, 6]
    To get a list of usable players, use s.children. Warning: s._players is for internal use ONLY!
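    For illustration, here is a hedged sketch of a hand-rolled simulation loop driving s.children directly; normally the library's evaluator classes do this. It assumes Bernoulli arms (the means list is illustrative), that the modules are importable as in the doctest above, and that each child exposes startGame()/choice()/getReward()/handleCollision() as the library's ChildPointer interface does.

```python
# Hedged sketch: driving the children by hand on Bernoulli arms.
# The library's multi-player evaluator normally handles this loop.
import numpy as np
from Policies import UCB
from PoliciesMultiPlayers import rhoLearnEst

np.random.seed(0)
means = [0.1, 0.3, 0.5, 0.7, 0.9]   # Bernoulli arm means (illustrative)
nbArms, nbPlayers, horizon = len(means), 3, 1000

s = rhoLearnEst(nbPlayers, nbArms, UCB, UCB)
for child in s.children:
    child.startGame()

for t in range(horizon):
    choices = [child.choice() for child in s.children]
    for j, arm in enumerate(choices):
        if choices.count(arm) > 1:
            # Collision: this player gets no reward; its rank learner reacts.
            s.children[j].handleCollision(arm)
        else:
            reward = float(np.random.rand() < means[arm])
            s.children[j].getReward(arm, reward)
```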
    - nbPlayers = None: Number of players.
    - children = None: List of children, fake algorithms.
    - rankSelectionAlgo = None: Policy to use to choose the ranks.
    - nbArms = None: Number of arms.
    - change_rank_each_step = None: Change rank at every step?
    - __module__ = 'PoliciesMultiPlayers.rhoLearnEst'