Policies.MusicalChairNoSensing module¶
MusicalChairNoSensing: implementation of the decentralized multi-player policy from [[“Multiplayer bandits without observing collision information”, by Gabor Lugosi and Abbas Mehrabian]](https://arxiv.org/abs/1808.08416).
Note
The algorithm implemented here is Algorithm 1 (page 8) in the article, but the authors did not named it. I will refer to it as the Musical Chair algorithm with no sensing, or MusicalChairNoSensing in the code.
-
Policies.MusicalChairNoSensing.ConstantC= 1¶ A crazy large constant to get all the theoretical results working. The paper suggests \(C = 128\).
Warning
One can choose a much smaller value in order to (try to) have reasonable empirical performances! I have tried \(C = 1\). BUT the algorithm DOES NOT work better with a much smaller constant: every single simulations I tried end up with a linear regret for
MusicalChairNoSensing.
-
Policies.MusicalChairNoSensing.parameter_g(K=9, m=3, T=1000, constant_c=1)[source]¶ Length \(g\) of the phase 1, from parameters
K,mandT.\[g = 128 K \log(3 K m^2 T^2).\]Examples:
>>> parameter_g(m=2, K=2, T=100) # DOCTEST: +ELLIPSIS 3171.428... >>> parameter_g(m=2, K=2, T=1000) # DOCTEST: +ELLIPSIS 4350.352... >>> parameter_g(m=2, K=3, T=100) # DOCTEST: +ELLIPSIS 4912.841... >>> parameter_g(m=3, K=3, T=100) # DOCTEST: +ELLIPSIS 5224.239...
-
Policies.MusicalChairNoSensing.estimate_length_phases_12(K=3, m=9, Delta=0.1, T=1000)[source]¶ Estimate the length of phase 1 and 2 from the parameters of the problem.
Examples:
>>> estimate_length_phases_12(m=2, K=2, Delta=0.1, T=100) 198214307 >>> estimate_length_phases_12(m=2, K=2, Delta=0.01, T=100) 19821430723 >>> estimate_length_phases_12(m=2, K=2, Delta=0.1, T=1000) 271897030 >>> estimate_length_phases_12(m=2, K=3, Delta=0.1, T=100) 307052623 >>> estimate_length_phases_12(m=2, K=5, Delta=0.1, T=100) 532187397
-
Policies.MusicalChairNoSensing.smallest_T_from_where_length_phases_12_is_larger(K=3, m=9, Delta=0.1, Tmax=1000000000.0)[source]¶ Compute the smallest horizon T from where the (estimated) length of phases 1 and 2 is larger than T.
Examples:
>>> smallest_T_from_where_length_phases_12_is_larger(K=2, m=1) 687194767 >>> smallest_T_from_where_length_phases_12_is_larger(K=3, m=2) 1009317314 >>> smallest_T_from_where_length_phases_12_is_larger(K=3, m=3) 1009317314
Examples with even longer phase 1:
>>> smallest_T_from_where_length_phases_12_is_larger(K=10, m=5) 1009317314 >>> smallest_T_from_where_length_phases_12_is_larger(K=10, m=10) 1009317314
With \(K=100\) arms, it starts to be crazy:
>>> smallest_T_from_where_length_phases_12_is_larger(K=100, m=10) 1009317314
-
class
Policies.MusicalChairNoSensing.State¶ Bases:
enum.EnumDifferent states during the Musical Chair with no sensing algorithm
-
InitialPhase= 2¶
-
MusicalChair= 4¶
-
NotStarted= 1¶
-
Sitted= 5¶
-
UniformWaitPhase2= 3¶
-
__module__= 'Policies.MusicalChairNoSensing'¶
-
-
class
Policies.MusicalChairNoSensing.MusicalChairNoSensing(nbPlayers=1, nbArms=1, horizon=1000, constant_c=1, lower=0.0, amplitude=1.0)[source]¶ Bases:
Policies.BasePolicy.BasePolicyMusicalChairNoSensing: implementation of the decentralized multi-player policy from [[“Multiplayer bandits without observing collision information”, by Gabor Lugosi and Abbas Mehrabian]](https://arxiv.org/abs/1808.08416).
-
__init__(nbPlayers=1, nbArms=1, horizon=1000, constant_c=1, lower=0.0, amplitude=1.0)[source]¶ nbArms: number of arms (
Kin the paper),nbPlayers: number of players (
min the paper),horizon: horizon (length) of the game (
Tin the paper),
Example:
>>> nbPlayers, nbArms, horizon = 3, 9, 10000 >>> player1 = MusicalChairNoSensing(nbPlayers, nbArms, horizon)
For multi-players use:
>>> configuration["players"] = Selfish(NB_PLAYERS, MusicalChairNoSensing, nbArms, nbPlayers=nbPlayers, horizon=horizon).children
or
>>> configuration["players"] = [ MusicalChairNoSensing(nbPlayers=nbPlayers, nbArms=nbArms, horizon=horizon) for _ in range(NB_PLAYERS) ]
-
state= None¶ Current state
-
nbPlayers= None¶ Number of players
-
nbArms= None¶ Number of arms
-
horizon= None¶ Parameter T (horizon)
-
chair= None¶ Current chair. Not sited yet.
-
cumulatedRewards= None¶ That’s the s_i(t) of the paper
-
nbObservations= None¶ That’s the o_i of the paper
-
A= None¶ A random permutation of arms, it will then be of size nbPlayers!
-
tau_phase_2= None¶ Time when phase 2 starts
-
t= None¶ Internal times
-
startGame()[source]¶ Just reinitialize all the internal memory, and decide how to start (state 1 or 2).
-
getReward(arm, reward)[source]¶ Receive a reward on arm of index ‘arm’, as described by the Musical Chair with no Sensing algorithm.
If not collision, receive a reward after pulling the arm.
-
__module__= 'Policies.MusicalChairNoSensing'¶
-
handleCollision(arm, reward=None)[source]¶ Handle a collision, on arm of index ‘arm’.
Here, as its name suggests it, the
MusicalChairNoSensingalgorithm does not use any collision information, hence this method is empty.Warning: this method has to be implemented in the collision model, it is NOT implemented in the EvaluatorMultiPlayers.
-