Policies.MusicalChairNoSensing module¶

MusicalChairNoSensing: implementation of the decentralized multi-player policy from [[“Multiplayer bandits without observing collision information”, by Gabor Lugosi and Abbas Mehrabian]](https://arxiv.org/abs/1808.08416).

Note

The algorithm implemented here is Algorithm 1 (page 8) in the article, but the authors did not named it. I will refer to it as the Musical Chair algorithm with no sensing, or MusicalChairNoSensing in the code.

Policies.MusicalChairNoSensing.ConstantC = 1¶: A crazy large constant to get all the theoretical results working. The paper suggests \(C = 128\).

Warning

One can choose a much smaller value in order to (try to) have reasonable empirical performances! I have tried \(C = 1\). BUT the algorithm DOES NOT work better with a much smaller constant: every single simulations I tried end up with a linear regret for MusicalChairNoSensing.

Policies.MusicalChairNoSensing.parameter_g(K=9, m=3, T=1000, constant_c=1)[source]¶

Length \(g\) of the phase 1, from parameters K, m and T.

\[g = 128 K \log(3 K m^2 T^2).\]

Examples:

>>> parameter_g(m=2, K=2, T=100)  # DOCTEST: +ELLIPSIS
3171.428...
>>> parameter_g(m=2, K=2, T=1000)  # DOCTEST: +ELLIPSIS
4350.352...
>>> parameter_g(m=2, K=3, T=100)  # DOCTEST: +ELLIPSIS
4912.841...
>>> parameter_g(m=3, K=3, T=100)  # DOCTEST: +ELLIPSIS
5224.239...

Policies.MusicalChairNoSensing.estimate_length_phases_12(K=3, m=9, Delta=0.1, T=1000)[source]¶

Estimate the length of phase 1 and 2 from the parameters of the problem.

Examples:

>>> estimate_length_phases_12(m=2, K=2, Delta=0.1, T=100)
198214307
>>> estimate_length_phases_12(m=2, K=2, Delta=0.01, T=100)
19821430723
>>> estimate_length_phases_12(m=2, K=2, Delta=0.1, T=1000)
271897030
>>> estimate_length_phases_12(m=2, K=3, Delta=0.1, T=100)
307052623
>>> estimate_length_phases_12(m=2, K=5, Delta=0.1, T=100)
532187397

Policies.MusicalChairNoSensing.smallest_T_from_where_length_phases_12_is_larger(K=3, m=9, Delta=0.1, Tmax=1000000000.0)[source]¶

Compute the smallest horizon T from where the (estimated) length of phases 1 and 2 is larger than T.

Examples:

>>> smallest_T_from_where_length_phases_12_is_larger(K=2, m=1)
687194767
>>> smallest_T_from_where_length_phases_12_is_larger(K=3, m=2)
1009317314
>>> smallest_T_from_where_length_phases_12_is_larger(K=3, m=3)
1009317314

Examples with even longer phase 1:

>>> smallest_T_from_where_length_phases_12_is_larger(K=10, m=5)
1009317314
>>> smallest_T_from_where_length_phases_12_is_larger(K=10, m=10)
1009317314

With \(K=100\) arms, it starts to be crazy:

>>> smallest_T_from_where_length_phases_12_is_larger(K=100, m=10)
1009317314

class Policies.MusicalChairNoSensing.State¶

Bases: enum.Enum

Different states during the Musical Chair with no sensing algorithm

InitialPhase = 2¶

MusicalChair = 4¶

NotStarted = 1¶

Sitted = 5¶

UniformWaitPhase2 = 3¶

__module__ = 'Policies.MusicalChairNoSensing'¶

class Policies.MusicalChairNoSensing.MusicalChairNoSensing(nbPlayers=1, nbArms=1, horizon=1000, constant_c=1, lower=0.0, amplitude=1.0)[source]¶

Bases: Policies.BasePolicy.BasePolicy

MusicalChairNoSensing: implementation of the decentralized multi-player policy from [[“Multiplayer bandits without observing collision information”, by Gabor Lugosi and Abbas Mehrabian]](https://arxiv.org/abs/1808.08416).

__init__(nbPlayers=1, nbArms=1, horizon=1000, constant_c=1, lower=0.0, amplitude=1.0)[source]¶

nbArms: number of arms (K in the paper),
nbPlayers: number of players (m in the paper),
horizon: horizon (length) of the game (T in the paper),

Example:

>>> nbPlayers, nbArms, horizon = 3, 9, 10000
>>> player1 = MusicalChairNoSensing(nbPlayers, nbArms, horizon)

For multi-players use:

>>> configuration["players"] = Selfish(NB_PLAYERS, MusicalChairNoSensing, nbArms, nbPlayers=nbPlayers, horizon=horizon).children

or

>>> configuration["players"] = [ MusicalChairNoSensing(nbPlayers=nbPlayers, nbArms=nbArms, horizon=horizon) for _ in range(NB_PLAYERS) ]

state = None¶: Current state

nbPlayers = None¶: Number of players

nbArms = None¶: Number of arms

horizon = None¶: Parameter T (horizon)

chair = None¶: Current chair. Not sited yet.

cumulatedRewards = None¶: That’s the s_i(t) of the paper

nbObservations = None¶: That’s the o_i of the paper

A = None¶: A random permutation of arms, it will then be of size nbPlayers!

tau_phase_2 = None¶: Time when phase 2 starts

t = None¶: Internal times

__str__()[source]¶: -> str

startGame()[source]¶: Just reinitialize all the internal memory, and decide how to start (state 1 or 2).

choice()[source]¶: Choose an arm, as described by the Musical Chair with no Sensing algorithm.

getReward(arm, reward)[source]¶

Receive a reward on arm of index ‘arm’, as described by the Musical Chair with no Sensing algorithm.

If not collision, receive a reward after pulling the arm.

__module__ = 'Policies.MusicalChairNoSensing'¶

_endPhase2()[source]¶: Small computation needed at the end of the initial random exploration phase.

handleCollision(arm, reward=None)[source]¶

Handle a collision, on arm of index ‘arm’.

Here, as its name suggests it, the MusicalChairNoSensing algorithm does not use any collision information, hence this method is empty.
Warning: this method has to be implemented in the collision model, it is NOT implemented in the EvaluatorMultiPlayers.