Policies.MEGA module

MEGA: implementation of the single-player policy from [Concurrent bandits and cognitive radio networks, O.Avner & S.Mannor, 2014](https://arxiv.org/abs/1404.5421).

The Multi-user epsilon-Greedy collision Avoiding (MEGA) algorithm is based on the epsilon-greedy algorithm introduced in [2], augmented by a collision avoidance mechanism that is inspired by the classical ALOHA protocol.

  • [2]: Finite-time analysis of the multi-armed bandit problem, P.Auer & N.Cesa-Bianchi & P.Fischer, 2002
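
To make this concrete, here is a rough sketch of the two mechanisms, written only from the description above and from Algorithm 1 of the paper; the function and argument names (choose_arm, on_collision, mean_rewards, t_next, epsilon_t) are illustrative assumptions, not the actual API of the class documented below:

import random

def choose_arm(t, mean_rewards, t_next, epsilon_t):
    """Epsilon-greedy choice restricted to the arms currently available."""
    available = [k for k in range(len(mean_rewards)) if t_next[k] <= t]
    if random.random() < epsilon_t:                        # explore
        return random.choice(available)
    return max(available, key=lambda k: mean_rewards[k])   # exploit the best available arm

def on_collision(arm, t, t_next, p, p0, beta):
    """ALOHA-inspired backoff: persist with probability p, otherwise give up the arm."""
    if random.random() < p:
        return p                                           # persist on the same arm
    t_next[arm] = random.uniform(t, t + t ** beta)         # arm unavailable until a random time
    return p0                                              # reset the persistence probability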

Policies.MEGA.random() → x in the interval [0, 1).
class Policies.MEGA.MEGA(nbArms, p0=0.5, alpha=0.5, beta=0.5, c=0.1, d=0.01, lower=0.0, amplitude=1.0)[source]

Bases: Policies.BasePolicy.BasePolicy

MEGA: implementation of the single-player policy from [Concurrent bandits and cognitive radio networks, O.Avner & S.Mannor, 2014](https://arxiv.org/abs/1404.5421).

__init__(nbArms, p0=0.5, alpha=0.5, beta=0.5, c=0.1, d=0.01, lower=0.0, amplitude=1.0)[source]
  • nbArms: number of arms.

  • p0: initial probability p(0); p(t) is the probability of persistence on the chosenArm at time t

  • alpha: scaling used in the update p(t+1) <- alpha * p(t) + (1 - alpha) (see the numeric sketch after this list)

  • beta: exponent used for the interval [t, t + t^beta], from which a random time t_next(k) is sampled; the chosenArm stays unavailable until that time

  • c, d: constants used to compute the exploration probability epsilon_t, cf. the function _epsilon_t().
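
A quick numeric illustration of the update rule for p (a sketch based only on the formula above, using the default alpha = 0.5 and p0 = 0.5):

>>> alpha, p = 0.5, 0.5
>>> p = alpha * p + (1 - alpha)   # p grows towards 1 after each collision-free reward
>>> p
0.75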

Example:

>>> nbArms, p0, alpha, beta, c, d = 17, 0.5, 0.5, 0.5, 0.1, 0.01
>>> player1 = MEGA(nbArms, p0, alpha, beta, c, d)

For multi-player simulations, use:

>>> configuration["players"] = Selfish(NB_PLAYERS, MEGA, nbArms, p0, alpha, beta, c, d).children
c = None

Parameter c

d = None

Parameter d

p0 = None

Parameter p0, should not be modified

p = None

Parameter p, can be modified

alpha = None

Parameter alpha

beta = None

Parameter beta

chosenArm = None

Last chosen arm

tnext = None

Only store the delta time

meanRewards = None

Mean rewards

__str__()[source]

-> str

startGame()[source]

Just reinitialize all the internal memory.

choice()[source]

Choose an arm, as described by the MEGA algorithm.

getReward(arm, reward)[source]

Receive a reward on arm of index ‘arm’, as described by the MEGA algorithm.

  • If there was no collision, a reward is received after pulling the arm.
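
For instance, a single-player simulation loop alternates choice() and getReward() roughly like this (a minimal sketch reusing player1 and nbArms from the example above; the Bernoulli arm means are made up for illustration and this is not one of the library's own examples):

>>> import numpy as np
>>> np.random.seed(0)
>>> means = np.linspace(0.1, 0.9, nbArms)    # hypothetical Bernoulli arm means
>>> player1.startGame()
>>> for t in range(1000):
...     arm = player1.choice()
...     reward = float(np.random.rand() < means[arm])   # simulated Bernoulli reward
...     player1.getReward(arm, reward)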

handleCollision(arm, reward=None)[source]

Handle a collision, on arm of index ‘arm’.

  • Warning: this method has to be implemented in the collision model; it is NOT implemented in the EvaluatorMultiPlayers.

Note

We do not care on which arm the collision occurred.
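
As a sketch of how a collision model can drive this method (illustrative only, not one of the library's collision models): when several players pick the same arm at a time step, the model calls handleCollision(arm) on each of them instead of giving them a reward:

from collections import Counter

def resolve_choices(players, choices, draw_reward):
    """Toy collision model: only a player alone on its arm gets a reward; others collide."""
    counts = Counter(choices)
    for player, arm in zip(players, choices):
        if counts[arm] > 1:                        # at least two players chose this arm
            player.handleCollision(arm)
        else:
            player.getReward(arm, draw_reward(arm))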

_epsilon_t()[source]

Compute the value of decreasing epsilon(t), cf. Algorithm 1 in [Avner & Mannor, 2014](https://arxiv.org/abs/1404.5421).
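
Algorithm 1 in the paper uses a decreasing exploration probability of the form epsilon_t = min(1, c * K^2 / (d^2 * (K - 1) * t)) for K arms; assuming that form, a direct transcription (a sketch, not this method's actual code) would be:

def epsilon_t(t, nbArms, c=0.1, d=0.01):
    """Decreasing exploration probability, following Algorithm 1 of Avner & Mannor (2014)."""
    if t <= 0:
        return 1.0                # explore with probability 1 before any observation
    return min(1.0, (c * nbArms ** 2) / (d ** 2 * (nbArms - 1) * t))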

__module__ = 'Policies.MEGA'