Policies.MEGA module

MEGA: implementation of the single-player policy from [Concurrent bandits and cognitive radio networks, O.Avner & S.Mannor, 2014](https://arxiv.org/abs/1404.5421).

The Multi-user epsilon-Greedy collision Avoiding (MEGA) algorithm is based on the epsilon-greedy algorithm introduced in [2], augmented by a collision avoidance mechanism that is inspired by the classical ALOHA protocol.

  • [2]: Finite-time analysis of the multi-armed bandit problem, P.Auer & N.Cesa-Bianchi & P.Fischer, 2002
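
To make this concrete, here is a rough sketch of the two mechanisms, written only from the description above and from Algorithm 1 of the paper; the function and argument names (choose_arm, on_collision, mean_rewards, t_next, epsilon_t) are illustrative assumptions, not the actual API of the class documented below:

import random

def choose_arm(t, mean_rewards, t_next, epsilon_t):
    """Epsilon-greedy choice restricted to the arms currently available."""
    available = [k for k in range(len(mean_rewards)) if t_next[k] <= t]
    if random.random() < epsilon_t:                        # explore
        return random.choice(available)
    return max(available, key=lambda k: mean_rewards[k])   # exploit the best available arm

def on_collision(arm, t, t_next, p, p0, beta):
    """ALOHA-inspired backoff: persist with probability p, otherwise give up the arm."""
    if random.random() < p:
        return p                                           # persist on the same arm
    t_next[arm] = random.uniform(t, t + t ** beta)         # arm unavailable until a random time
    return p0                                              # reset the persistence probability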

Policies.MEGA.random() → x in the interval [0, 1).
class Policies.MEGA.MEGA(nbArms, p0=0.5, alpha=0.5, beta=0.5, c=0.1, d=0.01, lower=0.0, amplitude=1.0)[source]

Bases: Policies.BasePolicy.BasePolicy

MEGA: implementation of the single-player policy from [Concurrent bandits and cognitive radio networks, O.Avner & S.Mannor, 2014](https://arxiv.org/abs/1404.5421).

__init__(nbArms, p0=0.5, alpha=0.5, beta=0.5, c=0.1, d=0.01, lower=0.0, amplitude=1.0)[source]
  • nbArms: number of arms.

  • p0: initial probability p(0); p(t) is the probability of persistence on the chosenArm at time t

  • alpha: scaling used in the update p(t+1) <- alpha * p(t) + (1 - alpha) (see the numeric sketch after this list)

  • beta: exponent used for the interval [t, t + t^beta], from which a random time t_next(k) is sampled; the chosenArm stays unavailable until that time

  • c, d: constants used to compute the exploration probability epsilon_t, cf. the function _epsilon_t().
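
A quick numeric illustration of the update rule for p (a sketch based only on the formula above, using the default alpha = 0.5 and p0 = 0.5):

>>> alpha, p = 0.5, 0.5
>>> p = alpha * p + (1 - alpha)   # p grows towards 1 after each collision-free reward
>>> p
0.75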

Example:

>>> nbArms, p0, alpha, beta, c, d = 17, 0.5, 0.5, 0.5, 0.1, 0.01
>>> player1 = MEGA(nbArms, p0, alpha, beta, c, d)

For multi-player simulations, use:

>>> configuration["players"] = Selfish(NB_PLAYERS, MEGA, nbArms, p0, alpha, beta, c, d).children
c = None

Parameter c

d = None

Parameter d

p0 = None

Parameter p0, should not be modified

p = None

Parameter p, can be modified

alpha = None

Parameter alpha

beta = None

Parameter beta

chosenArm = None

Last chosen arm

tnext = None

Only store the delta time

meanRewards = None

Mean rewards

__str__()[source]

-> str

startGame()[source]

Just reinitialize all the internal memory.

choice()[source]

Choose an arm, as described by the MEGA algorithm.

getReward(arm, reward)[source]

Receive a reward on arm of index ‘arm’, as described by the MEGA algorithm.

  • If there was no collision, a reward is received after pulling the arm.
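
For instance, a single-player simulation loop alternates choice() and getReward() roughly like this (a minimal sketch reusing player1 and nbArms from the example above; the Bernoulli arm means are made up for illustration and this is not one of the library's own examples):

>>> import numpy as np
>>> np.random.seed(0)
>>> means = np.linspace(0.1, 0.9, nbArms)    # hypothetical Bernoulli arm means
>>> player1.startGame()
>>> for t in range(1000):
...     arm = player1.choice()
...     reward = float(np.random.rand() < means[arm])   # simulated Bernoulli reward
...     player1.getReward(arm, reward)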

handleCollision(arm, reward=None)[source]

Handle a collision, on arm of index ‘arm’.

  • Warning: this method has to be implemented in the collision model; it is NOT implemented in the EvaluatorMultiPlayers.

Note

We do not care on which arm the collision occurred.
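
As a sketch of how a collision model can drive this method (illustrative only, not one of the library's collision models): when several players pick the same arm at a time step, the model calls handleCollision(arm) on each of them instead of giving them a reward:

from collections import Counter

def resolve_choices(players, choices, draw_reward):
    """Toy collision model: only a player alone on its arm gets a reward; others collide."""
    counts = Counter(choices)
    for player, arm in zip(players, choices):
        if counts[arm] > 1:                        # at least two players chose this arm
            player.handleCollision(arm)
        else:
            player.getReward(arm, draw_reward(arm))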

_epsilon_t()[source]

Compute the value of decreasing epsilon(t), cf. Algorithm 1 in [Avner & Mannor, 2014](https://arxiv.org/abs/1404.5421).
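
Algorithm 1 in the paper uses a decreasing exploration probability of the form epsilon_t = min(1, c * K^2 / (d^2 * (K - 1) * t)) for K arms; assuming that form, a direct transcription (a sketch, not this method's actual code) would be:

def epsilon_t(t, nbArms, c=0.1, d=0.01):
    """Decreasing exploration probability, following Algorithm 1 of Avner & Mannor (2014)."""
    if t <= 0:
        return 1.0                # explore with probability 1 before any observation
    return min(1.0, (c * nbArms ** 2) / (d ** 2 * (nbArms - 1) * t))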

__module__ = 'Policies.MEGA'