PoliciesMultiPlayers.ALOHA module
ALOHA: generalized implementation of the multi-player policy from [Concurrent bandits and cognitive radio network, O. Avner & S. Mannor, 2014](https://arxiv.org/abs/1404.5421), for a generic single-player policy.
This policy uses a collision-avoidance mechanism inspired by the classical ALOHA protocol, combined with any single-player policy.
- PoliciesMultiPlayers.ALOHA.random() → x in the interval [0, 1).
- PoliciesMultiPlayers.ALOHA.tnext_beta(t, beta=0.5)
  Simple function, as used in MEGA: upper_tnext(t) = \(t^{\beta}\). Defaults to \(t^{0.5}\).

  >>> tnext_beta(100, beta=0.1)
  1.584...
  >>> tnext_beta(100, beta=0.5)
  10.0
  >>> tnext_beta(100, beta=0.9)
  63.095...
  >>> tnext_beta(1000)
  31.622...
- PoliciesMultiPlayers.ALOHA.make_tnext_beta(beta=0.5)
  Returns the function \(t \mapsto t^{\beta}\).

  >>> tnext = make_tnext_beta(0.5)
  >>> tnext(100)
  10.0
  >>> tnext(1000)
  31.622...
- PoliciesMultiPlayers.ALOHA.tnext_log(t, scaling=1.0)
  Other function, not the one used in MEGA but our proposal: upper_tnext(t) = \(\text{scaling} * \log(1 + t)\).

  >>> tnext_log(100, scaling=1)
  4.615...
  >>> tnext_log(100, scaling=10)
  46.151...
  >>> tnext_log(100, scaling=100)
  461.512...
  >>> tnext_log(1000)
  6.908...
- PoliciesMultiPlayers.ALOHA.make_tnext_log_scaling(scaling=1.0)
  Returns the function \(t \mapsto \text{scaling} * \log(1 + t)\).

  >>> tnext = make_tnext_log_scaling(1)
  >>> tnext(100)
  4.615...
  >>> tnext(1000)
  6.908...
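These four helpers follow directly from the formulas above. A minimal sketch of how they could be written, for illustration only; the module's actual implementation may differ in details such as input validation:

```python
import numpy as np

def tnext_beta(t, beta=0.5):
    """Waiting-time horizon t**beta, as used in MEGA."""
    return t ** beta

def make_tnext_beta(beta=0.5):
    """Return the function t -> t**beta."""
    return lambda t: t ** beta

def tnext_log(t, scaling=1.0):
    """Alternative waiting-time horizon scaling * log(1 + t)."""
    return scaling * np.log(1 + t)

def make_tnext_log_scaling(scaling=1.0):
    """Return the function t -> scaling * log(1 + t)."""
    return lambda t: scaling * np.log(1 + t)
```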
- class PoliciesMultiPlayers.ALOHA.oneALOHA(nbPlayers, mother, playerId, nbArms, p0=0.5, alpha_p0=0.5, ftnext=<function tnext_beta>, beta=None)
  Bases: PoliciesMultiPlayers.ChildPointer.ChildPointer
  Class that acts as a child policy, but in fact passes all its method calls to the mother class, which forwards them to its i-th player.
  Except for the handleCollision method: the ALOHA collision-avoidance protocol is implemented here.
  - nbPlayers = None
    Number of players.
  - p0 = None
    Initial probability, should not be modified.
  - p = None
    Current probability, can be modified.
  - alpha_p0 = None
    Parameter alpha for the recurrence equation of the probability p(t).
  - beta = None
    Parameter beta.
  - tnext = None
    Stores the sampled waiting times t_next(k): after a collision on arm k, that arm is unavailable until this time.
  - t = None
    Internal time.
  - chosenArm = None
    Last chosen arm.
  - getReward(arm, reward)
    Receive a reward on the arm of index 'arm', as described by the ALOHA protocol.
    If there was no collision, a reward is received after pulling the arm.
  - handleCollision(arm, reward=None)
    Handle a collision on the arm of index 'arm'.
    Warning: this method has to be implemented in the collision model; it is NOT implemented in the EvaluatorMultiPlayers.
    Note: it does not matter on which arm the collision occurred.
  - choice()
    Identify the available arms, and use the underlying single-player policy (UCB, Thompson, etc.) to choose an arm from this subset of arms. A sketch of how getReward, handleCollision and choice interact is given below, after this class.
  - __module__ = 'PoliciesMultiPlayers.ALOHA'
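To make the interplay of getReward, handleCollision and choice concrete, here is a minimal, self-contained sketch of the collision-avoidance protocol described above. The uniform sampling of the waiting time, the reset of p to p0 when an arm is given up, and the choiceFromSubSet call on the wrapped single-player policy are assumptions made for illustration; this is not the library's exact implementation.

```python
import numpy as np

class OneALOHASketch:
    """Illustrative sketch of the ALOHA collision-avoidance wrapper (not the real oneALOHA)."""

    def __init__(self, nbArms, policy, p0=0.5, alpha_p0=0.5,
                 ftnext=lambda t: t ** 0.5, rng=None):
        self.policy = policy            # wrapped single-player policy (UCB, Thompson, ...)
        self.p0 = p0                    # initial persistence probability p(0)
        self.p = p0                     # current persistence probability p(t)
        self.alpha_p0 = alpha_p0        # scaling in the update of p(t)
        self.ftnext = ftnext            # horizon from which the waiting time is sampled
        self.tnext = np.zeros(nbArms)   # time until which each arm is unavailable
        self.t = 0                      # internal time
        self.chosenArm = None           # last chosen arm
        self.rng = rng if rng is not None else np.random.default_rng()

    def getReward(self, arm, reward):
        # No collision: p(t) moves towards 1 with the documented recurrence,
        # and the reward is forwarded to the single-player policy.
        self.p = self.alpha_p0 * self.p + (1 - self.alpha_p0)
        self.policy.getReward(arm, reward)

    def handleCollision(self, arm, reward=None):
        # Collision: persist on the arm with probability p; otherwise give it up,
        # mark it unavailable until a random time <= ftnext(t), and reset p to p0.
        if self.rng.random() >= self.p:
            horizon = max(1, int(self.ftnext(self.t)))
            self.tnext[arm] = self.t + self.rng.integers(1, horizon + 1)
            self.p = self.p0
            self.chosenArm = None

    def choice(self):
        self.t += 1
        if self.chosenArm is None:
            # Restrict the single-player policy to the currently available arms
            # (choiceFromSubSet is an assumed interface of the wrapped policy).
            available = np.nonzero(self.tnext <= self.t)[0]
            self.chosenArm = self.policy.choiceFromSubSet(available)
        return self.chosenArm
```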
- class PoliciesMultiPlayers.ALOHA.ALOHA(nbPlayers, nbArms, playerAlgo, p0=0.5, alpha_p0=0.5, ftnext=<function tnext_beta>, beta=None, *args, **kwargs)
  Bases: PoliciesMultiPlayers.BaseMPPolicy.BaseMPPolicy
  ALOHA: implementation of the multi-player policy from [Concurrent bandits and cognitive radio network, O. Avner & S. Mannor, 2014](https://arxiv.org/abs/1404.5421), for a generic single-player policy.
  - __init__(nbPlayers, nbArms, playerAlgo, p0=0.5, alpha_p0=0.5, ftnext=<function tnext_beta>, beta=None, *args, **kwargs)
    - nbPlayers: number of players to create (in self._players).
    - playerAlgo: class to use for every player.
    - nbArms: number of arms, given as first argument to playerAlgo.
    - p0: initial probability p(0); p(t) is the probability of persistence on the chosenArm at time t.
    - alpha_p0: scaling in the update p[t+1] <- alpha_p0 * p[t] + (1 - alpha_p0).
    - ftnext: general function, defaults to t -> t^beta, giving the horizon from which to sample a random time t_next(k), until when the chosenArm is unavailable; t -> log(1 + t) is also possible.
    - (optional) beta: if present, overwrites ftnext, which becomes t -> t^beta.
    - *args, **kwargs: arguments and named arguments, given to playerAlgo.
    Example:

    >>> from Policies import *
    >>> import random; random.seed(0); import numpy as np; np.random.seed(0)
    >>> nbArms = 17
    >>> nbPlayers = 6
    >>> p0, alpha_p0 = 0.6, 0.5
    >>> s = ALOHA(nbPlayers, nbArms, Thompson, p0=p0, alpha_p0=alpha_p0, ftnext=tnext_log)
    >>> [ child.choice() for child in s.children ]
    [6, 11, 8, 4, 8, 8]
    >>> s = ALOHA(nbPlayers, nbArms, UCBalpha, p0=p0, alpha_p0=alpha_p0, beta=0.5, alpha=1)
    >>> [ child.choice() for child in s.children ]
    [1, 0, 5, 2, 15, 3]
    To get a list of usable players, use s.children.
    Warning: s._players is for internal use ONLY!
    A hypothetical simulation loop using s.children is sketched at the end of this page.
  - __module__ = 'PoliciesMultiPlayers.ALOHA'
  - nbPlayers = None
    Number of players.
  - nbArms = None
    Number of arms.
  - children = None
    List of children, fake algorithms.
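Finally, a hypothetical end-to-end usage sketch extending the doctest above: a naive simulation loop that plays the role of a collision model, drawing Bernoulli rewards with assumed arm means and calling choice, getReward and handleCollision on each child. The arm means and the loop itself are illustrative assumptions, not part of the library.

```python
import numpy as np
from Policies import Thompson
from PoliciesMultiPlayers.ALOHA import ALOHA, tnext_log

rng = np.random.default_rng(0)
nbArms, nbPlayers, horizon = 17, 6, 1000
means = np.linspace(0.1, 0.9, nbArms)   # assumed Bernoulli means, one per arm

s = ALOHA(nbPlayers, nbArms, Thompson, p0=0.6, alpha_p0=0.5, ftnext=tnext_log)

for t in range(horizon):
    choices = [child.choice() for child in s.children]
    for player, arm in enumerate(choices):
        if choices.count(arm) > 1:
            # Collision: every player that picked this arm gets no reward.
            s.children[player].handleCollision(arm)
        else:
            s.children[player].getReward(arm, float(rng.random() < means[arm]))
```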