Policies.TrekkingTSN module

TrekkingTSN: implementation of the decentralized multi-player policy from [R. Kumar, A. Yadav, S. J. Darak, M. K. Hanawal, Trekking based Distributed Algorithm for Opportunistic Spectrum Access in Infrastructure-less Network, 2018](XXX).

  • Each player has 3 states: an initial NotStarted state, then a Channel Characterization (CC) phase, and finally a Trekking phase.

  • 1st step: Channel Characterization (CC) phase, of length \(T_{CC} = T_{RH} + T_{SH}\) (a random-hopping part of length \(T_{RH}\) followed by a sequential-hopping part of length \(T_{SH}\)), during which each player estimates the mean of every arm,

  • 2nd step: Trekking phase, of length at most \(T_{TR}\), during which each player locks on a channel and "treks" towards the better channels, using the estimates from the CC phase.

Policies.TrekkingTSN.special_times(nbArms=10, theta=0.01, epsilon=0.1, delta=0.05)[source]

Compute the lower-bound suggesting “large-enough” values for the different parameters \(T_{RH}\), \(T_{SH}\) and \(T_{TR}\) that should guarantee constant regret with probability at least \(1 - \delta\), if the gap \(\Delta\) is larger than \(\epsilon\) and the smallest mean is larger than \(\theta\).

\[\begin{split}T_{RH} &= \frac{\log(\frac{\delta}{3 K})}{\log(1 - \theta (1 - \frac{1}{K})^{K-1})} \\ T_{SH} &= \frac{2 K}{\epsilon^2} \log(\frac{2 K^2}{\delta / 3}) \\ T_{TR} &= \left\lceil \frac{\log((\delta / 3) K XXX)}{\log(1 - \theta)} \right\rceil \frac{(K - 1) K}{2}.\end{split}\]
  • Cf. Theorem 1 of [Kumar et al., 2018](XXX).

  • Examples:

>>> nbArms = 8
>>> theta = Delta = 0.07
>>> epsilon = theta
>>> delta = 0.1
>>> special_times(nbArms=nbArms, theta=theta, epsilon=epsilon, delta=delta)  
(197, 26949, -280)
>>> delta = 0.01
>>> special_times(nbArms=nbArms, theta=theta, epsilon=epsilon, delta=delta)  
(279, 34468, 616)
>>> delta = 0.001
>>> special_times(nbArms=nbArms, theta=theta, epsilon=epsilon, delta=delta)  
(362, 41987, 1512)
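
A minimal sketch of how these three lengths could be computed from the formulas above (this is not the module's actual code). Warning: the factor written XXX in the \(T_{TR}\) formula is assumed here to be \(K\), and the final roundings are guesses; both are only assumptions that happen to reproduce the example values above.

from math import ceil, log

def special_times_sketch(nbArms=10, theta=0.01, epsilon=0.1, delta=0.05):
    """Sketch only: NOT the module's actual implementation of special_times()."""
    K = nbArms
    delta_3 = delta / 3.0
    # T_RH = log(delta / (3 K)) / log(1 - theta * (1 - 1/K)^(K-1))
    T_RH = log(delta_3 / K) / log(1.0 - theta * (1.0 - 1.0 / K) ** (K - 1))
    # T_SH = (2 K / epsilon^2) * log(2 K^2 / (delta / 3))
    T_SH = (2.0 * K / epsilon ** 2) * log(2.0 * K ** 2 / delta_3)
    # T_TR: the factor written XXX above is ASSUMED to be K here (a guess)
    T_TR = ceil(log(delta_3 * K * K) / log(1.0 - theta)) * (K - 1) * K // 2
    return round(T_RH), round(T_SH), T_TR

Under those assumptions it agrees with the second example above:

>>> special_times_sketch(nbArms=8, theta=0.07, epsilon=0.07, delta=0.01)
(279, 34468, 616)
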
Policies.TrekkingTSN.boundOnFinalRegret(T_RH, T_SH, T_TR, nbPlayers, nbArms)[source]

Compute the upper-bound on the regret when \(T_{RH}\), \(T_{SH}\), \(T_{TR}\) and \(M\) are known.

  • The “constant” regret of course grows linearly with \(T_{RH}\), \(T_{SH}\) and \(T_{TR}\), as:

    \[\forall T \geq T_{RH} + T_{SH} + T_{TR}, \;\; R_T \leq M (T_{RH} + (1 - \frac{M}{K}) T_{SH} + T_{TR}).\]

Warning

this bound is not a deterministic result, it is only valid with a certain probability (at least \(1 - \delta\), if \(T_{RH}\), \(T_{SH}\) and \(T_{TR}\) are chosen as given by special_times()).

  • Cf. Theorem 1 of [Kumar et al., 2018](XXX).

  • Examples:

>>> boundOnFinalRegret(197, 26949, -280, 2, 8)  
40257.5
>>> boundOnFinalRegret(279, 34468, 616, 2, 8)   
53492.0
>>> boundOnFinalRegret(362, 41987, 1512, 2, 8)  
66728.5
  • For \(M=5\):

>>> boundOnFinalRegret(197, 26949, -280, 5, 8)  
50114.375
>>> boundOnFinalRegret(279, 34468, 616, 5, 8)   
69102.5
>>> boundOnFinalRegret(362, 41987, 1512, 5, 8)  
88095.625
  • For \(M=K=8\):

>>> boundOnFinalRegret(197, 26949, -280, 8, 8)  # there is something wrong with T_TR!
-664.0
>>> boundOnFinalRegret(279, 34468, 616, 8, 8)   
7160.0
>>> boundOnFinalRegret(362, 41987, 1512, 8, 8)  
14992.0
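
The returned value is a direct translation of the displayed formula; here is a minimal sketch (the function name is illustrative, it is not the module's code):

def bound_on_final_regret_sketch(T_RH, T_SH, T_TR, nbPlayers, nbArms):
    """Upper bound M * (T_RH + (1 - M/K) * T_SH + T_TR), for any horizon T >= T_RH + T_SH + T_TR."""
    M, K = nbPlayers, nbArms
    return M * (T_RH + (1.0 - M / K) * T_SH + T_TR)

Note that for \(M = K\) the middle term vanishes, so a negative \(T_{TR}\) (like the -280 above) directly yields a negative "bound": \(8 (197 - 280) = -664\), which is why that value is flagged as suspicious.
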
class Policies.TrekkingTSN.State

Bases: enum.Enum

Different states during the TrekkingTSN algorithm

ChannelCharacterization = 2
NotStarted = 1
TrekkingTSN = 3
__module__ = 'Policies.TrekkingTSN'
class Policies.TrekkingTSN.TrekkingTSN(nbArms, theta=0.01, epsilon=0.1, delta=0.05, lower=0.0, amplitude=1.0)[source]

Bases: Policies.BasePolicy.BasePolicy

TrekkingTSN: implementation of the single-player policy from [R. Kumar, A. Yadav, S. J. Darak, M. K. Hanawal, Trekking based Distributed Algorithm for Opportunistic Spectrum Access in Infrastructure-less Network, 2018](XXX).

__init__(nbArms, theta=0.01, epsilon=0.1, delta=0.05, lower=0.0, amplitude=1.0)[source]
  • nbArms: number of arms,

Example:

>>> nbArms = 8
>>> theta, epsilon, delta = 0.01, 0.1, 0.05
>>> player1 = TrekkingTSN(nbArms, theta=theta, epsilon=epsilon, delta=delta)

For multi-player simulations, use:

>>> configuration["players"] = Selfish(NB_PLAYERS, TrekkingTSN, nbArms, theta=theta, epsilon=epsilon, delta=delta).children
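
Once created, a player is driven through the usual BasePolicy interface; a tiny, purely illustrative interaction (the reward value 1.0 is arbitrary):

>>> player1.startGame()          # reset the internal memory and pick the initial state
>>> arm = player1.choice()       # index of the arm to play at this time step
>>> player1.getReward(arm, 1.0)  # feed the observed reward back to the policy

In a multi-player simulation, collisions are signalled by the environment (the collision model) through handleCollision() (see below).
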
state = None

Current state

theta = None

Parameter \(\theta\).

epsilon = None

Parameter \(\epsilon\).

delta = None

Parameter \(\delta\).

T_RH = None

Parameter \(T_{RH}\) computed from special_times()

T_SH = None

Parameter \(T_{SH}\) computed from special_times()

T_CC = None

Parameter \(T_{CC} = T_{RH} + T_{SH}\)

T_TR = None

Parameter \(T_{TR}\) computed from special_times()

last_was_successful = None

That’s the l of the paper

last_choice = None

Keep memory of the last choice for CC phase

cumulatedRewards = None

That’s the V_n of the paper

nbObservations = None

That’s the S_n of the paper

lock_channel = None

That’s the L of the paper

t = None

Internal time

__str__()[source]

-> str

startGame()[source]

Just reinitialize all the internal memory, and decide how to start (state 1 or 2).

choice()[source]

Choose an arm, as described by the TrekkingTSN algorithm.

getReward(arm, reward)[source]

Receive a reward on arm of index ‘arm’, as described by the TrekkingTSN algorithm.

  • If there is no collision, the reward is received after pulling the arm.

_endCCPhase()[source]

Small computation needed at the end of the initial CC phase.

handleCollision(arm, reward=None)[source]

Handle a collision, on arm of index ‘arm’.

  • Warning: this method has to be implemented in the collision model; it is NOT implemented in the EvaluatorMultiPlayers.

__module__ = 'Policies.TrekkingTSN'