Policies.TrekkingTSN module

TrekkingTSN: implementation of the decentralized multi-player policy from [R. Kumar, A. Yadav, S. J. Darak, M. K. Hanawal, Trekking based Distributed Algorithm for Opportunistic Spectrum Access in Infrastructure-less Network, 2018](XXX).

  • Each player has 3 states: an initial NotStarted state, then a Channel Characterization (CC) phase, and finally a Trekking phase.

  • 1st step: Channel Characterization (CC) phase, of length \(T_{CC} = T_{RH} + T_{SH}\) (a random-hopping part of length \(T_{RH}\) followed by a sequential-hopping part of length \(T_{SH}\)), during which each player estimates the mean of every arm,

  • 2nd step: Trekking phase, of length at most \(T_{TR}\), during which each player locks on a channel and "treks" towards the better channels, using the estimates from the CC phase.

Policies.TrekkingTSN.special_times(nbArms=10, theta=0.01, epsilon=0.1, delta=0.05)[source]

Compute the lower-bound suggesting “large-enough” values for the different parameters \(T_{RH}\), \(T_{SH}\) and \(T_{TR}\) that should guarantee constant regret with probability at least \(1 - \delta\), if the gap \(\Delta\) is larger than \(\epsilon\) and the smallest mean is larger than \(\theta\).

\[\begin{split}T_{RH} &= \frac{\log(\frac{\delta}{3 K})}{\log(1 - \theta (1 - \frac{1}{K})^{K-1})} \\ T_{SH} &= \frac{2 K}{\epsilon^2} \log(\frac{2 K^2}{\delta / 3}) \\ T_{TR} &= \left\lceil \frac{\log((\delta / 3) K XXX)}{\log(1 - \theta)} \right\rceil \frac{(K - 1) K}{2}.\end{split}\]
  • Cf. Theorem 1 of [Kumar et al., 2018](XXX).

  • Examples:

>>> nbArms = 8
>>> theta = Delta = 0.07
>>> epsilon = theta
>>> delta = 0.1
>>> special_times(nbArms=nbArms, theta=theta, epsilon=epsilon, delta=delta)  
(197, 26949, -280)
>>> delta = 0.01
>>> special_times(nbArms=nbArms, theta=theta, epsilon=epsilon, delta=delta)  
(279, 34468, 616)
>>> delta = 0.001
>>> special_times(nbArms=nbArms, theta=theta, epsilon=epsilon, delta=delta)  
(362, 41987, 1512)
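
A minimal sketch of how these three lengths could be computed from the formulas above (this is not the module's actual code). Warning: the factor written XXX in the \(T_{TR}\) formula is assumed here to be \(K\), and the final roundings are guesses; both are only assumptions that happen to reproduce the example values above.

from math import ceil, log

def special_times_sketch(nbArms=10, theta=0.01, epsilon=0.1, delta=0.05):
    """Sketch only: NOT the module's actual implementation of special_times()."""
    K = nbArms
    delta_3 = delta / 3.0
    # T_RH = log(delta / (3 K)) / log(1 - theta * (1 - 1/K)^(K-1))
    T_RH = log(delta_3 / K) / log(1.0 - theta * (1.0 - 1.0 / K) ** (K - 1))
    # T_SH = (2 K / epsilon^2) * log(2 K^2 / (delta / 3))
    T_SH = (2.0 * K / epsilon ** 2) * log(2.0 * K ** 2 / delta_3)
    # T_TR: the factor written XXX above is ASSUMED to be K here (a guess)
    T_TR = ceil(log(delta_3 * K * K) / log(1.0 - theta)) * (K - 1) * K // 2
    return round(T_RH), round(T_SH), T_TR

Under those assumptions it agrees with the second example above:

>>> special_times_sketch(nbArms=8, theta=0.07, epsilon=0.07, delta=0.01)
(279, 34468, 616)
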
Policies.TrekkingTSN.boundOnFinalRegret(T_RH, T_SH, T_TR, nbPlayers, nbArms)[source]

Compute the upper-bound on the regret when \(T_{RH}\), \(T_{SH}\), \(T_{TR}\) and \(M\) are known.

  • The “constant” regret of course grows linearly with \(T_{RH}\), \(T_{SH}\) and \(T_{TR}\), as:

    \[\forall T \geq T_{RH} + T_{SH} + T_{TR}, \;\; R_T \leq M (T_{RH} + (1 - \frac{M}{K}) T_{SH} + T_{TR}).\]

Warning

this bound is not a deterministic result, it is only valid with a certain probability (at least \(1 - \delta\), if \(T_{RH}\), \(T_{SH}\) and \(T_{TR}\) are chosen as given by special_times()).

  • Cf. Theorem 1 of [Kumar et al., 2018](XXX).

  • Examples:

>>> boundOnFinalRegret(197, 26949, -280, 2, 8)  
40257.5
>>> boundOnFinalRegret(279, 34468, 616, 2, 8)   
53492.0
>>> boundOnFinalRegret(362, 41987, 1512, 2, 8)  
66728.5
  • For \(M=5\):

>>> boundOnFinalRegret(197, 26949, -280, 5, 8)  
50114.375
>>> boundOnFinalRegret(279, 34468, 616, 5, 8)   
69102.5
>>> boundOnFinalRegret(362, 41987, 1512, 5, 8)  
88095.625
  • For \(M=K=8\):

>>> boundOnFinalRegret(197, 26949, -280, 8, 8)  # there is something wrong with T_TR!
-664.0
>>> boundOnFinalRegret(279, 34468, 616, 8, 8)   
7160.0
>>> boundOnFinalRegret(362, 41987, 1512, 8, 8)  
14992.0
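
The returned value is a direct translation of the displayed formula; here is a minimal sketch (the function name is illustrative, it is not the module's code):

def bound_on_final_regret_sketch(T_RH, T_SH, T_TR, nbPlayers, nbArms):
    """Upper bound M * (T_RH + (1 - M/K) * T_SH + T_TR), for any horizon T >= T_RH + T_SH + T_TR."""
    M, K = nbPlayers, nbArms
    return M * (T_RH + (1.0 - M / K) * T_SH + T_TR)

Note that for \(M = K\) the middle term vanishes, so a negative \(T_{TR}\) (like the -280 above) directly yields a negative "bound": \(8 (197 - 280) = -664\), which is why that value is flagged as suspicious.
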
class Policies.TrekkingTSN.State

Bases: enum.Enum

Different states during the TrekkingTSN algorithm

ChannelCharacterization = 2
NotStarted = 1
TrekkingTSN = 3
__module__ = 'Policies.TrekkingTSN'
class Policies.TrekkingTSN.TrekkingTSN(nbArms, theta=0.01, epsilon=0.1, delta=0.05, lower=0.0, amplitude=1.0)[source]

Bases: Policies.BasePolicy.BasePolicy

TrekkingTSN: implementation of the single-player policy from [R. Kumar, A. Yadav, S. J. Darak, M. K. Hanawal, Trekking based Distributed Algorithm for Opportunistic Spectrum Access in Infrastructure-less Network, 2018](XXX).

__init__(nbArms, theta=0.01, epsilon=0.1, delta=0.05, lower=0.0, amplitude=1.0)[source]
  • nbArms: number of arms,

Example:

>>> nbArms = 8
>>> theta, epsilon, delta = 0.01, 0.1, 0.05
>>> player1 = TrekkingTSN(nbArms, theta=theta, epsilon=epsilon, delta=delta)

For multi-player simulations, use:

>>> configuration["players"] = Selfish(NB_PLAYERS, TrekkingTSN, nbArms, theta=theta, epsilon=epsilon, delta=delta).children
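
Once created, a player is driven through the usual BasePolicy interface; a tiny, purely illustrative interaction (the reward value 1.0 is arbitrary):

>>> player1.startGame()          # reset the internal memory and pick the initial state
>>> arm = player1.choice()       # index of the arm to play at this time step
>>> player1.getReward(arm, 1.0)  # feed the observed reward back to the policy

In a multi-player simulation, collisions are signalled by the environment (the collision model) through handleCollision() (see below).
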
state = None

Current state

theta = None

Parameter \(\theta\).

epsilon = None

Parameter \(\epsilon\).

delta = None

Parameter \(\delta\).

T_RH = None

Parameter \(T_{RH}\) computed from special_times()

T_SH = None

Parameter \(T_{SH}\) computed from special_times()

T_CC = None

Parameter \(T_{CC} = T_{RH} + T_{SH}\)

T_TR = None

Parameter \(T_{TR}\) computed from special_times()

last_was_successful = None

That’s the l of the paper

last_choice = None

Keep memory of the last choice for CC phase

cumulatedRewards = None

That’s the V_n of the paper

nbObservations = None

That’s the S_n of the paper

lock_channel = None

That’s the L of the paper

t = None

Internal time

__str__()[source]

-> str

startGame()[source]

Just reinitialize all the internal memory, and decide how to start (state 1 or 2).

choice()[source]

Choose an arm, as described by the TrekkingTSN algorithm.

getReward(arm, reward)[source]

Receive a reward on arm of index ‘arm’, as described by the TrekkingTSN algorithm.

  • If there is no collision, the reward is received after pulling the arm.

_endCCPhase()[source]

Small computation needed at the end of the initial CC phase.

handleCollision(arm, reward=None)[source]

Handle a collision, on arm of index ‘arm’.

  • Warning: this method has to be implemented in the collision model; it is NOT implemented in the EvaluatorMultiPlayers.

__module__ = 'Policies.TrekkingTSN'