# Policies.TrekkingTSN module

TrekkingTSN: implementation of the decentralized multi-player policy from [R. Kumar, A. Yadav, S. J. Darak, M. K. Hanawal, Trekking based Distributed Algorithm for Opportunistic Spectrum Access in Infrastructure-less Network, 2018](XXX).

• Each player has 3 states: not started, then a channel characterization (CC) phase, then a Trekking phase

• 1st step (channel characterization): FIXME

• 2nd step (Trekking): FIXME

Policies.TrekkingTSN.special_times(nbArms=10, theta=0.01, epsilon=0.1, delta=0.05)[source]

Compute the lower bounds suggesting “large-enough” values for the parameters $$T_{RH}$$, $$T_{SH}$$ and $$T_{TR}$$ that should guarantee constant regret with probability at least $$1 - \delta$$, provided the gap $$\Delta$$ is larger than $$\epsilon$$ and the smallest mean is larger than $$\theta$$.

$\begin{split}T_{RH} &= \frac{\log(\frac{\delta}{3 K})}{\log(1 - \theta (1 - \frac{1}{K})^{K-1})} \\ T_{SH} &= \frac{2 K}{\varepsilon^2} \log(\frac{2 K^2}{\delta / 3}) \\ T_{TR} &= \lceil\frac{\log((\delta / 3) K XXX)}{\log(1 - \theta)} \rceil \frac{(K - 1) K}{2}.\end{split}$
• Cf. Theorem 1 of [Kumar et al., 2018](XXX).

• Examples:

>>> nbArms = 8
>>> theta = Delta = 0.07
>>> epsilon = theta
>>> delta = 0.1
>>> special_times(nbArms=nbArms, theta=theta, epsilon=epsilon, delta=delta)
(197, 26949, -280)
>>> delta = 0.01
>>> special_times(nbArms=nbArms, theta=theta, epsilon=epsilon, delta=delta)
(279, 34468, 616)
>>> delta = 0.001
>>> special_times(nbArms=nbArms, theta=theta, epsilon=epsilon, delta=delta)
(362, 41987, 1512)

Policies.TrekkingTSN.boundOnFinalRegret(T_RH, T_SH, T_TR, nbPlayers, nbArms)[source]

Compute the upper bound on regret when $$T_{RH}$$, $$T_{SH}$$, $$T_{TR}$$ and $$M$$ are known.

• The “constant” regret of course grows linearly with $$T_{RH}$$, $$T_{SH}$$ and $$T_{TR}$$, as:

$\forall T \geq T_{RH} + T_{SH} + T_{TR}, \;\; R_T \leq M (T_{RH} + (1 - \frac{M}{K}) T_{SH} + T_{TR}).$
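This bound is straightforward to evaluate; a minimal sketch of the right-hand side (the function name is ours, not the library's), which reproduces the doctest values below, e.g. `bound_on_final_regret(197, 26949, -280, 2, 8) == 40257.5`:

```python
def bound_on_final_regret(T_RH, T_SH, T_TR, M, K):
    """Regret bound M * (T_RH + (1 - M/K) * T_SH + T_TR) from Theorem 1."""
    return M * (T_RH + (1.0 - M / K) * T_SH + T_TR)
```
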

Warning

this bound is not a deterministic result; it only holds with a certain probability (at least $$1 - \delta$$, if $$T_{RH}$$, $$T_{SH}$$ and $$T_{TR}$$ are chosen as given by special_times()).

• Cf. Theorem 1 of [Kumar et al., 2018](XXX).

• Examples:

>>> boundOnFinalRegret(197, 26949, -280, 2, 8)
40257.5
>>> boundOnFinalRegret(279, 34468, 616, 2, 8)
53492.0
>>> boundOnFinalRegret(362, 41987, 1512, 2, 8)
66728.5

• For $$M=5$$:

>>> boundOnFinalRegret(197, 26949, -280, 5, 8)
50114.375
>>> boundOnFinalRegret(279, 34468, 616, 5, 8)
69102.5
>>> boundOnFinalRegret(362, 41987, 1512, 5, 8)
88095.625

• For $$M=K=8$$:

>>> boundOnFinalRegret(197, 26949, -280, 8, 8)
-664.0  # there is something wrong with T_TR !
>>> boundOnFinalRegret(279, 34468, 616, 8, 8)
7160.0
>>> boundOnFinalRegret(362, 41987, 1512, 8, 8)
14992.0

class Policies.TrekkingTSN.State

Bases: enum.Enum

Different states during the Trekking algorithm

NotStarted = 1
ChannelCharacterization = 2
TrekkingTSN = 3
class Policies.TrekkingTSN.TrekkingTSN(nbArms, theta=0.01, epsilon=0.1, delta=0.05, lower=0.0, amplitude=1.0)[source]

TrekkingTSN: implementation of the single-player policy from [R. Kumar, A. Yadav, S. J. Darak, M. K. Hanawal, Trekking based Distributed Algorithm for Opportunistic Spectrum Access in Infrastructure-less Network, 2018](XXX).

__init__(nbArms, theta=0.01, epsilon=0.1, delta=0.05, lower=0.0, amplitude=1.0)[source]
• nbArms: number of arms,

Example:

>>> nbArms = 8
>>> theta, epsilon, delta = 0.01, 0.1, 0.05
>>> player1 = TrekkingTSN(nbArms, theta=theta, epsilon=epsilon, delta=delta)


For multi-player simulations, use:

>>> configuration["players"] = Selfish(NB_PLAYERS, TrekkingTSN, nbArms, theta=theta, epsilon=epsilon, delta=delta).children

state = None

Current state

theta = None

Parameter $$\theta$$.

epsilon = None

Parameter $$\epsilon$$.

delta = None

Parameter $$\delta$$.

T_RH = None

Parameter $$T_{RH}$$ computed from special_times()

T_SH = None

Parameter $$T_{SH}$$ computed from special_times()

T_CC = None

Parameter $$T_{CC} = T_{RH} + T_{SH}$$

T_TR = None

Parameter $$T_{TR}$$ computed from special_times()

last_was_successful = None

That’s the $$l$$ of the paper

last_choice = None

Keep memory of the last choice for CC phase

cumulatedRewards = None

That’s the $$V_n$$ of the paper

nbObservations = None

That’s the $$S_n$$ of the paper

lock_channel = None

That’s the $$L$$ of the paper

t = None

Internal times

__str__()[source]

Return a string representation of this policy (-> str).

startGame()[source]

Just reinitialize all the internal memory, and decide how to start (state 1 or 2).

choice()[source]

Choose an arm, as described by the Trekking algorithm.

getReward(arm, reward)[source]

Receive a reward on arm of index ‘arm’, as described by the Trekking algorithm.

• If there is no collision, a reward is received after pulling the arm.

_endCCPhase()[source]

Small computation needed at the end of the initial CC phase.

handleCollision(arm, reward=None)[source]

Handle a collision, on arm of index ‘arm’.

• Warning: this method has to be implemented in the collision model; it is NOT implemented in the EvaluatorMultiPlayers.
