Policies.LM_DSEE module¶

The LM-DSEE policy for non-stationary bandits, from [[“On Abruptly-Changing and Slowly-Varying Multiarmed Bandit Problems”, by Lai Wei, Vaibhav Srivastava, 2018, arXiv:1802.08380]](https://arxiv.org/pdf/1802.08380)

It uses an additional \(\mathcal{O}(\tau_\max)\) memory for a game of maximum stationary length \(\tau_\max\).

Warning

This implementation is still experimental!

class Policies.LM_DSEE.State¶

Bases: enum.Enum

Different states during the LM-DSEE algorithm

Exploitation = 2¶

Exploration = 1¶

__module__ = 'Policies.LM_DSEE'¶

Policies.LM_DSEE.VERBOSE = False¶: Whether to be verbose when doing the search for valid parameter \(\ell\).

Policies.LM_DSEE.parameter_ell(a, N, b, gamma, verbose=False, max_value_on_l=1000000)[source]¶: Look for the smallest value of the parameter \(\ell\) that satisfies the following equations:

class Policies.LM_DSEE.LM_DSEE(nbArms, nu=0.5, DeltaMin=0.5, a=1, b=0.25, *args, **kwargs)[source]¶

Bases: Policies.BasePolicy.BasePolicy

The LM-DSEE policy for non-stationary bandits, from [[“On Abruptly-Changing and Slowly-Varying Multiarmed Bandit Problems”, by Lai Wei, Vaibhav Srivastava, 2018, arXiv:1802.08380]](https://arxiv.org/pdf/1802.08380)

__init__(nbArms, nu=0.5, DeltaMin=0.5, a=1, b=0.25, *args, **kwargs)[source]¶: New policy.

a = None¶: Parameter \(a\) for the LM-DSEE algorithm.

b = None¶: Parameter \(b\) for the LM-DSEE algorithm.

l = None¶: Parameter \(\ell\) for the LM-DSEE algorithm, as computed by the function parameter_ell().

gamma = None¶: Parameter \(\gamma\) for the LM-DSEE algorithm.

rho = None¶: Parameter \(\rho = \frac{1-\nu}{1+\nu}\) for the LM-DSEE algorithm.

phase = None¶: Current phase, exploration or exploitation.

current_exploration_arm = None¶: Currently explored arm.

current_exploitation_arm = None¶: Currently exploited arm.

batch_number = None¶: Number of batch

length_of_current_phase = None¶: Length of the current phase, either computed from length_exploration_phase() or func:length_exploitation_phase.

step_of_current_phase = None¶: Timer inside the current phase.

all_rewards = None¶: Memory of all the rewards. A list per arm. Growing list until restart of that arm?

__str__()[source]¶: -> str

startGame()[source]¶: Start the game (fill pulls and rewards with 0).

length_exploration_phase(verbose=False)[source]¶: Compute the value of the current exploration phase:

\[L_1(k) = L(k) = \lceil \gamma \log(k^{\rho} l b)\rceil.\]

Warning

I think there is a typo in the paper, as their formula are weird (like \(al\) is defined from \(a\)). See parameter_ell().

length_exploitation_phase(verbose=False)[source]¶: Compute the value of the current exploitation phase:

\[L_2(k) = \lceil a k^{\rho} l \rceil - K L_1(k).\]

Warning

I think there is a typo in the paper, as their formula are weird (like \(al\) is defined from \(a\)). See parameter_ell().

getReward(arm, reward)[source]¶: Get a reward from an arm.

__module__ = 'Policies.LM_DSEE'¶

choice()[source]¶: Choose an arm following the different phase of growing lenghts according to the LM-DSEE algorithm.