Policies.LM_DSEE module¶

The LM-DSEE policy for non-stationary bandits, from [[“On Abruptly-Changing and Slowly-Varying Multiarmed Bandit Problems”, by Lai Wei, Vaibhav Srivastava, 2018, arXiv:1802.08380]](https://arxiv.org/pdf/1802.08380)

• It uses an additional $$\mathcal{O}(\tau_\max)$$ memory for a game of maximum stationary length $$\tau_\max$$.

Warning

This implementation is still experimental!

class Policies.LM_DSEE.State

Bases: enum.Enum

Different states during the LM-DSEE algorithm

Exploitation = 2
Exploration = 1
__module__ = 'Policies.LM_DSEE'
Policies.LM_DSEE.VERBOSE = False

Whether to be verbose when doing the search for valid parameter $$\ell$$.

Policies.LM_DSEE.parameter_ell(a, N, b, gamma, verbose=False, max_value_on_l=1000000)[source]

Look for the smallest value of the parameter $$\ell$$ that satisfies the constraints given in the paper (the search is capped at max_value_on_l).
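From the documented signature, this is a bounded search for the smallest valid integer $$\ell$$. A minimal sketch of that search pattern follows; the `constraint` predicate here is a hypothetical placeholder, since the actual inequalities from the paper are not reproduced on this page:

```python
def smallest_valid_ell(constraint, max_value_on_l=1000000):
    """Linear search for the smallest integer l >= 1 satisfying the
    given constraint predicate, capped at max_value_on_l (sketch)."""
    for ell in range(1, max_value_on_l + 1):
        if constraint(ell):
            return ell
    raise ValueError("no valid l found below max_value_on_l")

# Hypothetical constraint, for illustration only:
print(smallest_valid_ell(lambda ell: ell >= 42))  # -> 42
```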

class Policies.LM_DSEE.LM_DSEE(nbArms, nu=0.5, DeltaMin=0.5, a=1, b=0.25, *args, **kwargs)[source]

The LM-DSEE policy for non-stationary bandits, from [[“On Abruptly-Changing and Slowly-Varying Multiarmed Bandit Problems”, by Lai Wei, Vaibhav Srivastava, 2018, arXiv:1802.08380]](https://arxiv.org/pdf/1802.08380)

__init__(nbArms, nu=0.5, DeltaMin=0.5, a=1, b=0.25, *args, **kwargs)[source]

New policy.

a = None

Parameter $$a$$ for the LM-DSEE algorithm.

b = None

Parameter $$b$$ for the LM-DSEE algorithm.

l = None

Parameter $$\ell$$ for the LM-DSEE algorithm, as computed by the function parameter_ell().

gamma = None

Parameter $$\gamma$$ for the LM-DSEE algorithm.

rho = None

Parameter $$\rho = \frac{1-\nu}{1+\nu}$$ for the LM-DSEE algorithm.
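The value of $$\rho$$ follows directly from the $$\nu$$ parameter of `__init__`; for instance, with the default $$\nu = 0.5$$:

```python
nu = 0.5  # default value of the nu parameter from __init__
rho = (1.0 - nu) / (1.0 + nu)  # rho = (1 - nu) / (1 + nu)
print(rho)  # -> 0.3333333333333333
```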

phase = None

Current phase, exploration or exploitation.

current_exploration_arm = None

Currently explored arm.

current_exploitation_arm = None

Currently exploited arm.

batch_number = None

Number of the current batch.

length_of_current_phase = None

Length of the current phase, as computed by length_exploration_phase() or length_exploitation_phase().

step_of_current_phase = None

Timer inside the current phase.

all_rewards = None

Memory of all the rewards: one list per arm, growing until that arm's memory is restarted.

__str__()[source]

-> str

startGame()[source]

Start the game (fill pulls and rewards with 0).

length_exploration_phase(verbose=False)[source]

Compute the length of the current exploration phase:

$L_1(k) = L(k) = \lceil \gamma \log(k^{\rho} l b)\rceil.$

Warning

I think there is a typo in the paper, as their formulas are weird (e.g., $$al$$ is defined from $$a$$). See parameter_ell().
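The displayed formula for $$L_1(k)$$ can be computed directly; the parameter values below are hypothetical, chosen only for illustration:

```python
import math

def length_exploration_phase(k, gamma, rho, l, b):
    """L1(k) = ceil(gamma * log(k^rho * l * b)), as in the formula above."""
    return math.ceil(gamma * math.log(k ** rho * l * b))

# Hypothetical parameter values, for illustration only:
print(length_exploration_phase(k=8, gamma=2.0, rho=1/3, l=10, b=0.25))  # -> 4
```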

length_exploitation_phase(verbose=False)[source]

Compute the length of the current exploitation phase:

$L_2(k) = \lceil a k^{\rho} l \rceil - K L_1(k).$

Warning

I think there is a typo in the paper, as their formulas are weird (e.g., $$al$$ is defined from $$a$$). See parameter_ell().

getReward(arm, reward)[source]

Get a reward from an arm.

__module__ = 'Policies.LM_DSEE'
choice()[source]

Choose an arm, following the different phases of growing lengths according to the LM-DSEE algorithm.
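The alternation of phases described above can be sketched as follows. This is a simplified illustration, not the library's actual implementation: in batch $$k$$, every arm is explored $$L_1(k)$$ times, then the empirically best arm is exploited $$L_2(k)$$ times, with a random stand-in for the environment's rewards:

```python
import math
import random

def lm_dsee_schedule(n_batches, K, gamma, rho, l, a, b):
    """Yield the arm pulled at each step, alternating exploration and
    exploitation phases of growing lengths (simplified sketch)."""
    means = [0.0] * K   # empirical mean reward per arm
    counts = [0] * K    # number of exploration pulls per arm
    for k in range(1, n_batches + 1):
        L1 = math.ceil(gamma * math.log(k ** rho * l * b))  # exploration length per arm
        L2 = math.ceil(a * k ** rho * l) - K * L1           # exploitation length
        for arm in range(K):             # exploration: pull each arm L1 times
            for _ in range(L1):
                reward = random.random()  # stand-in for the environment
                counts[arm] += 1
                means[arm] += (reward - means[arm]) / counts[arm]
                yield arm
        best = max(range(K), key=lambda i: means[i])
        for _ in range(L2):              # exploitation: pull the best arm
            yield best

# Hypothetical parameter values, for illustration only:
pulls = list(lm_dsee_schedule(n_batches=1, K=2, gamma=2.0, rho=1/3,
                              l=10, a=1.0, b=0.25))
print(len(pulls))  # one batch of K*L1 + L2 = ceil(a * 1^rho * l) pulls
```

Note that the two phase lengths are tied together: in each batch, $$K L_1(k) + L_2(k) = \lceil a k^{\rho} l \rceil$$, so every batch has a deterministic total length regardless of the observed rewards.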