Policies.LM_DSEE module
The LM-DSEE policy for non-stationary bandits, from [[“On Abruptly-Changing and Slowly-Varying Multiarmed Bandit Problems”, by Lai Wei, Vaibhav Srivastava, 2018, arXiv:1802.08380]](https://arxiv.org/pdf/1802.08380).
It uses \(\mathcal{O}(\tau_\max)\) additional memory for a game whose stationary segments have length at most \(\tau_\max\).
Warning
This implementation is still experimental!
class Policies.LM_DSEE.State
Bases: enum.Enum

The different states (phases) of the LM-DSEE algorithm.
Exploitation = 2
Exploration = 1
Policies.LM_DSEE.VERBOSE = False
Whether to be verbose when searching for a valid value of the parameter \(\ell\).
Policies.LM_DSEE.parameter_ell(a, N, b, gamma, verbose=False, max_value_on_l=1000000)
Look for the smallest value of the parameter \(\ell\) that satisfies the following equations:
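The equations themselves are not reproduced on this page, so the following is only a minimal sketch under an assumption: that \(\ell\) must make both phases of the first batch (\(k = 1\), so \(k^{\rho} = 1\)) have positive length, using the formulas for \(L_1(k)\) and \(L_2(k)\) documented under length_exploration_phase() and length_exploitation_phase(), with N playing the role of the number of arms \(K\). The helper `find_ell` is hypothetical and is not the module's parameter_ell().

```python
import math


def find_ell(a, N, b, gamma, max_value_on_l=1_000_000):
    """Hypothetical brute-force search for the smallest integer l >= 1 such that
    the first batch (k = 1, hence k**rho = 1) has
      - an exploration length L1 = ceil(gamma * log(l * b)) of at least 1, and
      - an exploitation length L2 = ceil(a * l) - N * L1 of at least 1.
    ASSUMPTION: these are reconstructed constraints, not necessarily the
    exact equations checked by parameter_ell().
    """
    for l in range(1, max_value_on_l + 1):
        if l * b <= 1:
            # log(l * b) <= 0, so L1 would be <= 0: this l cannot work
            continue
        L1 = math.ceil(gamma * math.log(l * b))
        L2 = math.ceil(a * l) - N * L1
        if L1 >= 1 and L2 >= 1:
            return l
    raise ValueError("no valid l found up to max_value_on_l")


# Hypothetical example values: a = 1, N = 2 arms, b = 0.25, gamma = 8
print(find_ell(a=1, N=2, b=0.25, gamma=8))  # → 5
```

A plain linear search is enough here because the left-hand sides grow linearly in \(\ell\) while the logarithmic term grows slowly; the upper bound mirrors the max_value_on_l cap in the documented signature.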
class Policies.LM_DSEE.LM_DSEE(nbArms, nu=0.5, DeltaMin=0.5, a=1, b=0.25, *args, **kwargs)
Bases: Policies.BasePolicy.BasePolicy

The LM-DSEE policy for non-stationary bandits, from [[“On Abruptly-Changing and Slowly-Varying Multiarmed Bandit Problems”, by Lai Wei, Vaibhav Srivastava, 2018, arXiv:1802.08380]](https://arxiv.org/pdf/1802.08380).
a = None
Parameter \(a\) for the LM-DSEE algorithm.
b = None
Parameter \(b\) for the LM-DSEE algorithm.
l = None
Parameter \(\ell\) for the LM-DSEE algorithm, as computed by the function parameter_ell().
gamma = None
Parameter \(\gamma\) for the LM-DSEE algorithm.
rho = None
Parameter \(\rho = \frac{1-\nu}{1+\nu}\) for the LM-DSEE algorithm.
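As a quick worked example of the formula for \(\rho\): with the default nu=0.5 of the constructor, \(\rho = 0.5 / 1.5 = 1/3\). The helper `rho_from_nu` is hypothetical, and restricting \(\nu\) to \([0, 1)\) (so that \(\rho \in (0, 1]\)) is an assumption, not something stated on this page.

```python
def rho_from_nu(nu):
    """Hypothetical helper: rho = (1 - nu) / (1 + nu), as documented for LM_DSEE.rho.
    ASSUMPTION: nu is expected to lie in [0, 1), which keeps rho in (0, 1]."""
    if not 0 <= nu < 1:
        raise ValueError("nu is expected to lie in [0, 1)")
    return (1 - nu) / (1 + nu)


print(rho_from_nu(0.5))  # default nu=0.5 → 0.3333333333333333
```

Since the batch lengths below scale with \(k^{\rho}\), a larger \(\nu\) gives a smaller \(\rho\) and therefore batches that grow more slowly with \(k\).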
phase = None
Current phase: either exploration or exploitation.
current_exploration_arm = None
Currently explored arm.
current_exploitation_arm = None
Currently exploited arm.
batch_number = None
Index of the current batch.
length_of_current_phase = None
Length of the current phase, computed either by length_exploration_phase() or by length_exploitation_phase().
step_of_current_phase = None
Timer inside the current phase.
all_rewards = None
Memory of all the rewards, stored as one list per arm; each list appears to grow until that arm is restarted.
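The bookkeeping attributes above (phase, current_exploration_arm, batch_number, length_of_current_phase, step_of_current_phase) describe a small state machine. The sketch below shows one way they could advance one time step at a time; it is not the class's actual update logic. The class `PhaseTracker` and the method `tick` are hypothetical, and the schedule it encodes is an assumption: batch \(k\) first explores each arm in turn for \(L_1(k)\) steps, then exploits for \(L_2(k)\) steps, with both lengths assumed to be at least 1. The phase lengths are passed in as callables so the sketch stays independent of their exact formulas.

```python
from enum import Enum


class State(Enum):
    # Same values as the documented Policies.LM_DSEE.State enum
    Exploration = 1
    Exploitation = 2


class PhaseTracker:
    """Hypothetical bookkeeping for the phase attributes of LM_DSEE.

    ASSUMPTIONS (not taken from the source): batch k starts with an
    exploration phase where each arm in turn is played len_explo(k) times,
    followed by one exploitation phase of len_exploit(k) steps.
    Both lengths are assumed to be >= 1."""

    def __init__(self, nbArms, len_explo, len_exploit):
        self.nbArms = nbArms
        self.len_explo = len_explo          # callable: k -> L_1(k)
        self.len_exploit = len_exploit      # callable: k -> L_2(k)
        self.batch_number = 1
        self.phase = State.Exploration
        self.current_exploration_arm = 0
        self.length_of_current_phase = len_explo(1)
        self.step_of_current_phase = 0

    def tick(self):
        """Return (phase, arm) for this time step, then advance the timers.
        arm is None during exploitation: the exploited arm would be chosen
        from empirical means, which this sketch does not model."""
        arm = self.current_exploration_arm if self.phase is State.Exploration else None
        current = (self.phase, arm)
        self.step_of_current_phase += 1
        if self.step_of_current_phase >= self.length_of_current_phase:
            self.step_of_current_phase = 0
            if self.phase is State.Exploration and self.current_exploration_arm < self.nbArms - 1:
                # Move on to the next arm within the same exploration phase
                self.current_exploration_arm += 1
                self.length_of_current_phase = self.len_explo(self.batch_number)
            elif self.phase is State.Exploration:
                # Every arm has been explored: switch to exploitation
                self.phase = State.Exploitation
                self.length_of_current_phase = self.len_exploit(self.batch_number)
            else:
                # Exploitation is over: start the next batch
                self.batch_number += 1
                self.phase = State.Exploration
                self.current_exploration_arm = 0
                self.length_of_current_phase = self.len_explo(self.batch_number)
        return current


# Toy lengths (hypothetical): L_1(k) = 2 and L_2(k) = k, with 2 arms
tracker = PhaseTracker(2, lambda k: 2, lambda k: k)
for _ in range(5):
    print(tracker.tick())
```

Passing the lengths as callables keeps the state machine separate from the formulas for \(L_1(k)\) and \(L_2(k)\), which the warnings below flag as possibly affected by a typo in the paper.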
length_exploration_phase(verbose=False)
Compute the length of the current exploration phase:
\[L_1(k) = L(k) = \lceil \gamma \log(k^{\rho} l b)\rceil.\]
Warning
I think there is a typo in the paper, as their formulas look inconsistent (for instance, \(al\) is defined from \(a\)). See parameter_ell().
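To illustrate how slowly \(L_1(k)\) grows with the batch index \(k\), here is a direct transcription of the formula above. The function name `exploration_length` is hypothetical, and the parameter values are arbitrary examples (\(\rho = 1/3\) corresponds to \(\nu = 0.5\)), not values computed by the class.

```python
import math


def exploration_length(k, gamma, rho, l, b):
    """Hypothetical helper: L_1(k) = ceil(gamma * log(k**rho * l * b)).
    Assumes k >= 1 and k**rho * l * b > 1, so that the result is positive."""
    return math.ceil(gamma * math.log(k ** rho * l * b))


# Hypothetical example values: gamma = 8, rho = 1/3, l = 5, b = 0.25
print([exploration_length(k, gamma=8, rho=1/3, l=5, b=0.25) for k in (1, 10, 100, 1000)])
# → [2, 8, 15, 21]
```

Because \(\log(k^{\rho} l b) = \rho \log k + \log(l b)\), the exploration length grows only logarithmically in \(k\).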
length_exploitation_phase(verbose=False)
Compute the length of the current exploitation phase:
\[L_2(k) = \lceil a k^{\rho} l \rceil - K L_1(k).\]
Warning
I think there is a typo in the paper, as their formulas look inconsistent (for instance, \(al\) is defined from \(a\)). See parameter_ell().
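Rearranging the formula gives \(K L_1(k) + L_2(k) = \lceil a k^{\rho} l \rceil\), which can be read as the total length of batch \(k\) if each of the \(K\) arms is explored \(L_1(k)\) times (that reading is an assumption). The sketch below transcribes both formulas and checks this identity. The helper names are hypothetical and the parameter values are arbitrary examples.

```python
import math


def exploration_length(k, gamma, rho, l, b):
    # L_1(k) = ceil(gamma * log(k**rho * l * b))
    return math.ceil(gamma * math.log(k ** rho * l * b))


def exploitation_length(k, a, gamma, rho, l, b, K):
    # L_2(k) = ceil(a * k**rho * l) - K * L_1(k), where K is the number of arms
    return math.ceil(a * k ** rho * l) - K * exploration_length(k, gamma, rho, l, b)


# Arbitrary example values (hypothetical): a = 10, b = 0.25, gamma = 8, rho = 1/3, l = 5, K = 2
params = dict(a=10, gamma=8, rho=1/3, l=5, b=0.25)
K = 2
for k in (1, 2, 3):
    L1 = exploration_length(k, params["gamma"], params["rho"], params["l"], params["b"])
    L2 = exploitation_length(k, K=K, **params)
    # K * L_1(k) + L_2(k) equals ceil(a * k**rho * l) by construction
    assert K * L1 + L2 == math.ceil(params["a"] * k ** params["rho"] * params["l"])
    print(k, L1, L2)
# prints "1 2 46", then "2 4 55", then "3 5 63"
```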