# Policies.Exp3R module¶

The Drift-Detection algorithm for non-stationary bandits.

Warning

It works on Exp3 or other parametrizations of the Exp3 policy, e.g., Exp3PlusPlus.

Policies.Exp3R.VERBOSE = False

Whether to be verbose when doing the search for valid parameter $$\ell$$.

Policies.Exp3R.CONSTANT_C = 1.0

The constant $$C$$ used in Corollary 1 of paper [[“EXP3 with Drift Detection for the Switching Bandit Problem”, Robin Allesiardo & Raphael Feraud]](https://www.researchgate.net/profile/Allesiardo_Robin/publication/281028960_EXP3_with_Drift_Detection_for_the_Switching_Bandit_Problem/links/55d1927808aee19936fdac8e.pdf).

class Policies.Exp3R.DriftDetection_IndexPolicy(nbArms, H=None, delta=None, C=1.0, horizon=None, policy=<class 'Policies.Exp3.Exp3'>, *args, **kwargs)[source]

The Drift-Detection generic policy for non-stationary bandits, using a custom Drift-Detection test, for 1-dimensional exponential families.

__init__(nbArms, H=None, delta=None, C=1.0, horizon=None, policy=<class 'Policies.Exp3.Exp3'>, *args, **kwargs)[source]

New policy.

H = None

Parameter $$H$$ for the Drift-Detection algorithm. Default value is $$\lceil C \sqrt{T \log(T)} \rceil$$, for some constant $$C=$$ C (= CONSTANT_C by default).

delta = None

Parameter $$\delta$$ for the Drift-Detection algorithm. Default value is $$\sqrt{\frac{\log(T)}{K T}}$$ for $$K$$ arms and horizon $$T$$.

property proba_random_exploration

Parameter $$\gamma$$ for the Exp3 algorithm.

property threshold_h

Parameter $$\varepsilon$$ for the Drift-Detection algorithm.

$\varepsilon = \sqrt{\frac{K \log(\frac{1}{\delta})}{2 \gamma H}}.$
property min_number_of_pulls_to_test_change

Compute $$\Gamma_{\min}(I) := \frac{\gamma H}{K}$$, the minimum number of samples we should have for all arms before testing for a change.

__str__()[source]

-> str

detect_change(arm, verbose=False)[source]

Detect a change in the current arm, using a Drift-Detection test (DD).

$\begin{split}k_{\max} &:= \arg\max_k \tilde{\rho}_k(t),\\ DD_t(k) &= \hat{\mu}_k(I) - \hat{\mu}_{k_{\max}}(I).\end{split}$
• The change is detected if there is an arm $$k$$ such that $$DD_t(k) \geq 2 * \varepsilon = h$$, where threshold_h is the threshold of the test, and $$I$$ is the (number of the) current interval since the last (global) restart,

• where $$\tilde{\rho}_k(t)$$ is the trust probability of arm $$k$$ from the Exp3 algorithm,

• and where $$\hat{\mu}_k(I)$$ is the empirical mean of arm $$k$$ from the data in the current interval.

Warning

FIXME I know this implementation is not (yet) correct… I should count differently the samples we obtained from the Gibbs distribution (when Exp3 uses the trust vector) and from the uniform distribution This $$\Gamma_{\min}(I)$$ is the minimum number of samples obtained from the uniform exploration (of probability $$\gamma$$). It seems painful to code correctly, I will do it later.

__module__ = 'Policies.Exp3R'
class Policies.Exp3R.Exp3R(nbArms, policy=<class 'Policies.Exp3.Exp3'>, *args, **kwargs)[source]

The Exp3.R policy for non-stationary bandits.

__init__(nbArms, policy=<class 'Policies.Exp3.Exp3'>, *args, **kwargs)[source]

New policy.

__str__()[source]

-> str

__module__ = 'Policies.Exp3R'
class Policies.Exp3R.Exp3RPlusPlus(nbArms, policy=<class 'Policies.Exp3PlusPlus.Exp3PlusPlus'>, *args, **kwargs)[source]

The Exp3.R++ policy for non-stationary bandits.

__init__(nbArms, policy=<class 'Policies.Exp3PlusPlus.Exp3PlusPlus'>, *args, **kwargs)[source]

New policy.

__module__ = 'Policies.Exp3R'
__str__()[source]

-> str