Policies.GLR_UCB module¶
The GLR-UCB policy and variants, for non-stationary bandits.
- Reference: [“Combining the Generalized Likelihood Ratio Test and kl-UCB for Non-Stationary Bandits”, E. Kaufmann and L. Besson, 2019](https://hal.inria.fr/hal-02006471/) 
- It runs on top of a simple policy, e.g., UCB, and BernoulliGLR_IndexPolicy is a wrapper:

  >>> policy = BernoulliGLR_IndexPolicy(nbArms, UCB)
  >>> # use policy as usual, with policy.startGame(), r = policy.choice(), policy.getReward(arm, r)
- It uses an additional \(\mathcal{O}(\tau_\max)\) memory for a game of maximum stationary length \(\tau_\max\). 
Warning
It can only work on top of a basic index policy based on empirical averages (and an exploration bias), like UCB; it cannot work on top of a Bayesian policy (for which we would have to store all previous observations in order to reset the history to a shorter one)!
- 
Policies.GLR_UCB.VERBOSE= False¶
- Whether to be verbose when doing the change detection algorithm. 
- 
Policies.GLR_UCB.PROBA_RANDOM_EXPLORATION= 0.1¶
- Default probability of random exploration \(\alpha\). 
- 
Policies.GLR_UCB.PER_ARM_RESTART= True¶
- Should we reset only one arm's empirical average, or all of them? Default is True; it is usually more efficient!
- 
Policies.GLR_UCB.FULL_RESTART_WHEN_REFRESH= False¶
- Should we fully restart the algorithm, or simply reset one arm's empirical average? Default is False; it is usually more efficient!
- 
Policies.GLR_UCB.LAZY_DETECT_CHANGE_ONLY_X_STEPS= 10¶
- Be lazy and try to detect changes only every X steps, where X is small (like 10 for instance). It is a simple but efficient way to speed up CD tests, see https://github.com/SMPyBandits/SMPyBandits/issues/173. A value of 0 disables this feature, and 10 should speed up the test by about x10. 
- 
Policies.GLR_UCB.LAZY_TRY_VALUE_S_ONLY_X_STEPS= 10¶
- Be lazy and try to detect changes for \(s\) taking steps of size steps_s. The default steps_s=1 disables this feature; even steps_s=2 should already give a x2 speed-up, and 10 should speed up the test by about x10. It is a simple but efficient way to speed up GLR tests, see https://github.com/SMPyBandits/SMPyBandits/issues/173.
- 
Policies.GLR_UCB.USE_LOCALIZATION= True¶
- Default value of use_localization for policies. All the experiments I tried showed that localization always helps improve learning, so the default value is set to True.
- 
Policies.GLR_UCB.eps= 1e-10¶
- Threshold value: everything in [0, 1] is truncated to [eps, 1 - eps] 
- 
Policies.GLR_UCB.klBern(x, y)[source]¶
- Kullback-Leibler divergence for Bernoulli distributions. https://en.wikipedia.org/wiki/Bernoulli_distribution#Kullback.E2.80.93Leibler_divergence \[\mathrm{KL}(\mathcal{B}(x), \mathcal{B}(y)) = x \log(\frac{x}{y}) + (1-x) \log(\frac{1-x}{1-y}).\]
- 
Policies.GLR_UCB.klGauss(x, y, sig2x=1)[source]¶
- Kullback-Leibler divergence for Gaussian distributions of means x and y and variances sig2x and sig2y, \(\nu_1 = \mathcal{N}(x, \sigma_x^2)\) and \(\nu_2 = \mathcal{N}(y, \sigma_y^2)\): \[\mathrm{KL}(\nu_1, \nu_2) = \frac{(x - y)^2}{2 \sigma_y^2} + \frac{1}{2}\left( \frac{\sigma_x^2}{\sigma_y^2} - 1 - \log\left(\frac{\sigma_x^2}{\sigma_y^2}\right) \right).\] See https://en.wikipedia.org/wiki/Normal_distribution#Other_properties 
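- A minimal Python sketch of these two divergences (illustrative only, not necessarily the module's exact implementation; the eps truncation follows the constant defined above, and klGauss is written for a single known variance sig2x, in which case the formula reduces to \((x-y)^2 / (2\sigma^2)\)):

    import numpy as np

    eps = 1e-10  # truncation constant, as defined above

    def klBern(x, y):
        """KL(B(x), B(y)), with inputs truncated to [eps, 1 - eps] to avoid log(0)."""
        x = min(max(x, eps), 1 - eps)
        y = min(max(y, eps), 1 - eps)
        return x * np.log(x / y) + (1 - x) * np.log((1 - x) / (1 - y))

    def klGauss(x, y, sig2x=1):
        """KL between two Gaussians of means x, y and the same known variance sig2x."""
        return (x - y) ** 2 / (2.0 * sig2x)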
- 
Policies.GLR_UCB.threshold_GaussianGLR(t, horizon=None, delta=None, variant=None)[source]¶
- Compute the value \(c\) from the corollary of Theorem 2 from [“Sequential change-point detection: Laplace concentration of scan statistics and non-asymptotic delay bounds”, O.-A. Maillard, 2018]. - The threshold is computed as (with \(t_0 = 0\)): 
 \[\beta(t_0, t, \delta) := \left(1 + \frac{1}{t - t_0 + 1}\right) 2 \log\left(\frac{2 (t - t_0) \sqrt{(t - t_0) + 2}}{\delta}\right).\]
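- A minimal sketch of this threshold with \(t_0 = 0\) (assuming delta is given explicitly; the horizon and variant handling of the real function is not reproduced here):

    import numpy as np

    def threshold_GaussianGLR(t, delta):
        """beta(0, t, delta) = (1 + 1/(t + 1)) * 2 * log(2 * t * sqrt(t + 2) / delta), for t >= 1."""
        return (1.0 + 1.0 / (t + 1.0)) * 2.0 * np.log(2.0 * t * np.sqrt(t + 2.0) / delta)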
- 
Policies.GLR_UCB.function_h_minus_one(x)[source]¶
- The inverse function of \(h(u)\), that is \(h^{-1}(x) = u \Leftrightarrow h(u) = x\). It is given by the Lambert W function, see - scipy.special.lambertw():\[h^{-1}(x) = - \mathcal{W}(- \exp(-x)).\]- Example: 
  >>> np.random.seed(105)
  >>> y = np.random.randn() ** 2
  >>> print(f"y = {y}")
  y = 0.060184682907834595
  >>> x = function_h(y)
  >>> print(f"h(y) = {x}")
  h(y) = 2.8705220786966508
  >>> z = function_h_minus_one(x)
  >>> print(f"h^-1(x) = {z}")
  h^-1(x) = 0.060184682907834595
  >>> assert np.isclose(z, y), "Error: h^-1(h(y)) = z = {} should be very close to y = {}...".format(z, y)
- 
Policies.GLR_UCB.constant_power_function_h= 1.5¶
- The constant \(\frac{3}{2}\), used in the definition of functions \(h\), \(h^{-1}\), \(\tilde{h}\) and \(\mathcal{T}\). 
- 
Policies.GLR_UCB.threshold_function_h_tilde= 3.801770285137458¶
- The constant \(h^{-1}(1/\log(\frac{3}{2}))\), used in the definition of function \(\tilde{h}\). 
- 
Policies.GLR_UCB.constant_function_h_tilde= -0.90272045571788¶
- The constant \(\log(\log(\frac{3}{2}))\), used in the definition of function \(\tilde{h}\). 
- 
Policies.GLR_UCB.function_h_tilde(x)[source]¶
- The function \(\tilde{h}(x)\), defined by: \[\begin{split}\tilde{h}(x) = \begin{cases} e^{1/h^{-1}(x)} h^{-1}(x) & \text{ if } x \ge h^{-1}(1/\ln (3/2)), \\ (3/2) (x-\ln \ln (3/2)) & \text{otherwise}. \end{cases}\end{split}\]
- 
Policies.GLR_UCB.zeta_of_two= 1.6449340668482264¶
- The constant \(\zeta(2) = \frac{\pi^2}{6}\). 
- 
Policies.GLR_UCB.function_T_mathcal(x)[source]¶
- The function \(\mathcal{T}(x)\), defined by: \[\mathcal{T}(x) = 2 \tilde h\left(\frac{h^{-1}(1+x) + \ln(2\zeta(2))}{2}\right).\]
- 
Policies.GLR_UCB.approximation_function_T_mathcal(x)[source]¶
- An efficiently computed approximation of \(\mathcal{T}(x)\), valid for \(x \geq 5\): \[\mathcal{T}(x) \simeq x + 4 \log(1 + x + \sqrt{2 x}).\]
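- A one-line sketch of this approximation (illustrative only):

    import numpy as np

    def approximation_function_T_mathcal(x):
        """Approximation of T(x) for x >= 5: x + 4 * log(1 + x + sqrt(2 * x))."""
        return x + 4.0 * np.log(1.0 + x + np.sqrt(2.0 * x))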
- 
Policies.GLR_UCB.threshold_BernoulliGLR(t, horizon=None, delta=None, variant=None)[source]¶
- Compute the value \(c\) from the corollary of Theorem 2 from [“Sequential change-point detection: Laplace concentration of scan statistics and non-asymptotic delay bounds”, O.-A. Maillard, 2018]. - Warning - This is still experimental; you can try different variants of the threshold function: - Variant #0 (default) is: 
 \[\beta(t, \delta) := \log\left(\frac{3 t^{3/2}}{\delta}\right) = \log(\frac{1}{\delta}) + \log(3) + 3/2 \log(t).\]- Variant #1 is smaller: 
 \[\beta(t, \delta) := \log(\frac{1}{\delta}) + \log(1 + \log(t)).\]- Variant #2 is using \(\mathcal{T}\): 
 \[\beta(t, \delta) := 2 \mathcal{T}\left(\frac{\log(2 t^{3/2} / \delta)}{2}\right) + 6 \log(1 + \log(t)).\]- Variant #3 is using \(\tilde{\mathcal{T}}(x) = x + 4 \log(1 + x + \sqrt{2x})\), an approximation of \(\mathcal{T}(x)\) (valid and quite accurate as soon as \(x \geq 5\)): 
 \[\beta(t, \delta) := 2 \tilde{\mathcal{T}}\left(\frac{\log(2 t^{3/2} / \delta)}{2}\right) + 6 \log(1 + \log(t)).\]
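- A minimal sketch of the two simplest variants, assuming delta is given explicitly (the function names with _variant0/_variant1 suffixes are made up for this sketch; the horizon and use_increasing_alpha options of the real function are not reproduced):

    import numpy as np

    def threshold_BernoulliGLR_variant0(t, delta):
        """Variant #0: log(3 * t^(3/2) / delta) = log(1/delta) + log(3) + 1.5 * log(t)."""
        return np.log(3.0 * t ** 1.5 / delta)

    def threshold_BernoulliGLR_variant1(t, delta):
        """Variant #1: log(1/delta) + log(1 + log(t))."""
        return np.log(1.0 / delta) + np.log(1.0 + np.log(t))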
- 
Policies.GLR_UCB.EXPONENT_BETA= 1.01¶
- The default value of parameter \(\beta\) for the function - decreasing_alpha__GLR().
- 
Policies.GLR_UCB.ALPHA_T1= 0.05¶
- The default value of parameter \(\alpha_{t=1}\) for the function - decreasing_alpha__GLR().
- 
Policies.GLR_UCB.decreasing_alpha__GLR(alpha0=None, t=1, exponentBeta=1.01, alpha_t1=0.05)[source]¶
- Either use a fixed alpha, or compute it with an exponential decay (if - alpha0=None).- Note - I am currently exploring the following variant (November 2018): - The probability of uniform exploration, \(\alpha\), is computed as a function of the current time: 
 \[\forall t>0, \alpha = \alpha_t := \alpha_{t=1} \frac{1}{\max(1, t^{\beta})}.\]- with \(\beta > 1\), \(\beta\) = exponentBeta (=1.05) and \(\alpha_{t=1} < 1\), \(\alpha_{t=1}\) = alpha_t1 (=0.01).
- The only requirement on \(\alpha_t\) seems to be that \(\sum_{t=1}^T \alpha_t < +\infty\) (i.e., the sum is finite), which is the case for \(\alpha_t = \alpha = \frac{1}{T}\), but also for any \(\alpha_t = \frac{\alpha_1}{t^{\beta}}\) with \(\beta>1\) (cf. Riemann series). 
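- A minimal sketch matching the formula above (defaults taken from the documented signature; illustrative only):

    def decreasing_alpha__GLR(alpha0=None, t=1, exponentBeta=1.01, alpha_t1=0.05):
        """Return the fixed alpha0 if it is given, otherwise alpha_t = alpha_t1 / max(1, t^exponentBeta)."""
        if alpha0 is not None:
            return alpha0
        return alpha_t1 / max(1.0, float(t) ** exponentBeta)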
 
- 
Policies.GLR_UCB.smart_delta_from_T_UpsilonT(horizon=1, max_nb_random_events=1, scaleFactor=1.0, per_arm_restart=True, nbArms=1)[source]¶
- Compute a smart estimate of the optimal value for the confidence level \(\delta\), with scaleFactor \(= \delta_0\in(0,1)\) a constant. - If per_arm_restart is True (Local option):
 \[\delta = \frac{\delta_0}{\sqrt{K \Upsilon_T T}}.\] - If per_arm_restart is False (Global option):
 \[\delta = \frac{\delta_0}{\sqrt{\Upsilon_T T}}.\] - Note that if \(\Upsilon_T\) is unknown, it is assumed to be \(\Upsilon_T=1\). 
- 
Policies.GLR_UCB.smart_alpha_from_T_UpsilonT(horizon=1, max_nb_random_events=1, scaleFactor=0.1, per_arm_restart=True, nbArms=1)[source]¶
- Compute a smart estimate of the optimal value for the fixed or random forced exploration probability \(\alpha\) (or tracking based), with scaleFactor \(= \alpha_0\in(0,1)\) a constant. - If per_arm_restart is True (Local option):
 \[\alpha = \alpha_0 \times \sqrt{\frac{K \Upsilon_T}{T} \log(T)}.\] - If per_arm_restart is False (Global option):
 \[\alpha = \alpha_0 \times \sqrt{\frac{\Upsilon_T}{T} \log(T)}.\]- Note that if \(\Upsilon_T\) is unknown, it is assumed to be \(\Upsilon_T=1\). 
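- A minimal sketch of both estimates (the shortened names smart_delta and smart_alpha are made up for this illustration; the fallback \(\Upsilon_T = 1\) when it is unknown is modeled here by a simple max, which may differ from the module's exact handling):

    import numpy as np

    def smart_delta(horizon=1, max_nb_random_events=1, scaleFactor=1.0, per_arm_restart=True, nbArms=1):
        """delta = delta_0 / sqrt(K * Upsilon_T * T) (Local) or delta_0 / sqrt(Upsilon_T * T) (Global)."""
        T, UpsilonT, K = horizon, max(max_nb_random_events, 1), nbArms
        return scaleFactor / np.sqrt((K if per_arm_restart else 1) * UpsilonT * T)

    def smart_alpha(horizon=1, max_nb_random_events=1, scaleFactor=0.1, per_arm_restart=True, nbArms=1):
        """alpha = alpha_0 * sqrt(K * Upsilon_T / T * log(T)) (Local), or without the K factor (Global)."""
        T, UpsilonT, K = horizon, max(max_nb_random_events, 1), nbArms
        return scaleFactor * np.sqrt((K if per_arm_restart else 1) * UpsilonT / T * np.log(T))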
- 
class Policies.GLR_UCB.GLR_IndexPolicy(nbArms, horizon=None, delta=None, max_nb_random_events=None, kl=<function klGauss>, alpha0=None, exponentBeta=1.01, alpha_t1=0.05, threshold_function=<function threshold_BernoulliGLR>, variant=None, use_increasing_alpha=False, lazy_try_value_s_only_x_steps=10, per_arm_restart=True, use_localization=True, *args, **kwargs)[source]¶
- Bases: Policies.CD_UCB.CD_IndexPolicy
- The GLR-UCB generic policy for non-stationary bandits, using the Generalized Likelihood Ratio test (GLR), for 1-dimensional exponential families.
- It works for any 1-dimensional exponential family; you just have to give a kl function.
- For instance kullback.klBern(), for Bernoulli distributions, gives BernoulliGLR_IndexPolicy,
- And kullback.klGauss(), for univariate Gaussian distributions, gives GaussianGLR_IndexPolicy.
- threshold_function computes the threshold \(\beta(t, \delta)\); it can be for instance threshold_GaussianGLR() or threshold_BernoulliGLR().
- From [“Sequential change-point detection: Laplace concentration of scan statistics and non-asymptotic delay bounds”, O.-A. Maillard, 2018]. 
- Reference: [“Combining the Generalized Likelihood Ratio Test and kl-UCB for Non-Stationary Bandits”, E. Kaufmann and L. Besson, 2019](https://hal.inria.fr/hal-02006471/) 
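- A hedged usage sketch (import paths may differ depending on how SMPyBandits is installed; the policy= keyword is assumed to be forwarded to the underlying CD_IndexPolicy, as in SubGaussianGLR_IndexPolicy below):

    from Policies import UCB
    from Policies.GLR_UCB import GLR_IndexPolicy, klBern, threshold_BernoulliGLR

    nbArms, horizon = 3, 10000
    policy = GLR_IndexPolicy(nbArms, horizon=horizon, policy=UCB,
                             kl=klBern, threshold_function=threshold_BernoulliGLR)
    policy.startGame()
    # then, at each time step: arm = policy.choice(); ...; policy.getReward(arm, reward)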
 - 
__init__(nbArms, horizon=None, delta=None, max_nb_random_events=None, kl=<function klGauss>, alpha0=None, exponentBeta=1.01, alpha_t1=0.05, threshold_function=<function threshold_BernoulliGLR>, variant=None, use_increasing_alpha=False, lazy_try_value_s_only_x_steps=10, per_arm_restart=True, use_localization=True, *args, **kwargs)[source]¶
- New policy. 
 - 
horizon= None¶
- The horizon \(T\). 
 - 
max_nb_random_events= None¶
- The number of breakpoints \(\Upsilon_T\). 
 - 
use_localization= None¶
- Experimental option: use localization of the break-point, i.e., when restarting, keep the observations s+1, …, n of the arm instead of just the last one. 
 - 
delta= None¶
- The confidence level \(\delta\). Defaults to \(\delta=\frac{1}{\sqrt{T}}\) if horizon is given and delta=None but \(\Upsilon_T\) is unknown. Defaults to \(\delta=\frac{1}{\sqrt{\Upsilon_T T}}\) if both \(T\) and \(\Upsilon_T\) are given (horizon and max_nb_random_events).
 - 
kl= None¶
- The parametrized Kullback-Leibler divergence (\(\mathrm{kl}(x,y) = \mathrm{KL}(D(x),D(y))\)) for the 1-dimensional exponential family \(x\mapsto D(x)\). Example: kullback.klBern() or kullback.klGauss().
 - 
lazy_try_value_s_only_x_steps= None¶
- Be lazy and try to detect changes for \(s\) taking steps of size steps_s.
 - 
property proba_random_exploration¶
- What they call \(\alpha\) in their paper: the probability of uniform exploration at each time. 
 - 
getReward(arm, reward)[source]¶
- Do as CD_UCB to handle the new reward, and also update the internal times of each arm for the indexes of klUCB_forGLR (or other index policies), which use \(f(t - \tau_i(t))\) for the exploration function of each arm \(i\) at time \(t\), where \(\tau_i(t)\) denotes the (last) restart time of the arm.
 - 
detect_change(arm, verbose=False)[source]¶
- Detect a change in the current arm, using the Generalized Likelihood Ratio test (GLR) and the kl function. - For each time step \(s\) between \(t_0=0\) and \(t\), compute: 
 \[G^{\mathrm{kl}}_{t_0:s:t} = (s-t_0+1) \mathrm{kl}(\mu_{t_0,s}, \mu_{t_0,t}) + (t-s) \mathrm{kl}(\mu_{s+1,t}, \mu_{t_0,t}).\]- The change is detected if there is a time \(s\) such that \(G^{\mathrm{kl}}_{t_0:s:t} > h\), where threshold_h is the threshold of the test, 
- And \(\mu_{a,b} = \frac{1}{b-a+1} \sum_{s=a}^{b} y_s\) is the mean of the samples between \(a\) and \(b\). 
- Warning - This is computationally costly, so an easy way to speed up this test is to use lazy_try_value_s_only_x_steps \(= \mathrm{Step_s}\) for a small value (e.g., 10), so as not to test all \(s\in[t_0, t-1]\) but only \(s\in[t_0, t-1], s \mod \mathrm{Step_s} = 0\) (e.g., one out of every 10 steps).
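- A standalone sketch of this scan for the Bernoulli case (not the class method itself; the data array and the threshold value are assumed to be given, and the laziness is modeled by the step_s stride):

    import numpy as np

    def klBern(x, y, eps=1e-10):
        """KL(B(x), B(y)), truncated to avoid log(0)."""
        x = min(max(x, eps), 1 - eps)
        y = min(max(y, eps), 1 - eps)
        return x * np.log(x / y) + (1 - x) * np.log((1 - x) / (1 - y))

    def glr_detect_change(data, threshold_h, step_s=10):
        """Scan s = 0, step_s, 2*step_s, ... and report the first s where the statistic
        (s+1) * kl(mean(data[:s+1]), mean(data)) + (n-s-1) * kl(mean(data[s+1:]), mean(data))
        exceeds threshold_h. Returns (True, s) or (False, None)."""
        n = len(data)
        mean_all = np.mean(data)
        for s in range(0, n - 1, step_s):
            mean_before = np.mean(data[:s + 1])
            mean_after = np.mean(data[s + 1:])
            glr = (s + 1) * klBern(mean_before, mean_all) + (n - s - 1) * klBern(mean_after, mean_all)
            if glr > threshold_h:
                return True, s
        return False, None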
 - 
__module__= 'Policies.GLR_UCB'¶
 
- 
class Policies.GLR_UCB.GLR_IndexPolicy_WithTracking(nbArms, horizon=None, delta=None, max_nb_random_events=None, kl=<function klGauss>, alpha0=None, exponentBeta=1.01, alpha_t1=0.05, threshold_function=<function threshold_BernoulliGLR>, variant=None, use_increasing_alpha=False, lazy_try_value_s_only_x_steps=10, per_arm_restart=True, use_localization=True, *args, **kwargs)[source]¶
- Bases: Policies.GLR_UCB.GLR_IndexPolicy - A variant of the GLR policy where the exploration is not forced to be uniformly random but is based on a tracking of arms that haven't been explored enough. - Reference: [“Combining the Generalized Likelihood Ratio Test and kl-UCB for Non-Stationary Bandits”, E. Kaufmann and L. Besson, 2019](https://hal.inria.fr/hal-02006471/) 
 - 
choice()[source]¶
- If any arm is not explored enough (\(n_k \leq \frac{\alpha}{K} \times (t - n_k)\)), play uniformly at random one of these arms; otherwise, pass the call to choice() of the underlying policy.
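- A sketch of this rule, with a hypothetical helper that takes the vector of pull counts and the underlying policy's choice() as inputs:

    import numpy as np

    def choose_with_tracking(pulls, t, alpha, underlying_choice, rng=np.random):
        """If some arm k satisfies pulls[k] <= (alpha / K) * (t - pulls[k]), force exploration of one
        such arm (chosen uniformly at random); otherwise defer to the underlying policy."""
        K = len(pulls)
        under_explored = [k for k in range(K) if pulls[k] <= (alpha / K) * (t - pulls[k])]
        if under_explored:
            return rng.choice(under_explored)
        return underlying_choice()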
 - 
__module__= 'Policies.GLR_UCB'¶
 
- 
class Policies.GLR_UCB.GLR_IndexPolicy_WithDeterministicExploration(nbArms, horizon=None, delta=None, max_nb_random_events=None, kl=<function klGauss>, alpha0=None, exponentBeta=1.01, alpha_t1=0.05, threshold_function=<function threshold_BernoulliGLR>, variant=None, use_increasing_alpha=False, lazy_try_value_s_only_x_steps=10, per_arm_restart=True, use_localization=True, *args, **kwargs)[source]¶
- Bases: - Policies.GLR_UCB.GLR_IndexPolicy- A variant of the GLR policy where the exploration is not forced to be uniformly random but deterministic, inspired by what M-UCB proposed. - If \(t\) is the current time and \(\tau\) is the latest restarting time, then uniform exploration is done if: 
 \[\begin{split}A &:= (t - \tau) \mod \lceil \frac{K}{\gamma} \rceil,\\ A &\leq K \implies A_t = A.\end{split}\]- Reference: [[“Combining the Generalized Likelihood Ratio Test and kl-UCB for Non-Stationary Bandits. E. Kaufmann and L. Besson, 2019]](https://hal.inria.fr/hal-02006471/) 
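- A sketch of this schedule check (written with 0-based arm indices, so the exploration condition is read as \(A < K\), which is an assumption; the helper name is made up for this illustration):

    import math

    def deterministic_exploration_arm(t, tau, nbArms, gamma):
        """Return the arm index forced by the deterministic schedule, or None if the underlying
        policy should choose: A = (t - tau) % ceil(K / gamma); explore arm A if A < K."""
        A = (t - tau) % math.ceil(nbArms / gamma)
        return A if A < nbArms else None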
 - 
choice()[source]¶
- For the time steps selected by the deterministic schedule above, force exploration of the corresponding arm; otherwise, pass the call to choice() of the underlying policy.
 - 
__module__= 'Policies.GLR_UCB'¶
 
- 
class Policies.GLR_UCB.GaussianGLR_IndexPolicy(nbArms, sig2=0.25, kl=<function klGauss>, threshold_function=<function threshold_GaussianGLR>, *args, **kwargs)[source]¶
- Bases: - Policies.GLR_UCB.GLR_IndexPolicy- The GaussianGLR-UCB policy for non-stationary bandits, for fixed-variance Gaussian distributions (ie, \(\sigma^2\) known and fixed). - 
__init__(nbArms, sig2=0.25, kl=<function klGauss>, threshold_function=<function threshold_GaussianGLR>, *args, **kwargs)[source]¶
- New policy. 
 - 
_sig2= None¶
- Fixed variance \(\sigma^2\) of the Gaussian distributions. Extra parameter given to kullback.klGauss(). Defaults to \(\sigma^2 = \frac{1}{4}\).
 - 
__module__= 'Policies.GLR_UCB'¶
 
- 
- 
class Policies.GLR_UCB.GaussianGLR_IndexPolicy_WithTracking(nbArms, sig2=0.25, kl=<function klGauss>, threshold_function=<function threshold_GaussianGLR>, *args, **kwargs)[source]¶
- Bases: - Policies.GLR_UCB.GLR_IndexPolicy_WithTracking,- Policies.GLR_UCB.GaussianGLR_IndexPolicy- A variant of the GaussianGLR-UCB policy where the exploration is not forced to be uniformly random but based on a tracking of arms that haven’t been explored enough. - 
__module__= 'Policies.GLR_UCB'¶
 
- 
- 
class Policies.GLR_UCB.GaussianGLR_IndexPolicy_WithDeterministicExploration(nbArms, sig2=0.25, kl=<function klGauss>, threshold_function=<function threshold_GaussianGLR>, *args, **kwargs)[source]¶
- Bases: - Policies.GLR_UCB.GLR_IndexPolicy_WithDeterministicExploration,- Policies.GLR_UCB.GaussianGLR_IndexPolicy- A variant of the GaussianGLR-UCB policy where the exploration is not forced to be uniformly random but deterministic, inspired by what M-UCB proposed. - 
__module__= 'Policies.GLR_UCB'¶
 
- 
- 
class Policies.GLR_UCB.BernoulliGLR_IndexPolicy(nbArms, kl=<function klBern>, threshold_function=<function threshold_BernoulliGLR>, *args, **kwargs)[source]¶
- Bases: - Policies.GLR_UCB.GLR_IndexPolicy- The BernoulliGLR-UCB policy for non-stationary bandits, for Bernoulli distributions. - Reference: [[“Combining the Generalized Likelihood Ratio Test and kl-UCB for Non-Stationary Bandits. E. Kaufmann and L. Besson, 2019]](https://hal.inria.fr/hal-02006471/) 
 - 
__init__(nbArms, kl=<function klBern>, threshold_function=<function threshold_BernoulliGLR>, *args, **kwargs)[source]¶
- New policy. 
 - 
__module__= 'Policies.GLR_UCB'¶
 
- 
class Policies.GLR_UCB.BernoulliGLR_IndexPolicy_WithTracking(nbArms, kl=<function klBern>, threshold_function=<function threshold_BernoulliGLR>, *args, **kwargs)[source]¶
- Bases: - Policies.GLR_UCB.GLR_IndexPolicy_WithTracking,- Policies.GLR_UCB.BernoulliGLR_IndexPolicy- A variant of the BernoulliGLR-UCB policy where the exploration is not forced to be uniformly random but based on a tracking of arms that haven’t been explored enough. - Reference: [[“Combining the Generalized Likelihood Ratio Test and kl-UCB for Non-Stationary Bandits. E. Kaufmann and L. Besson, 2019]](https://hal.inria.fr/hal-02006471/) 
 - 
__module__= 'Policies.GLR_UCB'¶
 
- 
class Policies.GLR_UCB.BernoulliGLR_IndexPolicy_WithDeterministicExploration(nbArms, kl=<function klBern>, threshold_function=<function threshold_BernoulliGLR>, *args, **kwargs)[source]¶
- Bases: - Policies.GLR_UCB.GLR_IndexPolicy_WithDeterministicExploration,- Policies.GLR_UCB.BernoulliGLR_IndexPolicy- A variant of the BernoulliGLR-UCB policy where the exploration is not forced to be uniformly random but deterministic, inspired by what M-UCB proposed. - Reference: [[“Combining the Generalized Likelihood Ratio Test and kl-UCB for Non-Stationary Bandits. E. Kaufmann and L. Besson, 2019]](https://hal.inria.fr/hal-02006471/) 
 - 
__module__= 'Policies.GLR_UCB'¶
 
- 
class Policies.GLR_UCB.OurGaussianGLR_IndexPolicy(nbArms, sig2=0.25, kl=<function klGauss>, threshold_function=<function threshold_BernoulliGLR>, *args, **kwargs)[source]¶
- Bases: - Policies.GLR_UCB.GLR_IndexPolicy- The GaussianGLR-UCB policy for non-stationary bandits, for fixed-variance Gaussian distributions (ie, \(\sigma^2\) known and fixed), but with our threshold designed for the sub-Bernoulli case. - Reference: [[“Combining the Generalized Likelihood Ratio Test and kl-UCB for Non-Stationary Bandits. E. Kaufmann and L. Besson, 2019]](https://hal.inria.fr/hal-02006471/) 
 - 
__init__(nbArms, sig2=0.25, kl=<function klGauss>, threshold_function=<function threshold_BernoulliGLR>, *args, **kwargs)[source]¶
- New policy. 
 - 
_sig2= None¶
- Fixed variance \(\sigma^2\) of the Gaussian distributions. Extra parameter given to kullback.klGauss(). Defaults to \(\sigma^2 = \frac{1}{4}\).
 - 
__module__= 'Policies.GLR_UCB'¶
 
- 
class Policies.GLR_UCB.OurGaussianGLR_IndexPolicy_WithTracking(nbArms, sig2=0.25, kl=<function klGauss>, threshold_function=<function threshold_BernoulliGLR>, *args, **kwargs)[source]¶
- Bases: Policies.GLR_UCB.GLR_IndexPolicy_WithTracking, Policies.GLR_UCB.OurGaussianGLR_IndexPolicy - A variant of the GaussianGLR-UCB policy where the exploration is not forced to be uniformly random but is based on a tracking of arms that haven't been explored enough, with our threshold designed for the sub-Bernoulli case. - Reference: [“Combining the Generalized Likelihood Ratio Test and kl-UCB for Non-Stationary Bandits”, E. Kaufmann and L. Besson, 2019](https://hal.inria.fr/hal-02006471/) 
 - 
__module__= 'Policies.GLR_UCB'¶
 
- 
class Policies.GLR_UCB.OurGaussianGLR_IndexPolicy_WithDeterministicExploration(nbArms, sig2=0.25, kl=<function klGauss>, threshold_function=<function threshold_BernoulliGLR>, *args, **kwargs)[source]¶
- Bases: - Policies.GLR_UCB.GLR_IndexPolicy_WithDeterministicExploration,- Policies.GLR_UCB.OurGaussianGLR_IndexPolicy- A variant of the GaussianGLR-UCB policy where the exploration is not forced to be uniformly random but deterministic, inspired by what M-UCB proposed, but with our threshold designed for the sub-Bernoulli case. - Reference: [[“Combining the Generalized Likelihood Ratio Test and kl-UCB for Non-Stationary Bandits. E. Kaufmann and L. Besson, 2019]](https://hal.inria.fr/hal-02006471/) 
 - 
__module__= 'Policies.GLR_UCB'¶
 
- 
Policies.GLR_UCB.SubGaussianGLR_DELTA= 0.01¶
- Default confidence level for - SubGaussianGLR_IndexPolicy.
- 
Policies.GLR_UCB.SubGaussianGLR_SIGMA= 0.25¶
- By default, SubGaussianGLR_IndexPolicy assumes distributions are 0.25-sub-Gaussian, like Bernoulli or any distribution with support on \([0,1]\).
- 
Policies.GLR_UCB.SubGaussianGLR_JOINT= True¶
- Whether to use the joint or disjoint threshold function (threshold_SubGaussianGLR_joint() or threshold_SubGaussianGLR_disjoint()) for SubGaussianGLR_IndexPolicy.
- 
Policies.GLR_UCB.threshold_SubGaussianGLR_joint(s, t, delta=0.01, sigma=0.25)[source]¶
- Compute the threshold \(b^{\text{joint}}_{t_0}(s,t,\delta)\) according to this formula: \[b^{\text{joint}}_{t_0}(s,t,\delta) := \sigma \sqrt{ \left(\frac{1}{s-t_0+1} + \frac{1}{t-s}\right) \left(1 + \frac{1}{t-t_0+1}\right) 2 \log\left( \frac{2(t-t_0)\sqrt{t-t_0+2}}{\delta} \right)}.\]
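- A minimal sketch of this formula (t0 is exposed here as a parameter with default 0, which is an assumption; the documented signature fixes it implicitly):

    import numpy as np

    def threshold_SubGaussianGLR_joint(s, t, delta=0.01, sigma=0.25, t0=0):
        """b_joint(s, t, delta) from the formula above, for t0 <= s < t."""
        return sigma * np.sqrt(
            (1.0 / (s - t0 + 1) + 1.0 / (t - s))
            * (1.0 + 1.0 / (t - t0 + 1))
            * 2.0 * np.log(2.0 * (t - t0) * np.sqrt(t - t0 + 2) / delta)
        )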
- 
Policies.GLR_UCB.threshold_SubGaussianGLR_disjoint(s, t, delta=0.01, sigma=0.25)[source]¶
- Compute the threshold \(b^{\text{disjoint}}_{t_0}(s,t,\delta)\) according to this formula: \[b^{\text{disjoint}}_{t_0}(s,t,\delta) := \sqrt{2} \sigma \sqrt{\frac{1 + \frac{1}{s - t_0 + 1}}{s - t_0 + 1} \log\left( \frac{4 \sqrt{s - t_0 + 2}}{\delta}\right)} + \sqrt{\frac{1 + \frac{1}{t - s + 1}}{t - s + 1} \log\left( \frac{4 (t - t_0) \sqrt{t - s + 1}}{\delta}\right)}.\]
- 
Policies.GLR_UCB.threshold_SubGaussianGLR(s, t, delta=0.01, sigma=0.25, joint=True)[source]¶
- Compute the threshold \(b^{\text{joint}}_{t_0}(s,t,\delta)\) or \(b^{\text{disjoint}}_{t_0}(s,t,\delta)\). 
- 
class Policies.GLR_UCB.SubGaussianGLR_IndexPolicy(nbArms, horizon=None, max_nb_random_events=None, full_restart_when_refresh=False, policy=<class 'Policies.UCB.UCB'>, delta=0.01, sigma=0.25, joint=True, exponentBeta=1.05, alpha_t1=0.1, alpha0=None, lazy_detect_change_only_x_steps=10, lazy_try_value_s_only_x_steps=10, use_localization=True, *args, **kwargs)[source]¶
- Bases: Policies.CD_UCB.CD_IndexPolicy - The SubGaussianGLR-UCB policy for non-stationary bandits, using the Generalized Likelihood Ratio test (GLR), for sub-Gaussian distributions. - It works for any family of \(\sigma^2\)-sub-Gaussian distributions with known \(\sigma\). 
- From [“Sequential change-point detection: Laplace concentration of scan statistics and non-asymptotic delay bounds”, O.-A. Maillard, 2018]. 
 - 
__init__(nbArms, horizon=None, max_nb_random_events=None, full_restart_when_refresh=False, policy=<class 'Policies.UCB.UCB'>, delta=0.01, sigma=0.25, joint=True, exponentBeta=1.05, alpha_t1=0.1, alpha0=None, lazy_detect_change_only_x_steps=10, lazy_try_value_s_only_x_steps=10, use_localization=True, *args, **kwargs)[source]¶
- New policy. 
 - 
horizon= None¶
- The horizon \(T\). 
 - 
max_nb_random_events= None¶
- The number of breakpoints \(\Upsilon_T\). 
 - 
delta= None¶
- The confidence level \(\delta\). Defaults to \(\delta=\frac{1}{T}\) if horizon is given and delta=None.
 - 
sigma= None¶
- Parameter \(\sigma\) for the Sub-Gaussian-GLR test. 
 - 
joint= None¶
- Parameter joint for the Sub-Gaussian-GLR test.
 - 
lazy_try_value_s_only_x_steps= None¶
- Be lazy and try to detect changes for \(s\) taking steps of size steps_s.
 - 
use_localization= None¶
- Experimental option: use localization of the break-point, i.e., when restarting, keep the observations s+1, …, n of the arm instead of just the last one. 
 - 
compute_threshold_h(s, t)[source]¶
- Compute the threshold \(h\) with - threshold_SubGaussianGLR().
 - 
__module__= 'Policies.GLR_UCB'¶
 - 
property proba_random_exploration¶
- What they call \(\alpha\) in their paper: the probability of uniform exploration at each time. 
 - 
detect_change(arm, verbose=False)[source]¶
- Detect a change in the current arm, using the non-parametric sub-Gaussian Generalized Likelihood Ratio test (GLR), which works like this: - For each time step \(s\) between \(t_0=0\) and \(t\), compute: 
 \[G^{\text{sub-}\sigma}_{t_0:s:t} = |\mu_{t_0,s} - \mu_{s+1,t}|.\]- The change is detected if there is a time \(s\) such that \(G^{\text{sub-}\sigma}_{t_0:s:t} > b_{t_0}(s,t,\delta)\), where \(b_{t_0}(s,t,\delta)\) is the threshold of the test, 
- The threshold is computed as: 
 \[b_{t_0}(s,t,\delta) := \sigma \sqrt{ \left(\frac{1}{s-t_0+1} + \frac{1}{t-s}\right) \left(1 + \frac{1}{t-t_0+1}\right) 2 \log\left( \frac{2(t-t_0)\sqrt{t-t_0+2}}{\delta} \right)}.\]- And \(\mu_{a,b} = \frac{1}{b-a+1} \sum_{s=a}^{b} y_s\) is the mean of the samples between \(a\) and \(b\).
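- A standalone sketch of this test with the joint threshold and \(t_0 = 0\) (not the class method itself; the laziness options are modeled by the step_s stride):

    import numpy as np

    def subgaussian_glr_detect_change(data, delta=0.01, sigma=0.25, step_s=10):
        """Scan split points s and detect a change as soon as
        |mean(data[:s+1]) - mean(data[s+1:])| exceeds the joint threshold b_0(s, t, delta).
        Returns (True, s) or (False, None)."""
        n = len(data)
        t = n - 1  # samples are y_0, ..., y_t, with t0 = 0
        for s in range(0, t, step_s):
            gap = abs(np.mean(data[:s + 1]) - np.mean(data[s + 1:]))
            threshold = sigma * np.sqrt(
                (1.0 / (s + 1) + 1.0 / (t - s))
                * (1.0 + 1.0 / (t + 1))
                * 2.0 * np.log(2.0 * t * np.sqrt(t + 2) / delta)
            )
            if gap > threshold:
                return True, s
        return False, None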