Policies.klUCBswitch module

The kl-UCB-switch policy, for bounded distributions.

Policies.klUCBswitch.TOLERANCE = 0.0001

Default tolerance used when computing numerical approximations of the kl-UCB indexes.

Policies.klUCBswitch.threshold_switch_bestchoice(T, K, gamma=0.2)[source]

The threshold function \(f(T, K)\), used to decide when to switch from the kl-UCB index \(I^{KL}_k(t)\) to the MOSS index \(I^{MOSS}_k(t)\).

\[f(T, K) := \lfloor (T / K)^{\gamma} \rfloor, \gamma = 1/5.\]
Policies.klUCBswitch.threshold_switch_delayed(T, K, gamma=0.8888888888888888)[source]

Another threshold function \(f(T, K)\), used to decide when to switch from the kl-UCB index \(I^{KL}_k(t)\) to the MOSS index \(I^{MOSS}_k(t)\).

\[f(T, K) := \lfloor (T / K)^{\gamma} \rfloor, \gamma = 8/9.\]
Policies.klUCBswitch.threshold_switch_default(T, K, gamma=0.2)

The default threshold function \(f(T, K)\) (an alias of threshold_switch_bestchoice()), used to decide when to switch from the kl-UCB index \(I^{KL}_k(t)\) to the MOSS index \(I^{MOSS}_k(t)\).

\[f(T, K) := \lfloor (T / K)^{\gamma} \rfloor, \gamma = 1/5.\]
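All three thresholds share the same form and differ only in the exponent \(\gamma\). A minimal re-implementation sketch (the function name is illustrative, not the library's exact code):

```python
import math

def threshold_switch(T, K, gamma=0.2):
    """Threshold f(T, K) = floor((T / K) ** gamma).

    gamma = 1/5 gives the 'best choice' variant, gamma = 8/9 the 'delayed' one.
    """
    return math.floor((T / K) ** gamma)
```

For instance, with \(T = 10000\), \(K = 10\) and \(\gamma = 1/5\), this gives \(\lfloor 1000^{0.2} \rfloor = 3\): each arm switches to the MOSS index after only 3 pulls, whereas \(\gamma = 8/9\) delays the switch until 464 pulls.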
Policies.klUCBswitch.klucbplus_index(reward, pull, horizon, nbArms, klucb=CPUDispatcher(<function klucbBern>), c=1.0, tolerance=0.0001)[source]

One kl-UCB+ index, from [Cappé et al. 13](https://arxiv.org/pdf/1210.1136.pdf):

\[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ I^{KL+}_k(t) &= \sup\limits_{q \in [a, b]} \left\{ q : \mathrm{kl}(\hat{\mu}_k(t), q) \leq \frac{c \log(T / (K N_k(t)))}{N_k(t)} \right\}.\end{split}\]
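The supremum above has no closed form and is computed numerically, up to the module's tolerance. A minimal sketch using bisection with the Bernoulli KL divergence (the library's klucbBern is a compiled equivalent; the helper names here are illustrative):

```python
import math

def klBern(p, q, eps=1e-15):
    """Bernoulli KL divergence kl(p, q), with clipping to avoid log(0)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def klucb_bisection(mu, level, upper=1.0, tolerance=1e-4):
    """sup { q in [mu, upper] : kl(mu, q) <= level }, found by bisection,
    since q -> kl(mu, q) is increasing on [mu, upper]."""
    lo, hi = mu, upper
    while hi - lo > tolerance:
        mid = (lo + hi) / 2.0
        if klBern(mu, mid) <= level:
            lo = mid  # mid still satisfies the constraint: move up
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Here `level` would be \(c \log(T / (K N_k(t))) / N_k(t)\) for the kl-UCB+ index (clipped at 0 when the logarithm is negative).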
Policies.klUCBswitch.mossplus_index(reward, pull, horizon, nbArms)[source]

One MOSS+ index, from [Audibert & Bubeck, 2010](http://www.jmlr.org/papers/volume11/audibert10a/audibert10a.pdf):

\[I^{MOSS+}_k(t) = \frac{X_k(t)}{N_k(t)} + \sqrt{\max\left(0, \frac{\log\left(\frac{T}{K N_k(t)}\right)}{N_k(t)}\right)}.\]
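Unlike the kl-UCB+ index, the MOSS+ index is a closed-form expression. A direct transcription of the formula above (a hypothetical helper mirroring the documented signature, not the library's compiled code):

```python
import math

def mossplus_index(reward, pull, horizon, nbArms):
    """MOSS+ index: empirical mean X_k / N_k plus a log_+(T / (K N_k)) bonus."""
    mean = reward / pull  # X_k(t) / N_k(t)
    bonus = math.sqrt(max(0.0, math.log(horizon / (nbArms * pull))) / pull)
    return mean + bonus
```

Note that once an arm has been pulled at least \(T / K\) times, the logarithm is non-positive, the bonus vanishes, and the index reduces to the empirical mean.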
class Policies.klUCBswitch.klUCBswitch(nbArms, horizon=None, threshold='best', tolerance=0.0001, klucb=CPUDispatcher(<function klucbBern>), c=1.0, lower=0.0, amplitude=1.0)[source]

Bases: Policies.klUCB.klUCB

The kl-UCB-switch policy, for bounded distributions.

__init__(nbArms, horizon=None, threshold='best', tolerance=0.0001, klucb=CPUDispatcher(<function klucbBern>), c=1.0, lower=0.0, amplitude=1.0)[source]

New generic index policy.

  • nbArms: the number of arms,

  • lower, amplitude: lower value and known amplitude of the rewards.

horizon = None

Parameter \(T\) = known horizon of the experiment.

constant_threshold_switch = None

For klUCBswitch (not the anytime variant), we can precompute the threshold as it is constant, \(= f(T, K)\).

use_MOSS_index = None

Internal memory: initially every arm uses the kl-UCB index, then each arm may (permanently) switch to the MOSS index (array of K booleans).

__str__()[source]

-> str

computeIndex(arm)[source]

Compute the current index, at time t and after \(N_k(t)\) pulls of arm k:

\[\begin{split}U_k(t) = \begin{cases} U^{KL+}_k(t) & \text{if } N_k(t) \leq f(T, K), \\ U^{MOSS+}_k(t) & \text{if } N_k(t) > f(T, K). \end{cases}\end{split}\]
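The switch is one-way: the use_MOSS_index array records, per arm, that the MOSS+ index is used from then on. A behavioural sketch of the selection step (illustrative names, assuming the two index values have already been computed):

```python
def switched_index(pull, threshold, use_moss, arm, klucbplus_value, mossplus_value):
    """Return the kl-UCB+ value while N_k(t) <= f(T, K), then MOSS+ forever after."""
    if use_moss[arm] or pull > threshold:
        use_moss[arm] = True  # the switch is permanent for this arm
        return mossplus_value
    return klucbplus_value
```

For klUCBswitch the threshold \(f(T, K)\) is the precomputed constant constant_threshold_switch, since the horizon \(T\) is known.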
__module__ = 'Policies.klUCBswitch'
Policies.klUCBswitch.logplus(x)[source]

The \(\log_+\) function.

\[\log_+(x) := \max(0, \log(x)).\]
Policies.klUCBswitch.phi(x)[source]

The \(\phi(x)\) function defined in equation (6) of the kl-UCB-switch paper (Garivier, Hadiji, Ménard & Stoltz, 2018).

\[\phi(x) := \log_+(x (1 + (\log_+(x))^2)).\]
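Both helpers are one-liners; a direct transcription (guarding \(x \leq 0\), where \(\log_+\) is taken to be 0):

```python
import math

def logplus(x):
    """log_+(x) = max(0, log(x)), extended by 0 for x <= 0."""
    return max(0.0, math.log(x)) if x > 0 else 0.0

def phi(x):
    """phi(x) = log_+( x * (1 + log_+(x)^2) )."""
    return logplus(x * (1.0 + logplus(x) ** 2))
```

In particular \(\phi(x) = 0\) for all \(x \leq 1\), and \(\phi(x) \geq \log(x)\) for \(x > 1\).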
Policies.klUCBswitch.klucb_index(reward, pull, t, nbArms, klucb=CPUDispatcher(<function klucbBern>), c=1.0, tolerance=0.0001)[source]

One kl-UCB index, from [Garivier & Cappé - COLT, 2011](https://arxiv.org/pdf/1102.2490.pdf):

\[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ I^{KL}_k(t) &= \sup\limits_{q \in [a, b]} \left\{ q : \mathrm{kl}(\hat{\mu}_k(t), q) \leq \frac{c \log(t / N_k(t))}{N_k(t)} \right\}.\end{split}\]
Policies.klUCBswitch.moss_index(reward, pull, t, nbArms)[source]

One MOSS index, from [Audibert & Bubeck, 2010](http://www.jmlr.org/papers/volume11/audibert10a/audibert10a.pdf):

\[I^{MOSS}_k(t) = \frac{X_k(t)}{N_k(t)} + \sqrt{\max\left(0, \frac{\log\left(\frac{t}{K N_k(t)}\right)}{N_k(t)}\right)}.\]
class Policies.klUCBswitch.klUCBswitchAnytime(nbArms, threshold='delayed', tolerance=0.0001, klucb=CPUDispatcher(<function klucbBern>), c=1.0, lower=0.0, amplitude=1.0)[source]

Bases: Policies.klUCBswitch.klUCBswitch

The anytime variant of the kl-UCB-switch policy, for bounded distributions.

__init__(nbArms, threshold='delayed', tolerance=0.0001, klucb=CPUDispatcher(<function klucbBern>), c=1.0, lower=0.0, amplitude=1.0)[source]

New generic index policy.

  • nbArms: the number of arms,

  • lower, amplitude: lower value and known amplitude of the rewards.

__module__ = 'Policies.klUCBswitch'
threshold_switch = None

A function of the current time \(t\) and \(K\) (e.g., threshold_switch_delayed()), used to decide when each arm switches from its kl-UCB index to its MOSS index.

__str__()[source]

-> str

computeIndex(arm)[source]

Compute the current index, at time t and after \(N_k(t)\) pulls of arm k:

\[\begin{split}U_k(t) = \begin{cases} U^{KL}_k(t) & \text{if } N_k(t) \leq f(t, K), \\ U^{MOSS}_k(t) & \text{if } N_k(t) > f(t, K). \end{cases}\end{split}\]
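In the anytime variant the pull count is compared against \(f(t, K)\) at the current time \(t\), so the threshold itself grows over the run instead of being a precomputed constant. A sketch of one round of index selection (illustrative names; the actual class additionally remembers the switch per arm via use_MOSS_index):

```python
import math

def anytime_switch_index(pull, t, nbArms, klucb_value, moss_value, gamma=8/9):
    """Use the kl-UCB index while N_k(t) <= f(t, K) = floor((t / K)^gamma),
    and the MOSS index once the pull count exceeds this growing threshold."""
    threshold = math.floor((t / nbArms) ** gamma)
    return klucb_value if pull <= threshold else moss_value
```

For example, with \(K = 10\) arms at time \(t = 1000\), the threshold is \(\lfloor 100^{8/9} \rfloor = 59\) pulls.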