Policies.klUCBswitch module

The kl-UCB-switch policy, for bounded distributions.

Policies.klUCBswitch.TOLERANCE = 0.0001

Default tolerance used when computing numerical approximations of the kl-UCB indexes.

Policies.klUCBswitch.threshold_switch_bestchoice(T, K, gamma=0.2)[source]

The threshold function \(f(T, K)\), used to decide when to switch from the kl-UCB index \(I^{KL}_k(t)\) to the MOSS index \(I^{MOSS}_k(t)\).

\[f(T, K) := \lfloor (T / K)^{\gamma} \rfloor, \gamma = 1/5.\]
Policies.klUCBswitch.threshold_switch_delayed(T, K, gamma=0.8888888888888888)[source]

Another threshold function \(f(T, K)\), used to decide when to switch from the kl-UCB index \(I^{KL}_k(t)\) to the MOSS index \(I^{MOSS}_k(t)\).

\[f(T, K) := \lfloor (T / K)^{\gamma} \rfloor, \gamma = 8/9.\]
Policies.klUCBswitch.threshold_switch_default(T, K, gamma=0.2)

The default threshold function \(f(T, K)\) (an alias of threshold_switch_bestchoice()), used to decide when to switch from the kl-UCB index \(I^{KL}_k(t)\) to the MOSS index \(I^{MOSS}_k(t)\).

\[f(T, K) := \lfloor (T / K)^{\gamma} \rfloor, \gamma = 1/5.\]
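All three thresholds share the same form and differ only in the exponent \(\gamma\). A minimal re-implementation sketch (the function name is illustrative, not the library's exact code):

```python
import math

def threshold_switch(T, K, gamma=0.2):
    """Threshold f(T, K) = floor((T / K) ** gamma).

    gamma = 1/5 gives the 'best choice' variant, gamma = 8/9 the 'delayed' one.
    """
    return math.floor((T / K) ** gamma)
```

For instance, with \(T = 10000\), \(K = 10\) and \(\gamma = 1/5\), this gives \(\lfloor 1000^{0.2} \rfloor = 3\): each arm switches to the MOSS index after only 3 pulls, whereas \(\gamma = 8/9\) delays the switch until 464 pulls.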
Policies.klUCBswitch.klucbplus_index(reward, pull, horizon, nbArms, klucb=CPUDispatcher(<function klucbBern>), c=1.0, tolerance=0.0001)[source]

One kl-UCB+ index, from [Cappé et al. 13](https://arxiv.org/pdf/1210.1136.pdf):

\[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ I^{KL+}_k(t) &= \sup\limits_{q \in [a, b]} \left\{ q : \mathrm{kl}(\hat{\mu}_k(t), q) \leq \frac{c \log(T / (K N_k(t)))}{N_k(t)} \right\}.\end{split}\]
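The supremum above has no closed form and is computed numerically, up to the module's tolerance. A minimal sketch using bisection with the Bernoulli KL divergence (the library's klucbBern is a compiled equivalent; the helper names here are illustrative):

```python
import math

def klBern(p, q, eps=1e-15):
    """Bernoulli KL divergence kl(p, q), with clipping to avoid log(0)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def klucb_bisection(mu, level, upper=1.0, tolerance=1e-4):
    """sup { q in [mu, upper] : kl(mu, q) <= level }, found by bisection,
    since q -> kl(mu, q) is increasing on [mu, upper]."""
    lo, hi = mu, upper
    while hi - lo > tolerance:
        mid = (lo + hi) / 2.0
        if klBern(mu, mid) <= level:
            lo = mid  # mid still satisfies the constraint: move up
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Here `level` would be \(c \log(T / (K N_k(t))) / N_k(t)\) for the kl-UCB+ index (clipped at 0 when the logarithm is negative).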
Policies.klUCBswitch.mossplus_index(reward, pull, horizon, nbArms)[source]

One MOSS+ index, from [Audibert & Bubeck, 2010](http://www.jmlr.org/papers/volume11/audibert10a/audibert10a.pdf):

\[I^{MOSS+}_k(t) = \frac{X_k(t)}{N_k(t)} + \sqrt{\max\left(0, \frac{\log\left(\frac{T}{K N_k(t)}\right)}{N_k(t)}\right)}.\]
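Unlike the kl-UCB+ index, the MOSS+ index is a closed-form expression. A direct transcription of the formula above (a hypothetical helper mirroring the documented signature, not the library's compiled code):

```python
import math

def mossplus_index(reward, pull, horizon, nbArms):
    """MOSS+ index: empirical mean X_k / N_k plus a log_+(T / (K N_k)) bonus."""
    mean = reward / pull  # X_k(t) / N_k(t)
    bonus = math.sqrt(max(0.0, math.log(horizon / (nbArms * pull))) / pull)
    return mean + bonus
```

Note that once an arm has been pulled at least \(T / K\) times, the logarithm is non-positive, the bonus vanishes, and the index reduces to the empirical mean.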
class Policies.klUCBswitch.klUCBswitch(nbArms, horizon=None, threshold='best', tolerance=0.0001, klucb=CPUDispatcher(<function klucbBern>), c=1.0, lower=0.0, amplitude=1.0)[source]

Bases: Policies.klUCB.klUCB

The kl-UCB-switch policy, for bounded distributions.

__init__(nbArms, horizon=None, threshold='best', tolerance=0.0001, klucb=CPUDispatcher(<function klucbBern>), c=1.0, lower=0.0, amplitude=1.0)[source]

New generic index policy.

  • nbArms: the number of arms,

  • lower, amplitude: lower value and known amplitude of the rewards.

horizon = None

Parameter \(T\) = known horizon of the experiment.

constant_threshold_switch = None

For klUCBswitch (not the anytime variant), we can precompute the threshold as it is constant, \(= f(T, K)\).

use_MOSS_index = None

Internal memory: initially every arm uses the kl-UCB index, then each arm may (permanently) switch to the MOSS index (array of K booleans).

__str__()[source]

-> str

computeIndex(arm)[source]

Compute the current index, at time t and after \(N_k(t)\) pulls of arm k:

\[\begin{split}U_k(t) = \begin{cases} U^{KL+}_k(t) & \text{if } N_k(t) \leq f(T, K), \\ U^{MOSS+}_k(t) & \text{if } N_k(t) > f(T, K). \end{cases}\end{split}\]
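The switch is one-way: the use_MOSS_index array records, per arm, that the MOSS+ index is used from then on. A behavioural sketch of the selection step (illustrative names, assuming the two index values have already been computed):

```python
def switched_index(pull, threshold, use_moss, arm, klucbplus_value, mossplus_value):
    """Return the kl-UCB+ value while N_k(t) <= f(T, K), then MOSS+ forever after."""
    if use_moss[arm] or pull > threshold:
        use_moss[arm] = True  # the switch is permanent for this arm
        return mossplus_value
    return klucbplus_value
```

For klUCBswitch the threshold \(f(T, K)\) is the precomputed constant constant_threshold_switch, since the horizon \(T\) is known.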
__module__ = 'Policies.klUCBswitch'
Policies.klUCBswitch.logplus(x)[source]

The \(\log_+\) function.

\[\log_+(x) := \max(0, \log(x)).\]
Policies.klUCBswitch.phi(x)[source]

The \(\phi(x)\) function defined in equation (6) of the kl-UCB-switch paper (Garivier, Hadiji, Ménard & Stoltz, 2018).

\[\phi(x) := \log_+(x (1 + (\log_+(x))^2)).\]
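Both helpers are one-liners; a direct transcription (guarding \(x \leq 0\), where \(\log_+\) is taken to be 0):

```python
import math

def logplus(x):
    """log_+(x) = max(0, log(x)), extended by 0 for x <= 0."""
    return max(0.0, math.log(x)) if x > 0 else 0.0

def phi(x):
    """phi(x) = log_+( x * (1 + log_+(x)^2) )."""
    return logplus(x * (1.0 + logplus(x) ** 2))
```

In particular \(\phi(x) = 0\) for all \(x \leq 1\), and \(\phi(x) \geq \log(x)\) for \(x > 1\).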
Policies.klUCBswitch.klucb_index(reward, pull, t, nbArms, klucb=CPUDispatcher(<function klucbBern>), c=1.0, tolerance=0.0001)[source]

One kl-UCB index, from [Garivier & Cappé - COLT, 2011](https://arxiv.org/pdf/1102.2490.pdf):

\[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ I^{KL}_k(t) &= \sup\limits_{q \in [a, b]} \left\{ q : \mathrm{kl}(\hat{\mu}_k(t), q) \leq \frac{c \log(t / N_k(t))}{N_k(t)} \right\}.\end{split}\]
Policies.klUCBswitch.moss_index(reward, pull, t, nbArms)[source]

One MOSS index, from [Audibert & Bubeck, 2010](http://www.jmlr.org/papers/volume11/audibert10a/audibert10a.pdf):

\[I^{MOSS}_k(t) = \frac{X_k(t)}{N_k(t)} + \sqrt{\max\left(0, \frac{\log\left(\frac{t}{K N_k(t)}\right)}{N_k(t)}\right)}.\]
class Policies.klUCBswitch.klUCBswitchAnytime(nbArms, threshold='delayed', tolerance=0.0001, klucb=CPUDispatcher(<function klucbBern>), c=1.0, lower=0.0, amplitude=1.0)[source]

Bases: Policies.klUCBswitch.klUCBswitch

The anytime variant of the kl-UCB-switch policy, for bounded distributions.

__init__(nbArms, threshold='delayed', tolerance=0.0001, klucb=CPUDispatcher(<function klucbBern>), c=1.0, lower=0.0, amplitude=1.0)[source]

New generic index policy.

  • nbArms: the number of arms,

  • lower, amplitude: lower value and known amplitude of the rewards.

__module__ = 'Policies.klUCBswitch'
threshold_switch = None

A function of the current time \(t\) and \(K\) (e.g., threshold_switch_delayed()), used to decide when each arm switches from its kl-UCB index to its MOSS index.

__str__()[source]

-> str

computeIndex(arm)[source]

Compute the current index, at time t and after \(N_k(t)\) pulls of arm k:

\[\begin{split}U_k(t) = \begin{cases} U^{KL}_k(t) & \text{if } N_k(t) \leq f(t, K), \\ U^{MOSS}_k(t) & \text{if } N_k(t) > f(t, K). \end{cases}\end{split}\]
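In the anytime variant the pull count is compared against \(f(t, K)\) at the current time \(t\), so the threshold itself grows over the run instead of being a precomputed constant. A sketch of one round of index selection (illustrative names; the actual class additionally remembers the switch per arm via use_MOSS_index):

```python
import math

def anytime_switch_index(pull, t, nbArms, klucb_value, moss_value, gamma=8/9):
    """Use the kl-UCB index while N_k(t) <= f(t, K) = floor((t / K)^gamma),
    and the MOSS index once the pull count exceeds this growing threshold."""
    threshold = math.floor((t / nbArms) ** gamma)
    return klucb_value if pull <= threshold else moss_value
```

For example, with \(K = 10\) arms at time \(t = 1000\), the threshold is \(\lfloor 100^{8/9} \rfloor = 59\) pulls.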