Policies.Experimentals.UCBoost_cython module

The UCBoost policy for bounded bandits (on [0, 1]), using a Cython extension.

Warning

The whole goal of the UCBoost paper is to provide a numerically efficient alternative to kl-UCB, so for my comparison to be fair, I should either use the pure-Python versions of the kl-UCB utility functions (from the kullback module) or write C or Cython versions of this UCBoost module (which is what this extension provides). My conclusion is that kl-UCB is always faster than UCBoost.

Warning

This extension should be used with the setup.py script, by running:

$ python setup.py build_ext --inplace
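For reference, a minimal setup.py able to build such an extension in place could look like the sketch below (this assumes Cython and a C compiler are installed; the project's actual setup.py may differ):

```python
# Minimal sketch of a setup.py that could build the extension in place.
# This is an illustration, not the project's actual setup.py.
from setuptools import setup
from Cython.Build import cythonize

setup(
    name="UCBoost_cython",
    ext_modules=cythonize(
        "UCBoost_cython.pyx",
        compiler_directives={"language_level": 3},
    ),
)
```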

You can also use [pyximport](http://docs.cython.org/en/latest/src/tutorial/cython_tutorial.html#pyximport-cython-compilation-for-developers) to import the UCBoost_cython module transparently:

>>> import pyximport; pyximport.install()  # instantaneous  
(None, <pyximport.pyximport.PyxImporter at 0x...>)
>>> from UCBoost_cython import *     # takes about two seconds
class Policies.Experimentals.UCBoost_cython.UCB_bq

Bases: IndexPolicy.IndexPolicy

The UCB(d_bq) policy for bounded bandits (on [0, 1]).

__init__
__module__ = 'Policies.Experimentals.UCBoost_cython'
__str__
computeIndex

Compute the current index, at time t and after \(N_k(t)\) pulls of arm k:

\[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ I_k(t) &= P_1(d_{bq})\left(\hat{\mu}_k(t), \frac{\log(t) + c\log(\log(t))}{N_k(t)}\right).\end{split}\]
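As an illustration, here is a hedged pure-Python sketch of this index computation (the argument names and the default c=0 are assumptions for readability, not the exact attributes of the class; the closed form is the one of solution_pb_bq() documented below):

```python
from math import log, sqrt

def ucb_bq_index(rewards_k, pulls_k, t, c=0.0):
    """Sketch of the UCB(d_bq) index of one arm k (not the class's exact code)."""
    if pulls_k < 1:
        return float('+inf')              # unexplored arms are pulled first
    mean_k = rewards_k / pulls_k          # empirical mean \hat{mu}_k(t)
    # exploration level delta = (log(t) + c*log(log(t))) / N_k(t), guarded for small t
    delta = (log(t) + c * log(max(1.0, log(t)))) / pulls_k
    # closed-form solution of P_1(d_bq)(mean_k, delta), cf. solution_pb_bq() below
    return min(1.0, mean_k + sqrt(-9.0/4.0 + sqrt(81.0/16.0 + 9.0/4.0 * delta)))
```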
class Policies.Experimentals.UCBoost_cython.UCB_h

Bases: IndexPolicy.IndexPolicy

The UCB(d_h) policy for bounded bandits (on [0, 1]).

__init__
__module__ = 'Policies.Experimentals.UCBoost_cython'
__str__
computeIndex

Compute the current index, at time t and after \(N_k(t)\) pulls of arm k:

\[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ I_k(t) &= P_1(d_h)\left(\hat{\mu}_k(t), \frac{\log(t) + c\log(\log(t))}{N_k(t)}\right).\end{split}\]
class Policies.Experimentals.UCBoost_cython.UCB_lb

Bases: IndexPolicy.IndexPolicy

The UCB(d_lb) policy for bounded bandits (on [0, 1]).

__init__
__module__ = 'Policies.Experimentals.UCBoost_cython'
__str__
computeIndex

Compute the current index, at time t and after \(N_k(t)\) pulls of arm k:

\[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ I_k(t) &= P_1(d_{lb})\left(\hat{\mu}_k(t), \frac{\log(t) + c\log(\log(t))}{N_k(t)}\right).\end{split}\]
class Policies.Experimentals.UCBoost_cython.UCB_sq

Bases: IndexPolicy.IndexPolicy

The UCB(d_sq) policy for bounded bandits (on [0, 1]).

__init__
__module__ = 'Policies.Experimentals.UCBoost_cython'
__str__
computeIndex

Compute the current index, at time t and after \(N_k(t)\) pulls of arm k:

\[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ I_k(t) &= P_1(d_{sq})\left(\hat{\mu}_k(t), \frac{\log(t) + c\log(\log(t))}{N_k(t)}\right).\end{split}\]
class Policies.Experimentals.UCBoost_cython.UCB_t

Bases: IndexPolicy.IndexPolicy

The UCB(d_t) policy for bounded bandits (on [0, 1]).

Warning

It has bad performance, as expected (see the paper for their remark).

__init__
__module__ = 'Policies.Experimentals.UCBoost_cython'
__str__
computeIndex

Compute the current index, at time t and after \(N_k(t)\) pulls of arm k:

\[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ I_k(t) &= P_1(d_t)\left(\hat{\mu}_k(t), \frac{\log(t) + c\log(\log(t))}{N_k(t)}\right).\end{split}\]
class Policies.Experimentals.UCBoost_cython.UCBoost

Bases: IndexPolicy.IndexPolicy

The UCBoost policy for bounded bandits (on [0, 1]).

  • It is quite simple: using a set D of kl-dominated candidate semi-distances, the UCB index of each arm at each step is the smallest of the upper confidence bounds obtained (for this arm at this time t) from each distance d in D.

  • set_D should be either a set of strings (and NOT functions), or a number (3, 4 or 5): 3 indicates using d_bq, d_h, d_lb; 4 adds d_t; and 5 adds d_sq (see the article, Corollary 3, p5, for more details).

  • Reference: [Fang Liu et al, 2018](https://arxiv.org/abs/1804.05929).

__init__
__module__ = 'Policies.Experimentals.UCBoost_cython'
__str__
computeIndex

Compute the current index, at time t and after \(N_k(t)\) pulls of arm k:

\[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ I_k(t) &= \min_{d\in D} P_1(d)\left(\hat{\mu}_k(t), \frac{\log(t) + c\log(\log(t))}{N_k(t)}\right).\end{split}\]
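A hedged sketch of the aggregation step: the index is the minimum of the closed-form bounds over the chosen set of semi-distances. The solvers are passed as callables to keep the sketch self-contained; in this module they would be the solution_pb_* functions documented below (whose exact signatures I assume here to be (p, delta)):

```python
from math import log

def ucboost_index(mean_k, pulls_k, t, solutions, c=0.0):
    """Sketch of the UCBoost index: minimum over the set D of semi-distances.

    `solutions` is a list of callables solving P_1(d)(p, delta), e.g.
    [solution_pb_bq, solution_pb_hellinger, solution_pb_kllb] for set_D = 3.
    """
    delta = (log(t) + c * log(max(1.0, log(t)))) / pulls_k
    return min(solution(mean_k, delta) for solution in solutions)
```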
class Policies.Experimentals.UCBoost_cython.UCBoostEpsilon

Bases: IndexPolicy.IndexPolicy

The UCBoostEpsilon policy for bounded bandits (on [0, 1]).

  • It is quite simple: using a set D of kl-dominated candidate semi-distances, the UCB index of each arm at each step is the smallest of the upper confidence bounds obtained (for this arm at this time t) from each distance d in D.

  • This variant uses solutions_pb_from_epsilon() to also compute the \(\varepsilon\)-approximation of the kullback_leibler_distance_on_mean() function (see the article for details, Th.3 p6).

  • Reference: [Fang Liu et al, 2018](https://arxiv.org/abs/1804.05929).

__init__
__module__ = 'Policies.Experimentals.UCBoost_cython'
__str__
computeIndex

Compute the current index, at time t and after \(N_k(t)\) pulls of arm k:

\[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ I_k(t) &= \min_{d\in D_{\varepsilon}} P_1(d)\left(\hat{\mu}_k(t), \frac{\log(t) + c\log(\log(t))}{N_k(t)}\right).\end{split}\]
class Policies.Experimentals.UCBoost_cython.UCBoost_bq_h_lb

Bases: Policies.Experimentals.UCBoost_cython.UCBoost

The UCBoost policy for bounded bandits (on [0, 1]).

  • It is quite simple: using a set D of kl-dominated candidate semi-distances, the UCB index of each arm at each step is the smallest of the upper confidence bounds obtained (for this arm at this time t) from each distance d in D.

  • set_D is d_bq, d_h, d_lb (see the article, Corollary 3, p5, for more details).

  • Reference: [Fang Liu et al, 2018](https://arxiv.org/abs/1804.05929).

__init__
__module__ = 'Policies.Experimentals.UCBoost_cython'
__str__
computeIndex

Compute the current index, at time t and after \(N_k(t)\) pulls of arm k:

\[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ I_k(t) &= \min_{d\in D} P_1(d)\left(\hat{\mu}_k(t), \frac{\log(t) + c\log(\log(t))}{N_k(t)}\right).\end{split}\]
class Policies.Experimentals.UCBoost_cython.UCBoost_bq_h_lb_t

Bases: Policies.Experimentals.UCBoost_cython.UCBoost

The UCBoost policy for bounded bandits (on [0, 1]).

  • It is quite simple: using a set D of kl-dominated candidate semi-distances, the UCB index of each arm at each step is the smallest of the upper confidence bounds obtained (for this arm at this time t) from each distance d in D.

  • set_D is d_bq, d_h, d_lb, d_t (see the article, Corollary 3, p5, for more details).

  • Reference: [Fang Liu et al, 2018](https://arxiv.org/abs/1804.05929).

__init__
__module__ = 'Policies.Experimentals.UCBoost_cython'
__str__
computeIndex

Compute the current index, at time t and after \(N_k(t)\) pulls of arm k:

\[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ I_k(t) &= \min_{d\in D} P_1(d)\left(\hat{\mu}_k(t), \frac{\log(t) + c\log(\log(t))}{N_k(t)}\right).\end{split}\]
class Policies.Experimentals.UCBoost_cython.UCBoost_bq_h_lb_t_sq

Bases: Policies.Experimentals.UCBoost_cython.UCBoost

The UCBoost policy for bounded bandits (on [0, 1]).

  • It is quite simple: using a set D of kl-dominated candidate semi-distances, the UCB index of each arm at each step is the smallest of the upper confidence bounds obtained (for this arm at this time t) from each distance d in D.

  • set_D is d_bq, d_h, d_lb, d_t, d_sq (see the article, Corollary 3, p5, for more details).

  • Reference: [Fang Liu et al, 2018](https://arxiv.org/abs/1804.05929).

__init__
__module__ = 'Policies.Experimentals.UCBoost_cython'
__str__
computeIndex

Compute the current index, at time t and after \(N_k(t)\) pulls of arm k:

\[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ I_k(t) &= \min_{d\in D} P_1(d)\left(\hat{\mu}_k(t), \frac{\log(t) + c\log(\log(t))}{N_k(t)}\right).\end{split}\]
Policies.Experimentals.UCBoost_cython.biquadratic_distance()

The biquadratic distance, \(d_{bq}(p, q) := 2 (p - q)^2 + \frac{4}{9} (p - q)^4\).

Policies.Experimentals.UCBoost_cython.distance_t()

A shifted tangent line function of kullback_leibler_distance_on_mean().

\[d_t(p, q) = \frac{2 q}{p + 1} + p \log\left(\frac{p}{p + 1}\right) + \log\left(\frac{2}{\mathrm{e}(p + 1)}\right).\]

Warning

I think there might be a typo in the formula in the article, as this \(d_t\) does not seem to “depend enough on \(q\)” (just an intuition).

Policies.Experimentals.UCBoost_cython.hellinger_distance()

The Hellinger distance, \(d_{h}(p, q) := (\sqrt{p} - \sqrt{q})^2 + (\sqrt{1 - p} - \sqrt{1 - q})^2\).

Policies.Experimentals.UCBoost_cython.is_a_true_number()

Check whether n is a number or not (int, float, complex, etc.; that is, any instance of the numbers.Number class).

Policies.Experimentals.UCBoost_cython.kullback_leibler_distance_lowerbound()

Lower-bound on the Kullback-Leibler divergence for Bernoulli distributions. https://en.wikipedia.org/wiki/Bernoulli_distribution#Kullback.E2.80.93Leibler_divergence

\[d_{lb}(p, q) = p \log\left( p \right) + (1-p) \log\left(\frac{1-p}{1-q}\right).\]
Policies.Experimentals.UCBoost_cython.kullback_leibler_distance_on_mean()

Kullback-Leibler divergence for Bernoulli distributions. https://en.wikipedia.org/wiki/Bernoulli_distribution#Kullback.E2.80.93Leibler_divergence

\[\mathrm{kl}(p, q) = \mathrm{KL}(\mathcal{B}(p), \mathcal{B}(q)) = p \log\left(\frac{p}{q}\right) + (1-p) \log\left(\frac{1-p}{1-q}\right).\]
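For reference, a self-contained pure-Python version of this Bernoulli KL divergence (the clipping guard against log(0) is an implementation choice of this sketch, not necessarily what the compiled function does):

```python
from math import log

def kl_bernoulli(p, q, eps=1e-15):
    """Bernoulli KL divergence kl(p, q) on the means, clipped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * log(p / q) + (1.0 - p) * log((1.0 - p) / (1.0 - q))

print(kl_bernoulli(0.5, 0.9))  # ~ 0.5108
```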
Policies.Experimentals.UCBoost_cython.min_solutions_pb_from_epsilon()

Minimum of the closed-form solutions of the following optimisation problems, for the approximations \(d = d_s^k\) of \(d_{kl}\), over all \(\tau_1(p) \leq k \leq \tau_2(p)\):

\[\begin{split}P_1(d_s^k)(p, \delta): & \max_{q \in \Theta} q,\\ \text{such that } & d_s^k(p, q) \leq \delta.\end{split}\]
  • The solution is:

\[\begin{split}q^* &= q_k^{\boldsymbol{1}(\delta < d_{kl}(p, q_k))},\\ d_s^k &: (p, q) \mapsto d_{kl}(p, q_k) \boldsymbol{1}(q > q_k),\\ q_k &:= 1 - \left( 1 - \frac{\varepsilon}{1 + \varepsilon} \right)^k.\end{split}\]
  • \(\delta\) is the upperbound parameter on the semi-distance between input \(p\) and solution \(q^*\).

Policies.Experimentals.UCBoost_cython.solution_pb_bq()

Closed-form solution of the following optimisation problem, for \(d = d_{bq}\) the biquadratic_distance() function:

\[\begin{split}P_1(d_{bq})(p, \delta): & \max_{q \in \Theta} q,\\ \text{such that } & d_{bq}(p, q) \leq \delta.\end{split}\]
  • The solution is:

\[q^* = \min(1, p + \sqrt{-\frac{9}{4} + \sqrt{\frac{81}{16} + \frac{9}{4} \delta}}).\]
  • \(\delta\) is the upperbound parameter on the semi-distance between input \(p\) and solution \(q^*\).

Policies.Experimentals.UCBoost_cython.solution_pb_hellinger()

Closed-form solution of the following optimisation problem, for \(d = d_{h}\) the hellinger_distance() function:

\[\begin{split}P_1(d_h)(p, \delta): & \max_{q \in \Theta} q,\\ \text{such that } & d_h(p, q) \leq \delta.\end{split}\]
  • The solution is:

\[q^* = \left( (1 - \frac{\delta}{2}) \sqrt{p} + \sqrt{(1 - p) (\delta - \frac{\delta^2}{4})} \right)^{2 \times \boldsymbol{1}(\delta < 2 - 2 \sqrt{p})}.\]
  • \(\delta\) is the upperbound parameter on the semi-distance between input \(p\) and solution \(q^*\).

Policies.Experimentals.UCBoost_cython.solution_pb_kllb()

Closed-form solution of the following optimisation problem, for \(d = d_{lb}\) the proposed lower-bound on the Kullback-Leibler binary distance (kullback_leibler_distance_lowerbound()) function:

\[\begin{split}P_1(d_{lb})(p, \delta): & \max_{q \in \Theta} q,\\ \text{such that } & d_{lb}(p, q) \leq \delta.\end{split}\]
  • The solution is:

\[q^* = 1 - (1 - p) \exp\left(\frac{p \log(p) - \delta}{1 - p}\right).\]
  • \(\delta\) is the upperbound parameter on the semi-distance between input \(p\) and solution \(q^*\).

Policies.Experimentals.UCBoost_cython.solution_pb_sq()

Closed-form solution of the following optimisation problem, for \(d = d_{sq}\) the squadratic_distance() function:

\[\begin{split}P_1(d_{sq})(p, \delta): & \max_{q \in \Theta} q,\\ \text{such that } & d_{sq}(p, q) \leq \delta.\end{split}\]
  • The solution is:

\[q^* = p + \sqrt{\frac{\delta}{2}}.\]
  • \(\delta\) is the upperbound parameter on the semi-distance between input \(p\) and solution \(q^*\).

Policies.Experimentals.UCBoost_cython.solution_pb_t()

Closed-form solution of the following optimisation problem, for \(d = d_t\) a shifted tangent line function of kullback_leibler_distance_on_mean() (distance_t()) function:

\[\begin{split}P_1(d_t)(p, \delta): & \max_{q \in \Theta} q,\\ \text{such that } & d_t(p, q) \leq \delta.\end{split}\]
  • The solution is:

\[q^* = \min\left(1, \frac{p + 1}{2} \left( \delta - p \log\left(\frac{p}{p + 1}\right) - \log\left(\frac{2}{\mathrm{e} (p + 1)}\right) \right)\right).\]
  • \(\delta\) is the upperbound parameter on the semi-distance between input \(p\) and solution \(q^*\).

Policies.Experimentals.UCBoost_cython.squadratic_distance()

The quadratic distance, \(d_{sq}(p, q) := 2 (p - q)^2\).
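To check that each closed-form solution above is tight, i.e. that it saturates its constraint \(d(p, q^*) = \delta\) whenever \(q^* < 1\), here is a short self-contained pure-Python verification. It re-implements the semi-distances locally, so the helpers below are illustrations and not the module's compiled functions:

```python
from math import sqrt, log, exp, e

def d_bq(p, q):  # biquadratic distance
    return 2 * (p - q) ** 2 + (4.0 / 9.0) * (p - q) ** 4

def d_h(p, q):   # Hellinger distance
    return (sqrt(p) - sqrt(q)) ** 2 + (sqrt(1 - p) - sqrt(1 - q)) ** 2

def d_lb(p, q):  # lower bound on the Bernoulli KL divergence
    return p * log(p) + (1 - p) * log((1 - p) / (1 - q))

def d_sq(p, q):  # quadratic distance
    return 2 * (p - q) ** 2

def d_t(p, q):   # shifted tangent line of the Bernoulli KL divergence
    return 2 * q / (p + 1) + p * log(p / (p + 1)) + log(2 / (e * (p + 1)))

# delta < 2 - 2*sqrt(p), so the Hellinger solution is on its non-trivial branch
# and every q* below is strictly smaller than 1.
p, delta = 0.3, 0.1
q_bq = min(1, p + sqrt(-9.0/4 + sqrt(81.0/16 + 9.0/4 * delta)))
q_h  = ((1 - delta/2) * sqrt(p) + sqrt((1 - p) * (delta - delta**2/4))) ** 2
q_lb = 1 - (1 - p) * exp((p * log(p) - delta) / (1 - p))
q_sq = p + sqrt(delta / 2)
q_t  = min(1, (p + 1) / 2 * (delta - p * log(p / (p + 1)) - log(2 / (e * (p + 1)))))

for dist, q in [(d_bq, q_bq), (d_h, q_h), (d_lb, q_lb), (d_sq, q_sq), (d_t, q_t)]:
    assert abs(dist(p, q) - delta) < 1e-12, (dist.__name__, q, dist(p, q))
```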