Policies.Experimentals.UCBoost_cython module
The UCBoost policy for bounded bandits (on [0, 1]), using a Cython extension.
- Reference: [Fang Liu et al, 2018](https://arxiv.org/abs/1804.05929). 
- The entire module is optimized with Cython and should be compiled with Cython (http://docs.cython.org/).
Warning
The whole goal of their paper is to provide a numerically efficient alternative to kl-UCB, so for my comparison to be fair, I should either use the Python versions of the klUCB utility functions (from the kullback module) or write C or Cython versions of this UCBoost module. My conclusion is that kl-UCB is always faster than UCBoost.
Warning
This extension should be used with the setup.py script, by running:
$ python setup.py build_ext --inplace
You can also use [pyximport](http://docs.cython.org/en/latest/src/tutorial/cython_tutorial.html#pyximport-cython-compilation-for-developers) to import the UCBoost_cython module transparently:
>>> import pyximport; pyximport.install()  # instantaneous  
(None, <pyximport.pyximport.PyxImporter at 0x...>)
>>> from UCBoost_cython import *     # takes about two seconds
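For illustration, here is a minimal sketch of how one of the compiled policies could be plugged into a simulation loop. It assumes the usual SMPyBandits IndexPolicy interface (a nbArms constructor argument and the startGame(), choice(), getReward() methods); the arm means and the simulated rewards are made up for the example, and the exact constructor signature may differ.

import numpy as np
from UCBoost_cython import UCBoost_bq_h_lb   # assumes the extension was compiled as above

rng = np.random.default_rng(42)
means = [0.1, 0.5, 0.9]                # three hypothetical Bernoulli arms
policy = UCBoost_bq_h_lb(len(means))   # assumed signature: nbArms as first argument
policy.startGame()

for t in range(1000):
    arm = policy.choice()                      # arm with the largest UCBoost index
    reward = float(rng.random() < means[arm])  # simulated Bernoulli reward in {0, 1}
    policy.getReward(arm, reward)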
class Policies.Experimentals.UCBoost_cython.UCB_bq
- Bases: IndexPolicy.IndexPolicy
- The UCB(d_bq) policy for bounded bandits (on [0, 1]). It uses solution_pb_bq() as a closed-form solution to compute the UCB indexes (using the biquadratic distance).
- Reference: [Fang Liu et al, 2018](https://arxiv.org/abs/1804.05929).
__init__
__module__ = 'Policies.Experimentals.UCBoost_cython'
__str__
computeIndex
- Compute the current index, at time t and after \(N_k(t)\) pulls of arm k: \[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ I_k(t) &= P_1(d_{bq})\left(\hat{\mu}_k(t), \frac{\log(t) + c\log(\log(t))}{N_k(t)}\right).\end{split}\]
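As a rough illustration of what computeIndex does here, the following pure-Python sketch combines the closed form of \(P_1(d_{bq})\) (documented below as solution_pb_bq()) with the exploration budget \((\log(t) + c\log(\log(t)))/N_k(t)\); the function names and the guard for small t are mine, not the module's.

from math import log, sqrt

def solution_pb_bq(p, upperbound):
    # Closed form of P_1(d_bq)(p, delta): largest q in [0, 1] with d_bq(p, q) <= delta
    return min(1.0, p + sqrt(-9.0 / 4 + sqrt(81.0 / 16 + 9.0 / 4 * upperbound)))

def compute_index_bq(rewards_k, pulls_k, t, c=0.0):
    # Index of arm k after pulls_k pulls with cumulated reward rewards_k, at time t
    if pulls_k < 1:
        return float('+inf')
    p = rewards_k / pulls_k                                  # empirical mean mu_k(t)
    delta = (log(t) + c * log(max(1.0, log(t)))) / pulls_k   # exploration budget; max() avoids log of a non-positive number for small t
    return solution_pb_bq(p, delta)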
 
class Policies.Experimentals.UCBoost_cython.UCB_h
- Bases: IndexPolicy.IndexPolicy
- The UCB(d_h) policy for bounded bandits (on [0, 1]). It uses solution_pb_hellinger() as a closed-form solution to compute the UCB indexes (using the Hellinger distance).
- Reference: [Fang Liu et al, 2018](https://arxiv.org/abs/1804.05929).
__init__
__module__ = 'Policies.Experimentals.UCBoost_cython'
__str__
computeIndex
- Compute the current index, at time t and after \(N_k(t)\) pulls of arm k: \[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ I_k(t) &= P_1(d_h)\left(\hat{\mu}_k(t), \frac{\log(t) + c\log(\log(t))}{N_k(t)}\right).\end{split}\]
 
class Policies.Experimentals.UCBoost_cython.UCB_lb
- Bases: IndexPolicy.IndexPolicy
- The UCB(d_lb) policy for bounded bandits (on [0, 1]). It uses solution_pb_kllb() as a closed-form solution to compute the UCB indexes (using the lower-bound on the Kullback-Leibler distance).
- Reference: [Fang Liu et al, 2018](https://arxiv.org/abs/1804.05929).
__init__
__module__ = 'Policies.Experimentals.UCBoost_cython'
__str__
computeIndex
- Compute the current index, at time t and after \(N_k(t)\) pulls of arm k: \[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ I_k(t) &= P_1(d_{lb})\left(\hat{\mu}_k(t), \frac{\log(t) + c\log(\log(t))}{N_k(t)}\right).\end{split}\]
 
class Policies.Experimentals.UCBoost_cython.UCB_sq
- Bases: IndexPolicy.IndexPolicy
- The UCB(d_sq) policy for bounded bandits (on [0, 1]). It uses solution_pb_sq() as a closed-form solution to compute the UCB indexes (using the quadratic distance).
- Reference: [Fang Liu et al, 2018](https://arxiv.org/abs/1804.05929).
__init__
__module__ = 'Policies.Experimentals.UCBoost_cython'
__str__
computeIndex
- Compute the current index, at time t and after \(N_k(t)\) pulls of arm k: \[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ I_k(t) &= P_1(d_{sq})\left(\hat{\mu}_k(t), \frac{\log(t) + c\log(\log(t))}{N_k(t)}\right).\end{split}\]
 
class Policies.Experimentals.UCBoost_cython.UCB_t
- Bases: IndexPolicy.IndexPolicy
- The UCB(d_t) policy for bounded bandits (on [0, 1]). It uses solution_pb_t() as a closed-form solution to compute the UCB indexes (using a shifted tangent line function of kullback_leibler_distance_on_mean()).
- Reference: [Fang Liu et al, 2018](https://arxiv.org/abs/1804.05929).
Warning
- It has bad performance, as expected (see the paper for their remark).
__init__
__module__ = 'Policies.Experimentals.UCBoost_cython'
__str__
computeIndex
- Compute the current index, at time t and after \(N_k(t)\) pulls of arm k: \[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ I_k(t) &= P_1(d_t)\left(\hat{\mu}_k(t), \frac{\log(t) + c\log(\log(t))}{N_k(t)}\right).\end{split}\]
 
class Policies.Experimentals.UCBoost_cython.UCBoost
- Bases: IndexPolicy.IndexPolicy
- The UCBoost policy for bounded bandits (on [0, 1]). It is quite simple: using a set D of kl-dominated and candidate semi-distances, the UCB index of each arm (at each step) is computed as the smallest of the upper confidence bounds obtained, for this arm at time t, from each distance d in D.
- set_D should be either a set of strings (and NOT functions), or a number (3, 4 or 5): 3 indicates using d_bq, d_h, d_lb; 4 adds d_t; and 5 adds d_sq (see the article, Corollary 3, p5, for more details).
- Reference: [Fang Liu et al, 2018](https://arxiv.org/abs/1804.05929).
__init__
__module__ = 'Policies.Experimentals.UCBoost_cython'
__str__
computeIndex
- Compute the current index, at time t and after \(N_k(t)\) pulls of arm k: \[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ I_k(t) &= \min_{d\in D} P_1(d)\left(\hat{\mu}_k(t), \frac{\log(t) + c\log(\log(t))}{N_k(t)}\right).\end{split}\]
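The sketch below illustrates this minimum over D = {d_bq, d_h, d_lb} (for instance with set_D = 3), reusing the closed forms documented further down for solution_pb_bq(), solution_pb_hellinger() and solution_pb_kllb(); it is a pure-Python approximation of the Cython code, with my own guards for the boundary cases p = 0 and p = 1.

from math import exp, log, sqrt

def solution_pb_bq(p, d):
    return min(1.0, p + sqrt(-9.0 / 4 + sqrt(81.0 / 16 + 9.0 / 4 * d)))

def solution_pb_hellinger(p, d):
    if d >= 2 - 2 * sqrt(p):          # indicator exponent is 0: the constraint is not active
        return 1.0
    return ((1 - d / 2.0) * sqrt(p) + sqrt((1 - p) * (d - d ** 2 / 4.0))) ** 2

def solution_pb_kllb(p, d):
    if p >= 1:
        return 1.0
    if p <= 0:
        return 1.0 - exp(-d)          # limit of the closed form as p -> 0
    return 1.0 - (1 - p) * exp((p * log(p) - d) / (1 - p))

def compute_index_ucboost(rewards_k, pulls_k, t, c=0.0):
    # Smallest of the three upper confidence bounds, one per semi-distance in D
    if pulls_k < 1:
        return float('+inf')
    p = rewards_k / pulls_k
    delta = (log(t) + c * log(max(1.0, log(t)))) / pulls_k
    return min(f(p, delta) for f in (solution_pb_bq, solution_pb_hellinger, solution_pb_kllb))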
 
class Policies.Experimentals.UCBoost_cython.UCBoostEpsilon
- Bases: IndexPolicy.IndexPolicy
- The UCBoostEpsilon policy for bounded bandits (on [0, 1]). It is quite simple: using a set D of kl-dominated and candidate semi-distances, the UCB index of each arm (at each step) is computed as the smallest of the upper confidence bounds obtained, for this arm at time t, from each distance d in D.
- This variant uses solutions_pb_from_epsilon() to also compute the \(\varepsilon\) approximation of the kullback_leibler_distance_on_mean() function (see the article for details, Th.3 p6).
- Reference: [Fang Liu et al, 2018](https://arxiv.org/abs/1804.05929).
__init__
__module__ = 'Policies.Experimentals.UCBoost_cython'
__str__
computeIndex
- Compute the current index, at time t and after \(N_k(t)\) pulls of arm k: \[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ I_k(t) &= \min_{d\in D_{\varepsilon}} P_1(d)\left(\hat{\mu}_k(t), \frac{\log(t) + c\log(\log(t))}{N_k(t)}\right).\end{split}\]
 
class Policies.Experimentals.UCBoost_cython.UCBoost_bq_h_lb
- Bases: Policies.Experimentals.UCBoost_cython.UCBoost
- The UCBoost policy for bounded bandits (on [0, 1]). It is quite simple: using a set D of kl-dominated and candidate semi-distances, the UCB index of each arm (at each step) is computed as the smallest of the upper confidence bounds obtained, for this arm at time t, from each distance d in D.
- set_D is d_bq, d_h, d_lb (see the article, Corollary 3, p5, for more details).
- Reference: [Fang Liu et al, 2018](https://arxiv.org/abs/1804.05929).
__init__
__module__ = 'Policies.Experimentals.UCBoost_cython'
__str__
computeIndex
- Compute the current index, at time t and after \(N_k(t)\) pulls of arm k: \[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ I_k(t) &= \min_{d\in D} P_1(d)\left(\hat{\mu}_k(t), \frac{\log(t) + c\log(\log(t))}{N_k(t)}\right).\end{split}\]
 
class Policies.Experimentals.UCBoost_cython.UCBoost_bq_h_lb_t
- Bases: Policies.Experimentals.UCBoost_cython.UCBoost
- The UCBoost policy for bounded bandits (on [0, 1]). It is quite simple: using a set D of kl-dominated and candidate semi-distances, the UCB index of each arm (at each step) is computed as the smallest of the upper confidence bounds obtained, for this arm at time t, from each distance d in D.
- set_D is d_bq, d_h, d_lb, d_t (see the article, Corollary 3, p5, for more details).
- Reference: [Fang Liu et al, 2018](https://arxiv.org/abs/1804.05929).
__init__
__module__ = 'Policies.Experimentals.UCBoost_cython'
__str__
computeIndex
- Compute the current index, at time t and after \(N_k(t)\) pulls of arm k: \[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ I_k(t) &= \min_{d\in D} P_1(d)\left(\hat{\mu}_k(t), \frac{\log(t) + c\log(\log(t))}{N_k(t)}\right).\end{split}\]
 
class Policies.Experimentals.UCBoost_cython.UCBoost_bq_h_lb_t_sq
- Bases: Policies.Experimentals.UCBoost_cython.UCBoost
- The UCBoost policy for bounded bandits (on [0, 1]). It is quite simple: using a set D of kl-dominated and candidate semi-distances, the UCB index of each arm (at each step) is computed as the smallest of the upper confidence bounds obtained, for this arm at time t, from each distance d in D.
- set_D is d_bq, d_h, d_lb, d_t, d_sq (see the article, Corollary 3, p5, for more details).
- Reference: [Fang Liu et al, 2018](https://arxiv.org/abs/1804.05929).
__init__
__module__ = 'Policies.Experimentals.UCBoost_cython'
__str__
computeIndex
- Compute the current index, at time t and after \(N_k(t)\) pulls of arm k: \[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ I_k(t) &= \min_{d\in D} P_1(d)\left(\hat{\mu}_k(t), \frac{\log(t) + c\log(\log(t))}{N_k(t)}\right).\end{split}\]
 
Policies.Experimentals.UCBoost_cython.biquadratic_distance()
- The biquadratic distance, \(d_{bq}(p, q) := 2 (p - q)^2 + \frac{4}{9} (p - q)^4\).
Policies.Experimentals.UCBoost_cython.distance_t()
- A shifted tangent line function of kullback_leibler_distance_on_mean(): \[d_t(p, q) = \frac{2 q}{p + 1} + p \log\left(\frac{p}{p + 1}\right) + \log\left(\frac{2}{\mathrm{e}(p + 1)}\right).\]
Warning
- I think there might be a typo in the formula in the article, as this \(d_t\) does not seem to "depend enough on q" (just an intuition).
Policies.Experimentals.UCBoost_cython.hellinger_distance()
- The Hellinger distance, \(d_{h}(p, q) := (\sqrt{p} - \sqrt{q})^2 + (\sqrt{1 - p} - \sqrt{1 - q})^2\).
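For reference, here are straightforward pure-Python versions of these two semi-distances, matching the formulas above; they can be handy for sanity-checking the compiled extension, but they are not the Cython implementations themselves.

from math import sqrt

def biquadratic_distance(p, q):
    # d_bq(p, q) = 2 (p - q)^2 + (4/9) (p - q)^4
    diff = p - q
    return 2.0 * diff ** 2 + (4.0 / 9.0) * diff ** 4

def hellinger_distance(p, q):
    # d_h(p, q) = (sqrt(p) - sqrt(q))^2 + (sqrt(1 - p) - sqrt(1 - q))^2
    return (sqrt(p) - sqrt(q)) ** 2 + (sqrt(1 - p) - sqrt(1 - q)) ** 2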
Policies.Experimentals.UCBoost_cython.is_a_true_number()
- Check if n is a number or not (int, float, complex, etc.), i.e., any instance of the numbers.Number class.
Policies.Experimentals.UCBoost_cython.kullback_leibler_distance_lowerbound()
- Lower-bound on the Kullback-Leibler divergence for Bernoulli distributions (https://en.wikipedia.org/wiki/Bernoulli_distribution#Kullback.E2.80.93Leibler_divergence): \[d_{lb}(p, q) = p \log\left( p \right) + (1-p) \log\left(\frac{1-p}{1-q}\right).\]
Policies.Experimentals.UCBoost_cython.kullback_leibler_distance_on_mean()
- Kullback-Leibler divergence for Bernoulli distributions (https://en.wikipedia.org/wiki/Bernoulli_distribution#Kullback.E2.80.93Leibler_divergence): \[\mathrm{kl}(p, q) = \mathrm{KL}(\mathcal{B}(p), \mathcal{B}(q)) = p \log\left(\frac{p}{q}\right) + (1-p) \log\left(\frac{1-p}{1-q}\right).\]
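A pure-Python sketch of these two quantities follows, matching the formulas above; the clipping constant eps and the function names (kl_bernoulli, kl_lowerbound) are mine, added only to avoid log(0) at the boundaries.

from math import log

def kl_bernoulli(p, q, eps=1e-15):
    # kl(p, q) = p log(p/q) + (1 - p) log((1 - p)/(1 - q)), with p, q clipped away from {0, 1}
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * log(p / q) + (1 - p) * log((1 - p) / (1 - q))

def kl_lowerbound(p, q, eps=1e-15):
    # d_lb(p, q) = p log(p) + (1 - p) log((1 - p)/(1 - q)) <= kl(p, q)
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * log(p) + (1 - p) * log((1 - p) / (1 - q))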
Policies.Experimentals.UCBoost_cython.min_solutions_pb_from_epsilon()
- Minimum of the closed-form solutions of the following optimisation problems, for \(d = d_s^k\) approximation of \(d_{kl}\) and any \(\tau_1(p) \leq k \leq \tau_2(p)\): \[\begin{split}P_1(d_s^k)(p, \delta): & \max_{q \in \Theta} q,\\ \text{such that } & d_s^k(p, q) \leq \delta.\end{split}\]
- The solution is: \[\begin{split}q^* &= q_k^{\boldsymbol{1}(\delta < d_{kl}(p, q_k))},\\ d_s^k &: (p, q) \mapsto d_{kl}(p, q_k) \boldsymbol{1}(q > q_k),\\ q_k &:= 1 - \left( 1 - \frac{\varepsilon}{1 + \varepsilon} \right)^k.\end{split}\]
- \(\delta\) is the upperbound parameter on the semi-distance between input \(p\) and solution \(q^*\).
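The following sketch illustrates the idea under simplifying assumptions: instead of the exact bounds \(\tau_1(p) \leq k \leq \tau_2(p)\) from the paper, it simply scans the grid points \(q_k\) lying above p up to a fixed cap, and applies the closed form displayed above; kl_bernoulli is the same helper as in the sketch of kullback_leibler_distance_on_mean().

from math import log

def kl_bernoulli(p, q, eps=1e-15):
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * log(p / q) + (1 - p) * log((1 - p) / (1 - q))

def min_solutions_pb_from_epsilon(p, upperbound, epsilon=0.01, max_k=500):
    # Min over k of the closed-form solutions of P_1(d_s^k)(p, upperbound);
    # the loop bounds stand in for tau_1(p) <= k <= tau_2(p) from the paper.
    ratio = 1.0 - epsilon / (1.0 + epsilon)
    best = 1.0                        # q_k^0 = 1 when the constraint d_s^k <= upperbound is not active
    for k in range(1, max_k + 1):
        q_k = 1.0 - ratio ** k
        if q_k <= p:
            continue                  # only grid points above the empirical mean matter
        if upperbound < kl_bernoulli(p, q_k):
            best = min(best, q_k)     # q* = q_k when delta < kl(p, q_k)
    return best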
 
Policies.Experimentals.UCBoost_cython.solution_pb_bq()
- Closed-form solution of the following optimisation problem, for \(d = d_{bq}\) the biquadratic_distance() function: \[\begin{split}P_1(d_{bq})(p, \delta): & \max_{q \in \Theta} q,\\ \text{such that } & d_{bq}(p, q) \leq \delta.\end{split}\]
- The solution is: \[q^* = \min\left(1, p + \sqrt{-\frac{9}{4} + \sqrt{\frac{81}{16} + \frac{9}{4} \delta}}\right).\]
- \(\delta\) is the upperbound parameter on the semi-distance between input \(p\) and solution \(q^*\).
 
Policies.Experimentals.UCBoost_cython.solution_pb_hellinger()
- Closed-form solution of the following optimisation problem, for \(d = d_{h}\) the hellinger_distance() function: \[\begin{split}P_1(d_h)(p, \delta): & \max_{q \in \Theta} q,\\ \text{such that } & d_h(p, q) \leq \delta.\end{split}\]
- The solution is: \[q^* = \left( (1 - \frac{\delta}{2}) \sqrt{p} + \sqrt{(1 - p) (\delta - \frac{\delta^2}{4})} \right)^{2 \times \boldsymbol{1}(\delta < 2 - 2 \sqrt{p})}.\]
- \(\delta\) is the upperbound parameter on the semi-distance between input \(p\) and solution \(q^*\).
 
Policies.Experimentals.UCBoost_cython.solution_pb_kllb()
- Closed-form solution of the following optimisation problem, for \(d = d_{lb}\) the proposed lower-bound on the Kullback-Leibler binary distance (kullback_leibler_distance_lowerbound()): \[\begin{split}P_1(d_{lb})(p, \delta): & \max_{q \in \Theta} q,\\ \text{such that } & d_{lb}(p, q) \leq \delta.\end{split}\]
- The solution is: \[q^* = 1 - (1 - p) \exp\left(\frac{p \log(p) - \delta}{1 - p}\right).\]
- \(\delta\) is the upperbound parameter on the semi-distance between input \(p\) and solution \(q^*\).
 
Policies.Experimentals.UCBoost_cython.solution_pb_sq()
- Closed-form solution of the following optimisation problem, for \(d = d_{sq}\) the squadratic_distance() function: \[\begin{split}P_1(d_{sq})(p, \delta): & \max_{q \in \Theta} q,\\ \text{such that } & d_{sq}(p, q) \leq \delta.\end{split}\]
- The solution is: \[q^* = p + \sqrt{\frac{\delta}{2}}.\]
- \(\delta\) is the upperbound parameter on the semi-distance between input \(p\) and solution \(q^*\).
 
Policies.Experimentals.UCBoost_cython.solution_pb_t()
- Closed-form solution of the following optimisation problem, for \(d = d_t\) a shifted tangent line function of kullback_leibler_distance_on_mean() (distance_t()): \[\begin{split}P_1(d_t)(p, \delta): & \max_{q \in \Theta} q,\\ \text{such that } & d_t(p, q) \leq \delta.\end{split}\]
- The solution is: \[q^* = \min\left(1, \frac{p + 1}{2} \left( \delta - p \log\left(\frac{p}{p + 1}\right) - \log\left(\frac{2}{\mathrm{e} (p + 1)}\right) \right)\right).\]
- \(\delta\) is the upperbound parameter on the semi-distance between input \(p\) and solution \(q^*\).
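To complement the closed forms already sketched for d_bq, d_h and d_lb, here is a pure-Python version of the last two solutions, solution_pb_sq() and solution_pb_t(), following the formulas above; the clipping to 1 for d_sq and the limit handling at p = 0 are mine.

from math import e, log, sqrt

def solution_pb_sq(p, upperbound):
    # Closed form of P_1(d_sq): q* = p + sqrt(upperbound / 2), clipped to the unit interval
    return min(1.0, p + sqrt(upperbound / 2.0))

def solution_pb_t(p, upperbound):
    # Closed form of P_1(d_t) for the shifted tangent-line distance d_t
    plog = p * log(p / (p + 1.0)) if p > 0 else 0.0   # limit p log(p/(p+1)) -> 0 as p -> 0
    q = (p + 1.0) / 2.0 * (upperbound - plog - log(2.0 / (e * (p + 1.0))))
    return min(1.0, q)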
 
Policies.Experimentals.UCBoost_cython.squadratic_distance()
- The quadratic distance, \(d_{sq}(p, q) := 2 (p - q)^2\).