Policies.Experimentals.UCBoost_faster module

The UCBoost policy for bounded bandits (on [0, 1]), using a small Cython extension for the intermediate functions that require heavy computations.

Warning

The whole point of their paper is to provide a numerically efficient alternative to kl-UCB, so for the comparison to be fair, I should either use the pure-Python versions of the klUCB utility functions (from kullback) or write C or Cython versions of this UCBoost module (which is what this module does). My conclusion is that kl-UCB is always faster than UCBoost.
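Example

A minimal usage sketch. It assumes the usual SMPyBandits policy interface (startGame(), choice(), getReward()) and uses made-up Bernoulli arms purely for illustration:

```python
import numpy as np
from Policies.Experimentals.UCBoost_faster import UCBoost_bq_h_lb

means = [0.1, 0.5, 0.9]          # hypothetical Bernoulli arms, for illustration only
rng = np.random.default_rng(42)

policy = UCBoost_bq_h_lb(nbArms=len(means))
policy.startGame()
for t in range(1000):
    arm = policy.choice()                        # arm with the largest boosted UCB index
    reward = float(rng.random() < means[arm])    # reward in {0, 1}, hence in [0, 1]
    policy.getReward(arm, reward)
```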

Policies.Experimentals.UCBoost_faster.c = 0.0

Default value of the parameter c, chosen for better practical performance.

class Policies.Experimentals.UCBoost_faster.UCB_sq(nbArms, c=0.0, lower=0.0, amplitude=1.0)[source]

Bases: IndexPolicy.IndexPolicy

The UCB(d_sq) policy for bounded bandits (on [0, 1]).

  • It uses solution_pb_sq() as a closed-form solution to compute the UCB indexes (using the quadratic distance).

  • Reference: [Fang Liu et al, 2018](https://arxiv.org/abs/1804.05929).

__init__(nbArms, c=0.0, lower=0.0, amplitude=1.0)[source]

New generic index policy.

  • nbArms: the number of arms,

  • lower, amplitude: lower value and known amplitude of the rewards.

c = None

Parameter c

__str__()[source]

-> str

computeIndex(arm)[source]

Compute the current index, at time t and after \(N_k(t)\) pulls of arm k:

\[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ I_k(t) &= P_1(d_{sq})\left(\hat{\mu}_k(t), \frac{\log(t) + c\log(\log(t))}{N_k(t)}\right).\end{split}\]
__module__ = 'Policies.Experimentals.UCBoost_faster'
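Example

A pure-Python sketch of this index, using the closed form of \(P_1(d_{sq})\) for the quadratic distance \(d_{sq}(p, q) = 2 (p - q)^2\). The helper name is made up; the module itself calls its Cython solution_pb_sq():

```python
from math import log, sqrt

def ucb_sq_index(sum_rewards, pulls, t, c=0.0):
    """Sketch of the UCB(d_sq) index: hat_mu + sqrt(delta / 2), clipped to [0, 1]."""
    if pulls < 1:
        return float('+inf')
    p = sum_rewards / pulls                                   # empirical mean hat{mu}_k(t)
    delta = (log(t) + c * log(max(1.0, log(t)))) / pulls      # exploration level (guarded for small t)
    # Solving d_sq(p, q) = 2 (p - q)^2 = delta for the largest q <= 1:
    return min(1.0, p + sqrt(delta / 2.0))
```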
class Policies.Experimentals.UCBoost_faster.UCB_bq(nbArms, c=0.0, lower=0.0, amplitude=1.0)[source]

Bases: IndexPolicy.IndexPolicy

The UCB(d_bq) policy for bounded bandits (on [0, 1]).

  • It uses solution_pb_bq() as a closed-form solution to compute the UCB indexes (using the biquadratic distance).

  • Reference: [Fang Liu et al, 2018](https://arxiv.org/abs/1804.05929).

__init__(nbArms, c=0.0, lower=0.0, amplitude=1.0)[source]

New generic index policy.

  • nbArms: the number of arms,

  • lower, amplitude: lower value and known amplitude of the rewards.

c = None

Parameter c

__str__()[source]

-> str

computeIndex(arm)[source]

Compute the current index, at time t and after \(N_k(t)\) pulls of arm k:

\[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ I_k(t) &= P_1(d_{bq})\left(\hat{\mu}_k(t), \frac{\log(t) + c\log(\log(t))}{N_k(t)}\right).\end{split}\]
__module__ = 'Policies.Experimentals.UCBoost_faster'
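Example

A sketch of the corresponding closed form, assuming the biquadratic distance \(d_{bq}(p, q) = 2 (p - q)^2 + \frac{4}{9} (p - q)^4\) from the paper. Solving \(d_{bq}(p, q) = \delta\) is a quadratic equation in \((p - q)^2\), whose positive root gives the largest \(q \leq 1\) (the helper name is hypothetical):

```python
from math import log, sqrt

def ucb_bq_index(sum_rewards, pulls, t, c=0.0):
    """Sketch of the UCB(d_bq) index for d_bq(p, q) = 2(p-q)^2 + (4/9)(p-q)^4."""
    if pulls < 1:
        return float('+inf')
    p = sum_rewards / pulls
    delta = (log(t) + c * log(max(1.0, log(t)))) / pulls
    # Positive root of the quadratic (4/9) x^2 + 2 x = delta, with x = (p - q)^2:
    return min(1.0, p + sqrt(-9.0 / 4.0 + 1.5 * sqrt(9.0 / 4.0 + delta)))
```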
class Policies.Experimentals.UCBoost_faster.UCB_h(nbArms, c=0.0, lower=0.0, amplitude=1.0)[source]

Bases: IndexPolicy.IndexPolicy

The UCB(d_h) policy for bounded bandits (on [0, 1]).

  • It uses solution_pb_hellinger() as a closed-form solution to compute the UCB indexes (using the Hellinger distance).

  • Reference: [Fang Liu et al, 2018](https://arxiv.org/abs/1804.05929).

__init__(nbArms, c=0.0, lower=0.0, amplitude=1.0)[source]

New generic index policy.

  • nbArms: the number of arms,

  • lower, amplitude: lower value and known amplitude of the rewards.

c = None

Parameter c

__str__()[source]

-> str

computeIndex(arm)[source]

Compute the current index, at time t and after \(N_k(t)\) pulls of arm k:

\[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ I_k(t) &= P_1(d_h)\left(\hat{\mu}_k(t), \frac{\log(t) + c\log(\log(t))}{N_k(t)}\right).\end{split}\]
__module__ = 'Policies.Experimentals.UCBoost_faster'
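Example

A sketch of the corresponding closed form, assuming the (squared) Hellinger distance between Bernoulli distributions, \(d_h(p, q) = (\sqrt{p} - \sqrt{q})^2 + (\sqrt{1 - p} - \sqrt{1 - q})^2\) (the helper name is hypothetical):

```python
from math import log, sqrt

def ucb_h_index(sum_rewards, pulls, t, c=0.0):
    """Sketch of the UCB(d_h) index for the (squared) Hellinger distance on Bernoulli means."""
    if pulls < 1:
        return float('+inf')
    p = sum_rewards / pulls
    delta = (log(t) + c * log(max(1.0, log(t)))) / pulls
    sqrt_p = sqrt(p)
    if delta >= 2.0 - 2.0 * sqrt_p:
        return 1.0    # the constraint d_h(p, q) <= delta is satisfied up to q = 1
    # Largest q with sqrt(p * q) + sqrt((1 - p) * (1 - q)) >= 1 - delta / 2:
    return ((1.0 - delta / 2.0) * sqrt_p + sqrt((1.0 - p) * (delta - delta ** 2 / 4.0))) ** 2
```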
class Policies.Experimentals.UCBoost_faster.UCB_lb(nbArms, c=0.0, lower=0.0, amplitude=1.0)[source]

Bases: IndexPolicy.IndexPolicy

The UCB(d_lb) policy for bounded bandits (on [0, 1]).

  • It uses solution_pb_kllb() as a closed-form solution to compute the UCB indexes (using the lower-bound on the Kullback-Leibler distance).

  • Reference: [Fang Liu et al, 2018](https://arxiv.org/abs/1804.05929).

__init__(nbArms, c=0.0, lower=0.0, amplitude=1.0)[source]

New generic index policy.

  • nbArms: the number of arms,

  • lower, amplitude: lower value and known amplitude of the rewards.

c = None

Parameter c

__str__()[source]

-> str

computeIndex(arm)[source]

Compute the current index, at time t and after \(N_k(t)\) pulls of arm k:

\[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ I_k(t) &= P_1(d_{lb})\left(\hat{\mu}_k(t), \frac{\log(t) + c\log(\log(t))}{N_k(t)}\right).\end{split}\]
__module__ = 'Policies.Experimentals.UCBoost_faster'
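Example

A sketch of the corresponding closed form, assuming the kl lower bound \(d_{lb}(p, q) = p \log(p) + (1 - p) \log\left(\frac{1 - p}{1 - q}\right)\) used in the paper (the helper name is hypothetical):

```python
from math import exp, log

def ucb_lb_index(sum_rewards, pulls, t, c=0.0):
    """Sketch of the UCB(d_lb) index, for d_lb(p, q) = p*log(p) + (1-p)*log((1-p)/(1-q))."""
    if pulls < 1:
        return float('+inf')
    p = sum_rewards / pulls
    delta = (log(t) + c * log(max(1.0, log(t)))) / pulls
    if p >= 1.0:
        return 1.0
    p_log_p = p * log(p) if p > 0.0 else 0.0    # continuous extension at p = 0
    # Solving d_lb(p, q) = delta for q gives:
    return 1.0 - (1.0 - p) * exp((p_log_p - delta) / (1.0 - p))
```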
class Policies.Experimentals.UCBoost_faster.UCB_t(nbArms, c=0.0, lower=0.0, amplitude=1.0)[source]

Bases: IndexPolicy.IndexPolicy

The UCB(d_t) policy for bounded bandits (on [0, 1]).

  • It uses solution_pb_t() as a closed-form solution to compute the UCB indexes (using a shifted tangent line function of kullback_leibler_distance_on_mean()).

  • Reference: [Fang Liu et al, 2018](https://arxiv.org/abs/1804.05929).

Warning

It has poor performance, as expected (see the remark in the paper).

__init__(nbArms, c=0.0, lower=0.0, amplitude=1.0)[source]

New generic index policy.

  • nbArms: the number of arms,

  • lower, amplitude: lower value and known amplitude of the rewards.

c = None

Parameter c

__str__()[source]

-> str

computeIndex(arm)[source]

Compute the current index, at time t and after \(N_k(t)\) pulls of arm k:

\[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ I_k(t) &= P_1(d_t)\left(\hat{\mu}_k(t), \frac{\log(t) + c\log(\log(t))}{N_k(t)}\right).\end{split}\]
__module__ = 'Policies.Experimentals.UCBoost_faster'
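Example

Since \(d_t\) is affine in \(q\), the index is again a one-line formula. The sketch below assumes \(d_t\) is the tangent line of the Bernoulli KL divergence \(kl(p, \cdot)\) at \(q = \frac{p + 1}{2}\), i.e. \(d_t(p, q) = \frac{2 q}{p + 1} + p \log\left(\frac{p}{p + 1}\right) + \log\left(\frac{2}{e}\right)\); the exact shift used in the paper may differ, so treat this only as an illustration (the helper name is hypothetical):

```python
from math import e, log

def ucb_t_index(sum_rewards, pulls, t, c=0.0):
    """Sketch of a UCB(d_t) index, for d_t affine in q (tangent of kl(p, .) at q = (p+1)/2)."""
    if pulls < 1:
        return float('+inf')
    p = sum_rewards / pulls
    delta = (log(t) + c * log(max(1.0, log(t)))) / pulls
    p_log = p * log(p / (p + 1.0)) if p > 0.0 else 0.0   # continuous extension at p = 0
    # Solving d_t(p, q) = delta, which is linear in q:
    return min(1.0, ((p + 1.0) / 2.0) * (delta - p_log - log(2.0 / e)))
```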
Policies.Experimentals.UCBoost_faster.is_a_true_number(n)[source]

Check whether n is a number or not (int, float, complex, etc.), i.e., any instance of the numbers.Number class.

class Policies.Experimentals.UCBoost_faster.UCBoost(nbArms, set_D=None, c=0.0, lower=0.0, amplitude=1.0)[source]

Bases: IndexPolicy.IndexPolicy

The UCBoost policy for bounded bandits (on [0, 1]).

  • It is quite simple: using a set D of kl-dominated and candidate semi-distances, the UCB index of each arm (at each step) is computed as the smallest of the upper confidence bounds obtained, for this arm at this time t, from each distance d in D.

  • set_D should be either a set of strings (and NOT functions), or a number (3, 4 or 5): 3 indicates using d_bq, d_h, d_lb; 4 adds d_t; and 5 adds d_sq (see the article, Corollary 3, p5, for more details). A usage sketch is given after this class entry.

  • Reference: [Fang Liu et al, 2018](https://arxiv.org/abs/1804.05929).

__init__(nbArms, set_D=None, c=0.0, lower=0.0, amplitude=1.0)[source]

New generic index policy.

  • nbArms: the number of arms,

  • lower, amplitude: lower value and known amplitude of the rewards.

set_D = None

Set of strings that indicate which d functions are in the set of functions D. Warning: do not use real functions here, or the object won’t be hashable!

c = None

Parameter c

__str__()[source]

-> str

computeIndex(arm)[source]

Compute the current index, at time t and after \(N_k(t)\) pulls of arm k:

\[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ I_k(t) &= \min_{d\in D} P_1(d)\left(\hat{\mu}_k(t), \frac{\log(t) + c\log(\log(t))}{N_k(t)}\right).\end{split}\]
__module__ = 'Policies.Experimentals.UCBoost_faster'
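Example

A sketch of how the boosted index simply takes the tightest of the individual bounds. It reuses the hypothetical ucb_bq_index, ucb_h_index and ucb_lb_index helpers sketched above (set_D = 3); the module itself loops over its Cython solution_pb_* functions according to set_D:

```python
def ucboost_index(sum_rewards, pulls, t, c=0.0,
                  solvers=(ucb_bq_index, ucb_h_index, ucb_lb_index)):
    """Sketch of the UCBoost index: the smallest UCB over the distances in D = {d_bq, d_h, d_lb}."""
    return min(solver(sum_rewards, pulls, t, c=c) for solver in solvers)
```

Each added distance can only tighten the index (the min can only decrease), at the price of one extra closed-form evaluation per arm and per step.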
class Policies.Experimentals.UCBoost_faster.UCBoost_bq_h_lb(nbArms, c=0.0, lower=0.0, amplitude=1.0)[source]

Bases: Policies.Experimentals.UCBoost_faster.UCBoost

The UCBoost policy for bounded bandits (on [0, 1]).

  • It is quite simple: using a set D of kl-dominated and candidate semi-distances, the UCB index of each arm (at each step) is computed as the smallest of the upper confidence bounds obtained, for this arm at this time t, from each distance d in D.

  • set_D is d_bq, d_h, d_lb (see the article, Corollary 3, p5, for more details).

  • Reference: [Fang Liu et al, 2018](https://arxiv.org/abs/1804.05929).

__init__(nbArms, c=0.0, lower=0.0, amplitude=1.0)[source]

New generic index policy.

  • nbArms: the number of arms,

  • lower, amplitude: lower value and known amplitude of the rewards.

__str__()[source]

-> str

computeIndex(arm)[source]

Compute the current index, at time t and after \(N_k(t)\) pulls of arm k:

\[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ I_k(t) &= \min_{d\in D} P_1(d)\left(\hat{\mu}_k(t), \frac{\log(t) + c\log(\log(t))}{N_k(t)}\right).\end{split}\]
__module__ = 'Policies.Experimentals.UCBoost_faster'
class Policies.Experimentals.UCBoost_faster.UCBoost_bq_h_lb_t(nbArms, c=0.0, lower=0.0, amplitude=1.0)[source]

Bases: Policies.Experimentals.UCBoost_faster.UCBoost

The UCBoost policy for bounded bandits (on [0, 1]).

  • It is quite simple: using a set D of kl-dominated and candidate semi-distances, the UCB index of each arm (at each step) is computed as the smallest of the upper confidence bounds obtained, for this arm at this time t, from each distance d in D.

  • set_D is d_bq, d_h, d_lb, d_t (see the article, Corollary 3, p5, for more details).

  • Reference: [Fang Liu et al, 2018](https://arxiv.org/abs/1804.05929).

__init__(nbArms, c=0.0, lower=0.0, amplitude=1.0)[source]

New generic index policy.

  • nbArms: the number of arms,

  • lower, amplitude: lower value and known amplitude of the rewards.

__str__()[source]

-> str

computeIndex(arm)[source]

Compute the current index, at time t and after \(N_k(t)\) pulls of arm k:

\[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ I_k(t) &= \min_{d\in D} P_1(d)\left(\hat{\mu}_k(t), \frac{\log(t) + c\log(\log(t))}{N_k(t)}\right).\end{split}\]
__module__ = 'Policies.Experimentals.UCBoost_faster'
class Policies.Experimentals.UCBoost_faster.UCBoost_bq_h_lb_t_sq(nbArms, c=0.0, lower=0.0, amplitude=1.0)[source]

Bases: Policies.Experimentals.UCBoost_faster.UCBoost

The UCBoost policy for bounded bandits (on [0, 1]).

  • It is quite simple: using a set D of kl-dominated and candidate semi-distances, the UCB index of each arm (at each step) is computed as the smallest of the upper confidence bounds obtained, for this arm at this time t, from each distance d in D.

  • set_D is d_bq, d_h, d_lb, d_t, d_sq (see the article, Corollary 3, p5, for more details).

  • Reference: [Fang Liu et al, 2018](https://arxiv.org/abs/1804.05929).

__init__(nbArms, c=0.0, lower=0.0, amplitude=1.0)[source]

New generic index policy.

  • nbArms: the number of arms,

  • lower, amplitude: lower value and known amplitude of the rewards.

__str__()[source]

-> str

computeIndex(arm)[source]

Compute the current index, at time t and after \(N_k(t)\) pulls of arm k:

\[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ I_k(t) &= \min_{d\in D} P_1(d)\left(\hat{\mu}_k(t), \frac{\log(t) + c\log(\log(t))}{N_k(t)}\right).\end{split}\]
__module__ = 'Policies.Experimentals.UCBoost_faster'
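Example

These fixed-set variants are expected to behave like the generic UCBoost class with the corresponding set_D (an assumption based on the documentation above; the subclasses just hard-code the set of distances):

```python
from Policies.Experimentals.UCBoost_faster import UCBoost, UCBoost_bq_h_lb_t_sq

policy_generic = UCBoost(nbArms=10, set_D=5)     # D = {d_bq, d_h, d_lb, d_t, d_sq}
policy_fixed = UCBoost_bq_h_lb_t_sq(nbArms=10)   # same set, hard-coded
```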
class Policies.Experimentals.UCBoost_faster.UCBoostEpsilon(nbArms, epsilon=0.01, c=0.0, lower=0.0, amplitude=1.0)[source]

Bases: IndexPolicy.IndexPolicy

The UCBoostEpsilon policy for bounded bandits (on [0, 1]).

  • It is quite simple: using a set D of kl-dominated and candidate semi-distances, the UCB index of each arm (at each step) is computed as the smallest of the upper confidence bounds obtained, for this arm at this time t, from each distance d in D.

  • This variant uses solutions_pb_from_epsilon() to also compute the \(\varepsilon\)-approximation of the kullback_leibler_distance_on_mean() function (see the article for details, Th.3 p6).

  • Reference: [Fang Liu et al, 2018](https://arxiv.org/abs/1804.05929).

__init__(nbArms, epsilon=0.01, c=0.0, lower=0.0, amplitude=1.0)[source]

New generic index policy.

  • nbArms: the number of arms,

  • lower, amplitude: lower value and known amplitude of the rewards.

c = None

Parameter c

epsilon = None

Parameter epsilon

__str__()[source]

-> str

__module__ = 'Policies.Experimentals.UCBoost_faster'
computeIndex(arm)[source]

Compute the current index, at time t and after \(N_k(t)\) pulls of arm k:

\[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ I_k(t) &= \min_{d\in D_{\varepsilon}} P_1(d)\left(\hat{\mu}_k(t), \frac{\log(t) + c\log(\log(t))}{N_k(t)}\right).\end{split}\]
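Example

A short usage sketch: epsilon controls the quality of the approximation of the Bernoulli KL divergence, and (intuitively, see Th.3 of the paper) a smaller epsilon gives an index closer to kl-UCB at the price of more candidate distances in \(D_{\varepsilon}\):

```python
from Policies.Experimentals.UCBoost_faster import UCBoostEpsilon

# epsilon = 0.01 is the documented default; smaller values trade speed for accuracy.
policy = UCBoostEpsilon(nbArms=10, epsilon=0.01)
```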