Policies.OCUCB module

The Optimally Confident UCB (OC-UCB) policy for bounded stochastic bandits, with sub-Gaussian noise.

Policies.OCUCB.ETA = 2

Default value for parameter \(\eta > 1\) for OCUCB.

Policies.OCUCB.RHO = 1

Default value for parameter \(\rho \in (1/2, 1]\) for OCUCB.

class Policies.OCUCB.OCUCB(nbArms, eta=2, rho=1, lower=0.0, amplitude=1.0)[source]

Bases: Policies.UCB.UCB

The Optimally Confident UCB (OC-UCB) policy for bounded stochastic bandits, with sub-Gaussian noise.

__init__(nbArms, eta=2, rho=1, lower=0.0, amplitude=1.0)[source]

New generic index policy.

  • nbArms: the number of arms,

  • lower, amplitude: lower value and known amplitude of the rewards.
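A minimal usage sketch (the import path and the Bernoulli environment below are assumptions for illustration, not part of this module):

    import numpy as np
    # Assumption: the package is importable as SMPyBandits; adjust the import
    # (e.g. `from Policies.OCUCB import OCUCB`) if running from inside the repository.
    from SMPyBandits.Policies import OCUCB

    nbArms = 3
    policy = OCUCB(nbArms, eta=2, rho=1, lower=0.0, amplitude=1.0)
    policy.startGame()  # reset pulls, rewards and the internal time counter

    rng = np.random.default_rng(0)
    true_means = [0.1, 0.5, 0.8]  # hypothetical Bernoulli arms

    for t in range(1000):
        arm = policy.choice()                           # arm with the largest index
        reward = float(rng.random() < true_means[arm])  # hypothetical environment feedback in [0, 1]
        policy.getReward(arm, reward)                   # update the empirical statistics

    print("Pulls per arm:", policy.pulls)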

eta = None

Parameter \(\eta > 1\).

rho = None

Parameter \(\rho \in (1/2, 1]\).

__str__() -> str[source]

_Bterm(k)[source]

Compute the extra term \(B_k(t)\) as follows:

\[\begin{split}B_k(t) &= \max\Big\{ \exp(1), \log(t), t \log(t) / C_k(t) \Big\},\\ \text{where}\; C_k(t) &= \sum_{j=1}^{K} \min\left\{ T_k(t), T_j(t)^{\rho} T_k(t)^{1 - \rho} \right\}\end{split}\]
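A sketch of this computation (the function name and arguments are illustrative only; pulls plays the role of the vector of pull counts \(T_j(t)\), and arm k is assumed to have been pulled at least once so that \(C_k(t) > 0\)):

    import numpy as np

    def Bterm(k, pulls, t, rho=1.0):
        """Illustrative computation of B_k(t); assumes t >= 1 and pulls[k] >= 1."""
        T_k = pulls[k]
        # C_k(t) = sum_j min( T_k(t), T_j(t)^rho * T_k(t)^(1 - rho) )
        C_k = np.sum(np.minimum(T_k, np.asarray(pulls, dtype=float) ** rho * T_k ** (1.0 - rho)))
        # B_k(t) = max( e, log(t), t * log(t) / C_k(t) )
        return max(np.e, np.log(t), t * np.log(t) / C_k)

Note that with the default \(\rho = 1\), \(C_k(t)\) reduces to \(\sum_{j=1}^{K} \min\{T_k(t), T_j(t)\}\).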
_Bterms()[source]

Compute all the extra terms \(B_k(t)\), one for each arm k, with a naive (non-vectorized) loop over the arms.
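For contrast, a hedged sketch of how all the \(B_k(t)\) could be computed at once with NumPy broadcasting (an alternative to the naive loop, not the module's implementation; assumes \(t \geq 1\) and that every arm has been pulled at least once):

    import numpy as np

    def Bterms_vectorized(pulls, t, rho=1.0):
        """Return the vector (B_1(t), ..., B_K(t)); assumes t >= 1 and all pulls >= 1."""
        T = np.asarray(pulls, dtype=float)
        # C[k] = sum_j min( T[k], T[j]**rho * T[k]**(1 - rho) ), via a (K, K) broadcast
        C = np.minimum(T[:, None], (T[None, :] ** rho) * (T[:, None] ** (1.0 - rho))).sum(axis=1)
        return np.maximum(np.e, np.maximum(np.log(t), t * np.log(t) / C))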

computeIndex(arm)[source]

Compute the current index, at time t and after \(N_k(t)\) pulls of arm k:

\[I_k(t) = \frac{X_k(t)}{N_k(t)} + \sqrt{\frac{2 \eta \log(B_k(t))}{N_k(t)}}.\]
  • Where \(\eta\) is a parameter of the algorithm,

  • And \(B_k(t)\) is the additional term defined above.
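Putting the pieces together, a sketch of this index computation for one arm, reusing the illustrative Bterm helper above (sum_rewards and pulls stand for \(X_k(t)\) and \(N_k(t)\); these names are not claimed to match the module's internals):

    import numpy as np

    def compute_index(k, sum_rewards, pulls, t, eta=2.0, rho=1.0):
        """I_k(t) = X_k(t) / N_k(t) + sqrt( 2 * eta * log(B_k(t)) / N_k(t) )."""
        N_k = pulls[k]
        if N_k == 0:
            return float('inf')  # an unpulled arm gets an infinite index, so each arm is tried once
        B_k = Bterm(k, pulls, t, rho=rho)  # illustrative helper from the sketch above
        return sum_rewards[k] / N_k + np.sqrt(2.0 * eta * np.log(B_k) / N_k)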

__module__ = 'Policies.OCUCB'