# Policies.OCUCB module¶

The Optimally Confident UCB (OC-UCB) policy for bounded stochastic bandits, with sub-Gaussian noise.

Policies.OCUCB.ETA = 2

Default value for parameter $$\eta > 1$$ for OCUCB.

Policies.OCUCB.RHO = 1

Default value for parameter $$\rho \in (1/2, 1]$$ for OCUCB.

class Policies.OCUCB.OCUCB(nbArms, eta=2, rho=1, lower=0.0, amplitude=1.0)[source]

Bases: Policies.UCB.UCB

The Optimally Confident UCB (OC-UCB) policy for bounded stochastic bandits, with sub-Gaussian noise.

__init__(nbArms, eta=2, rho=1, lower=0.0, amplitude=1.0)[source]

New generic index policy.

• nbArms: the number of arms,

• lower, amplitude: lower value and known amplitude of the rewards.

eta = None

Parameter $$\eta > 1$$.

rho = None

Parameter $$\rho \in (1/2, 1]$$.

__str__()[source]

-> str. Return a string representation of this policy.

_Bterm(k)[source]

Compute the extra term $$B_k(t)$$ as follows:

$\begin{split}B_k(t) &= \max\Big\{ \exp(1), \log(t), t \log(t) / C_k(t) \Big\},\\ \text{where}\; C_k(t) &= \sum_{j=1}^{K} \min\left\{ T_k(t), T_j(t)^{\rho} T_k(t)^{1 - \rho} \right\}\end{split}$
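The formula above can be sketched directly in plain Python. This is a minimal illustration, not the library's implementation: `pulls` stands for the vector of pull counts $$T_j(t)$$, and the function name `compute_Bterm` is hypothetical.

```python
import math

def compute_Bterm(pulls, k, t, rho=1.0):
    """Sketch of B_k(t) = max(e, log(t), t*log(t) / C_k(t)),
    where C_k(t) = sum_j min(T_k(t), T_j(t)**rho * T_k(t)**(1-rho)).
    `pulls[j]` plays the role of T_j(t); assumes every arm was pulled at least once."""
    Tk = pulls[k]
    Ck = sum(min(Tk, Tj**rho * Tk**(1 - rho)) for Tj in pulls)
    return max(math.e, math.log(t), t * math.log(t) / Ck)
```

For example, with `pulls = [3, 5, 2]`, `t = 10` and `rho = 1`, one gets $$C_0(t) = 3 + 3 + 2 = 8$$, so $$B_0(t) = \max(e, \log 10, 10 \log(10) / 8)$$.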
_Bterms()[source]

Compute all the extra terms $$B_k(t)$$, one for each arm k, in a naive (non-vectorized) way.

computeIndex(arm)[source]

Compute the current index, at time t and after $$N_k(t)$$ pulls of arm k:

$I_k(t) = \frac{X_k(t)}{N_k(t)} + \sqrt{\frac{2 \eta \log(B_k(t))}{N_k(t)}}.$
• Where $$\eta$$ is a parameter of the algorithm,

• And $$B_k(t)$$ is the additional term defined above.
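Combining the two formulas, the index computation can be sketched as follows. This is an illustrative stand-alone version, not the class method: `rewards[k]` stands for the cumulated reward $$X_k(t)$$, `pulls[k]` for $$N_k(t)$$, and the function name `compute_index` is hypothetical.

```python
import math

def compute_index(rewards, pulls, k, t, eta=2.0, rho=1.0):
    """Sketch of the OC-UCB index
    I_k(t) = X_k(t)/N_k(t) + sqrt(2*eta*log(B_k(t)) / N_k(t))."""
    Nk = pulls[k]
    if Nk == 0:
        return float('inf')  # force one initial pull of each arm
    # Extra term B_k(t), as defined above (using N_j(t) as the pull counts)
    Ck = sum(min(Nk, Nj**rho * Nk**(1 - rho)) for Nj in pulls)
    Bk = max(math.e, math.log(t), t * math.log(t) / Ck)
    # Empirical mean plus the confidence width
    return rewards[k] / Nk + math.sqrt(2 * eta * math.log(Bk) / Nk)
```

For instance, with two arms, `pulls = [4, 4]`, `rewards = [2.0, 1.0]` and `t = 8`, the term $$B_0(t)$$ reduces to $$e$$ (since $$8\log(8)/8 < e$$), so the index of arm 0 is $$1/2 + \sqrt{2 \cdot 2 / 4} = 1.5$$.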

__module__ = 'Policies.OCUCB'