Policies.OCUCBH module

The Optimally Confident UCB (OC-UCB) policy for bounded stochastic bandits. Initial version (horizon-dependent): the horizon \(T\) has to be known in advance.

Policies.OCUCBH.PSI = 2

Default value of the parameter \(\psi \geq 2\) for OCUCBH.

Policies.OCUCBH.ALPHA = 4

Default value of the parameter \(\alpha \geq 2\) for OCUCBH.

class Policies.OCUCBH.OCUCBH(nbArms, horizon=None, psi=2, alpha=4, lower=0.0, amplitude=1.0)[source]

Bases: Policies.OCUCB.OCUCB

The Optimally Confident UCB (OC-UCB) policy for bounded stochastic bandits. Initial version (horizon-dependent): the horizon \(T\) has to be known in advance.

__init__(nbArms, horizon=None, psi=2, alpha=4, lower=0.0, amplitude=1.0)[source]

New generic index policy.

  • nbArms: the number of arms,

  • lower, amplitude: lower value and known amplitude of the rewards.

psi = None

Parameter \(\psi \geq 2\).

alpha = None

Parameter \(\alpha \geq 2\).

horizon = None

Parameter \(T\): the known horizon of the experiment.

__str__() -> str[source]

computeIndex(arm)[source]

Compute the current index, at time t and after \(N_k(t)\) pulls of arm k:

\[I_k(t) = \frac{X_k(t)}{N_k(t)} + \sqrt{\frac{\alpha}{N_k(t)} \log(\frac{\psi T}{t})}.\]
  • Where \(\alpha\) and \(\psi\) are two parameters of the algorithm.
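To make the formula concrete, here is a minimal standalone sketch (a hypothetical helper, not the code of this module); rewards, pulls, t and horizon stand for \(X_k(t)\), \(N_k(t)\), \(t\) and the horizon \(T\):

    from math import log, sqrt

    def ocucbh_index(rewards, pulls, t, horizon, alpha=4, psi=2):
        """I_k(t) = X_k(t)/N_k(t) + sqrt((alpha / N_k(t)) * log(psi * T / t))."""
        if pulls < 1:
            return float('+inf')  # unexplored arms get maximal priority
        # t is the current time step, assumed >= 1
        mean = rewards / pulls
        exploration = sqrt((alpha / pulls) * log(psi * horizon / t))
        return mean + exploration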

__module__ = 'Policies.OCUCBH'
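A short usage sketch, assuming the standard policy interface shared by the policies of this library (startGame, choice, getReward); the arm means and the exact import path are illustrative assumptions:

    import numpy as np
    from SMPyBandits.Policies import OCUCBH  # adapt the import to your install

    horizon, nbArms = 1000, 3
    means = [0.1, 0.5, 0.9]   # unknown Bernoulli means, used only to simulate rewards
    policy = OCUCBH(nbArms, horizon=horizon, psi=2, alpha=4)
    policy.startGame()

    rng = np.random.default_rng(42)
    for t in range(horizon):
        arm = policy.choice()                      # arm with the largest index
        reward = float(rng.random() < means[arm])  # simulated reward in {0, 1}
        policy.getReward(arm, reward)              # update the policy's statistics
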
class Policies.OCUCBH.AOCUCBH(nbArms, horizon=None, lower=0.0, amplitude=1.0)[source]

Bases: Policies.OCUCBH.OCUCBH

The Almost Optimally Confident UCB (AOC-UCB) policy for bounded stochastic bandits. Initial version (horizon-dependent).

__init__(nbArms, horizon=None, lower=0.0, amplitude=1.0)[source]

New generic index policy.

  • nbArms: the number of arms,

  • lower, amplitude: lower value and known amplitude of the rewards.

__str__() -> str[source]

computeIndex(arm)[source]

Compute the current index, at time t and after \(N_k(t)\) pulls of arm k:

\[I_k(t) = \frac{X_k(t)}{N_k(t)} + \sqrt{\frac{2}{N_k(t)} \log(\frac{T}{N_k(t)})}.\]
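The same kind of minimal sketch (a hypothetical helper, not the code of this module) for this index, where the exploration term uses a factor 2 and \(\log(T / N_k(t))\) instead of \(\log(\psi T / t)\):

    from math import log, sqrt

    def aocucbh_index(rewards, pulls, horizon):
        """I_k(t) = X_k(t)/N_k(t) + sqrt((2 / N_k(t)) * log(T / N_k(t)))."""
        if pulls < 1:
            return float('+inf')  # unexplored arms first
        return rewards / pulls + sqrt((2.0 / pulls) * log(horizon / pulls))
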
__module__ = 'Policies.OCUCBH'