Policies.Experimentals.UCBlog10 module

The UCB policy for bounded bandits, using \(\log_{10}(t)\) instead of \(\log(t)\) in the UCB index. Reference: [Lai & Robbins, 1985].

class Policies.Experimentals.UCBlog10.UCBlog10(nbArms, lower=0.0, amplitude=1.0)

Bases: IndexPolicy.IndexPolicy

The UCB policy for bounded bandits, using \(\log_{10}(t)\) instead of \(\log(t)\) in the UCB index. Reference: [Lai & Robbins, 1985].

computeIndex(arm)

Compute the current index of arm k at time t, after \(N_k(t)\) pulls of that arm, where \(X_k(t)\) is the sum of the rewards it has received so far:

\[I_k(t) = \frac{X_k(t)}{N_k(t)} + \sqrt{\frac{2 \log_{10}(t)}{N_k(t)}}.\]
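
Because \(\log_{10}(t) = \ln(t) / \ln(10)\), the exploration bonus above equals \(\sqrt{\alpha \ln(t) / N_k(t)}\) with \(\alpha = 2 / \ln(10) \approx 0.87\), so this index explores less than the natural-log variant with constant 2. The formula can be sketched in plain Python as follows. This is only an illustration, not the module's source: the function name `ucb_log10_index` and its arguments are hypothetical, and returning an infinite index for an arm that has never been pulled is an assumption about how such arms should be handled.

```python
from math import log10, sqrt


def ucb_log10_index(cum_reward, pulls, t):
    """Sketch of I_k(t) = X_k(t)/N_k(t) + sqrt(2 log10(t) / N_k(t)).

    cum_reward: X_k(t), sum of rewards observed for the arm (hypothetical name)
    pulls:      N_k(t), number of times the arm was pulled (hypothetical name)
    t:          current time step, assumed to be >= 1
    """
    if pulls < 1:
        # Assumption: an unpulled arm gets an infinite index so it is tried first.
        return float("inf")
    empirical_mean = cum_reward / pulls
    exploration_bonus = sqrt(2 * log10(t) / pulls)
    return empirical_mean + exploration_bonus
```

For instance, with a cumulative reward of 5 after 10 pulls at time t = 100, the empirical mean is 0.5 and the bonus is \(\sqrt{2 \cdot 2 / 10} = \sqrt{0.4}\).
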
computeAllIndex()

Compute the current indexes for all arms, in a vectorized manner.
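
A vectorized computation of the same index could look like the NumPy sketch below. It is a hedged illustration rather than the module's actual code: the function name `ucb_log10_all_indexes` and its arguments are hypothetical, and assigning an infinite index to arms with no pulls is an assumption.

```python
import numpy as np


def ucb_log10_all_indexes(cum_rewards, pulls, t):
    """Sketch: compute the log10-based UCB index for every arm at once.

    cum_rewards: array of X_k(t), one entry per arm (hypothetical name)
    pulls:       array of N_k(t), one entry per arm (hypothetical name)
    t:           current time step, assumed to be >= 1
    """
    cum_rewards = np.asarray(cum_rewards, dtype=float)
    pulls = np.asarray(pulls, dtype=float)
    # Divisions by zero for unpulled arms are silenced here and overwritten below.
    with np.errstate(divide="ignore", invalid="ignore"):
        indexes = cum_rewards / pulls + np.sqrt(2 * np.log10(t) / pulls)
    # Assumption: arms never pulled get an infinite index so they are tried first.
    indexes[pulls < 1] = np.inf
    return indexes
```

An index policy would then typically play the arm with the largest index, for example `np.argmax(ucb_log10_all_indexes(cum_rewards, pulls, t))`; how ties are broken is left open in this sketch.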

__module__ = 'Policies.Experimentals.UCBlog10'