Policies.Experimentals.UCBlog10alpha module

The UCB1 (UCB-alpha) index policy, modified to take a random permutation order for the initial exploration of each arm (to reduce collisions in the multi-player setting). Note: \(\log_{10}(t)\) and not \(\log(t)\) for the UCB index. Reference: [Auer et al. 02].

Policies.Experimentals.UCBlog10alpha.ALPHA = 1

Default value for the parameter \(\alpha\).

class Policies.Experimentals.UCBlog10alpha.UCBlog10alpha(nbArms, alpha=1, lower=0.0, amplitude=1.0)[source]

Bases: Policies.Experimentals.UCBlog10.UCBlog10

The UCB1 (UCB-alpha) index policy, modified to take a random permutation order for the initial exploration of each arm (to reduce collisions in the multi-player setting). Note: \(\log_{10}(t)\) and not \(\log(t)\) for the UCB index. Reference: [Auer et al. 02].

__init__(nbArms, alpha=1, lower=0.0, amplitude=1.0)[source]

New generic index policy.

  • nbArms: the number of arms,

  • lower, amplitude: lower value and known amplitude of the rewards.

alpha = None

Parameter \(\alpha\) scaling the exploration term (default 1).

__str__() → str[source]

computeIndex(arm)[source]

Compute the current index, at time \(t\) and after \(N_k(t)\) pulls of arm \(k\):

\[I_k(t) = \frac{X_k(t)}{N_k(t)} + \sqrt{\frac{\alpha \log_{10}(t)}{2 N_k(t)}}.\]
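As a minimal standalone sketch of this formula (not the class's actual API: the function name, signature, and the way rewards and pulls are tracked here are illustrative assumptions), the index can be computed as:

```python
from math import log10, sqrt

def compute_index(rewards, pulls, t, alpha=1.0):
    """Illustrative scalar version of the UCBlog10alpha index:
    empirical mean X_k(t)/N_k(t) plus a log10-based exploration bonus."""
    if pulls < 1:
        # An arm never pulled gets a maximal index, forcing initial exploration.
        return float('inf')
    return rewards / pulls + sqrt(alpha * log10(t) / (2 * pulls))

# Example: an arm pulled 5 times with total reward 3, at time t = 100.
print(compute_index(3.0, 5, 100))
```

Note the only difference from the classical UCB1 index is the base-10 logarithm, which shrinks the exploration bonus by a constant factor \(1/\ln(10)\).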

__module__ = 'Policies.Experimentals.UCBlog10alpha'

computeAllIndex()[source]

Compute the current indexes for all arms, in a vectorized manner.
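A vectorized computation of this kind can be sketched with NumPy as follows; this is an assumption-laden illustration (the function name, the `rewards`/`pulls` arrays, and the handling of unexplored arms are guesses, not the class's internals):

```python
import numpy as np

def compute_all_index(rewards, pulls, t, alpha=1.0):
    """Illustrative vectorized analogue of computeAllIndex: the same
    log10 UCB formula applied to every arm at once."""
    with np.errstate(divide='ignore', invalid='ignore'):
        indexes = rewards / pulls + np.sqrt(alpha * np.log10(t) / (2 * pulls))
    # Arms never pulled get a maximal index, forcing initial exploration.
    indexes[pulls < 1] = np.inf
    return indexes

rewards = np.array([3.0, 0.0, 1.0])  # cumulative rewards X_k(t) per arm
pulls = np.array([5, 0, 2])          # pull counts N_k(t) per arm
print(compute_all_index(rewards, pulls, t=100))
```

Computing all indexes in one array operation avoids a Python-level loop over arms, which matters when the policy is evaluated at every time step of a long simulation.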