Policies.UCBrandomInit module

The UCB index policy, modified to use a random permutation of the arms as the order of the initial exploration (this may reduce collisions in the multi-player setting). Reference: [Lai & Robbins, 1985].

class Policies.UCBrandomInit.UCBrandomInit(nbArms, lower=0.0, amplitude=1.0)

Bases: Policies.UCB.UCB

The UCB index policy, modified to use a random permutation of the arms as the order of the initial exploration (this may reduce collisions in the multi-player setting). Reference: [Lai & Robbins, 1985].
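The idea can be illustrated with a minimal, standalone sketch. It is not the library's implementation: the class name RandomInitUCBSketch, the method get_reward, the seed parameter, and the UCB1-style index (empirical mean plus sqrt(2 log(t) / pulls)) are all assumptions made for illustration, and rewards are assumed to already lie in [0, 1].

```python
import math
import random


class RandomInitUCBSketch:
    """Hypothetical, simplified UCB whose initial exploration follows a random permutation."""

    def __init__(self, nb_arms, seed=None):
        self.nb_arms = nb_arms
        self.rng = random.Random(seed)
        # Random order in which each arm is tried exactly once at the start.
        self.init_order = list(range(nb_arms))
        self.rng.shuffle(self.init_order)
        self.t = 0                      # number of rewards observed so far
        self.pulls = [0] * nb_arms      # number of pulls of each arm
        self.rewards = [0.0] * nb_arms  # cumulative reward of each arm

    def index(self, arm):
        # Assumed UCB1-style index: empirical mean plus an exploration bonus.
        mean = self.rewards[arm] / self.pulls[arm]
        return mean + math.sqrt(2.0 * math.log(self.t) / self.pulls[arm])

    def choice(self):
        # Initial exploration: follow the random permutation instead of 0, 1, 2, ...
        if self.t < self.nb_arms:
            return self.init_order[self.t]
        # Afterwards: pick an arm with maximal index, ties broken uniformly at random.
        indexes = [self.index(k) for k in range(self.nb_arms)]
        best = max(indexes)
        return self.rng.choice([k for k, v in enumerate(indexes) if v == best])

    def get_reward(self, arm, reward):
        # Record one observation for the pulled arm.
        self.t += 1
        self.pulls[arm] += 1
        self.rewards[arm] += reward
```

With several players each running their own instance, the permutations are drawn independently, so the players are less likely to all start on arm 0, then arm 1, and so on in lockstep.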

__init__(nbArms, lower=0.0, amplitude=1.0)

New generic index policy.

  • nbArms: the number of arms,

  • lower, amplitude: lower value and known amplitude of the rewards.

choice()

In an index policy, choose an arm with maximal index (uniformly at random):

\[A(t) \sim U(\arg\max_{1 \leq k \leq K} I_k(t)).\]

Warning

In almost all cases, there is a unique arm with maximal index, so we lose a lot of time with this generic code, but I couldn’t find a way to be more efficient without losing generality.
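The uniform tie-breaking rule for choice() can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the helper name choose_uniform_argmax and its rng parameter are hypothetical and are not part of the module.

```python
import random


def choose_uniform_argmax(indexes, rng=random):
    """Return one position of the maximal value, uniformly at random among ties."""
    best = max(indexes)
    # Collect every arm whose index equals the maximum (usually just one).
    candidates = [k for k, value in enumerate(indexes) if value == best]
    return rng.choice(candidates)
```

When the maximum is unique, the candidate list has a single element and the random draw is wasted work, which is the inefficiency the warning describes.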

__module__ = 'Policies.UCBrandomInit'