Policies.UCBrandomInit module
The UCB index policy, modified to explore the arms in a random permutation order during the initial exploration of each arm (this could reduce collisions in the multi-player setting). Reference: [Lai & Robbins, 1985].
class Policies.UCBrandomInit.UCBrandomInit(nbArms, lower=0.0, amplitude=1.0)
Bases: Policies.UCB.UCB
The UCB index policy, modified to explore the arms in a random permutation order during the initial exploration of each arm (this could reduce collisions in the multi-player setting). Reference: [Lai & Robbins, 1985].
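The random initial exploration described above can be sketched as follows. This is a minimal illustration and not the module's actual code: the helper names `initial_exploration_order` and `initial_arm` are hypothetical, and the sketch uses the standard `random` module for self-containment.

```python
import random


def initial_exploration_order(nb_arms, rng=None):
    """Return a random permutation of the arm indexes 0..nb_arms-1.

    Hypothetical helper: each player draws its own order, so two players
    are less likely to try the same arm at the same initial time step.
    """
    if rng is None:
        rng = random.Random()
    order = list(range(nb_arms))
    rng.shuffle(order)  # in-place uniform shuffle
    return order


def initial_arm(t, order):
    """Arm to play at time step t (0-indexed) during initial exploration.

    Returns None once every arm has been tried once, at which point the
    usual index-based choice would take over.
    """
    if t < len(order):
        return order[t]
    return None
```

With `order = initial_exploration_order(5)`, the calls `initial_arm(0, order)` to `initial_arm(4, order)` visit every arm exactly once, in an order that differs from one player to another.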
__init__(nbArms, lower=0.0, amplitude=1.0)
New generic index policy.
nbArms: the number of arms,
lower, amplitude: lower value and known amplitude of the rewards.
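One plausible reading of these two parameters, shown here only as an assumption, is that rewards lie in the interval [lower, lower + amplitude] and are rescaled to [0, 1] before the index is computed. The function `normalize_reward` below is a hypothetical name used for illustration, not part of the documented API.

```python
def normalize_reward(reward, lower=0.0, amplitude=1.0):
    """Rescale a reward from [lower, lower + amplitude] to [0, 1].

    Assumption: amplitude is the known width of the reward range,
    so it must be strictly positive.
    """
    if amplitude <= 0:
        raise ValueError("amplitude must be positive")
    return (reward - lower) / amplitude
```

For example, with `lower=1.0` and `amplitude=4.0`, a reward of 3.0 sits halfway through the range and is mapped to 0.5.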
choice()
In an index policy, choose an arm with maximal index (uniformly at random):
\[A(t) \sim U(\arg\max_{1 \leq k \leq K} I_k(t)).\]
Warning
In almost all cases there is a unique arm with maximal index, so we lose a lot of time with this generic code, but I couldn't find a way to be more efficient without losing generality.
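The selection rule above, picking uniformly at random among the arms whose index is maximal, can be sketched as below. This is an illustrative version under the assumption that the indexes are available as a plain list of floats; `choice_max_index` is a hypothetical name and not the library's implementation.

```python
import random


def choice_max_index(indexes, rng=None):
    """Return an arm drawn uniformly among those with maximal index.

    indexes: list of floats, one index I_k(t) per arm.
    """
    if rng is None:
        rng = random.Random()
    best = max(indexes)
    # Collect every arm that attains the maximum, to break ties fairly.
    candidates = [k for k, value in enumerate(indexes) if value == best]
    return rng.choice(candidates)
```

When the maximum is unique, the candidate list has a single element and the random draw is a formality, which is the overhead the warning refers to.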
__module__ = 'Policies.UCBrandomInit'