Policies.IndexPolicy module

Generic index policy.

  • If rewards are not in [0, 1], be sure to give the lower value and the amplitude. E.g., if rewards are in [-3, 3], use lower = -3 and amplitude = 6.
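As a rough illustration of what lower and amplitude describe, the sketch below rescales a reward from [lower, lower + amplitude] to [0, 1]. The helper name normalize_reward is hypothetical and is not part of this module; whether the policy applies exactly this transformation internally is an assumption.

```python
def normalize_reward(reward, lower=0.0, amplitude=1.0):
    """Map a reward from [lower, lower + amplitude] to [0, 1].

    Hypothetical helper, shown only to illustrate the two parameters.
    """
    return (reward - lower) / amplitude


# Rewards in [-3, 3]: lower = -3, amplitude = 6.
print(normalize_reward(-3.0, lower=-3.0, amplitude=6.0))  # 0.0
print(normalize_reward(0.0, lower=-3.0, amplitude=6.0))   # 0.5
print(normalize_reward(3.0, lower=-3.0, amplitude=6.0))   # 1.0
```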

class Policies.IndexPolicy.IndexPolicy(nbArms, lower=0.0, amplitude=1.0)

Bases: Policies.BasePolicy.BasePolicy

Class that implements a generic index policy.

__init__(nbArms, lower=0.0, amplitude=1.0)

New generic index policy.

  • nbArms: the number of arms,

  • lower, amplitude: lower value and known amplitude of the rewards.

index = None

Numerical index for each arm.


Initialize the policy for a new game.


Compute the current index of arm ‘arm’.


Compute the current indexes for all arms. This method may be vectorized in subclasses; by default it cannot be vectorized automatically.


In an index policy, choose an arm with maximal index (uniformly at random):

\[A(t) \sim U(\arg\max_{1 \leq k \leq K} I_k(t)).\]


In almost all cases, there is a unique arm with maximal index, so we lose a lot of time with this generic code, but I couldn’t find a way to be more efficient without losing generality.
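The selection rule above can be sketched with NumPy as follows. This is a minimal illustration rather than the module's actual code: the function name choose_max_index and the rng argument are assumptions made for the example.

```python
import numpy as np


def choose_max_index(index, rng=None):
    """Return one arm with maximal index, ties broken uniformly at random.

    Hypothetical standalone function; `index` is any 1-D sequence of floats.
    """
    rng = np.random.default_rng() if rng is None else rng
    index = np.asarray(index, dtype=float)
    # All arms that attain the maximum (usually a single one).
    best_arms = np.flatnonzero(index == np.max(index))
    return int(rng.choice(best_arms))


print(choose_max_index([0.1, 0.9, 0.3]))  # 1
```

When two or more arms share the maximal index, repeated calls return each of them with equal probability, which is the generality the remark above refers to.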


In an index policy, choose an arm whose index is the rank-th best (uniformly at random).

  • For instance, if rank is 1, the best arm is chosen (the 1st best).

  • If rank is 4, the 4th best arm is chosen.
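A possible reading of this rule, following the two examples (rank 1 is the best arm, rank 4 the 4th best), is sketched below. The name choose_with_rank is hypothetical, and counting tied values with multiplicity is an assumption of this sketch.

```python
import numpy as np


def choose_with_rank(index, rank=1, rng=None):
    """Return an arm whose index is the rank-th largest (rank=1 is the best).

    Hypothetical sketch; ties on the target value are broken uniformly.
    """
    rng = np.random.default_rng() if rng is None else rng
    index = np.asarray(index, dtype=float)
    if not 1 <= rank <= index.size:
        raise ValueError("rank must be between 1 and the number of arms")
    # Value of the rank-th largest index (tied values counted separately).
    target = np.sort(index)[-rank]
    candidates = np.flatnonzero(index == target)
    return int(rng.choice(candidates))


indexes = [0.2, 0.8, 0.5, 0.1]
print(choose_with_rank(indexes, rank=1))  # 1
print(choose_with_rank(indexes, rank=4))  # 3
```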


This method is required for the PoliciesMultiPlayers.rhoRand policy.


In an index policy, choose the best arm from the subset availableArms (uniformly at random).
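Restricting the choice to a subset can be sketched as below. The function name choose_from_subset is hypothetical, and the sketch assumes the subset is given as a sequence of arm numbers.

```python
import numpy as np


def choose_from_subset(index, available_arms, rng=None):
    """Return the arm with maximal index among `available_arms`.

    Hypothetical sketch; ties are broken uniformly at random.
    """
    rng = np.random.default_rng() if rng is None else rng
    index = np.asarray(index, dtype=float)
    available = np.asarray(available_arms, dtype=int)
    if available.size == 0:
        raise ValueError("available_arms must not be empty")
    # Compare indexes only inside the subset, then map back to arm numbers.
    sub_index = index[available]
    best_arms = available[np.flatnonzero(sub_index == np.max(sub_index))]
    return int(rng.choice(best_arms))


# Arm 0 has the largest index overall but is not available.
print(choose_from_subset([0.9, 0.1, 0.7, 0.3], [1, 2, 3]))  # 2
```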


In an index policy, choose nb arms with maximal indexes (uniformly at random).
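One way to select several arms with random tie-breaking is to shuffle the arms first and then apply a stable sort, as in the sketch below. The name choose_multiple is hypothetical and this is not claimed to be how the module does it.

```python
import numpy as np


def choose_multiple(index, nb=1, rng=None):
    """Return `nb` distinct arms with the largest indexes.

    Hypothetical sketch: shuffling before a stable sort makes the order
    among tied arms uniformly random.
    """
    rng = np.random.default_rng() if rng is None else rng
    index = np.asarray(index, dtype=float)
    if not 1 <= nb <= index.size:
        raise ValueError("nb must be between 1 and the number of arms")
    perm = rng.permutation(index.size)
    # Sort the shuffled arms by decreasing index, keeping the shuffled
    # order among equal values.
    order = perm[np.argsort(-index[perm], kind="stable")]
    return order[:nb]


print(sorted(choose_multiple([0.2, 0.9, 0.5, 0.7], nb=2).tolist()))  # [1, 3]
```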

choiceIMP(nb=1, startWithChoiceMultiple=True)

In an index policy, the IMP strategy is hybrid: choose nb-1 arms with maximal empirical averages, then 1 arm with maximal index. Cf. algorithm IMP-TS [Komiyama, Honda, Nakagawa, 2016, arXiv 1506.00779].
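The hybrid rule can be sketched as follows. Several points are assumptions of this example and not taken from the module: the name choose_imp, the choice of the last arm among the arms not already selected, and deterministic tie-breaking. The startWithChoiceMultiple parameter is not modelled here.

```python
import numpy as np


def choose_imp(means, index, nb=1):
    """Pick nb-1 arms by empirical mean, then one more arm by index.

    Hypothetical sketch of the hybrid idea; ties are broken by arm order.
    """
    means = np.asarray(means, dtype=float)
    index = np.asarray(index, dtype=float)
    if not 1 <= nb <= index.size:
        raise ValueError("nb must be between 1 and the number of arms")
    # Exploitation part: the nb-1 arms with the largest empirical means.
    exploit = np.argsort(-means, kind="stable")[:nb - 1]
    # Exploration part: the best index among the arms not yet selected
    # (an assumption of this sketch).
    remaining = np.setdiff1d(np.arange(index.size), exploit)
    explore = remaining[np.argmax(index[remaining])]
    return np.append(exploit, explore)


means = [0.6, 0.5, 0.1, 0.2]
indexes = [0.7, 0.6, 0.9, 0.3]
print(choose_imp(means, indexes, nb=2).tolist())  # [0, 2]
```

In this example, arm 0 is kept for its empirical mean, while arm 2 is added because it has the largest index among the remaining arms even though its mean is the lowest.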


Return the estimated order of the arms, as a permutation of [0..K-1] that would order the arms by increasing means.
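Such a permutation is what numpy.argsort returns. The sketch below assumes the quantity used for ordering is passed in as a plain array; the name estimated_order and the argument values are hypothetical.

```python
import numpy as np


def estimated_order(values):
    """Return the permutation of arms that sorts `values` increasingly.

    Hypothetical sketch; `values` stands for whatever estimate is used
    to rank the arms.
    """
    return np.argsort(np.asarray(values, dtype=float), kind="stable")


# Arm 1 is estimated worst, arm 2 is estimated best.
print(estimated_order([0.5, 0.1, 0.9]).tolist())  # [1, 0, 2]
```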


Return a (not necessarily sorted) list of the indexes of the M best arms, i.e. identify the set of the M best arms.
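Since the result does not need to be sorted, a partial sort is enough. The sketch below uses numpy.argpartition for this; the name estimated_best_arms and the argument values are hypothetical, and ties at the boundary are resolved arbitrarily.

```python
import numpy as np


def estimated_best_arms(values, M=1):
    """Return the (unsorted) arm numbers of the M largest entries of `values`.

    Hypothetical sketch based on a partial sort.
    """
    values = np.asarray(values, dtype=float)
    if not 1 <= M <= values.size:
        raise ValueError("M must be between 1 and the number of arms")
    # After partitioning, the last M positions hold the M largest values,
    # in no particular order.
    return np.argpartition(values, -M)[-M:]


print(sorted(estimated_best_arms([0.5, 0.1, 0.9, 0.7], M=2).tolist()))  # [2, 3]
```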

__module__ = 'Policies.IndexPolicy'