Policies.IndexPolicy module

Generic index policy.

  • If rewards are not in [0, 1], be sure to give the lower value and the amplitude. E.g., if rewards are in [-3, 3], then lower = -3 and amplitude = 6.
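To make the role of the two parameters concrete, here is a minimal sketch of the usual rescaling onto [0, 1]. The helper name normalize_reward is hypothetical, and it is only an assumption that the policy rescales rewards this way internally.

```python
def normalize_reward(reward, lower=0.0, amplitude=1.0):
    """Map a reward from [lower, lower + amplitude] onto [0, 1].

    Hypothetical helper: the actual policy may rescale differently.
    """
    return (reward - lower) / amplitude


# Rewards in [-3, 3]: lower = -3, amplitude = 6.
print(normalize_reward(-3, lower=-3, amplitude=6))  # 0.0
print(normalize_reward(0, lower=-3, amplitude=6))   # 0.5
print(normalize_reward(3, lower=-3, amplitude=6))   # 1.0
```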

class Policies.IndexPolicy.IndexPolicy(nbArms, lower=0.0, amplitude=1.0)

Bases: Policies.BasePolicy.BasePolicy

Class that implements a generic index policy.

__init__(nbArms, lower=0.0, amplitude=1.0)

New generic index policy.

  • nbArms: the number of arms,

  • lower, amplitude: lower value and known amplitude of the rewards.

index = None

Numerical index for each arm.

startGame()

Initialize the policy for a new game.

computeIndex(arm)

Compute the current index of arm ‘arm’.

computeAllIndex()

Compute the current indexes for all arms. A subclass may provide a vectorized version; by default, the computation cannot be vectorized automatically.
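The relation between computeIndex and computeAllIndex can be sketched as follows. ToyIndexPolicy is a hypothetical stand-in, not the library class. Its index (the empirical mean, with an infinite index for unexplored arms) and its pulls and rewards attributes are assumptions chosen only for illustration.

```python
class ToyIndexPolicy:
    """Hypothetical stand-in for an index policy (not the library class)."""

    def __init__(self, nbArms):
        self.nbArms = nbArms
        self.index = [0.0] * nbArms    # numerical index for each arm
        self.pulls = [0] * nbArms      # assumed: number of pulls per arm
        self.rewards = [0.0] * nbArms  # assumed: cumulated reward per arm

    def computeIndex(self, arm):
        # Toy index: empirical mean, infinite for an arm never pulled.
        if self.pulls[arm] == 0:
            return float("inf")
        return self.rewards[arm] / self.pulls[arm]

    def computeAllIndex(self):
        # Non-vectorized default: one call to computeIndex per arm.
        for arm in range(self.nbArms):
            self.index[arm] = self.computeIndex(arm)


policy = ToyIndexPolicy(3)
policy.pulls = [2, 0, 4]
policy.rewards = [1.0, 0.0, 1.0]
policy.computeAllIndex()
print(policy.index)  # [0.5, inf, 0.25]
```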

choice()

In an index policy, choose an arm with maximal index (ties are broken uniformly at random):

\[A(t) \sim U(\arg\max_{1 \leq k \leq K} I_k(t)).\]

Warning

In almost all cases, there is a unique arm with maximal index, so this generic code wastes a lot of time, but I couldn’t find a way to be more efficient without losing generality.
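The selection rule A(t) ~ U(argmax I_k(t)) can be sketched in a few lines. This is an illustration under assumptions, not the library's code: choose_max_index is a hypothetical helper that takes the indexes as a plain list.

```python
import random


def choose_max_index(index, rng=random):
    """Return an arm with maximal index, ties broken uniformly at random."""
    best = max(index)
    # All arms that reach the maximum; usually a single one.
    candidates = [arm for arm, value in enumerate(index) if value == best]
    return rng.choice(candidates)


print(choose_max_index([0.1, 0.9, 0.3]))  # 1 (unique maximum)
print(choose_max_index([0.5, 0.2, 0.5]))  # 0 or 2, each with probability 1/2
```

Building the list of candidates on every call is the generic cost that the warning refers to: it is paid even when the maximum is unique.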

choiceWithRank(rank=1)

In an index policy, choose an arm whose index is the rank-th best (uniformly at random).

  • For instance, if rank is 1, the best arm is chosen (the 1-st best).

  • If rank is 4, the 4-th best arm is chosen.

Note

This method is required for the PoliciesMultiPlayers.rhoRand policy.
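A minimal sketch of rank-based selection, assuming that rank = 1 designates the largest index and that ties on the selected value are broken uniformly at random. The helper choose_with_rank is hypothetical, and the handling of tied indexes in the library may differ.

```python
import random


def choose_with_rank(index, rank=1, rng=random):
    """Return an arm whose index is the rank-th largest (rank=1 is the best)."""
    if not 1 <= rank <= len(index):
        raise ValueError("rank must be between 1 and the number of arms")
    # Value of the rank-th largest index (duplicates are counted separately).
    target = sorted(index)[-rank]
    candidates = [arm for arm, value in enumerate(index) if value == target]
    return rng.choice(candidates)


index = [0.2, 0.8, 0.5, 0.1]
print(choose_with_rank(index, rank=1))  # 1 (best arm)
print(choose_with_rank(index, rank=4))  # 3 (4th best arm)
```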

choiceFromSubSet(availableArms='all')

In an index policy, choose an arm with maximal index among the subset availableArms (ties are broken uniformly at random).
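Restricting the choice to a subset can be sketched as below. The helper choose_from_subset is hypothetical, and raising an error on an empty subset is an assumption of this sketch, not documented behavior.

```python
import random


def choose_from_subset(index, availableArms="all", rng=random):
    """Return an arm of maximal index among availableArms (ties at random)."""
    if isinstance(availableArms, str) and availableArms == "all":
        arms = list(range(len(index)))
    else:
        arms = list(availableArms)
    if not arms:
        # Assumption of this sketch: an empty subset is an error.
        raise ValueError("availableArms must contain at least one arm")
    best = max(index[arm] for arm in arms)
    return rng.choice([arm for arm in arms if index[arm] == best])


index = [0.9, 0.4, 0.7, 0.1]
print(choose_from_subset(index))                           # 0
print(choose_from_subset(index, availableArms=[1, 2, 3]))  # 2
```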

choiceMultiple(nb=1)

In an index policy, choose nb arms with maximal indexes (uniformly at random).
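One possible way to pick several arms at once is to shuffle the arms and then sort them by decreasing index with a stable sort, so that tied arms end up in a random relative order. The helper choose_multiple is hypothetical and is only a sketch of this idea.

```python
import random


def choose_multiple(index, nb=1, rng=random):
    """Return nb arms with the largest indexes, ties broken at random."""
    arms = list(range(len(index)))
    rng.shuffle(arms)  # random order first, so that ties are broken at random
    # Python's sort is stable: arms with equal indexes keep their shuffled order.
    arms.sort(key=lambda arm: index[arm], reverse=True)
    return arms[:nb]


print(choose_multiple([0.3, 0.9, 0.1, 0.7], nb=2))  # [1, 3]
```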

choiceIMP(nb=1, startWithChoiceMultiple=True)

In an index policy, the IMP strategy is hybrid: choose nb-1 arms with maximal empirical averages, then 1 arm with maximal index. Cf. algorithm IMP-TS [Komiyama, Honda, Nakagawa, 2016, arXiv 1506.00779].
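The hybrid selection can be sketched as follows, under explicit assumptions: empirical averages and indexes are passed as two lists, the last arm is drawn among the arms not already selected, and the startWithChoiceMultiple option is not modeled. The helper choose_imp is hypothetical and is not the library's implementation.

```python
import random


def choose_imp(means, index, nb=1, rng=random):
    """Pick nb-1 arms by empirical average, then one more arm by index."""
    arms = list(range(len(index)))
    rng.shuffle(arms)  # random order, so that ties are broken at random
    # Exploitation part: the nb-1 arms with the largest empirical averages.
    by_mean = sorted(arms, key=lambda arm: means[arm], reverse=True)
    exploit = by_mean[:nb - 1]
    # Exploration part (assumption): best index among the remaining arms.
    remaining = [arm for arm in arms if arm not in exploit]
    best = max(index[arm] for arm in remaining)
    explore = rng.choice([arm for arm in remaining if index[arm] == best])
    return exploit + [explore]


means = [0.6, 0.5, 0.2, 0.1]
index = [0.7, 0.6, 0.9, 0.3]
print(choose_imp(means, index, nb=3))  # [0, 1, 2]
```

With nb = 1 the exploitation part is empty and the sketch reduces to choosing an arm with maximal index.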

estimatedOrder()

Return the estimated order of the arms, as a permutation of [0..K-1] that would sort the arms by increasing means.
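Such a permutation is an argsort of the per-arm estimates. The sketch below assumes the estimates are given as a plain list. Which quantity the library actually sorts on is not specified here, and estimated_order is a hypothetical helper.

```python
def estimated_order(estimates):
    """Return the arms sorted by increasing estimate (an argsort)."""
    return sorted(range(len(estimates)), key=lambda arm: estimates[arm])


print(estimated_order([0.5, 0.1, 0.9]))  # [1, 0, 2]
```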

estimatedBestArms(M=1)

Return a (not necessarily sorted) list of the indexes of the M best arms, i.e., identify the set of the M best arms.
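A minimal sketch of identifying the M best arms, assuming the per-arm estimates are given as a plain list. The helper estimated_best_arms is hypothetical: it sorts the arms by increasing estimate and keeps the last M.

```python
def estimated_best_arms(estimates, M=1):
    """Return the M arms with the largest estimates (in no guaranteed order)."""
    if not 1 <= M <= len(estimates):
        raise ValueError("M must be between 1 and the number of arms")
    order = sorted(range(len(estimates)), key=lambda arm: estimates[arm])
    return order[-M:]


print(sorted(estimated_best_arms([0.5, 0.1, 0.9, 0.7], M=2)))  # [2, 3]
```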

__module__ = 'Policies.IndexPolicy'