Policies.IndexPolicy module

Generic index policy.

If rewards are not in [0, 1], be sure to give the lower value and the amplitude. E.g., if rewards are in [-3, 3], then lower = -3 and amplitude = 6.
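The lower/amplitude pair describes an affine map from [lower, lower + amplitude] onto [0, 1]. A minimal sketch of that rescaling, assuming each observed reward r is normalized as (r - lower) / amplitude before the policy updates its statistics; `normalize_reward` is a hypothetical helper, not part of this module.

```python
def normalize_reward(reward: float, lower: float = 0.0, amplitude: float = 1.0) -> float:
    """Map a reward from [lower, lower + amplitude] onto [0, 1] (illustrative helper)."""
    return (reward - lower) / amplitude

# Rewards in [-3, 3]: lower = -3, amplitude = 6.
print(normalize_reward(-3, lower=-3, amplitude=6))  # 0.0
print(normalize_reward(0, lower=-3, amplitude=6))   # 0.5
print(normalize_reward(3, lower=-3, amplitude=6))   # 1.0
```

With the defaults (lower = 0, amplitude = 1) the map is the identity, which is why nothing needs to be passed when rewards are already in [0, 1].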

class Policies.IndexPolicy.IndexPolicy(nbArms, lower=0.0, amplitude=1.0)

Bases: Policies.BasePolicy.BasePolicy

Class that implements a generic index policy.

__init__(nbArms, lower=0.0, amplitude=1.0)

New generic index policy.

nbArms: the number of arms,
lower, amplitude: the lower value and the known amplitude of the rewards.

index = None: Numerical index for each arm.

startGame(): Initialize the policy for a new game.

computeIndex(arm): Compute the current index of arm ‘arm’.

computeAllIndex(): Compute the current indexes for all arms. Subclasses may override it with a vectorized implementation; by default it cannot be vectorized automatically, so the indexes are computed one arm at a time.
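To make the relationship between `index`, `startGame`, `computeIndex` and `computeAllIndex` concrete, here is a minimal sketch of a toy index policy. `ToyIndexPolicy` and its `getReward` bookkeeping (`pulls`, `rewards`, `t`) are hypothetical, and the UCB1 index, empirical mean plus sqrt(2 log t / N_k), is only a swapped-in example of a concrete index; it is not presented as this module's code.

```python
import math

import numpy as np


class ToyIndexPolicy:
    """Hypothetical stand-in for an index policy (not the library class)."""

    def __init__(self, nbArms: int):
        self.nbArms = nbArms
        self.index = np.zeros(nbArms)            # numerical index for each arm
        self.pulls = np.zeros(nbArms, dtype=int)  # N_k: number of pulls of arm k
        self.rewards = np.zeros(nbArms)          # cumulated rewards of arm k
        self.t = 0                               # total number of pulls so far

    def startGame(self):
        # Reset all statistics for a new game.
        self.t = 0
        self.pulls.fill(0)
        self.rewards.fill(0.0)
        self.index.fill(0.0)

    def getReward(self, arm: int, reward: float):
        # Hypothetical bookkeeping: rewards assumed already in [0, 1].
        self.t += 1
        self.pulls[arm] += 1
        self.rewards[arm] += reward

    def computeIndex(self, arm: int) -> float:
        # Swapped-in example index (UCB1): mean + sqrt(2 log t / N_k).
        if self.pulls[arm] < 1:
            return float("inf")  # unpulled arms get +inf, so each is tried once
        mean = self.rewards[arm] / self.pulls[arm]
        return mean + math.sqrt(2 * math.log(self.t) / self.pulls[arm])

    def computeAllIndex(self):
        # Generic, non-vectorized fallback: one computeIndex call per arm.
        for arm in range(self.nbArms):
            self.index[arm] = self.computeIndex(arm)


policy = ToyIndexPolicy(3)
policy.startGame()
policy.getReward(0, 1.0)
policy.computeAllIndex()
print(policy.index)  # arm 0 has a finite index; unpulled arms 1 and 2 have index inf
```

A subclass whose index is a closed-form expression of the statistics could replace the loop in `computeAllIndex` with a single NumPy expression over all arms; that is the kind of override the sentence above leaves room for.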

choice(): In an index policy, choose an arm with maximal index, breaking ties uniformly at random:

\[A(t) \sim U(\arg\max_{1 \leq k \leq K} I_k(t)).\]

Warning

In almost all cases, there is a unique arm with maximal index, so this generic code loses a lot of time, but I couldn’t find a way to be more efficient without losing generality.
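The rule A(t) ~ U(argmax_k I_k(t)) can be sketched with NumPy as follows. This is an illustrative standalone function (the name `choice` and the explicit `rng` argument are assumptions of the sketch): it collects every arm that attains the maximum, then draws one of them uniformly.

```python
import numpy as np


def choice(index: np.ndarray, rng: np.random.Generator) -> int:
    """Pick an arm uniformly at random among those with maximal index."""
    best = np.flatnonzero(index == np.max(index))  # all argmax candidates
    return int(rng.choice(best))


rng = np.random.default_rng(0)
index = np.array([0.2, 0.9, 0.5, 0.9])
print(choice(index, rng))  # 1 or 3, each with probability 1/2
```

The extra pass that builds `best` is the cost the warning refers to. When ties are known to be impossible, `int(np.argmax(index))` avoids it, but on ties it always returns the lowest-numbered arm, which is the loss of generality.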

choiceWithRank(rank=1)

In an index policy, choose an arm whose index is the rank-th best (ties broken uniformly at random).

For instance, if rank is 1, the best arm is chosen (the 1st best).
If rank is 4, the 4th best arm is chosen.

Note

This method is required for the PoliciesMultiPlayers.rhoRand policy.
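A minimal sketch of the rank-based choice, following the convention of the examples above (rank 1 is the best arm). The function name and the `rng` argument are hypothetical; duplicated index values are counted separately when ranking, which is an assumption of this sketch.

```python
import numpy as np


def choice_with_rank(index: np.ndarray, rank: int, rng: np.random.Generator) -> int:
    """Pick an arm whose index equals the rank-th largest value (rank=1 is the best)."""
    if not 1 <= rank <= len(index):
        raise ValueError("rank must be between 1 and the number of arms")
    target = np.sort(index)[-rank]               # rank-th largest value, duplicates counted
    candidates = np.flatnonzero(index == target)
    return int(rng.choice(candidates))           # ties broken uniformly at random


rng = np.random.default_rng(0)
index = np.array([0.1, 0.7, 0.4, 0.9])
print(choice_with_rank(index, 1, rng))  # 3 (the best arm)
print(choice_with_rank(index, 4, rng))  # 0 (the 4th best, i.e. the worst here)
```

In a multi-player setting such as rhoRand, each player would call this with its own rank so that players tend to target different arms.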

choiceFromSubSet(availableArms='all'): In an index policy, choose the best arm from the subset availableArms (ties broken uniformly at random).
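A minimal sketch of choosing the best arm within a subset, assuming `availableArms` is either the string 'all' or a list of arm numbers; the function name and the `rng` argument are hypothetical. The maximum is taken only over the indexes of the available arms, and the winning positions are mapped back to arm numbers.

```python
import numpy as np


def choice_from_subset(index, availableArms="all", rng=None):
    """Best arm among availableArms, ties broken uniformly at random."""
    rng = np.random.default_rng() if rng is None else rng
    index = np.asarray(index, dtype=float)
    if isinstance(availableArms, str) and availableArms == "all":
        arms = np.arange(len(index))              # 'all' means every arm
    else:
        arms = np.asarray(availableArms, dtype=int)
    sub = index[arms]                             # indexes restricted to the subset
    best = arms[sub == np.max(sub)]               # argmax candidates, as arm numbers
    return int(rng.choice(best))


rng = np.random.default_rng(1)
index = [0.9, 0.2, 0.6, 0.6]
print(choice_from_subset(index, "all", rng))      # 0
print(choice_from_subset(index, [1, 2, 3], rng))  # 2 or 3
```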

choiceMultiple(nb=1): In an index policy, choose nb arms with maximal indexes (ties broken uniformly at random).
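One way to pick nb distinct arms with the largest indexes while breaking ties uniformly is to shuffle the arms first and then apply a stable sort by decreasing index. This is a sketch of that technique under the stated assumption that the nb arms must be distinct; `choice_multiple` and its `rng` argument are hypothetical, and the module may implement tie-breaking differently.

```python
import numpy as np


def choice_multiple(index, nb=1, rng=None):
    """Return nb distinct arms with the largest indexes, ties broken at random."""
    rng = np.random.default_rng() if rng is None else rng
    index = np.asarray(index, dtype=float)
    perm = rng.permutation(len(index))            # random order, so tied arms end up shuffled
    # Stable sort by decreasing index keeps the random order among equal values.
    ranked = perm[np.argsort(-index[perm], kind="stable")]
    return ranked[:nb]


rng = np.random.default_rng(2)
index = [0.3, 0.8, 0.1, 0.8, 0.5]
print(choice_multiple(index, 2, rng))  # arms 1 and 3, in random order
```

A single sort handles both the selection and the tie-breaking, which keeps the cost at O(K log K) per call.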

choiceIMP(nb=1, startWithChoiceMultiple=True): In an index policy, the IMP strategy is hybrid: choose nb-1 arms with maximal empirical averages, then 1 arm with maximal index. Cf. the IMP-TS algorithm [Komiyama, Honda, Nakagawa, 2016, arXiv 1506.00779].
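A minimal sketch of the hybrid IMP rule, assuming the empirical means and the current indexes are passed in as arrays, and assuming the exploration arm is drawn among the arms not already chosen greedily, so that the nb arms are distinct. The name `choice_imp` is hypothetical, and the `startWithChoiceMultiple` option is not modeled here.

```python
import numpy as np


def choice_imp(means, index, nb=1, rng=None):
    """IMP-style hybrid: nb-1 greedy arms by empirical mean, then 1 arm by index."""
    rng = np.random.default_rng() if rng is None else rng
    means = np.asarray(means, dtype=float)
    index = np.asarray(index, dtype=float)
    K = len(means)
    if not 1 <= nb <= K:
        raise ValueError("nb must be between 1 and the number of arms")
    perm = rng.permutation(K)                                 # random tie-breaking
    by_mean = perm[np.argsort(-means[perm], kind="stable")]   # arms by decreasing mean
    exploit = by_mean[: nb - 1]                               # the nb-1 greedy choices
    rest = np.setdiff1d(np.arange(K), exploit)                # arms still available
    best = rest[index[rest] == np.max(index[rest])]           # max index among the rest
    explore = rng.choice(best)
    return np.append(exploit, explore)


means = [0.5, 0.9, 0.2, 0.7]
index = [0.6, 0.95, 1.4, 0.8]
print(choice_imp(means, index, nb=2, rng=np.random.default_rng(3)))  # [1 2]
```

Here arm 1 is exploited for its best empirical mean, while arm 2, despite a poor mean, is explored because its index is the largest among the remaining arms.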

estimatedOrder(): Return the estimated order of the arms, as a permutation of [0..K-1] that would sort the arms by increasing means.
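A minimal sketch of such an ordering, assuming the means are the empirical averages computed from hypothetical per-arm `rewards` and `pulls` arrays. `np.argsort` returns exactly the permutation of 0..K-1 that sorts the means in increasing order. Giving never-pulled arms a mean of 0 is an assumption of this sketch that avoids dividing by zero.

```python
import numpy as np


def estimated_order(rewards, pulls):
    """Permutation of 0..K-1 sorting arms by increasing empirical mean."""
    rewards = np.asarray(rewards, dtype=float)
    pulls = np.asarray(pulls, dtype=float)
    # Empirical means; arms never pulled get mean 0 (assumption of this sketch).
    means = np.divide(rewards, pulls, out=np.zeros_like(rewards), where=pulls > 0)
    return np.argsort(means)


print(estimated_order([3, 8, 1], [10, 10, 10]))  # [2 0 1]
```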

estimatedBestArms(M=1): Return a (not necessarily sorted) list of the indexes of the M best arms, i.e., identify the set of the M-best arms.
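Given an increasing order of the arms, the M best arms are simply its last M entries. A minimal sketch, with the hypothetical function `estimated_best_arms` taking the empirical means directly:

```python
import numpy as np


def estimated_best_arms(means, M=1):
    """Indexes of the M arms with the largest empirical means (not necessarily sorted)."""
    means = np.asarray(means, dtype=float)
    if not 1 <= M <= len(means):
        raise ValueError("M must be between 1 and the number of arms")
    order = np.argsort(means)  # increasing means, as in an estimated order
    return order[-M:]          # the last M entries are the M best


print(sorted(estimated_best_arms([0.3, 0.8, 0.1, 0.6], M=2).tolist()))  # [1, 3]
```

The range check on M matters: with M = 0, the slice `order[-0:]` would silently return every arm instead of none.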

__module__ = 'Policies.IndexPolicy'