Policies.IndexPolicy module
Generic index policy.
If rewards are not in [0, 1], be sure to give the lower value and the amplitude. For example, if rewards are in [-3, 3], then lower = -3 and amplitude = 6.
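As a minimal sketch of what lower and amplitude describe, the hypothetical helper below rescales a raw reward into [0, 1]. The helper name and the rescaling formula (reward - lower) / amplitude are assumptions made for illustration, not something this module documents.

```python
def normalize_reward(reward, lower=0.0, amplitude=1.0):
    """Map a reward from [lower, lower + amplitude] to [0, 1] (hypothetical helper)."""
    return (reward - lower) / amplitude


# Rewards in [-3, 3]: lower = -3, amplitude = 6.
print(normalize_reward(-3, lower=-3, amplitude=6))  # 0.0
print(normalize_reward(0, lower=-3, amplitude=6))   # 0.5
print(normalize_reward(3, lower=-3, amplitude=6))   # 1.0
```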
-
class Policies.IndexPolicy.IndexPolicy(nbArms, lower=0.0, amplitude=1.0)
Bases: Policies.BasePolicy.BasePolicy
Class that implements a generic index policy.
-
__init__(nbArms, lower=0.0, amplitude=1.0)
New generic index policy.
nbArms: the number of arms,
lower, amplitude: lower value and known amplitude of the rewards.
-
index = None
Numerical index for each arm.
-
computeAllIndex()
Compute the current indexes for all arms. Subclasses can vectorize this computation; by default it cannot be vectorized automatically.
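A sketch of what a non-vectorized default could look like, assuming a per-arm method (hypothetically named computeIndex here) that concrete policies override. The toy class is standalone and does not import the library; the empirical-mean index is only a placeholder rule.

```python
class ToyIndexPolicy:
    """Standalone toy, not the library class: the index of an arm is its empirical mean."""

    def __init__(self, nbArms):
        self.nbArms = nbArms
        self.rewards = [0.0] * nbArms  # cumulated reward per arm
        self.pulls = [0] * nbArms      # number of pulls per arm
        self.index = [0.0] * nbArms    # numerical index per arm

    def computeIndex(self, arm):
        # Hypothetical per-arm rule; unpulled arms get +inf so they are tried first.
        if self.pulls[arm] == 0:
            return float("inf")
        return self.rewards[arm] / self.pulls[arm]

    def computeAllIndex(self):
        # Plain loop over the arms: the generic, non-vectorized fallback.
        for arm in range(self.nbArms):
            self.index[arm] = self.computeIndex(arm)


policy = ToyIndexPolicy(3)
policy.rewards = [2.0, 1.0, 0.0]
policy.pulls = [4, 1, 0]
policy.computeAllIndex()
print(policy.index)  # [0.5, 1.0, inf]
```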
-
choice()
In an index policy, choose an arm with maximal index (uniformly at random):
\[A(t) \sim U(\arg\max_{1 \leq k \leq K} I_k(t)).\]
Warning
In almost all cases, there is a unique arm with maximal index, so we lose a lot of time with this generic code, but I couldn’t find a way to be more efficient without losing generality.
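The selection rule above can be sketched in plain Python as follows. The function name and the list-based index are illustrative assumptions; this is not the library's implementation.

```python
import random


def choose_max_index(index, rng=random):
    """Return an arm drawn uniformly at random among those with maximal index."""
    best = max(index)
    # Collect every arm that attains the maximum, then break ties uniformly.
    candidates = [arm for arm, value in enumerate(index) if value == best]
    return rng.choice(candidates)


index = [0.2, 0.7, 0.7, 0.1]
arm = choose_max_index(index)
print(arm in (1, 2))  # True
```

When the maximum is unique, the candidate list has a single element and the random draw is redundant, which is the overhead the warning refers to.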
-
choiceWithRank(rank=1)
In an index policy, choose an arm whose index is the rank-th best (uniformly at random).
For instance, if rank is 1, the best arm is chosen (the 1st best).
If rank is 4, the 4th best arm is chosen.
Note
This method is required for the PoliciesMultiPlayers.rhoRand policy.
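A minimal sketch of rank-based selection, following the examples above (rank 1 is the best arm, rank 4 the 4th best). The function name and the error raised on an out-of-range rank are assumptions for illustration.

```python
import random


def choose_with_rank(index, rank=1, rng=random):
    """Return an arm whose index equals the rank-th largest value (ties broken uniformly)."""
    if not 1 <= rank <= len(index):
        raise ValueError("rank must be between 1 and the number of arms")
    target = sorted(index)[-rank]  # rank-th largest index value
    candidates = [arm for arm, value in enumerate(index) if value == target]
    return rng.choice(candidates)


index = [0.3, 0.9, 0.5, 0.1]
print(choose_with_rank(index, rank=1))  # 1
print(choose_with_rank(index, rank=2))  # 2
print(choose_with_rank(index, rank=4))  # 3
```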
-
choiceFromSubSet(availableArms='all')
In an index policy, choose the best arm from the subset availableArms (uniformly at random).
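Restricting the choice to a subset can be sketched like this. The function name is hypothetical, and raising an error on an empty subset is an assumption of this sketch; the library may handle that case differently.

```python
import random


def choose_from_subset(index, availableArms="all", rng=random):
    """Return an arm with maximal index among availableArms (ties broken uniformly)."""
    if isinstance(availableArms, str) and availableArms == "all":
        arms = list(range(len(index)))
    else:
        arms = list(availableArms)
    if not arms:
        # Assumption of this sketch: an empty subset is treated as an error.
        raise ValueError("availableArms must not be empty")
    best = max(index[arm] for arm in arms)
    candidates = [arm for arm in arms if index[arm] == best]
    return rng.choice(candidates)


index = [0.9, 0.4, 0.6, 0.2]
print(choose_from_subset(index))                        # 0
print(choose_from_subset(index, availableArms=[1, 3]))  # 1
```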
-
choiceMultiple(nb=1)
In an index policy, choose nb arms with maximal indexes (uniformly at random).
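One possible way to pick nb distinct arms with the largest indexes, sketched with a shuffle followed by a stable sort so that ties are broken uniformly. The function name and this particular tie-breaking technique are assumptions, not the library's code.

```python
import random


def choose_multiple(index, nb=1, rng=random):
    """Return nb distinct arms with the largest indexes (ties broken uniformly)."""
    if not 1 <= nb <= len(index):
        raise ValueError("nb must be between 1 and the number of arms")
    arms = list(range(len(index)))
    rng.shuffle(arms)  # random order first, so that tied arms end up in random order
    # Python's sort is stable, so tied arms keep their shuffled relative order.
    arms.sort(key=lambda arm: index[arm], reverse=True)
    return arms[:nb]


index = [0.3, 0.9, 0.5, 0.9]
print(sorted(choose_multiple(index, nb=2)))  # [1, 3]
```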
-
choiceIMP(nb=1, startWithChoiceMultiple=True)
In an index policy, the IMP strategy is hybrid: choose nb-1 arms with maximal empirical averages, then 1 arm with maximal index. Cf. algorithm IMP-TS [Komiyama, Honda, Nakagawa, 2016, arXiv 1506.00779].
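The hybrid rule can be sketched as below, under two assumptions that the description does not settle: the last arm is taken among the arms not already selected, and the startWithChoiceMultiple flag is ignored. All names are hypothetical.

```python
import random


def _argmax_random(values, arms, rng):
    """Arm from `arms` maximizing `values`, ties broken uniformly at random."""
    best = max(values[arm] for arm in arms)
    return rng.choice([arm for arm in arms if values[arm] == best])


def choose_imp(means, index, nb=1, rng=random):
    """Pick nb-1 arms by empirical mean, then 1 arm by index among the rest."""
    if not 1 <= nb <= len(index):
        raise ValueError("nb must be between 1 and the number of arms")
    remaining = list(range(len(index)))
    chosen = []
    for _ in range(nb - 1):
        arm = _argmax_random(means, remaining, rng)  # exploit: best empirical mean
        chosen.append(arm)
        remaining.remove(arm)
    # Assumption: the index-based arm is drawn from the arms not yet chosen.
    chosen.append(_argmax_random(index, remaining, rng))
    return chosen


means = [0.8, 0.6, 0.3, 0.1]
index = [0.85, 0.65, 0.4, 0.9]
print(choose_imp(means, index, nb=3))  # [0, 1, 3]
```

With nb = 1 the sketch reduces to choosing a single arm with maximal index.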
-
estimatedOrder()
Return the estimated order of the arms, as a permutation on [0..K-1] that would order the arms by increasing means.
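A short sketch of such a permutation, assuming the current index values stand in for the estimated means (an assumption of this example; the function name is hypothetical).

```python
def estimated_order(index):
    """Permutation of [0..K-1] that sorts the arms by increasing index value."""
    return sorted(range(len(index)), key=lambda arm: index[arm])


print(estimated_order([0.3, 0.9, 0.1, 0.5]))  # [2, 0, 3, 1]
```

Here arm 2 is estimated to be the worst and arm 1 the best.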
-
estimatedBestArms(M=1)
Return a (not necessarily sorted) list of the indexes of the M best arms, i.e. identify the set of the M best arms.
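One way to identify the M best arms is to take the tail of the estimated order. As before, this sketch assumes the index values stand in for the estimated means, and the function name is hypothetical.

```python
def estimated_best_arms(index, M=1):
    """Return the M arms with the largest index values, in increasing order of index."""
    if not 1 <= M <= len(index):
        raise ValueError("M must be between 1 and the number of arms")
    order = sorted(range(len(index)), key=lambda arm: index[arm])
    return order[-M:]  # last M entries of the increasing order


print(estimated_best_arms([0.3, 0.9, 0.1, 0.5], M=2))  # [3, 1]
```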
-
__module__ = 'Policies.IndexPolicy'
-