Policies.BasePolicy module

Base class for any policy.

If rewards are not in [0, 1], be sure to give the lower bound and the amplitude. E.g., if rewards are in [-3, 3], use lower = -3 and amplitude = 6.
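The lower bound and amplitude describe an affine map from [lower, lower + amplitude] onto [0, 1]. The sketch below shows that map as a standalone function. The name normalize is hypothetical and is not part of the module's API.

```python
def normalize(reward: float, lower: float = 0.0, amplitude: float = 1.0) -> float:
    """Map a reward from [lower, lower + amplitude] onto [0, 1].

    Hypothetical helper illustrating the role of lower and amplitude.
    """
    return (reward - lower) / amplitude


# Rewards in [-3, 3]: lower = -3, amplitude = 6
print(normalize(-3, lower=-3, amplitude=6))  # 0.0
print(normalize(0, lower=-3, amplitude=6))   # 0.5
print(normalize(3, lower=-3, amplitude=6))   # 1.0
```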

Policies.BasePolicy.CHECKBOUNDS = False: If True, a warning message is displayed every time a received reward lies outside [lower, lower + amplitude].
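A minimal sketch of such a bounds check follows. It assumes the warning is emitted with Python's standard warnings module. The source only says "a warning message is displayed", so the exact mechanism and message text are assumptions, and check_bounds is a hypothetical name.

```python
import warnings

# Set to True here for illustration; the documented module default is False.
CHECKBOUNDS = True


def check_bounds(reward: float, lower: float = 0.0, amplitude: float = 1.0) -> bool:
    """Return True if reward lies in [lower, lower + amplitude].

    Hypothetical helper: when CHECKBOUNDS is on and the reward is out of
    range, a warning is emitted (assumed mechanism: warnings.warn).
    """
    in_bounds = lower <= reward <= lower + amplitude
    if CHECKBOUNDS and not in_bounds:
        warnings.warn(f"reward {reward} lies outside [{lower}, {lower + amplitude}]")
    return in_bounds
```

With the defaults, check_bounds(0.5) returns True silently, while check_bounds(2.0) returns False and emits a warning.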

class Policies.BasePolicy.BasePolicy(nbArms, lower=0.0, amplitude=1.0)

Base class for any policy.

getReward(arm, reward): Give a reward: increase t and the number of pulls of that arm, and add the reward, normalized to [0, 1], to the cumulative sum of rewards for that arm.
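The bookkeeping described for getReward can be sketched as a small standalone class. MinimalPolicy and its attribute names (t, pulls, rewards) are assumptions for illustration that mirror the wording above; this is not the actual BasePolicy source.

```python
class MinimalPolicy:
    """Hypothetical sketch of the state updated by getReward."""

    def __init__(self, nbArms: int, lower: float = 0.0, amplitude: float = 1.0):
        self.nbArms = nbArms
        self.lower = lower
        self.amplitude = amplitude
        self.t = 0                      # total number of rewards received
        self.pulls = [0] * nbArms       # number of pulls of each arm
        self.rewards = [0.0] * nbArms   # cumulative normalized reward per arm

    def getReward(self, arm: int, reward: float) -> None:
        self.t += 1
        self.pulls[arm] += 1
        # Normalize from [lower, lower + amplitude] to [0, 1] before summing
        self.rewards[arm] += (reward - self.lower) / self.amplitude


policy = MinimalPolicy(3, lower=-3, amplitude=6)
policy.getReward(0, 3)
policy.getReward(0, -3)
policy.getReward(2, 0)
print(policy.t, policy.pulls, policy.rewards)  # 3 [2, 0, 1] [1.0, 0.0, 0.5]
```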

Methods defined directly on BasePolicy: __init__, __str__, startGame, getReward, choice, choiceWithRank, choiceFromSubSet, choiceMultiple, choiceIMP, estimatedOrder.

estimatedOrder()

Return the estimated order of the arms, as a permutation of [0..K-1] that would sort the arms by increasing means.
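A minimal sketch of such an ordering, assuming the "means" are the empirical means (cumulative normalized reward divided by number of pulls). The function name estimated_order and the choice of 0.0 as the mean of a never-pulled arm are assumptions; the real method may treat unpulled arms differently.

```python
def estimated_order(rewards: list[float], pulls: list[int]) -> list[int]:
    """Return arm indices 0..K-1 sorted by increasing empirical mean.

    Hypothetical sketch: arms never pulled are given a mean of 0.0.
    Ties keep their index order because sorted() is stable.
    """
    means = [r / n if n > 0 else 0.0 for r, n in zip(rewards, pulls)]
    return sorted(range(len(means)), key=lambda k: means[k])


print(estimated_order([0.9, 0.2, 1.5], [1, 1, 2]))  # [1, 2, 0]
```

Here the empirical means are 0.9, 0.2 and 0.75, so arm 1 comes first and arm 0 (the best estimated arm) comes last.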