Policies.GenericAggregation module¶

The GenericAggregation aggregation bandit algorithm uses a bandit policy \(A\) (the master) to manage several “slave” algorithms, \(A_1, \dots, A_N\).

At every step, the master policy \(A\) selects one slave algorithm \(A_i\).
The master then follows that slave's decision, plays the chosen arm, and receives a feedback reward.
All slaves receive the observation (arm, reward).
The master also receives the same observation.
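
The loop described above can be sketched in a few lines. This is a minimal, self-contained illustration rather than the module's actual implementation: `ToyEpsilonGreedy` and `ToyAggregation` are hypothetical names, the arm means are made up, and the sketch assumes that the master treats the slave indexes as its own "arms" and is credited with the reward of the slave it trusted.

```python
import random


class ToyEpsilonGreedy:
    """Hypothetical stand-in for any bandit policy (not a library class)."""

    def __init__(self, nb_arms, epsilon, rng):
        self.nb_arms, self.epsilon, self.rng = nb_arms, epsilon, rng
        self.pulls = [0] * nb_arms
        self.sums = [0.0] * nb_arms

    def choice(self):
        # Explore at random with probability epsilon, or while an arm is untried.
        if 0 in self.pulls or self.rng.random() < self.epsilon:
            return self.rng.randrange(self.nb_arms)
        means = [s / n for s, n in zip(self.sums, self.pulls)]
        return max(range(self.nb_arms), key=lambda k: means[k])

    def get_reward(self, arm, reward):
        self.pulls[arm] += 1
        self.sums[arm] += reward


class ToyAggregation:
    """Hypothetical master/slave aggregation, following the steps above."""

    def __init__(self, master, children):
        self.master = master
        self.children = children
        self.last_choice = None  # index of the last slave trusted

    def choice(self):
        # The master picks a slave, then plays that slave's decision.
        self.last_choice = self.master.choice()
        return self.children[self.last_choice].choice()

    def get_reward(self, arm, reward):
        # Every slave sees the observation (arm, reward).
        for child in self.children:
            child.get_reward(arm, reward)
        # Assumption: the master's "arms" are the slave indexes.
        self.master.get_reward(self.last_choice, reward)


rng = random.Random(0)
arm_means = [0.1, 0.5, 0.9]  # Bernoulli arms, made up for illustration
children = [ToyEpsilonGreedy(len(arm_means), eps, rng) for eps in (0.05, 0.3)]
master = ToyEpsilonGreedy(len(children), 0.1, rng)
agg = ToyAggregation(master, children)

horizon = 1000
for _ in range(horizon):
    arm = agg.choice()
    reward = 1.0 if rng.random() < arm_means[arm] else 0.0
    agg.get_reward(arm, reward)
```

Because every slave is updated with every observation, all slaves end up with identical pull counts here, whichever of them the master trusted at each step.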

Policies.GenericAggregation.random() → x in the interval [0, 1).¶

class Policies.GenericAggregation.GenericAggregation(nbArms, master=None, children=None, lower=0.0, amplitude=1.0)[source]¶

Bases: Policies.BasePolicy.BasePolicy

The GenericAggregation aggregation bandit algorithm.

__init__(nbArms, master=None, children=None, lower=0.0, amplitude=1.0)[source]¶: New policy.

nbArms = None¶: Number of arms.

lower = None¶: Lower bound of the rewards.

amplitude = None¶: Amplitude (width of the range) of the rewards, which are assumed to lie in [lower, lower + amplitude].
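
As a rough illustration of how these two parameters relate, the sketch below rescales a reward to [0, 1]. The helper `normalize` is hypothetical, and the assumption that rewards are rescaled as `(reward - lower) / amplitude` is ours, not something stated on this page.

```python
def normalize(reward, lower=0.0, amplitude=1.0):
    """Map a reward from [lower, lower + amplitude] to [0, 1] (assumed convention)."""
    return (reward - lower) / amplitude


# Rewards in [-10, 10] correspond to lower=-10 and amplitude=20.
print(normalize(5, lower=-10, amplitude=20))  # 0.75
```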

last_choice = None¶: Remember the index of the last child trusted for a decision.

children = None¶: List of slave algorithms.

__str__()[source]¶: Nicely print the name of the algorithm with its relevant parameters.

startGame()[source]¶: Start the game for each child, and for the master.

getReward(arm, reward)[source]¶: Pass the reward to each child and to the master.

choice()[source]¶: Trust one of the slaves and return its choice.

choiceWithRank(rank=1)[source]¶: Trust one of the slaves and return the result of its choiceWithRank.

choiceFromSubSet(availableArms='all')[source]¶: Trust one of the slaves and return the result of its choiceFromSubSet.

choiceMultiple(nb=1)[source]¶: Trust one of the slaves and return the result of its choiceMultiple.

__module__ = 'Policies.GenericAggregation'¶

choiceIMP(nb=1, startWithChoiceMultiple=True)[source]¶: Trust one of the slaves and return the result of its choiceIMP.

estimatedOrder()[source]¶

Trust one of the slaves and return the result of its estimatedOrder.

Return the estimated order of the arms, as a permutation of \([0,...,K-1]\) that would order the arms by increasing means.
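
To make "a permutation that would order the arms by increasing means" concrete, here is a small sketch. The function `estimated_order` and the list of empirical means are hypothetical; the module delegates this computation to the trusted slave.

```python
def estimated_order(means):
    """Indexes of the arms, sorted by increasing estimated mean."""
    return sorted(range(len(means)), key=lambda k: means[k])


means = [0.5, 0.1, 0.9]  # made-up empirical means for K = 3 arms
order = estimated_order(means)
print(order)  # [1, 0, 2]: arm 1 is the worst, arm 2 is the best
```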

estimatedBestArms(M=1)[source]¶: Return a (not necessarily sorted) list of the indexes of the M best arms, i.e. identify the set of the M best arms.
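
One plausible way to obtain such a list, shown here only as a sketch, is to keep the last M entries of the increasing order. The function `estimated_best_arms` and the means below are hypothetical, and the sketch assumes M is at least 1.

```python
def estimated_best_arms(means, M=1):
    """Indexes of the M arms with the largest estimated means (M >= 1)."""
    order = sorted(range(len(means)), key=lambda k: means[k])
    return order[-M:]  # the last M entries are the M best arms


best = estimated_best_arms([0.5, 0.1, 0.9, 0.3], M=2)
print(sorted(best))  # [0, 2]
```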