Policies.GenericAggregation module

The GenericAggregation aggregation bandit algorithm uses a bandit policy A (the master) to manage several “slave” algorithms, \(A_1, \dots, A_N\).

  • At every step, the master policy A selects one slave algorithm \(A_i\).

  • Then the master algorithm listens to its decision and plays it, and a feedback reward is received.

  • All slaves receive the observation (arm, reward).

  • The master also receives the same observation.
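The loop above can be sketched in plain Python. This is a minimal illustration, not the library's code. `EmpiricalMeans` (an epsilon-greedy policy) and `AggregationSketch` are hypothetical names, and the Bernoulli arm means are made up. The sketch also assumes that the master treats each child as one of its own arms. Under that assumption, the master observes (index of the trusted child, reward), while every child observes (arm, reward).

```python
import random


class EmpiricalMeans:
    """Hypothetical epsilon-greedy policy, used here both as a child and as the master."""

    def __init__(self, nbArms, epsilon=0.1):
        self.nbArms = nbArms
        self.epsilon = epsilon

    def startGame(self):
        self.pulls = [0] * self.nbArms
        self.rewards = [0.0] * self.nbArms

    def getReward(self, arm, reward):
        self.pulls[arm] += 1
        self.rewards[arm] += reward

    def choice(self):
        # Explore uniformly with probability epsilon, or while some arm is unobserved.
        if random.random() < self.epsilon or 0 in self.pulls:
            return random.randrange(self.nbArms)
        means = [r / n for r, n in zip(self.rewards, self.pulls)]
        return max(range(self.nbArms), key=means.__getitem__)


class AggregationSketch:
    """Minimal sketch of the master/children scheme (hypothetical, not the library class)."""

    def __init__(self, nbArms, master, children):
        self.nbArms = nbArms
        self.master = master        # its "arms" are the indexes of the children
        self.children = children
        self.last_choice = None

    def startGame(self):
        self.master.startGame()
        for child in self.children:
            child.startGame()

    def choice(self):
        # 1. The master selects which child to trust.
        self.last_choice = self.master.choice()
        # 2. The trusted child's decision is the arm actually played.
        return self.children[self.last_choice].choice()

    def getReward(self, arm, reward):
        # 3. Every child receives the observation (arm, reward).
        for child in self.children:
            child.getReward(arm, reward)
        # 4. The master is credited under the index of the child it trusted.
        self.master.getReward(self.last_choice, reward)


random.seed(0)
means = [0.1, 0.5, 0.9]  # hypothetical Bernoulli arm means
children = [EmpiricalMeans(3, epsilon=e) for e in (0.0, 0.1, 0.5)]
policy = AggregationSketch(3, master=EmpiricalMeans(len(children), epsilon=0.1), children=children)
policy.startGame()
total = 0.0
for t in range(2000):
    arm = policy.choice()
    reward = 1.0 if random.random() < means[arm] else 0.0
    policy.getReward(arm, reward)
    total += reward
print(total / 2000)
```

Because every child is updated with every observation, a child that is rarely trusted still learns from the arms played on behalf of the others. Only the master's statistics depend on which child was trusted.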

Policies.GenericAggregation.random() → x in the interval [0, 1).
class Policies.GenericAggregation.GenericAggregation(nbArms, master=None, children=None, lower=0.0, amplitude=1.0)

Bases: Policies.BasePolicy.BasePolicy

The GenericAggregation aggregation bandit algorithm.

__init__(nbArms, master=None, children=None, lower=0.0, amplitude=1.0)

New policy.

nbArms = None

Number of arms.

lower = None

Lower bound of the rewards.

amplitude = None

Amplitude of the rewards: they lie in the interval [lower, lower + amplitude].
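The attributes `lower` and `amplitude` describe a reward interval [lower, lower + amplitude]. The sketch below shows the rescaling to [0, 1] that this implies. It assumes rewards are normalized as (reward - lower) / amplitude before being stored. The helper name `normalize_reward` is hypothetical.

```python
def normalize_reward(reward, lower=0.0, amplitude=1.0):
    """Hypothetical helper: map a reward from [lower, lower + amplitude] to [0, 1]."""
    if amplitude <= 0:
        raise ValueError("amplitude must be positive")
    return (reward - lower) / amplitude


print(normalize_reward(5.0, lower=-5.0, amplitude=20.0))  # 0.5
```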

last_choice = None

Remember the index of the last child trusted for a decision.

children = None

List of slave algorithms.

__str__()

Nicely print the name of the algorithm with its relevant parameters.

startGame()

Start the game for each child, and for the master.

getReward(arm, reward)

Give the reward to each child, and to the master.

choice()

Trust one of the slaves and listen to its choice.

choiceWithRank(rank=1)

Trust one of the slaves and listen to its choiceWithRank.

choiceFromSubSet(availableArms='all')

Trust one of the slaves and listen to its choiceFromSubSet.

choiceMultiple(nb=1)

Trust one of the slaves and listen to its choiceMultiple.

choiceIMP(nb=1, startWithChoiceMultiple=True)

Trust one of the slaves and listen to its choiceIMP.
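The choice variants above share one delegation pattern. The sketch below shows it under two assumptions: the master is queried afresh on every call, and the trusted child implements the same method. `DelegatingSketch`, `ConstantMaster` and `FixedChild` are hypothetical stand-ins, not library classes. `choiceIMP` is left out for brevity.

```python
class ConstantMaster:
    """Hypothetical stub master: always trusts the child at a fixed index."""

    def __init__(self, index):
        self.index = index

    def choice(self):
        return self.index


class FixedChild:
    """Hypothetical stub child: prefers one arm out of nbArms."""

    def __init__(self, arm, nbArms):
        self.arm = arm
        self.nbArms = nbArms

    def choice(self):
        return self.arm

    def choiceWithRank(self, rank=1):
        # This stub ignores the rank and always returns its preferred arm.
        return self.arm

    def choiceFromSubSet(self, availableArms='all'):
        # Fall back to the first available arm if the preferred one is excluded.
        if availableArms == 'all' or self.arm in availableArms:
            return self.arm
        return availableArms[0]

    def choiceMultiple(self, nb=1):
        # nb distinct arms (for nb <= nbArms), starting from the preferred one.
        return [(self.arm + k) % self.nbArms for k in range(nb)]


class DelegatingSketch:
    """Each choice* call asks the master which child to trust, stores that index
    in last_choice, then forwards the call to the trusted child."""

    def __init__(self, master, children):
        self.master = master
        self.children = children
        self.last_choice = None

    def _delegate(self, method_name, *args, **kwargs):
        self.last_choice = self.master.choice()
        child = self.children[self.last_choice]
        return getattr(child, method_name)(*args, **kwargs)

    def choice(self):
        return self._delegate("choice")

    def choiceWithRank(self, rank=1):
        return self._delegate("choiceWithRank", rank=rank)

    def choiceFromSubSet(self, availableArms='all'):
        return self._delegate("choiceFromSubSet", availableArms=availableArms)

    def choiceMultiple(self, nb=1):
        return self._delegate("choiceMultiple", nb=nb)


policy = DelegatingSketch(ConstantMaster(1), [FixedChild(0, 4), FixedChild(2, 4)])
print(policy.choice())                  # 2
print(policy.choiceFromSubSet([0, 3]))  # 0
print(policy.choiceMultiple(nb=3))      # [2, 3, 0]
print(policy.last_choice)               # 1
```

Routing every variant through one helper keeps `last_choice` consistent. A subsequent `getReward` can then credit the master for the child it actually trusted.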

estimatedOrder()

Trust one of the slaves and listen to its estimatedOrder.

  • Return the estimated order of the arms, as a permutation on \([0, \dots, K-1]\) that would order the arms by increasing means.
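Such a permutation is what an argsort of the estimated means returns. The sketch below assumes the trusted child exposes one empirical mean per arm. The list `means` and the helper `estimated_order` are hypothetical.

```python
def estimated_order(empirical_means):
    """Hypothetical helper: permutation of [0, ..., K-1] sorting arms by increasing mean."""
    return sorted(range(len(empirical_means)), key=lambda arm: empirical_means[arm])


means = [0.7, 0.2, 0.9, 0.4]
print(estimated_order(means))  # [1, 3, 0, 2]
```

Python's `sorted` is stable, so arms with equal means keep their index order. This is the same convention as a stable argsort.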

estimatedBestArms(M=1)

Return a (not necessarily sorted) list of the indexes of the M best arms, i.e. identify the set of the M best arms.
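One natural way to obtain the M best arms is to read them off the end of an increasing-means order. The sketch below assumes that approach; the helper name `estimated_best_arms` and the means are hypothetical.

```python
def estimated_best_arms(empirical_means, M=1):
    """Hypothetical helper: indexes of the M arms with the largest empirical means.
    Only the set matters; the order of the returned list is not meaningful."""
    if not 1 <= M <= len(empirical_means):
        raise ValueError("M must be between 1 and the number of arms")
    order = sorted(range(len(empirical_means)), key=lambda arm: empirical_means[arm])
    return order[-M:]


means = [0.7, 0.2, 0.9, 0.4]
print(estimated_best_arms(means, M=2))  # [0, 2]
```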