Policies.SWA module
Author: Julien Seznec
Sliding Window Average policy for rotting bandits.
Reference: [Levine et al., 2017, https://papers.nips.cc/paper/6900-rotting-bandits.pdf]. Nir Levine, Koby Crammer, Shie Mannor. Rotting Bandits. Advances in Neural Information Processing Systems 30 (NIPS 2017).
-
class Policies.SWA.SWA(nbArms, horizon=1, subgaussian=1, maxDecrement=1, alpha=0.2, doublingTrick=False)
Bases: Policies.IndexPolicy.IndexPolicy
The Sliding Window Average policy for rotting bandits. Reference: [Levine et al., 2017, https://papers.nips.cc/paper/6900-rotting-bandits.pdf].
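The idea behind a sliding window average index can be illustrated with a small stand-alone sketch. It is not the code of Policies.SWA.SWA: the class name, the method names, and the choice to pass the window size directly (rather than derive it from horizon, subgaussian, maxDecrement and alpha) are assumptions made for illustration only.

```python
from collections import deque


class SlidingWindowAverageSketch:
    """Hypothetical illustration, not the Policies.SWA.SWA implementation."""

    def __init__(self, nb_arms, window):
        self.nb_arms = nb_arms
        self.window = window
        # One bounded buffer per arm: it keeps only the last `window` rewards.
        self.recent = [deque(maxlen=window) for _ in range(nb_arms)]

    def choice(self):
        # Assumption: an arm is pulled until its window is full.
        for arm, rewards in enumerate(self.recent):
            if len(rewards) < self.window:
                return arm
        # Afterwards, pick the arm with the highest average over its window.
        averages = [sum(rewards) / len(rewards) for rewards in self.recent]
        return max(range(self.nb_arms), key=averages.__getitem__)

    def get_reward(self, arm, reward):
        # Appending to a full deque silently drops the oldest reward.
        self.recent[arm].append(reward)


policy = SlidingWindowAverageSketch(nb_arms=2, window=2)
for arm, reward in [(0, 1.0), (0, 0.2), (1, 0.5), (1, 0.5)]:
    policy.get_reward(arm, reward)
first_choice = policy.choice()   # arm 0: window average 0.6 against 0.5

policy.get_reward(0, 0.0)        # arm 0 decays: its window is now [0.2, 0.0]
second_choice = policy.choice()  # arm 1: window average 0.5 against 0.1
```

Because only recent rewards enter the average, an arm whose reward decays with each pull loses its lead quickly, which is the behaviour a rotting bandit policy needs.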
-
__init__(nbArms, horizon=1, subgaussian=1, maxDecrement=1, alpha=0.2, doublingTrick=False)
New generic index policy.
nbArms: the number of arms,
lower, amplitude: lower value and known amplitude of the rewards.
-
getReward(arm, reward)
Give a reward: increase t and pulls, and update the cumulative sum of rewards for that arm (normalized in [0, 1]).
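The bookkeeping described for getReward can be sketched as follows. This is a minimal illustration, not the library's code: the class name and attribute names are hypothetical, and the normalization formula `(reward - lower) / amplitude` is an assumption based on the `lower` and `amplitude` parameters mentioned in the docstring.

```python
class RewardBookkeepingSketch:
    """Hypothetical illustration of the counters updated on each reward."""

    def __init__(self, nb_arms, lower=0.0, amplitude=1.0):
        self.lower = lower
        self.amplitude = amplitude
        self.t = 0                      # total number of rewards received
        self.pulls = [0] * nb_arms      # number of pulls per arm
        self.rewards = [0.0] * nb_arms  # cumulative normalized reward per arm

    def get_reward(self, arm, reward):
        self.t += 1
        self.pulls[arm] += 1
        # Assumed normalization: map [lower, lower + amplitude] onto [0, 1].
        self.rewards[arm] += (reward - self.lower) / self.amplitude


book = RewardBookkeepingSketch(nb_arms=2, lower=-1.0, amplitude=2.0)
book.get_reward(0, 1.0)  # normalized to 1.0
book.get_reward(0, 0.0)  # normalized to 0.5
```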
-
__module__ = 'Policies.SWA'
-
-
class Policies.SWA.wSWA(nbArms, firstHorizon=1, subgaussian=1, maxDecrement=1, alpha=0.2)
Bases: Policies.SWA.SWA
SWA with doubling trick. Reference: [Levine et al., 2017, https://papers.nips.cc/paper/6900-rotting-bandits.pdf].
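A doubling trick removes the need to know the horizon in advance: the policy starts from a guess and doubles it each time the guess is exhausted. The sketch below shows one generic form of such a schedule. The function name is hypothetical, and whether wSWA restarts in phases of exactly this length or applies the doubled horizon in some other way is an assumption, not something this page states.

```python
def horizon_schedule(first_horizon, total_steps):
    """Return (start_step, guessed_horizon) pairs for a generic doubling trick.

    Hypothetical helper: each phase lasts as long as its guessed horizon,
    and the next phase doubles the guess.
    """
    schedule = []
    start, horizon = 0, first_horizon
    while start < total_steps:
        schedule.append((start, horizon))
        start += horizon
        horizon *= 2
    return schedule


phases = horizon_schedule(first_horizon=1, total_steps=10)
# phases == [(0, 1), (1, 2), (3, 4), (7, 8)]
```

With this schedule only a logarithmic number of phases is needed to cover any number of steps, so horizon-dependent parameters are recomputed rarely.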
-
__init__(nbArms, firstHorizon=1, subgaussian=1, maxDecrement=1, alpha=0.2)
New generic index policy.
nbArms: the number of arms,
lower, amplitude: lower value and known amplitude of the rewards.
-
getReward(arm, reward)
Give a reward: increase t and pulls, and update the cumulative sum of rewards for that arm (normalized in [0, 1]).
-
__module__ = 'Policies.SWA'
-