Policies.SWA module
Author: Julien Seznec
Sliding Window Average policy for rotting bandits.
Reference: [Levine et al., 2017, https://papers.nips.cc/paper/6900-rotting-bandits.pdf]. Nir Levine, Koby Crammer, Shie Mannor. Advances in Neural Information Processing Systems 30 (NIPS 2017).
class Policies.SWA.SWA(nbArms, horizon=1, subgaussian=1, maxDecrement=1, alpha=0.2, doublingTrick=False)
Bases: Policies.IndexPolicy.IndexPolicy
The Sliding Window Average policy for rotting bandits. Reference: [Levine et al., 2017, https://papers.nips.cc/paper/6900-rotting-bandits.pdf].
__init__(nbArms, horizon=1, subgaussian=1, maxDecrement=1, alpha=0.2, doublingTrick=False)
New generic index policy.
- nbArms: the number of arms,
- lower, amplitude: lower value and known amplitude of the rewards.
getReward(arm, reward)
Give a reward: increase t and pulls, and update the cumulative sum of rewards for that arm (normalized in [0, 1]).
__module__ = 'Policies.SWA'
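To make the sliding-window idea concrete, here is a minimal, self-contained sketch of a windowed-average policy. It is not the Policies.SWA implementation: the class name `SlidingWindowAverageSketch`, the method names, and the explicit `window` argument are all assumptions made for illustration (the documented constructor takes no window argument), and the initial "pull every arm until its window is full" rule is just one simple way to start.

```python
from collections import deque


class SlidingWindowAverageSketch:
    """Illustrative windowed-average policy (hypothetical, not Policies.SWA)."""

    def __init__(self, nb_arms, window):
        # `window` is a hypothetical, user-supplied window size.
        self.nb_arms = nb_arms
        self.window = window
        self.t = 0
        self.pulls = [0] * nb_arms
        # A bounded deque per arm discards the oldest reward automatically.
        self.recent = [deque(maxlen=window) for _ in range(nb_arms)]

    def get_reward(self, arm, reward):
        """Advance time, count the pull, and store the reward for that arm."""
        self.t += 1
        self.pulls[arm] += 1
        self.recent[arm].append(reward)

    def choice(self):
        """Fill every window first, then pick the best windowed mean."""
        for arm in range(self.nb_arms):
            if self.pulls[arm] < self.window:
                return arm
        means = [sum(r) / len(r) for r in self.recent]
        return max(range(self.nb_arms), key=means.__getitem__)


policy = SlidingWindowAverageSketch(nb_arms=2, window=2)
for arm, reward in [(0, 1.0), (0, 0.2), (1, 0.5), (1, 0.5)]:
    policy.get_reward(arm, reward)
print(policy.choice())
```

Because only the most recent `window` rewards of each arm enter the mean, an arm whose rewards decay with use loses its lead quickly, which is the situation a rotting-bandit policy has to handle.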
class Policies.SWA.wSWA(nbArms, firstHorizon=1, subgaussian=1, maxDecrement=1, alpha=0.2)
Bases: Policies.SWA.SWA
SWA with doubling trick. Reference: [Levine et al., 2017, https://papers.nips.cc/paper/6900-rotting-bandits.pdf].
__init__(nbArms, firstHorizon=1, subgaussian=1, maxDecrement=1, alpha=0.2)
New generic index policy.
- nbArms: the number of arms,
- lower, amplitude: lower value and known amplitude of the rewards.
getReward(arm, reward)
Give a reward: increase t and pulls, and update the cumulative sum of rewards for that arm (normalized in [0, 1]).
__module__ = 'Policies.SWA'
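The doubling trick mentioned for wSWA can be illustrated with a small schedule generator. This sketch shows one common form of the trick (run a phase with an assumed horizon, then double it), starting from a value playing the role of `firstHorizon`. The function name `doubling_horizons` and its arguments are hypothetical, and whether wSWA restarts at exactly these steps is not stated in this page.

```python
def doubling_horizons(first_horizon, total_steps):
    """Yield (start_step, horizon) for each phase until total_steps is covered.

    Each phase lasts `horizon` steps; the next phase uses twice that horizon.
    """
    start, horizon = 0, first_horizon
    while start < total_steps:
        yield start, horizon
        start += horizon
        horizon *= 2


for start, horizon in doubling_horizons(first_horizon=1, total_steps=10):
    print(start, horizon)
```

A horizon-dependent policy can then be re-initialized at each `start_step` with the corresponding `horizon`, so the true number of rounds does not need to be known in advance.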