Policies.FEWA module

author: Julien Seznec

Filtering on Expanding Window Average (FEWA) for rotting bandits.

Reference: [Seznec et al., 2019a] Rotting bandits are not harder than stochastic ones; Julien Seznec, Andrea Locatelli, Alexandra Carpentier, Alessandro Lazaric, Michal Valko; Proceedings of Machine Learning Research, PMLR 89:2564-2572, 2019. http://proceedings.mlr.press/v89/seznec19a.html (updated version: https://arxiv.org/abs/1811.11043)

Reference: [Seznec et al., 2019b] A single algorithm for both rested and restless rotting bandits (WIP); Julien Seznec, Pierre Ménard, Alessandro Lazaric, Michal Valko.

class Policies.FEWA.EFF_FEWA(nbArms, alpha=0.06, subgaussian=1, m=None, delta=None)[source]

Bases: Policies.BasePolicy.BasePolicy

Efficient Filtering on Expanding Window Average. Efficient trick described in [Seznec et al., 2019a, https://arxiv.org/abs/1811.11043] (m=2) and [Seznec et al., 2019b, WIP] (m<=2). We use the confidence level \(\delta_t = \frac{1}{t^\alpha}\).
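With this choice of \(\delta_t\), the confidence width attached to a window of size \(h\) is of order \(\sigma\sqrt{2\alpha\log(t)/h}\), following the standard sub-Gaussian bound used in the paper. Below is a minimal sketch of that computation; the helper name and exact constants are assumptions for illustration and are not part of this module's API:

```python
import numpy as np

def confidence_width(window, t, alpha=0.06, subgaussian=1.0):
    """Hypothetical sketch: c(h, delta_t) = sigma * sqrt(2 * log(1/delta_t) / h),
    with delta_t = 1 / t**alpha, so log(1/delta_t) = alpha * log(t)."""
    return subgaussian * np.sqrt(2.0 * alpha * np.log(t) / window)

# Widths shrink as the window h grows, for a fixed round t.
print([round(confidence_width(h, t=100), 3) for h in (1, 4, 16, 64)])
```

Larger windows average more samples, hence a tighter confidence band; the filter compares each arm's window average against the best one within this width.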

__init__(nbArms, alpha=0.06, subgaussian=1, m=None, delta=None)[source]

New policy.

__str__()[source]

Return a string representation of the policy (-> str).

getReward(arm, reward)[source]

Give a reward: increase the time step t and the number of pulls of that arm, and update the cumulative sum of rewards for that arm (rewards are assumed to be normalized in [0, 1]).

choice()[source]

Choose an arm, by filtering arms on expanding window averages as described in [Seznec et al., 2019a].

_append_thresholds(w)[source]

_inlog()[source]

startGame()[source]

Start the game (fill pulls and rewards with 0).

__module__ = 'Policies.FEWA'
class Policies.FEWA.FEWA(nbArms, subgaussian=1, alpha=4, delta=None)[source]

Bases: Policies.FEWA.EFF_FEWA

Filtering on Expanding Window Average. Reference: [Seznec et al., 2019a, https://arxiv.org/abs/1811.11043]. FEWA is equivalent to EFF_FEWA for \(m < 1 + 1/T\) [Seznec et al., 2019b, WIP]. This implementation is valid for \(T < 10^{15}\). For \(T > 10^{15}\), FEWA will have time and memory issues, as its time and space complexity is \(O(KT)\) per round.
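A minimal usage sketch based on the methods documented here (startGame, choice, getReward). The import path is assumed from this module's name and may differ in the packaged library, and the decaying reward model is an illustrative stand-in for a rotting bandit environment:

```python
import numpy as np
from Policies.FEWA import FEWA  # import path assumed from the module name above

rng = np.random.default_rng(42)
nbArms, horizon = 3, 1000
policy = FEWA(nbArms, subgaussian=1, alpha=4)
policy.startGame()  # fill pulls and rewards with 0

pulls = np.zeros(nbArms, dtype=int)
for t in range(horizon):
    arm = policy.choice()
    # Toy rotting rewards: each arm's mean decays with its own number of pulls.
    mean = max(0.0, 0.9 - 0.001 * pulls[arm] * (arm + 1))
    reward = float(np.clip(mean + 0.05 * rng.standard_normal(), 0.0, 1.0))
    policy.getReward(arm, reward)  # rewards normalized in [0, 1]
    pulls[arm] += 1

print(policy, pulls)
```

For long horizons, prefer EFF_FEWA with its default m, which trades a small approximation for per-round complexity that does not grow linearly with T.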

__init__(nbArms, subgaussian=1, alpha=4, delta=None)[source]

New policy.

__str__()[source]

Return a string representation of the policy (-> str).

__module__ = 'Policies.FEWA'