Policies.FEWA module

author: Julien Seznec

Filtering on Expanding Window Average (FEWA) for rotting bandits.

Reference: [Seznec et al., 2019a] Rotting bandits are not harder than stochastic ones; Julien Seznec, Andrea Locatelli, Alexandra Carpentier, Alessandro Lazaric, Michal Valko; Proceedings of Machine Learning Research, PMLR 89:2564-2572, 2019. http://proceedings.mlr.press/v89/seznec19a.html (updated version: https://arxiv.org/abs/1811.11043)

Reference: [Seznec et al., 2019b] A single algorithm for both rested and restless rotting bandits (WIP); Julien Seznec, Pierre Ménard, Alessandro Lazaric, Michal Valko.

class Policies.FEWA.EFF_FEWA(nbArms, alpha=0.06, subgaussian=1, m=None, delta=None)[source]

Bases: Policies.BasePolicy.BasePolicy

Efficient Filtering on Expanding Window Average. Efficient trick described in [Seznec et al., 2019a, https://arxiv.org/abs/1811.11043] (m=2) and [Seznec et al., 2019b, WIP] (m<=2). We use the confidence level \(\delta_t = \frac{1}{t^\alpha}\).
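With this choice of \(\delta_t\), the confidence width attached to a window of size \(h\) is of order \(\sigma\sqrt{2\alpha\log(t)/h}\), following the standard sub-Gaussian bound used in the paper. Below is a minimal sketch of that computation; the helper name and exact constants are assumptions for illustration and are not part of this module's API:

```python
import numpy as np

def confidence_width(window, t, alpha=0.06, subgaussian=1.0):
    """Hypothetical sketch: c(h, delta_t) = sigma * sqrt(2 * log(1/delta_t) / h),
    with delta_t = 1 / t**alpha, so log(1/delta_t) = alpha * log(t)."""
    return subgaussian * np.sqrt(2.0 * alpha * np.log(t) / window)

# Widths shrink as the window h grows, for a fixed round t.
print([round(confidence_width(h, t=100), 3) for h in (1, 4, 16, 64)])
```

Larger windows average more samples, hence a tighter confidence band; the filter compares each arm's window average against the best one within this width.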

__init__(nbArms, alpha=0.06, subgaussian=1, m=None, delta=None)[source]

New policy.

__str__()[source]

Return a string representation of the policy (-> str).

getReward(arm, reward)[source]

Give a reward: increase the time step t and the number of pulls of that arm, and update the cumulative sum of rewards for that arm (rewards are assumed to be normalized in [0, 1]).

choice()[source]

Choose an arm, by filtering arms on expanding window averages as described in [Seznec et al., 2019a].

_append_thresholds(w)[source]

_inlog()[source]

startGame()[source]

Start the game (fill pulls and rewards with 0).

__module__ = 'Policies.FEWA'
class Policies.FEWA.FEWA(nbArms, subgaussian=1, alpha=4, delta=None)[source]

Bases: Policies.FEWA.EFF_FEWA

Filtering on Expanding Window Average. Reference: [Seznec et al., 2019a, https://arxiv.org/abs/1811.11043]. FEWA is equivalent to EFF_FEWA for \(m < 1 + 1/T\) [Seznec et al., 2019b, WIP]. This implementation is valid for \(T < 10^{15}\). For \(T > 10^{15}\), FEWA will have time and memory issues, as its time and space complexity is \(O(KT)\) per round.
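A minimal usage sketch based on the methods documented here (startGame, choice, getReward). The import path is assumed from this module's name and may differ in the packaged library, and the decaying reward model is an illustrative stand-in for a rotting bandit environment:

```python
import numpy as np
from Policies.FEWA import FEWA  # import path assumed from the module name above

rng = np.random.default_rng(42)
nbArms, horizon = 3, 1000
policy = FEWA(nbArms, subgaussian=1, alpha=4)
policy.startGame()  # fill pulls and rewards with 0

pulls = np.zeros(nbArms, dtype=int)
for t in range(horizon):
    arm = policy.choice()
    # Toy rotting rewards: each arm's mean decays with its own number of pulls.
    mean = max(0.0, 0.9 - 0.001 * pulls[arm] * (arm + 1))
    reward = float(np.clip(mean + 0.05 * rng.standard_normal(), 0.0, 1.0))
    policy.getReward(arm, reward)  # rewards normalized in [0, 1]
    pulls[arm] += 1

print(policy, pulls)
```

For long horizons, prefer EFF_FEWA with its default m, which trades a small approximation for per-round complexity that does not grow linearly with T.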

__init__(nbArms, subgaussian=1, alpha=4, delta=None)[source]

New policy.

__str__()[source]

Return a string representation of the policy (-> str).

__module__ = 'Policies.FEWA'