Policies.SlidingWindowRestart module

An experimental policy that uses a sliding window (of, for instance, \(\tau=100\) draws of each arm) and resets the algorithm as soon as the small (windowed) empirical average drifts too far from the full-history empirical average (or just restarts one arm, if possible).

  • Reference: none yet, idea from Rémi Bonnefoi and Lilian Besson.

  • It runs on top of a simple policy, e.g., UCB: SlidingWindowRestart() is a generic wrapper that applies this “sliding window” trick to any simple policy:

    >>> policy = SlidingWindowRestart(nbArms, UCB, tau=100, threshold=0.1)
    >>> # use policy as usual, with policy.startGame(), r = policy.choice(), policy.getReward(arm, r)
    
  • It uses an additional \(\mathcal{O}(\tau)\) memory but does not cost anything extra in time complexity (the average is maintained with a sliding window, in \(\mathcal{O}(1)\) per time step); see the longer usage sketch just after this list.
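
A slightly longer usage sketch (the Bernoulli arm means, the random stream and the import path are illustrative assumptions, not part of this module):

    import numpy as np
    from Policies import SlidingWindowRestart, UCB  # assuming the usual Policies package layout

    nbArms = 3
    means = [0.2, 0.5, 0.8]            # illustrative Bernoulli arm means
    rng = np.random.default_rng(42)

    policy = SlidingWindowRestart(nbArms, policy=UCB, tau=100, threshold=0.1)
    policy.startGame()
    for t in range(1000):
        arm = policy.choice()                        # arm chosen by the underlying UCB
        reward = float(rng.random() < means[arm])    # draw a Bernoulli reward
        policy.getReward(arm, reward)                # update UCB and the sliding window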

Warning

This is very experimental!

Warning

It can only work on basic index policies based on empirical averages (and an exploration bias), like UCB, and cannot work on any Bayesian policy (for which we would have to store all previous observations in order to reset the posterior to the small windowed history)! Note that it nonetheless works on Policies.Thompson.Thompson.

Policies.SlidingWindowRestart.TAU = 100

Size of the sliding window.

Policies.SlidingWindowRestart.THRESHOLD = 0.005

Threshold to know when to restart the base algorithm.

Policies.SlidingWindowRestart.FULL_RESTART_WHEN_REFRESH = True

Should we fully restart the algorithm, or simply reset one arm's empirical average?

class Policies.SlidingWindowRestart.SlidingWindowRestart(nbArms, policy=<class 'Policies.UCB.UCB'>, tau=100, threshold=0.005, full_restart_when_refresh=True, *args, **kwargs)[source]

Bases: Policies.BaseWrapperPolicy.BaseWrapperPolicy

An experimental policy that uses a sliding window (of, for instance, \(\tau=100\) draws) and resets the algorithm as soon as the small (windowed) empirical average drifts too far from the full-history empirical average (or just restarts one arm, if possible).

__init__(nbArms, policy=<class 'Policies.UCB.UCB'>, tau=100, threshold=0.005, full_restart_when_refresh=True, *args, **kwargs)[source]

New policy.

last_rewards = None

Keep in memory all the rewards obtained in the last \(\tau\) steps.

last_pulls = None

Keep in memory the time at which each arm was last seen. Starts at -1 (never seen).
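
Both attributes are None at class level and are only allocated in __init__(); a plausible layout (an assumption for illustration, not quoted from the source) is one row of length \(\tau\) per arm:

    import numpy as np

    nbArms, tau = 3, 100                         # example sizes
    # One buffer per arm: the last tau (normalized) rewards seen on that arm.
    last_rewards = np.zeros((nbArms, tau))
    # Per-arm marker of the last observation; starts at -1 ("never seen").
    last_pulls = np.full(nbArms, -1, dtype=int)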

__str__()[source]

-> str

getReward(arm, reward)[source]

Give a reward: increase t and pulls, update the cumulative sum of rewards, and update the small history (sliding window) for that arm (reward normalized to [0, 1]).

  • Reset the whole empirical average if the small average is too far away from it.
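
A minimal sketch of this restart test, written as a standalone function (the function name and argument names are illustrative, and the exact rule in the source may differ slightly):

    import numpy as np

    def needs_restart(last_rewards_arm, reward_sum_arm, pulls_arm, tau, threshold):
        """True if the sliding-window average drifted too far from the full-history average."""
        if pulls_arm < tau:                          # window not full yet: never restart
            return False
        full_average = reward_sum_arm / pulls_arm    # long-history empirical mean
        small_average = np.mean(last_rewards_arm)    # last-tau empirical mean
        return abs(small_average - full_average) >= threshold

When such a test succeeds, the wrapper either fully restarts the underlying policy (full_restart_when_refresh=True) or only resets that arm's empirical average.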

__module__ = 'Policies.SlidingWindowRestart'
class Policies.SlidingWindowRestart.SWR_UCB(nbArms, tau=100, threshold=0.005, full_restart_when_refresh=True, *args, **kwargs)[source]

Bases: Policies.UCB.UCB

An experimental policy that uses a sliding window (of, for instance, \(\tau=100\) draws) and resets the algorithm as soon as the small (windowed) empirical average drifts too far from the full-history empirical average (or just restarts one arm, if possible).

Warning

FIXME I should remove this code, it’s useless now that the generic wrapper SlidingWindowRestart works fine.
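
In other words, this dedicated subclass and the generic wrapper are meant to be interchangeable; the two constructions below should behave the same (import path assumed to be the usual Policies package):

    from Policies import SlidingWindowRestart, SWR_UCB, UCB

    nbArms = 5
    old_style = SWR_UCB(nbArms, tau=100, threshold=0.005)                           # dedicated subclass
    new_style = SlidingWindowRestart(nbArms, policy=UCB, tau=100, threshold=0.005)  # generic wrapper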

__init__(nbArms, tau=100, threshold=0.005, full_restart_when_refresh=True, *args, **kwargs)[source]

New generic index policy.

  • nbArms: the number of arms,

  • lower, amplitude: lower value and known amplitude of the rewards.

tau = None

Size of the sliding window.

threshold = None

Threshold to know when to restart the base algorithm.

last_rewards = None

Keep in memory all the rewards obtained in the last \(\tau\) steps.

last_pulls = None

Keep in memory the time at which each arm was last seen. Starts at -1 (never seen).

full_restart_when_refresh = None

Should we fully restart the algorithm, or simply reset one arm's empirical average?

__str__()[source]

-> str

getReward(arm, reward)[source]

Give a reward: increase t and pulls, update the cumulative sum of rewards, and update the small history (sliding window) for that arm (reward normalized to [0, 1]).

  • Reset the whole empirical average if the small average is too far away from it.

__module__ = 'Policies.SlidingWindowRestart'
class Policies.SlidingWindowRestart.SWR_UCBalpha(nbArms, tau=100, threshold=0.005, full_restart_when_refresh=True, alpha=4, *args, **kwargs)[source]

Bases: Policies.UCBalpha.UCBalpha

An experimental policy that uses a sliding window (of, for instance, \(\tau=100\) draws) and resets the algorithm as soon as the small (windowed) empirical average drifts too far from the full-history empirical average (or just restarts one arm, if possible).

Warning

FIXME I should remove this code, it’s useless now that the generic wrapper SlidingWindowRestart works fine.

__init__(nbArms, tau=100, threshold=0.005, full_restart_when_refresh=True, alpha=4, *args, **kwargs)[source]

New generic index policy.

  • nbArms: the number of arms,

  • lower, amplitude: lower value and known amplitude of the rewards.

tau = None

Size of the sliding window.

threshold = None

Threshold to know when to restart the base algorithm.

last_rewards = None

Keep in memory all the rewards obtained in the last \(\tau\) steps.

last_pulls = None

Keep in memory the time at which each arm was last seen. Starts at -1 (never seen).

full_restart_when_refresh = None

Should we fully restart the algorithm, or simply reset one arm's empirical average?

__str__()[source]

-> str

getReward(arm, reward)[source]

Give a reward: increase t and pulls, update the cumulative sum of rewards, and update the small history (sliding window) for that arm (reward normalized to [0, 1]).

  • Reset the whole empirical average if the small average is too far away from it.

__module__ = 'Policies.SlidingWindowRestart'
class Policies.SlidingWindowRestart.SWR_klUCB(nbArms, tau=100, threshold=0.005, full_restart_when_refresh=True, tolerance=0.0001, klucb=CPUDispatcher(<function klucbBern>), c=1.0, *args, **kwargs)[source]

Bases: Policies.klUCB.klUCB

An experimental policy that uses a sliding window (of, for instance, \(\tau=100\) draws) and resets the algorithm as soon as the small (windowed) empirical average drifts too far from the full-history empirical average (or just restarts one arm, if possible).

Warning

FIXME I should remove this code, it’s useless now that the generic wrapper SlidingWindowRestart works fine.

__init__(nbArms, tau=100, threshold=0.005, full_restart_when_refresh=True, tolerance=0.0001, klucb=CPUDispatcher(<function klucbBern>), c=1.0, *args, **kwargs)[source]

New generic index policy.

  • nbArms: the number of arms,

  • lower, amplitude: lower value and known amplitude of the rewards.

tau = None

Size of the sliding window.

threshold = None

Threshold to know when to restart the base algorithm.

last_rewards = None

Keep in memory all the rewards obtained in the last \(\tau\) steps.

last_pulls = None

Keep in memory the time at which each arm was last seen. Starts at -1 (never seen).

full_restart_when_refresh = None

Should we fully restart the algorithm, or simply reset one arm's empirical average?

__str__()[source]

-> str

getReward(arm, reward)[source]

Give a reward: increase t and pulls, update the cumulative sum of rewards, and update the small history (sliding window) for that arm (reward normalized to [0, 1]).

  • Reset the whole empirical average if the small average is too far away from it.

__module__ = 'Policies.SlidingWindowRestart'