# Policies.Exp3S module¶

The historical Exp3.S algorithm for non-stationary bandits.

• Reference: [[“The nonstochastic multiarmed bandit problem”, P. Auer, N. Cesa-Bianchi, Y. Freund, R.E. Schapire, SIAM journal on computing, 2002]](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.21.8735&rep=rep1&type=pdf)

• It is a simple extension of the Exp3 policy:

```python
>>> policy = Exp3S(nbArms)
>>> # use policy as usual, with policy.startGame(), r = policy.choice(), policy.getReward(arm, r)
```

• It uses an additional $$\mathcal{O}(\tau_{\max})$$ memory for a game of maximum stationary length $$\tau_{\max}$$.
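The documented interface (`startGame()`, `choice()`, `getReward(arm, r)`) can be sketched end-to-end in plain Python. Everything below is an illustrative stand-in, not the SMPyBandits implementation: the constant exploration rate `gamma`, the toy bandit at the bottom, and the reading of $$N_k(t)$$ as the pull count of arm k are all assumptions.

```python
import math
import random

class Exp3SSketch:
    """Pure-Python stand-in for Policies.Exp3S.Exp3S, for illustration only.

    Assumptions: a constant exploration rate gamma, and N_k(t) read as the
    number of pulls of arm k so far.
    """

    def __init__(self, nbArms, gamma=0.1):
        self.nbArms = nbArms
        self.gamma = gamma
        self.startGame()

    def startGame(self):
        # Uniform initial weights, zero pull counts
        self.weights = [1.0 / self.nbArms] * self.nbArms
        self.counts = [0] * self.nbArms

    @property
    def trusts(self):
        # Mix the weights with the uniform distribution, then renormalize
        K = self.nbArms
        mixed = [(1.0 - self.gamma) * w + self.gamma / K for w in self.weights]
        total = sum(mixed)
        return [t / total for t in mixed]

    def choice(self):
        # Draw an arm index with probabilities given by the trusts
        return random.choices(range(self.nbArms), weights=self.trusts)[0]

    def getReward(self, arm, reward):
        self.counts[arm] += 1
        # Unbiased estimate: divide by the probability of observing this arm
        r_tilde = reward / self.trusts[arm]
        # Exponential weight update as written on this page, then renormalize
        self.weights[arm] *= math.exp(r_tilde / (self.gamma * self.counts[arm]))
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]

random.seed(42)
policy = Exp3SSketch(nbArms=3)
policy.startGame()
for _ in range(100):
    arm = policy.choice()
    # Toy stationary bandit: arm 0 pays off most often
    r = 1.0 if random.random() < (0.9 if arm == 0 else 0.1) else 0.0
    policy.getReward(arm, r)
```

Note that the mixing step in `trusts` guarantees every arm keeps at least $$\gamma / K$$ probability mass, so the division in the unbiased estimate never blows up.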

class Policies.Exp3S.Exp3S(nbArms, gamma=None, alpha=None, gamma0=1.0, alpha0=1.0, horizon=None, max_nb_random_events=None, *args, **kwargs)[source]

The historical Exp3.S algorithm for non-stationary bandits.

__init__(nbArms, gamma=None, alpha=None, gamma0=1.0, alpha0=1.0, horizon=None, max_nb_random_events=None, *args, **kwargs)[source]

New policy.

weights = None

Weights on the arms

__str__()[source]

-> str

property gamma

Constant $$\gamma_t = \gamma$$.

property alpha

Constant $$\alpha_t = \alpha$$.

startGame()[source]

property trusts

Update the trust probabilities according to the Exp3 formula, with the parameter $$\gamma_t$$.

$\begin{split}\mathrm{trusts}'_k(t+1) &= (1 - \gamma_t) w_k(t) + \gamma_t \frac{1}{K}, \\ \mathrm{trusts}(t+1) &= \mathrm{trusts}'(t+1) / \sum_{k=1}^{K} \mathrm{trusts}'_k(t+1).\end{split}$

where $$w_k(t)$$ is the current weight of arm k.
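A minimal sketch of that two-step rule (illustrative helper name, assuming nonnegative weights):

```python
def trusts_from_weights(weights, gamma):
    """Mix the weights with uniform exploration at rate gamma, then
    renormalize. Illustrative helper, not the SMPyBandits implementation."""
    K = len(weights)
    mixed = [(1.0 - gamma) * w + gamma / K for w in weights]
    total = sum(mixed)
    return [t / total for t in mixed]

# With normalized weights the mixing already sums to 1,
# so the renormalization step is a no-op here.
probs = trusts_from_weights([0.5, 0.3, 0.2], gamma=0.1)
```

The uniform term guarantees each arm a probability of at least $$\gamma_t / K$$, which keeps the unbiased reward estimates in `getReward` bounded.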

getReward(arm, reward)[source]

Give a reward: accumulate rewards on that arm k, then update the weight $$w_k(t)$$ and renormalize the weights.

• With unbiased estimators, divide by the trust on that arm k, i.e., the probability of observing arm k: $$\tilde{r}_k(t) = \frac{r_k(t)}{\mathrm{trusts}_k(t)}$$.

• But with a biased estimator, $$\tilde{r}_k(t) = r_k(t)$$.

$\begin{split}w'_k(t+1) &= w_k(t) \times \exp\left( \frac{\tilde{r}_k(t)}{\gamma_t N_k(t)} \right) \\ w(t+1) &= w'(t+1) / \sum_{k=1}^{K} w'_k(t+1).\end{split}$
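As a sketch, the unbiased estimate followed by the exponential update above could read as follows (all names are illustrative, and reading $$N_k(t)$$ as the pull count of arm k is an assumption):

```python
import math

def update_weights(weights, arm, reward, trusts, counts, gamma):
    """One weight update following the formulas on this page.
    Illustrative helper, not the SMPyBandits implementation."""
    # Unbiased estimate: divide by the probability of observing this arm
    r_tilde = reward / trusts[arm]
    new = list(weights)
    # Exponential bump scaled by gamma_t and the pull count N_k(t)
    new[arm] = weights[arm] * math.exp(r_tilde / (gamma * counts[arm]))
    # Renormalize so the weights sum to 1
    total = sum(new)
    return [w / total for w in new]

weights = update_weights([1/3, 1/3, 1/3], arm=0, reward=1.0,
                         trusts=[1/3, 1/3, 1/3], counts=[5, 4, 6], gamma=0.1)
```

Only the played arm's weight changes before renormalization, so a reward on arm 0 increases its share of the total weight while the other arms shrink proportionally.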

__module__ = 'Policies.Exp3S'