# Policies.BoltzmannGumbel module

The Boltzmann-Gumbel Exploration (BGE) index policy, a different formulation of the Exp3 policy with an optimally tuned decreasing sequence of temperature parameters $$\gamma_t$$.

• Reference: Section 4 of [Boltzmann Exploration Done Right, N.Cesa-Bianchi & C.Gentile & G.Lugosi & G.Neu, arXiv 2017](https://arxiv.org/pdf/1705.10257.pdf).

• It is an index policy with indexes computed from the empirical mean estimators and a random sample from a Gumbel distribution.
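The selection rule can be illustrated with a short simulation. This is a minimal, hypothetical sketch, not the `Policies.BoltzmannGumbel` implementation. It assumes Bernoulli arms with made-up means and gives unpulled arms an infinite index so that each arm is tried once. At every step it plays the arm with the largest random index (empirical mean plus a scaled Gumbel sample).

```python
import numpy as np

def run_bge(means, horizon, C=1.0, seed=0):
    """Simulate Boltzmann-Gumbel exploration on Bernoulli arms (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n_arms = len(means)
    rewards = np.zeros(n_arms)  # X_k(t): sum of rewards collected from arm k
    pulls = np.zeros(n_arms)    # N_k(t): number of pulls of arm k
    for _ in range(horizon):
        # Assumption: unpulled arms get +inf so that each arm is tried once.
        indexes = np.full(n_arms, np.inf)
        pulled = pulls > 0
        beta = np.sqrt(C ** 2 / pulls[pulled])           # beta_k(t)
        z = rng.gumbel(0.0, 1.0, size=pulled.sum())       # Z_k(t) ~ Gumbel(0, 1)
        indexes[pulled] = rewards[pulled] / pulls[pulled] + beta * z
        arm = int(np.argmax(indexes))                     # play the largest index
        reward = float(rng.random() < means[arm])         # Bernoulli reward
        rewards[arm] += reward
        pulls[arm] += 1
    return pulls

pulls = run_bge([0.1, 0.5, 0.9], horizon=3000)
# The best arm (index 2) should receive most of the pulls.
print(pulls)
```

Because the Gumbel perturbation is scaled by $$1/\sqrt{N_k(t)}$$, arms that have been pulled often are perturbed less, so exploration fades as the estimates sharpen.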

Policies.BoltzmannGumbel.SIGMA = 1

Default value of the constant $$\sigma$$, assuming the arm distributions are $$\sigma^2$$-subgaussian. $$\sigma = 1$$ is a safe (conservative) choice for Bernoulli arms.

class Policies.BoltzmannGumbel.BoltzmannGumbel(nbArms, C=1, lower=0.0, amplitude=1.0)

__init__(nbArms, C=1, lower=0.0, amplitude=1.0)

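Create a new Boltzmann-Gumbel index policy.

• nbArms: the number of arms,

• C: the constant $$C$$ in the exploration scale $$\beta_k(t) = \sqrt{C^2 / N_k(t)}$$ (default 1, the same value as SIGMA),

• lower, amplitude: lower bound and known amplitude of the rewards, which lie in $$[\text{lower}, \text{lower} + \text{amplitude}]$$.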
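The parameters `lower` and `amplitude` describe the reward range. A common way to use them, assumed here rather than taken from the module's source, is to rescale each observed reward to $$[0, 1]$$ before updating the statistics $$X_k(t)$$ and $$N_k(t)$$. The class `BGEState` and its method `update` are hypothetical names chosen for this sketch.

```python
import numpy as np

class BGEState:
    """Hypothetical bookkeeping for a BGE-style policy (not the library class)."""

    def __init__(self, nbArms, C=1.0, lower=0.0, amplitude=1.0):
        self.nbArms = nbArms
        self.C = C
        self.lower = lower
        self.amplitude = amplitude
        self.rewards = np.zeros(nbArms)            # X_k(t): sum of rescaled rewards
        self.pulls = np.zeros(nbArms, dtype=int)   # N_k(t): number of pulls

    def update(self, arm, reward):
        # Assumption: map a reward in [lower, lower + amplitude] to [0, 1].
        self.rewards[arm] += (reward - self.lower) / self.amplitude
        self.pulls[arm] += 1
```

Rescaling keeps the empirical means in $$[0, 1]$$, so a default such as $$C = 1$$ stays meaningful regardless of the raw reward scale.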

__str__()

-> str

computeIndex(arm)

Compute a random index for arm k at time t, after $$N_k(t)$$ pulls of that arm:

$$\begin{split}I_k(t) &= \frac{X_k(t)}{N_k(t)} + \beta_k(t) Z_k(t), \\ \text{where}\;\; \beta_k(t) &:= \sqrt{C^2 / N_k(t)}, \\ \text{and}\;\; Z_k(t) &\sim \mathrm{Gumbel}(0, 1).\end{split}$$

Here $$X_k(t)$$ is the sum of rewards obtained from arm k, so $$X_k(t) / N_k(t)$$ is its empirical mean, and $$\mathrm{Gumbel}(0, 1)$$ is the standard Gumbel distribution (location 0, scale 1). See the [Numpy documentation](https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.gumbel.html#numpy.random.gumbel) or the [Wikipedia page](https://en.wikipedia.org/wiki/Gumbel_distribution) for more details.
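The formula for a single arm translates almost line by line into code. This is a minimal sketch, not the library's `computeIndex`: the function name `compute_index` and its arguments are hypothetical, and returning $$+\infty$$ for an arm that has never been pulled is an assumption made here to avoid dividing by zero.

```python
import math
import numpy as np

def compute_index(sum_rewards, n_pulls, C=1.0, rng=None):
    """Random BGE index I_k(t) for one arm (illustrative sketch)."""
    if n_pulls < 1:
        # Assumption: an unpulled arm gets +inf so it is tried first.
        return math.inf
    rng = np.random.default_rng() if rng is None else rng
    beta = math.sqrt(C ** 2 / n_pulls)  # beta_k(t) = sqrt(C^2 / N_k(t))
    z = rng.gumbel(0.0, 1.0)            # Z_k(t) ~ Gumbel(0, 1)
    return sum_rewards / n_pulls + beta * z
```

Equivalently, a standard Gumbel sample can be drawn by inverse transform as $$-\log(-\log U)$$ with $$U$$ uniform on $$(0, 1)$$.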

computeAllIndex()

Compute the current indexes for all arms, in a vectorized manner.
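A vectorized version draws one Gumbel sample per arm and evaluates all indexes with array operations. The sketch below is hypothetical (`compute_all_indexes` is not the library method) and keeps the same assumption that unpulled arms have an infinite index.

```python
import numpy as np

def compute_all_indexes(rewards, pulls, C=1.0, rng=None):
    """Vectorized BGE indexes for all arms (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    rewards = np.asarray(rewards, dtype=float)
    pulls = np.asarray(pulls, dtype=float)
    # Assumption: unpulled arms keep an infinite index.
    indexes = np.full(pulls.shape, np.inf)
    pulled = pulls >= 1
    z = rng.gumbel(0.0, 1.0, size=pulls.shape)  # one Z_k(t) per arm
    # Replace zero counts by 1 only to avoid a division warning; those entries are unused.
    beta = np.sqrt(C ** 2 / np.where(pulled, pulls, 1.0))
    indexes[pulled] = rewards[pulled] / pulls[pulled] + beta[pulled] * z[pulled]
    return indexes
```

An index policy then plays `int(np.argmax(compute_all_indexes(rewards, pulls)))`; `np.argmax` returns the first maximum, so ties among unpulled arms go to the lowest arm number.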

__module__ = 'Policies.BoltzmannGumbel'