Policies.ApproximatedFHGittins module¶

The approximated Finite-Horizon Gittins index policy for bounded bandits.

Policies.ApproximatedFHGittins.ALPHA = 0.5

Default value for the parameter $$\alpha > 0$$ for ApproximatedFHGittins.

Policies.ApproximatedFHGittins.DISTORTION_HORIZON = 1.01

Default value for the parameter $$\tau \geq 1$$ that is used to artificially increase the horizon, from $$T$$ to $$\tau T$$.

class Policies.ApproximatedFHGittins.ApproximatedFHGittins(nbArms, horizon=None, alpha=0.5, distortion_horizon=1.01, lower=0.0, amplitude=1.0)[source]

The approximated Finite-Horizon Gittins index policy for bounded bandits.

__init__(nbArms, horizon=None, alpha=0.5, distortion_horizon=1.01, lower=0.0, amplitude=1.0)[source]

New generic index policy.

• nbArms: the number of arms,
• lower, amplitude: lower value and known amplitude of the rewards.

alpha = None

Parameter $$\alpha > 0$$.

distortion_horizon = None

Parameter $$\tau \geq 1$$, used to artificially increase the horizon.

horizon = None

Parameter $$T$$ = known horizon of the experiment.

__str__() -> str[source]

m

$$m = T - t + 1$$ is the number of steps remaining until the end of the game.

Note

The article does not explain how to deal with an unknown horizon, but if $$T$$ is wrong, this $$m$$ can become negative. Empirically, it is forced to be $$\geq 1$$, so as not to break the $$\log(m)$$ term used below, by using $$\tau T$$ instead of $$T$$ (e.g., $$\tau = 1.01$$ is enough not to hurt performance in the last steps of the experiment).
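As a sketch (not the library's exact code), the clamping described above could look like the following helper, where the argument names mirror the `horizon` and `distortion_horizon` attributes defined earlier; the function name itself is an assumption:

```python
def remaining_steps(t, horizon, distortion_horizon=1.01):
    """Hypothetical helper: m = tau * T - t + 1, clamped to be >= 1.

    Using tau * T instead of T keeps m positive even slightly past
    an underestimated horizon, so log(m) stays well-defined.
    """
    m = distortion_horizon * horizon - t + 1
    return max(1, int(m))
```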

computeIndex(arm)[source]

Compute the current index, at time t and after $$N_k(t)$$ pulls of arm k:

$\begin{split}I_k(t) &= \frac{X_k(t)}{N_k(t)} + \sqrt{\frac{2 \alpha}{N_k(t)} \log\left( \frac{m}{N_k(t) \log^{1/2}\left( \frac{m}{N_k(t)} \right)} \right)}, \\ \text{where}\;\; & m = T - t + 1.\end{split}$

Note

This $$\log^{1/2}(\dots) = \sqrt{\log(\dots)}$$ term can be undefined, as soon as $$m < N_k(t)$$, so empirically, $$\sqrt{\max(0, \log(\dots))}$$ is used instead, or a larger horizon can be used to make $$m$$ artificially larger (e.g., $$T' = 1.1 T$$).
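A minimal single-arm sketch of this index with the clamping applied (the function and argument names are assumptions, not the library's API; `sum_rewards` plays the role of $$X_k(t)$$ and `pulls` of $$N_k(t)$$):

```python
from math import log, sqrt, inf

def approx_fh_gittins_index(sum_rewards, pulls, m, alpha=0.5):
    """Hypothetical sketch of I_k(t) = X_k/N_k
    + sqrt((2*alpha/N_k) * log(m / (N_k * log(m/N_k)^{1/2})))."""
    if pulls == 0:
        return inf                        # unpulled arms are explored first
    inner = max(0.0, log(m / pulls))      # clamp: log(m/N_k) can be < 0 when m < N_k
    if inner == 0.0:
        return sum_rewards / pulls        # degenerate case: no exploration bonus
    outer = max(0.0, log(m / (pulls * sqrt(inner))))
    return sum_rewards / pulls + sqrt((2.0 * alpha / pulls) * outer)
```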

__module__ = 'Policies.ApproximatedFHGittins'
computeAllIndex()[source]

Compute the current indexes for all arms, in a vectorized manner.
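A vectorized evaluation presumably applies the same formula to all arms at once with NumPy; a hedged sketch under that assumption (the array names `rewards` and `pulls` and the function name are not the library's API):

```python
import numpy as np

def compute_all_indexes(rewards, pulls, m, alpha=0.5):
    """Hypothetical vectorized version of the index formula above."""
    rewards = np.asarray(rewards, dtype=float)
    pulls = np.asarray(pulls, dtype=float)
    # Silence divide-by-zero / invalid warnings from arms with few pulls;
    # those entries are overwritten below.
    with np.errstate(divide='ignore', invalid='ignore'):
        inner = np.maximum(0.0, np.log(m / pulls))
        outer = np.maximum(0.0, np.log(m / (pulls * np.sqrt(inner))))
        indexes = rewards / pulls + np.sqrt((2.0 * alpha / pulls) * outer)
    indexes[pulls < 1] = np.inf   # unpulled arms get +inf so they are tried first
    return indexes
```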