Policies.SparseUCB module

The SparseUCB policy, designed to tackle sparse stochastic bandit problems.

Here, sparse means that only a small subset of size s of the K arms has non-zero means.
The SparseUCB algorithm requires knowing the exact value of s.
Reference: [[“Sparse Stochastic Bandits”, by J. Kwon, V. Perchet & C. Vernade, COLT 2017](https://arxiv.org/abs/1706.01383)].

Warning

This algorithm only works for sparse Gaussian (or sub-Gaussian) stochastic bandits.

class Policies.SparseUCB.Phase

Bases: enum.Enum

Different states during the SparseUCB algorithm.

- RoundRobin means all arms are sampled once.
- ForceLog uniformly explores arms that are in the set \(\mathcal{J}(t) \setminus \mathcal{K}(t)\).
- UCB is the phase that the algorithm should converge to, in which a normal UCB selection is done only on the “good” arms, i.e., \(\mathcal{K}(t)\).
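
As a rough illustration, the three states could be declared as a standard `enum.Enum`. This is only a sketch: the member names come from the list above, but the integer values are assumptions made for this example and are not confirmed by this page.

```python
from enum import Enum


class Phase(Enum):
    """Illustrative states of the SparseUCB algorithm (values are assumed)."""
    RoundRobin = 1  # every arm is sampled once
    ForceLog = 2    # uniform exploration of J(t) \ K(t)
    UCB = 3         # plain UCB selection restricted to K(t)


# Members are compared by identity, so a policy can branch on its state.
current = Phase.RoundRobin
is_exploring = current is not Phase.UCB
```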

Policies.SparseUCB.ALPHA = 4: Default value of the parameter \(\alpha\) for the UCB indexes.

class Policies.SparseUCB.SparseUCB(nbArms, sparsity=None, alpha=4, lower=0.0, amplitude=1.0)

The SparseUCB policy, designed to tackle sparse stochastic bandit problems.

__init__(nbArms, sparsity=None, alpha=4, lower=0.0, amplitude=1.0)

New generic index policy.

update_j(): Recompute the set \(\mathcal{J}(t)\):

\[\mathcal{J}(t) = \left\{ k \in [1,...,K]\;, \frac{X_k(t)}{N_k(t)} \geq \sqrt{\frac{\alpha \log(N_k(t))}{N_k(t)}} \right\}.\]

update_k(): Recompute the set \(\mathcal{K}(t)\):

\[\mathcal{K}(t) = \left\{ k \in [1,...,K]\;, \frac{X_k(t)}{N_k(t)} \geq \sqrt{\frac{\alpha \log(t)}{N_k(t)}} \right\}.\]
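
The two definitions differ only in the logarithm: \(\log(N_k(t))\) for \(\mathcal{J}(t)\) and \(\log(t)\) for \(\mathcal{K}(t)\). Since \(N_k(t) \leq t\), the threshold for \(\mathcal{K}(t)\) is at least as large, so \(\mathcal{K}(t) \subseteq \mathcal{J}(t)\). A minimal NumPy sketch of both computations follows; the helper name `sparse_sets` and its array-based signature are hypothetical and are not the module's API. It assumes every arm has been pulled at least once.

```python
import numpy as np


def sparse_sets(rewards, pulls, t, alpha=4.0):
    """Return boolean masks (in_J, in_K) over the arms.

    rewards[k] is the cumulated reward X_k(t) and pulls[k] is N_k(t) >= 1.
    Hypothetical helper, written only to illustrate the two formulas.
    """
    rewards = np.asarray(rewards, dtype=float)
    pulls = np.asarray(pulls, dtype=float)
    means = rewards / pulls
    # J(t): threshold uses log(N_k(t)), the arm's own number of pulls.
    in_J = means >= np.sqrt(alpha * np.log(pulls) / pulls)
    # K(t): threshold uses log(t), the global time step.
    in_K = means >= np.sqrt(alpha * np.log(t) / pulls)
    return in_J, in_K


# Four arms with empirical means 0.75, 0.65, 0.9 and about 0.017, at t = 160.
rewards = [30.0, 26.0, 45.0, 0.5]
pulls = [40, 40, 50, 30]
in_J, in_K = sparse_sets(rewards, pulls, t=sum(pulls))
```

In this example the second arm passes the looser test for \(\mathcal{J}(t)\) but not the stricter one for \(\mathcal{K}(t)\), so it is exactly the kind of arm a Force-Log step would explore.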

Choose the next arm to play:

- If still in a Round-Robin phase, play the next arm.
- Otherwise, recompute the set \(\mathcal{J}(t)\). If it is too small, i.e., \(|\mathcal{J}(t)| < s\), start a new Round-Robin phase from arm 0.
- Otherwise, recompute the second set \(\mathcal{K}(t)\). If it is too small, i.e., \(|\mathcal{K}(t)| < s\), play a Force-Log step by choosing an arm uniformly at random from the set \(\mathcal{J}(t) \setminus \mathcal{K}(t)\).
- Otherwise, play a UCB step by choosing an arm with highest UCB index from the set \(\mathcal{K}(t)\).
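
The decision procedure above can be sketched as a small self-contained class. This is not the module's implementation: the class name `SparseUCBSketch`, its attributes, and the string phase labels are hypothetical. The UCB index used here (empirical mean plus \(\sqrt{\alpha \log(t) / N_k(t)}\)) is also an assumption, since this page does not state the index formula. The sketch assumes each call to `choice()` is followed by one call to `update()`.

```python
import numpy as np


class SparseUCBSketch:
    """Illustrative sketch of the arm selection rule (hypothetical names)."""

    def __init__(self, nb_arms, sparsity, alpha=4.0, seed=0):
        self.nb_arms, self.sparsity, self.alpha = nb_arms, sparsity, alpha
        self.rng = np.random.default_rng(seed)
        self.rewards = np.zeros(nb_arms)  # X_k(t), cumulated rewards
        self.pulls = np.zeros(nb_arms)    # N_k(t), number of pulls
        self.t = 0
        self.next_rr = 0  # next arm of the running round-robin pass, or None

    def update(self, arm, reward):
        self.t += 1
        self.pulls[arm] += 1
        self.rewards[arm] += reward

    def choice(self):
        # Still in a Round-Robin phase: play the next arm.
        if self.next_rr is not None:
            arm = self.next_rr
            self.next_rr = arm + 1 if arm + 1 < self.nb_arms else None
            return "RoundRobin", arm
        means = self.rewards / self.pulls
        in_J = means >= np.sqrt(self.alpha * np.log(self.pulls) / self.pulls)
        # |J(t)| < s: start a new Round-Robin phase from arm 0.
        if in_J.sum() < self.sparsity:
            self.next_rr = 1 if self.nb_arms > 1 else None
            return "RoundRobin", 0
        radius = np.sqrt(self.alpha * np.log(self.t) / self.pulls)
        in_K = means >= radius
        # |K(t)| < s: explore uniformly in J(t) \ K(t), which is non-empty here.
        if in_K.sum() < self.sparsity:
            candidates = np.flatnonzero(in_J & ~in_K)
            return "ForceLog", int(self.rng.choice(candidates))
        # UCB step restricted to K(t); the index formula is an assumption.
        ucb = np.where(in_K, means + radius, -np.inf)
        return "UCB", int(np.argmax(ucb))


policy = SparseUCBSketch(nb_arms=3, sparsity=1)
true_means = [0.9, 0.0, 0.0]
history = []
for _ in range(4):
    phase, arm = policy.choice()
    policy.update(arm, true_means[arm])  # noiseless rewards, for a repeatable demo
    history.append((phase, arm))
```

With these noiseless rewards, the first three steps form the initial Round-Robin pass and the fourth is a Force-Log step, because after one pull per arm no empirical mean can exceed the \(\log(t)\) threshold of \(\mathcal{K}(t)\).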