# Policies.SparseUCB module¶

The SparseUCB policy, designed to tackle sparse stochastic bandit problems:

• This means that only a small subset of size s of the K arms has non-zero means.

• The SparseUCB algorithm requires to known exactly the value of s.

• Reference: [[“Sparse Stochastic Bandits”, by J. Kwon, V. Perchet & C. Vernade, COLT 2017](https://arxiv.org/abs/1706.01383)].

Warning

This algorithm only works for sparse Gaussian (or sub-Gaussian) stochastic bandits.

class Policies.SparseUCB.Phase

Bases: enum.Enum

Different states during the SparseUCB algorithm.

• RoundRobin means all are sampled once.

• ForceLog uniformly explores arms that are in the set $$\mathcal{J}(t) \setminus \mathcal{K}(t)$$.

• UCB is the phase that the algorithm should converge to, when a normal UCB selection is done only on the “good” arms, i.e., $$\mathcal{K}(t)$$.

ForceLog = 2
RoundRobin = 1
UCB = 3
__module__ = 'Policies.SparseUCB'
Policies.SparseUCB.ALPHA = 4

Default parameter for $$\alpha$$ for the UCB indexes.

class Policies.SparseUCB.SparseUCB(nbArms, sparsity=None, alpha=4, lower=0.0, amplitude=1.0)[source]

The SparseUCB policy, designed to tackle sparse stochastic bandit problems.

• By default, assume sparsity = nbArms.

__init__(nbArms, sparsity=None, alpha=4, lower=0.0, amplitude=1.0)[source]

New generic index policy.

• nbArms: the number of arms,

• lower, amplitude: lower value and known amplitude of the rewards.

sparsity = None

Known value of the sparsity of the current problem.

phase = None

Current phase of the algorithm.

force_to_see = None

Binary array for the set $$\mathcal{J}(t)$$.

goods = None

Binary array for the set $$\mathcal{K}(t)$$.

offset = None

Next arm to sample, for the Round-Robin phase

__str__()[source]

-> str

startGame()[source]

Initialize the policy for a new game.

update_j()[source]

Recompute the set $$\mathcal{J}(t)$$:

$\mathcal{J}(t) = \left\{ k \in [1,...,K]\;, \frac{X_k(t)}{N_k(t)} \geq \sqrt{\frac{\alpha \log(N_k(t))}{N_k(t)}} \right\}.$
update_k()[source]

Recompute the set $$\mathcal{K}(t)$$:

$\mathcal{K}(t) = \left\{ k \in [1,...,K]\;, \frac{X_k(t)}{N_k(t)} \geq \sqrt{\frac{\alpha \log(t)}{N_k(t)}} \right\}.$
choice()[source]

Choose the next arm to play:

• If still in a Round-Robin phase, play the next arm,

• Otherwise, recompute the set $$\mathcal{J}(t)$$,

• If it is too small, if $$\mathcal{J}(t) < s$$:
• Start a new Round-Robin phase from arm 0.

• Otherwise, recompute the second set $$\mathcal{K}(t)$$,

• If it is too small, if $$\mathcal{K}(t) < s$$:
• Play a Force-Log step by choosing an arm uniformly at random from the set $$\mathcal{J}(t) \setminus \mathcal{K}(t)$$.

• Otherwise,
• Play a UCB step by choosing an arm with highest UCB index from the set $$\mathcal{K}(t)$$.

__module__ = 'Policies.SparseUCB'