Policies.SparseUCB module¶
The SparseUCB policy, designed to tackle sparse stochastic bandit problems:
This means that only a small subset of size
sof theKarms has non-zero means.The SparseUCB algorithm requires to known exactly the value of
s.Reference: [[“Sparse Stochastic Bandits”, by J. Kwon, V. Perchet & C. Vernade, COLT 2017](https://arxiv.org/abs/1706.01383)].
Warning
This algorithm only works for sparse Gaussian (or sub-Gaussian) stochastic bandits.
-
class
Policies.SparseUCB.Phase¶ Bases:
enum.EnumDifferent states during the SparseUCB algorithm.
RoundRobinmeans all are sampled once.ForceLoguniformly explores arms that are in the set \(\mathcal{J}(t) \setminus \mathcal{K}(t)\).UCBis the phase that the algorithm should converge to, when a normal UCB selection is done only on the “good” arms, i.e., \(\mathcal{K}(t)\).
-
ForceLog= 2¶
-
RoundRobin= 1¶
-
UCB= 3¶
-
__module__= 'Policies.SparseUCB'¶
-
Policies.SparseUCB.ALPHA= 4¶ Default parameter for \(\alpha\) for the UCB indexes.
-
class
Policies.SparseUCB.SparseUCB(nbArms, sparsity=None, alpha=4, lower=0.0, amplitude=1.0)[source]¶ Bases:
Policies.UCBalpha.UCBalphaThe SparseUCB policy, designed to tackle sparse stochastic bandit problems.
By default, assume
sparsity=nbArms.
-
__init__(nbArms, sparsity=None, alpha=4, lower=0.0, amplitude=1.0)[source]¶ New generic index policy.
nbArms: the number of arms,
lower, amplitude: lower value and known amplitude of the rewards.
-
sparsity= None¶ Known value of the sparsity of the current problem.
-
phase= None¶ Current phase of the algorithm.
-
force_to_see= None¶ Binary array for the set \(\mathcal{J}(t)\).
-
goods= None¶ Binary array for the set \(\mathcal{K}(t)\).
-
offset= None¶ Next arm to sample, for the Round-Robin phase
-
update_j()[source]¶ Recompute the set \(\mathcal{J}(t)\):
\[\mathcal{J}(t) = \left\{ k \in [1,...,K]\;, \frac{X_k(t)}{N_k(t)} \geq \sqrt{\frac{\alpha \log(N_k(t))}{N_k(t)}} \right\}.\]
-
update_k()[source]¶ Recompute the set \(\mathcal{K}(t)\):
\[\mathcal{K}(t) = \left\{ k \in [1,...,K]\;, \frac{X_k(t)}{N_k(t)} \geq \sqrt{\frac{\alpha \log(t)}{N_k(t)}} \right\}.\]
-
choice()[source]¶ Choose the next arm to play:
If still in a Round-Robin phase, play the next arm,
Otherwise, recompute the set \(\mathcal{J}(t)\),
- If it is too small, if \(\mathcal{J}(t) < s\):
Start a new Round-Robin phase from arm 0.
Otherwise, recompute the second set \(\mathcal{K}(t)\),
- If it is too small, if \(\mathcal{K}(t) < s\):
Play a Force-Log step by choosing an arm uniformly at random from the set \(\mathcal{J}(t) \setminus \mathcal{K}(t)\).
- Otherwise,
Play a UCB step by choosing an arm with highest UCB index from the set \(\mathcal{K}(t)\).
-
__module__= 'Policies.SparseUCB'¶