Policies.SparseklUCB module¶
The SparseklUCB policy, designed to tackle sparse stochastic bandit problems:
This means that only a small subset of size
s
of theK
arms has non-zero means.The SparseklUCB algorithm requires to known exactly the value of
s
.This SparseklUCB is my version. It uses the KL-UCB index for both the decision in the UCB phase and the construction of the sets \(\mathcal{J}(t)\) and \(\mathcal{K}(t)\).
The usual UCB indexes can be used for the sets by setting the flag
use_ucb_for_sets
to true.Reference: [[“Sparse Stochastic Bandits”, by J. Kwon, V. Perchet & C. Vernade, COLT 2017](https://arxiv.org/abs/1706.01383)] who introduced SparseUCB.
Warning
This algorithm only works for sparse Gaussian (or sub-Gaussian) stochastic bandits, of known variance.
-
class
Policies.SparseklUCB.
Phase
¶ Bases:
enum.Enum
Different states during the SparseklUCB algorithm.
RoundRobin
means all are sampled once.ForceLog
uniformly explores arms that are in the set \(\mathcal{J}(t) \setminus \mathcal{K}(t)\).UCB
is the phase that the algorithm should converge to, when a normal UCB selection is done only on the “good” arms, i.e., \(\mathcal{K}(t)\).
-
ForceLog
= 2¶
-
RoundRobin
= 1¶
-
UCB
= 3¶
-
__module__
= 'Policies.SparseklUCB'¶
-
Policies.SparseklUCB.
c
= 1.0¶ default value, as it was in pymaBandits v1.0
-
Policies.SparseklUCB.
USE_UCB_FOR_SETS
= False¶ Default value for the flag controlling whether the usual UCB indexes are used for the sets \(\mathcal{J}(t)\) and \(\mathcal{K}(t)\). Default it to use the KL-UCB indexes, which should be more efficient.
-
class
Policies.SparseklUCB.
SparseklUCB
(nbArms, sparsity=None, tolerance=0.0001, klucb=CPUDispatcher(<function klucbBern>), c=1.0, use_ucb_for_sets=False, lower=0.0, amplitude=1.0)[source]¶ Bases:
Policies.klUCB.klUCB
The SparseklUCB policy, designed to tackle sparse stochastic bandit problems.
By default, assume
sparsity
=nbArms
.
-
__init__
(nbArms, sparsity=None, tolerance=0.0001, klucb=CPUDispatcher(<function klucbBern>), c=1.0, use_ucb_for_sets=False, lower=0.0, amplitude=1.0)[source]¶ New generic index policy.
nbArms: the number of arms,
lower, amplitude: lower value and known amplitude of the rewards.
-
sparsity
= None¶ Known value of the sparsity of the current problem.
-
use_ucb_for_sets
= None¶ Whether the usual UCB indexes are used for the sets \(\mathcal{J}(t)\) and \(\mathcal{K}(t)\).
-
phase
= None¶ Current phase of the algorithm.
-
force_to_see
= None¶ Binary array for the set \(\mathcal{J}(t)\).
-
goods
= None¶ Binary array for the set \(\mathcal{K}(t)\).
-
offset
= None¶ Next arm to sample, for the Round-Robin phase
-
update_j
()[source]¶ Recompute the set \(\mathcal{J}(t)\):
\[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ U^{\mathcal{J}}_k(t) &= \sup\limits_{q \in [a, b]} \left\{ q : \mathrm{kl}(\hat{\mu}_k(t), q) \leq \frac{c \log(N_k(t))}{N_k(t)} \right\},\\ \mathcal{J}(t) &= \left\{ k \in [1,...,K]\;, \hat{\mu}_k(t) \geq U^{\mathcal{J}}_k(t) - \hat{\mu}_k(t) \right\}.\end{split}\]If
use_ucb_for_sets
isTrue
, the same formula fromPolicies.SparseUCB
is used.
-
update_k
()[source]¶ Recompute the set \(\mathcal{K}(t)\):
\[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ U^{\mathcal{K}}_k(t) &= \sup\limits_{q \in [a, b]} \left\{ q : \mathrm{kl}(\hat{\mu}_k(t), q) \leq \frac{c \log(t)}{N_k(t)} \right\},\\ \mathcal{J}(t) &= \left\{ k \in [1,...,K]\;, \hat{\mu}_k(t) \geq U^{\mathcal{K}}_k(t) - \hat{\mu}_k(t) \right\}.\end{split}\]If
use_ucb_for_sets
isTrue
, the same formula fromPolicies.SparseUCB
is used.
-
__module__
= 'Policies.SparseklUCB'¶
-
choice
()[source]¶ Choose the next arm to play:
If still in a Round-Robin phase, play the next arm,
Otherwise, recompute the set \(\mathcal{J}(t)\),
- If it is too small, if \(\mathcal{J}(t) < s\):
Start a new Round-Robin phase from arm 0.
Otherwise, recompute the second set \(\mathcal{K}(t)\),
- If it is too small, if \(\mathcal{K}(t) < s\):
Play a Force-Log step by choosing an arm uniformly at random from the set \(\mathcal{J}(t) \setminus K(t)\).
- Otherwise,
Play a UCB step by choosing an arm with highest KL-UCB index from the set \(\mathcal{K}(t)\).