Policies.SparseWrapper module

The SparseWrapper policy, designed to tackle sparse stochastic bandit problems:

  • This means that only a small subset of size s of the K arms has non-zero means.

  • The SparseWrapper algorithm requires to known exactly the value of s.

  • This SparseWrapper is a very generic version, and can use any index policy for both the decision in the UCB phase and the construction of the sets \(\mathcal{J}(t)\) and \(\mathcal{K}(t)\).

  • The usual UCB indexes can be used for the set \(\mathcal{K}(t)\) by setting the flag use_ucb_for_set_K to true (but usually the indexes from the underlying policy can be used efficiently for set \(\mathcal{K}(t)\)), and for the set \(\mathcal{J}(t)\) by setting the flag use_ucb_for_set_J to true (as its formula is less easily generalized).

  • If used with Policy.UCBalpha or Policy.klUCB, it should be better to use directly Policy.SparseUCB or Policy.SparseklUCB.

  • Reference: [[“Sparse Stochastic Bandits”, by J. Kwon, V. Perchet & C. Vernade, COLT 2017](https://arxiv.org/abs/1706.01383)] who introduced SparseUCB.

Warning

This is very EXPERIMENTAL! I have no proof yet! But it works fine!!

Policies.SparseWrapper.default_index_policy

alias of Policies.UCBalpha.UCBalpha

class Policies.SparseWrapper.Phase

Bases: enum.Enum

Different states during the SparseWrapper algorithm.

  • RoundRobin means all are sampled once.

  • ForceLog uniformly explores arms that are in the set \(\mathcal{J}(t) \setminus \mathcal{K}(t)\).

  • UCB is the phase that the algorithm should converge to, when a normal UCB selection is done only on the “good” arms, i.e., \(\mathcal{K}(t)\).

ForceLog = 2
RoundRobin = 1
UCB = 3
__module__ = 'Policies.SparseWrapper'
Policies.SparseWrapper.USE_UCB_FOR_SET_K = False

Default value for the flag controlling whether the usual UCB indexes are used for the set \(\mathcal{K}(t)\). Default it to use the indexes of the underlying policy, which could be more efficient.

Policies.SparseWrapper.USE_UCB_FOR_SET_J = False

Default value for the flag controlling whether the usual UCB indexes are used for the set \(\mathcal{J}(t)\). Default it to use the UCB indexes as there is no clean and generic formula to obtain the indexes for \(\mathcal{J}(t)\) from the indexes of the underlying policy. Note that I found a formula, it’s just durty. See below.

Policies.SparseWrapper.ALPHA = 1

Default parameter for \(\alpha\) for the UCB indexes.

class Policies.SparseWrapper.SparseWrapper(nbArms, sparsity=None, use_ucb_for_set_K=False, use_ucb_for_set_J=False, alpha=1, policy=<class 'Policies.UCBalpha.UCBalpha'>, lower=0.0, amplitude=1.0, *args, **kwargs)[source]

Bases: Policies.BaseWrapperPolicy.BaseWrapperPolicy

The SparseWrapper policy, designed to tackle sparse stochastic bandit problems.

  • By default, assume sparsity = nbArms.

__init__(nbArms, sparsity=None, use_ucb_for_set_K=False, use_ucb_for_set_J=False, alpha=1, policy=<class 'Policies.UCBalpha.UCBalpha'>, lower=0.0, amplitude=1.0, *args, **kwargs)[source]

New policy.

sparsity = None

Known value of the sparsity of the current problem.

use_ucb_for_set_K = None

Whether the usual UCB indexes are used for the set \(\mathcal{K}(t)\).

use_ucb_for_set_J = None

Whether the usual UCB indexes are used for the set \(\mathcal{J}(t)\).

alpha = None

Parameter \(\alpha\) for the UCB indexes for the two sets, if not using the indexes of the underlying policy.

phase = None

Current phase of the algorithm.

force_to_see = None

Binary array for the set \(\mathcal{J}(t)\).

goods = None

Binary array for the set \(\mathcal{K}(t)\).

offset = None

Next arm to sample, for the Round-Robin phase

__str__()[source]

-> str

startGame()[source]

Initialize the policy for a new game.

update_j()[source]

Recompute the set \(\mathcal{J}(t)\):

\[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ U^{\mathcal{K}}_k(t) &= I_k^{P}(t) - \hat{\mu}_k(t),\\ U^{\mathcal{J}}_k(t) &= U^{\mathcal{K}}_k(t) \times \sqrt{\frac{\log(N_k(t))}{\log(t)}},\\ \mathcal{J}(t) &= \left\{ k \in [1,...,K]\;, \hat{\mu}_k(t) \geq U^{\mathcal{J}}_k(t) - \hat{\mu}_k(t) \right\}.\end{split}\]
  • Yes, this is a nothing but a hack, as there is no generic formula to retrieve the indexes used in the set \(\mathcal{J}(t)\) from the indexes \(I_k^{P}(t)\) of the underlying index policy \(P\).

  • If use_ucb_for_set_J is True, the same formula from Policies.SparseUCB is used.

Warning

FIXME rewrite the above with LCB and UCB instead of this weird U - mean.

__module__ = 'Policies.SparseWrapper'
update_k()[source]

Recompute the set \(\mathcal{K}(t)\):

\[\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ U^{\mathcal{K}}_k(t) &= I_k^{P}(t) - \hat{\mu}_k(t),\\ \mathcal{K}(t) &= \left\{ k \in [1,...,K]\;, \hat{\mu}_k(t) \geq U^{\mathcal{K}}_k(t) - \hat{\mu}_k(t) \right\}.\end{split}\]
choice()[source]

Choose the next arm to play:

  • If still in a Round-Robin phase, play the next arm,

  • Otherwise, recompute the set \(\mathcal{J}(t)\),

  • If it is too small, if \(\mathcal{J}(t) < s\):
    • Start a new Round-Robin phase from arm 0.

  • Otherwise, recompute the second set \(\mathcal{K}(t)\),

  • If it is too small, if \(\mathcal{K}(t) < s\):
    • Play a Force-Log step by choosing an arm uniformly at random from the set \(\mathcal{J}(t) \setminus K(t)\).

  • Otherwise,
    • Play a UCB step by choosing an arm with highest index (from the underlying policy) from the set \(\mathcal{K}(t)\).