Structure and Sparsity of Stochastic Multi-Armed Bandits¶

This page explains shortly what I studied on sparse stochastic multi-armed bandits. Assume a MAB problem with $K$ arms, each parametrized by its mean $\mu_k\in\mathbb{R}$ . If you know in advance that only a small subset (of size $s$ ) of the arms have a positive arm, it sounds reasonable to hope to be more efficient in playing the bandit game compared to an approach which is non aware of the sparsity.

The SparseUCB is an extension of the well-known UCB, and requires to known exactly the value of $s$ . It works by identifying as fast as possible (actually, in a sub-logarithmic number of samples) the arms with non-positive means. Then it only plays in the “good” arms with positive means, with a regular UCB policy.

I studied extensions of this idea, first of all the SparseklUCB policy as it was suggested in the original research paper, but mainly a generic “wrapper” black-box approach. For more details, see SparseWrapper.

Reference: [“Sparse Stochastic Bandits”, by J. Kwon, V. Perchet & C. Vernade, COLT 2017]. Note that this algorithm only works for sparse Gaussian (or sub-Gaussian) stochastic bandits, and it includes Bernoulli arms.

Article¶

TODO finish! I am writing a small research article on that topic, it is a better introduction as a self-contained document to explain this idea and the algorithms. Reference: [Structure and Sparsity of Stochastic Multi-Arm Bandits, Lilian Besson and Emilie Kaufmann, 2018].

Example of simulation configuration¶

A simple python file, configuration_sparse.py, is used to import the arm classes, the policy classes and define the problems and the experiments.

For example, we can compare the standard UCB and BayesUCB algorithms, non aware of the sparsity, against the sparsity-aware SparseUCB algorithm, as well as 4 versions of SparseWrapper applied to BayesUCB.

configuration = {
    "horizon": 10000,    # Finite horizon of the simulation
    "repetitions": 100,  # number of repetitions
    "n_jobs": -1,        # Maximum number of cores for parallelization: use ALL your CPU
    "verbosity": 5,      # Verbosity for the joblib calls
    # Environment configuration, you can set up more than one.
    "environment": [
        {   # sparsity = nb of >= 0 mean, = 3 here
            "arm_type": Bernoulli,
            "params": 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.2, 0.3
        }
    ],
    # Policies that should be simulated, and their parameters.
    "policies": [
        {"archtype": UCB, "params": {} },
        {"archtype": SparseUCB, "params": { "sparsity": 3 } },
        {"archtype": BayesUCB, "params": { } },
    ]
}

Then add a Sparse-Wrapper bandit algorithm (SparseWrapper class), you can use this piece of code:

configuration["policies"] += [
    {
        "archtype": SparseWrapper,
        "params": {
            "policy": BayesUCB,
            "use_ucb_for_set_J": use_ucb_for_set_J,
            "use_ucb_for_set_K": use_ucb_for_set_K,
        }
    }
    for use_ucb_for_set_J in [ True, False ]
    for use_ucb_for_set_K in [ True, False ]
]

How to run the experiments ?¶

You should use the provided Makefile file to do this simply:

make install  # install the requirements ONLY ONCE
make sparse   # run and log the main.py script

Some illustrations¶

Here are some plots illustrating the performances of the different policies implemented in this project, against various sparse problems (with Bernoulli or UnboundedGaussian arms only):

3 variants of Sparse-Wrapper for UCB, on a “simple” sparse Bernoulli problem¶

3 variants of Sparse-Wrapper for UCB, on a "simple" sparse Bernoulli problem

FIXME run some simulations and explain them!

These illustrations come from my (work in progress) article, [Structure and Sparsity of Stochastic Multi-Arm Bandits, Lilian Besson and Emilie Kaufmann, 2018].

📜 License ? ¶

MIT Licensed (file LICENSE).

PyPI version PyPI implementation