**Structure and Sparsity of Stochastic Multi-Armed Bandits**

This page briefly explains what I studied on sparse stochastic multi-armed bandits.
Assume a MAB problem with `$K$` arms, each parametrized by its *mean* `$\mu_k\in\mathbb{R}$`.
If you know in advance that only a small subset (of size `$s$`) of the arms have a positive mean, it sounds reasonable to hope to play the bandit game more efficiently than with an approach that is unaware of the sparsity.

The `SparseUCB` policy is an extension of the well-known `UCB`, and requires knowing the value of `$s$` **exactly**.
It works by identifying, as fast as possible (actually, in a sub-logarithmic number of samples), the arms with non-positive means.
Then it plays only the “good” arms with positive means, with a regular UCB policy.
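To make this concrete, here is a minimal, illustrative sketch of the three-phase structure in plain Python. The class and method names are mine, not the library's, and the thresholds follow my reading of the COLT 2017 paper, assuming unit-variance sub-Gaussian rewards:

```python
import math
import random

class SparseUCBSketch:
    """Illustrative three-phase SparseUCB sketch (hypothetical names)."""

    def __init__(self, n_arms, sparsity):
        self.n_arms = n_arms
        self.s = sparsity
        self.counts = [0] * n_arms   # N_k(t): number of pulls of arm k
        self.sums = [0.0] * n_arms   # S_k(t): cumulative reward of arm k
        self.t = 0

    def _mean(self, k):
        return self.sums[k] / self.counts[k]

    def _sets(self):
        # J(t): arms whose empirical mean beats a *local* threshold (log N_k),
        # K(t): arms whose empirical mean also beats a *global* one (log t).
        J = {k for k in range(self.n_arms)
             if self._mean(k) >= math.sqrt(2 * math.log(max(self.counts[k], 2)) / self.counts[k])}
        K = {k for k in J
             if self._mean(k) >= math.sqrt(2 * math.log(max(self.t, 2)) / self.counts[k])}
        return J, K

    def choose(self):
        self.t += 1
        # Initialization: pull every arm once.
        for k in range(self.n_arms):
            if self.counts[k] == 0:
                return k
        J, K = self._sets()
        if len(J) < self.s:               # round-robin phase: sample all arms
            return random.randrange(self.n_arms)
        if len(K) < self.s:               # force-log phase: sample inside J only
            return random.choice(sorted(J))
        # UCB phase, restricted to the arms in K (believed to be positive)
        def ucb(k):
            return self._mean(k) + math.sqrt(2 * math.log(self.t) / self.counts[k])
        return max(K, key=ucb)

    def update(self, k, reward):
        self.counts[k] += 1
        self.sums[k] += reward
```

Once `$s$` arms clear the global threshold, the policy behaves exactly like UCB restricted to that set, which is why the “bad” arms are only sampled a sub-logarithmic number of times.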

I studied extensions of this idea: first the `SparseklUCB` policy, as suggested in the original research paper, but mainly a generic “wrapper” black-box approach.
For more details, see `SparseWrapper`.
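The black-box idea can be sketched as follows: maintain an estimate of the “active” set of arms still believed to have positive means, and delegate the choice inside that set to any index policy. This is only my paraphrase of the wrapper approach, with illustrative names and a simple UCB-based elimination rule, not the library's actual API:

```python
import math

class SparseWrapperSketch:
    """Illustrative sparsity wrapper around any index policy (hypothetical API)."""

    def __init__(self, n_arms, sparsity, make_index):
        self.n_arms = n_arms
        self.s = sparsity
        self.make_index = make_index  # callable: (mean, count, t) -> index value
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms
        self.t = 0

    def active_set(self):
        # Keep an arm "active" while its upper confidence bound is still
        # positive; once UCB_k(t) < 0, the arm's mean is almost surely
        # non-positive and it is dropped.
        active = set()
        for k in range(self.n_arms):
            if self.counts[k] == 0:
                active.add(k)  # unseen arms stay active
                continue
            ucb = self.sums[k] / self.counts[k] + math.sqrt(2 * math.log(self.t) / self.counts[k])
            if ucb > 0:
                active.add(k)
        if not active:
            active = set(range(self.n_arms))  # safety fallback
        return active

    def choose(self):
        self.t += 1
        for k in range(self.n_arms):
            if self.counts[k] == 0:
                return k  # initialization: pull every arm once
        active = self.active_set()
        # Delegate to the wrapped index policy, restricted to the active set.
        return max(active, key=lambda k: self.make_index(
            self.sums[k] / self.counts[k], self.counts[k], self.t))

    def update(self, k, reward):
        self.counts[k] += 1
        self.sums[k] += reward
```

For instance, wrapping a plain UCB index would be `SparseWrapperSketch(K, s, lambda mean, n, t: mean + math.sqrt(2 * math.log(t) / n))`; any other index (e.g. a kl-UCB or Bayesian index) can be plugged in the same way, which is the point of the black-box design.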

Reference: [“Sparse Stochastic Bandits”, by J. Kwon, V. Perchet & C. Vernade, COLT 2017]. Note that this algorithm only works for sparse Gaussian (or sub-Gaussian) stochastic bandits, which includes Bernoulli arms.

## Article

TODO finish! I am writing a short research article on this topic; as a self-contained document, it is a better introduction to this idea and the algorithms. Reference: [Structure and Sparsity of Stochastic Multi-Arm Bandits, Lilian Besson and Emilie Kaufmann, 2018].

## Example of simulation configuration

A simple Python file, `configuration_sparse.py`, is used to import the arm classes and the policy classes, and to define the problems and the experiments.

For example, we can compare the standard `UCB` and `BayesUCB` algorithms, unaware of the sparsity, against the sparsity-aware `SparseUCB` algorithm, as well as 4 versions of `SparseWrapper` applied to `BayesUCB`.

```
configuration = {
    "horizon": 10000,    # Finite horizon of the simulation
    "repetitions": 100,  # Number of repetitions
    "n_jobs": -1,        # Maximum number of cores for parallelization: use ALL your CPU
    "verbosity": 5,      # Verbosity for the joblib calls
    # Environment configuration, you can set up more than one.
    "environment": [
        {   # sparsity = number of arms with positive mean, = 3 here
            "arm_type": Bernoulli,
            "params": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1, 0.2, 0.3]
        }
    ],
    # Policies that should be simulated, and their parameters.
    "policies": [
        {"archtype": UCB, "params": {}},
        {"archtype": SparseUCB, "params": {"sparsity": 3}},
        {"archtype": BayesUCB, "params": {}},
    ]
}
```

Then, to add a Sparse-Wrapper bandit algorithm (the `SparseWrapper` class), you can use this piece of code:

```
configuration["policies"] += [
    {
        "archtype": SparseWrapper,
        "params": {
            "policy": BayesUCB,
            "use_ucb_for_set_J": use_ucb_for_set_J,
            "use_ucb_for_set_K": use_ucb_for_set_K,
        }
    }
    for use_ucb_for_set_J in [True, False]
    for use_ucb_for_set_K in [True, False]
]
```

## How to run the experiments?

You should use the provided `Makefile` to do this simply:

```
make install # install the requirements ONLY ONCE
make sparse # run and log the main.py script
```

## Some illustrations

Here are some plots illustrating the performance of the different policies implemented in this project, on various sparse problems (with `Bernoulli` or `UnboundedGaussian` arms only):

### 3 variants of Sparse-Wrapper for UCB, on a “simple” sparse Bernoulli problem

FIXME run some simulations and explain them!

These illustrations come from my (work-in-progress) article, [Structure and Sparsity of Stochastic Multi-Arm Bandits, Lilian Besson and Emilie Kaufmann, 2018].