# **Doubling Trick for Multi-Armed Bandits**

I studied what the [Doubling Trick](https://smpybandits.github.io/docs/Policies.DoublingTrickWrapper.html) can and can't do for multi-armed bandits, in order to obtain efficient anytime versions of non-anytime optimal Multi-Armed Bandit algorithms.

The [Doubling Trick](https://smpybandits.github.io/docs/Policies.DoublingTrickWrapper.html) algorithm, denoted $DT(A, (T_i))$ for a diverging increasing sequence $(T_i)$, is the following algorithm:

![Policies/DoublingTrick.py](plots/DoublingTrick_algo1.png)

Long story short, we proved the two following theorems.

### For geometric sequences

> It works for minimax regret bounds (in $R_T = \mathcal{O}(\sqrt{T})$), with a constant multiplicative loss $\leq 4$, but not for logarithmic regret bounds (in $R_T = \mathcal{O}(\log T)$).

![https://hal.inria.fr/hal-01736357](plots/DoublingTrick_theorem1.png)

### For exponential sequences

> It works for logarithmic regret bounds (in $R_T = \mathcal{O}(\log T)$), but not for minimax regret bounds (in $R_T = \mathcal{O}(\sqrt{T})$).

![https://hal.inria.fr/hal-01736357](plots/DoublingTrick_theorem2.png)

----

## Article

I wrote a research article on this topic; as a self-contained document, it is a better introduction to this idea and the algorithms.

Reference: [[What the Doubling Trick Can or Can't Do for Multi-Armed Bandits, Lilian Besson and Emilie Kaufmann, 2018]](https://hal.inria.fr/hal-01736357).
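In code, the wrapper $DT(A, (T_i))$ amounts to the short Python sketch below: run a horizon-dependent policy, and whenever the current horizon $T_i$ is exhausted, move to $T_{i+1}$ and either fully restart the policy or just update its horizon. This is an illustrative sketch only; the class, method, and parameter names are hypothetical and do not match the actual `DoublingTrickWrapper` API.

```python
class DoublingTrick:
    """Sketch of DT(A, (T_i)) (illustrative, not the SMPyBandits API).

    `policy_class` is any horizon-dependent bandit policy A(horizon=T),
    `next_horizon` is a function i -> T_i for a diverging increasing sequence.
    """

    def __init__(self, policy_class, next_horizon, first_horizon=100, full_restart=True):
        self.policy_class = policy_class
        self.next_horizon = next_horizon
        self.horizon = first_horizon
        self.full_restart = full_restart
        self.i = 0   # index in the doubling sequence
        self.t = 0   # global time step
        self.policy = policy_class(horizon=self.horizon)

    def choice(self):
        if self.t >= self.horizon:                 # current sequence T_i is exhausted
            self.i += 1
            self.horizon = self.next_horizon(self.i)
            if self.full_restart:                  # forget all past observations...
                self.policy = self.policy_class(horizon=self.horizon)
            else:                                  # ...or keep them, only update the horizon
                self.policy.horizon = self.horizon
        self.t += 1
        return self.policy.choice()

    def get_reward(self, arm, reward):
        self.policy.get_reward(arm, reward)
```

The `full_restart` flag distinguishes the two variants compared in the plots below (with restart vs. no restart).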
> PDF : [BK__ALT_2018.pdf](https://hal.inria.fr/hal-01736357/document) | HAL notice : [BK__ALT_2018](https://hal.inria.fr/hal-01736357/) | BibTeX : [BK__ALT_2018.bib](https://hal.inria.fr/hal-01736357/bibtex) | [Source code and documentation](MultiPlayers.html)
> [![Published](https://img.shields.io/badge/Published%3F-waiting-orange.svg)](https://hal.inria.fr/hal-01736357) [![Maintenance](https://img.shields.io/badge/Maintained%3F-almost%20finished-orange.svg)](https://bitbucket.org/lbesson/what-doubling-tricks-can-and-cant-do-for-multi-armed-bandits/commits/) [![Ask Me Anything !](https://img.shields.io/badge/Ask%20me-anything-1abc9c.svg)](https://bitbucket.org/lbesson/ama)

----

## Configuration

A simple Python file, [configuration_comparing_doubling_algorithms.py](https://smpybandits.github.io/docs/configuration_comparing_doubling_algorithms.html), is used to import the [arm classes](Arms/) and the [policy classes](Policies/), and to define the problems and the experiments.

For example, we can compare the standard anytime [klUCB](https://smpybandits.github.io/docs/Policies.klUCB.html) algorithm against the non-anytime [klUCBPlusPlus](https://smpybandits.github.io/docs/Policies.klUCBPlusPlus.html) algorithm, as well as several versions of [DoublingTrickWrapper](https://smpybandits.github.io/docs/Policies.DoublingTrickWrapper.html) applied to [klUCBPlusPlus](https://smpybandits.github.io/docs/Policies.klUCBPlusPlus.html).

```python
configuration = {
    "horizon": 10000,    # Finite horizon of the simulation
    "repetitions": 100,  # Number of repetitions
    "n_jobs": -1,        # Maximum number of cores for parallelization: use ALL your CPU
    "verbosity": 5,      # Verbosity for the joblib calls
    # Environment configuration, you can set up more than one.
    "environment": [
        {
            "arm_type": Bernoulli,
            "params": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
        }
    ],
    # Policies that should be simulated, and their parameters.
    "policies": [
        {"archtype": UCB, "params": {}},
        {"archtype": klUCB, "params": {}},
        {"archtype": klUCBPlusPlus, "params": {"horizon": 10000}},
    ]
}
```

Then, to add a [Doubling-Trick](https://smpybandits.github.io/docs/Policies.DoublingTrickWrapper.html) bandit algorithm (the [DoublingTrickWrapper class](https://smpybandits.github.io/docs/Policies.DoublingTrickWrapper.html)), you can use this piece of code:

```python
configuration["policies"] += [
    {
        "archtype": DoublingTrickWrapper,
        "params": {
            "next_horizon": next_horizon,
            "full_restart": full_restart,
            "policy": BayesUCB,
        }
    }
    for full_restart in [True, False]
    for next_horizon in [
        next_horizon__arithmetic,
        next_horizon__geometric,
        next_horizon__exponential_fast,
        next_horizon__exponential_slow,
        next_horizon__exponential_generic,
    ]
]
```

----

## [How to run the experiments?](How_to_run_the_code.md)

You should use the provided [Makefile](Makefile) to do this simply:

```bash
# if not already installed, otherwise update with 'git pull'
git clone https://github.com/SMPyBandits/SMPyBandits/
cd SMPyBandits
make install                         # install the requirements ONLY ONCE
make comparing_doubling_algorithms   # run and log the main.py script
```

----

## Some illustrations

Here are some plots illustrating the performance of the different [policies](https://smpybandits.github.io/docs/Policies/) implemented in this project, against various problems (with [Bernoulli](https://smpybandits.github.io/docs/Arms.Bernoulli.html) and [UnboundedGaussian](https://smpybandits.github.io/docs/Arms.Gaussian.html) arms only):

### [Doubling-Trick](https://smpybandits.github.io/docs/Policies.DoublingTrickWrapper.html) with restart, on a "simple" Bernoulli problem

![Doubling-Trick with restart, on a "simple" Bernoulli problem](plots/main____env1-1_1217677871459230631.png)

Regret for [Doubling-Trick](https://smpybandits.github.io/docs/Policies.DoublingTrickWrapper.html), for $K=9$ [Bernoulli arms](https://smpybandits.github.io/docs/Arms.Bernoulli.html), horizon $T=45678$,
$n=1000$ repetitions and $\mu_1,\ldots,\mu_K$ taken uniformly in $[0,1]^K$.

Geometric doubling ($b=2$) and slow exponential doubling ($b=1.1$) are too slow, and short first sequences make the regret blow up at the beginning of the experiment. At $t=40000$ we clearly see the effect of a new sequence for the best [doubling trick](https://smpybandits.github.io/docs/Policies.DoublingTrickWrapper.html) ($T_i = 200 \times 2^i$). As expected, [kl-UCB++](https://smpybandits.github.io/docs/Policies.klUCBPlusPlus.html) outperforms [kl-UCB](https://smpybandits.github.io/docs/Policies.klUCB.html), and if the doubling sequence grows fast enough then [Doubling-Trick](https://smpybandits.github.io/docs/Policies.DoublingTrickWrapper.html)([kl-UCB++](https://smpybandits.github.io/docs/Policies.klUCBPlusPlus.html)) can perform as well as [kl-UCB++](https://smpybandits.github.io/docs/Policies.klUCBPlusPlus.html) (see $t < 40000$).

### [Doubling-Trick](https://smpybandits.github.io/docs/Policies.DoublingTrickWrapper.html) with restart, on randomly taken Bernoulli problems

![Doubling-Trick with restart, on randomly taken Bernoulli problems](plots/main____env1-1_3633169128724378553.png)

Similar setting, but with $\mu_1,\ldots,\mu_K$ evenly spaced in $[0,1]^K$ ($\{0.1,\ldots,0.9\}$). Both [kl-UCB](https://smpybandits.github.io/docs/Policies.klUCB.html) and [kl-UCB++](https://smpybandits.github.io/docs/Policies.klUCBPlusPlus.html) are very efficient on "easy" problems like this one, and we can check visually that they match the lower bound from Lai & Robbins (1985). As before, we check that slow doubling sequences are too slow to give reasonable performance.
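The two families of doubling sequences compared in these plots can be sketched as simple Python generators. The geometric rule $T_i = 200 \times 2^i$ comes from the captions above; the exponential rule $T_{i+1} = \lfloor T_i^b \rfloor$ is one simple choice used here for illustration, and need not match the exact `next_horizon__*` functions of the package.

```python
from itertools import islice

def geometric_horizons(T0=200, b=2.0):
    """Geometric doubling: T_i = T0 * b^i, e.g. T_i = 200 * 2^i as in the plots."""
    i = 0
    while True:
        yield int(T0 * b ** i)
        i += 1

def exponential_horizons(T0=200, b=1.1):
    """Slow exponential doubling: T_{i+1} = floor(T_i ** b).
    (One simple choice, hypothetical here.)"""
    T = float(T0)
    while True:
        yield int(T)
        T = T ** b

print(list(islice(geometric_horizons(), 5)))    # [200, 400, 800, 1600, 3200]
print(list(islice(exponential_horizons(), 5)))  # grows slowly at first, then explodes
```

With $b=1.1$ the exponential sequence starts much more slowly than the geometric one, which is consistent with the "too slow" behavior observed in the plots.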
### [Doubling-Trick](https://smpybandits.github.io/docs/Policies.DoublingTrickWrapper.html) with restart, on randomly taken Gaussian problems with variance $V=1$

![Doubling-Trick with restart, on randomly taken Gaussian problems with variance V=1](plots/main____env1-1_2223860464453456415.png)

Regret for $K=9$ [Gaussian arms](https://smpybandits.github.io/docs/Arms.Gaussian.html) $\mathcal{N}(\mu, 1)$, horizon $T=45678$, $n=1000$ repetitions and $\mu_1,\ldots,\mu_K$ taken uniformly in $[-5,5]^K$, with variance $V=1$.

On "hard" problems like this one, both [UCB](https://smpybandits.github.io/docs/Policies.UCB.html) and [AFHG](https://smpybandits.github.io/docs/Policies.ApproximatedFHGittins.html) perform similarly and poorly w.r.t. the lower bound from Lai & Robbins (1985). As before, we check that geometric doubling ($b=2$) and slow exponential doubling ($b=1.1$) are too slow, but a fast enough doubling sequence does give reasonable performance for the anytime [AFHG](https://smpybandits.github.io/docs/Policies.ApproximatedFHGittins.html) obtained by [Doubling-Trick](https://smpybandits.github.io/docs/Policies.DoublingTrickWrapper.html).

### [Doubling-Trick](https://smpybandits.github.io/docs/Policies.DoublingTrickWrapper.html) with restart, on an easy Gaussian problem with variance $V=1$

![Doubling-Trick with restart, on an easy Gaussian problem with variance V=1](plots/main____env1-1_6979515539977716717.png)

Regret for [Doubling-Trick](https://smpybandits.github.io/docs/Policies.DoublingTrickWrapper.html), for $K=9$ [Gaussian arms](https://smpybandits.github.io/docs/Arms.Gaussian.html) $\mathcal{N}(\mu, 1)$, horizon $T=45678$, $n=1000$ repetitions and $\mu_1,\ldots,\mu_K$ uniformly spaced in $[-5,5]^K$.
On "easy" problems like this one, both [UCB](https://smpybandits.github.io/docs/Policies.UCB.html) and [AFHG](https://smpybandits.github.io/docs/Policies.ApproximatedFHGittins.html) perform similarly and attain near-constant regret (identifying the best [Gaussian arm](https://smpybandits.github.io/docs/Arms.Gaussian.html) is very easy here, as the means are sufficiently distinct). Each [doubling trick](https://smpybandits.github.io/docs/Policies.DoublingTrickWrapper.html) also appears to attain near-constant regret, but geometric doubling ($b=2$) and slow exponential doubling ($b=1.1$) are slower to converge and thus less efficient.

### [Doubling-Trick](https://smpybandits.github.io/docs/Policies.DoublingTrickWrapper.html) with no restart, on randomly taken Bernoulli problems

![Doubling-Trick with no restart, on randomly taken Bernoulli problems](plots/main____env1-1_5964629015089571121.png)

Regret for $K=9$ [Bernoulli arms](https://smpybandits.github.io/docs/Arms.Bernoulli.html), horizon $T=45678$, $n=1000$ repetitions and $\mu_1,\ldots,\mu_K$ taken uniformly in $[0,1]^K$, for [Doubling-Trick](https://smpybandits.github.io/docs/Policies.DoublingTrickWrapper.html) no-restart. Geometric doubling (e.g., $b=2$) and slow exponential doubling (e.g., $b=1.1$) are too slow, and short first sequences make the regret blow up at the beginning of the experiment. At $t=40000$ we clearly see the effect of a new sequence for the best [doubling trick](https://smpybandits.github.io/docs/Policies.DoublingTrickWrapper.html) ($T_i = 200 \times 2^i$).
As expected, [kl-UCB++](https://smpybandits.github.io/docs/Policies.klUCBPlusPlus.html) outperforms [kl-UCB](https://smpybandits.github.io/docs/Policies.klUCB.html), and if the doubling sequence grows fast enough then [Doubling-Trick](https://smpybandits.github.io/docs/Policies.DoublingTrickWrapper.html) no-restart for [kl-UCB++](https://smpybandits.github.io/docs/Policies.klUCBPlusPlus.html) can perform as well as [kl-UCB++](https://smpybandits.github.io/docs/Policies.klUCBPlusPlus.html).

### [Doubling-Trick](https://smpybandits.github.io/docs/Policies.DoublingTrickWrapper.html) with no restart, on a "simple" Bernoulli problem

![Doubling-Trick with no restart, on a "simple" Bernoulli problem](plots/main____env1-1_5972568793654673752.png)

$K=9$ [Bernoulli arms](https://smpybandits.github.io/docs/Arms.Bernoulli.html) with $\mu_1,\ldots,\mu_K$ evenly spaced in $[0,1]^K$. On easy problems like this one, both [kl-UCB](https://smpybandits.github.io/docs/Policies.klUCB.html) and [kl-UCB++](https://smpybandits.github.io/docs/Policies.klUCBPlusPlus.html) are very efficient, and here the geometric doubling sequence allows the [Doubling-Trick](https://smpybandits.github.io/docs/Policies.DoublingTrickWrapper.html) no-restart anytime version of [kl-UCB++](https://smpybandits.github.io/docs/Policies.klUCBPlusPlus.html) to outperform both [kl-UCB](https://smpybandits.github.io/docs/Policies.klUCB.html) and [kl-UCB++](https://smpybandits.github.io/docs/Policies.klUCBPlusPlus.html).

> These illustrations come from my article, [[What the Doubling Trick Can or Can't Do for Multi-Armed Bandits, Lilian Besson and Emilie Kaufmann, 2018]](https://hal.inria.fr/hal-01736357).

----

### :scroll: License ?

[![GitHub license](https://img.shields.io/github/license/SMPyBandits/SMPyBandits.svg)](https://github.com/SMPyBandits/SMPyBandits/blob/master/LICENSE)

[MIT Licensed](https://lbesson.mit-license.org/) (file [LICENSE](LICENSE)). © 2016-2018 [Lilian Besson](https://GitHub.com/Naereen).

[![Open Source?
Yes!](https://badgen.net/badge/Open%20Source%20%3F/Yes%21/blue?icon=github)](https://github.com/SMPyBandits/SMPyBandits/)
[![Maintenance](https://img.shields.io/badge/Maintained%3F-yes-green.svg)](https://GitHub.com/SMPyBandits/SMPyBandits/graphs/commit-activity)
[![Ask Me Anything !](https://img.shields.io/badge/Ask%20me-anything-1abc9c.svg)](https://GitHub.com/Naereen/ama)
[![Analytics](https://ga-beacon.appspot.com/UA-38514290-17/github.com/SMPyBandits/SMPyBandits/README.md?pixel)](https://GitHub.com/SMPyBandits/SMPyBandits/)
[![PyPI version](https://img.shields.io/pypi/v/smpybandits.svg)](https://pypi.org/project/SMPyBandits)
[![PyPI implementation](https://img.shields.io/pypi/implementation/smpybandits.svg)](https://pypi.org/project/SMPyBandits)
[![PyPI pyversions](https://img.shields.io/pypi/pyversions/smpybandits.svg?logo=python)](https://pypi.org/project/SMPyBandits)
[![PyPI download](https://img.shields.io/pypi/dm/smpybandits.svg)](https://pypi.org/project/SMPyBandits)
[![PyPI status](https://img.shields.io/pypi/status/smpybandits.svg)](https://pypi.org/project/SMPyBandits)
[![Documentation Status](https://readthedocs.org/projects/smpybandits/badge/?version=latest)](https://SMPyBandits.ReadTheDocs.io/en/latest/?badge=latest)
[![Build Status](https://travis-ci.org/SMPyBandits/SMPyBandits.svg?branch=master)](https://travis-ci.org/SMPyBandits/SMPyBandits)
[![Stars of https://github.com/SMPyBandits/SMPyBandits/](https://badgen.net/github/stars/SMPyBandits/SMPyBandits)](https://GitHub.com/SMPyBandits/SMPyBandits/stargazers)
[![Releases of https://github.com/SMPyBandits/SMPyBandits/](https://badgen.net/github/release/SMPyBandits/SMPyBandits)](https://github.com/SMPyBandits/SMPyBandits/releases)