List of research publications using Lilian Besson’s SMPyBandits project¶

I (Lilian Besson) have started my PhD in October 2016, and this project is a part of my on going research since December 2016.

1st article, about policy aggregation algorithm (aka model selection)¶

I designed and added the Aggregator policy, in order to test its validity and performance.

It is a “simple” voting algorithm to combine multiple bandit algorithms into one. Basically, it behaves like a simple MAB bandit just based on empirical means (even simpler than UCB), where arms are the child algorithms A_1 .. A_N, each running in “parallel”.

For more details, refer to this file: Aggregation.md and this research article.

PDF : BKM_IEEEWCNC_2018.pdf | HAL notice : BKM_IEEEWCNC_2018 | BibTeX : BKM_IEEEWCNC_2018.bib | Source code and documentation

2nd article, about Multi-players Multi-Armed Bandits ¶

There is another point of view: instead of comparing different single-player policies on the same problem, we can make them play against each other, in a multi-player setting. The basic difference is about collisions : at each time t, if two or more user chose to sense the same channel, there is a collision. Collisions can be handled in different way from the base station point of view, and from each player point of view.

For more details, refer to this file: MultiPlayers.md and this research article.

PDF : BK__ALT_2018.pdf | HAL notice : BK__ALT_2018 | BibTeX : BK__ALT_2018.bib | Source code and documentation

3rd article, using Doubling Trick for Multi-Armed Bandits ¶

I studied what Doubling Trick can and can’t do to obtain efficient anytime version of non-anytime optimal Multi-Armed Bandits algorithms.

For more details, refer to this file: DoublingTrick.md and this research article.

PDF : BK__DoublingTricks_2018.pdf | HAL notice : BK__DoublingTricks_2018 | BibTeX : BK__DoublingTricks_2018.bib | Source code and documentation

4th article, about Piece-Wise Stationary Multi-Armed Bandits ¶

With Emilie Kaufmann, we studied the Generalized Likelihood Ratio Test (GLRT) for sub-Bernoulli distributions, and proposed the B-GLRT algorithm for change-point detection for piece-wise stationary one-armed bandit problems. We combined the B-GLRT with the kl-UCB multi-armed bandit algorithm and proposed the GLR-klUCB algorithm for piece-wise stationary multi-armed bandit problems. We prove finite-time guarantees for the B-GLRT and the GLR-klUCB algorithm, and we illustrate its performance with extensive numerical experiments.

For more details, refer to this file: NonStationaryBandits.md and this research article.

PDF : BK__COLT_2019.pdf | HAL notice : BK__COLT_2019 | BibTeX : BK__COLT_2019.bib | Source code and documentation

Other interesting things¶

Single-player Policies ¶

More than 65 algorithms, including all known variants of the UCB, kl-UCB, MOSS and Thompson Sampling algorithms, as well as other less known algorithms (https://smpybandits.github.io/docs/OCUCB, BESA, OSSB etc).
SparseWrapper is a generalization of the SparseUCB from this article.
Implementation of very recent Multi-Armed Bandits algorithms, e.g., kl-UCB++ (from this article), UCB-dagger (from this article), or MOSS-anytime (from this article).
Experimental policies: BlackBoxOpt or UnsupervisedLearning (using Gaussian processes to learn the arms distributions).

Arms and problems¶

My framework mainly targets stochastic bandits, with arms following Bernoulli, bounded (truncated) or unbounded Gaussian, Exponential, Gamma or Poisson distributions.
The default configuration is to use a fixed problem for N repetitions (e.g. 1000 repetitions, use MAB.MAB), but there is also a perfect support for “Bayesian” problems where the mean vector µ1,…,µK change at every repetition (see MAB.DynamicMAB).
There is also a good support for Markovian problems, see MAB.MarkovianMAB, even though I didn’t implement any policies tailored for Markovian problems.
I’m actively working on adding a very clean support for non-stationary MAB problems, and MAB.PieceWiseStationaryMAB is already working well. Use it with policies designed for piece-wise stationary problems, like Discounted-Thompson, CD-UCB, M-UCB, SW-UCB#.

📜 License ? ¶

MIT Licensed (file LICENSE).

Note: I have worked on other topics during my PhD, you can find my research articles on my website, or have a look to my Google Scholar profile or résumé on HAL.

PyPI version PyPI implementation