List of research publications using Lilian Besson’s SMPyBandits project¶
1st article, about policy aggregation algorithm (aka model selection)¶
I designed and added the Aggregator policy, in order to test its validity and performance.
It is a “simple” voting algorithm to combine multiple bandit algorithms into one.
Basically, it behaves like a simple bandit algorithm based only on empirical means (even simpler than UCB), where the arms are the child algorithms A_1 .. A_N, each running in “parallel”.
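The idea can be illustrated with a minimal sketch. It is not the code of the Aggregator policy in SMPyBandits: the classes `EpsilonGreedy` and `MeanBasedAggregator`, the exploration rate and the arm means are all hypothetical, and the real policy may weight its children differently. The sketch only shows a meta-policy whose "arms" are child algorithms scored by their empirical mean reward.

```python
import random


class EpsilonGreedy:
    """Hypothetical child policy: epsilon-greedy on empirical means."""

    def __init__(self, nb_arms, epsilon, rng):
        self.nb_arms = nb_arms
        self.epsilon = epsilon
        self.rng = rng
        self.pulls = [0] * nb_arms
        self.sums = [0.0] * nb_arms

    def choice(self):
        # Try every arm once, then explore with probability epsilon.
        for arm in range(self.nb_arms):
            if self.pulls[arm] == 0:
                return arm
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.nb_arms)
        return max(range(self.nb_arms), key=lambda a: self.sums[a] / self.pulls[a])

    def update(self, arm, reward):
        self.pulls[arm] += 1
        self.sums[arm] += reward


class MeanBasedAggregator:
    """Meta-policy (assumption of this sketch): each child is an arm,
    scored by the empirical mean reward of the proposals it got right."""

    def __init__(self, children, rng, explore=0.1):
        self.children = children
        self.rng = rng
        self.explore = explore
        self.credited = [0] * len(children)
        self.gains = [0.0] * len(children)
        self.proposals = []

    def choice(self):
        # Every child proposes an arm, as if running in parallel.
        self.proposals = [child.choice() for child in self.children]
        untested = [i for i, n in enumerate(self.credited) if n == 0]
        if untested:
            leader = untested[0]
        elif self.rng.random() < self.explore:
            leader = self.rng.randrange(len(self.children))
        else:
            leader = max(range(len(self.children)),
                         key=lambda i: self.gains[i] / self.credited[i])
        return self.proposals[leader]

    def update(self, arm, reward):
        # Credit every child that proposed the arm actually played.
        for i, proposal in enumerate(self.proposals):
            if proposal == arm:
                self.credited[i] += 1
                self.gains[i] += reward
        # All children observe the reward of the played arm.
        for child in self.children:
            child.update(arm, reward)


def simulate(horizon=2000, seed=0):
    rng = random.Random(seed)
    means = [0.2, 0.5, 0.8]  # hypothetical Bernoulli arm means
    children = [EpsilonGreedy(len(means), eps, rng) for eps in (0.01, 0.1, 0.5)]
    aggregator = MeanBasedAggregator(children, rng)
    pulls = [0] * len(means)
    for _ in range(horizon):
        arm = aggregator.choice()
        reward = 1.0 if rng.random() < means[arm] else 0.0
        aggregator.update(arm, reward)
        pulls[arm] += 1
    return pulls
```

Calling `simulate()` returns the number of pulls of each arm. In this sketch every child observes every reward, so all children learn from the same history even when their own proposal was not followed.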
2nd article, about Multi-players Multi-Armed Bandits¶
There is another point of view: instead of comparing different single-player policies on the same problem, we can make them play against each other, in a multi-player setting.
The basic difference is about collisions: at each time t, if two or more users choose to sense the same channel, there is a collision. Collisions can be handled in different ways from the base station's point of view and from each player's point of view.
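As an illustration, here is a minimal sketch of one possible collision model, in which every player involved in a collision receives no reward. The function names `find_collisions` and `player_rewards` are hypothetical, and this zero-reward rule is only one of the ways collisions can be handled.

```python
from collections import Counter


def find_collisions(choices):
    """For each player, tell whether the channel it chose was also chosen by another player."""
    counts = Counter(choices)
    return [counts[channel] > 1 for channel in choices]


def player_rewards(choices, channel_rewards):
    """Assumed collision model: colliding players get 0, others get the channel reward."""
    collided = find_collisions(choices)
    return [0.0 if hit else channel_rewards[channel]
            for channel, hit in zip(choices, collided)]


# Four players, three channels: players 1 and 2 both sense channel 2.
choices = [0, 2, 2, 1]
rewards = player_rewards(choices, channel_rewards=[1.0, 1.0, 1.0])
print(rewards)  # [1.0, 0.0, 0.0, 1.0]
```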
3rd article, using Doubling Trick for Multi-Armed Bandits¶
I studied what the Doubling Trick can and can’t do to obtain efficient anytime versions of non-anytime optimal Multi-Armed Bandits algorithms.
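The mechanism can be sketched as follows, assuming a geometric sequence of breakpoints T_i = T_0 * 2^i and a full restart (no memory kept) at each breakpoint. The wrapper `DoublingTrick` and the horizon-dependent child `ExploreThenCommit` are hypothetical names written for this example, not the SMPyBandits classes.

```python
class ExploreThenCommit:
    """Hypothetical horizon-dependent policy: round-robin exploration, then commit."""

    def __init__(self, nb_arms, horizon):
        self.nb_arms = nb_arms
        # Exploration length depends on the horizon (illustrative choice).
        self.explore_steps = max(nb_arms, int(horizon ** (2 / 3)))
        self.t = 0
        self.pulls = [0] * nb_arms
        self.sums = [0.0] * nb_arms

    def choice(self):
        if self.t < self.explore_steps:
            return self.t % self.nb_arms
        return max(range(self.nb_arms), key=lambda a: self.sums[a] / self.pulls[a])

    def update(self, arm, reward):
        self.t += 1
        self.pulls[arm] += 1
        self.sums[arm] += reward


class DoublingTrick:
    """Anytime wrapper: restart the child at breakpoints T_0, 2*T_0, 4*T_0, ..."""

    def __init__(self, make_policy, first_horizon=100):
        self.make_policy = make_policy
        self.t = 0
        self.restarts = 0
        self.next_breakpoint = first_horizon
        self.policy = make_policy(first_horizon)

    def choice(self):
        if self.t >= self.next_breakpoint:
            # The next phase lasts from T_i to 2*T_i, so its length is T_i.
            phase_length = self.next_breakpoint
            self.next_breakpoint *= 2
            self.policy = self.make_policy(phase_length)  # full restart
            self.restarts += 1
        return self.policy.choice()

    def update(self, arm, reward):
        self.t += 1
        self.policy.update(arm, reward)
```

The wrapper never needs to know the total horizon: the child is always told the length of the current phase only.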
4th article, about Piece-Wise Stationary Multi-Armed Bandits¶
With Emilie Kaufmann, we studied the Generalized Likelihood Ratio Test (GLRT) for sub-Bernoulli distributions, and proposed the B-GLRT algorithm for change-point detection for piece-wise stationary one-armed bandit problems. We combined the B-GLRT with the kl-UCB multi-armed bandit algorithm and proposed the GLR-klUCB algorithm for piece-wise stationary multi-armed bandit problems. We prove finite-time guarantees for the B-GLRT and the GLR-klUCB algorithm, and we illustrate its performance with extensive numerical experiments.
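A minimal sketch of a Bernoulli GLR change-point test is given below. It compares every split of the observed sequence into "before" and "after" segments, using the Bernoulli Kullback-Leibler divergence. The function names are hypothetical, and the threshold log(3 n^(3/2) / delta) is an assumption of this sketch rather than the calibrated threshold analysed in the article.

```python
import math


def kl_bernoulli(p, q, eps=1e-12):
    """Kullback-Leibler divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def glr_statistic(samples):
    """Max over split points s of s*kl(mean_1:s, mean_1:n) + (n-s)*kl(mean_s+1:n, mean_1:n)."""
    n = len(samples)
    total = sum(samples)
    mean_all = total / n
    best = 0.0
    left_sum = 0.0
    for s in range(1, n):
        left_sum += samples[s - 1]
        mean_left = left_sum / s
        mean_right = (total - left_sum) / (n - s)
        stat = (s * kl_bernoulli(mean_left, mean_all)
                + (n - s) * kl_bernoulli(mean_right, mean_all))
        best = max(best, stat)
    return best


def threshold(n, delta):
    # Assumed threshold function, chosen for illustration only.
    return math.log(3 * n * math.sqrt(n) / delta)


def detect_change(samples, delta=0.05):
    """Return the first time n at which the test fires, or None if it never does."""
    for n in range(2, len(samples) + 1):
        if glr_statistic(samples[:n]) > threshold(n, delta):
            return n
    return None


# A sequence of 50 zeros followed by 50 ones: the test fires shortly after the change.
print(detect_change([0] * 50 + [1] * 50))
# A constant sequence: no change is detected.
print(detect_change([0] * 100))  # None
```

This naive version recomputes the statistic from scratch at every step, which costs quadratic time overall. It is meant to show the test itself, not an efficient implementation.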
Other interesting things¶
- More than 65 algorithms, including all known variants of the MOSS and Thompson Sampling algorithms, as well as other less known algorithms (https://smpybandits.github.io/docs/
- SparseWrapper is a generalization of the SparseUCB from this article.
- Implementation of very recent Multi-Armed Bandits algorithms, e.g., kl-UCB++ (from this article), UCB-dagger (from this article), or MOSS-anytime (from this article).
- Experimental policies: UnsupervisedLearning (using Gaussian processes to learn the arms distributions).
Arms and problems¶
- My framework mainly targets stochastic bandits, with arms following Bernoulli, bounded (truncated) or unbounded
- The default configuration is to use a fixed problem for N repetitions (e.g. 1000 repetitions, use MAB.MAB), but there is also full support for “Bayesian” problems where the mean vector µ1,…,µK changes at every repetition (see
- There is also good support for Markovian problems, see MAB.MarkovianMAB, even though I didn’t implement any policies tailored for Markovian problems.
- I’m actively working on adding clean support for non-stationary MAB problems, and MAB.PieceWiseStationaryMAB is already working well. Use it with policies designed for piece-wise stationary problems, like Discounted-Thompson, CD-UCB, M-UCB, SW-UCB#.
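To make the notion of a piece-wise stationary problem concrete, here is a minimal standalone sketch of a Bernoulli problem whose mean vector is constant between change points. The class `PiecewiseBernoulliProblem` and its parameters are hypothetical and do not reproduce the interface of MAB.PieceWiseStationaryMAB.

```python
import bisect
import random


class PiecewiseBernoulliProblem:
    """Bernoulli arms whose means are constant between change points (illustrative only)."""

    def __init__(self, means_per_segment, change_points, seed=None):
        # change_points[k] is the first time step of segment k + 1.
        if len(means_per_segment) != len(change_points) + 1:
            raise ValueError("need exactly one more segment than change points")
        if list(change_points) != sorted(change_points):
            raise ValueError("change points must be sorted")
        self.means_per_segment = means_per_segment
        self.change_points = list(change_points)
        self.rng = random.Random(seed)

    def means(self, t):
        """Mean vector in force at time t."""
        segment = bisect.bisect_right(self.change_points, t)
        return self.means_per_segment[segment]

    def draw(self, arm, t):
        """Sample a 0/1 reward for the given arm at time t."""
        return 1.0 if self.rng.random() < self.means(t)[arm] else 0.0


# Two arms whose roles swap at t = 100 and swap back at t = 200.
problem = PiecewiseBernoulliProblem(
    means_per_segment=[[0.1, 0.9], [0.9, 0.1], [0.1, 0.9]],
    change_points=[100, 200],
    seed=0,
)
print(problem.means(99), problem.means(100), problem.means(200))
# [0.1, 0.9] [0.9, 0.1] [0.1, 0.9]
```

A stationary policy that has committed to arm 1 before t = 100 keeps playing a bad arm afterwards, which is why policies with an explicit forgetting or change-detection mechanism are needed on such problems.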
📜 License¶
© 2016-2018 Lilian Besson.