List of research publications using Lilian Besson’s SMPyBandits project
1st article, about policy aggregation algorithm (aka model selection)
I designed and added the Aggregator policy, in order to test its validity and performance.
It is a “simple” voting algorithm to combine multiple bandit algorithms into one.
Basically, it behaves like a simple bandit algorithm based only on empirical means (even simpler than UCB), where the arms are the child algorithms A_1 .. A_N, each running in “parallel”.
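The idea can be illustrated with a minimal sketch. It is not the Aggregator implementation from SMPyBandits: the class names `FixedArmPolicy` and `EmpiricalMeansAggregator`, the `choice` / `get_reward` method names, and the purely greedy selection rule are all assumptions made for this example. Each child proposes an arm, the aggregator follows the child with the best empirical mean, and every child observes the outcome.

```python
class FixedArmPolicy:
    """Toy child algorithm (hypothetical): always recommends the same arm."""

    def __init__(self, arm):
        self.arm = arm

    def choice(self):
        return self.arm

    def get_reward(self, arm, reward):
        pass  # a real child algorithm would update its own statistics here


class EmpiricalMeansAggregator:
    """Follows the child whose recommendations earned the best empirical mean."""

    def __init__(self, children):
        self.children = children
        self.pulls = [0] * len(children)   # times each child was followed
        self.sums = [0.0] * len(children)  # reward collected when following it
        self.last_child = None

    def choice(self):
        # Every child proposes an arm: they all run "in parallel".
        proposals = [child.choice() for child in self.children]
        untried = [i for i, n in enumerate(self.pulls) if n == 0]
        if untried:
            # Follow each child once before comparing empirical means.
            self.last_child = untried[0]
        else:
            means = [s / n for s, n in zip(self.sums, self.pulls)]
            self.last_child = max(range(len(means)), key=means.__getitem__)
        return proposals[self.last_child]

    def get_reward(self, arm, reward):
        # Credit the child that was followed, then inform all children.
        self.pulls[self.last_child] += 1
        self.sums[self.last_child] += reward
        for child in self.children:
            child.get_reward(arm, reward)


def run_demo(horizon=50, arm_rewards=(0.2, 0.8)):
    """Deterministic rewards, so the outcome is easy to check by hand."""
    aggregator = EmpiricalMeansAggregator([FixedArmPolicy(0), FixedArmPolicy(1)])
    for _ in range(horizon):
        arm = aggregator.choice()
        aggregator.get_reward(arm, arm_rewards[arm])
    return aggregator.pulls
```

With random rewards, a purely greedy rule like this one can lock onto a poor child after an unlucky start, which is one reason a practical aggregation rule needs some form of exploration or weighting.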
2nd article, about Multi-players Multi-Armed Bandits
There is another point of view: instead of comparing different single-player policies on the same problem, we can make them play against each other, in a multi-player setting.
The basic difference is about collisions: at each time t, if two or more users choose to sense the same channel, there is a collision. Collisions can be handled in different ways, both from the base station's point of view and from each player's point of view.
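A small sketch of one possible collision model follows. The function names and the rule "a player earns the channel reward only if it is alone on that channel" are assumptions chosen for illustration; other ways of handling collisions are possible.

```python
from collections import Counter


def count_collisions(choices):
    """Return {channel: number of players} for channels chosen by 2+ players."""
    counts = Counter(choices)
    return {channel: n for channel, n in counts.items() if n >= 2}


def rewards_without_collision(choices, channel_rewards):
    """Assumed model: a player gets the channel reward only if alone on it."""
    counts = Counter(choices)
    return [channel_rewards[c] if counts[c] == 1 else 0.0 for c in choices]


# choices[j] is the channel sensed by player j at the current time step.
choices = [0, 2, 2, 1]
collisions = count_collisions(choices)
rewards = rewards_without_collision(choices, [1.0, 0.5, 0.9])
```

Here players 1 and 2 both sense channel 2, so they collide and receive nothing under this model, while players 0 and 3 keep their rewards.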
3rd article, using Doubling Trick for Multi-Armed Bandits
I studied what the Doubling Trick can and can’t do to obtain efficient anytime versions of non-anytime optimal Multi-Armed Bandits algorithms.
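As a rough sketch of the mechanism, assuming the classic geometric variant where epoch lengths double (T_0, 2 T_0, 4 T_0, ...): a fresh copy of the horizon-dependent algorithm is started at the beginning of each epoch, tuned for that epoch's length. The helper names `doubling_horizons`, `run_with_doubling`, `make_policy` and `step` are hypothetical, and other growth sequences are possible.

```python
def doubling_horizons(first_horizon, total_steps):
    """Yield epoch lengths T_0, 2*T_0, 4*T_0, ... until total_steps is covered."""
    horizon, elapsed = first_horizon, 0
    while elapsed < total_steps:
        yield horizon
        elapsed += horizon
        horizon *= 2


def run_with_doubling(make_policy, step, first_horizon, total_steps):
    """Restart a non-anytime policy at each epoch; return the number of restarts.

    make_policy(horizon) builds a fresh policy tuned for that horizon, and
    step(policy) plays one round with it (both supplied by the caller).
    """
    t, restarts = 0, 0
    for horizon in doubling_horizons(first_horizon, total_steps):
        policy = make_policy(horizon)  # all past observations are forgotten
        restarts += 1
        for _ in range(min(horizon, total_steps - t)):
            step(policy)
            t += 1
    return restarts
```

The number of restarts grows only logarithmically with the total number of steps, which is what makes the construction cheap; whether the resulting anytime algorithm keeps the regret guarantees of the original one is a separate question.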
4th article, about Piece-Wise Stationary Multi-Armed Bandits
With Emilie Kaufmann, we studied the Generalized Likelihood Ratio Test (GLRT) for sub-Bernoulli distributions, and proposed the B-GLRT algorithm for change-point detection for piece-wise stationary one-armed bandit problems. We combined the B-GLRT with the kl-UCB multi-armed bandit algorithm and proposed the GLR-klUCB algorithm for piece-wise stationary multi-armed bandit problems. We prove finite-time guarantees for the B-GLRT and the GLR-klUCB algorithm, and we illustrate its performance with extensive numerical experiments.
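To make the change-point test concrete, here is a minimal sketch of a Bernoulli GLR statistic: for every split point s, it compares the empirical means before and after s with the overall mean, using the binary Kullback-Leibler divergence. The function names are hypothetical, and the threshold is left as a free parameter rather than the calibrated threshold from the article.

```python
import math


def kl_bernoulli(p, q, eps=1e-12):
    """Binary KL divergence kl(p, q), with inputs clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def glr_statistic(samples):
    """Max over s of  s * kl(mean[:s], mean_all) + (n - s) * kl(mean[s:], mean_all)."""
    n = len(samples)
    if n < 2:
        return 0.0  # no split point to test
    total = sum(samples)
    mean_all = total / n
    best, left_sum = 0.0, 0.0
    for s in range(1, n):
        left_sum += samples[s - 1]
        mean_left = left_sum / s
        mean_right = (total - left_sum) / (n - s)
        stat = (s * kl_bernoulli(mean_left, mean_all)
                + (n - s) * kl_bernoulli(mean_right, mean_all))
        best = max(best, stat)
    return best


def detects_change(samples, threshold):
    """Declare a change-point when the statistic exceeds the given threshold."""
    return glr_statistic(samples) > threshold
```

For instance, fifty 0s followed by fifty 1s give a large statistic, while a sequence alternating 1, 0, 1, 0, ... stays close to zero, so a moderate threshold separates the two.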
Other interesting things
More than 65 algorithms, including all known variants of the MOSS and Thompson Sampling algorithms, as well as other lesser-known algorithms (https://smpybandits.github.io/docs/
Arms and problems
The default configuration is to use a fixed problem for N repetitions (e.g. 1000 repetitions, use MAB.MAB), but there is also full support for “Bayesian” problems where the mean vector µ1,…,µK changes at every repetition (see
There is also good support for Markovian problems, see MAB.MarkovianMAB, even though I didn’t implement any policies tailored for Markovian problems.
I’m actively working on adding very clean support for non-stationary MAB problems, and MAB.PieceWiseStationaryMAB is already working well. Use it with policies designed for piece-wise stationary problems, like Discounted-Thompson, CD-UCB, M-UCB, SW-UCB#.
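For readers unfamiliar with the setting, the following standalone sketch shows what a piece-wise stationary Bernoulli problem looks like: the mean vector is constant between change-points and jumps at each of them. It does not use the MAB.PieceWiseStationaryMAB class or its interface; the helper names and the representation by `change_points` and `means_per_segment` are assumptions made for this example.

```python
import bisect
import random


def means_at(t, change_points, means_per_segment):
    """Mean vector in force at time t.

    change_points is the sorted list of times at which a new segment starts,
    so means_per_segment must contain len(change_points) + 1 mean vectors.
    """
    segment = bisect.bisect_right(change_points, t)
    return means_per_segment[segment]


def draw_bernoulli(arm, t, change_points, means_per_segment, rng=random):
    """Sample a 0/1 reward for the given arm at time t."""
    mean = means_at(t, change_points, means_per_segment)[arm]
    return 1 if rng.random() < mean else 0


# Two arms whose roles swap at t = 100, then become identical at t = 200.
change_points = [100, 200]
means_per_segment = [[0.1, 0.9], [0.9, 0.1], [0.5, 0.5]]
reward = draw_bernoulli(0, 150, change_points, means_per_segment)
```

A policy that never forgets old observations keeps favouring arm 1 after t = 100 in this example, which is why policies designed for piece-wise stationary problems discard, discount, or reset their statistics.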