Policies.OSSB module

Optimal Sampling for Structured Bandits (OSSB) algorithm.

Warning

This is the simplified OSSB algorithm for classical bandits; it can be applied to more general structured bandit problems, see the original paper.

  • OSSB is for Bernoulli stochastic bandits, and GaussianOSSB is for Gaussian stochastic bandits, both direct applications of the results from the paper.

  • The SparseOSSB is for sparse Gaussian (or sub-Gaussian) stochastic bandits, of known variance.

  • I also added support for non-constant \(\varepsilon\) and \(\gamma\) rates, as suggested in a talk given by Combes on the 24th of May 2018 in Rotterdam (workshop “Learning while Earning”). See OSSB_DecreasingRate and OSSB_AutoDecreasingRate.

Policies.OSSB.klGauss_vect(xs, y, sig2x=0.25)[source]
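
This entry has no docstring. As a hint, here is a minimal sketch of what a vectorized Gaussian KL divergence with fixed variance sig2x may compute, assuming the standard closed form \(\mathrm{kl}(x, y) = (x - y)^2 / (2 \sigma^2)\); this is an illustration, not necessarily the library's exact implementation:

```python
import numpy as np

def klGauss_vect(xs, y, sig2x=0.25):
    """KL divergence between N(x, sig2x) and N(y, sig2x), for each x in xs (sketch)."""
    xs = np.asarray(xs)
    return (xs - y) ** 2 / (2.0 * sig2x)
```
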
class Policies.OSSB.Phase

Bases: enum.Enum

Different phases during the OSSB algorithm

__module__ = 'Policies.OSSB'
initialisation = 1
exploitation = 2
estimation = 3
exploration = 4
Policies.OSSB.EPSILON = 0.0

Default value for the \(\varepsilon\) parameter, 0.0 is a safe default.

Policies.OSSB.GAMMA = 0.0

Default value for the \(\gamma\) parameter, 0.0 is a safe default.

Policies.OSSB.solve_optimization_problem__classic(thetas)[source]

Solve the optimization problem (2)-(3) as defined in the paper, for classical stochastic bandits.

  • No need to solve anything: the paper gives the closed-form solution for classical bandits; see the sketch below.
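
A hedged sketch of that closed-form solution for Bernoulli arms: the target number of pulls of each suboptimal arm \(i\) is \(c_i = 1 / \mathrm{kl}(\theta_i, \theta^*)\), with \(\mathrm{kl}\) the Bernoulli KL divergence. The clipping constant eps and the tie handling here are illustrative assumptions:

```python
import numpy as np

def klBern(x, y, eps=1e-7):
    # Bernoulli KL divergence, clipped away from 0 and 1 for numerical safety.
    x = min(max(x, eps), 1 - eps)
    y = min(max(y, eps), 1 - eps)
    return x * np.log(x / y) + (1 - x) * np.log((1 - x) / (1 - y))

def solve_classic(thetas):
    # Closed-form solution of (2)-(3) for unstructured Bernoulli bandits:
    # c_i = 1 / kl(theta_i, theta_max) for suboptimal arms, 0 for optimal ones.
    thetas = np.asarray(thetas)
    theta_max = np.max(thetas)
    return np.array([0.0 if t >= theta_max else 1.0 / klBern(t, theta_max)
                     for t in thetas])
```
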

Policies.OSSB.solve_optimization_problem__gaussian(thetas, sig2x=0.25)[source]

Solve the optimization problem (2)-(3) as defined in the paper, for Gaussian classical stochastic bandits.

  • No need to solve anything: the paper gives the closed-form solution for Gaussian classical bandits; see the sketch below.
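
For Gaussian arms the KL divergence is \(\mathrm{kl}(x, y) = (x - y)^2 / (2 \sigma^2)\), so the closed-form solution specializes to \(c_i = 2 \sigma^2 / (\theta^* - \theta_i)^2\) for suboptimal arms. A minimal sketch (the library's edge-case handling may differ):

```python
import numpy as np

def solve_gaussian(thetas, sig2x=0.25):
    # c_i = 1 / klGauss(theta_i, theta_max) = 2 * sig2x / (theta_max - theta_i)^2
    thetas = np.asarray(thetas)
    gaps = np.max(thetas) - thetas
    # Guard the division for the optimal arm(s), whose gap is 0 and c_i is 0.
    return np.where(gaps > 0, 2.0 * sig2x / np.maximum(gaps, 1e-12) ** 2, 0.0)
```
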

Policies.OSSB.solve_optimization_problem__sparse_bandits(thetas, sparsity=None, only_strong_or_weak=False)[source]

Solve the optimization problem (2)-(3) as defined in the paper, for sparse stochastic bandits.

  • I recomputed the suboptimal solution to the optimization problem, and found the same solution as in [[“Sparse Stochastic Bandits”, by J. Kwon, V. Perchet & C. Vernade, COLT 2017](https://arxiv.org/abs/1706.01383)].

  • If only_strong_or_weak is True, the solutions \(c_i\) are not returned; instead the pair (strong_or_weak, k) is returned, to know whether the problem is strongly sparse or not, and if not, the k that satisfies the required constraint.

class Policies.OSSB.OSSB(nbArms, epsilon=0.0, gamma=0.0, solve_optimization_problem='classic', lower=0.0, amplitude=1.0, **kwargs)[source]

Bases: Policies.BasePolicy.BasePolicy

Optimal Sampling for Structured Bandits (OSSB) algorithm.

  • solve_optimization_problem can be "classic" or "bernoulli" for classical stochastic bandits with no structure, "gaussian" for classical bandits with Gaussian arms, or "sparse" for sparse stochastic bandits (pass the sparsity s as a keyword argument). A usage sketch is given after this list.

  • Reference: [[Minimal Exploration in Structured Stochastic Bandits, Combes et al, arXiv:1711.00400 [stat.ML]]](https://arxiv.org/abs/1711.00400)
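
A hypothetical usage sketch on a toy simulated Bernoulli problem, assuming SMPyBandits is importable; it only uses the interface documented below (startGame, choice, getReward):

```python
import numpy as np
from Policies.OSSB import OSSB

means = [0.1, 0.5, 0.9]            # toy Bernoulli arm means (illustrative)
policy = OSSB(nbArms=len(means), epsilon=0.0, gamma=0.0)
policy.startGame()

rng = np.random.default_rng(42)
for t in range(1000):
    arm = policy.choice()                      # OSSB picks an arm
    reward = float(rng.random() < means[arm])  # simulated Bernoulli reward
    policy.getReward(arm, reward)              # update the policy
```
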

__init__(nbArms, epsilon=0.0, gamma=0.0, solve_optimization_problem='classic', lower=0.0, amplitude=1.0, **kwargs)[source]

New policy.

epsilon = None

Parameter \(\varepsilon\) for the OSSB algorithm; it can be 0.

gamma = None

Parameter \(\gamma\) for the OSSB algorithm; it can be 0.

counter_s_no_exploitation_phase = None

Counter \(s\) of the number of phases without exploitation.

phase = None

Categorical variable storing the current phase of the algorithm (a Phase value).

__str__()[source]

-> str

startGame()[source]

Start the game (fill pulls and rewards with 0).

getReward(arm, reward)[source]

Give a reward: increase t, pulls, and update the cumulated sum of rewards for that arm (reward normalized in [0, 1]).

choice()[source]

Apply the OSSB procedure; it is quite involved, see the original paper for details (and the sketch below for intuition).
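
For intuition, a hedged standalone sketch of one decision step, following Algorithm 1 of the paper: exploit when every arm \(i\) satisfies \(N_i(t) \geq c_i(\hat\theta)(1+\gamma)\log t\); otherwise increment the counter \(s\) and either estimate (pull the least-sampled arm, if it was pulled at most \(\varepsilon s\) times) or explore (pull the arm most under-sampled relative to its target \(c_i\)). This is illustrative, not the class's actual code:

```python
import numpy as np

def ossb_choice(pulls, means_hat, t, s, epsilon, gamma, solve):
    """One OSSB decision step (illustrative sketch of the paper's Algorithm 1).

    pulls: pull counts N_i(t); means_hat: empirical means; t: time step (>= 1);
    s: counter of non-exploitation phases; solve: optimization solver, e.g.
    solve_optimization_problem__classic. Returns (chosen_arm, updated_s)."""
    pulls = np.asarray(pulls, dtype=float)
    c = solve(means_hat)  # target pull proportions c_i(theta_hat)
    if np.all(pulls >= c * (1.0 + gamma) * np.log(t)):
        # Exploitation phase: play the empirically best arm.
        return int(np.argmax(means_hat)), s
    s += 1
    least = int(np.argmin(pulls))
    if pulls[least] <= epsilon * s:
        # Estimation phase: play the least-sampled arm.
        return least, s
    # Exploration phase: play the arm most under-sampled relative to c_i.
    return int(np.argmin(pulls / np.maximum(c, 1e-12))), s
```
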

handleCollision(arm, reward=None)[source]

Nothing special to do.

__module__ = 'Policies.OSSB'
class Policies.OSSB.GaussianOSSB(nbArms, epsilon=0.0, gamma=0.0, variance=0.25, lower=0.0, amplitude=1.0, **kwargs)[source]

Bases: Policies.OSSB.OSSB

Optimal Sampling for Structured Bandits (OSSB) algorithm, for Gaussian Stochastic Bandits.

__init__(nbArms, epsilon=0.0, gamma=0.0, variance=0.25, lower=0.0, amplitude=1.0, **kwargs)[source]

New policy.

__module__ = 'Policies.OSSB'
class Policies.OSSB.SparseOSSB(nbArms, epsilon=0.0, gamma=0.0, sparsity=None, lower=0.0, amplitude=1.0, **kwargs)[source]

Bases: Policies.OSSB.OSSB

Optimal Sampling for Structured Bandits (OSSB) algorithm, for Sparse Stochastic Bandits.

__init__(nbArms, epsilon=0.0, gamma=0.0, sparsity=None, lower=0.0, amplitude=1.0, **kwargs)[source]

New policy.

__module__ = 'Policies.OSSB'
Policies.OSSB.DECREASINGRATE = 1e-06

Default value for the constant controlling the decreasing rate.

class Policies.OSSB.OSSB_DecreasingRate(nbArms, epsilon=0.0, gamma=0.0, decreasingRate=1e-06, lower=0.0, amplitude=1.0, **kwargs)[source]

Bases: Policies.OSSB.OSSB

Optimal Sampling for Structured Bandits (OSSB) algorithm, with decreasing rates for both \(\varepsilon\) and \(\gamma\).

Warning

This is purely experimental: the paper does not discuss how to choose decreasing rates. It is inspired by the rates for the Exp3 algorithm, cf. [Bubeck & Cesa-Bianchi, 2012](http://sbubeck.com/SurveyBCB12.pdf).

__init__(nbArms, epsilon=0.0, gamma=0.0, decreasingRate=1e-06, lower=0.0, amplitude=1.0, **kwargs)[source]

New policy.

__str__()[source]

-> str

property epsilon

Decreasing \(\varepsilon(t) = \min(1, \varepsilon_0 \exp(- t \tau))\), where \(\tau\) is the decreasingRate constant.

__module__ = 'Policies.OSSB'
property gamma

Decreasing \(\gamma(t) = \min(1, \gamma_0 \exp(- t \tau))\).
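
A minimal sketch of how these two properties may be implemented, assuming (hypothetically) that self.t counts time steps, self._epsilon and self._gamma store the initial values \(\varepsilon_0, \gamma_0\), and self._decreasingRate is the constant \(\tau\):

```python
import numpy as np

class _DecreasingRates:
    # Illustrative stand-in for OSSB_DecreasingRate's rate computation.
    def __init__(self, epsilon0=0.0, gamma0=0.0, decreasingRate=1e-6):
        self._epsilon, self._gamma = epsilon0, gamma0
        self._decreasingRate = decreasingRate
        self.t = 1

    @property
    def epsilon(self):
        # epsilon(t) = min(1, epsilon_0 * exp(-t * tau))
        return min(1.0, self._epsilon * np.exp(-self.t * self._decreasingRate))

    @property
    def gamma(self):
        # gamma(t) = min(1, gamma_0 * exp(-t * tau))
        return min(1.0, self._gamma * np.exp(-self.t * self._decreasingRate))
```
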

class Policies.OSSB.OSSB_AutoDecreasingRate(nbArms, lower=0.0, amplitude=1.0, **kwargs)[source]

Bases: Policies.OSSB.OSSB

Optimal Sampling for Structured Bandits (OSSB) algorithm, with automatically-tuned decreasing rates for both \(\varepsilon\) and \(\gamma\).

Warning

This is purely experimental: the paper does not discuss how to choose decreasing rates. It is inspired by the rates of the Exp3++ algorithm, [[One practical algorithm for both stochastic and adversarial bandits, Y. Seldin & A. Slivkins, ICML 2014](http://www.jmlr.org/proceedings/papers/v32/seldinb14-supp.pdf)].

__module__ = 'Policies.OSSB'
__init__(nbArms, lower=0.0, amplitude=1.0, **kwargs)[source]

New policy.

__str__()[source]

-> str

property epsilon

Decreasing \(\varepsilon(t) = \frac{1}{2} \sqrt{\frac{\log(K)}{t K}}\).

property gamma

Decreasing \(\gamma(t) = \frac{1}{2} \sqrt{\frac{\log(K)}{t K}}\).
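
Both rates share the same Exp3++-inspired schedule; a one-function sketch, with nbArms playing the role of \(K\):

```python
import numpy as np

def auto_rate(t, nbArms):
    # epsilon(t) = gamma(t) = 0.5 * sqrt(log(K) / (t * K))
    return 0.5 * np.sqrt(np.log(nbArms) / (t * nbArms))
```
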