The AdBandits bandit algorithm, mixing Thompson Sampling and BayesUCB.

Warning

This policy is not very well known, but for stochastic bandits it usually works very well. It is not an anytime algorithm, though.

Policies.AdBandits.random() → x in the interval [0, 1).
Policies.AdBandits.ALPHA = 1

Default value of the parameter $$\alpha$$ for the AdBandits class.

class Policies.AdBandits.AdBandits(nbArms, horizon=1000, alpha=1, posterior=<class 'Policies.Posterior.Beta.Beta'>, lower=0.0, amplitude=1.0)[source]

The AdBandits bandit algorithm, mixing Thompson Sampling and BayesUCB.

Warning

This policy is not very well known, but for stochastic bandits it usually works very well. It is not an anytime algorithm, though.

__init__(nbArms, horizon=1000, alpha=1, posterior=<class 'Policies.Posterior.Beta.Beta'>, lower=0.0, amplitude=1.0)[source]

New policy.

alpha = None

Parameter $$\alpha$$, which controls the time-varying probability $$\varepsilon(t)$$ of using a UCB-Bayes step instead of a Thompson Sampling step.

horizon = None

Parameter $$T$$ = known horizon of the experiment. Default value is 1000.

posterior = None

Posterior distribution for each arm. Stored as a list rather than a dict, for quicker access.

__str__()[source]

-> str

startGame()[source]

Reset each posterior.

getReward(arm, reward)[source]

Store the reward, and update the posterior for that arm.
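With a Beta posterior over Bernoulli rewards (the default `posterior` class), this update reduces to incrementing one of the two Beta parameters. A minimal self-contained sketch, assuming binary rewards; the `BetaPosterior` class below is illustrative and is not the library's `Policies.Posterior.Beta` implementation:

```python
import random

class BetaPosterior:
    """Minimal Beta posterior over a Bernoulli arm (illustrative sketch)."""

    def __init__(self, a=1, b=1):
        self.a, self.b = a, b  # Beta(1, 1) is the uniform prior

    def update(self, reward):
        # A binary reward of 1 increments a, a reward of 0 increments b
        self.a += reward
        self.b += 1 - reward

    def sample(self):
        # One Thompson sample from the current posterior
        return random.betavariate(self.a, self.b)

    def mean(self):
        return self.a / (self.a + self.b)

post = BetaPosterior()
for r in [1, 1, 0, 1]:
    post.update(r)
print(post.a, post.b)         # → 4 2
print(round(post.mean(), 2))  # → 0.67
```

`startGame()` then simply re-creates (or resets) one such posterior per arm.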

property epsilon

Time-varying parameter $$\varepsilon(t)$$.

choice()[source]

With probability $$1 - \varepsilon(t)$$, use a Thompson Sampling step, otherwise use a UCB-Bayes step, to choose one arm.
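The decision rule can be sketched in a few lines. This is an illustrative, self-contained sketch, not the library's implementation: the exact formula for $$\varepsilon(t)$$ and the BayesUCB quantile level are assumptions here, and `beta_quantile` is a hypothetical helper that approximates the Beta quantile by Monte Carlo (the library computes it from the posterior directly):

```python
import random

def beta_quantile(a, b, q, n=1000):
    # Monte-Carlo approximation of the Beta(a, b) quantile at level q
    samples = sorted(random.betavariate(a, b) for _ in range(n))
    return samples[int(q * (n - 1))]

def adbandits_choice(posteriors, t, horizon, alpha=1):
    """One AdBandits decision over Beta posteriors given as (a, b) pairs."""
    # Assumed time-varying mixing rate epsilon(t), growing with t
    epsilon = min(1.0, t / (alpha * horizon))
    if random.random() > epsilon:
        # Thompson Sampling step: sample each posterior, pick the best arm
        return max(range(len(posteriors)),
                   key=lambda k: random.betavariate(*posteriors[k]))
    # UCB-Bayes step: pick the arm with the highest posterior quantile
    q = 1.0 - 1.0 / (1 + t)  # assumed quantile level, as in BayesUCB
    return max(range(len(posteriors)),
               key=lambda k: beta_quantile(*posteriors[k], q))

posteriors = [(50, 10), (10, 50), (30, 30)]  # arm 0 looks clearly best
arm = adbandits_choice(posteriors, t=10, horizon=1000)
```

Early in the run $$\varepsilon(t)$$ is small, so the policy behaves mostly like Thompson Sampling; as $$t$$ approaches the known horizon, UCB-Bayes steps become more frequent.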

choiceWithRank(rank=1)[source]

With probability $$1 - \varepsilon(t)$$, use a Thompson Sampling step, otherwise use a UCB-Bayes step, to choose one arm of a given rank.

__module__ = 'Policies.AdBandits'