Policies.Posterior.Beta module

Manipulate posteriors of Bernoulli/Beta experiments.

Rewards not in \(\{0, 1\}\) are handled with a trick, see bernoulliBinarization(): a “random binarization”, cf. [Agrawal12] (Algorithm 2). When a reward \(r_t \in [0, 1]\) is observed, the player instead receives the result of a Bernoulli sample of mean \(r_t\), i.e., an observation \(o_t \sim \mathrm{Bernoulli}(r_t)\), which is indeed in \(\{0, 1\}\).



Policies.Posterior.Beta.random() → x in the interval [0, 1).

beta(a, b, size=None)

Draw samples from a Beta distribution.

The Beta distribution is a special case of the Dirichlet distribution, and is related to the Gamma distribution. It has the probability distribution function

\[f(x; a,b) = \frac{1}{B(\alpha, \beta)} x^{\alpha - 1} (1 - x)^{\beta - 1},\]

where the normalization, B, is the beta function,

\[B(\alpha, \beta) = \int_0^1 t^{\alpha - 1} (1 - t)^{\beta - 1} dt.\]

It is often seen in Bayesian inference and order statistics.

Parameters:

  a : float or array_like of floats
      Alpha, positive (> 0).

  b : float or array_like of floats
      Beta, positive (> 0).

  size : int or tuple of ints, optional
      Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if a and b are both scalars. Otherwise, np.broadcast(a, b).size samples are drawn.

Returns:

  out : ndarray or scalar
      Drawn samples from the parameterized Beta distribution.
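As a usage sketch (not part of the module itself): the beta() sampler above can be used directly to draw from a posterior. The counts below (3 positive, 1 negative observation on a flat prior) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical posterior after 3 positive and 1 negative observation,
# starting from a flat Beta(1, 1) prior: Beta(1 + 3, 1 + 1).
a, b = 1 + 3, 1 + 1

samples = rng.beta(a, b, size=10_000)

# The exact posterior mean of Beta(a, b) is a / (a + b) = 4 / 6.
print(samples.mean())  # close to 0.666...
```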


Policies.Posterior.Beta.bernoulliBinarization(r_t)[source]

Return a (random) binarization of a reward \(r_t\): from the continuous interval \([0, 1]\) to an observation in the discrete set \(\{0, 1\}\).

  • Useful to allow a Beta posterior to be used for non-Bernoulli experiments,

  • That way, Thompson sampling can be used for any bounded continuous-valued rewards.


>>> import random
>>> random.seed(0)
>>> bernoulliBinarization(0.3)
>>> bernoulliBinarization(0.3)
>>> bernoulliBinarization(0.3)
>>> bernoulliBinarization(0.3)
>>> bernoulliBinarization(0.9)
>>> bernoulliBinarization(0.9)
>>> bernoulliBinarization(0.9)
>>> bernoulliBinarization(0.9)
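The binarization can be sketched as a single Bernoulli draw of mean \(r_t\). The standalone function below is a hypothetical re-implementation for illustration (the module's actual bernoulliBinarization() may differ in details, e.g., how it handles the boundary cases):

```python
import random

def bernoulli_binarization(r_t):
    """Map a continuous reward r_t in [0, 1] to a random observation in {0, 1},
    drawn as a Bernoulli sample of mean r_t (so E[obs] = r_t)."""
    assert 0.0 <= r_t <= 1.0, "reward must lie in [0, 1]"
    if r_t == 0.0:
        return 0  # no randomness needed at the boundaries
    if r_t == 1.0:
        return 1
    return int(random.random() < r_t)
```

Averaging many such draws recovers \(r_t\), which is why the Beta posterior stays unbiased under this trick.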
class Policies.Posterior.Beta.Beta(a=1, b=1)[source]

Bases: Policies.Posterior.Posterior.Posterior

Manipulate posteriors of Bernoulli/Beta experiments.

__init__(a=1, b=1)[source]

Create a Beta posterior \(\mathrm{Beta}(\alpha, \beta)\) with no observation, i.e., \(\alpha = 1\) and \(\beta = 1\) by default.

N = None

List of two parameters [a, b]


__str__()[source]

Return str(self).

reset(a=None, b=None)[source]

Reset alpha and beta, both to 1 as when creating a new default Beta.


sample()[source]

Get a random sample from the Beta posterior (using random.betavariate()).

  • Used only by Thompson Sampling and AdBandits so far.


quantile(p)[source]

Return the p quantile of the Beta posterior (using scipy.special.btdtri()).

  • Used only by BayesUCB and AdBandits so far.
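As a sketch of how such a quantile is used by a BayesUCB-style index: the inverse Beta CDF can equivalently be computed with scipy.stats.beta.ppf. The counts and the quantile level \(p = 1 - 1/t\) below are illustrative assumptions.

```python
from scipy.stats import beta as beta_dist

# Hypothetical posterior Beta(a, b) after 5 positive and 2 negative
# observations on top of a flat Beta(1, 1) prior.
a, b = 1 + 5, 1 + 2

# BayesUCB-style index at round t: the (1 - 1/t) quantile of the posterior.
t = 100
p = 1.0 - 1.0 / t
index = beta_dist.ppf(p, a, b)  # inverse CDF of Beta(a, b) at level p

print(index)  # a value in (0, 1), above the posterior mean a / (a + b)
```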


mean()[source]

Compute the mean of the Beta posterior (should be useless).


forget(obs)[source]

Forget the last observation obs.


update(obs)[source]

Add an observation obs.

  • If obs is 1, update \(\alpha\) the count of positive observations,

  • If it is 0, update \(\beta\) the count of negative observations.


Otherwise (i.e., for a continuous reward in \([0, 1]\) rather than a binary observation), the bernoulliBinarization() trick has to be used first.
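Putting the pieces together, here is a minimal standalone sketch of such a posterior (a hypothetical class mirroring the update rule described above, not the module's actual Beta class): alpha counts positive observations, beta counts negative ones, and sample() gives the draw used by Thompson sampling.

```python
import random

class BetaPosterior:
    """Minimal Beta posterior sketch: starts from a flat Beta(1, 1) prior,
    a counts positive observations, b counts negative ones."""

    def __init__(self, a=1, b=1):
        self.a = a
        self.b = b

    def update(self, obs):
        # obs must be 0 or 1; continuous rewards need a binarization trick first
        if obs == 1:
            self.a += 1
        else:
            self.b += 1

    def sample(self):
        # One draw from Beta(a, b), as used by Thompson sampling
        return random.betavariate(self.a, self.b)

    def mean(self):
        return self.a / (self.a + self.b)

post = BetaPosterior()
for obs in [1, 1, 0, 1]:
    post.update(obs)
print(post.a, post.b)  # 4 2
print(post.mean())     # 4 / 6 ≈ 0.667
```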

__module__ = 'Policies.Posterior.Beta'