Policies.Posterior.Beta module

Manipulate posteriors of Bernoulli/Beta experiments.

Rewards not in $$\{0, 1\}$$ are handled with a trick, see bernoulliBinarization(), with a “random binarization”, cf. [Agrawal12] (Algorithm 2). When a reward $$r_t \in [0, 1]$$ is observed, the player receives the result of a Bernoulli sample with mean $$r_t$$: an observation drawn from $$\mathrm{Bernoulli}(r_t)$$, so it lies in $$\{0, 1\}$$.

Agrawal12

http://jmlr.org/proceedings/papers/v23/agrawal12/agrawal12.pdf

Policies.Posterior.Beta.random() → x in the interval [0, 1).
Policies.Posterior.Beta.betavariate()

beta(a, b, size=None)

Draw samples from a Beta distribution.

The Beta distribution is a special case of the Dirichlet distribution, and is related to the Gamma distribution. It has the probability distribution function

$f(x; \alpha, \beta) = \frac{1}{B(\alpha, \beta)} x^{\alpha - 1} (1 - x)^{\beta - 1},$

where the normalization, B, is the beta function,

$B(\alpha, \beta) = \int_0^1 t^{\alpha - 1} (1 - t)^{\beta - 1} dt.$

It is often seen in Bayesian inference and order statistics.

Parameters

a : float or array_like of floats

Alpha, positive (> 0).

b : float or array_like of floats

Beta, positive (> 0).

size : int or tuple of ints, optional

Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if a and b are both scalars. Otherwise, np.broadcast(a, b).size samples are drawn.

Returns

out : ndarray or scalar

Drawn samples from the parameterized Beta distribution.
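As a quick illustration of these parameters, the sketch below draws Beta samples with NumPy. It uses the modern numpy.random.default_rng() Generator interface rather than the legacy numpy.random.beta, but both take the same a, b, size arguments:

```python
import numpy as np

rng = np.random.default_rng(42)

# Draw 10,000 samples from Beta(2, 5); the empirical mean should be
# close to the theoretical mean alpha / (alpha + beta) = 2 / 7.
samples = rng.beta(2.0, 5.0, size=10_000)
print(samples.shape)

# Broadcasting: array-valued a with scalar b gives one sample per pair.
out = rng.beta([1.0, 2.0, 3.0], 1.0)
print(out.shape)
```

With size=None and array-valued parameters, np.broadcast(a, b).size samples are drawn, as stated above.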

Policies.Posterior.Beta.bernoulliBinarization(r_t)[source]

Return a (random) binarization of a reward $$r_t$$, from the continuous interval $$[0, 1]$$, as an observation in the discrete set $$\{0, 1\}$$.

• Useful for using a Beta posterior in non-Bernoulli experiments;

• That way, Thompson sampling can be used with any continuous-valued bounded rewards.

Examples:

>>> import random
>>> random.seed(0)

>>> bernoulliBinarization(0.3)
1
>>> bernoulliBinarization(0.3)
0
>>> bernoulliBinarization(0.3)
0
>>> bernoulliBinarization(0.3)
0

>>> bernoulliBinarization(0.9)
1
>>> bernoulliBinarization(0.9)
1
>>> bernoulliBinarization(0.9)
1
>>> bernoulliBinarization(0.9)
0
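The trick itself is short. Here is an illustrative re-implementation (the snake_case name bernoulli_binarization is hypothetical, not the library's): return 1 with probability $$r_t$$ and 0 otherwise, so the binarized observation has the same mean as the original reward.

```python
import random

def bernoulli_binarization(r_t):
    """Random binarization: map a reward in [0, 1] to {0, 1}."""
    assert 0.0 <= r_t <= 1.0, "reward must lie in [0, 1]"
    # Shortcuts for the deterministic endpoints avoid a useless draw.
    if r_t == 0.0:
        return 0
    if r_t == 1.0:
        return 1
    # Bernoulli sample of mean r_t.
    return 1 if random.random() < r_t else 0

# On average, binarized observations keep the same mean as r_t.
random.seed(0)
obs = [bernoulli_binarization(0.3) for _ in range(10_000)]
print(abs(sum(obs) / len(obs) - 0.3) < 0.02)
```

Because the binarized observation is an unbiased sample of the reward, the Beta posterior updated with it still concentrates on the arm's true mean.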

class Policies.Posterior.Beta.Beta(a=1, b=1)[source]

Manipulate posteriors of Bernoulli/Beta experiments.

__init__(a=1, b=1)[source]

Create a Beta posterior $$\mathrm{Beta}(\alpha, \beta)$$ with no observation, i.e., $$\alpha = 1$$ and $$\beta = 1$$ by default.

N = None

List of two parameters [a, b]

__str__()[source]

Return str(self).

reset(a=None, b=None)[source]

Reset alpha and beta to the given values (both 1 by default, as when creating a new default Beta posterior).

sample()[source]

Get a random sample from the Beta posterior (using betavariate(), i.e., numpy.random.beta()).

• Used only by Thompson Sampling and AdBandits so far.

quantile(p)[source]

Return the p quantile of the Beta posterior (using scipy.special.btdtri()).

• Used only by BayesUCB and AdBandits so far.
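For instance, the quantile can also be computed with scipy.stats.beta.ppf, the inverse CDF, which returns the same value as the btdtri special function (a sketch, with illustrative parameter values):

```python
from scipy.stats import beta

# After 3 positive and 1 negative observation starting from Beta(1, 1),
# the posterior is Beta(4, 2).  A BayesUCB-style index would be an
# upper quantile of this posterior.
a, b = 4, 2
upper = beta.ppf(0.95, a, b)   # 95% quantile of Beta(4, 2)
median = beta.ppf(0.5, a, b)
print(upper > a / (a + b))     # the upper quantile exceeds the mean
print(median > 0.5)            # Beta(4, 2) is skewed towards 1
```

For a right-skewed posterior like Beta(4, 2), both the median and the upper quantile exceed 1/2, which is why an optimistic (upper-quantile) index favours arms with more positive observations.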

mean()[source]

Compute the mean of the Beta posterior (should rarely be needed).

forget(obs)[source]

Forget the last observation.

update(obs)[source]

• If obs is 1, update $$\alpha$$, the count of positive observations;

• If obs is 0, update $$\beta$$, the count of negative observations;

• Otherwise, the trick with bernoulliBinarization() has to be used first, to binarize the reward.

__module__ = 'Policies.Posterior.Beta'
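Putting the pieces together, here is a minimal self-contained sketch of how such a conjugate Bernoulli/Beta posterior works, mirroring the reset() / update() / forget() / sample() / mean() API documented above (the class name BetaPosterior is hypothetical, not the library's):

```python
import random

class BetaPosterior:
    """Minimal sketch of a conjugate posterior for Bernoulli rewards."""

    def __init__(self, a=1, b=1):
        self._a, self._b = a, b   # Beta(1, 1) = uniform prior by default
        self.N = [a, b]           # [alpha, beta] counts

    def reset(self, a=None, b=None):
        # Reset to the initial prior when no arguments are given.
        self.N = [a if a is not None else self._a,
                  b if b is not None else self._b]

    def update(self, obs):
        # obs = 1 increments alpha; obs = 0 increments beta.
        if obs == 1:
            self.N[0] += 1
        else:
            self.N[1] += 1

    def forget(self, obs):
        # Undo one update() call made with the same observation.
        if obs == 1:
            self.N[0] -= 1
        else:
            self.N[1] -= 1

    def sample(self):
        # One Thompson-sampling draw from the posterior.
        return random.betavariate(self.N[0], self.N[1])

    def mean(self):
        return self.N[0] / (self.N[0] + self.N[1])

# Usage: after three 1s and one 0, the posterior is Beta(4, 2).
post = BetaPosterior()
for r in (1, 1, 0, 1):
    post.update(r)
print(post.N)                  # [4, 2]
print(round(post.mean(), 3))   # 0.667
```

A Thompson-sampling player keeps one such posterior per arm, calls sample() on each at every round, and plays the arm with the largest draw.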