Policies.Posterior.Beta module
Manipulate posteriors of Bernoulli/Beta experiments.
Rewards not in \(\{0, 1\}\) are handled with a "random binarization" trick, see bernoulliBinarization(), cf. [Agrawal12] (algorithm 2).
When a reward \(r_t \in [0, 1]\) is observed, the player receives the result of a Bernoulli sample of mean \(r_t\): \(o_t \sim \mathrm{Bernoulli}(r_t)\), so the observation \(o_t\) is indeed in \(\{0, 1\}\).
See https://en.wikipedia.org/wiki/Bernoulli_distribution#Related_distributions
and https://en.wikipedia.org/wiki/Conjugate_prior#Discrete_distributions
Policies.Posterior.Beta.random() → x in the interval [0, 1).
Policies.Posterior.Beta.betavariate()
beta(a, b, size=None)
Draw samples from a Beta distribution.
The Beta distribution is a special case of the Dirichlet distribution, and is related to the Gamma distribution. It has the probability density function
\[f(x; a, b) = \frac{1}{B(\alpha, \beta)} x^{\alpha - 1} (1 - x)^{\beta - 1},\]
where the normalization, \(B\), is the beta function,
\[B(\alpha, \beta) = \int_0^1 t^{\alpha - 1} (1 - t)^{\beta - 1} \mathrm{d}t.\]
It is often seen in Bayesian inference and order statistics.
Parameters:
- a : float or array_like of floats
  Alpha, positive (>0).
- b : float or array_like of floats
  Beta, positive (>0).
- size : int or tuple of ints, optional
  Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if a and b are both scalars. Otherwise, np.broadcast(a, b).size samples are drawn.
Returns:
- out : ndarray or scalar
  Drawn samples from the parameterized beta distribution.
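As a quick illustration of the parameters described above, here is a short sketch using NumPy's Generator API (a modern equivalent of the legacy call), showing both a fixed size and broadcasting of array-like parameters:

```python
import numpy as np

rng = np.random.default_rng(42)

# Draw 5 samples from Beta(a=2, b=5); all values lie in [0, 1].
samples = rng.beta(2.0, 5.0, size=5)
print(samples)

# With array-like parameters and size=None, the shapes broadcast:
# here a has shape (3,) and b is scalar, so the result has shape (3,).
grid = rng.beta([1.0, 2.0, 3.0], 5.0)
print(grid.shape)
```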
Policies.Posterior.Beta.bernoulliBinarization(r_t)[source]
Return a (random) binarization of a reward \(r_t\), from the continuous interval \([0, 1]\), as an observation in the discrete set \(\{0, 1\}\).
Useful to allow the use of a Beta posterior for non-Bernoulli experiments. That way, Thompson sampling can be used for any continuous-valued bounded rewards.
Examples:
>>> import random
>>> random.seed(0)
>>> bernoulliBinarization(0.3)
1
>>> bernoulliBinarization(0.3)
0
>>> bernoulliBinarization(0.3)
0
>>> bernoulliBinarization(0.3)
0
>>> bernoulliBinarization(0.9)
1
>>> bernoulliBinarization(0.9)
1
>>> bernoulliBinarization(0.9)
1
>>> bernoulliBinarization(0.9)
0
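The binarization itself is just a single Bernoulli draw with mean \(r_t\). A minimal sketch (an illustration of the idea, not the library's exact source) could look like:

```python
import random

def bernoulli_binarization(r_t):
    """Map a reward r_t in [0, 1] to a random observation in {0, 1}.

    The observation is 1 with probability r_t, so its expectation is
    exactly r_t. (Illustrative sketch; the library's
    bernoulliBinarization() may differ in edge-case handling.)
    """
    assert 0.0 <= r_t <= 1.0, "reward must lie in [0, 1]"
    return 1 if random.random() < r_t else 0

# Sanity check: the empirical mean of many binarized draws approaches r_t.
random.seed(1)
draws = [bernoulli_binarization(0.3) for _ in range(10000)]
print(sum(draws) / len(draws))  # close to 0.3
```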
class Policies.Posterior.Beta.Beta(a=1, b=1)[source]
Bases: Policies.Posterior.Posterior.Posterior
Manipulate posteriors of Bernoulli/Beta experiments.
__init__(a=1, b=1)[source]
Create a Beta posterior \(\mathrm{Beta}(\alpha, \beta)\) with no observations, i.e., \(\alpha = 1\) and \(\beta = 1\) by default.
N = None
List of the two parameters [a, b].
sample()[source]
Get a random sample from the Beta posterior (using random.betavariate()). Used only by ThompsonSampling and AdBandits so far.
quantile(p)[source]
Return the p quantile of the Beta posterior (using scipy.special.btdtri()). Used only by BayesUCB and AdBandits so far.
update(obs)[source]
Add an observation:
- if obs is 1, increase \(\alpha\), the count of positive observations;
- if obs is 0, increase \(\beta\), the count of negative observations.
Note: otherwise, for a continuous reward in \((0, 1)\), the trick implemented in bernoulliBinarization() has to be used first.
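Putting the three methods together, a minimal self-contained sketch of such a Beta posterior (assuming the standard conjugate update, not a copy of the library's source) might be:

```python
import random

from scipy.special import betaincinv  # inverse of the regularized incomplete beta function


class BetaPosterior:
    """Minimal Beta posterior for Bernoulli rewards (illustrative sketch)."""

    def __init__(self, a=1, b=1):
        # Beta(1, 1) is the uniform prior on [0, 1].
        self.a = a
        self.b = b

    def sample(self):
        # One random draw from Beta(a, b), as Thompson sampling needs.
        return random.betavariate(self.a, self.b)

    def quantile(self, p):
        # p-quantile of Beta(a, b), as Bayes-UCB needs.
        return betaincinv(self.a, self.b, p)

    def update(self, obs):
        # Conjugate update: obs must already be binarized to 0 or 1.
        assert obs in (0, 1), "binarize continuous rewards first"
        if obs == 1:
            self.a += 1
        else:
            self.b += 1


post = BetaPosterior()
for obs in [1, 1, 0, 1]:
    post.update(obs)
print(post.a, post.b)      # 4 2: three positive and one negative observation
print(post.quantile(0.5))  # posterior median
```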
__module__ = 'Policies.Posterior.Beta'