Policies.Posterior.DiscountedBeta module

Manipulate posteriors of Bernoulli/Beta experiments, for discounted Bayesian policies (Policies.DiscountedBayesianIndexPolicy).

Policies.Posterior.DiscountedBeta.betavariate()

beta(a, b, size=None)

Draw samples from a Beta distribution.

The Beta distribution is a special case of the Dirichlet distribution, and is related to the Gamma distribution. It has the probability distribution function

\[f(x; \alpha, \beta) = \frac{1}{B(\alpha, \beta)} x^{\alpha - 1} (1 - x)^{\beta - 1},\]

where the normalization, B, is the beta function,

\[B(\alpha, \beta) = \int_0^1 t^{\alpha - 1} (1 - t)^{\beta - 1} dt.\]
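As a quick sanity check of this normalization, the beta function can be evaluated through the Gamma-function identity \(B(\alpha, \beta) = \Gamma(\alpha)\Gamma(\beta) / \Gamma(\alpha + \beta)\). A small illustrative sketch (not part of this module):

```python
from math import gamma

def beta_function(a, b):
    """Beta function B(a, b) via the Gamma-function identity."""
    return gamma(a) * gamma(b) / gamma(a + b)

# For integer arguments, B(2, 3) = 1! * 2! / 4! = 2/24 = 1/12
print(beta_function(2, 3))
```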

It is often seen in Bayesian inference and order statistics.

a : float or array_like of floats

Alpha, positive (>0).

b : float or array_like of floats

Beta, positive (>0).

size : int or tuple of ints, optional

Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if a and b are both scalars. Otherwise, np.broadcast(a, b).size samples are drawn.

out : ndarray or scalar

Drawn samples from the parameterized beta distribution.
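For instance, such samples can be drawn directly with NumPy; this short sketch uses the modern `numpy.random.default_rng` generator rather than the module's own helper, and also illustrates the broadcasting behaviour of a and b described above:

```python
import numpy as np

rng = np.random.default_rng(42)

# Draw 5 samples from Beta(a=2, b=5); all values lie in (0, 1)
samples = rng.beta(2.0, 5.0, size=5)
print(samples.shape)

# Broadcasting: a can be an array; with size=None, one sample is drawn per pair
a = np.array([1.0, 2.0, 3.0])
samples_per_arm = rng.beta(a, 5.0)
print(samples_per_arm.shape)
```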

Policies.Posterior.DiscountedBeta.GAMMA = 0.95

Default value for the discount factor \(\gamma\in(0,1)\). 0.95 is empirically a reasonable value for short-term non-stationary experiments.

class Policies.Posterior.DiscountedBeta.DiscountedBeta(gamma=0.95, a=1, b=1)[source]

Bases: Policies.Posterior.Beta.Beta

Manipulate posteriors of Bernoulli/Beta experiments, for discounted Bayesian policies (Policies.DiscountedBayesianIndexPolicy).

  • It keeps \(\tilde{S}(t)\) and \(\tilde{F}(t)\) the discounted counts of successes and failures (S and F).

__init__(gamma=0.95, a=1, b=1)[source]

Create a Beta posterior \(\mathrm{Beta}(\alpha, \beta)\) with no observation, i.e., \(\alpha = 1\) and \(\beta = 1\) by default.

N = None

List of two parameters [a, b]

gamma = None

Discount factor \(\gamma\in(0,1)\).

__str__()[source]

Return str(self).

reset(a=None, b=None)[source]

Reset alpha and beta, both to 0, as when creating a new default DiscountedBeta.

sample()[source]

Get a random sample from the DiscountedBeta posterior (using random.betavariate()).

  • Used only by Thompson Sampling and AdBandits so far.

quantile(p)[source]

Return the p quantile of the DiscountedBeta posterior (using scipy.special.btdtri()).

  • Used only by BayesUCB and AdBandits so far.

forget(obs)[source]

Forget the last observation, and undiscount the count of observations.

update(obs)[source]

Add an observation, and discount the previous observations.

  • If obs is 1, update \(\alpha\) the count of positive observations,

  • If it is 0, update \(\beta\) the count of negative observations.

  • But instead of using \(\tilde{S}(t) = S(t)\) and \(\tilde{F}(t) = F(t)\), they are updated at each time step using the discount factor \(\gamma\):

\[\begin{aligned} \tilde{S}(t+1) &= \gamma \tilde{S}(t) + r(t), \\ \tilde{F}(t+1) &= \gamma \tilde{F}(t) + (1 - r(t)). \end{aligned}\]

Note

If obs is not binary (i.e., not exactly 0 or 1), a trick with bernoulliBinarization() has to be used.
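The discounted update rule can be sketched in a few lines of standalone Python. This is a simplified stand-in for the actual DiscountedBeta.update(), with an illustrative binarization step for non-binary rewards (the names TinyDiscountedBeta and bernoulli_binarization are hypothetical, chosen for this sketch):

```python
import random

def bernoulli_binarization(r):
    """Turn a reward r in [0, 1] into a 0/1 sample (illustrative trick)."""
    return 1 if random.random() < r else 0

class TinyDiscountedBeta:
    """Minimal sketch: keeps discounted counts of successes and failures."""
    def __init__(self, gamma=0.95):
        self.gamma = gamma
        self.S = 0.0  # discounted count of successes, S~(t)
        self.F = 0.0  # discounted count of failures, F~(t)

    def update(self, obs):
        if obs not in (0, 1):  # non-binary reward: binarize it first
            obs = bernoulli_binarization(obs)
        # S~(t+1) = gamma * S~(t) + r(t);  F~(t+1) = gamma * F~(t) + (1 - r(t))
        self.S = self.gamma * self.S + obs
        self.F = self.gamma * self.F + (1 - obs)

post = TinyDiscountedBeta(gamma=0.5)
post.update(1)
post.update(1)
post.update(0)
print(post.S, post.F)  # → 0.75 1.0
```

With \(\gamma = 0.5\), the two early successes are geometrically down-weighted by the time the failure arrives, which is exactly how the posterior tracks non-stationary arms.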

discount()[source]

Simply discount the old observation, when no observation is given at this time.

\[\begin{aligned} \tilde{S}(t+1) &= \gamma \tilde{S}(t), \\ \tilde{F}(t+1) &= \gamma \tilde{F}(t). \end{aligned}\]

undiscount()[source]

Simply cancel the discount on the old observation, when no observation is given at this time.

\[\begin{aligned} \tilde{S}(t+1) &= \frac{1}{\gamma} \tilde{S}(t), \\ \tilde{F}(t+1) &= \frac{1}{\gamma} \tilde{F}(t). \end{aligned}\]
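As the two formulas suggest, discount() and undiscount() are multiplicative inverses of each other, which is what lets forget() undo an update(). A self-contained numerical check of that round trip (plain arithmetic, not the module's API):

```python
gamma = 0.95
S, F = 3.0, 2.0  # example discounted counts S~(t), F~(t)

# discount(): S~(t+1) = gamma * S~(t), F~(t+1) = gamma * F~(t)
S_d, F_d = gamma * S, gamma * F

# undiscount(): multiply back by 1/gamma, recovering the original counts
S_u, F_u = S_d / gamma, F_d / gamma

print(abs(S_u - S) < 1e-9, abs(F_u - F) < 1e-9)  # → True True
```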
__module__ = 'Policies.Posterior.DiscountedBeta'