Policies.Posterior.DiscountedBeta module

Manipulate posteriors of Bernoulli/Beta experiments, for discounted Bayesian policies (Policies.DiscountedBayesianIndexPolicy).

Policies.Posterior.DiscountedBeta.betavariate()

beta(a, b, size=None)

Draw samples from a Beta distribution.

The Beta distribution is a special case of the Dirichlet distribution, and is related to the Gamma distribution. It has the probability distribution function

\[f(x; \alpha, \beta) = \frac{1}{B(\alpha, \beta)} x^{\alpha - 1} (1 - x)^{\beta - 1},\]

where the normalization, B, is the beta function,

\[B(\alpha, \beta) = \int_0^1 t^{\alpha - 1} (1 - t)^{\beta - 1} dt.\]
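As a quick sanity check of this normalization, the beta function can be evaluated through the Gamma-function identity \(B(\alpha, \beta) = \Gamma(\alpha)\Gamma(\beta) / \Gamma(\alpha + \beta)\). A small illustrative sketch (not part of this module):

```python
from math import gamma

def beta_function(a, b):
    """Beta function B(a, b) via the Gamma-function identity."""
    return gamma(a) * gamma(b) / gamma(a + b)

# For integer arguments, B(2, 3) = 1! * 2! / 4! = 2/24 = 1/12
print(beta_function(2, 3))
```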

It is often seen in Bayesian inference and order statistics.

a : float or array_like of floats

Alpha, positive (>0).

b : float or array_like of floats

Beta, positive (>0).

size : int or tuple of ints, optional

Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. If size is None (default), a single value is returned if a and b are both scalars. Otherwise, np.broadcast(a, b).size samples are drawn.

out : ndarray or scalar

Drawn samples from the parameterized beta distribution.
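For instance, such samples can be drawn directly with NumPy; this short sketch uses the modern `numpy.random.default_rng` generator rather than the module's own helper, and also illustrates the broadcasting behaviour of a and b described above:

```python
import numpy as np

rng = np.random.default_rng(42)

# Draw 5 samples from Beta(a=2, b=5); all values lie in (0, 1)
samples = rng.beta(2.0, 5.0, size=5)
print(samples.shape)

# Broadcasting: a can be an array; with size=None, one sample is drawn per pair
a = np.array([1.0, 2.0, 3.0])
samples_per_arm = rng.beta(a, 5.0)
print(samples_per_arm.shape)
```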

Policies.Posterior.DiscountedBeta.GAMMA = 0.95

Default value for the discount factor \(\gamma\in(0,1)\). 0.95 is empirically a reasonable value for short-term non-stationary experiments.

class Policies.Posterior.DiscountedBeta.DiscountedBeta(gamma=0.95, a=1, b=1)[source]

Bases: Policies.Posterior.Beta.Beta

Manipulate posteriors of Bernoulli/Beta experiments, for discounted Bayesian policies (Policies.DiscountedBayesianIndexPolicy).

  • It keeps \(\tilde{S}(t)\) and \(\tilde{F}(t)\) the discounted counts of successes and failures (S and F).

__init__(gamma=0.95, a=1, b=1)[source]

Create a Beta posterior \(\mathrm{Beta}(\alpha, \beta)\) with no observation, i.e., \(\alpha = 1\) and \(\beta = 1\) by default.

N = None

List of two parameters [a, b]

gamma = None

Discount factor \(\gamma\in(0,1)\).

__str__()[source]

Return str(self).

reset(a=None, b=None)[source]

Reset alpha and beta, both to 0, as when creating a new default DiscountedBeta.

sample()[source]

Get a random sample from the DiscountedBeta posterior (using random.betavariate()).

  • Used only by Thompson Sampling and AdBandits so far.

quantile(p)[source]

Return the p quantile of the DiscountedBeta posterior (using scipy.special.btdtri()).

  • Used only by BayesUCB and AdBandits so far.

forget(obs)[source]

Forget the last observation, and undiscount the count of observations.

update(obs)[source]

Add an observation, and discount the previous observations.

  • If obs is 1, update \(\alpha\) the count of positive observations,

  • If it is 0, update \(\beta\) the count of negative observations.

  • But instead of using \(\tilde{S}(t) = S(t)\) and \(\tilde{F}(t) = F(t)\), they are updated at each time step using the discount factor \(\gamma\):

\[\begin{aligned} \tilde{S}(t+1) &= \gamma \tilde{S}(t) + r(t), \\ \tilde{F}(t+1) &= \gamma \tilde{F}(t) + (1 - r(t)). \end{aligned}\]

Note

If obs is not binary (i.e., not exactly 0 or 1), a trick with bernoulliBinarization() has to be used.
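The discounted update rule can be sketched in a few lines of standalone Python. This is a simplified stand-in for the actual DiscountedBeta.update(), with an illustrative binarization step for non-binary rewards (the names TinyDiscountedBeta and bernoulli_binarization are hypothetical, chosen for this sketch):

```python
import random

def bernoulli_binarization(r):
    """Turn a reward r in [0, 1] into a 0/1 sample (illustrative trick)."""
    return 1 if random.random() < r else 0

class TinyDiscountedBeta:
    """Minimal sketch: keeps discounted counts of successes and failures."""
    def __init__(self, gamma=0.95):
        self.gamma = gamma
        self.S = 0.0  # discounted count of successes, S~(t)
        self.F = 0.0  # discounted count of failures, F~(t)

    def update(self, obs):
        if obs not in (0, 1):  # non-binary reward: binarize it first
            obs = bernoulli_binarization(obs)
        # S~(t+1) = gamma * S~(t) + r(t);  F~(t+1) = gamma * F~(t) + (1 - r(t))
        self.S = self.gamma * self.S + obs
        self.F = self.gamma * self.F + (1 - obs)

post = TinyDiscountedBeta(gamma=0.5)
post.update(1)
post.update(1)
post.update(0)
print(post.S, post.F)  # → 0.75 1.0
```

With \(\gamma = 0.5\), the two early successes are geometrically down-weighted by the time the failure arrives, which is exactly how the posterior tracks non-stationary arms.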

discount()[source]

Simply discount the old observation, when no observation is given at this time.

\[\begin{aligned} \tilde{S}(t+1) &= \gamma \tilde{S}(t), \\ \tilde{F}(t+1) &= \gamma \tilde{F}(t). \end{aligned}\]

undiscount()[source]

Simply cancel the discount on the old observation, when no observation is given at this time.

\[\begin{aligned} \tilde{S}(t+1) &= \frac{1}{\gamma} \tilde{S}(t), \\ \tilde{F}(t+1) &= \frac{1}{\gamma} \tilde{F}(t). \end{aligned}\]
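As the two formulas suggest, discount() and undiscount() are multiplicative inverses of each other, which is what lets forget() undo an update(). A self-contained numerical check of that round trip (plain arithmetic, not the module's API):

```python
gamma = 0.95
S, F = 3.0, 2.0  # example discounted counts S~(t), F~(t)

# discount(): S~(t+1) = gamma * S~(t), F~(t+1) = gamma * F~(t)
S_d, F_d = gamma * S, gamma * F

# undiscount(): multiply back by 1/gamma, recovering the original counts
S_u, F_u = S_d / gamma, F_d / gamma

print(abs(S_u - S) < 1e-9, abs(F_u - F) < 1e-9)  # → True True
```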
__module__ = 'Policies.Posterior.DiscountedBeta'