# Policies.CPUCB module

The Clopper-Pearson UCB policy for bounded bandits. Reference: [Garivier & Cappé, COLT 2011](https://arxiv.org/pdf/1102.2490.pdf).

Policies.CPUCB.binofit_scalar(x, n, alpha=0.05)[source]

Parameter estimates and confidence intervals for binomial data.

For example:

>>> np.random.seed(1234)  # reproducible results
>>> true_p = 0.6
>>> N = 100
>>> x = np.random.binomial(N, true_p)
>>> (phat, pci) = binofit_scalar(x, N)
>>> phat
0.61
>>> pci  # 0.6 of course lies in the 95% confidence interval
(0.507..., 0.705...)
>>> (phat, pci) = binofit_scalar(x, N, 0.01)
>>> pci  # 0.6 is also in the 99% confidence interval, but it is larger
(0.476..., 0.732...)


Like binofit in MATLAB, see https://fr.mathworks.com/help/stats/binofit.html.

• (phat, pci) = binofit_scalar(x, n) returns a maximum likelihood estimate of the probability of success in a given binomial trial, based on the number of successes, x, observed in n independent trials.

• (phat, pci) = binofit_scalar(x, n) returns the probability estimate, phat, and the 95% confidence interval, pci, computed with the Clopper-Pearson method.

• (phat, pci) = binofit_scalar(x, n, alpha) returns the 100(1 - alpha)% confidence interval. For example, alpha = 0.01 yields a 99% confidence interval.
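A minimal sketch of such a function, using the standard Clopper-Pearson construction from Beta quantiles via `scipy.stats.beta` (the function name `clopper_pearson` and the edge-case handling at x = 0 and x = n are illustrative, not necessarily how the module implements it):

```python
from scipy.stats import beta


def clopper_pearson(x, n, alpha=0.05):
    """Point estimate and exact (Clopper-Pearson) two-sided confidence
    interval for a binomial proportion, from x successes in n trials."""
    phat = x / n
    # Lower bound: alpha/2 quantile of Beta(x, n - x + 1); 0 when x == 0.
    lower = beta.ppf(alpha / 2, x, n - x + 1) if x > 0 else 0.0
    # Upper bound: 1 - alpha/2 quantile of Beta(x + 1, n - x); 1 when x == n.
    upper = beta.ppf(1 - alpha / 2, x + 1, n - x) if x < n else 1.0
    return phat, (lower, upper)
```

On the doctest's data (x = 61, n = 100) this reproduces the interval (0.507..., 0.705...) shown above.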

For the Clopper-Pearson UCB algorithms:

• x is the cumulative reward of some arm k, $$x = X_k(t)$$,

• n is the number of samples of that arm k, $$n = N_k(t)$$,

• and alpha is a small positive number; this algorithm uses $$\alpha = \frac{1}{t^c}$$ for a constant $$c > 1$$ close to 1 (for instance, c = 1.01).
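A quick illustration of this schedule (a sketch, not part of the module): as t grows, $$\alpha = 1/t^c$$ shrinks, so the confidence level of the interval tends to 100% and the upper bound keeps a growing exploration margin over the empirical mean.

```python
# alpha = 1 / t**c with c = 1.01, the default suggested in the text.
c = 1.01
for t in (10, 100, 10000):
    alpha = 1.0 / t ** c
    print(f"t={t:>5}: alpha={alpha:.5f} -> {100 * (1 - alpha):.3f}% confidence")
```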

Returns: (phat, pci)

• phat: the maximum likelihood estimate of p,

• pci: the confidence interval, as a (lower, upper) pair.

Note

My reference implementation was https://github.com/sjara/extracellpy/blob/master/extrastats.py#L35, but http://statsmodels.sourceforge.net/devel/generated/statsmodels.stats.proportion.proportion_confint.html could also be used (at the cost of an extra dependency for the project).
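For reference, the statsmodels alternative mentioned above computes the same exact interval when asked for the Clopper-Pearson ("beta") method (a sketch of the cross-check, assuming statsmodels is installed):

```python
from statsmodels.stats.proportion import proportion_confint

# method="beta" selects the Clopper-Pearson (exact) interval.
low, upp = proportion_confint(count=61, nobs=100, alpha=0.05, method="beta")
print(low, upp)  # should match binofit_scalar(61, 100) above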

Policies.CPUCB.binofit(xArray, nArray, alpha=0.05)[source]

Parameter estimates and confidence intervals for binomial data, for vectorial inputs.

For example:

>>> np.random.seed(1234)  # reproducible results
>>> true_p = 0.6
>>> N = 100
>>> xArray = np.random.binomial(N, true_p, 4)
>>> xArray
array([61, 54, 61, 52])

>>> (phat, pci) = binofit(xArray, N)
>>> phat
array([0.61, 0.54, 0.61, 0.52])
>>> pci  # 0.6 of course lies in the 95% confidence intervals
array([[0.507..., 0.705...],
       [0.437..., 0.640...],
       [0.507..., 0.705...],
       [0.417..., 0.620...]])

>>> (phat, pci) = binofit(xArray, N, 0.01)
>>> pci  # 0.6 is also in the 99% confidence intervals, but it is larger
array([[0.476..., 0.732...],
       [0.407..., 0.668...],
       [0.476..., 0.732...],
       [0.387..., 0.650...]])
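Since `scipy.stats.beta.ppf` broadcasts over array arguments, a vectorized version needs no explicit loop. A sketch (the function name `binofit_vec` is illustrative; the module's `binofit` may instead loop over `binofit_scalar`):

```python
import numpy as np
from scipy.stats import beta


def binofit_vec(x, n, alpha=0.05):
    """Clopper-Pearson estimates and intervals for an array of counts x,
    each from n trials. Returns (phat, pci) with pci of shape (len(x), 2)."""
    x = np.asarray(x, dtype=float)
    phat = x / n
    with np.errstate(all="ignore"):  # beta.ppf returns nan at the boundaries
        lower = np.where(x > 0, beta.ppf(alpha / 2, x, n - x + 1), 0.0)
        upper = np.where(x < n, beta.ppf(1 - alpha / 2, x + 1, n - x), 1.0)
    return phat, np.stack([lower, upper], axis=-1)
```

On the doctest's data `[61, 54, 61, 52]` with N = 100, this reproduces the 95% intervals shown above, row by row.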

Policies.CPUCB.ClopperPearsonUCB(x, N, alpha=0.05)[source]

Returns just the upper-confidence bound of the confidence interval.
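A sketch of what such a helper computes, assuming it returns the upper end of the two-sided Clopper-Pearson interval produced by binofit_scalar (the function name below is illustrative):

```python
from scipy.stats import beta


def clopper_pearson_ucb(x, n, alpha=0.05):
    """Upper end of the two-sided 100*(1 - alpha)% Clopper-Pearson
    interval for x successes in n trials."""
    if x >= n:
        return 1.0
    return beta.ppf(1 - alpha / 2, x + 1, n - x)
```

For x = 61, n = 100 this gives about 0.705 at alpha = 0.05 and about 0.732 at alpha = 0.01, matching the intervals in the doctests above.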

Policies.CPUCB.C = 1.01

Default value for the parameter c for CP-UCB

class Policies.CPUCB.CPUCB(nbArms, c=1.01, lower=0.0, amplitude=1.0)[source]

The Clopper-Pearson UCB policy for bounded bandits. Reference: [Garivier & Cappé, COLT 2011].

__init__(nbArms, c=1.01, lower=0.0, amplitude=1.0)[source]

New generic index policy.

• nbArms: the number of arms,

• lower, amplitude: lower value and known amplitude of the rewards.

c = None

Parameter c for the CP-UCB formula (see below)

computeIndex(arm)[source]

Compute the current index, at time t and after $$N_k(t)$$ pulls of arm k:

$I_k(t) = \mathrm{ClopperPearsonUCB}\left( X_k(t), N_k(t), \frac{1}{t^c} \right).$

Where $$\mathrm{ClopperPearsonUCB}$$ is defined above. The index is the upper-confidence bound for a binomial trial of $$N_k(t)$$ samples from arm k, with mean $$\mu_k$$ and observed cumulative reward $$X_k(t)$$. The confidence interval uses $$\alpha = 1 / t^c$$, i.e., a $$100(1 - \alpha)\%$$ confidence bound.
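The formula above can be sketched as follows (names are illustrative; the convention of giving unexplored arms an infinite index is an assumption here, though it is common for index policies):

```python
from math import inf
from scipy.stats import beta


def compute_index(X_k, N_k, t, c=1.01):
    """CP-UCB index of arm k at time t: the upper Clopper-Pearson bound
    for X_k successes in N_k pulls, at level alpha = 1 / t**c."""
    if N_k == 0:
        return inf  # assumption: force one pull of each arm first
    alpha = 1.0 / t ** c
    if X_k >= N_k:
        return 1.0
    # Upper end of the two-sided 100*(1 - alpha)% interval.
    return beta.ppf(1 - alpha / 2, X_k + 1, N_k - X_k)
```

Note that for a fixed (X_k, N_k), the index grows with t: the shrinking alpha widens the interval, which is the exploration bonus that makes under-sampled arms attractive again.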

__module__ = 'Policies.CPUCB'