Policies.CPUCB module

The Clopper-Pearson UCB policy for bounded bandits. Reference: [Garivier & Cappé, COLT 2011](https://arxiv.org/pdf/1102.2490.pdf).

Policies.CPUCB.binofit_scalar(x, n, alpha=0.05)[source]

Parameter estimates and confidence intervals for binomial data.

For example:

>>> np.random.seed(1234)  # reproducible results
>>> true_p = 0.6
>>> N = 100
>>> x = np.random.binomial(N, true_p)
>>> (phat, pci) = binofit_scalar(x, N)
>>> phat
0.61
>>> pci  # 0.6 of course lies in the 95% confidence interval  
(0.507..., 0.705...)
>>> (phat, pci) = binofit_scalar(x, N, 0.01)
>>> pci  # 0.6 is also in the 99% confidence interval, but it is larger  
(0.476..., 0.732...)

Like binofit in MATLAB, see https://fr.mathworks.com/help/stats/binofit.html.

  • (phat, pci) = binofit_scalar(x, n) returns a maximum likelihood estimate of the probability of success in a given binomial trial based on the number of successes, x, observed in n independent trials.

  • (phat, pci) = binofit_scalar(x, n) returns the probability estimate, phat, and the 95% confidence intervals, pci, by using the Clopper-Pearson method to calculate confidence intervals.

  • (phat, pci) = binofit_scalar(x, n, alpha) returns the 100(1 - alpha)% confidence intervals. For example, alpha = 0.01 yields 99% confidence intervals.

For the Clopper-Pearson UCB algorithms:

  • x is the cumulative reward of some arm k, \(x = X_k(t)\),

  • n is the number of samples of that arm k, \(n = N_k(t)\),

  • and alpha is a small positive number, \(\alpha = \frac{1}{t^c}\) in this algorithm, for some constant \(c > 1\) close to 1 (for instance c = 1.01).

Returns: (phat, pci)

  • phat: the estimate of p,

  • pci: the confidence interval (a pair of lower and upper bounds).

Note

My reference implementation was https://github.com/sjara/extracellpy/blob/master/extrastats.py#L35, but http://statsmodels.sourceforge.net/devel/generated/statsmodels.stats.proportion.proportion_confint.html can also be used (it adds an extra dependency to the project).
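As an illustrative sketch (not the module's actual code), the Clopper-Pearson interval can be computed directly from Beta distribution quantiles using scipy.stats:

```python
from scipy.stats import beta

def binofit_scalar(x, n, alpha=0.05):
    """Sketch of a Clopper-Pearson estimate and confidence interval.

    The bounds are quantiles of Beta distributions; the edge cases
    x == 0 and x == n are pinned to 0 and 1 respectively.
    """
    phat = x / n
    lower = beta.ppf(alpha / 2, x, n - x + 1) if x > 0 else 0.0
    upper = beta.ppf(1 - alpha / 2, x + 1, n - x) if x < n else 1.0
    return phat, (lower, upper)
```

With x = 61 and n = 100 this reproduces the 95% interval shown in the doctest above.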

Policies.CPUCB.binofit(xArray, nArray, alpha=0.05)[source]

Parameter estimates and confidence intervals for binomial data, for vectorial inputs.

For example:

>>> np.random.seed(1234)  # reproducible results
>>> true_p = 0.6
>>> N = 100
>>> xArray = np.random.binomial(N, true_p, 4)
>>> xArray
array([61, 54, 61, 52])
>>> (phat, pci) = binofit(xArray, N)
>>> phat
array([0.61, 0.54, 0.61, 0.52])
>>> pci  # 0.6 of course lies in the 95% confidence intervals  
array([[0.507..., 0.705...],
       [0.437..., 0.640...],
       [0.507..., 0.705...],
       [0.417..., 0.620...]])
>>> (phat, pci) = binofit(xArray, N, 0.01)
>>> pci  # 0.6 is also in the 99% confidence intervals, but it is larger  
array([[0.476..., 0.732...],
       [0.407..., 0.668...],
       [0.476..., 0.732...],
       [0.387..., 0.650...]])
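A vectorial variant can be sketched the same way (again an illustrative reimplementation, assuming scipy is available); np.where pins the edge cases x == 0 and x == n to 0 and 1, and np.maximum keeps the discarded Beta parameters valid so no warnings are raised:

```python
import numpy as np
from scipy.stats import beta

def binofit(xArray, n, alpha=0.05):
    """Sketch of the vectorial Clopper-Pearson estimate and intervals."""
    x = np.asarray(xArray, dtype=float)
    phat = x / n
    # beta.ppf broadcasts over arrays; np.where handles the degenerate cases
    lower = np.where(x > 0, beta.ppf(alpha / 2, np.maximum(x, 1), n - x + 1), 0.0)
    upper = np.where(x < n, beta.ppf(1 - alpha / 2, x + 1, np.maximum(n - x, 1)), 1.0)
    return phat, np.stack([lower, upper], axis=-1)
```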

Policies.CPUCB.ClopperPearsonUCB(x, N, alpha=0.05)[source]

Returns just the upper-confidence bound of the confidence interval.
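A sketch of that helper, using the Beta-quantile form of the Clopper-Pearson upper bound (illustrative, assuming scipy):

```python
from scipy.stats import beta

def ClopperPearsonUCB(x, N, alpha=0.05):
    """Sketch: upper end of the Clopper-Pearson confidence interval only."""
    if x >= N:  # all trials succeeded: the upper bound is trivially 1
        return 1.0
    return beta.ppf(1 - alpha / 2, x + 1, N - x)
```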

Policies.CPUCB.C = 1.01

Default value for the parameter c for CP-UCB

class Policies.CPUCB.CPUCB(nbArms, c=1.01, lower=0.0, amplitude=1.0)[source]

Bases: Policies.UCB.UCB

The Clopper-Pearson UCB policy for bounded bandits. Reference: [Garivier & Cappé, COLT 2011].

__init__(nbArms, c=1.01, lower=0.0, amplitude=1.0)[source]

New generic index policy.

  • nbArms: the number of arms,

  • c: the parameter c for the CP-UCB formula (default 1.01),

  • lower, amplitude: lower value and known amplitude of the rewards.

c = None

Parameter c for the CP-UCB formula (see below)

computeIndex(arm)[source]

Compute the current index, at time t and after \(N_k(t)\) pulls of arm k:

\[I_k(t) = \mathrm{ClopperPearsonUCB}\left( X_k(t), N_k(t), \frac{1}{t^c} \right).\]

Where \(\mathrm{ClopperPearsonUCB}\) is defined above. The index is the upper-confidence bound of the binomial trial of \(N_k(t)\) samples from arm k, having mean \(\mu_k\), and empirical outcome \(X_k(t)\). The confidence interval is with \(\alpha = 1 / t^c\), for a \(100(1 - \alpha)\%\) confidence bound.
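Putting the pieces together, the index computation can be sketched as follows. This is a minimal illustration with hypothetical attribute names (pulls, rewards, t); the real class inherits its bookkeeping from Policies.UCB.UCB:

```python
import numpy as np
from scipy.stats import beta

class CPUCBSketch:
    """Minimal sketch of the CP-UCB index (not the module's actual class)."""

    def __init__(self, nbArms, c=1.01):
        self.c = c
        self.t = 0                                # current time step t
        self.pulls = np.zeros(nbArms, dtype=int)  # N_k(t), pulls per arm
        self.rewards = np.zeros(nbArms)           # X_k(t), cumulative rewards

    def computeIndex(self, arm):
        n = self.pulls[arm]
        if n == 0:
            return float('+inf')  # force at least one pull of each arm
        x = self.rewards[arm]
        alpha = 1.0 / (self.t ** self.c)  # shrinking risk level alpha = 1/t^c
        if x >= n:
            return 1.0
        # Upper Clopper-Pearson bound, as a Beta distribution quantile
        return beta.ppf(1 - alpha / 2.0, x + 1, n - x)
```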

__module__ = 'Policies.CPUCB'