Policies.CPUCB module

The Clopper-Pearson UCB policy for bounded bandits. Reference: [Garivier & Cappé, COLT 2011](https://arxiv.org/pdf/1102.2490.pdf).

Policies.CPUCB.binofit_scalar(x, n, alpha=0.05)[source]

Parameter estimates and confidence intervals for binomial data.

For example:

>>> np.random.seed(1234)  # reproducible results
>>> true_p = 0.6
>>> N = 100
>>> x = np.random.binomial(N, true_p)
>>> (phat, pci) = binofit_scalar(x, N)
>>> phat
0.61
>>> pci  # 0.6 of course lies in the 95% confidence interval  
(0.507..., 0.705...)
>>> (phat, pci) = binofit_scalar(x, N, 0.01)
>>> pci  # 0.6 is also in the 99% confidence interval, but it is larger  
(0.476..., 0.732...)

Like binofit in MATLAB, see https://fr.mathworks.com/help/stats/binofit.html.

  • (phat, pci) = binofit_scalar(x, n) returns a maximum likelihood estimate of the probability of success in a given binomial trial based on the number of successes, x, observed in n independent trials.

  • (phat, pci) = binofit_scalar(x, n) returns the probability estimate, phat, and the 95% confidence intervals, pci, by using the Clopper-Pearson method to calculate confidence intervals.

  • (phat, pci) = binofit_scalar(x, n, alpha) returns the 100(1 - alpha)% confidence intervals. For example, alpha = 0.01 yields 99% confidence intervals.

For the Clopper-Pearson UCB algorithms:

  • x is the cumulative reward of some arm k, \(x = X_k(t)\),

  • n is the number of samples of that arm k, \(n = N_k(t)\),

  • and alpha is a small positive number, \(\alpha = \frac{1}{t^c}\) in this algorithm, for some constant \(c > 1\) close to 1 (for instance c = 1.01).

Returns: (phat, pci)

  • phat: the estimate of p,

  • pci: the confidence interval (a pair of lower and upper bounds).

Note

My reference implementation was https://github.com/sjara/extracellpy/blob/master/extrastats.py#L35, but http://statsmodels.sourceforge.net/devel/generated/statsmodels.stats.proportion.proportion_confint.html can also be used (it adds an extra dependency to the project).
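As an illustrative sketch (not the module's actual code), the Clopper-Pearson interval can be computed directly from Beta distribution quantiles using scipy.stats:

```python
from scipy.stats import beta

def binofit_scalar(x, n, alpha=0.05):
    """Sketch of a Clopper-Pearson estimate and confidence interval.

    The bounds are quantiles of Beta distributions; the edge cases
    x == 0 and x == n are pinned to 0 and 1 respectively.
    """
    phat = x / n
    lower = beta.ppf(alpha / 2, x, n - x + 1) if x > 0 else 0.0
    upper = beta.ppf(1 - alpha / 2, x + 1, n - x) if x < n else 1.0
    return phat, (lower, upper)
```

With x = 61 and n = 100 this reproduces the 95% interval shown in the doctest above.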

Policies.CPUCB.binofit(xArray, nArray, alpha=0.05)[source]

Parameter estimates and confidence intervals for binomial data, for vectorial inputs.

For example:

>>> np.random.seed(1234)  # reproducible results
>>> true_p = 0.6
>>> N = 100
>>> xArray = np.random.binomial(N, true_p, 4)
>>> xArray
array([61, 54, 61, 52])
>>> (phat, pci) = binofit(xArray, N)
>>> phat
array([0.61, 0.54, 0.61, 0.52])
>>> pci  # 0.6 of course lies in the 95% confidence intervals  
array([[0.507..., 0.705...],
       [0.437..., 0.640...],
       [0.507..., 0.705...],
       [0.417..., 0.620...]])
>>> (phat, pci) = binofit(xArray, N, 0.01)
>>> pci  # 0.6 is also in the 99% confidence intervals, but it is larger  
array([[0.476..., 0.732...],
       [0.407..., 0.668...],
       [0.476..., 0.732...],
       [0.387..., 0.650...]])
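A vectorial variant can be sketched the same way (again an illustrative reimplementation, assuming scipy is available); np.where pins the edge cases x == 0 and x == n to 0 and 1, and np.maximum keeps the discarded Beta parameters valid so no warnings are raised:

```python
import numpy as np
from scipy.stats import beta

def binofit(xArray, n, alpha=0.05):
    """Sketch of the vectorial Clopper-Pearson estimate and intervals."""
    x = np.asarray(xArray, dtype=float)
    phat = x / n
    # beta.ppf broadcasts over arrays; np.where handles the degenerate cases
    lower = np.where(x > 0, beta.ppf(alpha / 2, np.maximum(x, 1), n - x + 1), 0.0)
    upper = np.where(x < n, beta.ppf(1 - alpha / 2, x + 1, np.maximum(n - x, 1)), 1.0)
    return phat, np.stack([lower, upper], axis=-1)
```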

Policies.CPUCB.ClopperPearsonUCB(x, N, alpha=0.05)[source]

Returns just the upper-confidence bound of the confidence interval.
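A sketch of that helper, using the Beta-quantile form of the Clopper-Pearson upper bound (illustrative, assuming scipy):

```python
from scipy.stats import beta

def ClopperPearsonUCB(x, N, alpha=0.05):
    """Sketch: upper end of the Clopper-Pearson confidence interval only."""
    if x >= N:  # all trials succeeded: the upper bound is trivially 1
        return 1.0
    return beta.ppf(1 - alpha / 2, x + 1, N - x)
```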

Policies.CPUCB.C = 1.01

Default value for the parameter c for CP-UCB

class Policies.CPUCB.CPUCB(nbArms, c=1.01, lower=0.0, amplitude=1.0)[source]

Bases: Policies.UCB.UCB

The Clopper-Pearson UCB policy for bounded bandits. Reference: [Garivier & Cappé, COLT 2011].

__init__(nbArms, c=1.01, lower=0.0, amplitude=1.0)[source]

New generic index policy.

  • nbArms: the number of arms,

  • c: the parameter c for the CP-UCB formula (default 1.01),

  • lower, amplitude: lower value and known amplitude of the rewards.

c = None

Parameter c for the CP-UCB formula (see below)

computeIndex(arm)[source]

Compute the current index, at time t and after \(N_k(t)\) pulls of arm k:

\[I_k(t) = \mathrm{ClopperPearsonUCB}\left( X_k(t), N_k(t), \frac{1}{t^c} \right).\]

Where \(\mathrm{ClopperPearsonUCB}\) is defined above. The index is the upper-confidence bound of the binomial trial of \(N_k(t)\) samples from arm k, having mean \(\mu_k\), and empirical outcome \(X_k(t)\). The confidence interval is with \(\alpha = 1 / t^c\), for a \(100(1 - \alpha)\%\) confidence bound.
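Putting the pieces together, the index computation can be sketched as follows. This is a minimal illustration with hypothetical attribute names (pulls, rewards, t); the real class inherits its bookkeeping from Policies.UCB.UCB:

```python
import numpy as np
from scipy.stats import beta

class CPUCBSketch:
    """Minimal sketch of the CP-UCB index (not the module's actual class)."""

    def __init__(self, nbArms, c=1.01):
        self.c = c
        self.t = 0                                # current time step t
        self.pulls = np.zeros(nbArms, dtype=int)  # N_k(t), pulls per arm
        self.rewards = np.zeros(nbArms)           # X_k(t), cumulative rewards

    def computeIndex(self, arm):
        n = self.pulls[arm]
        if n == 0:
            return float('+inf')  # force at least one pull of each arm
        x = self.rewards[arm]
        alpha = 1.0 / (self.t ** self.c)  # shrinking risk level alpha = 1/t^c
        if x >= n:
            return 1.0
        # Upper Clopper-Pearson bound, as a Beta distribution quantile
        return beta.ppf(1 - alpha / 2.0, x + 1, n - x)
```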

__module__ = 'Policies.CPUCB'