# Policies.CPUCB module

The Clopper-Pearson UCB policy for bounded bandits. Reference: [Garivier & Cappé, COLT 2011](https://arxiv.org/pdf/1102.2490.pdf).

Policies.CPUCB.binofit_scalar(x, n, alpha=0.05)[source]

Parameter estimates and confidence intervals for binomial data.

For example:

>>> np.random.seed(1234)  # reproducible results
>>> true_p = 0.6
>>> N = 100
>>> x = np.random.binomial(N, true_p)
>>> (phat, pci) = binofit_scalar(x, N)
>>> phat
0.61
>>> pci  # 0.6 of course lies in the 95% confidence interval
(0.507..., 0.705...)
>>> (phat, pci) = binofit_scalar(x, N, 0.01)
>>> pci  # 0.6 is also in the 99% confidence interval, but it is larger
(0.476..., 0.732...)


Like binofit in MATLAB, see https://fr.mathworks.com/help/stats/binofit.html.

• (phat, pci) = binofit_scalar(x, n) returns a maximum likelihood estimate of the probability of success in a given binomial trial, based on the number of successes, x, observed in n independent trials.

• (phat, pci) = binofit_scalar(x, n) returns the probability estimate, phat, and the 95% confidence interval, pci, computed with the Clopper-Pearson method.

• (phat, pci) = binofit_scalar(x, n, alpha) returns the 100(1 - alpha)% confidence interval. For example, alpha = 0.01 yields a 99% confidence interval.
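A minimal sketch of such a function, using the standard Clopper-Pearson construction from Beta quantiles via `scipy.stats.beta` (the function name `clopper_pearson` and the edge-case handling at x = 0 and x = n are illustrative, not necessarily how the module implements it):

```python
from scipy.stats import beta


def clopper_pearson(x, n, alpha=0.05):
    """Point estimate and exact (Clopper-Pearson) two-sided confidence
    interval for a binomial proportion, from x successes in n trials."""
    phat = x / n
    # Lower bound: alpha/2 quantile of Beta(x, n - x + 1); 0 when x == 0.
    lower = beta.ppf(alpha / 2, x, n - x + 1) if x > 0 else 0.0
    # Upper bound: 1 - alpha/2 quantile of Beta(x + 1, n - x); 1 when x == n.
    upper = beta.ppf(1 - alpha / 2, x + 1, n - x) if x < n else 1.0
    return phat, (lower, upper)
```

On the doctest's data (x = 61, n = 100) this reproduces the interval (0.507..., 0.705...) shown above.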

For the Clopper-Pearson UCB algorithms:

• x is the cumulative reward of some arm k, $$x = X_k(t)$$,

• n is the number of samples of that arm k, $$n = N_k(t)$$,

• and alpha is a small positive number; this algorithm uses $$\alpha = \frac{1}{t^c}$$ for a constant $$c > 1$$ close to 1 (for instance, c = 1.01).
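A quick illustration of this schedule (a sketch, not part of the module): as t grows, $$\alpha = 1/t^c$$ shrinks, so the confidence level of the interval tends to 100% and the upper bound keeps a growing exploration margin over the empirical mean.

```python
# alpha = 1 / t**c with c = 1.01, the default suggested in the text.
c = 1.01
for t in (10, 100, 10000):
    alpha = 1.0 / t ** c
    print(f"t={t:>5}: alpha={alpha:.5f} -> {100 * (1 - alpha):.3f}% confidence")
```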

Returns: (phat, pci)

• phat: the maximum likelihood estimate of p,

• pci: the confidence interval, as a (lower, upper) pair.

Note

My reference implementation was https://github.com/sjara/extracellpy/blob/master/extrastats.py#L35, but http://statsmodels.sourceforge.net/devel/generated/statsmodels.stats.proportion.proportion_confint.html could also be used (at the cost of an extra dependency for the project).
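For reference, the statsmodels alternative mentioned above computes the same exact interval when asked for the Clopper-Pearson ("beta") method (a sketch of the cross-check, assuming statsmodels is installed):

```python
from statsmodels.stats.proportion import proportion_confint

# method="beta" selects the Clopper-Pearson (exact) interval.
low, upp = proportion_confint(count=61, nobs=100, alpha=0.05, method="beta")
print(low, upp)  # should match binofit_scalar(61, 100) above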

Policies.CPUCB.binofit(xArray, nArray, alpha=0.05)[source]

Parameter estimates and confidence intervals for binomial data, for vectorial inputs.

For example:

>>> np.random.seed(1234)  # reproducible results
>>> true_p = 0.6
>>> N = 100
>>> xArray = np.random.binomial(N, true_p, 4)
>>> xArray
array([61, 54, 61, 52])

>>> (phat, pci) = binofit(xArray, N)
>>> phat
array([0.61, 0.54, 0.61, 0.52])
>>> pci  # 0.6 of course lies in the 95% confidence intervals
array([[0.507..., 0.705...],
       [0.437..., 0.640...],
       [0.507..., 0.705...],
       [0.417..., 0.620...]])

>>> (phat, pci) = binofit(xArray, N, 0.01)
>>> pci  # 0.6 is also in the 99% confidence intervals, but it is larger
array([[0.476..., 0.732...],
       [0.407..., 0.668...],
       [0.476..., 0.732...],
       [0.387..., 0.650...]])
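Since `scipy.stats.beta.ppf` broadcasts over array arguments, a vectorized version needs no explicit loop. A sketch (the function name `binofit_vec` is illustrative; the module's `binofit` may instead loop over `binofit_scalar`):

```python
import numpy as np
from scipy.stats import beta


def binofit_vec(x, n, alpha=0.05):
    """Clopper-Pearson estimates and intervals for an array of counts x,
    each from n trials. Returns (phat, pci) with pci of shape (len(x), 2)."""
    x = np.asarray(x, dtype=float)
    phat = x / n
    with np.errstate(all="ignore"):  # beta.ppf returns nan at the boundaries
        lower = np.where(x > 0, beta.ppf(alpha / 2, x, n - x + 1), 0.0)
        upper = np.where(x < n, beta.ppf(1 - alpha / 2, x + 1, n - x), 1.0)
    return phat, np.stack([lower, upper], axis=-1)
```

On the doctest's data `[61, 54, 61, 52]` with N = 100, this reproduces the 95% intervals shown above, row by row.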

Policies.CPUCB.ClopperPearsonUCB(x, N, alpha=0.05)[source]

Returns just the upper-confidence bound of the confidence interval.
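A sketch of what such a helper computes, assuming it returns the upper end of the two-sided Clopper-Pearson interval produced by binofit_scalar (the function name below is illustrative):

```python
from scipy.stats import beta


def clopper_pearson_ucb(x, n, alpha=0.05):
    """Upper end of the two-sided 100*(1 - alpha)% Clopper-Pearson
    interval for x successes in n trials."""
    if x >= n:
        return 1.0
    return beta.ppf(1 - alpha / 2, x + 1, n - x)
```

For x = 61, n = 100 this gives about 0.705 at alpha = 0.05 and about 0.732 at alpha = 0.01, matching the intervals in the doctests above.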

Policies.CPUCB.C = 1.01

Default value for the parameter c for CP-UCB

class Policies.CPUCB.CPUCB(nbArms, c=1.01, lower=0.0, amplitude=1.0)[source]

The Clopper-Pearson UCB policy for bounded bandits. Reference: [Garivier & Cappé, COLT 2011].

__init__(nbArms, c=1.01, lower=0.0, amplitude=1.0)[source]

New generic index policy.

• nbArms: the number of arms,

• lower, amplitude: lower value and known amplitude of the rewards.

c = None

Parameter c for the CP-UCB formula (see below)

computeIndex(arm)[source]

Compute the current index, at time t and after $$N_k(t)$$ pulls of arm k:

$I_k(t) = \mathrm{ClopperPearsonUCB}\left( X_k(t), N_k(t), \frac{1}{t^c} \right).$

Where $$\mathrm{ClopperPearsonUCB}$$ is defined above. The index is the upper-confidence bound for a binomial trial of $$N_k(t)$$ samples from arm k, with mean $$\mu_k$$ and observed cumulative reward $$X_k(t)$$. The confidence interval uses $$\alpha = 1 / t^c$$, i.e., a $$100(1 - \alpha)\%$$ confidence bound.
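The formula above can be sketched as follows (names are illustrative; the convention of giving unexplored arms an infinite index is an assumption here, though it is common for index policies):

```python
from math import inf
from scipy.stats import beta


def compute_index(X_k, N_k, t, c=1.01):
    """CP-UCB index of arm k at time t: the upper Clopper-Pearson bound
    for X_k successes in N_k pulls, at level alpha = 1 / t**c."""
    if N_k == 0:
        return inf  # assumption: force one pull of each arm first
    alpha = 1.0 / t ** c
    if X_k >= N_k:
        return 1.0
    # Upper end of the two-sided 100*(1 - alpha)% interval.
    return beta.ppf(1 - alpha / 2, X_k + 1, N_k - X_k)
```

Note that for a fixed (X_k, N_k), the index grows with t: the shrinking alpha widens the interval, which is the exploration bonus that makes under-sampled arms attractive again.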

__module__ = 'Policies.CPUCB'