Policies.Experimentals.KLempUCB module¶
The Empirical KL-UCB algorithm non-parametric policy. References: [Maillard, Munos & Stoltz - COLT, 2011], [Cappé, Garivier, Maillard, Munos & Stoltz, 2012].
class Policies.Experimentals.KLempUCB.KLempUCB(nbArms, maxReward=1.0, lower=0.0, amplitude=1.0)[source]¶
Bases: IndexPolicy.IndexPolicy

The Empirical KL-UCB algorithm non-parametric policy. References: [Maillard, Munos & Stoltz - COLT, 2011], [Cappé, Garivier, Maillard, Munos & Stoltz, 2012].
__init__(nbArms, maxReward=1.0, lower=0.0, amplitude=1.0)[source]¶
New generic index policy.

- nbArms: the number of arms,
- lower, amplitude: lower value and known amplitude of the rewards.
c= None¶

Parameter c, a constant used in the exploration threshold of the index.
maxReward= None¶

Known upper bound on the rewards.
pulls= None¶

Keep track of the number of pulls of each arm.
obs= None¶

UNBOUNDED dictionary for each arm: keep track of how many observations of each reward value were seen. Warning: KLempUCB works better for discrete distributions!
computeIndex(arm)[source]¶
- Compute the current index, at time t and after \(N_k(t)\) pulls of arm k. 
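The empirical KL-UCB index is the largest mean achievable by a distribution whose KL divergence from the empirical distribution of observed rewards stays below a confidence threshold. In the special case of Bernoulli rewards, this reduces to the classical binomial KL-UCB bound, which can be sketched by bisection. This is an illustrative sketch, not the library's implementation; the function names are hypothetical:

```python
import math

def kl_bern(p, q, eps=1e-15):
    # Bernoulli KL divergence kl(p, q), clamped away from 0 and 1 for numerical safety
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def klucb_bern(p_hat, level, iters=50):
    # Largest q in [p_hat, 1] with kl(p_hat, q) <= level, found by bisection.
    # `level` plays the role of the exploration threshold (e.g. log(t)/N_k(t)).
    lo, hi = p_hat, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if kl_bern(p_hat, mid) <= level:
            lo = mid  # mid is still feasible, push the bound up
        else:
            hi = mid  # mid violates the KL constraint, shrink
    return lo
```

For the general (non-Bernoulli) empirical case, the optimization runs over distributions supported on the observed reward values plus maxReward, which is why the policy tracks full observation counts per arm.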
getReward(arm, reward)[source]¶
- Give a reward: increase t, pulls, and update count of observations for that arm. 
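A minimal sketch of the bookkeeping that getReward implies, using a per-arm counter of observed reward values as described for the obs attribute. The class and method names here are hypothetical, not the library's API:

```python
from collections import Counter

class ObsTracker:
    # Hypothetical helper mirroring what `pulls` and `obs` store:
    # per-arm pull counts and per-arm counts of each observed reward value.
    def __init__(self, nb_arms):
        self.pulls = [0] * nb_arms
        self.obs = [Counter() for _ in range(nb_arms)]

    def get_reward(self, arm, reward):
        # Increase the pull count and record one more observation of this reward
        self.pulls[arm] += 1
        self.obs[arm][reward] += 1

    def empirical_dist(self, arm):
        # Empirical distribution over observed reward values for one arm
        n = self.pulls[arm]
        return {r: c / n for r, c in self.obs[arm].items()}
```

Because the dictionary keys are the raw reward values, the number of keys is unbounded for continuous rewards, which matches the warning that KLempUCB works better for discrete distributions.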
__module__= 'Policies.Experimentals.KLempUCB'¶