Policies.DMED module¶
The DMED policy of [Honda & Takemura, COLT 2010], in the special case of Bernoulli rewards. It can be used on any [0,1]-valued rewards, but be warned that in the non-binary case this is no longer the algorithm of [Honda & Takemura, COLT 2010] (see the note below on the variant).
Reference: [Garivier & Cappé, COLT 2011](https://arxiv.org/pdf/1102.2490.pdf).
- class Policies.DMED.DMED(nbArms, genuine=False, tolerance=0.0001, kl=CPUDispatcher(<function klBern>), lower=0.0, amplitude=1.0)[source]¶
Bases: Policies.BasePolicy.BasePolicy
The DMED policy of [Honda & Takemura, COLT 2010], in the special case of Bernoulli rewards. It can be used on any [0,1]-valued rewards, but be warned that in the non-binary case this is no longer the algorithm of [Honda & Takemura, COLT 2010] (see the note below on the variant).
Reference: [Garivier & Cappé, COLT 2011](https://arxiv.org/pdf/1102.2490.pdf).
- __init__(nbArms, genuine=False, tolerance=0.0001, kl=CPUDispatcher(<function klBern>), lower=0.0, amplitude=1.0)[source]¶ New policy.
- kl = None¶ kl function to use.
- tolerance = None¶ Numerical tolerance.
- genuine = None¶ Flag to know which variant is implemented, DMED or DMED+.
- nextActions = None¶ List of next actions to play; every step plays nextActions.pop(0).
- choice()[source]¶ If there is still a next action to play, pop it and play it; otherwise build a new list and play its first action.
The list of actions is obtained as all the indexes k satisfying the following inequality.
For the naive version (genuine = False), DMED uses:

\[ \mathrm{kl}(\hat{\mu}_k(t), \hat{\mu}^*(t)) < \frac{\log(t)}{N_k(t)}. \]

For the original version (genuine = True), DMED+ uses:

\[ \mathrm{kl}(\hat{\mu}_k(t), \hat{\mu}^*(t)) < \frac{\log(t / N_k(t))}{N_k(t)}. \]

Here N_k(t) is the number of pulls of arm k, X_k(t) is the sum of rewards from arm k, \hat{\mu}_k(t) is its empirical mean, and \hat{\mu}^*(t) is the best empirical mean:

\[ X_k(t) = \sum_{\sigma=1}^{t} \mathbb{1}(A(\sigma) = k) \, r_k(\sigma), \qquad \hat{\mu}_k(t) = \frac{X_k(t)}{N_k(t)}, \qquad \hat{\mu}^*(t) = \max_k \hat{\mu}_k(t). \]
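The rebuild step behind choice() can be sketched as follows. This is an illustrative reimplementation, not the library's code: it assumes pulls and rewards are NumPy arrays holding N_k(t) and X_k(t) for arms that have each been pulled at least once, and it uses a local kl_bern helper instead of the klBern dispatcher shown in the signature.

```python
import numpy as np

def kl_bern(x, y, eps=1e-15):
    """Bernoulli KL divergence kl(x, y), clipped away from 0 and 1 for numerical safety."""
    x = min(max(x, eps), 1 - eps)
    y = min(max(y, eps), 1 - eps)
    return x * np.log(x / y) + (1 - x) * np.log((1 - x) / (1 - y))

def rebuild_next_actions(pulls, rewards, t, genuine=False, kl=kl_bern):
    """Return the list of arms k satisfying the DMED / DMED+ criterion at time t.

    pulls[k]   = N_k(t), assumed > 0 for every arm,
    rewards[k] = X_k(t), the sum of rewards collected from arm k.
    """
    means = rewards / pulls                    # hat{mu}_k(t)
    best = np.max(means)                       # hat{mu}^*(t)
    if genuine:   # DMED+ : kl(mu_k, mu^*) < log(t / N_k(t)) / N_k(t)
        thresholds = np.log(t / pulls) / pulls
    else:         # DMED  : kl(mu_k, mu^*) < log(t) / N_k(t)
        thresholds = np.log(t) / pulls
    divergences = np.array([kl(mu, best) for mu in means])
    return list(np.nonzero(divergences < thresholds)[0])
```

choice() then pops elements of this list one by one (nextActions.pop(0)) and rebuilds it with this criterion once the list is empty.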
- choiceMultiple(nb=1)[source]¶ If there are still enough actions to play, pop them and play them; otherwise build a new list and play the nb first actions.
- __module__ = 'Policies.DMED'¶
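A minimal simulation sketch, assuming the usual SMPyBandits interaction loop in which the policy exposes startGame(), choice() and getReward(arm, reward) inherited from Policies.BasePolicy.BasePolicy, along with a pulls counter; the Bernoulli arm means below are arbitrary.

```python
import numpy as np
from Policies.DMED import DMED

rng = np.random.default_rng(42)
means = [0.1, 0.5, 0.9]                         # arbitrary Bernoulli arm means
policy = DMED(nbArms=len(means), genuine=False)

policy.startGame()
for _ in range(1000):
    arm = policy.choice()                       # pop the next DMED action
    reward = float(rng.random() < means[arm])   # Bernoulli reward in {0, 1}
    policy.getReward(arm, reward)

print("Pulls per arm:", policy.pulls)           # the best arm (index 2) should dominate
```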
- class Policies.DMED.DMEDPlus(nbArms, tolerance=0.0001, kl=CPUDispatcher(<function klBern>), lower=0.0, amplitude=1.0)[source]¶
Bases: Policies.DMED.DMED
The DMED+ policy of [Honda & Takemura, COLT 2010], in the special case of Bernoulli rewards. It can be used on any [0,1]-valued rewards, but be warned that in the non-binary case this is no longer the algorithm of [Honda & Takemura, COLT 2010].
Reference: [Garivier & Cappé, COLT 2011](https://arxiv.org/pdf/1102.2490.pdf).
- __init__(nbArms, tolerance=0.0001, kl=CPUDispatcher(<function klBern>), lower=0.0, amplitude=1.0)[source]¶ New policy.
- __module__ = 'Policies.DMED'¶
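DMEDPlus appears to differ from DMED only by forcing the genuine = True variant (the sharper log(t / N_k(t)) threshold); under that assumption, the two constructions below are interchangeable.

```python
from Policies.DMED import DMED, DMEDPlus

# DMED with the genuine flag set, i.e. the DMED+ threshold log(t / N_k(t)) / N_k(t):
dmed_plus_via_flag = DMED(nbArms=3, genuine=True)

# The dedicated subclass documented above:
dmed_plus = DMEDPlus(nbArms=3)
```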