Policies.DMED module

The DMED policy of [Honda & Takemura, COLT 2010], in the special case of Bernoulli rewards. It can be used on any [0, 1]-valued rewards, but in the non-binary case it is no longer the algorithm of [Honda & Takemura, COLT 2010] (see the note below on the variant).

class Policies.DMED.DMED(nbArms, genuine=False, tolerance=0.0001, kl=CPUDispatcher(<function klBern>), lower=0.0, amplitude=1.0)[source]

Bases: Policies.BasePolicy.BasePolicy

The DMED policy of [Honda & Takemura, COLT 2010], in the special case of Bernoulli rewards. It can be used on any [0, 1]-valued rewards, but in the non-binary case it is no longer the algorithm of [Honda & Takemura, COLT 2010] (see the note below on the variant).

__init__(nbArms, genuine=False, tolerance=0.0001, kl=CPUDispatcher(<function klBern>), lower=0.0, amplitude=1.0)[source]

New policy.
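
A minimal usage sketch, assuming the standard SMPyBandits single-player loop in which feedback is given back to the policy with getReward(arm, reward), a method inherited from Policies.BasePolicy.BasePolicy and not documented on this page:

    import numpy as np
    from Policies.DMED import DMED

    means = [0.1, 0.5, 0.9]              # hypothetical Bernoulli arm means
    policy = DMED(nbArms=len(means))     # genuine=False by default: the naive DMED variant
    policy.startGame()

    rng = np.random.default_rng(0)
    for t in range(1000):
        arm = policy.choice()                        # arm index to pull at this step
        reward = float(rng.random() < means[arm])    # simulated Bernoulli reward in {0, 1}
        policy.getReward(arm, reward)                # feedback to the policy (from BasePolicy)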

kl = None

kl function to use

tolerance = None

Numerical tolerance

genuine = None

Flag indicating which variant is implemented: DMED or DMED+

nextActions = None

List of next actions to play; at every step, the next action played is nextActions.pop(0)

__str__()[source]

-> str

startGame()[source]

Initialize the policy for a new game.

choice()[source]

If there is still a next action to play, pop it and play it; otherwise compute a new list and play its first action.

The list of actions is obtained as all the indexes k satisfying the following inequality (which one depends on the variant); a standalone sketch of this computation is given below.

  • For the naive version (genuine = False), DMED:

    \mathrm{kl}(\hat{\mu}_k(t), \hat{\mu}^*(t)) < \frac{\log(t)}{N_k(t)}.

  • For the original version (genuine = True), DMED+:

    \mathrm{kl}(\hat{\mu}_k(t), \hat{\mu}^*(t)) < \frac{\log\left(\frac{t}{N_k(t)}\right)}{N_k(t)}.

Where N_k(t) is the number of pulls of arm k, X_k(t) is the sum of rewards from arm k, \hat{\mu}_k(t) is its empirical mean, and \hat{\mu}^*(t) is the best empirical mean:

    X_k(t) = \sum_{\sigma=1}^{t} \mathbb{1}(A(\sigma) = k) \, r_k(\sigma), \qquad
    \hat{\mu}_k(t) = \frac{X_k(t)}{N_k(t)}, \qquad
    \hat{\mu}^*(t) = \max_k \hat{\mu}_k(t).

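To make this concrete, below is a minimal standalone sketch of how the candidate list could be recomputed from the pull counts and reward sums. It only illustrates the inequalities above and is not the module's actual code; kl_bern is a local stand-in for the klBern function used by default.

    import numpy as np

    def kl_bern(x, y, eps=1e-15):
        """Binary KL divergence kl(x, y), clipped to avoid log(0)."""
        x = min(max(x, eps), 1 - eps)
        y = min(max(y, eps), 1 - eps)
        return x * np.log(x / y) + (1 - x) * np.log((1 - x) / (1 - y))

    def dmed_candidates(pulls, sums, t, genuine=False):
        """Indexes k satisfying the DMED (or DMED+) inequality at time t.

        pulls[k] = N_k(t) and sums[k] = X_k(t); every arm is assumed to
        have been pulled at least once, and t >= 1.
        """
        means = sums / pulls               # empirical means \hat{\mu}_k(t)
        best = means.max()                 # best empirical mean \hat{\mu}^*(t)
        if genuine:                        # DMED+: threshold log(t / N_k(t)) / N_k(t)
            thresholds = np.log(t / pulls) / pulls
        else:                              # DMED: threshold log(t) / N_k(t)
            thresholds = np.log(t) / pulls
        return [k for k in range(len(pulls))
                if kl_bern(means[k], best) < thresholds[k]]

    pulls = np.array([10., 20., 30.])      # N_k(t) for 3 arms
    sums = np.array([2., 11., 24.])        # X_k(t)
    print(dmed_candidates(pulls, sums, t=60))                 # [1, 2] with these numbers
    print(dmed_candidates(pulls, sums, t=60, genuine=True))   # [2] with these numbers

The resulting list plays the role of nextActions: as described above, choice() then pops its first element at each step.
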
choiceMultiple(nb=1)[source]

If there are still enough actions to play, pop them and play them; otherwise compute a new list and play the first nb actions.
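
A short example of requesting several actions at once; the return type is not documented on this page and is assumed here to be a sequence of arm indexes:

    from Policies.DMED import DMED

    policy = DMED(5)
    policy.startGame()
    arms = policy.choiceMultiple(nb=3)   # assumed: a sequence of 3 arm indexes to play next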

__module__ = 'Policies.DMED'
class Policies.DMED.DMEDPlus(nbArms, tolerance=0.0001, kl=CPUDispatcher(<function klBern>), lower=0.0, amplitude=1.0)[source]

Bases: Policies.DMED.DMED

The DMED+ policy of [Honda & Takemura, COLT 2010], in the special case of Bernoulli rewards. It can be used on any [0, 1]-valued rewards, but in the non-binary case it is no longer the algorithm of [Honda & Takemura, COLT 2010].

__init__(nbArms, tolerance=0.0001, kl=CPUDispatcher(<function klBern>), lower=0.0, amplitude=1.0)[source]

New policy.

__module__ = 'Policies.DMED'
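
A small sketch contrasting the two ways of obtaining the DMED+ behaviour, assuming (as the genuine flag above suggests) that DMEDPlus is equivalent to DMED with genuine=True:

    from Policies.DMED import DMED, DMEDPlus

    plus_via_flag = DMED(10, genuine=True)   # DMED+ selected through the genuine flag
    plus_subclass = DMEDPlus(10)             # DMED+ through the dedicated subclass
    print(plus_via_flag, plus_subclass)      # printing relies on each policy's __str__()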