# Policies.DMED module

The DMED policy of [Honda & Takemura, COLT 2010], in the special case of Bernoulli rewards. It can be used on any [0,1]-valued rewards, but beware: in the non-binary case, this is not the algorithm of [Honda & Takemura, COLT 2010] (see the note below on the variant).

class Policies.DMED.DMED(nbArms, genuine=False, tolerance=0.0001, kl=CPUDispatcher(<function klBern>), lower=0.0, amplitude=1.0)[source]

The DMED policy of [Honda & Takemura, COLT 2010], in the special case of Bernoulli rewards. It can be used on any [0,1]-valued rewards, but beware: in the non-binary case, this is not the algorithm of [Honda & Takemura, COLT 2010] (see the note below on the variant).
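
A minimal usage sketch (the arm means and simulation loop are illustrative, assuming the usual SMPyBandits policy interface of startGame() / choice() / getReward()):

```python
import numpy as np
from Policies.DMED import DMED

means = [0.1, 0.5, 0.9]           # illustrative Bernoulli arm means
policy = DMED(nbArms=len(means))  # genuine=False gives the naive DMED variant
policy.startGame()

rng = np.random.default_rng(42)
for t in range(1000):
    arm = policy.choice()                      # pop the next planned action
    reward = float(rng.random() < means[arm])  # Bernoulli reward in {0, 1}
    policy.getReward(arm, reward)              # update pull counts and reward sums
```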

__init__(nbArms, genuine=False, tolerance=0.0001, kl=CPUDispatcher(<function klBern>), lower=0.0, amplitude=1.0)[source]

New policy.

kl = None

The KL divergence function to use (by default, klBern for Bernoulli rewards)

tolerance = None

Numerical tolerance

genuine = None

Flag to know which variant is implemented: DMED (genuine = False) or DMED+ (genuine = True)

nextActions = None

List of next actions to play; every next step plays nextActions.pop(0)

__str__() -> str[source]

startGame()[source]

Initialize the policy for a new game.

choice()[source]

If there is still a next action to play, pop it and play it; otherwise compute a new list and play its first action.

The list of actions is obtained as all the indices $k$ satisfying the following inequality, depending on the variant.

• For the naive version (genuine = False), DMED:

  $$\mathrm{kl}(\hat{\mu}_k(t), \hat{\mu}^*(t)) < \frac{\log(t)}{N_k(t)}.$$

• For the original version (genuine = True), DMED+:

  $$\mathrm{kl}(\hat{\mu}_k(t), \hat{\mu}^*(t)) < \frac{\log\left(\frac{t}{N_k(t)}\right)}{N_k(t)}.$$

Here $X_k(t)$ is the sum of rewards from arm $k$, $N_k(t)$ is its number of pulls up to time $t$, $\hat{\mu}_k(t)$ is its empirical mean, and $\hat{\mu}^*(t)$ is the best empirical mean:

$$\begin{aligned} X_k(t) &= \sum_{\sigma=1}^{t} \mathbb{1}(A(\sigma) = k)\, r_k(\sigma), \\ \hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ \hat{\mu}^*(t) &= \max_{1 \leq k \leq K} \hat{\mu}_k(t). \end{aligned}$$
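
For illustration, here is a standalone sketch of how this list could be recomputed (hypothetical helper rebuild_action_list, not the class's exact internals), using the two thresholds above and the klBern function from Policies.kullback:

```python
import numpy as np
from Policies.kullback import klBern  # Bernoulli KL divergence kl(x, y)

def rebuild_action_list(pulls, sums, t, genuine=False):
    """Indices k whose empirical mean is still close enough to the best one.

    pulls, sums: 1D numpy arrays with pulls[k] = N_k(t) (assumed > 0)
    and sums[k] = X_k(t).
    """
    means = sums / pulls              # hat mu_k(t)
    best = np.max(means)              # hat mu^*(t)
    if genuine:                       # DMED+: log(t / N_k(t)) / N_k(t)
        thresholds = np.log(t / pulls) / pulls
    else:                             # naive DMED: log(t) / N_k(t)
        thresholds = np.log(t) / pulls
    kl_values = np.array([klBern(mu, best) for mu in means])
    return np.nonzero(kl_values < thresholds)[0].tolist()
```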
choiceMultiple(nb=1)[source]

If there are still enough actions to play, pop them and play them; otherwise compute a new list and play its nb first actions.
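
For instance, in a multiple-play setting one could ask for several arms at once (continuing the hypothetical loop above):

```python
arms = policy.choiceMultiple(nb=2)  # two arms to play at this step
```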

__module__ = 'Policies.DMED'
class Policies.DMED.DMEDPlus(nbArms, tolerance=0.0001, kl=CPUDispatcher(<function klBern>), lower=0.0, amplitude=1.0)[source]

The DMED+ policy of [Honda & Takemura, COLT 2010], in the special case of Bernoulli rewards. It can be used on any [0,1]-valued rewards, but beware: in the non-binary case, this is not the algorithm of [Honda & Takemura, COLT 2010].

__init__(nbArms, tolerance=0.0001, kl=CPUDispatcher(<function klBern>), lower=0.0, amplitude=1.0)[source]

New policy.
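
A sketch of the relation between the two classes (hypothetical check; the genuine flag is what selects the DMED+ threshold):

```python
from Policies.DMED import DMED, DMEDPlus

plus = DMEDPlus(nbArms=3)            # DMED+ directly
same = DMED(nbArms=3, genuine=True)  # equivalent: genuine=True uses the DMED+ threshold
```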

__module__ = 'Policies.DMED'