# Policies.IMED module

The IMED policy of [Honda & Takemura, JMLR 2015].

Policies.IMED.Dinf(x=None, mu=None, kl=CPUDispatcher(<function klBern>), lowerbound=0, upperbound=1, precision=1e-06, max_iterations=50)

The generic Dinf index computation.

• x: value of the cumulative reward,

• mu: lower bound on the mean y (the infimum in the formula below is taken over y ≥ max(mu, lowerbound)),

• kl: the KL divergence to be used (klBern(), klGauss(), etc),

• lowerbound=0, upperbound=1: the known bounds on the values y and x,

• precision=1e-6: the threshold at which the search stops,

• max_iterations=50: maximum number of iterations of the search loop (bounded to keep the running time under control).

$D_{\inf}(x, \mu) \simeq \inf_{\max(\mu, \mathrm{lowerbound}) \leq y \leq \mathrm{upperbound}} \mathrm{kl}(x, y).$

Note

It uses a call to scipy.optimize.minimize_scalar(). If this fails, it falls back to a bisection search, with one call to kl for each step of the bisection.
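
The constrained minimization above can be sketched in pure Python. This is only an illustration under stated assumptions, not the module's implementation: it replaces scipy.optimize.minimize_scalar() and the bisection fallback with a ternary search (valid because kl(x, y) is convex in y), and the names `kl_bern` and `dinf_sketch` are hypothetical.

```python
import math


def kl_bern(x, y, eps=1e-15):
    """Bernoulli KL divergence kl(x, y); both arguments are clipped to [eps, 1 - eps]."""
    x = min(max(x, eps), 1 - eps)
    y = min(max(y, eps), 1 - eps)
    return x * math.log(x / y) + (1 - x) * math.log((1 - x) / (1 - y))


def dinf_sketch(x, mu, kl=kl_bern, lowerbound=0.0, upperbound=1.0,
                precision=1e-6, max_iterations=50):
    """Approximate the infimum of kl(x, y) over max(mu, lowerbound) <= y <= upperbound.

    Ternary search: assumes y -> kl(x, y) is convex on the interval.
    """
    lo, hi = max(mu, lowerbound), upperbound
    if lo >= hi:
        # Degenerate interval: this sketch simply evaluates kl at the upper bound.
        return kl(x, hi)
    for _ in range(max_iterations):
        if hi - lo <= precision:
            break
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        # Keep the two thirds of the interval that must contain the minimizer.
        if kl(x, m1) <= kl(x, m2):
            hi = m2
        else:
            lo = m1
    return kl(x, (lo + hi) / 2)


# x below mu: the minimizer sits on the boundary y = mu.
d_boundary = dinf_sketch(0.3, 0.6)
# x above mu: y = x is feasible, so the infimum is (numerically) zero.
d_interior = dinf_sketch(0.7, 0.4)
```

For Bernoulli rewards with the default bounds, the result should agree with the closed form: 0 when x ≥ mu, and kl(x, mu) otherwise.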

class Policies.IMED.IMED(nbArms, tolerance=0.0001, kl=CPUDispatcher(<function klBern>), lower=0.0, amplitude=1.0)

The IMED policy of [Honda & Takemura, JMLR 2015].

__init__(nbArms, tolerance=0.0001, kl=CPUDispatcher(<function klBern>), lower=0.0, amplitude=1.0)

New policy.

__str__()

-> str

one_Dinf(x, mu)

Compute the $$D_{\inf}$$ solution, for one value of x, and one value for mu.

Dinf(xs, mu)

Compute the $$D_{\inf}$$ solution, for a vector of values xs, and one value for mu.

choice()

Choose an arm with minimal index (ties are broken uniformly at random):

$A(t) \sim U(\arg\min_{1 \leq k \leq K} I_k(t)).$

Where the indexes are:

$I_k(t) = N_k(t) D_{\inf}(\hat{\mu}_{k}(t), \max_{k'} \hat{\mu}_{k'}(t)) + \log(N_k(t)).$

__module__ = 'Policies.IMED'
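
The index rule documented for choice() can be illustrated with a short, self-contained sketch. It is not the class's code: it assumes Bernoulli rewards in [0, 1], assumes every arm has been pulled at least once, and uses the closed form of $D_{\inf}$ for that case (0 when x ≥ mu, kl(x, mu) otherwise) instead of a numerical search. The names `kl_bern`, `dinf_bern`, `imed_indexes` and `imed_choice` are hypothetical.

```python
import math
import random


def kl_bern(x, y, eps=1e-15):
    """Bernoulli KL divergence kl(x, y); both arguments are clipped to [eps, 1 - eps]."""
    x = min(max(x, eps), 1 - eps)
    y = min(max(y, eps), 1 - eps)
    return x * math.log(x / y) + (1 - x) * math.log((1 - x) / (1 - y))


def dinf_bern(x, mu):
    """Closed form of the infimum of kl(x, y) over mu <= y <= 1 for Bernoulli rewards."""
    return 0.0 if x >= mu else kl_bern(x, mu)


def imed_indexes(rewards, pulls):
    """I_k = N_k * Dinf(mean_k, max_j mean_j) + log(N_k), assuming every N_k >= 1."""
    means = [r / n for r, n in zip(rewards, pulls)]
    best = max(means)
    return [n * dinf_bern(m, best) + math.log(n) for m, n in zip(means, pulls)]


def imed_choice(rewards, pulls, rng=random):
    """Pick an arm with minimal index, breaking ties uniformly at random."""
    indexes = imed_indexes(rewards, pulls)
    smallest = min(indexes)
    candidates = [k for k, value in enumerate(indexes) if value == smallest]
    return rng.choice(candidates)


# Cumulative rewards and pull counts for three arms (illustrative numbers).
rewards = [3, 6, 1]
pulls = [10, 10, 2]
indexes = imed_indexes(rewards, pulls)
arm = imed_choice(rewards, pulls)
```

In this example the empirically best arm has index log(N_k), since its $D_{\inf}$ term is zero, while the rarely pulled third arm gets the smallest index and is selected: the log(N_k) term steers the policy toward under-explored arms.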