Policies.IMED module

The IMED policy of [Honda & Takemura, JMLR 2015].

Reference: [[“Non-asymptotic analysis of a new bandit algorithm for semi-bounded rewards”, J. Honda and A. Takemura, JMLR, 2015](http://jmlr.csail.mit.edu/papers/volume16/honda15a/honda15a.pdf)].

Policies.IMED.Dinf(x=None, mu=None, kl=CPUDispatcher(<function klBern>), lowerbound=0, upperbound=1, precision=1e-06, max_iterations=50)

The generic Dinf index computation.

x: value of the cumulative reward,
mu: lower bound on the mean y (the infimum is taken over y ≥ max(mu, lowerbound)),
kl: the KL divergence to be used (klBern(), klGauss(), etc),
lowerbound=0, upperbound=1: the known bounds on the values y and x,
precision=1e-6: the threshold at which the search stops,
max_iterations: max number of iterations of the loop (safer to bound it to reduce time complexity).

\[D_{\inf}(x, \mu) \simeq \inf_{\max(\mu, \mathrm{lowerbound}) \leq y \leq \mathrm{upperbound}} \mathrm{kl}(x, y).\]

Note

It first calls scipy.optimize.minimize_scalar(). If that call fails, it falls back to a bisection search, with one call to kl per bisection step.
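The minimization described above can be sketched as follows. This is a minimal illustration, not the module's code: `kl_bern` and `dinf_sketch` are hypothetical names, the Bernoulli KL divergence is re-implemented locally, and the bisection fallback is omitted.

```python
import math
from scipy.optimize import minimize_scalar


def kl_bern(x, y, eps=1e-15):
    """Bernoulli KL divergence kl(x, y); inputs are clipped away from 0 and 1."""
    x = min(max(x, eps), 1 - eps)
    y = min(max(y, eps), 1 - eps)
    return x * math.log(x / y) + (1 - x) * math.log((1 - x) / (1 - y))


def dinf_sketch(x, mu, kl=kl_bern, lowerbound=0.0, upperbound=1.0, precision=1e-6):
    """Approximate inf of kl(x, y) over max(mu, lowerbound) <= y <= upperbound."""
    lo = max(mu, lowerbound)
    if lo >= upperbound:
        # Degenerate interval: only y = upperbound is feasible.
        return kl(x, upperbound)
    res = minimize_scalar(
        lambda y: kl(x, y),
        bounds=(lo, upperbound),
        method="bounded",
        options={"xatol": precision},
    )
    # Guard against tiny negative values caused by floating-point rounding.
    return max(0.0, float(res.fun))


print(dinf_sketch(0.3, 0.5))  # close to kl_bern(0.3, 0.5)
print(dinf_sketch(0.7, 0.5))  # close to 0, since y = x is feasible
```

When x lies inside the feasible interval the minimum is reached at y = x and the value is close to zero; otherwise the minimum sits at the nearest endpoint of the interval.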

class Policies.IMED.IMED(nbArms, tolerance=0.0001, kl=CPUDispatcher(<function klBern>), lower=0.0, amplitude=1.0)

Bases: Policies.DMED.DMED

The IMED policy of [Honda & Takemura, JMLR 2015].

Reference: [[“Non-asymptotic analysis of a new bandit algorithm for semi-bounded rewards”, J. Honda and A. Takemura, JMLR, 2015](http://jmlr.csail.mit.edu/papers/volume16/honda15a/honda15a.pdf)].

__init__(nbArms, tolerance=0.0001, kl=CPUDispatcher(<function klBern>), lower=0.0, amplitude=1.0): New policy.

__str__(): -> str

one_Dinf(x, mu): Compute the \(D_{\inf}\) solution, for one value of x, and one value for mu.

Dinf(xs, mu): Compute the \(D_{\inf}\) solution for a vector of values xs and a single value of mu.
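For intuition, here is a hedged sketch of a vectorized computation in the Bernoulli case. It relies on the fact that kl(x, y) is nondecreasing in y for y ≥ x, so the infimum over y ≥ mu equals kl(x, mu) when x < mu and 0 otherwise. It assumes mu ≥ lowerbound and that the upper bound is not binding; `kl_bern_vec` and `dinf_vec_sketch` are hypothetical names, and whether the class uses this shortcut is not stated here.

```python
import numpy as np


def kl_bern_vec(x, y, eps=1e-15):
    """Elementwise Bernoulli KL divergence, with inputs clipped away from 0 and 1."""
    x = np.clip(x, eps, 1 - eps)
    y = np.clip(y, eps, 1 - eps)
    return x * np.log(x / y) + (1 - x) * np.log((1 - x) / (1 - y))


def dinf_vec_sketch(xs, mu):
    """Closed-form D_inf for Bernoulli rewards: kl(x, mu) if x < mu, else 0."""
    xs = np.asarray(xs, dtype=float)
    return np.where(xs >= mu, 0.0, kl_bern_vec(xs, mu))


print(dinf_vec_sketch([0.3, 0.5, 0.7], 0.5))
```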

choice()

Choose an arm with minimal index (ties are broken uniformly at random):

\[A(t) \sim U(\arg\min_{1 \leq k \leq K} I_k(t)).\]

where the indexes are:

\[I_k(t) = N_k(t) D_{\inf}(\hat{\mu}_k(t), \max_{k'} \hat{\mu}_{k'}(t)) + \log(N_k(t)).\]
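The index rule above can be sketched as below. This is an illustration under stated assumptions, not the class's implementation: `imed_choice_sketch` and `bern_dinf` are hypothetical names, rewards are taken to be Bernoulli so that \(D_{\inf}\) has a closed form, and every arm is assumed to have been pulled at least once so that \(\log(N_k(t))\) is defined.

```python
import math
import numpy as np


def bern_dinf(x, mu, eps=1e-15):
    """Closed-form D_inf for Bernoulli rewards: kl(x, mu) if x < mu, else 0."""
    if x >= mu:
        return 0.0
    x = min(max(x, eps), 1 - eps)
    mu = min(max(mu, eps), 1 - eps)
    return x * math.log(x / mu) + (1 - x) * math.log((1 - x) / (1 - mu))


def imed_choice_sketch(means, pulls, dinf=bern_dinf, rng=None):
    """Return (chosen arm, indexes) with I_k = N_k * D_inf(mean_k, max mean) + log(N_k)."""
    rng = np.random.default_rng() if rng is None else rng
    means = np.asarray(means, dtype=float)
    pulls = np.asarray(pulls, dtype=float)  # assumed >= 1 for every arm
    best = means.max()
    d = np.array([dinf(m, best) for m in means])
    indexes = pulls * d + np.log(pulls)
    # Arms achieving the minimal index; pick one uniformly at random.
    candidates = np.flatnonzero(indexes == indexes.min())
    return int(rng.choice(candidates)), indexes


arm, indexes = imed_choice_sketch([0.3, 0.5], [10, 10])
print(arm, indexes)
```

With equal pull counts the empirically best arm has the smallest index, because its \(D_{\inf}\) term is zero. Once that arm has been pulled many more times than the others, its \(\log(N_k(t))\) term grows and a less-pulled arm can become the minimizer, which is how the rule explores.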

__module__ = 'Policies.IMED'