Policies.kullback module

Kullback-Leibler divergence functions and klUCB utilities.

Warning

These functions are not vectorized: each one assumes a single scalar value for each argument. If you want a vectorized function, use the numpy.vectorize wrapper:

>>> import numpy as np
>>> klBern_vect = np.vectorize(klBern)
>>> klBern_vect([0.1, 0.5, 0.9], 0.2)  
array([0.036..., 0.223..., 1.145...])
>>> klBern_vect(0.4, [0.2, 0.3, 0.4])  
array([0.104..., 0.022..., 0...])
>>> klBern_vect([0.1, 0.5, 0.9], [0.2, 0.3, 0.4])  
array([0.036..., 0.087..., 0.550...])

For some functions, you are better off writing the vectorized version manually, for instance if you want to fix the value of an optional parameter:

>>> # WARNING using np.vectorize gave weird result on klGauss
>>> # klGauss_vect = np.vectorize(klGauss, excluded="y")
>>> def klGauss_vect(xs, y, sig2x=0.25):  # vectorized for first input only
...    return np.array([klGauss(x, y, sig2x) for x in xs])
>>> klGauss_vect([-1, 0, 1], 0.1)  
array([2.42, 0.02, 1.62])
Policies.kullback.eps = 1e-15

Threshold value: everything in [0, 1] is truncated to [eps, 1 - eps]
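
In code, this truncation is just a clamp; a minimal sketch (the helper name is purely illustrative, not part of the module):

>>> def truncate01(x, eps=1e-15):
...    # Clamp a value from [0, 1] into [eps, 1 - eps], so that the KL
...    # formulas below never evaluate log(0) or divide by zero.
...    return min(max(x, eps), 1 - eps)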

Policies.kullback.klBern

Kullback-Leibler divergence for Bernoulli distributions. https://en.wikipedia.org/wiki/Bernoulli_distribution#Kullback.E2.80.93Leibler_divergence

\[\mathrm{KL}(\mathcal{B}(x), \mathcal{B}(y)) = x \log(\frac{x}{y}) + (1-x) \log(\frac{1-x}{1-y}).\]
>>> klBern(0.5, 0.5)
0.0
>>> klBern(0.1, 0.9)  
1.757779...
>>> klBern(0.9, 0.1)  # The KL happens to be symmetric for this pair (y = 1 - x)  
1.757779...
>>> klBern(0.4, 0.5)  
0.020135...
>>> klBern(0.01, 0.99)  
4.503217...
  • Special values:

>>> klBern(0, 1)  # Should be +inf, but 0 --> eps, 1 --> 1 - eps  
34.539575...
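
A direct transcription of the formula, with the truncation described above; this is a sketch, not necessarily the module's exact code, but it explains why the special value stays finite:

>>> from math import log
>>> def klBern_sketch(x, y, eps=1e-15):
...    x = min(max(x, eps), 1 - eps)  # truncate into [eps, 1 - eps]
...    y = min(max(y, eps), 1 - eps)
...    return x * log(x / y) + (1 - x) * log((1 - x) / (1 - y))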
Policies.kullback.klBin

Kullback-Leibler divergence for Binomial distributions. https://math.stackexchange.com/questions/320399/kullback-leibner-divergence-of-binomial-distributions

  • It is simply n times klBern() applied to x and y (a one-line sketch follows the examples below).

\[\mathrm{KL}(\mathrm{Bin}(x, n), \mathrm{Bin}(y, n)) = n \times \left(x \log(\frac{x}{y}) + (1-x) \log(\frac{1-x}{1-y}) \right).\]

Warning

The two distributions must have the same parameter n; x and y are the respective success probabilities p, q in (0, 1).

>>> klBin(0.5, 0.5, 10)
0.0
>>> klBin(0.1, 0.9, 10)  
17.57779...
>>> klBin(0.9, 0.1, 10)  # The KL happens to be symmetric for this pair (y = 1 - x)  
17.57779...
>>> klBin(0.4, 0.5, 10)  
0.20135...
>>> klBin(0.01, 0.99, 10)  
45.03217...
  • Special values:

>>> klBin(0, 1, 10)  # Should be +inf, but 0 --> eps, 1 --> 1 - eps  
345.39575...
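
As noted in the bullet above, a sketch is a one-liner on top of klBern() (assuming klBern is in scope):

>>> def klBin_sketch(x, y, n):
...    # n i.i.d. Bernoulli components: the KL divergences simply add up.
...    return n * klBern(x, y)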
Policies.kullback.klPoisson

Kullback-Leibler divergence for Poisson distributions. https://en.wikipedia.org/wiki/Poisson_distribution#Kullback.E2.80.93Leibler_divergence

\[\mathrm{KL}(\mathrm{Poisson}(x), \mathrm{Poisson}(y)) = y - x + x \times \log(\frac{x}{y}).\]
>>> klPoisson(3, 3)
0.0
>>> klPoisson(2, 1)  
0.386294...
>>> klPoisson(1, 2)  # And this KL is non-symmetric  
0.306852...
>>> klPoisson(3, 6)  
0.920558...
>>> klPoisson(6, 8)  
0.273907...
  • Special values:

>>> klPoisson(1, 0)  # Should be +inf, but 0 --> eps  
33.538776...
>>> klPoisson(0, 0)
0.0
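
A sketch of the formula, clamping both means at eps (the means are nonnegative, so only the lower truncation applies); this matches the finite special value above:

>>> from math import log
>>> def klPoisson_sketch(x, y, eps=1e-15):
...    x, y = max(x, eps), max(y, eps)
...    return y - x + x * log(x / y)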
Policies.kullback.klExp

Kullback-Leibler divergence for exponential distributions. https://en.wikipedia.org/wiki/Exponential_distribution#Kullback.E2.80.93Leibler_divergence

\[\begin{split}\mathrm{KL}(\mathrm{Exp}(x), \mathrm{Exp}(y)) = \begin{cases} \frac{x}{y} - 1 - \log(\frac{x}{y}) & \text{if } x > 0, y > 0 \\ +\infty & \text{otherwise} \end{cases}\end{split}\]
>>> klExp(3, 3)
0.0
>>> klExp(3, 6)  
0.193147...
>>> klExp(1, 2)  # Only the ratio x/y matters  
0.193147...
>>> klExp(2, 1)  # And this KL is non-symmetric  
0.306852...
>>> klExp(4, 2)  # Only the ratio x/y matters  
0.306852...
>>> klExp(6, 8)  
0.037682...
  • x, y have to be positive:

>>> klExp(-3, 2)
inf
>>> klExp(3, -2)
inf
>>> klExp(-3, -2)
inf
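
A sketch following the two cases of the formula (illustrative name, not the module's exact code):

>>> from math import log
>>> def klExp_sketch(x, y):
...    if x <= 0 or y <= 0:
...        return float('inf')
...    t = x / y  # only the ratio x/y matters, as the examples show
...    return t - 1 - log(t)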
Policies.kullback.klGamma

Kullback-Leibler divergence for gamma distributions. https://en.wikipedia.org/wiki/Gamma_distribution#Kullback.E2.80.93Leibler_divergence

  • It is simply a times klExp() applied to x and y (a one-line sketch follows the examples below).

\[\begin{split}\mathrm{KL}(\Gamma(x, a), \Gamma(y, a)) = \begin{cases} a \times \left( \frac{x}{y} - 1 - \log(\frac{x}{y}) \right) & \text{if } x > 0, y > 0 \\ +\infty & \text{otherwise} \end{cases}\end{split}\]

Warning

The two distributions must have the same parameter a.

>>> klGamma(3, 3)
0.0
>>> klGamma(3, 6)  
0.193147...
>>> klGamma(1, 2)  # Only the ratio x/y matters  
0.193147...
>>> klGamma(2, 1)  # And this KL is non-symmetric  
0.306852...
>>> klGamma(4, 2)  # Only the ratio x/y matters  
0.306852...
>>> klGamma(6, 8)  
0.037682...
  • x, y have to be positive:

>>> klGamma(-3, 2)
inf
>>> klGamma(3, -2)
inf
>>> klGamma(-3, -2)
inf
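
As noted in the bullet above, a sketch is a one-liner on top of klExp() (assuming klExp is in scope):

>>> def klGamma_sketch(x, y, a=1):
...    # Same shape parameter a on both sides: the KL is a times klExp.
...    return a * klExp(x, y)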
Policies.kullback.klNegBin

Kullback-Leibler divergence for negative binomial distributions. https://en.wikipedia.org/wiki/Negative_binomial_distribution

\[\mathrm{KL}(\mathrm{NegBin}(x, r), \mathrm{NegBin}(y, r)) = r \times \log\left(\frac{r + x}{r + y}\right) - x \times \log\left(\frac{y (r + x)}{x (r + y)}\right).\]

Warning

The two distributions must have the same parameter r.

>>> klNegBin(0.5, 0.5)
0.0
>>> klNegBin(0.1, 0.9)  
-0.711611...
>>> klNegBin(0.9, 0.1)  # And this KL is non-symmetric  
2.0321564...
>>> klNegBin(0.4, 0.5)  
-0.130653...
>>> klNegBin(0.01, 0.99)  
-0.717353...
  • With other values for r:

>>> klNegBin(0.5, 0.5, r=2)
0.0
>>> klNegBin(0.1, 0.9, r=2)  
-0.832991...
>>> klNegBin(0.1, 0.9, r=4)  
-0.914890...
>>> klNegBin(0.9, 0.1, r=2)  # And this KL is non-symmetric  
2.3325528...
>>> klNegBin(0.4, 0.5, r=2)  
-0.154572...
>>> klNegBin(0.01, 0.99, r=2)  
-0.836257...
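
A direct transcription of the formula (a sketch, without any truncation of the inputs):

>>> from math import log
>>> def klNegBin_sketch(x, y, r=1):
...    return r * log((r + x) / (r + y)) - x * log(y * (r + x) / (x * (r + y)))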
Policies.kullback.klGauss

Kullback-Leibler divergence for Gaussian distributions of means x and y and variances sig2x and sig2y, \(\nu_1 = \mathcal{N}(x, \sigma_x^2)\) and \(\nu_2 = \mathcal{N}(y, \sigma_y^2)\):

\[\mathrm{KL}(\nu_1, \nu_2) = \frac{(x - y)^2}{2 \sigma_y^2} + \frac{1}{2}\left( \frac{\sigma_x^2}{\sigma_y^2} - 1 - \log\left(\frac{\sigma_x^2}{\sigma_y^2}\right) \right).\]

See https://en.wikipedia.org/wiki/Normal_distribution#Other_properties

  • By default, sig2y is assumed to be sig2x (same variance).

Warning

The C version does not support different variances.

>>> klGauss(3, 3)
0.0
>>> klGauss(3, 6)
18.0
>>> klGauss(1, 2)
2.0
>>> klGauss(2, 1)  # And this KL is symmetric
2.0
>>> klGauss(4, 2)
8.0
>>> klGauss(6, 8)
8.0
  • x, y can be negative:

>>> klGauss(-3, 2)
50.0
>>> klGauss(3, -2)
50.0
>>> klGauss(-3, -2)
2.0
>>> klGauss(3, 2)
2.0
  • With other values for sig2x:

>>> klGauss(3, 3, sig2x=10)
0.0
>>> klGauss(3, 6, sig2x=10)
0.45
>>> klGauss(1, 2, sig2x=10)
0.05
>>> klGauss(2, 1, sig2x=10)  # And this KL is symmetric
0.05
>>> klGauss(4, 2, sig2x=10)
0.2
>>> klGauss(6, 8, sig2x=10)
0.2
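
In the default same-variance case, the formula collapses to its quadratic term; a minimal sketch of that branch only (the different-variance branch follows the full formula above):

>>> def klGauss_sketch(x, y, sig2x=0.25):
...    # sig2y == sig2x: only the squared mean difference remains.
...    return (x - y) ** 2 / (2.0 * sig2x)
>>> klGauss_sketch(3, 6)  # matches klGauss(3, 6) above
18.0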
  • With different values for sig2x and sig2y:

>>> klGauss(0, 0, sig2x=0.25, sig2y=0.5)  
-0.0284...
>>> klGauss(0, 0, sig2x=0.25, sig2y=1.0)  
0.2243...
>>> klGauss(0, 0, sig2x=0.5, sig2y=0.25)  # not symmetric here!  
1.1534...
>>> klGauss(0, 1, sig2x=0.25, sig2y=0.5)  
0.9715...
>>> klGauss(0, 1, sig2x=0.25, sig2y=1.0)  
0.7243...
>>> klGauss(0, 1, sig2x=0.5, sig2y=0.25)  # not symmetric here!  
3.1534...
>>> klGauss(1, 0, sig2x=0.25, sig2y=0.5)  
0.9715...
>>> klGauss(1, 0, sig2x=0.25, sig2y=1.0)  
0.7243...
>>> klGauss(1, 0, sig2x=0.5, sig2y=0.25)  # not symmetric here!  
3.1534...

Warning

Using Policies.klUCB (and variants) with klGauss() is equivalent to using Policies.UCB, so prefer the simpler version.

Policies.kullback.klucb

The generic KL-UCB index computation.

  • x: value of the cumulative reward,

  • d: upper bound on the divergence,

  • kl: the KL divergence to be used (klBern(), klGauss(), etc),

  • upperbound, lowerbound=float('-inf'): the known bounds on the values x,

  • precision=1e-6: the precision threshold at which to stop the search,

  • max_iterations=50: max number of iterations of the loop (safer to bound it to reduce time complexity).

\[\mathrm{klucb}(x, d) \simeq \sup_{\mathrm{lowerbound} \leq y \leq \mathrm{upperbound}} \{ y : \mathrm{kl}(x, y) < d \}.\]

Note

It uses a bisection search, with one call to the kl function at each step.

For example, for klucbBern(), the two steps are to first compute an upper bound (as precise as possible) and then compute the KL-UCB index:

>>> x, d = 0.9, 0.2   # mean x, exploration term d
>>> upperbound = min(1., klucbGauss(x, d, sig2x=0.25))  # variance 1/4 for [0,1] bounded distributions
>>> upperbound  
1.0
>>> klucb(x, d, klBern, upperbound, lowerbound=0, precision=1e-3, max_iterations=10)  
0.9941...
>>> klucb(x, d, klBern, upperbound, lowerbound=0, precision=1e-6, max_iterations=10)  
0.9944...
>>> klucb(x, d, klBern, upperbound, lowerbound=0, precision=1e-3, max_iterations=50)  
0.9941...
>>> klucb(x, d, klBern, upperbound, lowerbound=0, precision=1e-6, max_iterations=100)  # more and more precise!  
0.994489...
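
The bisection described in the note can be sketched in a few lines; this mirrors the documented signature and assumes kl(x, y) is non-decreasing in y above x (an illustration, not the module's exact code):

>>> def klucb_sketch(x, d, kl, upperbound, lowerbound=float('-inf'),
...                  precision=1e-6, max_iterations=50):
...    value = max(x, lowerbound)
...    u = upperbound
...    i = 0
...    while i < max_iterations and u - value > precision:
...        i += 1
...        m = (value + u) / 2.0
...        if kl(x, m) > d:
...            u = m      # m already violates kl(x, y) < d: shrink from above
...        else:
...            value = m  # m still satisfies the constraint: raise the floor
...    return (value + u) / 2.0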

Note

See below for more examples for different KL divergence functions.

Policies.kullback.klucbBern

KL-UCB index computation for Bernoulli distributions, using klucb().

  • Influence of x:

>>> klucbBern(0.1, 0.2)  
0.378391...
>>> klucbBern(0.5, 0.2)  
0.787088...
>>> klucbBern(0.9, 0.2)  
0.994489...
  • Influence of d:

>>> klucbBern(0.1, 0.4)  
0.519475...
>>> klucbBern(0.1, 0.9)  
0.734714...
>>> klucbBern(0.5, 0.4)  
0.871035...
>>> klucbBern(0.5, 0.9)  
0.956809...
>>> klucbBern(0.9, 0.4)  
0.999285...
>>> klucbBern(0.9, 0.9)  
0.999995...
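
Putting together the two steps shown in the klucb() example above, a plausible sketch of this function (assuming klucb, klBern and klucbGauss are in scope):

>>> def klucbBern_sketch(x, d, precision=1e-6):
...    # Cheap initial upper bound from the Gaussian index: variance 1/4
...    # dominates every [0, 1]-bounded distribution.
...    upperbound = min(1.0, klucbGauss(x, d, sig2x=0.25))
...    return klucb(x, d, klBern, upperbound, lowerbound=0, precision=precision)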
Policies.kullback.klucbGauss

KL-UCB index computation for Gaussian distributions.

  • Note that it does not require any search.

Warning

It works only if the correct variance constant is given.

  • Influence of x:

>>> klucbGauss(0.1, 0.2)  
0.416227...
>>> klucbGauss(0.5, 0.2)  
0.816227...
>>> klucbGauss(0.9, 0.2)  
1.216227...
  • Influence of d:

>>> klucbGauss(0.1, 0.4)  
0.547213...
>>> klucbGauss(0.1, 0.9)  
0.770820...
>>> klucbGauss(0.5, 0.4)  
0.947213...
>>> klucbGauss(0.5, 0.9)  
1.170820...
>>> klucbGauss(0.9, 0.4)  
1.347213...
>>> klucbGauss(0.9, 0.9)  
1.570820...
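
Since klGauss(x, y) = (x - y)^2 / (2 sig2x) with equal variances, solving kl(x, y) = d for the largest y gives a closed form, which is why no search is needed; a sketch:

>>> from math import sqrt
>>> def klucbGauss_sketch(x, d, sig2x=0.25):
...    return x + sqrt(2.0 * sig2x * d)
>>> klucbGauss_sketch(0.1, 0.2)  # matches klucbGauss(0.1, 0.2) above  
0.416227...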

Warning

Using Policies.klUCB (and variants) with klucbGauss() is equivalent to using Policies.UCB, so prefer the simpler version.

Policies.kullback.klucbPoisson

KL-UCB index computation for Poisson distributions, using klucb().

  • Influence of x:

>>> klucbPoisson(0.1, 0.2)  
0.450523...
>>> klucbPoisson(0.5, 0.2)  
1.089376...
>>> klucbPoisson(0.9, 0.2)  
1.640112...
  • Influence of d:

>>> klucbPoisson(0.1, 0.4)  
0.693684...
>>> klucbPoisson(0.1, 0.9)  
1.252796...
>>> klucbPoisson(0.5, 0.4)  
1.422933...
>>> klucbPoisson(0.5, 0.9)  
2.122985...
>>> klucbPoisson(0.9, 0.4)  
2.033691...
>>> klucbPoisson(0.9, 0.9)  
2.831573...
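
One way to build such an index from the generic klucb(): for y > x, klPoisson(x, y) is at least (y - x)^2 / (2y), so y = x + d + sqrt(d^2 + 2xd) already over-shoots d and is a valid initial upper bound. This particular bound is our choice for the sketch, not necessarily the module's (klucb and klPoisson are assumed in scope):

>>> from math import sqrt
>>> def klucbPoisson_sketch(x, d, precision=1e-6):
...    upperbound = x + d + sqrt(d * d + 2 * x * d)  # klPoisson here exceeds d
...    return klucb(x, d, klPoisson, upperbound, lowerbound=x, precision=precision)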
Policies.kullback.klucbExp

KL-UCB index computation for exponential distributions, using klucb().

  • Influence of x:

>>> klucbExp(0.1, 0.2)  
0.202741...
>>> klucbExp(0.5, 0.2)  
1.013706...
>>> klucbExp(0.9, 0.2)  
1.824671...
  • Influence of d:

>>> klucbExp(0.1, 0.4)  
0.285792...
>>> klucbExp(0.1, 0.9)  
0.559088...
>>> klucbExp(0.5, 0.4)  
1.428962...
>>> klucbExp(0.5, 0.9)  
2.795442...
>>> klucbExp(0.9, 0.4)  
2.572132...
>>> klucbExp(0.9, 0.9)  
5.031795...
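
Similarly, a sketch via the generic klucb(): y = x * exp(d + 1) always over-shoots, since klExp(x, x * exp(d + 1)) = d + exp(-(d + 1)) > d (again, this bound is our choice for illustration; klucb and klExp are assumed in scope):

>>> from math import exp
>>> def klucbExp_sketch(x, d, precision=1e-6):
...    upperbound = x * exp(d + 1)  # klExp at this point exceeds d
...    return klucb(x, d, klExp, upperbound, lowerbound=x, precision=precision)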
Policies.kullback.klucbGamma

KL-UCB index computation for Gamma distributions, using klucb().

  • Influence of x:

>>> klucbGamma(0.1, 0.2)  
0.202...
>>> klucbGamma(0.5, 0.2)  
1.013...
>>> klucbGamma(0.9, 0.2)  
1.824...
  • Influence of d:

>>> klucbGamma(0.1, 0.4)  
0.285...
>>> klucbGamma(0.1, 0.9)  
0.559...
>>> klucbGamma(0.5, 0.4)  
1.428...
>>> klucbGamma(0.5, 0.9)  
2.795...
>>> klucbGamma(0.9, 0.4)  
2.572...
>>> klucbGamma(0.9, 0.9)  
5.031...
Policies.kullback.kllcb

The generic KL-LCB index computation.

  • x: value of the cumulative reward,

  • d: lower bound on the divergence,

  • kl: the KL divergence to be used (klBern(), klGauss(), etc),

  • lowerbound, upperbound=float('-inf'): the known bounds on the values x,

  • precision=1e-6: the precision threshold at which to stop the search,

  • max_iterations=50: max number of iterations of the loop (safer to bound it to reduce time complexity).

\[\mathrm{kllcb}(x, d) \simeq \inf_{\mathrm{lowerbound} \leq y \leq \mathrm{upperbound}} \{ y : \mathrm{kl}(x, y) > d \}.\]

Note

It uses a bisection search, with one call to the kl function at each step.

For example, for kllcbBern(), the two steps are to first compute a lower bound (as precise as possible) and then compute the KL-LCB index:

>>> x, d = 0.9, 0.2   # mean x, exploration term d
>>> lowerbound = max(0., kllcbGauss(x, d, sig2x=0.25))  # variance 1/4 for [0,1] bounded distributions
>>> lowerbound  
0.5837...
>>> kllcb(x, d, klBern, lowerbound, upperbound=0, precision=1e-3, max_iterations=10)  
0.29...
>>> kllcb(x, d, klBern, lowerbound, upperbound=0, precision=1e-6, max_iterations=10)  
0.29188...
>>> kllcb(x, d, klBern, lowerbound, upperbound=0, precision=1e-3, max_iterations=50)  
0.291886...
>>> kllcb(x, d, klBern, lowerbound, upperbound=0, precision=1e-6, max_iterations=100)  # more and more precise!  
0.29188611...

Note

See below for more examples for different KL divergence functions.

Policies.kullback.kllcbBern

KL-LCB index computation for Bernoulli distributions, using kllcb().

  • Influence of x:

>>> kllcbBern(0.1, 0.2)  
0.09999...
>>> kllcbBern(0.5, 0.2)  
0.49999...
>>> kllcbBern(0.9, 0.2)  
0.89999...
  • Influence of d:

>>> kllcbBern(0.1, 0.4)  
0.09999...
>>> kllcbBern(0.1, 0.9)  
0.09999...
>>> kllcbBern(0.5, 0.4)  
0.4999...
>>> kllcbBern(0.5, 0.9)  
0.4999...
>>> kllcbBern(0.9, 0.4)  
0.8999...
>>> kllcbBern(0.9, 0.9)  
0.8999...
Policies.kullback.kllcbGauss

KL-LCB index computation for Gaussian distributions.

  • Note that it does not require any search.

Warning

It works only if the correct variance constant is given.

  • Influence of x:

>>> kllcbGauss(0.1, 0.2)  
-0.21622...
>>> kllcbGauss(0.5, 0.2)  
0.18377...
>>> kllcbGauss(0.9, 0.2)  
0.58377...
  • Influence of d:

>>> kllcbGauss(0.1, 0.4)  
-0.3472...
>>> kllcbGauss(0.1, 0.9)  
-0.5708...
>>> kllcbGauss(0.5, 0.4)  
0.0527...
>>> kllcbGauss(0.5, 0.9)  
-0.1708...
>>> kllcbGauss(0.9, 0.4)  
0.4527...
>>> kllcbGauss(0.9, 0.9)  
0.2291...
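
As with klucbGauss(), the equal-variance Gaussian case has a closed form, now taking the smaller root; a sketch:

>>> from math import sqrt
>>> def kllcbGauss_sketch(x, d, sig2x=0.25):
...    return x - sqrt(2.0 * sig2x * d)
>>> kllcbGauss_sketch(0.9, 0.9)  # matches kllcbGauss(0.9, 0.9) above  
0.2291...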

Warning

Using Policies.kllCB (and variants) with kllcbGauss() is equivalent to using Policies.UCB, so prefer the simpler version.

Policies.kullback.kllcbPoisson

KL-LCB index computation for Poisson distributions, using kllcb().

  • Influence of x:

>>> kllcbPoisson(0.1, 0.2)  
0.09999...
>>> kllcbPoisson(0.5, 0.2)  
0.49999...
>>> kllcbPoisson(0.9, 0.2)  
0.89999...
  • Influence of d:

>>> kllcbPoisson(0.1, 0.4)  
0.09999...
>>> kllcbPoisson(0.1, 0.9)  
0.09999...
>>> kllcbPoisson(0.5, 0.4)  
0.49999...
>>> kllcbPoisson(0.5, 0.9)  
0.49999...
>>> kllcbPoisson(0.9, 0.4)  
0.89999...
>>> kllcbPoisson(0.9, 0.9)  
0.89999...
Policies.kullback.kllcbExp

KL-LCB index computation for exponential distributions, using kllcb().

  • Influence of x:

>>> kllcbExp(0.1, 0.2)  
0.15267...
>>> kllcbExp(0.5, 0.2)  
0.7633...
>>> kllcbExp(0.9, 0.2)  
1.3740...
  • Influence of d:

>>> kllcbExp(0.1, 0.4)  
0.2000...
>>> kllcbExp(0.1, 0.9)  
0.3842...
>>> kllcbExp(0.5, 0.4)  
1.0000...
>>> kllcbExp(0.5, 0.9)  
1.9214...
>>> kllcbExp(0.9, 0.4)  
1.8000...
>>> kllcbExp(0.9, 0.9)  
3.4586...
Policies.kullback.maxEV(p, V, klMax)

Maximize the expectation of \(V\) with respect to \(q\), s.t. \(\mathrm{KL}(p, q) < \text{klMax}\).

Policies.kullback.reseqp(p, V, klMax, max_iterations=50)

Solve f(reseqp(p, V, klMax)) = klMax, using Newton's method.

Note

This is a subroutine of maxEV().

Warning

np.dot is very slow!

Policies.kullback.reseqp2(p, V, klMax)

Solve f(reseqp2(p, V, klMax)) = klMax, using a black-box minimizer from scipy.optimize.

  • FIXME it does not work well yet!

Note

This is a subroutine of maxEV().

  • Reference: Eq. (4) in Section 3.2 of [Filippi, Cappé & Garivier - Allerton, 2011].

Warning

np.dot is very slow!