# Policies.kullback module

Kullback-Leibler divergence functions and klUCB utilities.

Warning

These functions are not vectorized: each one expects a single scalar value per argument. If you want a vectorized function, use the wrapper numpy.vectorize:

>>> import numpy as np
>>> klBern_vect = np.vectorize(klBern)
>>> klBern_vect([0.1, 0.5, 0.9], 0.2)
array([0.036..., 0.223..., 1.145...])
>>> klBern_vect(0.4, [0.2, 0.3, 0.4])
array([0.104..., 0.022..., 0...])
>>> klBern_vect([0.1, 0.5, 0.9], [0.2, 0.3, 0.4])
array([0.036..., 0.087..., 0.550...])


For some functions, you are better off writing a vectorized version manually, for instance if you want to fix the value of an optional parameter:

>>> # WARNING using np.vectorize gave weird result on klGauss
>>> # klGauss_vect = np.vectorize(klGauss, excluded="y")
>>> def klGauss_vect(xs, y, sig2x=0.25):  # vectorized for first input only
...    return np.array([klGauss(x, y, sig2x) for x in xs])
>>> klGauss_vect([-1, 0, 1], 0.1)
array([2.42, 0.02, 1.62])

Policies.kullback.eps = 1e-15

Threshold value: everything in [0, 1] is truncated to [eps, 1 - eps]
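A minimal sketch of this truncation (the helper name clip01 is illustrative, not part of the module's API):

```python
eps = 1e-15  # threshold value, as documented above

def clip01(x):
    """Truncate x in [0, 1] to [eps, 1 - eps], to avoid log(0) in the KL formulas below."""
    return min(max(x, eps), 1 - eps)
```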

Policies.kullback.klBern

Kullback-Leibler divergence for Bernoulli distributions. https://en.wikipedia.org/wiki/Bernoulli_distribution#Kullback.E2.80.93Leibler_divergence

$\mathrm{KL}(\mathcal{B}(x), \mathcal{B}(y)) = x \log(\frac{x}{y}) + (1-x) \log(\frac{1-x}{1-y}).$
>>> klBern(0.5, 0.5)
0.0
>>> klBern(0.1, 0.9)
1.757779...
>>> klBern(0.9, 0.1)  # same value here, since klBern(x, y) = klBern(1-x, 1-y)
1.757779...
>>> klBern(0.4, 0.5)
0.020135...
>>> klBern(0.01, 0.99)
4.503217...

• Special values:

>>> klBern(0, 1)  # Should be +inf, but 0 --> eps, 1 --> 1 - eps
34.539575...
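A pure-Python sketch of this function, following the formula and the eps truncation documented above (assumed behavior, not necessarily the module's exact code):

```python
from math import log

eps = 1e-15  # truncation threshold, as documented above

def klBern(x, y):
    """Bernoulli KL divergence: x log(x/y) + (1-x) log((1-x)/(1-y))."""
    x = min(max(x, eps), 1 - eps)  # truncate to [eps, 1 - eps]
    y = min(max(y, eps), 1 - eps)
    return x * log(x / y) + (1 - x) * log((1 - x) / (1 - y))
```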

Policies.kullback.klBin

Kullback-Leibler divergence for Binomial distributions. https://math.stackexchange.com/questions/320399/kullback-leibner-divergence-of-binomial-distributions

$\mathrm{KL}(\mathrm{Bin}(x, n), \mathrm{Bin}(y, n)) = n \times \left(x \log(\frac{x}{y}) + (1-x) \log(\frac{1-x}{1-y}) \right).$

Warning

The two distributions must share the same parameter n, and x, y are the success probabilities p, q in (0, 1).

>>> klBin(0.5, 0.5, 10)
0.0
>>> klBin(0.1, 0.9, 10)
17.57779...
>>> klBin(0.9, 0.1, 10)  # same value here, since klBin(x, y, n) = klBin(1-x, 1-y, n)
17.57779...
>>> klBin(0.4, 0.5, 10)
0.20135...
>>> klBin(0.01, 0.99, 10)
45.03217...

• Special values:

>>> klBin(0, 1, 10)  # Should be +inf, but 0 --> eps, 1 --> 1 - eps
345.39575...
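Since the formula above is n times the Bernoulli divergence, a sketch only needs klBern (re-implemented locally here so the block is self-contained; assumed behavior):

```python
from math import log

eps = 1e-15

def klBern(x, y):
    """Bernoulli KL divergence, with eps truncation."""
    x = min(max(x, eps), 1 - eps)
    y = min(max(y, eps), 1 - eps)
    return x * log(x / y) + (1 - x) * log((1 - x) / (1 - y))

def klBin(x, y, n):
    """Binomial KL divergence: n times the Bernoulli KL of the success probabilities x and y."""
    return n * klBern(x, y)
```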

Policies.kullback.klPoisson

Kullback-Leibler divergence for Poisson distributions. https://en.wikipedia.org/wiki/Poisson_distribution#Kullback.E2.80.93Leibler_divergence

$\mathrm{KL}(\mathrm{Poisson}(x), \mathrm{Poisson}(y)) = y - x + x \times \log(\frac{x}{y}).$
>>> klPoisson(3, 3)
0.0
>>> klPoisson(2, 1)
0.386294...
>>> klPoisson(1, 2)  # And this KL is non-symmetric
0.306852...
>>> klPoisson(3, 6)
0.920558...
>>> klPoisson(6, 8)
0.273907...

• Special values:

>>> klPoisson(1, 0)  # Should be +inf, but 0 --> eps, 1 --> 1 - eps
33.538776...
>>> klPoisson(0, 0)
0.0
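A sketch following the formula above; truncating non-positive means up to eps (so that klPoisson(0, 0) is 0, as in the special values) is assumed behavior:

```python
from math import log

eps = 1e-15  # non-positive means are truncated up to eps

def klPoisson(x, y):
    """Poisson KL divergence: y - x + x log(x / y)."""
    x = max(x, eps)
    y = max(y, eps)
    return y - x + x * log(x / y)
```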

Policies.kullback.klExp

Kullback-Leibler divergence for exponential distributions. https://en.wikipedia.org/wiki/Exponential_distribution#Kullback.E2.80.93Leibler_divergence

$\begin{split}\mathrm{KL}(\mathrm{Exp}(x), \mathrm{Exp}(y)) = \begin{cases} \frac{x}{y} - 1 - \log(\frac{x}{y}) & \text{if } x > 0, y > 0\\ +\infty & \text{otherwise} \end{cases}\end{split}$
>>> klExp(3, 3)
0.0
>>> klExp(3, 6)
0.193147...
>>> klExp(1, 2)  # Only the proportion between x and y is used
0.193147...
>>> klExp(2, 1)  # And this KL is non-symmetric
0.306852...
>>> klExp(4, 2)  # Only the proportion between x and y is used
0.306852...
>>> klExp(6, 8)
0.037682...

• x, y have to be positive:

>>> klExp(-3, 2)
inf
>>> klExp(3, -2)
inf
>>> klExp(-3, -2)
inf
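A sketch following the piecewise formula above; note that only the ratio x/y enters the finite branch:

```python
from math import log, inf

def klExp(x, y):
    """Exponential KL divergence: x/y - 1 - log(x/y), or +inf if a mean is non-positive."""
    if x <= 0 or y <= 0:
        return inf
    t = x / y  # only the ratio between x and y is used
    return t - 1 - log(t)
```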

Policies.kullback.klGamma

Kullback-Leibler divergence for gamma distributions. https://en.wikipedia.org/wiki/Gamma_distribution#Kullback.E2.80.93Leibler_divergence

$\begin{split}\mathrm{KL}(\Gamma(x, a), \Gamma(y, a)) = \begin{cases} a \times \left( \frac{x}{y} - 1 - \log(\frac{x}{y}) \right) & \text{if } x > 0, y > 0\\ +\infty & \text{otherwise} \end{cases}\end{split}$

Warning

The two distributions must have the same parameter a.

>>> klGamma(3, 3)
0.0
>>> klGamma(3, 6)
0.193147...
>>> klGamma(1, 2)  # Only the proportion between x and y is used
0.193147...
>>> klGamma(2, 1)  # And this KL is non-symmetric
0.306852...
>>> klGamma(4, 2)  # Only the proportion between x and y is used
0.306852...
>>> klGamma(6, 8)
0.037682...

• x, y have to be positive:

>>> klGamma(-3, 2)
inf
>>> klGamma(3, -2)
inf
>>> klGamma(-3, -2)
inf
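This is the exponential divergence scaled by the shared shape parameter a. The default a=1 in the sketch below is an assumption, chosen because it reproduces the example values above:

```python
from math import log, inf

def klGamma(x, y, a=1):
    """Gamma KL divergence with shared shape a: a * (x/y - 1 - log(x/y))."""
    if x <= 0 or y <= 0:
        return inf
    t = x / y  # only the ratio between x and y is used
    return a * (t - 1 - log(t))
```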

Policies.kullback.klNegBin

Kullback-Leibler divergence for negative binomial distributions. https://en.wikipedia.org/wiki/Negative_binomial_distribution

$\mathrm{KL}(\mathrm{NegBin}(x, r), \mathrm{NegBin}(y, r)) = r \times \log((r + x) / (r + y)) - x \times \log(y \times (r + x) / (x \times (r + y))).$

Warning

The two distributions must have the same parameter r.

>>> klNegBin(0.5, 0.5)
0.0
>>> klNegBin(0.1, 0.9)
-0.711611...
>>> klNegBin(0.9, 0.1)  # And this KL is non-symmetric
2.0321564...
>>> klNegBin(0.4, 0.5)
-0.130653...
>>> klNegBin(0.01, 0.99)
-0.717353...

• Special values:

>>> klBern(0, 1)  # Should be +inf, but 0 --> eps, 1 --> 1 - eps
34.539575...

• With other values for r:

>>> klNegBin(0.5, 0.5, r=2)
0.0
>>> klNegBin(0.1, 0.9, r=2)
-0.832991...
>>> klNegBin(0.1, 0.9, r=4)
-0.914890...
>>> klNegBin(0.9, 0.1, r=2)  # And this KL is non-symmetric
2.3325528...
>>> klNegBin(0.4, 0.5, r=2)
-0.154572...
>>> klNegBin(0.01, 0.99, r=2)
-0.836257...
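A sketch following the formula above. The default r=1 is an assumption (it reproduces the first block of examples), and truncating non-positive arguments up to eps mirrors the other divergences:

```python
from math import log

eps = 1e-15

def klNegBin(x, y, r=1):
    """Negative binomial KL divergence with shared parameter r."""
    x = max(x, eps)
    y = max(y, eps)
    return r * log((r + x) / (r + y)) - x * log(y * (r + x) / (x * (r + y)))
```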

Policies.kullback.klGauss

Kullback-Leibler divergence for Gaussian distributions of means x and y and variances sig2x and sig2y, $$\nu_1 = \mathcal{N}(x, \sigma_x^2)$$ and $$\nu_2 = \mathcal{N}(y, \sigma_y^2)$$:

$\mathrm{KL}(\nu_1, \nu_2) = \frac{(x - y)^2}{2 \sigma_y^2} + \frac{1}{2}\left( \frac{\sigma_x^2}{\sigma_y^2} - 1 - \log\left(\frac{\sigma_x^2}{\sigma_y^2}\right) \right).$
• By default, sig2y is assumed to be sig2x (same variance).

Warning

The C version does not support different variances.

>>> klGauss(3, 3)
0.0
>>> klGauss(3, 6)
18.0
>>> klGauss(1, 2)
2.0
>>> klGauss(2, 1)  # And this KL is symmetric
2.0
>>> klGauss(4, 2)
8.0
>>> klGauss(6, 8)
8.0

• x, y can be negative:

>>> klGauss(-3, 2)
50.0
>>> klGauss(3, -2)
50.0
>>> klGauss(-3, -2)
2.0
>>> klGauss(3, 2)
2.0

• With other values for sig2x:

>>> klGauss(3, 3, sig2x=10)
0.0
>>> klGauss(3, 6, sig2x=10)
0.45
>>> klGauss(1, 2, sig2x=10)
0.05
>>> klGauss(2, 1, sig2x=10)  # And this KL is symmetric
0.05
>>> klGauss(4, 2, sig2x=10)
0.2
>>> klGauss(6, 8, sig2x=10)
0.2
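In the equal-variance case the divergence reduces to a scaled squared distance. A minimal sketch covering only that case (the different-variance behavior shown below is left to the actual module):

```python
def klGauss(x, y, sig2x=0.25):
    """Gaussian KL divergence between N(x, sig2x) and N(y, sig2x): (x - y)^2 / (2 * sig2x)."""
    return (x - y) ** 2 / (2.0 * sig2x)
```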

• With different values for sig2x and sig2y:

>>> klGauss(0, 0, sig2x=0.25, sig2y=0.5)
-0.0284...
>>> klGauss(0, 0, sig2x=0.25, sig2y=1.0)
0.2243...
>>> klGauss(0, 0, sig2x=0.5, sig2y=0.25)  # not symmetric here!
1.1534...

>>> klGauss(0, 1, sig2x=0.25, sig2y=0.5)
0.9715...
>>> klGauss(0, 1, sig2x=0.25, sig2y=1.0)
0.7243...
>>> klGauss(0, 1, sig2x=0.5, sig2y=0.25)  # not symmetric here!
3.1534...

>>> klGauss(1, 0, sig2x=0.25, sig2y=0.5)
0.9715...
>>> klGauss(1, 0, sig2x=0.25, sig2y=1.0)
0.7243...
>>> klGauss(1, 0, sig2x=0.5, sig2y=0.25)  # not symmetric here!
3.1534...


Warning

Using Policies.klUCB (and variants) with klGauss() is equivalent to using Policies.UCB, so prefer the simpler version.

Policies.kullback.klucb

The generic KL-UCB index computation.

• x: value of the cumulative reward,

• d: upper bound on the divergence,

• kl: the KL divergence to be used (klBern(), klGauss(), etc),

• upperbound, lowerbound=float('-inf'): the known bounds on the value x,

• precision=1e-6: the precision threshold at which to stop the search,

• max_iterations=50: max number of iterations of the loop (safer to bound it to reduce time complexity).

$\mathrm{klucb}(x, d) \simeq \sup_{\mathrm{lowerbound} \leq y \leq \mathrm{upperbound}} \{ y : \mathrm{kl}(x, y) < d \}.$

Note

It uses a bisection search, and one call to kl for each step of the bisection search.

For example, for klucbBern(), the two steps are to first compute an upperbound (as precise as possible) and then compute the kl-UCB index:

>>> x, d = 0.9, 0.2   # mean x, exploration term d
>>> upperbound = min(1., klucbGauss(x, d, sig2x=0.25))  # variance 1/4 for [0,1] bounded distributions
>>> upperbound
1.0
>>> klucb(x, d, klBern, upperbound, lowerbound=0, precision=1e-3, max_iterations=10)
0.9941...
>>> klucb(x, d, klBern, upperbound, lowerbound=0, precision=1e-6, max_iterations=10)
0.9944...
>>> klucb(x, d, klBern, upperbound, lowerbound=0, precision=1e-3, max_iterations=50)
0.9941...
>>> klucb(x, d, klBern, upperbound, lowerbound=0, precision=1e-6, max_iterations=100)  # more and more precise!
0.994489...
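Following the note above, such a bisection search can be sketched as follows. The signature and semantics come from the parameter list above; the loop details are an assumption, and klBern is re-implemented locally so the block is self-contained:

```python
from math import log

eps = 1e-15

def klBern(x, y):
    """Bernoulli KL divergence, with eps truncation."""
    x = min(max(x, eps), 1 - eps)
    y = min(max(y, eps), 1 - eps)
    return x * log(x / y) + (1 - x) * log((1 - x) / (1 - y))

def klucb(x, d, kl, upperbound, lowerbound=float('-inf'), precision=1e-6, max_iterations=50):
    """Bisection search for max { y in [lowerbound, upperbound] : kl(x, y) <= d }."""
    value = max(x, lowerbound)
    u = upperbound
    for _ in range(max_iterations):
        if u - value < precision:
            break
        m = (value + u) / 2.0
        if kl(x, m) > d:
            u = m       # m violates the constraint: shrink from above
        else:
            value = m   # m still satisfies the constraint: move up
    return (value + u) / 2.0
```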


Note

See below for more examples for different KL divergence functions.

Policies.kullback.klucbBern

KL-UCB index computation for Bernoulli distributions, using klucb().

• Influence of x:

>>> klucbBern(0.1, 0.2)
0.378391...
>>> klucbBern(0.5, 0.2)
0.787088...
>>> klucbBern(0.9, 0.2)
0.994489...

• Influence of d:

>>> klucbBern(0.1, 0.4)
0.519475...
>>> klucbBern(0.1, 0.9)
0.734714...

>>> klucbBern(0.5, 0.4)
0.871035...
>>> klucbBern(0.5, 0.9)
0.956809...

>>> klucbBern(0.9, 0.4)
0.999285...
>>> klucbBern(0.9, 0.9)
0.999995...

Policies.kullback.klucbGauss

KL-UCB index computation for Gaussian distributions.

• Note that it does not require any search.

Warning

It works only if the correct variance constant is given.

• Influence of x:

>>> klucbGauss(0.1, 0.2)
0.416227...
>>> klucbGauss(0.5, 0.2)
0.816227...
>>> klucbGauss(0.9, 0.2)
1.216227...

• Influence of d:

>>> klucbGauss(0.1, 0.4)
0.547213...
>>> klucbGauss(0.1, 0.9)
0.770820...

>>> klucbGauss(0.5, 0.4)
0.947213...
>>> klucbGauss(0.5, 0.9)
1.170820...

>>> klucbGauss(0.9, 0.4)
1.347213...
>>> klucbGauss(0.9, 0.9)
1.570820...
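No bisection is needed here: solving the quadratic $\frac{(y - x)^2}{2 \sigma_x^2} = d$ for the upper root gives a closed form. A sketch under the equal-variance assumption:

```python
from math import sqrt

def klucbGauss(x, d, sig2x=0.25):
    """Closed-form Gaussian KL-UCB index: the upper root of (y - x)^2 / (2 * sig2x) = d."""
    return x + sqrt(2.0 * sig2x * d)
```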


Warning

Using Policies.klUCB (and variants) with klucbGauss() is equivalent to using Policies.UCB, so prefer the simpler version.

Policies.kullback.klucbPoisson

KL-UCB index computation for Poisson distributions, using klucb().

• Influence of x:

>>> klucbPoisson(0.1, 0.2)
0.450523...
>>> klucbPoisson(0.5, 0.2)
1.089376...
>>> klucbPoisson(0.9, 0.2)
1.640112...

• Influence of d:

>>> klucbPoisson(0.1, 0.4)
0.693684...
>>> klucbPoisson(0.1, 0.9)
1.252796...

>>> klucbPoisson(0.5, 0.4)
1.422933...
>>> klucbPoisson(0.5, 0.9)
2.122985...

>>> klucbPoisson(0.9, 0.4)
2.033691...
>>> klucbPoisson(0.9, 0.9)
2.831573...
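One way to obtain such an index is the generic bisection with klPoisson and any valid upper bound on the solution. The bound $x + d + \sqrt{d^2 + 2xd}$ used below follows from $\mathrm{KL}(\mathrm{Poisson}(x), \mathrm{Poisson}(y)) \geq (y - x)^2 / (2y)$; it is an assumption of this sketch, not necessarily the module's exact choice:

```python
from math import log, sqrt

eps = 1e-15

def klPoisson(x, y):
    """Poisson KL divergence, with eps truncation."""
    x = max(x, eps)
    y = max(y, eps)
    return y - x + x * log(x / y)

def klucb(x, d, kl, upperbound, lowerbound=float('-inf'), precision=1e-6, max_iterations=50):
    """Bisection search for max { y in [lowerbound, upperbound] : kl(x, y) <= d }."""
    value = max(x, lowerbound)
    u = upperbound
    for _ in range(max_iterations):
        if u - value < precision:
            break
        m = (value + u) / 2.0
        if kl(x, m) > d:
            u = m
        else:
            value = m
    return (value + u) / 2.0

def klucbPoisson(x, d):
    """Poisson KL-UCB index via bisection, with an analytic upper bound on the solution."""
    upperbound = x + d + sqrt(d * d + 2.0 * x * d)
    return klucb(x, d, klPoisson, upperbound, lowerbound=0, precision=1e-7, max_iterations=100)
```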

Policies.kullback.klucbExp

KL-UCB index computation for exponential distributions, using klucb().

• Influence of x:

>>> klucbExp(0.1, 0.2)
0.202741...
>>> klucbExp(0.5, 0.2)
1.013706...
>>> klucbExp(0.9, 0.2)
1.824671...

• Influence of d:

>>> klucbExp(0.1, 0.4)
0.285792...
>>> klucbExp(0.1, 0.9)
0.559088...

>>> klucbExp(0.5, 0.4)
1.428962...
>>> klucbExp(0.5, 0.9)
2.795442...

>>> klucbExp(0.9, 0.4)
2.572132...
>>> klucbExp(0.9, 0.9)
5.031795...
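A similar bisection sketch works here. The upper bound $x e^{d+1}$ below is an assumption of this sketch (at that point the divergence equals $d + e^{-(d+1)} > d$, so the solution lies inside the interval), not necessarily the module's choice:

```python
from math import log, exp, inf

def klExp(x, y):
    """Exponential KL divergence: x/y - 1 - log(x/y), or +inf for non-positive means."""
    if x <= 0 or y <= 0:
        return inf
    t = x / y
    return t - 1 - log(t)

def klucb(x, d, kl, upperbound, lowerbound=float('-inf'), precision=1e-6, max_iterations=50):
    """Bisection search for max { y in [lowerbound, upperbound] : kl(x, y) <= d }."""
    value = max(x, lowerbound)
    u = upperbound
    for _ in range(max_iterations):
        if u - value < precision:
            break
        m = (value + u) / 2.0
        if kl(x, m) > d:
            u = m
        else:
            value = m
    return (value + u) / 2.0

def klucbExp(x, d):
    """Exponential KL-UCB index via bisection on [x, x * e^(d+1)]."""
    return klucb(x, d, klExp, upperbound=x * exp(d + 1.0), lowerbound=x, precision=1e-7, max_iterations=100)
```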

Policies.kullback.klucbGamma

KL-UCB index computation for Gamma distributions, using klucb().

• Influence of x:

>>> klucbGamma(0.1, 0.2)
0.202...
>>> klucbGamma(0.5, 0.2)
1.013...
>>> klucbGamma(0.9, 0.2)
1.824...

• Influence of d:

>>> klucbGamma(0.1, 0.4)
0.285...
>>> klucbGamma(0.1, 0.9)
0.559...

>>> klucbGamma(0.5, 0.4)
1.428...
>>> klucbGamma(0.5, 0.9)
2.795...

>>> klucbGamma(0.9, 0.4)
2.572...
>>> klucbGamma(0.9, 0.9)
5.031...

Policies.kullback.kllcb

The generic KL-LCB index computation.

• x: value of the cumulative reward,

• d: lower bound on the divergence,

• kl: the KL divergence to be used (klBern(), klGauss(), etc),

• lowerbound, upperbound=float('-inf'): the known bounds on the value x,

• precision=1e-6: the precision threshold at which to stop the search,

• max_iterations=50: max number of iterations of the loop (safer to bound it to reduce time complexity).

$\mathrm{kllcb}(x, d) \simeq \inf_{\mathrm{lowerbound} \leq y \leq \mathrm{upperbound}} \{ y : \mathrm{kl}(x, y) > d \}.$

Note

It uses a bisection search, and one call to kl for each step of the bisection search.

For example, for kllcbBern(), the two steps are to first compute a lowerbound (as precise as possible) and then compute the kl-LCB index:

>>> x, d = 0.9, 0.2   # mean x, exploration term d
>>> lowerbound = max(0., kllcbGauss(x, d, sig2x=0.25))  # variance 1/4 for [0,1] bounded distributions
>>> lowerbound
0.5837...
>>> kllcb(x, d, klBern, lowerbound, upperbound=0, precision=1e-3, max_iterations=10)
0.29...
>>> kllcb(x, d, klBern, lowerbound, upperbound=0, precision=1e-6, max_iterations=10)
0.29188...
>>> kllcb(x, d, klBern, lowerbound, upperbound=0, precision=1e-3, max_iterations=50)
0.291886...
>>> kllcb(x, d, klBern, lowerbound, upperbound=0, precision=1e-6, max_iterations=100)  # more and more precise!
0.29188611...


Note

See below for more examples for different KL divergence functions.

Policies.kullback.kllcbBern

KL-LCB index computation for Bernoulli distributions, using kllcb().

• Influence of x:

>>> kllcbBern(0.1, 0.2)
0.09999...
>>> kllcbBern(0.5, 0.2)
0.49999...
>>> kllcbBern(0.9, 0.2)
0.89999...

• Influence of d:

>>> kllcbBern(0.1, 0.4)
0.09999...
>>> kllcbBern(0.1, 0.9)
0.09999...

>>> kllcbBern(0.5, 0.4)
0.4999...
>>> kllcbBern(0.5, 0.9)
0.4999...

>>> kllcbBern(0.9, 0.4)
0.8999...
>>> kllcbBern(0.9, 0.9)
0.8999...

Policies.kullback.kllcbGauss

KL-LCB index computation for Gaussian distributions.

• Note that it does not require any search.

Warning

It works only if the correct variance constant is given.

• Influence of x:

>>> kllcbGauss(0.1, 0.2)
-0.21622...
>>> kllcbGauss(0.5, 0.2)
0.18377...
>>> kllcbGauss(0.9, 0.2)
0.58377...

• Influence of d:

>>> kllcbGauss(0.1, 0.4)
-0.3472...
>>> kllcbGauss(0.1, 0.9)
-0.5708...

>>> kllcbGauss(0.5, 0.4)
0.0527...
>>> kllcbGauss(0.5, 0.9)
-0.1708...

>>> kllcbGauss(0.9, 0.4)
0.4527...
>>> kllcbGauss(0.9, 0.9)
0.2291...
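As for klucbGauss(), no search is needed: the lower root of the same quadratic gives a closed form. A sketch under the equal-variance assumption:

```python
from math import sqrt

def kllcbGauss(x, d, sig2x=0.25):
    """Closed-form Gaussian KL-LCB index: the lower root of (y - x)^2 / (2 * sig2x) = d."""
    return x - sqrt(2.0 * sig2x * d)
```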


Warning

Using Policies.kllCB (and variants) with kllcbGauss() is equivalent to using Policies.UCB, so prefer the simpler version.

Policies.kullback.kllcbPoisson

KL-LCB index computation for Poisson distributions, using kllcb().

• Influence of x:

>>> kllcbPoisson(0.1, 0.2)
0.09999...
>>> kllcbPoisson(0.5, 0.2)
0.49999...
>>> kllcbPoisson(0.9, 0.2)
0.89999...

• Influence of d:

>>> kllcbPoisson(0.1, 0.4)
0.09999...
>>> kllcbPoisson(0.1, 0.9)
0.09999...

>>> kllcbPoisson(0.5, 0.4)
0.49999...
>>> kllcbPoisson(0.5, 0.9)
0.49999...

>>> kllcbPoisson(0.9, 0.4)
0.89999...
>>> kllcbPoisson(0.9, 0.9)
0.89999...

Policies.kullback.kllcbExp

KL-LCB index computation for exponential distributions, using kllcb().

• Influence of x:

>>> kllcbExp(0.1, 0.2)
0.15267...
>>> kllcbExp(0.5, 0.2)
0.7633...
>>> kllcbExp(0.9, 0.2)
1.3740...

• Influence of d:

>>> kllcbExp(0.1, 0.4)
0.2000...
>>> kllcbExp(0.1, 0.9)
0.3842...

>>> kllcbExp(0.5, 0.4)
1.0000...
>>> kllcbExp(0.5, 0.9)
1.9214...

>>> kllcbExp(0.9, 0.4)
1.8000...
>>> kllcbExp(0.9, 0.9)
3.4586...

Policies.kullback.maxEV(p, V, klMax)

Maximize the expectation of $$V$$ with respect to $$q$$ s.t. $$\mathrm{KL}(p, q) < \text{klMax}$$.

Policies.kullback.reseqp(p, V, klMax, max_iterations=50)

Solve f(reseqp(p, V, klMax)) = klMax, using Newton's method.

Note

This is a subroutine of maxEV().

Warning

np.dot is very slow!

Policies.kullback.reseqp2(p, V, klMax)

Solve f(reseqp(p, V, klMax)) = klMax, using a black-box minimizer from scipy.optimize.

• FIXME it does not work well yet!

Note

This is a subroutine of maxEV().

• Reference: Eq. (4) in Section 3.2 of [Filippi, Cappé & Garivier - Allerton, 2011].

Warning

np.dot is very slow!