Table of Contents¶

1 Easily creating MAB problems

1.1 Constant arms

1.2 Bernoulli arms

1.3 Gaussian arms

1.3.1 Wrong means for Gaussian arms ?

1.3.2 Closed form formula

1.3.3 With a larger variance ?

1.4 Exponential arms

1.5 Uniform arms

1.6 Arms with rewards outside of [0,1]

1.7 Gamma arms

1.8 Non-truncated Gaussian and Gamma arms

1.9 Conclusion

Easily creating MAB problems¶

First, be sure to be in the main folder, or to have installed SMPyBandits (https://github.com/SMPyBandits/SMPyBandits), and import MAB from the Environment package:

[1]:

!pip install SMPyBandits watermark
%load_ext watermark
%watermark -v -m -p SMPyBandits -a "Lilian Besson"

Requirement already satisfied: SMPyBandits in ./venv3/lib/python3.6/site-packages (0.9.4)
Requirement already satisfied: watermark in ./venv3/lib/python3.6/site-packages (1.7.0)
Requirement already satisfied: scikit-learn in ./venv3/lib/python3.6/site-packages (from SMPyBandits) (0.20.0)
Requirement already satisfied: scikit-optimize in ./venv3/lib/python3.6/site-packages (from SMPyBandits) (0.5.2)
Requirement already satisfied: scipy>0.9 in ./venv3/lib/python3.6/site-packages (from SMPyBandits) (1.1.0)
Requirement already satisfied: numpy in ./venv3/lib/python3.6/site-packages (from SMPyBandits) (1.15.4)
Requirement already satisfied: matplotlib>=2 in ./venv3/lib/python3.6/site-packages (from SMPyBandits) (3.0.2)
Requirement already satisfied: seaborn in ./venv3/lib/python3.6/site-packages (from SMPyBandits) (0.9.0)
Requirement already satisfied: joblib in ./venv3/lib/python3.6/site-packages (from SMPyBandits) (0.13.0)
Requirement already satisfied: ipython in ./venv3/lib/python3.6/site-packages (from watermark) (7.1.1)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in ./venv3/lib/python3.6/site-packages (from matplotlib>=2->SMPyBandits) (2.3.0)
Requirement already satisfied: kiwisolver>=1.0.1 in ./venv3/lib/python3.6/site-packages (from matplotlib>=2->SMPyBandits) (1.0.1)
Requirement already satisfied: python-dateutil>=2.1 in ./venv3/lib/python3.6/site-packages (from matplotlib>=2->SMPyBandits) (2.7.5)
Requirement already satisfied: cycler>=0.10 in ./venv3/lib/python3.6/site-packages (from matplotlib>=2->SMPyBandits) (0.10.0)
Requirement already satisfied: pandas>=0.15.2 in ./venv3/lib/python3.6/site-packages (from seaborn->SMPyBandits) (0.23.4)
Requirement already satisfied: setuptools>=18.5 in ./venv3/lib/python3.6/site-packages (from ipython->watermark) (40.6.2)
Requirement already satisfied: pexpect; sys_platform != "win32" in ./venv3/lib/python3.6/site-packages (from ipython->watermark) (4.6.0)
Requirement already satisfied: jedi>=0.10 in ./venv3/lib/python3.6/site-packages (from ipython->watermark) (0.13.1)
Requirement already satisfied: backcall in ./venv3/lib/python3.6/site-packages (from ipython->watermark) (0.1.0)
Requirement already satisfied: traitlets>=4.2 in ./venv3/lib/python3.6/site-packages (from ipython->watermark) (4.3.2)
Requirement already satisfied: pickleshare in ./venv3/lib/python3.6/site-packages (from ipython->watermark) (0.7.5)
Requirement already satisfied: pygments in ./venv3/lib/python3.6/site-packages (from ipython->watermark) (2.2.0)
Requirement already satisfied: prompt-toolkit<2.1.0,>=2.0.0 in ./venv3/lib/python3.6/site-packages (from ipython->watermark) (2.0.7)
Requirement already satisfied: decorator in ./venv3/lib/python3.6/site-packages (from ipython->watermark) (4.3.0)
Requirement already satisfied: six>=1.5 in ./venv3/lib/python3.6/site-packages (from python-dateutil>=2.1->matplotlib>=2->SMPyBandits) (1.11.0)
Requirement already satisfied: pytz>=2011k in ./venv3/lib/python3.6/site-packages (from pandas>=0.15.2->seaborn->SMPyBandits) (2018.7)
Requirement already satisfied: ptyprocess>=0.5 in ./venv3/lib/python3.6/site-packages (from pexpect; sys_platform != "win32"->ipython->watermark) (0.6.0)
Requirement already satisfied: parso>=0.3.0 in ./venv3/lib/python3.6/site-packages (from jedi>=0.10->ipython->watermark) (0.3.1)
Requirement already satisfied: ipython-genutils in ./venv3/lib/python3.6/site-packages (from traitlets>=4.2->ipython->watermark) (0.2.0)
Requirement already satisfied: wcwidth in ./venv3/lib/python3.6/site-packages (from prompt-toolkit<2.1.0,>=2.0.0->ipython->watermark) (0.1.7)
Info: Using the Jupyter notebook version of the tqdm() decorator, tqdm_notebook() ...
Lilian Besson

CPython 3.6.6
IPython 7.1.1

SMPyBandits 0.9.4

compiler   : GCC 8.0.1 20180414 (experimental) [trunk revision 259383
system     : Linux
release    : 4.15.0-38-generic
machine    : x86_64
processor  : x86_64
CPU cores  : 4
interpreter: 64bit

[2]:

from SMPyBandits.Environment import MAB

And also, import all the types of arms.

[3]:

from SMPyBandits.Arms import *
# Check it exists:
Constant, Bernoulli, Gaussian, Exponential, ExponentialFromMean, Poisson, UniformArm, Gamma, GammaFromMean

[3]:

(SMPyBandits.Arms.Constant.Constant,
 SMPyBandits.Arms.Bernoulli.Bernoulli,
 SMPyBandits.Arms.Gaussian.Gaussian,
 SMPyBandits.Arms.Exponential.Exponential,
 SMPyBandits.Arms.Exponential.ExponentialFromMean,
 SMPyBandits.Arms.Poisson.Poisson,
 SMPyBandits.Arms.UniformArm.UniformArm,
 SMPyBandits.Arms.Gamma.Gamma,
 SMPyBandits.Arms.Gamma.GammaFromMean)

[4]:

import matplotlib as mpl
mpl.rcParams['figure.figsize'] = (12.4, 7)

Constant arms¶

This is the simplest example of arms: rewards are constant, and not randomly drawn from a distribution. Let's consider an example with $K = 3$ arms.

[5]:

M_C = MAB([Constant(mu) for mu in [0.1, 0.5, 0.9]])



Creating a new MAB problem ...
  Taking arms of this MAB problem from a list of arms 'configuration' = [Constant(0.1), Constant(0.5), Constant(0.9)] ...
 - with 'arms' = [Constant(0.1), Constant(0.5), Constant(0.9)]
 - with 'means' = [0.1 0.5 0.9]
 - with 'nbArms' = 3
 - with 'maxArm' = 0.9
 - with 'minArm' = 0.1

This MAB problem has:
 - a [Lai & Robbins] complexity constant C(mu) = 2 ...
 - a Optimal Arm Identification factor H_OI(mu) = 26.67% ...
 - with 'arms' represented as: $[Constant(0.1), Constant(0.5), Constant(0.9)^*]$

The plotHistogram() method draws samples from each arm, and plots a histogram of their distribution. For constant arms, there is no need to take a lot of samples, as they are constant.

[6]:

_ = M_C.plotHistogram(10)

Warning: forcing to use putatright = False because there is 3 items in the legend.

../_images/notebooks_Easily_creating_MAB_problems_10_1.png

Bernoulli arms¶

Then it’s easy to create a Multi-Armed Bandit problem, as an instance of the MAB class, either from a list of Arm objects:

[7]:

M_B = MAB([Bernoulli(mu) for mu in [0.1, 0.5, 0.9]])



Creating a new MAB problem ...
  Taking arms of this MAB problem from a list of arms 'configuration' = [B(0.1), B(0.5), B(0.9)] ...
 - with 'arms' = [B(0.1), B(0.5), B(0.9)]
 - with 'means' = [0.1 0.5 0.9]
 - with 'nbArms' = 3
 - with 'maxArm' = 0.9
 - with 'minArm' = 0.1

This MAB problem has:
 - a [Lai & Robbins] complexity constant C(mu) = 1.24 ...
 - a Optimal Arm Identification factor H_OI(mu) = 26.67% ...
 - with 'arms' represented as: $[B(0.1), B(0.5), B(0.9)^*]$

Or from a dictionary, with keys "arm_type" and "params":

[8]:

M_B = MAB({
    "arm_type": Bernoulli,
    "params": [0.1, 0.5, 0.9]
})



Creating a new MAB problem ...
  Reading arms of this MAB problem from a dictionnary 'configuration' = {'arm_type': <class 'SMPyBandits.Arms.Bernoulli.Bernoulli'>, 'params': [0.1, 0.5, 0.9]} ...
 - with 'arm_type' = <class 'SMPyBandits.Arms.Bernoulli.Bernoulli'>
 - with 'params' = [0.1, 0.5, 0.9]
 - with 'arms' = [B(0.1), B(0.5), B(0.9)]
 - with 'means' = [0.1 0.5 0.9]
 - with 'nbArms' = 3
 - with 'maxArm' = 0.9
 - with 'minArm' = 0.1

This MAB problem has:
 - a [Lai & Robbins] complexity constant C(mu) = 1.24 ...
 - a Optimal Arm Identification factor H_OI(mu) = 26.67% ...
 - with 'arms' represented as: $[B(0.1), B(0.5), B(0.9)^*]$
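
The Lai & Robbins constant printed above can be checked by hand. The sketch below assumes it is defined as $C(\mu) = \sum_{k : \mu_k < \mu^*} (\mu^* - \mu_k) / \mathrm{kl}(\mu_k, \mu^*)$, with $\mathrm{kl}$ the Kullback-Leibler divergence between two Bernoulli distributions. The helper names kl_bernoulli and lai_robbins_constant are hypothetical and are not part of SMPyBandits.

```python
from math import log

def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), for 0 < p, q < 1."""
    return p * log(p / q) + (1 - p) * log((1 - p) / (1 - q))

def lai_robbins_constant(means):
    """Sum over suboptimal arms of gap / kl(mu_k, mu_star) (assumed definition)."""
    mu_star = max(means)
    return sum((mu_star - mu) / kl_bernoulli(mu, mu_star)
               for mu in means if mu < mu_star)

C = lai_robbins_constant([0.1, 0.5, 0.9])
print(round(C, 2))  # 1.24, the value printed by MAB for this problem
```

A larger constant means that any uniformly efficient algorithm has to suffer more regret asymptotically, so it is a rough measure of how hard the problem is.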

The plotHistogram() method draws a lot of samples from each arm, and plots a histogram of their distribution:

[9]:

_ = M_B.plotHistogram()

Warning: forcing to use putatright = False because there is 3 items in the legend.

../_images/notebooks_Easily_creating_MAB_problems_16_1.png

Gaussian arms¶

And with Gaussian arms, with a small $\sigma = 0.05$ (passed as the sigma argument; the printed summary labels it $\sigma^2$), for rewards truncated to $[0, 1]$:

[10]:

M_G = MAB([Gaussian(mu, sigma=0.05) for mu in [0.1, 0.5, 0.9]])



Creating a new MAB problem ...
  Taking arms of this MAB problem from a list of arms 'configuration' = [N(0.1, 0.05), N(0.5, 0.05), N(0.9, 0.05)] ...
 - with 'arms' = [N(0.1, 0.05), N(0.5, 0.05), N(0.9, 0.05)]
 - with 'means' = [0.1 0.5 0.9]
 - with 'nbArms' = 3
 - with 'maxArm' = 0.9
 - with 'minArm' = 0.1

This MAB problem has:
 - a [Lai & Robbins] complexity constant C(mu) = 0.375 ...
 - a Optimal Arm Identification factor H_OI(mu) = 26.67% ...
 - with 'arms' represented as: $[N(0.1), N(0.5), N(0.9)^*], \sigma^2=0.05$

The histogram clearly shows that low-variance Gaussian arms are easy to separate:

[11]:

_ = M_G.plotHistogram(100000)

Warning: forcing to use putatright = False because there is 3 items in the legend.

../_images/notebooks_Easily_creating_MAB_problems_20_1.png

Wrong means for Gaussian arms ?¶

The truncation seems to change the means.

For instance, the first arm (in red) has a small mass on the special value $0$, so it probably reduces its mean.

Let’s estimate it empirically, and then check with the closed form solution.

[12]:

arm = Gaussian(0.1, sigma=0.05)

[13]:

import numpy as np  # needed for np.mean below
mean = arm.mean
estimated_mean = np.mean(arm.draw_nparray((10000000,)))

[14]:

mean, estimated_mean

[14]:

(0.1, 0.10042691516597835)

[15]:

def relative_error(x, y):
    """Relative error of y with respect to the reference value x (x must be non-zero)."""
    return abs(x - y) / abs(x)

relative_error(mean, estimated_mean)

[15]:

0.004269151659783421

$\implies$ That’s a relative difference of $0.4\%$, really negligible!

And for other values for $(\mu, \sigma)$:

[16]:

arm = Gaussian(0.7, sigma=3)

[17]:

mean = arm.mean
estimated_mean = np.mean(arm.draw_nparray((10000000,)))

[18]:

mean, estimated_mean

[18]:

(0.7, 0.5266068636711595)

[19]:

relative_error(mean, estimated_mean)

[19]:

0.2477044804697721

$\implies$ That’s a relative difference of $25\%$!

Clearly, this effect cannot be neglected!

Closed form formula¶

The closed-form formula for the mean of a Gaussian $\mathcal{N}(\mu, \sigma)$ conditioned on $a < X < b$, where $\sigma$ is the standard deviation, $\alpha = (a - \mu)/\sigma$ and $\beta = (b - \mu)/\sigma$, is:

\[\mathbb{E} (X\mid a<X<b)=\mu +\sigma {\frac {\phi ({\frac {a-\mu }{\sigma }})-\phi ({\frac {b-\mu }{\sigma }})}{\Phi ({\frac {b-\mu }{\sigma }})-\Phi ({\frac {a-\mu }{\sigma }})}}\!=\mu +\sigma {\frac {\phi (\alpha )-\phi (\beta )}{\Phi (\beta )-\Phi (\alpha )}}.\]

Let’s compute that.

[20]:

import numpy as np
from scipy.special import erf

The function $\phi$ is the probability density function of the standard normal distribution:

\[\phi(x) := \frac{1}{\sqrt{2 \pi}} \exp\left(- \frac{1}{2} x^2 \right).\]

[21]:

def phi(xi):
    r"""The :math:`\phi(\xi)` function, defined by:

    .. math:: \phi(\xi) := \frac{1}{\sqrt{2 \pi}} \exp\left(- \frac12 \xi^2 \right)

    It is the probability density function of the standard normal distribution, see https://en.wikipedia.org/wiki/Standard_normal_distribution.
    """
    return np.exp(- 0.5 * xi**2) / np.sqrt(2. * np.pi)

The function $\Phi$ is the cumulative distribution function of the standard normal distribution:

\[\Phi(x) := \frac{1}{2} \left(1 + \mathrm{erf}\left( \frac{x}{\sqrt{2}} \right) \right).\]

[22]:

def Phi(x):
    r"""The :math:`\Phi(x)` function, defined by:

    .. math:: \Phi(x) := \frac{1}{2} \left(1 + \mathrm{erf}\left( \frac{x}{\sqrt{2}} \right) \right).

    It is the cumulative distribution function of the standard normal distribution, see https://en.wikipedia.org/wiki/Cumulative_distribution_function
    """
    return (1. + erf(x / np.sqrt(2.))) / 2.

[23]:

mu, sigma, mini, maxi = arm.mu, arm.sigma, arm.min, arm.max
mu, sigma, mini, maxi

[23]:

(0.7, 3, 0, 1)

[24]:

other_mean = mu + sigma * (phi(mini) - phi(maxi)) / (Phi(maxi) - Phi(mini))

[25]:

mean, estimated_mean, other_mean

[25]:

(0.7, 0.5266068636711595, 2.0795866878592797)

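The value $2.08$ is not even in $[0, 1]$, so this computation cannot be right. The formula itself is not at fault: the cell above evaluates $\phi$ and $\Phi$ at the raw bounds $a$ and $b$ instead of the standardized bounds $\alpha = (a - \mu)/\sigma$ and $\beta = (b - \mu)/\sigma$. Note also that the formula gives the mean conditioned on $a < X < b$, whereas the point mass at $0$ seen in the histogram above suggests that the arm clips its draws to the bounds, which gives a different mean.

In the rest of this notebook, we ignore this issue and keep treating $\mu$ as the mean of a Gaussian arm $\mathcal{N}(\mu, \sigma)$ truncated to $[0,1]$, which is a good approximation only when $\sigma$ is small compared to the distance from $\mu$ to the bounds.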
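
As a hedged check of the formula, the sketch below evaluates it at the standardized bounds $\alpha = (a - \mu)/\sigma$ and $\beta = (b - \mu)/\sigma$, taking $\sigma$ as the standard deviation. It also computes the mean under the assumption that the arm clips each draw to $[a, b]$ (an assumption suggested by the point mass at $0$ in the histogram, not checked against the library's code). The helper names truncated_mean and clipped_mean are hypothetical.

```python
from math import erf, exp, pi, sqrt

def phi(x):
    """Standard normal probability density function."""
    return exp(-0.5 * x * x) / sqrt(2.0 * pi)

def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def truncated_mean(mu, sigma, a, b):
    """E[X | a < X < b] for a Gaussian with mean mu and standard deviation sigma."""
    alpha, beta = (a - mu) / sigma, (b - mu) / sigma
    return mu + sigma * (phi(alpha) - phi(beta)) / (Phi(beta) - Phi(alpha))

def clipped_mean(mu, sigma, a, b):
    """E[min(max(X, a), b)]: draws outside [a, b] are moved onto the nearest bound."""
    alpha, beta = (a - mu) / sigma, (b - mu) / sigma
    inside = mu * (Phi(beta) - Phi(alpha)) + sigma * (phi(alpha) - phi(beta))
    return a * Phi(alpha) + inside + b * (1.0 - Phi(beta))

for mu, sigma in [(0.1, 0.05), (0.7, 3.0)]:
    print(mu, sigma, truncated_mean(mu, sigma, 0.0, 1.0), clipped_mean(mu, sigma, 0.0, 1.0))
```

Under these assumptions, the clipped mean comes out near $0.1004$ for $(0.1, 0.05)$ and near $0.526$ for $(0.7, 3)$, close to the two empirical estimates above, while the conditional mean for $(0.7, 3)$ is near $0.50$.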

With a larger variance ?¶

But if the variance is larger, it can be very hard to differentiate between arms, and so MAB learning will be harder. With a larger value of $\sigma = 0.10$, for rewards truncated to $[0, 1]$:

[26]:

M_G = MAB([Gaussian(mu, sigma=0.10) for mu in [0.1, 0.5, 0.9]])
_ = M_G.plotHistogram(100000)



Creating a new MAB problem ...
  Taking arms of this MAB problem from a list of arms 'configuration' = [N(0.1, 0.1), N(0.5, 0.1), N(0.9, 0.1)] ...
 - with 'arms' = [N(0.1, 0.1), N(0.5, 0.1), N(0.9, 0.1)]
 - with 'means' = [0.1 0.5 0.9]
 - with 'nbArms' = 3
 - with 'maxArm' = 0.9
 - with 'minArm' = 0.1

This MAB problem has:
 - a [Lai & Robbins] complexity constant C(mu) = 0.75 ...
 - a Optimal Arm Identification factor H_OI(mu) = 26.67% ...
 - with 'arms' represented as: $[N(0.1), N(0.5), N(0.9)^*], \sigma^2=0.1$
Warning: forcing to use putatright = False because there is 3 items in the legend.

../_images/notebooks_Easily_creating_MAB_problems_44_1.png

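We see that, due to the truncation, a Gaussian whose mean is close to $0$ or $1$ gets a visible mass on that bound, so its actual mean reward differs from the nominal $\mu$: it is shifted toward the inside of $[0, 1]$ (here the arm with $\mu = 0.9$ piles up mass at $1$, so its actual mean is slightly below $0.9$).

And for larger values of $\sigma$, the effect is even stronger: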

[27]:

M_G = MAB([Gaussian(mu, sigma=0.25) for mu in [0.1, 0.5, 0.9]])
_ = M_G.plotHistogram()



Creating a new MAB problem ...
  Taking arms of this MAB problem from a list of arms 'configuration' = [N(0.1, 0.25), N(0.5, 0.25), N(0.9, 0.25)] ...
 - with 'arms' = [N(0.1, 0.25), N(0.5, 0.25), N(0.9, 0.25)]
 - with 'means' = [0.1 0.5 0.9]
 - with 'nbArms' = 3
 - with 'maxArm' = 0.9
 - with 'minArm' = 0.1

This MAB problem has:
 - a [Lai & Robbins] complexity constant C(mu) = 1.87 ...
 - a Optimal Arm Identification factor H_OI(mu) = 26.67% ...
 - with 'arms' represented as: $[N(0.1), N(0.5), N(0.9)^*], \sigma^2=0.25$
Warning: forcing to use putatright = False because there is 3 items in the legend.

../_images/notebooks_Easily_creating_MAB_problems_46_1.png

Exponential arms¶

We can do the same with (truncated) Exponential arms. For convenience, I prefer to work with ExponentialFromMean, which takes the mean rather than the $\lambda$ parameter to create the arm.

[28]:

M_E = MAB({ "arm_type": ExponentialFromMean, "params": [0.1, 0.5, 0.9]})



Creating a new MAB problem ...
  Reading arms of this MAB problem from a dictionnary 'configuration' = {'arm_type': <class 'SMPyBandits.Arms.Exponential.ExponentialFromMean'>, 'params': [0.1, 0.5, 0.9]} ...
 - with 'arm_type' = <class 'SMPyBandits.Arms.Exponential.ExponentialFromMean'>
 - with 'params' = [0.1, 0.5, 0.9]
 - with 'arms' = [\mathrm{Exp}(10, 1), \mathrm{Exp}(1.59, 1), \mathrm{Exp}(0.215, 1)]
 - with 'means' = [0.1 0.5 0.9]
 - with 'nbArms' = 3
 - with 'maxArm' = 0.9000000032329611
 - with 'minArm' = 0.10000000005466392

This MAB problem has:
 - a [Lai & Robbins] complexity constant C(mu) = 3.4 ...
 - a Optimal Arm Identification factor H_OI(mu) = 26.67% ...
 - with 'arms' represented as: $[\mathrm{Exp}(10, 1), \mathrm{Exp}(1.59, 1), \mathrm{Exp}(0.215, 1)^*]$

[29]:

_ = M_E.plotHistogram()

Warning: forcing to use putatright = False because there is 3 items in the legend.

../_images/notebooks_Easily_creating_MAB_problems_49_1.png
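
The rates shown in the output above ($10$, $1.59$, $0.215$) can be recovered from the requested means. The sketch below assumes that a truncated exponential arm clips its draws at $1$, so that its mean is $(1 - e^{-\lambda}) / \lambda$, and inverts this relation by bisection. The helper names are hypothetical, and the library may well use a different root-finding method.

```python
from math import expm1

def clipped_exponential_mean(lam, trunc=1.0):
    """E[min(X, trunc)] for X ~ Exp(lam), i.e. (1 - exp(-lam * trunc)) / lam."""
    return -expm1(-lam * trunc) / lam

def rate_from_mean(mean, trunc=1.0, tol=1e-12):
    """Find lam by bisection so that the clipped mean equals `mean` (0 < mean < trunc)."""
    lo, hi = 1e-9, 1e6  # the clipped mean decreases from trunc to 0 as lam grows
    while hi - lo > tol * max(1.0, lo):
        mid = 0.5 * (lo + hi)
        if clipped_exponential_mean(mid, trunc) > mean:
            lo = mid  # mean still too large: a larger rate is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)

rates = [rate_from_mean(m) for m in (0.1, 0.5, 0.9)]
print(["%.3g" % lam for lam in rates])
```

Bisection is used here only because the clipped mean is monotone in $\lambda$, which makes the sketch easy to verify.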

Uniform arms¶

Arms with rewards uniform in $[0,1]$ are continuous versions of Bernoulli$(0.5)$. They can also be uniform in other intervals.

[30]:

UniformArm(0, 1).lower_amplitude
UniformArm(0, 0.1).lower_amplitude
UniformArm(0.4, 0.5).lower_amplitude
UniformArm(0.8, 0.9).lower_amplitude

[30]:

(0, 1)

[30]:

(0, 0.1)

[30]:

(0.4, 0.09999999999999998)

[30]:

(0.8, 0.09999999999999998)

[31]:

M_U = MAB([UniformArm(0, 1), UniformArm(0, 0.1), UniformArm(0.4, 0.5), UniformArm(0.8, 0.9)])



Creating a new MAB problem ...
  Taking arms of this MAB problem from a list of arms 'configuration' = [U(0, 1), U(0, 0.1), U(0.4, 0.5), U(0.8, 0.9)] ...
 - with 'arms' = [U(0, 1), U(0, 0.1), U(0.4, 0.5), U(0.8, 0.9)]
 - with 'means' = [0.5  0.05 0.45 0.85]
 - with 'nbArms' = 4
 - with 'maxArm' = 0.8500000000000001
 - with 'minArm' = 0.05

This MAB problem has:
 - a [Lai & Robbins] complexity constant C(mu) = 2.47 ...
 - a Optimal Arm Identification factor H_OI(mu) = 36.25% ...
 - with 'arms' represented as: $[U(0, 1), U(0, 0.1), U(0.4, 0.5), U(0.8, 0.9)^*]$

[32]:

_ = M_U.plotHistogram(100000)

Warning: forcing to use putatright = False because there is 4 items in the legend.

../_images/notebooks_Easily_creating_MAB_problems_53_1.png
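
The H_OI factor printed for each problem is not defined in this notebook. For the problems with rewards in $[0, 1]$, the printed values are consistent with averaging $1 - (\mu^* - \mu_k) / \text{amplitude}$ over the suboptimal arms and dividing by the number of arms $K$. The sketch below is an inference from the printed numbers, not a copy of the library's code, and the helper name hoi_factor is hypothetical.

```python
def hoi_factor(means, amplitude=1.0):
    """Sum over suboptimal arms of (1 - gap / amplitude), divided by the number of arms."""
    mu_star = max(means)
    total = sum(1.0 - (mu_star - mu) / amplitude for mu in means if mu != mu_star)
    return total / len(means)

# Means of the Bernoulli problem and of the Uniform problem defined above
print("%.2f%%" % (100 * hoi_factor([0.1, 0.5, 0.9])))
print("%.2f%%" % (100 * hoi_factor([0.5, 0.05, 0.45, 0.85])))
```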

Arms with rewards outside of $[0, 1]$¶

Of course, everything works similarly if rewards are not in $[0, 1]$ but in any interval $[a, b]$.

Note that all my algorithms assume $a = \text{lower} = 0$ and $b = 1$ by default (and use $\text{amplitude} = b - a$ instead of $b$). These two values just need to be specified when the rewards are not in the default interval $[0, 1]$.
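
To illustrate the role of these two values, here is a minimal sketch of the usual way to map a reward from $[a, b]$ back to $[0, 1]$. The helper normalize_reward is hypothetical and only shows the arithmetic; it is not taken from SMPyBandits.

```python
def normalize_reward(reward, lower=0.0, amplitude=1.0):
    """Map a reward from [lower, lower + amplitude] onto [0, 1]."""
    return (reward - lower) / amplitude

# Rewards in [-10, 10]: lower = -10 and amplitude = 10 - (-10) = 20
for r in (-10.0, 0.0, 5.0, 10.0):
    print(r, "->", normalize_reward(r, lower=-10.0, amplitude=20.0))
```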

For example, Gaussian arms can be truncated to $[-10, 10]$ instead of $[0, 1]$. Let's define some Gaussian arms, with means $-5, 0, 5$ and $\sigma = 2$.

[33]:

M_G = MAB([Gaussian(mu, sigma=2, mini=-10, maxi=10) for mu in [-5, 0, 5]])



Creating a new MAB problem ...
  Taking arms of this MAB problem from a list of arms 'configuration' = [N(-5, 2), N(0, 2), N(5, 2)] ...
 - with 'arms' = [N(-5, 2), N(0, 2), N(5, 2)]
 - with 'means' = [-5  0  5]
 - with 'nbArms' = 3
 - with 'maxArm' = 5
 - with 'minArm' = -5

This MAB problem has:
 - a [Lai & Robbins] complexity constant C(mu) = 1.2 ...
 - a Optimal Arm Identification factor H_OI(mu) = 16.67% ...
 - with 'arms' represented as: $[N(-5), N(0), N(5)^*], \sigma^2=2$

[34]:

_ = M_G.plotHistogram(100000)

Warning: forcing to use putatright = False because there is 3 items in the legend.

../_images/notebooks_Easily_creating_MAB_problems_56_1.png

[35]:

M_G = MAB([Gaussian(mu, sigma=0.1, mini=-10, maxi=10) for mu in [-5, 0, 5]])



Creating a new MAB problem ...
  Taking arms of this MAB problem from a list of arms 'configuration' = [N(-5, 0.1), N(0, 0.1), N(5, 0.1)] ...
 - with 'arms' = [N(-5, 0.1), N(0, 0.1), N(5, 0.1)]
 - with 'means' = [-5  0  5]
 - with 'nbArms' = 3
 - with 'maxArm' = 5
 - with 'minArm' = -5

This MAB problem has:
 - a [Lai & Robbins] complexity constant C(mu) = 0.06 ...
 - a Optimal Arm Identification factor H_OI(mu) = 16.67% ...
 - with 'arms' represented as: $[N(-5), N(0), N(5)^*], \sigma^2=0.1$

[36]:

_ = M_G.plotHistogram()

Warning: forcing to use putatright = False because there is 3 items in the legend.

../_images/notebooks_Easily_creating_MAB_problems_58_1.png

Gamma arms¶

We can do the same with (truncated) Gamma arms. For convenience, I prefer to work with GammaFromMean, which takes the mean rather than the shape parameter $k$ to create the arm. The scale $\theta$ is fixed to $1$ by default, and here the rewards will be in $[0, 10]$.

[37]:

M_Gamma = MAB([GammaFromMean(mean, scale=1, mini=0, maxi=10) for mean in [1, 2, 3, 4, 5]])



Creating a new MAB problem ...
  Taking arms of this MAB problem from a list of arms 'configuration' = [\Gamma(1, 1), \Gamma(2, 1), \Gamma(3, 1), \Gamma(4, 1), \Gamma(5, 1)] ...
 - with 'arms' = [\Gamma(1, 1), \Gamma(2, 1), \Gamma(3, 1), \Gamma(4, 1), \Gamma(5, 1)]
 - with 'means' = [1. 2. 3. 4. 5.]
 - with 'nbArms' = 5
 - with 'maxArm' = 5.0
 - with 'minArm' = 1.0

This MAB problem has:
 - a [Lai & Robbins] complexity constant C(mu) = 75.7 ...
 - a Optimal Arm Identification factor H_OI(mu) = 60.00% ...
 - with 'arms' represented as: $[\Gamma(1, 1), \Gamma(2, 1), \Gamma(3, 1), \Gamma(4, 1), \Gamma(5, 1)^*]$

[38]:

_ = M_Gamma.plotHistogram(100000)

Warning: forcing to use putatright = False because there is 5 items in the legend.

../_images/notebooks_Easily_creating_MAB_problems_61_1.png

As for Gaussian arms, the truncation is strongly changing the means of the arm rewards. Here the arm with mean parameter $5$ has an empirical mean close to $10$ due to truncation.

Non-truncated Gaussian and Gamma arms¶

Let's try with non-truncated rewards.

[39]:

M_G = MAB([Gaussian(mu, sigma=3, mini=float('-inf'), maxi=float('+inf')) for mu in [-10, 0, 10]])



Creating a new MAB problem ...
  Taking arms of this MAB problem from a list of arms 'configuration' = [N(-10, 3), N(0, 3), N(10, 3)] ...
 - with 'arms' = [N(-10, 3), N(0, 3), N(10, 3)]
 - with 'means' = [-10   0  10]
 - with 'nbArms' = 3
 - with 'maxArm' = 10
 - with 'minArm' = -10

This MAB problem has:
 - a [Lai & Robbins] complexity constant C(mu) = 0.9 ...
 - a Optimal Arm Identification factor H_OI(mu) = 66.67% ...
 - with 'arms' represented as: $[N(-10), N(0), N(10)^*], \sigma^2=3$

[40]:

_ = M_G.plotHistogram(100000)

Warning: forcing to use putatright = False because there is 3 items in the legend.

../_images/notebooks_Easily_creating_MAB_problems_65_1.png

And with non-truncated Gamma arms?

[41]:

M_Gamma = MAB([GammaFromMean(mean, scale=1, mini=float('-inf'), maxi=float('+inf')) for mean in [1, 2, 3, 4, 5]])
_ = M_Gamma.plotHistogram(100000)



Creating a new MAB problem ...
  Taking arms of this MAB problem from a list of arms 'configuration' = [\Gamma(1, 1), \Gamma(2, 1), \Gamma(3, 1), \Gamma(4, 1), \Gamma(5, 1)] ...
 - with 'arms' = [\Gamma(1, 1), \Gamma(2, 1), \Gamma(3, 1), \Gamma(4, 1), \Gamma(5, 1)]
 - with 'means' = [1. 2. 3. 4. 5.]
 - with 'nbArms' = 5
 - with 'maxArm' = 5.0
 - with 'minArm' = 1.0

This MAB problem has:
 - a [Lai & Robbins] complexity constant C(mu) = 75.7 ...
 - a Optimal Arm Identification factor H_OI(mu) = 80.00% ...
 - with 'arms' represented as: $[\Gamma(1, 1), \Gamma(2, 1), \Gamma(3, 1), \Gamma(4, 1), \Gamma(5, 1)^*]$
Warning: forcing to use putatright = False because there is 5 items in the legend.

../_images/notebooks_Easily_creating_MAB_problems_67_1.png

[42]:

M_Gamma = MAB([GammaFromMean(mean, scale=1, mini=float('-inf'), maxi=float('+inf')) for mean in [10, 20, 30, 40, 50]])
_ = M_Gamma.plotHistogram(1000000)



Creating a new MAB problem ...
  Taking arms of this MAB problem from a list of arms 'configuration' = [\Gamma(10, 1), \Gamma(20, 1), \Gamma(30, 1), \Gamma(40, 1), \Gamma(50, 1)] ...
 - with 'arms' = [\Gamma(10, 1), \Gamma(20, 1), \Gamma(30, 1), \Gamma(40, 1), \Gamma(50, 1)]
 - with 'means' = [10. 20. 30. 40. 50.]
 - with 'nbArms' = 5
 - with 'maxArm' = 50.0
 - with 'minArm' = 10.0

This MAB problem has:
 - a [Lai & Robbins] complexity constant C(mu) = 757 ...
 - a Optimal Arm Identification factor H_OI(mu) = 80.00% ...
 - with 'arms' represented as: $[\Gamma(10, 1), \Gamma(20, 1), \Gamma(30, 1), \Gamma(40, 1), \Gamma(50, 1)^*]$
Warning: forcing to use putatright = False because there is 5 items in the legend.

../_images/notebooks_Easily_creating_MAB_problems_68_1.png

Conclusion¶

This small notebook demonstrated how to define arms and Multi-Armed Bandit problems in my framework, SMPyBandits.