Environment.MAB module¶
MAB, MarkovianMAB, ChangingAtEachRepMAB, IncreasingMAB, PieceWiseStationaryMAB and NonStationaryMAB classes to wrap the arms of some Multi-Armed Bandit problems.
Such a class has to have at least these methods:
draw(armId, t)
to draw one sample from that armId at time t,
and
reprarms()
to pretty print the arms (for titles of a plot), and more, see below.
Warning
FIXME it is still a work in progress, I need to add continuously varying environments. See https://github.com/SMPyBandits/SMPyBandits/issues/71

class
Environment.MAB.
MAB
(configuration)[source]¶ Bases:
object
Basic Multi-Armed Bandit problem, for stochastic and i.i.d. arms.
configuration can be a dict with 'arm_type' and 'params' keys. 'arm_type' is a class from the Arms module, and 'params' is a list/tuple/iterable of (named) parameters given to 'arm_type'. Example:

    configuration = {
        'arm_type': Bernoulli,
        'params': [0.1, 0.5, 0.9]
    }

    configuration = {  # for fixed-variance Gaussian
        'arm_type': Gaussian,
        'params': [0.1, 0.5, 0.9]
    }

But it can also accept a list of already created arms:

    configuration = [
        Bernoulli(0.1),
        Bernoulli(0.5),
        Bernoulli(0.9),
    ]

Both will create three Bernoulli arms, of parameters (means) 0.1, 0.5 and 0.9.
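To make the interface concrete, here is a minimal, self-contained sketch of how such a configuration is consumed. The `Bernoulli` and `ToyMAB` classes below are illustrative stand-ins, not the real `Arms.Bernoulli` or this `MAB` class:

```python
import random

class Bernoulli:
    """Toy stand-in for Arms.Bernoulli: mean p, rewards in {0, 1}."""
    def __init__(self, p):
        self.mean = p

    def draw(self, t=None):
        return 1 if random.random() < self.mean else 0

class ToyMAB:
    """Minimal sketch of the MAB wrapper interface (not the real class)."""
    def __init__(self, configuration):
        if isinstance(configuration, dict):
            arm_type = configuration['arm_type']
            params = configuration['params']
            self.arms = [arm_type(p) for p in params]
        else:  # a list of already created arms
            self.arms = list(configuration)
        self.means = [arm.mean for arm in self.arms]
        self.nbArms = len(self.arms)
        self.maxArm = max(self.means)
        self.minArm = min(self.means)

    def draw(self, armId, t=1):
        return self.arms[armId].draw(t)

env = ToyMAB({'arm_type': Bernoulli, 'params': [0.1, 0.5, 0.9]})
print(env.nbArms, env.minArm, env.maxArm)   # 3 0.1 0.9
```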

isChangingAtEachRepetition
= None¶ Flag to know if the problem is changing at each repetition or not.

isDynamic
= None¶ Flag to know if the problem is static or not.

isMarkovian
= None¶ Flag to know if the problem is Markovian or not.

arms
= None¶ List of arms

means
= None¶ Means of arms

nbArms
= None¶ Number of arms

maxArm
= None¶ Max mean of arms

minArm
= None¶ Min mean of arms

new_order_of_arm
(arms)[source]¶ Feed a new order of the arms to the environment.
Updates
means
correctly. Returns the new position(s) of the best arm (to count and plot
BestArmPulls
correctly).
Warning
This is a very limited support of non-stationary environments: only permutations of the arms are allowed, see
NonStationaryMAB
for more.

reprarms
(nbPlayers=None, openTag='', endTag='^*', latex=True)[source]¶ Return a str representation of the list of the arms (like repr(self.arms) but better).
If nbPlayers > 0, it surrounds the representation of the best arms by openTag, endTag (for plot titles, in a multiplayer setting).
Example: openTag = '', endTag = '^*' for LaTeX tags to put a star exponent.
Example: openTag = '<red>', endTag = '</red>' for HTML-like tags.
Example: openTag = r'\textcolor{red}{', endTag = '}' for LaTeX tags.

draw
(armId, t=1)[source]¶ Return a random sample from the armId-th arm, at time t. Usually t is not used.

draw_nparray
(armId, shape=(1, ))[source]¶ Return a numpy array of random samples from the armId-th arm, of a certain shape.

draw_each_nparray
(shape=(1, ))[source]¶ Return a numpy array of random samples from each arm, of a certain shape.

get_minArm
(horizon=None)[source]¶ Return the vector of min mean of the arms.
It is a vector of length horizon.

get_maxArm
(horizon=None)[source]¶ Return the vector of max mean of the arms.
It is a vector of length horizon.

get_maxArms
(M=1, horizon=None)[source]¶ Return the vector of sums of the M best means of the arms.
It is a vector of length horizon.

get_allMeans
(horizon=None)[source]¶ Return the vector of means of the arms.
It is a numpy array of shape (nbArms, horizon).

property
sparsity
¶ Estimate the sparsity of the problem, i.e., the number of arms with positive means.

str_sparsity
()[source]¶ Empty string if
sparsity = nbArms
, or a small string ', $s={}$' if the sparsity is strictly less than the number of arms.

lowerbound
()[source]¶ Compute the constant \(C(\mu)\), for the [Lai & Robbins] lower bound for this MAB problem (complexity), using functions from
kullback.py
or
kullback.so
(see
Arms.kullback
).
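As a sketch of what this lower-bound constant is, assuming Bernoulli arms: \(C(\mu) = \sum_{k \colon \mu_k < \mu^*} (\mu^* - \mu_k) / \mathrm{kl}(\mu_k, \mu^*)\). The helper names below are mine, not those of kullback.py:

```python
from math import log

def klBern(x, y, eps=1e-15):
    """Binary Kullback-Leibler divergence kl(x, y), clamped away from 0 and 1."""
    x = min(max(x, eps), 1 - eps)
    y = min(max(y, eps), 1 - eps)
    return x * log(x / y) + (1 - x) * log((1 - x) / (1 - y))

def lai_robbins_constant(means):
    """C(mu) = sum over suboptimal arms of (mu* - mu_k) / kl(mu_k, mu*)."""
    best = max(means)
    return sum((best - mu) / klBern(mu, best) for mu in means if mu < best)

print(lai_robbins_constant([0.1, 0.5, 0.9]))   # ≈ 1.24
```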

lowerbound_sparse
(sparsity=None)[source]¶ Compute the constant \(C(\mu)\), for the [Kwon et al., 2017] lower bound for sparse bandits, for this MAB problem (complexity).
I recomputed a suboptimal solution to the optimization problem, and found the same as in [["Sparse Stochastic Bandits", by J. Kwon, V. Perchet & C. Vernade, COLT 2017](https://arxiv.org/abs/1706.01383)].

hoifactor
()[source]¶ Compute the HOI factor H_OI(mu), the Optimal Arm Identification (OI) factor, for this MAB problem (complexity). Cf. (3.3) in Navikkumar MODI’s thesis, “Machine Learning and Statistical Decision Making for Green Radio” (2017).

lowerbound_multiplayers
(nbPlayers=1)[source]¶ Compute our multiplayers lower bound for this MAB problem (complexity), using functions from
kullback
.

upperbound_collisions
(nbPlayers, times)[source]¶ Compute Anandkumar et al. multiplayers upper bound for this MAB problem (complexity), for UCB only. Warning: it is HIGHLY asymptotic!

plotComparison_our_anandkumar
(savefig=None)[source]¶ Plot a comparison of our lowerbound and their lowerbound.

plotHistogram
(horizon=10000, savefig=None, bins=50, alpha=0.9, density=None)[source]¶ Plot horizon=10000 draws of each arm.

__module__
= 'Environment.MAB'¶


Environment.MAB.
RESTED
= True¶ Default is rested Markovian.

Environment.MAB.
dict_of_transition_matrix
(mat)[source]¶ Convert a transition matrix (list of list or numpy array) to a dictionary mapping (state, state) to probabilities (as used by
pykov.Chain
).

Environment.MAB.
transition_matrix_of_dict
(dic)[source]¶ Convert a dictionary mapping (state, state) to probabilities (as used by
pykov.Chain
) to a transition matrix (numpy array).
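A pure-Python sketch of these two converters (the real functions may rely on numpy arrays and on pykov's state labels, which need not be integers):

```python
def dict_of_transition_matrix(mat):
    """Sketch: list-of-lists transition matrix -> dict mapping (state, state) to probability."""
    return {(i, j): p for i, row in enumerate(mat) for j, p in enumerate(row)}

def transition_matrix_of_dict(dic):
    """Sketch of the inverse: dict of (state, state) -> probability, back to a matrix."""
    states = sorted({s for pair in dic for s in pair})
    index = {s: k for k, s in enumerate(states)}
    n = len(states)
    mat = [[0.0] * n for _ in range(n)]
    for (i, j), p in dic.items():
        mat[index[i]][index[j]] = p
    return mat

P = [[0.7, 0.3], [0.5, 0.5]]
d = dict_of_transition_matrix(P)
print(d[(0, 1)])                           # 0.3
print(transition_matrix_of_dict(d) == P)   # True
```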

class
Environment.MAB.
MarkovianMAB
(configuration)[source]¶ Bases:
Environment.MAB.MAB
Classic MAB problem but the rewards are drawn from a rested/restless Markov chain.
configuration is a dict with
rested
and
transitions
keys.
rested
is a Boolean. See [Kalathil et al., 2012](https://arxiv.org/abs/1206.3582) page 2 for a description.
transitions
is a list of K transition matrices or dictionaries (to specify non-integer states), one for each arm.
Example:

    configuration = {
        "arm_type": "Markovian",
        "params": {
            "rested": True,  # or False
            # Example from [Kalathil et al., 2012](https://arxiv.org/abs/1206.3582) Table 1
            "transitions": [
                # 1st arm, either a dictionary
                {   # Mean = 0.375
                    (0, 0): 0.7, (0, 1): 0.3,
                    (1, 0): 0.5, (1, 1): 0.5,
                },
                # 2nd arm, or a right transition matrix
                [[0.2, 0.8], [0.6, 0.4]],  # Mean = 0.571
            ],
            # FIXME make this the default! include it in MAB.py and not in the configuration!
            "steadyArm": Bernoulli
        }
    }

This class requires the [pykov](https://github.com/riccardoscalco/Pykov) module to represent and use Markov chains.
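To illustrate what drawing from a rested Markovian arm means, here is a toy simulation of the second arm above, in plain Python (markov_draw is a hypothetical helper, not the pykov-based implementation):

```python
import random

def markov_draw(transitions, state):
    """One step of a two-or-more-state Markov chain arm: sample the next state
    from row `transitions[state]` and return it as the (0/1) reward."""
    r, cum = random.random(), 0.0
    for next_state, p in enumerate(transitions[state]):
        cum += p
        if r < cum:
            return next_state
    return len(transitions[state]) - 1   # guard against floating-point rounding

# Second arm from the example above: right transition matrix, mean ~ 0.571
P = [[0.2, 0.8], [0.6, 0.4]]
state = 0
rewards = []
for _ in range(10000):
    state = markov_draw(P, state)
    rewards.append(state)
print(sum(rewards) / len(rewards))   # close to 4/7 ≈ 0.571, the steady-state probability of state 1
```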

isChangingAtEachRepetition
= None¶ The problem is not changing at each repetition.

isDynamic
= None¶ The problem is static.

isMarkovian
= None¶ The problem is Markovian.

rested
= None¶ Rested or not Markovian model?

nbArms
= None¶ Number of arms

means
= None¶ Means of each arm, from their steady distributions.

maxArm
= None¶ Max mean of arms

minArm
= None¶ Min mean of arms

states
= None¶ States of each arm, initially they are all busy

reprarms
(nbPlayers=None, openTag='', endTag='^*', latex=True)[source]¶ Return a str representation of the list of the arms (like repr(self.arms) but better).
For Markovian MAB, the chain and the steady Bernoulli arm is represented.
If nbPlayers > 0, it surrounds the representation of the best arms by openTag, endTag (for plot titles, in a multiplayer setting).
Example: openTag = ‘’, endTag = ‘^*’ for LaTeX tags to put a star exponent.
Example: openTag = '<red>', endTag = '</red>' for HTML-like tags.
Example: openTag = r'\textcolor{red}{', endTag = '}' for LaTeX tags.

draw
(armId, t=1)[source]¶ Move on the Markov chain and return its state as a reward (0 or 1, or else).
If rested Markovian, only the state of the Markov chain of arm armId changes. It is the simplest model, and the default one.
But if restless (non-rested) Markovian, the states of the Markov chains of all arms change (not only armId).

Environment.MAB.
VERBOSE
= False¶ Whether to be verbose when generating new arms for Dynamic MAB

class
Environment.MAB.
ChangingAtEachRepMAB
(configuration, verbose=False)[source]¶ Bases:
Environment.MAB.MAB
Like a stationary MAB problem, but the arms are (randomly) regenerated for each repetition, with the
newRandomArms()
method.
M.arms
and
M.means
are changed after each call to
newRandomArms()
, but not
nbArms
. All the other methods are carefully written to still make sense (
Mbest
,
Mworst
,
minArm
,
maxArm
).
Warning
It works perfectly fine, but it is still experimental, be careful when using this feature.
Note
Testing bandit algorithms against randomly generated problems at each repetition is usually referred to as "Bayesian problems" in the literature: a prior is set on problems (e.g., uniform on \([0,1]^K\), or a less obvious prior, for instance if a
mingap
is set), and the performance is assessed against this prior. It differs from the frequentist point of view of having one fixed problem and doing e.g. n=1000
repetitions on the same problem.
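For instance, such a prior with a mingap constraint can be sketched as follows (uniformMeans is a hypothetical generator written for illustration, not the one shipped with SMPyBandits):

```python
import random

def uniformMeans(nbArms=3, mingap=0.05, lower=0.0, amplitude=1.0):
    """Draw nbArms means uniformly in [lower, lower + amplitude], rejecting
    draws whose pairwise gaps are below mingap (rejection sampling)."""
    while True:
        means = [lower + amplitude * random.random() for _ in range(nbArms)]
        sorted_means = sorted(means)
        if all(b - a >= mingap for a, b in zip(sorted_means, sorted_means[1:])):
            return means

means = uniformMeans(nbArms=3, mingap=0.1)
print(means)
```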
isChangingAtEachRepetition
= None¶ The problem is changing at each repetition.

isDynamic
= None¶ The problem is dynamic.

isMarkovian
= None¶ The problem is not Markovian.

newMeans
= None¶ Function to generate the means

args
= None¶ Args to give to function

nbArms
= None¶ Number of arms

reprarms
(nbPlayers=None, openTag='', endTag='^*', latex=True)[source]¶ Cannot represent the dynamic arms, so print the ChangingAtEachRepMAB object

newRandomArms
(t=None, verbose=False)[source]¶ Generate a new list of arms, from
arm_type(params['newMeans'](*params['args']))
.

property
arms
¶ Return the current list of arms.

property
means
¶ Return the list of means of arms for this ChangingAtEachRepMAB: after \(x\) calls to
newRandomArms()
, the returned mean of arm \(k\) is the mean of the \(x\) means drawn for that arm.
Warning
Highly experimental!

property
minArm
¶ Return the smallest mean of the arms, for a dynamic MAB (averaged on all the draws of new means).

property
maxArm
¶ Return the largest mean of the arms, for a dynamic MAB (averaged on all the draws of new means).

lowerbound
()[source]¶ Compute the constant \(C(\mu)\), for the [Lai & Robbins] lower bound for this MAB problem (complexity), using functions from
kullback
(averaged on all the draws of new means).

hoifactor
()[source]¶ Compute the HOI factor H_OI(mu), the Optimal Arm Identification (OI) factor, for this MAB problem (complexity). Cf. (3.3) in Navikkumar MODI’s thesis, “Machine Learning and Statistical Decision Making for Green Radio” (2017) (averaged on all the draws of new means).

lowerbound_multiplayers
(nbPlayers=1)[source]¶ Compute our multiplayers lower bound for this MAB problem (complexity), using functions from
kullback
.

class
Environment.MAB.
PieceWiseStationaryMAB
(configuration, verbose=False)[source]¶ Bases:
Environment.MAB.MAB
Like a stationary MAB problem, but piecewise stationary.
Give it a list of vectors of means, and a list of change-point locations.
You can use
plotHistoryOfMeans()
to see a nice plot of the history of means.
Note
This is a generic class to implement one "easy" kind of non-stationary bandits: abruptly changing non-stationary bandits, where the change points are fixed and decided in advance.
Warning
It works fine, but it is still experimental, be careful when using this feature.
Warning
The number of arms is fixed, see https://github.com/SMPyBandits/SMPyBandits/issues/123 if you are curious about bandit problems with a varying number of arms (or sleeping bandits where some arms can be enabled or disabled at each time).
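A sketch of such a problem: the configuration keys below mirror the attributes documented next (listOfMeans, changePoints), but the exact schema expected by the class is an assumption here, and mean_at is a toy helper, not part of the library:

```python
# Sketch: a problem with 3 arms whose means change abruptly at t = 1000 and t = 2000.
configuration = {
    "arm_type": "Bernoulli",        # assumed: same family on every interval
    "params": {
        "listOfMeans": [
            [0.2, 0.5, 0.9],        # means on [0, 1000)
            [0.9, 0.5, 0.2],        # means on [1000, 2000)
            [0.5, 0.2, 0.9],        # means on [2000, horizon)
        ],
        "changePoints": [0, 1000, 2000],
    },
}

def mean_at(t, listOfMeans, changePoints):
    """Mean vector in force at time t: the segment whose change point is the
    largest one <= t."""
    k = max(i for i, cp in enumerate(changePoints) if cp <= t)
    return listOfMeans[k]

print(mean_at(1500, configuration["params"]["listOfMeans"],
              configuration["params"]["changePoints"]))   # [0.9, 0.5, 0.2]
```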

isChangingAtEachRepetition
= None¶ The problem is not changing at each repetition.

isDynamic
= None¶ The problem is dynamic.

isMarkovian
= None¶ The problem is not Markovian.

listOfMeans
= None¶ The list of means

nbArms
= None¶ Number of arms

changePoints
= None¶ List of the change points

reprarms
(nbPlayers=None, openTag='', endTag='^*', latex=True)[source]¶ Cannot represent the dynamic arms, so print the PieceWiseStationaryMAB object

newRandomArms
(t=None, onlyOneArm=None, verbose=False)[source]¶ Fake function: there is nothing random here, it only tells the piecewise stationary MAB problem to maybe move to the next interval.

plotHistoryOfMeans
(horizon=None, savefig=None, forceTo01=False, showplot=True, pickleit=False)[source]¶ Plot the history of means, as a plot with x axis being the time, y axis the mean rewards, and K curves one for each arm.

property
arms
¶ Return the current list of arms: at time \(t\), the arms are those of the time interval containing \(t\).

property
means
¶ Return the list of means of arms for this PieceWiseStationaryMAB: at time \(t\), the returned mean of arm \(k\) is the mean during the time interval containing \(t\).

property
minArm
¶ Return the smallest mean of the arms, for the current vector of means.

property
maxArm
¶ Return the largest mean of the arms, for the current vector of means.

get_minArm
(horizon=None)[source]¶ Return the vector of min means of the arms, for a piecewise stationary MAB.
It is a vector of length horizon.

get_minArms
(M=1, horizon=None)[source]¶ Return the vector of sums of the M worst means of the arms, for a piecewise stationary MAB.
It is a vector of length horizon.

get_maxArm
(horizon=None)[source]¶ Return the vector of max mean of the arms, for a piecewise stationary MAB.
It is a vector of length horizon.

get_maxArms
(M=1, horizon=None)[source]¶ Return the vector of sums of the M best means of the arms, for a piecewise stationary MAB.
It is a vector of length horizon.

get_allMeans
(horizon=None)[source]¶ Return the means of the arms over time, for a piecewise stationary MAB.
It is a numpy array of shape (nbArms, horizon).
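A sketch of how this table can be built from listOfMeans and changePoints (a toy reimplementation, using pure Python lists instead of a numpy array):

```python
def get_allMeans(listOfMeans, changePoints, horizon):
    """Sketch of the (nbArms, horizon) table of means for a piecewise stationary
    problem: column t holds the mean vector of the segment containing t."""
    nbArms = len(listOfMeans[0])
    table = [[0.0] * horizon for _ in range(nbArms)]
    for t in range(horizon):
        seg = max(i for i, cp in enumerate(changePoints) if cp <= t)
        for arm in range(nbArms):
            table[arm][t] = listOfMeans[seg][arm]
    return table

table = get_allMeans([[0.2, 0.8], [0.8, 0.2]], changePoints=[0, 3], horizon=5)
print(table)   # [[0.2, 0.2, 0.2, 0.8, 0.8], [0.8, 0.8, 0.8, 0.2, 0.2]]
```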

class
Environment.MAB.
NonStationaryMAB
(configuration, verbose=False)[source]¶ Bases:
Environment.MAB.PieceWiseStationaryMAB
Like a stationary MAB problem, but the arms can be modified at each time step, with the
newRandomArms()
method.
M.arms
and
M.means
are changed after each call to
newRandomArms()
, but not
nbArms
. All the other methods are carefully written to still make sense (
Mbest
,
Mworst
,
minArm
,
maxArm
).
Note
This is a generic class to implement different kinds of non-stationary bandits:
Abruptly changing non-stationary bandits, in different variants: change points are randomly drawn (once for all
n
repetitions, or at a different location for each repetition).
Slowly varying non-stationary bandits, where the underlying mean of each arm is slowly and randomly modified, and a bound on the speed of change (e.g., the Lipschitz constant of \(t \mapsto \mu_i(t)\)) is known.
Warning
It works fine, but it is still experimental, be careful when using this feature.
Warning
The number of arms is fixed, see https://github.com/SMPyBandits/SMPyBandits/issues/123 if you are curious about bandit problems with a varying number of arms (or sleeping bandits where some arms can be enabled or disabled at each time).

isChangingAtEachRepetition
= None¶ The problem is not changing at each repetition.

isDynamic
= None¶ The problem is dynamic.

isMarkovian
= None¶ The problem is not Markovian.

newMeans
= None¶ Function to generate the means

changePoints
= None¶ List of the change points

onlyOneArm
= None¶ None by default, but can be “uniform” to only change one arm at each change point.

args
= None¶ Args to give to function

nbArms
= None¶ Number of arms

reprarms
(nbPlayers=None, openTag='', endTag='^*', latex=True)[source]¶ Cannot represent the dynamic arms, so print the NonStationaryMAB object

newRandomArms
(t=None, onlyOneArm=None, verbose=False)[source]¶ Generate a new list of arms, from
arm_type(params['newMeans'](t, **params['args']))
.
If
onlyOneArm
is given and is an integer, the change of mean only occurs for this arm and the others stay the same.
If
onlyOneArm="uniform"
, the change of mean only occurs for one arm, chosen uniformly at random, and the others stay the same.
Note
Only the means of the arms change (and so, their order), not their family.
Warning
TODO? So far the only change points we consider are when the means of the arms change, while the family of distributions stays the same. I could implement a more generic way, for instance to be able to test algorithms that detect changes between different families of distributions (e.g., from a Gaussian of variance=1 to a Gaussian of variance=2, with different means or not).
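The onlyOneArm behaviour can be sketched like this (new_means is a toy helper written for illustration, not the library's params['newMeans'] function):

```python
import random

def new_means(current_means, onlyOneArm=None):
    """Sketch of the onlyOneArm behaviour: redraw all means, or only one
    (a fixed index, or one chosen uniformly at random for "uniform")."""
    means = list(current_means)
    if onlyOneArm is None:
        return [random.random() for _ in means]          # change every arm
    armId = random.randrange(len(means)) if onlyOneArm == "uniform" else onlyOneArm
    means[armId] = random.random()                       # change just that arm
    return means

print(new_means([0.1, 0.5, 0.9], onlyOneArm=1))   # only index 1 is redrawn
```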

get_minArm
(horizon=None)[source]¶ Return the vector of min means of the arms, for a non-stationary MAB.
It is a vector of length horizon.

get_maxArm
(horizon=None)[source]¶ Return the vector of max mean of the arms, for a nonstationary MAB.
It is a vector of length horizon.

get_allMeans
(horizon=None)[source]¶ Return the means of the arms over time, for a non-stationary MAB.
It is a numpy array of shape (nbArms, horizon).

Environment.MAB.
static_change_lower_amplitude
(t, l_t, a_t)[source]¶ A function called by
IncreasingMAB
at every time t, to compute the (possibly) new values for \(l_t\) and \(a_t\). The first value it returns is a boolean, True if a change occurred, False otherwise.

Environment.MAB.
L0
= 1¶ Default value for the
doubling_change_lower_amplitude()
function.

Environment.MAB.
A0
= 2¶ Default value for the
doubling_change_lower_amplitude()
function.

Environment.MAB.
DELTA
= 0¶ Default value for the
doubling_change_lower_amplitude()
function.

Environment.MAB.
T0
= 1¶ Default value for the
doubling_change_lower_amplitude()
function.

Environment.MAB.
DELTA_T
= 1¶ Default value for the
doubling_change_lower_amplitude()
function.

Environment.MAB.
ZOOM
= 2¶ Default value for the
doubling_change_lower_amplitude()
function.

Environment.MAB.
doubling_change_lower_amplitude
(t, l_t, a_t, l0=1, a0=2, delta=0, T0=1, deltaT=1, zoom=2)[source]¶ A function called by
IncreasingMAB
at every time t, to compute the (possibly) new values for \(l_t\) and \(a_t\). At time 0, it forces the use of \(l_0, a_0\) if they are given and not
None
. At step T0, it reduces \(l_t\) by delta (typically from 0 to 1).
Every deltaT steps, it multiplies both \(l_t\) and \(a_t\) by zoom.
The first value it returns is a boolean, True if a change occurred, False otherwise.

Environment.MAB.
default_change_lower_amplitude
(t, l_t, a_t, l0=1, a0=2, delta=0, T0=1, deltaT=1, zoom=2)¶ A function called by
IncreasingMAB
at every time t, to compute the (possibly) new values for \(l_t\) and \(a_t\). At time 0, it forces the use of \(l_0, a_0\) if they are given and not
None
. At step T0, it reduces \(l_t\) by delta (typically from 0 to 1).
Every deltaT steps, it multiplies both \(l_t\) and \(a_t\) by zoom.
The first value it returns is a boolean, True if a change occurred, False otherwise.
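A self-contained sketch of this schedule (my own reimplementation of the behaviour just described, returning (changed, l_t, a_t); treat it as illustrative, not the library code):

```python
def doubling_change_lower_amplitude(t, l_t, a_t, l0=1, a0=2, delta=0, T0=1, deltaT=1, zoom=2):
    """Sketch of the schedule: force (l0, a0) at t=0, subtract delta from l_t
    at step T0, and multiply both l_t and a_t by zoom every deltaT steps.
    Returns (changed, l_t, a_t)."""
    if t == 0:
        if l0 is not None:
            l_t = l0
        if a0 is not None:
            a_t = a0
        return True, l_t, a_t
    changed = False
    if t == T0:
        l_t -= delta
        changed = True
    if t % deltaT == 0:
        l_t *= zoom
        a_t *= zoom
        changed = True
    return changed, l_t, a_t

# Walk the schedule for a few steps (zooming every 2 steps)
l_t, a_t = None, None
for t in range(4):
    changed, l_t, a_t = doubling_change_lower_amplitude(t, l_t, a_t, deltaT=2)
    print(t, changed, l_t, a_t)
```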

class
Environment.MAB.
IncreasingMAB
(configuration)[source]¶ Bases:
Environment.MAB.MAB
Like a stationary MAB problem, but the range of the rewards is increased from time to time, to test the
Policy.WrapRange
policy.
M.arms and M.means are NOT changed after each call to
newRandomArms()
, and neither is nbArms.
Warning
It is purely experimental, be careful when using this feature.

isDynamic
= None¶ Flag to know if the problem is static or not.

Environment.MAB.
binomialCoefficient
(k, n)[source]¶ Compute a binomial coefficient \(C^n_k\) by a direct multiplicative method: \(C^n_k = {k \choose n}\).
Exact, using integers, unlike https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.binom.html#scipy.special.binom which uses floating-point numbers.
Complexity: \(\mathcal{O}(1)\) in memory, \(\mathcal{O}(n)\) in time.
From https://en.wikipedia.org/wiki/Binomial_coefficient#Binomial_coefficient_in_programming_languages
From: http://userpages.umbc.edu/~rcampbel/Computers/Python/probstat.html#ProbStatCombinCombinations
Examples:

    >>> binomialCoefficient(3, 10)
    0
    >>> binomialCoefficient(1, 10)
    0
    >>> binomialCoefficient(1, 10)
    10
    >>> binomialCoefficient(5, 10)
    80
    >>> binomialCoefficient(5, 20)
    12960
    >>> binomialCoefficient(10, 30)
    10886400
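A self-contained sketch of the direct multiplicative method, written in the standard \(\binom{n}{k}\) convention (note that the argument order of the library function above may differ):

```python
def binomial_coefficient(n, k):
    """Exact integer binomial coefficient C(n, k) = n! / (k! (n-k)!), computed by
    the direct multiplicative method in O(min(k, n-k)) multiplications, O(1) memory."""
    if k < 0 or k > n:
        return 0
    k = min(k, n - k)                        # exploit symmetry C(n, k) = C(n, n-k)
    result = 1
    for i in range(1, k + 1):
        result = result * (n - k + i) // i   # exact: each intermediate is C(n-k+i, i)
    return result

print(binomial_coefficient(10, 5))    # 252
print(binomial_coefficient(30, 10))   # 30045015
```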