# Environment.MAB module¶

MAB, MarkovianMAB, ChangingAtEachRepMAB, IncreasingMAB, PieceWiseStationaryMAB and NonStationaryMAB classes to wrap the arms of some Multi-Armed Bandit problems.

Each such class has to implement at least these methods:

• draw(armId, t) to draw one sample from arm armId at time t,

• reprarms() to pretty print the arms (for the titles of a plot),

• and more, see below.

Warning

FIXME This is still a work in progress; continuously varying environments still need to be added. See https://github.com/SMPyBandits/SMPyBandits/issues/71

class Environment.MAB.MAB(configuration)[source]

Bases: object

Basic Multi-Armed Bandit problem, for stochastic and i.i.d. arms.

• configuration can be a dict with 'arm_type' and 'params' keys: 'arm_type' is a class from the Arms module, and 'params' is a list/tuple/iterable of parameters (or a dict of named parameters) given to 'arm_type'. Example:

configuration = {
    'arm_type': Bernoulli,
    'params':   [0.1, 0.5, 0.9]
}

configuration = {  # for fixed variance Gaussian
    'arm_type': Gaussian,
    'params':   [0.1, 0.5, 0.9]
}

• But it can also accept a list of already created arms:

configuration = [
    Bernoulli(0.1),
    Bernoulli(0.5),
    Bernoulli(0.9),
]

• Both will create three Bernoulli arms, of parameters (means) 0.1, 0.5 and 0.9.
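The two configuration styles can be sketched with a minimal re-implementation. `Bernoulli` and `MiniMAB` below are hypothetical stand-ins for `Arms.Bernoulli` and this `MAB` class, kept to the attributes documented here (arms, means, nbArms, minArm, maxArm, draw); the real classes do more.

```python
import random

class Bernoulli:
    """Hypothetical stand-in for Arms.Bernoulli: a {0, 1}-valued arm."""
    def __init__(self, mean):
        self.mean = mean

    def draw(self, t=None):
        # One Bernoulli sample with the arm's mean as success probability
        return 1.0 if random.random() < self.mean else 0.0

class MiniMAB:
    """Minimal sketch of the MAB wrapper: accepts either a configuration
    dict with 'arm_type'/'params' keys, or a list of ready-made arms."""
    def __init__(self, configuration):
        if isinstance(configuration, dict):
            arm_type = configuration['arm_type']
            self.arms = [arm_type(p) for p in configuration['params']]
        else:
            self.arms = list(configuration)
        self.nbArms = len(self.arms)
        self.means = [arm.mean for arm in self.arms]
        self.minArm, self.maxArm = min(self.means), max(self.means)

    def draw(self, armId, t=1):
        return self.arms[armId].draw(t)

mab1 = MiniMAB({'arm_type': Bernoulli, 'params': [0.1, 0.5, 0.9]})
mab2 = MiniMAB([Bernoulli(0.1), Bernoulli(0.5), Bernoulli(0.9)])
```

Both `mab1` and `mab2` describe the same three-armed Bernoulli problem.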

__init__(configuration)[source]

New MAB.

isChangingAtEachRepetition = None

Flag to know if the problem is changing at each repetition or not.

isDynamic = None

Flag to know if the problem is static or not.

isMarkovian = None

Flag to know if the problem is Markovian or not.

arms = None

List of arms

means = None

Means of arms

nbArms = None

Number of arms

maxArm = None

Max mean of arms

minArm = None

Min mean of arms

new_order_of_arm(arms)[source]

Feed a new order of the arms to the environment.

• Updates means accordingly.

• Returns the new position(s) of the best arm (to count and plot BestArmPulls correctly).

Warning

This is a very limited support of non-stationary environment: only permutations of the arms are allowed, see NonStationaryMAB for more.
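Since only permutations are allowed, the effect on the means can be sketched as a pure function. `new_order_of_means` is a hypothetical helper (the library method operates on arm objects, not bare means):

```python
def new_order_of_means(means, new_order):
    """Hypothetical helper: place means[new_order[j]] at position j,
    and return the permuted means together with the new position(s)
    of the best arm."""
    new_means = [means[i] for i in new_order]
    best = max(new_means)
    positions = [j for j, m in enumerate(new_means) if m == best]
    return new_means, positions
```

For example, moving the best arm of [0.1, 0.5, 0.9] to the front reports position 0 as the new best-arm index.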

__repr__()[source]

Return repr(self).

reprarms(nbPlayers=None, openTag='', endTag='^*', latex=True)[source]

Return a str representation of the list of the arms (like repr(self.arms) but better)

• If nbPlayers > 0, it surrounds the representation of the best arms by openTag, endTag (for plot titles, in a multi-player setting).

• Example: openTag = '', endTag = '^*' for LaTeX tags to put a star exponent.

• Example: openTag = '<red>', endTag = '</red>' for HTML-like tags.

• Example: openTag = r'\textcolor{red}{', endTag = '}' for LaTeX tags.

draw(armId, t=1)[source]

Return a random sample from the armId-th arm, at time t. Usually t is not used.

draw_nparray(armId, shape=(1, ))[source]

Return a numpy array of random samples from the armId-th arm, of a certain shape.

draw_each(t=1)[source]

Return a random sample from each arm, at time t. Usually t is not used.

draw_each_nparray(shape=(1, ))[source]

Return a numpy array of random samples from each arm, of a certain shape.

Mbest(M=1)[source]

Set of M best means.

Mworst(M=1)[source]

Set of M worst means.

sumBestMeans(M=1)[source]

Sum of the M best means.
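These three helpers are simple order statistics on the vector of means. A sketch of the intended behavior (the library computes them on `self.means`; returning the selected means in increasing order is an assumption of this sketch):

```python
import numpy as np

means = np.array([0.1, 0.5, 0.9, 0.3])

def Mbest(means, M=1):
    """The M largest means, returned here in increasing order."""
    return np.sort(means)[-M:]

def Mworst(means, M=1):
    """The M smallest means, returned here in increasing order."""
    return np.sort(means)[:M]

def sumBestMeans(means, M=1):
    """Sum of the M best means."""
    return Mbest(means, M).sum()
```

With the means above, the 2 best means are 0.5 and 0.9, and their sum is 1.4.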

get_minArm(horizon=None)[source]

Return the vector of min mean of the arms.

• It is a vector of length horizon.

get_maxArm(horizon=None)[source]

Return the vector of max mean of the arms.

• It is a vector of length horizon.

get_maxArms(M=1, horizon=None)[source]

Return the vector of sum of the M-best means of the arms.

• It is a vector of length horizon.

get_allMeans(horizon=None)[source]

Return the vector of means of the arms.

• It is a numpy array of shape (nbArms, horizon).

property sparsity

Estimate the sparsity of the problem, i.e., the number of arms with positive means.

str_sparsity()[source]

Return an empty string if sparsity = nbArms, or a small string ', $s={}$' if the sparsity is strictly smaller than the number of arms.

lowerbound()[source]

Compute the constant $$C(\mu)$$, for the [Lai & Robbins] lower-bound for this MAB problem (complexity), using functions from kullback.py or kullback.so (see Arms.kullback).
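For Bernoulli arms, the Lai & Robbins constant is the sum, over suboptimal arms, of the normalized gap divided by the Kullback-Leibler divergence to the best mean. A self-contained sketch with a hand-rolled Bernoulli KL (the library uses `kullback.klBern` from Arms.kullback; the clamping constant here is an assumption):

```python
from math import log

def klBern(x, y, eps=1e-15):
    """Bernoulli KL divergence kl(x, y), with inputs clamped away from
    0 and 1 for numerical safety."""
    x = min(max(x, eps), 1 - eps)
    y = min(max(y, eps), 1 - eps)
    return x * log(x / y) + (1 - x) * log((1 - x) / (1 - y))

def lai_robbins_constant(means):
    """C(mu) = sum over suboptimal arms k of (mu* - mu_k) / kl(mu_k, mu*)."""
    best = max(means)
    return sum((best - mu) / klBern(mu, best) for mu in means if mu < best)
```

For means [0.1, 0.5, 0.9] this gives C(mu) ≈ 1.238.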

lowerbound_sparse(sparsity=None)[source]

Compute the constant $$C(\mu)$$, for [Kwon et al, 2017] lower-bound for sparse bandits for this MAB problem (complexity)

• I recomputed the sub-optimal solution to the optimization problem, and found the same as in [["Sparse Stochastic Bandits", by J. Kwon, V. Perchet & C. Vernade, COLT 2017](https://arxiv.org/abs/1706.01383)].

hoifactor()[source]

Compute the HOI factor H_OI(mu), the Optimal Arm Identification (OI) factor, for this MAB problem (complexity). Cf. (3.3) in Navikkumar MODI’s thesis, “Machine Learning and Statistical Decision Making for Green Radio” (2017).

lowerbound_multiplayers(nbPlayers=1)[source]

Compute our multi-players lower bound for this MAB problem (complexity), using functions from kullback.

upperbound_collisions(nbPlayers, times)[source]

Compute Anandkumar et al. multi-players upper bound for this MAB problem (complexity), for UCB only. Warning: it is HIGHLY asymptotic!

plotComparison_our_anandkumar(savefig=None)[source]

Plot a comparison of our lower bound and the lower bound of Anandkumar et al.

plotHistogram(horizon=10000, savefig=None, bins=50, alpha=0.9, density=None)[source]

Plot a histogram of horizon=10000 draws from each arm.

__module__ = 'Environment.MAB'
__weakref__

list of weak references to the object (if defined)

Environment.MAB.RESTED = True

Default is rested Markovian.

Environment.MAB.dict_of_transition_matrix(mat)[source]

Convert a transition matrix (list of list or numpy array) to a dictionary mapping (state, state) to probabilities (as used by pykov.Chain).

Environment.MAB.transition_matrix_of_dict(dic)[source]

Convert a dictionary mapping (state, state) to probabilities (as used by pykov.Chain) to a transition matrix (numpy array).
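A sketch of these two conversion helpers, matching the documented signatures (the actual implementations may differ, e.g., in how non-integer states are ordered; sorting the states is an assumption here):

```python
import numpy as np

def dict_of_transition_matrix(mat):
    """Convert a transition matrix (list of lists or numpy array) to a
    {(state, state): probability} dictionary, as used by pykov.Chain."""
    mat = np.asarray(mat)
    return {(i, j): mat[i, j]
            for i in range(mat.shape[0]) for j in range(mat.shape[1])}

def transition_matrix_of_dict(dic):
    """Convert a {(state, state): probability} dictionary back to a
    transition matrix (numpy array), ordering the states by sorting."""
    states = sorted({s for pair in dic for s in pair})
    index = {s: k for k, s in enumerate(states)}
    mat = np.zeros((len(states), len(states)))
    for (i, j), p in dic.items():
        mat[index[i], index[j]] = p
    return mat
```

The two functions are inverses of each other for integer-state chains, so a round trip recovers the original matrix.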

class Environment.MAB.MarkovianMAB(configuration)[source]

Classic MAB problem but the rewards are drawn from a rested/restless Markov chain.

• configuration is a dict with rested and transitions keys.

• rested is a Boolean. See [Kalathil et al., 2012](https://arxiv.org/abs/1206.3582) page 2 for a description.

• transitions is a list of K transition matrices or dictionaries (to specify non-integer states), one for each arm.

Example:

configuration = {
    "arm_type": "Markovian",
    "params": {
        "rested": True,  # or False
        # Example from [Kalathil et al., 2012](https://arxiv.org/abs/1206.3582) Table 1
        "transitions": [
            # 1st arm, either a dictionary
            {   # Mean = 0.375
                (0, 0): 0.7, (0, 1): 0.3,
                (1, 0): 0.5, (1, 1): 0.5,
            },
            # 2nd arm, or a right transition matrix
            [[0.2, 0.8], [0.6, 0.4]],  # Mean = 0.571
        ],
        # FIXME make this by default! include it in MAB.py and not in the configuration!
    }
}

__init__(configuration)[source]

New MarkovianMAB.

isChangingAtEachRepetition = None

The problem is not changing at each repetition.

isDynamic = None

The problem is static.

isMarkovian = None

The problem is Markovian.

rested = None

Whether the Markovian model is rested or not.

nbArms = None

Number of arms

means = None

Means of each arm, computed from their steady (stationary) distributions.

maxArm = None

Max mean of arms

minArm = None

Min mean of arms

states = None

States of each arm, initially they are all busy

__repr__()[source]

Return repr(self).

reprarms(nbPlayers=None, openTag='', endTag='^*', latex=True)[source]

Return a str representation of the list of the arms (like repr(self.arms) but better).

• For a Markovian MAB, the chain and the steady Bernoulli arm are represented.

• If nbPlayers > 0, it surrounds the representation of the best arms by openTag, endTag (for plot titles, in a multi-player setting).

• Example: openTag = '', endTag = '^*' for LaTeX tags to put a star exponent.

• Example: openTag = '<red>', endTag = '</red>' for HTML-like tags.

• Example: openTag = r'\textcolor{red}{', endTag = '}' for LaTeX tags.

draw(armId, t=1)[source]

Move on the Markov chain and return its state as a reward (0 or 1, or else).

• If rested Markovian, only the state of the Markov chain of arm armId changes. It is the simplest model, and the default one.

• But if restless (non-rested) Markovian, the states of the Markov chains of all arms change, not only that of armId.
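When the two states {0, 1} of a chain are used directly as rewards, the steady mean of an arm is the stationary probability of state 1, which has a closed form for a two-state chain. This small sketch verifies the means quoted in the example configuration above:

```python
def steady_state_mean(P):
    """Stationary probability of state 1 for a two-state chain on {0, 1}
    with transition matrix P: the closed form is p01 / (p01 + p10)."""
    p01, p10 = P[0][1], P[1][0]
    return p01 / (p01 + p10)

# The two example chains from [Kalathil et al., 2012] Table 1:
m1 = steady_state_mean([[0.7, 0.3], [0.5, 0.5]])  # documented mean 0.375
m2 = steady_state_mean([[0.2, 0.8], [0.6, 0.4]])  # documented mean 0.571
```

Both values agree with the means annotated in the configuration example.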

__module__ = 'Environment.MAB'
Environment.MAB.VERBOSE = False

Whether to be verbose when generating new arms for Dynamic MAB

class Environment.MAB.ChangingAtEachRepMAB(configuration, verbose=False)[source]

Like a stationary MAB problem, but the arms are (randomly) regenerated for each repetition, with the newRandomArms() method.

• M.arms and M.means are changed after each call to newRandomArms(), but not nbArms. All the other methods (Mbest, Mworst, minArm, maxArm) are carefully written to still make sense.

Warning

It works perfectly fine, but it is still experimental, be careful when using this feature.

Note

Testing bandit algorithms against randomly generated problems at each repetition is usually referred to as “Bayesian problems” in the literature: a prior is set on problems (e.g., uniform on $$[0,1]^K$$, or something less obvious, for instance if a mingap is set), and the performance is assessed against this prior. It differs from the frequentist point of view of having one fixed problem and doing e.g. n=1000 repetitions on the same problem.
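One such prior can be sketched by rejection sampling: draw means uniformly on [0, 1] and retry until every pair of (sorted) means is separated by at least mingap. `uniform_means_with_mingap` is a hypothetical helper, not the library's `newMeans` function:

```python
import random

def uniform_means_with_mingap(nbArms, mingap=0.05, seed=None):
    """Hypothetical sketch of a prior with a minimal gap: rejection-sample
    nbArms means uniformly on [0, 1] until consecutive sorted means are
    separated by at least mingap."""
    rng = random.Random(seed)
    while True:
        means = sorted(rng.random() for _ in range(nbArms))
        if all(b - a >= mingap for a, b in zip(means, means[1:])):
            return means
```

Note that rejection sampling is only practical when nbArms * mingap is well below 1, otherwise most draws are rejected.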

__init__(configuration, verbose=False)[source]

New ChangingAtEachRepMAB.

isChangingAtEachRepetition = None

The problem is changing at each repetition.

isDynamic = None

The problem is static.

isMarkovian = None

The problem is not Markovian.

newMeans = None

Function to generate the means

args = None

Args to give to function

nbArms = None

Number of arms

__repr__()[source]

Return repr(self).

reprarms(nbPlayers=None, openTag='', endTag='^*', latex=True)[source]

Cannot represent the dynamic arms, so print the ChangingAtEachRepMAB object

newRandomArms(t=None, verbose=False)[source]

Generate a new list of arms, from arm_type(params['newMeans'](*params['args'])).

property arms

Return the current list of arms.

property means

Return the list of means of arms for this ChangingAtEachRepMAB: after $$x$$ calls to newRandomArms(), the returned mean of arm $$k$$ is the average of the $$x$$ means drawn for that arm.

Warning

Highly experimental!

Mbest(M=1)[source]

Set of M best means (averaged on all the draws of new means).

Mworst(M=1)[source]

Set of M worst means (averaged on all the draws of new means).

property minArm

Return the smallest mean of the arms, for a dynamic MAB (averaged on all the draws of new means).

property maxArm

Return the largest mean of the arms, for a dynamic MAB (averaged on all the draws of new means).

lowerbound()[source]

Compute the constant C(mu), for [Lai & Robbins] lower-bound for this MAB problem (complexity), using functions from kullback (averaged on all the draws of new means).

hoifactor()[source]

Compute the HOI factor H_OI(mu), the Optimal Arm Identification (OI) factor, for this MAB problem (complexity). Cf. (3.3) in Navikkumar MODI’s thesis, “Machine Learning and Statistical Decision Making for Green Radio” (2017) (averaged on all the draws of new means).

lowerbound_multiplayers(nbPlayers=1)[source]

Compute our multi-players lower bound for this MAB problem (complexity), using functions from kullback.

__module__ = 'Environment.MAB'
class Environment.MAB.PieceWiseStationaryMAB(configuration, verbose=False)[source]

Like a stationary MAB problem, but piece-wise stationary.

Note

This is a generic class to implement one “easy” kind of non-stationary bandits: abruptly changing non-stationary bandits, where the change points are fixed and decided in advance.

Warning

It works fine, but it is still experimental, be careful when using this feature.

Warning

The number of arms is fixed, see https://github.com/SMPyBandits/SMPyBandits/issues/123 if you are curious about bandit problems with a varying number of arms (or sleeping bandits where some arms can be enabled or disabled at each time).

__init__(configuration, verbose=False)[source]

New PieceWiseStationaryMAB.

isChangingAtEachRepetition = None

The problem is not changing at each repetition.

isDynamic = None

The problem is dynamic.

isMarkovian = None

The problem is not Markovian.

listOfMeans = None

The list of means

nbArms = None

Number of arms

changePoints = None

List of the change points

__repr__()[source]

Return repr(self).

reprarms(nbPlayers=None, openTag='', endTag='^*', latex=True)[source]

Cannot represent the dynamic arms, so print the PieceWiseStationaryMAB object

newRandomArms(t=None, onlyOneArm=None, verbose=False)[source]

Fake function: there is nothing random here; it just tells the piece-wise stationary MAB problem to possibly switch to the next interval.

plotHistoryOfMeans(horizon=None, savefig=None, forceTo01=False, showplot=True, pickleit=False)[source]

Plot the history of means, as a plot with x axis being the time, y axis the mean rewards, and K curves one for each arm.

property arms

Return the current list of arms: at time $$t$$, the mean of arm $$k$$ is its mean during the time interval containing $$t$$.

property means

Return the list of means of arms for this PieceWiseStationaryMAB: at time $$t$$, the mean of arm $$k$$ is its mean during the time interval containing $$t$$.

property minArm

Return the smallest mean of the arms, for the current vector of means.

property maxArm

Return the largest mean of the arms, for the current vector of means.

get_minArm(horizon=None)[source]

Return the vector of min mean of the arms, for a piece-wise stationary MAB.

• It is a vector of length horizon.

get_minArms(M=1, horizon=None)[source]

Return the vector of sum of the M-worst means of the arms, for a piece-wise stationary MAB.

• It is a vector of length horizon.

get_maxArm(horizon=None)[source]

Return the vector of max mean of the arms, for a piece-wise stationary MAB.

• It is a vector of length horizon.

get_maxArms(M=1, horizon=None)[source]

Return the vector of sum of the M-best means of the arms, for a piece-wise stationary MAB.

• It is a vector of length horizon.

get_allMeans(horizon=None)[source]

Return the vector of mean of the arms, for a piece-wise stationary MAB.

• It is a numpy array of shape (nbArms, horizon).
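The (nbArms, horizon) array can be built by locating, for each time step, the interval containing it. A sketch under the assumption that changePoints[i] is the first time step of the i-th interval and changePoints[0] = 0 (the library may index the change points differently):

```python
import numpy as np

def get_all_means(listOfMeans, changePoints, horizon):
    """Sketch of get_allMeans for a piece-wise stationary problem:
    returns the means as a numpy array of shape (nbArms, horizon)."""
    listOfMeans = np.asarray(listOfMeans)  # shape (nbIntervals, nbArms)
    times = np.arange(horizon)
    # Index of the interval containing each time step
    intervals = np.searchsorted(changePoints, times, side='right') - 1
    return listOfMeans[intervals].T        # shape (nbArms, horizon)
```

For two arms whose means swap at t=3, the rows of the resulting array change value exactly at that change point.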

__module__ = 'Environment.MAB'
class Environment.MAB.NonStationaryMAB(configuration, verbose=False)[source]

Like a stationary MAB problem, but the arms can be modified at each time step, with the newRandomArms() method.

• M.arms and M.means are changed after each call to newRandomArms(), but not nbArms. All the other methods (Mbest, Mworst, minArm, maxArm) are carefully written to still make sense.

Note

This is a generic class to implement different kinds of non-stationary bandits:

• Abruptly changing non-stationary bandits, in different variants: change points are randomly drawn (once for all n repetitions, or at a different location for each repetition).

• Slowly varying non-stationary bandits, where the underlying mean of each arm is slowly and randomly modified, and a bound on the speed of change (e.g., a Lipschitz constant of $$t \mapsto \mu_i(t)$$) is known.

Warning

It works fine, but it is still experimental, be careful when using this feature.

Warning

The number of arms is fixed, see https://github.com/SMPyBandits/SMPyBandits/issues/123 if you are curious about bandit problems with a varying number of arms (or sleeping bandits where some arms can be enabled or disabled at each time).

__init__(configuration, verbose=False)[source]

New NonStationaryMAB.

isChangingAtEachRepetition = None

The problem is not changing at each repetition.

isDynamic = None

The problem is dynamic.

isMarkovian = None

The problem is not Markovian.

newMeans = None

Function to generate the means

changePoints = None

List of the change points

onlyOneArm = None

None by default, but can be “uniform” to only change one arm at each change point.

args = None

Args to give to function

nbArms = None

Number of arms

reprarms(nbPlayers=None, openTag='', endTag='^*', latex=True)[source]

Cannot represent the dynamic arms, so print the NonStationaryMAB object

newRandomArms(t=None, onlyOneArm=None, verbose=False)[source]

Generate a new list of arms, from arm_type(params['newMeans'](t, **params['args'])).

• If onlyOneArm is given and is an integer, the change of mean only occurs for this arm and the others stay the same.

• If onlyOneArm="uniform", the change of mean only occurs for one arm and the others stay the same, and the changing arm is chosen uniformly at random.

Note

Only the means of the arms change (and so, their order), not their family.

Warning

TODO? So far the only change points we consider are when the means of the arms change, while the family of distributions stays the same. A more generic way could be implemented, for instance to be able to test algorithms that detect changes between different families of distributions (e.g., from a Gaussian of variance 1 to a Gaussian of variance 2, with equal or different means).
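The onlyOneArm="uniform" behavior can be sketched as follows. `change_one_arm` is a hypothetical helper, and drawing the new mean uniformly on [0, 1] stands in for calling params['newMeans']:

```python
import random

def change_one_arm(means, rng):
    """Sketch of onlyOneArm="uniform": at a change point, pick one arm
    uniformly at random and redraw only its mean (here uniformly on
    [0, 1]); all other means are kept unchanged."""
    means = list(means)
    k = rng.randrange(len(means))  # changing arm, chosen uniformly
    means[k] = rng.random()        # stand-in for params['newMeans']
    return means, k
```

Only position k of the returned vector can differ from the input.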

get_minArm(horizon=None)[source]

Return the vector of min mean of the arms, for a non-stationary MAB.

• It is a vector of length horizon.

get_maxArm(horizon=None)[source]

Return the vector of max mean of the arms, for a non-stationary MAB.

• It is a vector of length horizon.

get_allMeans(horizon=None)[source]

Return the vector of mean of the arms, for a non-stationary MAB.

• It is a numpy array of shape (nbArms, horizon).

__module__ = 'Environment.MAB'
Environment.MAB.static_change_lower_amplitude(t, l_t, a_t)[source]

A function called by IncreasingMAB at every time t, to compute the (possibly) new values for $$l_t$$ and $$a_t$$.

• The first returned value is a boolean: True if a change occurred, False otherwise.

Environment.MAB.L0 = -1

Default value for the doubling_change_lower_amplitude() function.

Environment.MAB.A0 = 2

Default value for the doubling_change_lower_amplitude() function.

Environment.MAB.DELTA = 0

Default value for the doubling_change_lower_amplitude() function.

Environment.MAB.T0 = -1

Default value for the doubling_change_lower_amplitude() function.

Environment.MAB.DELTA_T = -1

Default value for the doubling_change_lower_amplitude() function.

Environment.MAB.ZOOM = 2

Default value for the doubling_change_lower_amplitude() function.

Environment.MAB.doubling_change_lower_amplitude(t, l_t, a_t, l0=-1, a0=2, delta=0, T0=-1, deltaT=-1, zoom=2)[source]

A function called by IncreasingMAB at every time t, to compute the (possibly) new values for $$l_t$$ and $$a_t$$.

• At time 0, it forces the use of $$l_0, a_0$$ if they are given and not None.

• At step T0, it reduces $$l_t$$ by delta (typically from 0 to -1).

• Every deltaT steps, it multiplies both $$l_t$$ and $$a_t$$ by zoom.

• The first returned value is a boolean: True if a change occurred, False otherwise.
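The schedule described by the bullets above can be sketched as a function matching the documented signature. The order in which the three cases are checked is an assumption of this sketch, not something the documentation specifies:

```python
def doubling_change_lower_amplitude(t, l_t, a_t, l0=-1, a0=2, delta=0,
                                    T0=-1, deltaT=-1, zoom=2):
    """Sketch of the schedule: returns (changed, new_l, new_a)."""
    if t == 0 and l0 is not None and a0 is not None:
        return True, l0, a0                   # force initial values at time 0
    if T0 >= 0 and t == T0:
        return True, l_t - delta, a_t         # one-time reduction of l_t at T0
    if deltaT > 0 and t % deltaT == 0:
        return True, l_t * zoom, a_t * zoom   # periodic zoom every deltaT steps
    return False, l_t, a_t                    # no change at this step
```

With the defaults (T0=-1, deltaT=-1), only the time-0 case ever fires, so the range stays fixed after initialization.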

Environment.MAB.default_change_lower_amplitude(t, l_t, a_t, l0=-1, a0=2, delta=0, T0=-1, deltaT=-1, zoom=2)

A function called by IncreasingMAB at every time t, to compute the (possibly) new values for $$l_t$$ and $$a_t$$.

• At time 0, it forces the use of $$l_0, a_0$$ if they are given and not None.

• At step T0, it reduces $$l_t$$ by delta (typically from 0 to -1).

• Every deltaT steps, it multiplies both $$l_t$$ and $$a_t$$ by zoom.

• The first returned value is a boolean: True if a change occurred, False otherwise.

class Environment.MAB.IncreasingMAB(configuration)[source]

Like a stationary MAB problem, but the range of the rewards is increased from time to time, to test the Policy.WrapRange policy.

• M.arms and M.means are NOT changed after each call to newRandomArms(), and neither is nbArms.

Warning

It is purely experimental, be careful when using this feature.

__module__ = 'Environment.MAB'
__init__(configuration)[source]

New MAB.

isDynamic = None

Flag to know if the problem is static or not.

draw(armId, t=1)[source]

Return a random sample from the armId-th arm, at time t. Usually t is not used.

Environment.MAB.binomialCoefficient(k, n)[source]

Compute a binomial coefficient $$C^n_k$$ by a direct multiplicative method: $$C^n_k = {n \choose k}$$.

>>> binomialCoefficient(-3, 10)
0
>>> binomialCoefficient(1, -10)
0
>>> binomialCoefficient(1, 10)
10
>>> binomialCoefficient(5, 10)
80
>>> binomialCoefficient(5, 20)
12960
>>> binomialCoefficient(10, 30)
10886400