Short documentation of the API

This short document aims at documenting the API used in my SMPyBandits environment, and at closing issue #3.
Code organization

Layout of the code:

- Arms are defined in this folder (`Arms/`), see for example `Arms.Bernoulli`.
- MAB algorithms (also called policies) are defined in this folder (`Policies/`), see for example `Policies.Dummy` for a fully random policy, `Policies.EpsilonGreedy` for the epsilon-greedy random policy, `Policies.UCB` for the “simple” UCB algorithm, or also `Policies.BayesUCB` and `Policies.klUCB` for two UCB-like algorithms, `Policies.AdBandits` for the AdBandits algorithm, and `Policies.Aggregator` for my aggregated bandits algorithms.
- Environments to encapsulate data are defined in this folder (`Environment/`): a MAB problem uses the class `Environment.MAB`, simulation results are stored in an `Environment.Result`, and the class to evaluate several policies, for a single player, on several environments, is `Environment.Evaluator`.
- `very_simple_configuration.py` imports all the classes, and defines the simulation parameters as a dictionary (JSON-like).
- `main.py` runs the simulations, then displays the final ranking of the different policies and plots the results (saved to this folder (`plots/`)).
UML diagrams
For more details, see these UML diagrams.
Question: How to change the simulations?

To customize the plots

Change the default settings defined in `Environment/plotsettings.py`.

To change the configuration of the simulations

Change the config file, i.e., `configuration.py` for single-player simulations, or `configuration_multiplayers.py` for multi-players simulations. A good example of a very simple configuration file is given in `very_simple_configuration.py`.
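For illustration, here is a minimal sketch of what such a configuration dictionary can look like. The exact keys used below ("horizon", "repetitions", "environment", "arm_type", "archtype", "params") follow the style of `very_simple_configuration.py`, but treat them as an assumption and check that file for the authoritative format:

    # Hypothetical minimal configuration, in the style of very_simple_configuration.py
    from Arms import Bernoulli
    from Policies import EpsilonGreedy, UCB

    configuration = {
        "horizon": 10000,     # Time horizon of each simulation
        "repetitions": 10,    # Number of repetitions, to average the results
        "environment": [      # One MAB problem, with three Bernoulli arms
            {"arm_type": Bernoulli, "params": [0.1, 0.5, 0.9]},
        ],
        "policies": [         # The policies to compare on this problem
            {"archtype": UCB, "params": {}},
            {"archtype": EpsilonGreedy, "params": {"epsilon": 0.1}},
        ],
    }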
To change how the results are exploited

Change the main script, i.e., `main.py` for single-player simulations, or `main_multiplayers.py` for multi-players simulations. Some plots can be disabled or enabled by commenting a few lines, and some options are given as flags (constants at the beginning of the file).

If needed, change, improve or add some methods to the simulation environment class, i.e., `Environment.Evaluator` for single-player simulations, and `Environment.EvaluatorMultiPlayers` for multi-players simulations. They use a class to store their simulation results: `Environment.Result` and `Environment.ResultMultiPlayers`, respectively.
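To make the pipeline concrete, here is a rough sketch of what `main.py` essentially does with these classes. The method names used below (`startAllEnv`, `plotRegrets`) and the `envs` attribute are assumptions, not a documented contract; refer to `Environment.Evaluator` for the actual interface:

    # Hypothetical sketch of the single-player pipeline driven by main.py
    from Environment import Evaluator
    from configuration import configuration

    evaluation = Evaluator(configuration)  # Build the environments and the policies
    evaluation.startAllEnv()               # Run every simulation, on every MAB problem
    for envId in range(len(evaluation.envs)):
        evaluation.plotRegrets(envId)      # One regret plot per MAB problem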
Question: How to add something to this project?
In other words, what’s the API of this project?
For a new arm

- Make a new file, e.g., `MyArm.py`.
- Save it in `Arms/`.
- The file should contain a class of the same name, inheriting from `Arms.Arm`, e.g., like this: `class MyArm(Arm): ...` (no need for any `super` call).
- This class `MyArm` has to have at least: an `__init__(...)` method to create the arm object (with or without arguments, named or not); a `__str__` method to print it as a string; a `draw(t)` method to draw a reward from this arm (`t` is the time, which can be used or not); and it should have a `mean()` method that gives/computes the mean of the arm.
- Finally, add it to the `Arms/__init__.py` file: `from .MyArm import MyArm`.

For examples, see `Arms.Bernoulli`, `Arms.Gaussian`, `Arms.Exponential`, `Arms.Poisson`.
For example, use this template:

    from .Arm import Arm

    class MyArm(Arm):
        def __init__(self, *args, **kwargs):
            pass  # TODO Finish this method that initializes the arm MyArm

        def __str__(self):
            return "MyArm({})".format('...')  # TODO

        def draw(self, t=None):
            pass  # TODO Simulate a pull of this arm; t might be used, but not necessarily

        def mean(self):
            pass  # TODO Return the mean of this arm
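As a concrete (purely illustrative) instance of this template, a Bernoulli-like arm could be written as below. The class name `MyBernoulli` and its parameter `p` are hypothetical; see `Arms.Bernoulli` for the real implementation:

    from random import random

    from .Arm import Arm

    class MyBernoulli(Arm):
        """Hypothetical arm: reward 1 with probability p, reward 0 otherwise."""

        def __init__(self, p=0.5):
            assert 0 <= p <= 1, "p has to be a probability"
            self.p = p

        def __str__(self):
            return "MyBernoulli({})".format(self.p)

        def draw(self, t=None):
            # The time t is ignored: this arm is stationary
            return 1 if random() < self.p else 0

        def mean(self):
            return self.p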
For a new (single-user) policy

- Make a new file, e.g., `MyPolicy.py`.
- Save it in `Policies/`.
- The file should contain a class of the same name. It can inherit from `Policies.IndexPolicy` if it is a simple index policy, e.g., like this: `class MyPolicy(IndexPolicy): ...` (no need for any `super` call), or simply like `class MyPolicy(object): ...`.
- This class `MyPolicy` has to have at least: an `__init__(nbArms, ...)` method to create the policy object (with or without arguments, named or not), with at least the parameter `nbArms` (the number of arms); a `__str__` method to print it as a string; a `choice()` method to choose an arm (an index among `0, ..., nbArms - 1`, e.g., chosen at random, or based on a maximum index if it is an index policy); a `getReward(arm, reward)` method, called when the arm `arm` gave the reward `reward`; and finally a `startGame()` method (possibly empty), which is called when a new simulation is run.
- Optionally, a policy class can have a `handleCollision(arm)` method to handle a collision after choosing the arm `arm` (e.g., update an internal index, change a fixed offset, etc.).
- Finally, add it to the `Policies/__init__.py` file: `from .MyPolicy import MyPolicy`.

For examples, see `Policies.Uniform` for a fully randomized policy, `Policies.EpsilonGreedy` for a simple exploratory policy, `Policies.Softmax` for another simple approach, and `Policies.UCB` for the classical Upper Confidence Bounds policy based on indexes (so inheriting from `Policies.IndexPolicy`). There are also `Policies.Thompson` and `Policies.BayesUCB` for Bayesian policies (using a posterior, e.g., a Beta posterior), and `Policies.klUCB` for a policy based on the Kullback-Leibler divergence. For less classical approaches, `Policies.AdBandits` combines the Bayesian and frequentist points of view, and `Policies.Aggregator` is my aggregating policy.
For example, use this template:

    import random

    class MyPolicy(object):
        def __init__(self, nbArms, *args, **kwargs):
            self.nbArms = nbArms
            # TODO Finish this method that initializes the policy MyPolicy

        def __str__(self):
            return "MyPolicy({})".format('...')  # TODO

        def startGame(self):
            pass  # Can be non-trivial, TODO if needed

        def getReward(self, arm, reward):
            # TODO After the arm 'arm' has been pulled, it gave the reward 'reward'
            pass  # Can be non-trivial, TODO if needed

        def choice(self):
            # TODO Do a smart choice of arm
            return random.randint(0, self.nbArms - 1)

        def handleCollision(self, arm):
            pass  # Can be non-trivial, TODO if needed

Other `choice...()` methods can be added, if this policy `MyPolicy` has to be used for multiple plays, ranked plays, etc.
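As a concrete (again purely illustrative) instance of this interface, here is a minimal epsilon-greedy-like policy. The name `MyEpsilonGreedy` and its parameter `epsilon` are hypothetical; see `Policies.EpsilonGreedy` for the real one:

    import random

    class MyEpsilonGreedy(object):
        """Hypothetical policy: explore uniformly with probability epsilon, else exploit."""

        def __init__(self, nbArms, epsilon=0.1):
            self.nbArms = nbArms
            self.epsilon = epsilon
            self.startGame()

        def __str__(self):
            return "MyEpsilonGreedy({})".format(self.epsilon)

        def startGame(self):
            self.rewards = [0.0] * self.nbArms  # Sum of rewards, for each arm
            self.pulls = [0] * self.nbArms      # Number of pulls, for each arm

        def getReward(self, arm, reward):
            self.rewards[arm] += reward
            self.pulls[arm] += 1

        def choice(self):
            if random.random() < self.epsilon:  # Explore, with probability epsilon
                return random.randint(0, self.nbArms - 1)
            # Exploit: the arm with the best empirical mean (unpulled arms count as 0)
            return max(range(self.nbArms),
                       key=lambda a: self.rewards[a] / max(1, self.pulls[a]))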
For a new multi-players policy

- Make a new file, e.g., `MyPoliciesMultiPlayers.py`.
- Save it in `PoliciesMultiPlayers/`.
- The file should contain a class of the same name, e.g., like this: `class MyPoliciesMultiPlayers(object):`.
- This class `MyPoliciesMultiPlayers` has to have at least an `__init__` method to create the object; a `__str__` method to print it as a string; and a `children` attribute that gives a list of players (single-player policies).
- Finally, add it to the `PoliciesMultiPlayers/__init__.py` file: `from .MyPoliciesMultiPlayers import MyPoliciesMultiPlayers`.

For examples, see `PoliciesMultiPlayers.OracleNotFair` and `PoliciesMultiPlayers.OracleFair` for full-knowledge centralized policies (fair or not), and `PoliciesMultiPlayers.CentralizedFixed` and `PoliciesMultiPlayers.CentralizedCycling` for non-full-knowledge centralized policies (fair or not). There is also the `PoliciesMultiPlayers.Selfish` decentralized policy, where all players run without any knowledge of the number of players and without any communication. `PoliciesMultiPlayers.Selfish` is the simplest possible example I could give as a template.
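For intuition, a minimal (hypothetical) wrapper in the spirit of `PoliciesMultiPlayers.Selfish` could look like the sketch below: it creates one independent single-player policy per player and exposes them through the required `children` attribute. This is a simplification; see the real `Selfish` class for the details it actually handles:

    class MySelfish(object):
        """Hypothetical multi-players policy: every player runs its own policy, selfishly."""

        def __init__(self, nbPlayers, playerPolicy, nbArms, *args, **kwargs):
            self.nbPlayers = nbPlayers
            # The required 'children' attribute: one single-player policy per player
            self.children = [playerPolicy(nbArms, *args, **kwargs)
                             for _ in range(nbPlayers)]

        def __str__(self):
            return "MySelfish({} x {})".format(self.nbPlayers, self.children[0])

For instance, `MySelfish(3, UCB, 10)` would let 3 independent UCB players compete on 10 arms.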