configuration_multiplayers_nonstationary module

Configuration for the simulations, for the piecewise stationary multi-players case.

configuration_multiplayers_nonstationary.HORIZON = 1000

HORIZON : number of time steps of the experiments. Warning: should be >= 10000 to be interesting “asymptotically”.

configuration_multiplayers_nonstationary.REPETITIONS = 200

REPETITIONS : number of repetitions of the experiments. Warning: should be >= 10 to be statistically trustworthy.

configuration_multiplayers_nonstationary.DO_PARALLEL = True

Whether to use parallel computing. To profile the code, turn off parallel computing (set to False).

configuration_multiplayers_nonstationary.N_JOBS = -1

Number of jobs to use for the parallel computations. -1 means all the CPU cores, 1 means no parallelization.
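As a minimal sketch of how these two settings could combine, the helper below resolves an N_JOBS value into a number of worker processes. The function name effective_n_jobs is hypothetical and not part of this module, and the handling of negative values other than -1 follows the joblib convention, which is an assumption here.

```python
import os


def effective_n_jobs(n_jobs, do_parallel=True):
    """Hypothetical helper: number of worker processes implied by the settings.

    Assumes the joblib convention for negative values:
    -1 means all CPU cores, -2 means all cores but one, and so on.
    """
    if n_jobs == 0:
        raise ValueError("n_jobs == 0 has no meaning")
    if not do_parallel or n_jobs == 1:
        # No parallelization: a single sequential worker.
        return 1
    n_cores = os.cpu_count() or 1
    if n_jobs < 0:
        return max(1, n_cores + 1 + n_jobs)
    return n_jobs


print(effective_n_jobs(-1))                     # all available cores
print(effective_n_jobs(-1, do_parallel=False))  # 1, useful when profiling
```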

configuration_multiplayers_nonstationary.NB_PLAYERS = 3

NB_PLAYERS : number of players for the game. Should be >= 2 and <= number of arms.

configuration_multiplayers_nonstationary.collisionModel(t, arms, players, choices, rewards, pulls, collisions)

The best collision model: none of the colliding users gets any reward.
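The rule can be illustrated with a simplified sketch. The function below is hypothetical: it does not use the real signature (t, arms, players, choices, rewards, pulls, collisions) and only shows the idea that a player is rewarded only when alone on its arm.

```python
from collections import Counter


def only_unique_user_gets_reward(choices, sampled_rewards):
    """Simplified illustration, not the module's function.

    choices[j] is the arm chosen by player j at this time step;
    sampled_rewards[k] is the reward drawn from arm k at this time step.
    Returns the reward actually received by each player.
    """
    counts = Counter(choices)
    return [
        sampled_rewards[arm] if counts[arm] == 1 else 0.0
        for arm in choices
    ]


# Players 1 and 2 collide on arm 2, so only player 0 is rewarded.
received = only_unique_user_gets_reward([0, 2, 2], [1.0, 0.0, 1.0])
print(received)
```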

configuration_multiplayers_nonstationary.VARIANCE = 0.05

Variance of Gaussian arms

configuration_multiplayers_nonstationary.CACHE_REWARDS = False

Should we cache rewards? If so, the random rewards will be the same across all the REPETITIONS simulations for each algorithm.

configuration_multiplayers_nonstationary.NB_ARMS = 6

Number of arms for non-hard-coded problems (Bayesian problems)

configuration_multiplayers_nonstationary.LOWER = 0.0

Default lower bound for the means of the arms.

configuration_multiplayers_nonstationary.AMPLITUDE = 1.0

Default amplitude for the means of the arms.

configuration_multiplayers_nonstationary.ARM_TYPE

alias of Arms.Bernoulli.Bernoulli

configuration_multiplayers_nonstationary.ENVIRONMENT_BAYESIAN = False

Set to True to use a Bayesian problem.

configuration_multiplayers_nonstationary.MEANS = [0.05, 0.22999999999999998, 0.41, 0.5900000000000001, 0.77, 0.95]

Means of arms for non-hard-coded problems (non Bayesian)
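The listed values are consistent with NB_ARMS evenly spaced means inside the interval defined by LOWER and AMPLITUDE, kept away from both ends by a margin of 0.05. The sketch below reproduces them with NumPy; the MARGIN name and value are assumptions inferred from the numbers shown, not taken from the module.

```python
import numpy as np

LOWER, AMPLITUDE, NB_ARMS = 0.0, 1.0, 6
MARGIN = 0.05  # assumption: inferred from the listed values of MEANS

# Evenly spaced means in [LOWER + MARGIN, LOWER + AMPLITUDE - MARGIN].
means = np.linspace(LOWER + MARGIN, LOWER + AMPLITUDE - MARGIN, NB_ARMS)
print(means)
```

The trailing digits in values such as 0.22999999999999998 are ordinary floating-point rounding, not distinct settings.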

configuration_multiplayers_nonstationary.configuration = {'averageOn': 0.001, 'collisionModel': <function onlyUniqUserGetsReward>, 'environment': [{'arm_type': <class 'Arms.Bernoulli.Bernoulli'>, 'params': {'listOfMeans': [[0.3, 0.5, 0.9], [0.3, 0.2, 0.9], [0.3, 0.2, 0.1], [0.7, 0.2, 0.1], [0.7, 0.5, 0.1]], 'changePoints': [0, 200, 400, 600, 800]}}, {'arm_type': <class 'Arms.Bernoulli.Bernoulli'>, 'params': {'listOfMeans': [[0.4, 0.5, 0.9], [0.5, 0.4, 0.7], [0.6, 0.3, 0.5], [0.7, 0.2, 0.3], [0.8, 0.1, 0.1]], 'changePoints': [0, 200, 400, 600, 800]}}], 'finalRanksOnAverage': True, 'horizon': 1000, 'n_jobs': -1, 'nb_break_points': 4, 'players': [Selfish(GLR-UCB(Local, Localization)), Selfish(GLR-UCB(Local, Localization)), Selfish(GLR-UCB(Local, Localization))], 'plot_lowerbounds': True, 'repetitions': 200, 'successive_players': [[rhoRand(kl-UCB), rhoRand(kl-UCB), rhoRand(kl-UCB)], [rhoRand(GLR-klUCB_forGLR(Local, Localization, $\Delta n=20$, $\Delta s=20$)), rhoRand(GLR-klUCB_forGLR(Local, Localization, $\Delta n=20$, $\Delta s=20$)), rhoRand(GLR-klUCB_forGLR(Local, Localization, $\Delta n=20$, $\Delta s=20$))], [RandTopM(kl-UCB), RandTopM(kl-UCB), RandTopM(kl-UCB)], [RandTopM(GLR-klUCB_forGLR(Local, Localization, $\Delta n=20$, $\Delta s=20$)), RandTopM(GLR-klUCB_forGLR(Local, Localization, $\Delta n=20$, $\Delta s=20$)), RandTopM(GLR-klUCB_forGLR(Local, Localization, $\Delta n=20$, $\Delta s=20$))], [MCTopM(kl-UCB), MCTopM(kl-UCB), MCTopM(kl-UCB)], [MCTopM(GLR-klUCB_forGLR(Local, Localization, $\Delta n=20$, $\Delta s=20$)), MCTopM(GLR-klUCB_forGLR(Local, Localization, $\Delta n=20$, $\Delta s=20$)), MCTopM(GLR-klUCB_forGLR(Local, Localization, $\Delta n=20$, $\Delta s=20$))], [Selfish(Thompson Sampling), Selfish(Thompson Sampling), Selfish(Thompson Sampling)], [Selfish(kl-UCB), Selfish(kl-UCB), Selfish(kl-UCB)], [Selfish(Oracle-klUCB), Selfish(Oracle-klUCB), Selfish(Oracle-klUCB)], [Selfish(DiscountedThompson($\gamma=0.99$)), 
Selfish(DiscountedThompson($\gamma=0.99$)), Selfish(DiscountedThompson($\gamma=0.99$))], [Selfish(M-klUCB($w=60$, Global)), Selfish(M-klUCB($w=60$, Global)), Selfish(M-klUCB($w=60$, Global))], [Selfish(CUSUM-klUCB(Localization, lazy detect 20)), Selfish(CUSUM-klUCB(Localization, lazy detect 20)), Selfish(CUSUM-klUCB(Localization, lazy detect 20))], [Selfish(GLR-klUCB_forGLR(Local, Localization, $\Delta n=20$, $\Delta s=20$)), Selfish(GLR-klUCB_forGLR(Local, Localization, $\Delta n=20$, $\Delta s=20$)), Selfish(GLR-klUCB_forGLR(Local, Localization, $\Delta n=20$, $\Delta s=20$))], [CentralizedMultiplePlay(kl-UCB), CentralizedMultiplePlay(kl-UCB), CentralizedMultiplePlay(kl-UCB)], [CentralizedMultiplePlay(Oracle-klUCB), CentralizedMultiplePlay(Oracle-klUCB), CentralizedMultiplePlay(Oracle-klUCB)], [CentralizedMultiplePlay(DiscountedThompson($\gamma=0.99$)), CentralizedMultiplePlay(DiscountedThompson($\gamma=0.99$)), CentralizedMultiplePlay(DiscountedThompson($\gamma=0.99$))], [CentralizedMultiplePlay(M-klUCB($w=60$, Global)), CentralizedMultiplePlay(M-klUCB($w=60$, Global)), CentralizedMultiplePlay(M-klUCB($w=60$, Global))], [CentralizedMultiplePlay(CUSUM-klUCB(Localization, lazy detect 20)), CentralizedMultiplePlay(CUSUM-klUCB(Localization, lazy detect 20)), CentralizedMultiplePlay(CUSUM-klUCB(Localization, lazy detect 20))], [CentralizedMultiplePlay(GLR-klUCB_forGLR(Local, Localization, $\Delta n=20$, $\Delta s=20$)), CentralizedMultiplePlay(GLR-klUCB_forGLR(Local, Localization, $\Delta n=20$, $\Delta s=20$)), CentralizedMultiplePlay(GLR-klUCB_forGLR(Local, Localization, $\Delta n=20$, $\Delta s=20$))], [<Policies.MusicalChair.MusicalChair object>, <Policies.MusicalChair.MusicalChair object>, <Policies.MusicalChair.MusicalChair object>], [<Policies.MusicalChair.MusicalChair object>, <Policies.MusicalChair.MusicalChair object>, <Policies.MusicalChair.MusicalChair object>], [<Policies.MusicalChair.MusicalChair object>, <Policies.MusicalChair.MusicalChair object>, 
<Policies.MusicalChair.MusicalChair object>], [<Policies.SIC_MMAB.SIC_MMAB_klUCB object>, <Policies.SIC_MMAB.SIC_MMAB_klUCB object>, <Policies.SIC_MMAB.SIC_MMAB_klUCB object>]], 'verbosity': 6}

This dictionary configures the experiments

configuration_multiplayers_nonstationary.NB_BREAK_POINTS = 4

Number of true breakpoints. They are uniformly spaced in time steps (and the first one at t=0 does not count).
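With HORIZON = 1000 and NB_BREAK_POINTS = 4, uniform spacing gives the changePoints list that appears in the configuration dictionary. The sketch below recomputes that list and looks up the active vector of means at a given time step. The helper means_at is hypothetical, and it assumes that a change point t is the first step at which the new means apply.

```python
from bisect import bisect_right

HORIZON = 1000
NB_BREAK_POINTS = 4

# The first change point (t=0) starts the first segment and is not
# counted as a breakpoint, hence NB_BREAK_POINTS + 1 segments.
nb_segments = NB_BREAK_POINTS + 1
change_points = [k * HORIZON // nb_segments for k in range(nb_segments)]

# Means of the first environment, copied from the configuration dictionary.
LIST_OF_MEANS = [
    [0.3, 0.5, 0.9],
    [0.3, 0.2, 0.9],
    [0.3, 0.2, 0.1],
    [0.7, 0.2, 0.1],
    [0.7, 0.5, 0.1],
]


def means_at(t, change_points, list_of_means):
    """Hypothetical helper: means in force at time step t."""
    segment = bisect_right(change_points, t) - 1
    return list_of_means[segment]


print(change_points)
print(means_at(450, change_points, LIST_OF_MEANS))
```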

configuration_multiplayers_nonstationary.nbArms = 3

Number of arms in the first environment

configuration_multiplayers_nonstationary.klucb

Warning: if using Exponential or Gaussian arms, give klExp or klGauss to KL-UCB-like policies!

configuration_multiplayers_nonstationary.WINDOW_SIZE = 60

Default window size \(w\) for the M-UCB and SW-UCB algorithms.
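To illustrate what a window of size \(w\) means, the sketch below keeps only the last w rewards of one arm and averages them. It is neither the change-detection test of M-UCB nor the index of SW-UCB; the class name SlidingWindowMean is hypothetical.

```python
from collections import deque

WINDOW_SIZE = 60


class SlidingWindowMean:
    """Empirical mean over the last `window_size` rewards (illustration only)."""

    def __init__(self, window_size=WINDOW_SIZE):
        # A bounded deque silently drops the oldest reward when full.
        self.rewards = deque(maxlen=window_size)

    def update(self, reward):
        self.rewards.append(reward)

    def mean(self):
        if not self.rewards:
            return 0.0
        return sum(self.rewards) / len(self.rewards)


# With a window of 3, the two early zeros are forgotten after a change.
estimator = SlidingWindowMean(window_size=3)
for reward in [0, 0, 1, 1, 1]:
    estimator.update(reward)
print(estimator.mean())
```

A smaller window forgets outdated rewards faster after a breakpoint, at the cost of a noisier estimate between breakpoints.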