configuration module¶

Configuration for the simulations, for the single-player case.

configuration.CPU_COUNT = 4¶: Number of CPUs on the local machine.

configuration.HORIZON = 10000¶: Number of time steps of the experiments. Warning: should be >= 10000 to be interesting “asymptotically”.

configuration.DO_PARALLEL = True¶: Whether to use parallel computing. To profile the code, turn it off (set it to False).

configuration.N_JOBS = -1¶: Number of jobs to use for the parallel computations. -1 means all the CPU cores, 1 means no parallelization.
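As a rough illustration of how DO_PARALLEL and N_JOBS might relate, here is a minimal sketch. The helper `effective_jobs` is hypothetical and not part of this module, and the link `N_JOBS = -1 if DO_PARALLEL else 1` is an assumption; the sketch only mirrors the convention stated above (-1 means all the CPU cores, 1 means no parallelization).

```python
import os


def effective_jobs(n_jobs, cpu_count=None):
    """Hypothetical helper: number of workers implied by an N_JOBS value."""
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1  # fall back to 1 if undetectable
    if n_jobs == -1:
        return cpu_count  # -1 means "use all the CPU cores"
    return max(1, n_jobs)  # 1 means no parallelization


DO_PARALLEL = True
# Assumed link between the two settings, for illustration only.
N_JOBS = -1 if DO_PARALLEL else 1
print(effective_jobs(N_JOBS, cpu_count=4))
```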

configuration.REPETITIONS = 4¶: Number of repetitions of the experiments. Warning: should be >= 10 to be statistically trustworthy.

configuration.RANDOM_SHUFFLE = False¶: If True, the arms are shuffled (shuffle(arms)). With this value, False, they are not.

configuration.RANDOM_INVERT = False¶: If True, the arms are inverted (arms = arms[::-1]). With this value, False, they are not.
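The effect of RANDOM_SHUFFLE and RANDOM_INVERT can be sketched as follows. The function `order_arms` and its `seed` parameter are hypothetical names used for illustration; only `shuffle(arms)` and `arms[::-1]` come from the descriptions above.

```python
import random


def order_arms(arms, random_shuffle=False, random_invert=False, seed=None):
    """Hypothetical helper: reorder a list of arms according to the two flags."""
    arms = list(arms)  # work on a copy, leave the input untouched
    if random_shuffle:
        random.Random(seed).shuffle(arms)  # random permutation of the arms
    if random_invert:
        arms = arms[::-1]  # reverse the order of the arms
    return arms


print(order_arms([0.1, 0.5, 0.9], random_invert=True))
```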

configuration.NB_BREAK_POINTS = 0¶: Number of true breakpoints. They are uniformly spaced in time steps (and the first one at t=0 does not count).
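One possible reading of "uniformly spaced in time steps" is sketched below. The function `breakpoint_times` is hypothetical, and the exact positions used by the library may differ; the sketch assumes the horizon is cut into NB_BREAK_POINTS + 1 equal segments and that t=0 is not reported.

```python
def breakpoint_times(horizon, nb_break_points):
    """Hypothetical helper: uniformly spaced breakpoints, excluding t=0."""
    step = horizon // (nb_break_points + 1)  # length of each stationary segment
    return [k * step for k in range(1, nb_break_points + 1)]


print(breakpoint_times(10000, 0))  # no breakpoint: stationary problem
print(breakpoint_times(10000, 4))
```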

configuration.EPSILON = 0.1¶: Parameters for the epsilon-greedy and epsilon-… policies.
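To show where EPSILON enters, here is a minimal sketch of the textbook epsilon-greedy rule. It is not the code of the library's policies; `epsilon_greedy_choice` and its arguments are hypothetical names.

```python
import random


def epsilon_greedy_choice(empirical_means, epsilon=0.1, rng=random):
    """Textbook epsilon-greedy: explore with probability epsilon, else exploit."""
    nb_arms = len(empirical_means)
    if rng.random() < epsilon:
        return rng.randrange(nb_arms)  # explore: uniformly random arm
    # exploit: arm with the highest empirical mean (first one on ties)
    return max(range(nb_arms), key=empirical_means.__getitem__)


print(epsilon_greedy_choice([0.2, 0.7, 0.4], epsilon=0.1))
```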

configuration.TEMPERATURE = 0.05¶: Temperature for the Softmax policies.
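The role of TEMPERATURE can be illustrated with the standard softmax (Boltzmann) distribution over empirical means. This is a generic sketch, not the library's Softmax implementation; `softmax_probabilities` is a hypothetical name.

```python
import math


def softmax_probabilities(empirical_means, temperature=0.05):
    """Standard softmax: probability of arm i is proportional to exp(mean_i / T)."""
    best = max(empirical_means)
    # Subtracting the maximum avoids overflow and does not change the result.
    weights = [math.exp((m - best) / temperature) for m in empirical_means]
    total = sum(weights)
    return [w / total for w in weights]


# A small temperature concentrates the probability on the best arm.
print(softmax_probabilities([0.2, 0.7, 0.4], temperature=0.05))
```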

configuration.LEARNING_RATE = 0.01¶: Learning rate for my aggregated bandit (it can be autotuned)

configuration.TEST_WrapRange = False¶: Whether my WrapRange policy is tested.

configuration.CACHE_REWARDS = True¶: Should we cache rewards? The random rewards will be the same for all the REPETITIONS simulations for each algorithm.
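The idea behind caching rewards can be sketched as drawing the rewards once and letting every algorithm read from the same table. This is an illustrative assumption about the mechanism, using Bernoulli arms; `draw_reward_table` and `reward` are hypothetical names, not the library's API.

```python
import random


def draw_reward_table(means, horizon, seed=0):
    """Pre-draw one Bernoulli reward per (time step, arm) pair."""
    rng = random.Random(seed)
    return [[1 if rng.random() < mu else 0 for mu in means]
            for _ in range(horizon)]


def reward(table, t, arm):
    """Every algorithm pulling `arm` at time `t` sees the same cached reward."""
    return table[t][arm]


table = draw_reward_table([0.1, 0.5, 0.9], horizon=5)
print(reward(table, t=0, arm=2))
```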

configuration.UPDATE_ALL_CHILDREN = False¶: Should the Aggregator policy update the trusts in each child, or just the one trusted for the last decision?

configuration.UNBIASED = False¶: Should the rewards for the Aggregator policy use a biased estimator, i.e. just r_t, or an unbiased estimator, r_t / p_t?
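The two estimators can be compared with a short sketch. Here p_t is assumed to be the probability with which the choice at time t was made; `reward_estimate` is a hypothetical helper, not the Aggregator's code.

```python
def reward_estimate(r_t, p_t, unbiased=False):
    """Return r_t (biased) or the importance-weighted r_t / p_t (unbiased)."""
    return r_t / p_t if unbiased else r_t


r_t, p_t = 1.0, 0.25
biased = reward_estimate(r_t, p_t, unbiased=False)
weighted = reward_estimate(r_t, p_t, unbiased=True)

# The choice is observed with probability p_t, so the expected value of the
# weighted estimate is p_t * (r_t / p_t) = r_t, which is why it is unbiased.
expected_weighted = p_t * weighted
print(biased, weighted, expected_weighted)
```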

configuration.UPDATE_LIKE_EXP4 = False¶: Should we update the trust probabilities like in Exp4, or like in my initial Aggregator proposal?

configuration.UNBOUNDED_VARIANCE = 1¶: Variance of unbounded Gaussian arms

configuration.NB_ARMS = 9¶: Number of arms for non-hard-coded problems (Bayesian problems)

configuration.LOWER = 0.0¶: Default lower value of the means

configuration.AMPLITUDE = 1.0¶: Default amplitude of the means

configuration.VARIANCE = 0.05¶: Variance of Gaussian arms

configuration.ARM_TYPE¶: alias of Arms.Bernoulli.Bernoulli

configuration.ENVIRONMENT_BAYESIAN = False¶: True to use a Bayesian problem

configuration.MEANS = [0.05, 0.16249999999999998, 0.27499999999999997, 0.38749999999999996, 0.49999999999999994, 0.6125, 0.725, 0.8374999999999999, 0.95]¶: Means of arms for non-hard-coded problems (non Bayesian)
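The values of MEANS look like NB_ARMS evenly spaced points between 0.05 and 0.95. The sketch below reproduces them up to floating-point rounding, under that assumption; `uniform_means` and its `delta` parameter are hypothetical names, and how LOWER and AMPLITUDE enter is also assumed.

```python
def uniform_means(nb_arms=9, delta=0.05, lower=0.0, amplitude=1.0):
    """Evenly spaced means in [lower + amplitude*delta, lower + amplitude*(1-delta)].

    Requires nb_arms >= 2.
    """
    step = (1 - 2 * delta) / (nb_arms - 1)  # spacing between consecutive means
    return [lower + amplitude * (delta + i * step) for i in range(nb_arms)]


means = uniform_means()
print(means)
```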

configuration.USE_FULL_RESTART = True¶: True to use full-restart Doubling Trick

configuration.configuration = {'append_labels': {}, 'cache_rewards': True, 'change_labels': {0: 'Pure exploration', 1: 'Pure exploitation', 2: '$\\varepsilon$-greedy', 3: 'Explore-then-Exploit', 5: 'Bernoulli kl-UCB', 6: 'Thompson sampling'}, 'environment': [{'arm_type': <class 'Arms.Bernoulli.Bernoulli'>, 'params': [0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6, 0.7000000000000001, 0.8, 0.9]}], 'environment_bayesian': False, 'horizon': 10000, 'n_jobs': -1, 'nb_break_points': 0, 'plot_lowerbound': True, 'policies': [{'archtype': <class 'Policies.Uniform.Uniform'>, 'params': {}, 'change_label': 'Pure exploration'}, {'archtype': <class 'Policies.EmpiricalMeans.EmpiricalMeans'>, 'params': {}, 'change_label': 'Pure exploitation'}, {'archtype': <class 'Policies.EpsilonGreedy.EpsilonDecreasing'>, 'params': {'epsilon': 479.99999999999983}, 'change_label': '$\\varepsilon$-greedy'}, {'archtype': <class 'Policies.ExploreThenCommit.ETC_KnownGap'>, 'params': {'horizon': 10000, 'gap': 0.11250000000000004}, 'change_label': 'Explore-then-Exploit'}, {'archtype': <class 'Policies.UCBalpha.UCBalpha'>, 'params': {'alpha': 1}}, {'archtype': <class 'Policies.klUCB.klUCB'>, 'params': {'klucb': CPUDispatcher(<function klucbBern>)}, 'change_label': 'Bernoulli kl-UCB'}, {'archtype': <class 'Policies.Thompson.Thompson'>, 'params': {'posterior': <class 'Policies.Posterior.Beta.Beta'>}, 'change_label': 'Thompson sampling'}], 'random_invert': False, 'random_shuffle': False, 'repetitions': 4, 'verbosity': 6}¶: This dictionary configures the experiments
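The 'gap' value passed to ETC_KnownGap in the dictionary above (about 0.1125) is consistent with the smallest difference between consecutive values of MEANS. A minimal check, assuming that is how it was obtained; `min_gap` is a hypothetical helper and the MEANS values are written here in rounded form.

```python
# Rounded copy of configuration.MEANS, for illustration only.
MEANS = [0.05, 0.1625, 0.275, 0.3875, 0.5, 0.6125, 0.725, 0.8375, 0.95]


def min_gap(means):
    """Smallest difference between two consecutive sorted means."""
    ordered = sorted(means)
    return min(b - a for a, b in zip(ordered, ordered[1:]))


gap = min_gap(MEANS)
print(gap)
```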

configuration.nbArms = 9¶: Number of arms in the first environment

configuration.klucb¶: Warning: if using Exponential or Gaussian arms, give klExp or klGauss to KL-UCB-like policies!