Loaded experiments configuration from 'configuration.py' :
configuration['policies'] = [
    {'archtype': , 'params': {'alpha': 1}},
    {'archtype': , 'params': {}},
    {'archtype': , 'params': {'horizon': 10000}},
    {'archtype': , 'params': {'alpha': 1.35}},
    {'archtype': , 'params': {}},
    {'archtype': , 'params': {'posterior': }},
    {'archtype': , 'params': {'klucb': }},
    {'archtype': , 'params': {'horizon': 10000, 'klucb': }},
    {'archtype': , 'params': {'horizon': 10000, 'klucb': , 'threshold': 'best'}},
    {'archtype': , 'params': {'horizon': 10000, 'klucb': , 'threshold': 'delayed'}},
    {'archtype': , 'params': {'klucb': , 'threshold': 'best'}},
    {'archtype': , 'params': {'klucb': , 'threshold': 'delayed'}},
    {'archtype': , 'params': {'posterior': }},
    {'archtype': , 'params': {'alpha': 0.5, 'horizon': 10000}},
    {'archtype': , 'params': {'alpha': 0.5, 'horizon': 10500}},
    {'archtype': , 'params': {}}
]
====> TURNING NOPLOTS MODE ON <=====
====> TURNING DEBUG MODE ON <=====
plots/ is already a directory here...
Number of policies in this comparison: 16
Time horizon: 10000
Number of repetitions: 1
Sampling rate for plotting, delta_t_plot: 1
Number of jobs for parallelization: 1
Creating a new MAB problem ...
Reading arms of this MAB problem from a dictionary 'configuration' = {'arm_type': , 'params': [0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6, 0.7000000000000001, 0.8, 0.9]} ...
 - with 'arm_type' =
 - with 'params' = [0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6, 0.7000000000000001, 0.8, 0.9]
 - with 'arms' = [B(0.1), B(0.2), B(0.3), B(0.4), B(0.5), B(0.6), B(0.7), B(0.8), B(0.9)]
 - with 'means' = [0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9]
 - with 'nbArms' = 9
 - with 'maxArm' = 0.9
 - with 'minArm' = 0.1
This MAB problem has:
 - a [Lai & Robbins] complexity constant C(mu) = 7.52 ...
 - an Optimal Arm Identification factor H_OI(mu) = 48.89% ...
 - with 'arms' represented as: $[B(0.1), B(0.2), B(0.3), B(0.4), B(0.5), B(0.6), B(0.7), B(0.8), B(0.9)^*]$
Number of environments to try: 1
Evaluating environment: MAB(nbArms: 9, arms: [B(0.1), B(0.2), B(0.3), B(0.4), B(0.5), B(0.6), B(0.7), B(0.8), B(0.9)], minArm: 0.1, maxArm: 0.9)
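Both complexity figures can be checked directly from the nine Bernoulli means. The following is a minimal sketch, assuming the usual Lai & Robbins constant, the sum over sub-optimal arms of (mu* - mu_k) / kl(mu_k, mu*), and an H_OI factor defined as the average over the K arms of 1 - (mu* - mu_k) on the sub-optimal arms; both formulas reproduce the printed values, but SMPyBandits' own implementation may differ in its details.

    import numpy as np

    means = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
    mu_star = means.max()

    def kl_bern(p, q):
        """Binary Kullback-Leibler divergence kl(p, q) between Bernoulli arms."""
        return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

    suboptimal = means[means < mu_star]

    # Lai & Robbins complexity constant: sum of (mu* - mu_k) / kl(mu_k, mu*)
    C_mu = np.sum((mu_star - suboptimal) / kl_bern(suboptimal, mu_star))

    # One possible Optimal Arm Identification factor, averaged over the 9 arms
    H_OI = np.sum(1 - (mu_star - suboptimal)) / len(means)

    print("C(mu) = {:.2f}".format(C_mu))      # C(mu) = 7.52
    print("H_OI(mu) = {:.2%}".format(H_OI))   # H_OI(mu) = 48.89%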
- Adding policy #1 = {'archtype': , 'params': {'alpha': 1}} ..., created from the dictionary 'self.cfg['policies'][0]' ...
- Adding policy #2 = {'archtype': , 'params': {}} ..., created from the dictionary 'self.cfg['policies'][1]' ...
- Adding policy #3 = {'archtype': , 'params': {'horizon': 10000}} ..., created from the dictionary 'self.cfg['policies'][2]' ...
- Adding policy #4 = {'archtype': , 'params': {'alpha': 1.35}} ..., created from the dictionary 'self.cfg['policies'][3]' ...
- Adding policy #5 = {'archtype': , 'params': {}} ..., created from the dictionary 'self.cfg['policies'][4]' ...
- Adding policy #6 = {'archtype': , 'params': {'posterior': }} ..., created from the dictionary 'self.cfg['policies'][5]' ...
- Adding policy #7 = {'archtype': , 'params': {'klucb': }} ..., created from the dictionary 'self.cfg['policies'][6]' ...
- Adding policy #8 = {'archtype': , 'params': {'horizon': 10000, 'klucb': }} ..., created from the dictionary 'self.cfg['policies'][7]' ...
- Adding policy #9 = {'archtype': , 'params': {'horizon': 10000, 'klucb': , 'threshold': 'best'}} ..., created from the dictionary 'self.cfg['policies'][8]' ...
- Adding policy #10 = {'archtype': , 'params': {'horizon': 10000, 'klucb': , 'threshold': 'delayed'}} ..., created from the dictionary 'self.cfg['policies'][9]' ...
- Adding policy #11 = {'archtype': , 'params': {'klucb': , 'threshold': 'best'}} ..., created from the dictionary 'self.cfg['policies'][10]' ...
- Adding policy #12 = {'archtype': , 'params': {'klucb': , 'threshold': 'delayed'}} ..., created from the dictionary 'self.cfg['policies'][11]' ...
- Adding policy #13 = {'archtype': , 'params': {'posterior': }} ..., created from the dictionary 'self.cfg['policies'][12]' ...
- Adding policy #14 = {'archtype': , 'params': {'alpha': 0.5, 'horizon': 10000}} ..., created from the dictionary 'self.cfg['policies'][13]' ...
- Adding policy #15 = {'archtype': , 'params': {'alpha': 0.5, 'horizon': 10500}} ..., created from the dictionary 'self.cfg['policies'][14]' ...
- Adding policy #16 = {'archtype': , 'params': {}} ..., created from the dictionary 'self.cfg['policies'][15]' ...
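In the dump above, the class objects stored under 'archtype' (and 'arm_type', 'klucb', 'posterior') were lost: their <class '...'> representations did not survive, which is why those slots look empty. The sketch below only illustrates the shape such a 'configuration.py' has; Bernoulli, UCBalpha, Thompson and klUCB are real SMPyBandits classes, but the mapping to the 16 entries above and the surrounding key names are indicative, not a reconstruction of the exact file.

    # Sketch of the shape of 'configuration.py' (illustrative, not the exact file)
    from Arms import Bernoulli
    from Policies import UCBalpha, Thompson, klUCB

    HORIZON = 10000

    configuration = {
        "horizon": HORIZON,    # time horizon T
        "repetitions": 1,      # number of repetitions
        "n_jobs": 1,           # number of jobs for parallelization
        "environment": [       # one Bernoulli MAB problem with 9 arms
            {"arm_type": Bernoulli,
             "params": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]},
        ],
        "policies": [          # each entry is {'archtype': class, 'params': kwargs}
            {"archtype": UCBalpha, "params": {"alpha": 1}},
            {"archtype": Thompson, "params": {}},
            {"archtype": klUCB, "params": {}},
            # ... 13 more entries, as listed above
        ],
    }

    # The Evaluator instantiates every entry with the number of arms, exactly as
    # in the line Environment/Evaluator.py:167 shown in the memory reports below:
    #     policy["archtype"](env.nbArms, **policy["params"])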
- Evaluating policy #1/16: UCB($\alpha=1$) ...
  Estimated order after 10000 steps: [0 1 4 5 6 2 3 7 8] ...
  ==> Optimal arm identification: 100.00%, Manhattan distance: 70.37%, Gestalt distance: 77.78%, mean distance: 74.07% (relative success w.r.t. the optimal ordering) ...
- Evaluating policy #2/16: MOSS ...
  Estimated order after 10000 steps: [0 1 2 4 6 7 5 3 8] ...
  ==> Optimal arm identification: 100.00%, Manhattan distance: 75.31%, Gestalt distance: 77.78%, mean distance: 76.54% (relative success w.r.t. the optimal ordering) ...
- Evaluating policy #3/16: MOSS-H($T=10000$) ...
  Estimated order after 10000 steps: [1 0 5 2 6 7 3 4 8] ...
  ==> Optimal arm identification: 100.00%, Manhattan distance: 60.49%, Gestalt distance: 55.56%, mean distance: 58.02% (relative success w.r.t. the optimal ordering) ...
- Evaluating policy #4/16: MOSS-Anytime($\alpha=1.35$) ...
  Estimated order after 10000 steps: [1 0 3 2 5 6 7 4 8] ...
  ==> Optimal arm identification: 100.00%, Manhattan distance: 75.31%, Gestalt distance: 66.67%, mean distance: 70.99% (relative success w.r.t. the optimal ordering) ...
- Evaluating policy #5/16: DMED$^+$(Bern) ...
  Estimated order after 10000 steps: [7 3 8 1 6 4 5 0 2] ...
  ==> Optimal arm identification: 33.33%, Manhattan distance: 16.05%, Gestalt distance: 33.33%, mean distance: 24.69% (relative success w.r.t. the optimal ordering) ...
- Evaluating policy #6/16: Thompson ...
  Estimated order after 10000 steps: [1 3 2 4 5 0 6 7 8] ...
  ==> Optimal arm identification: 100.00%, Manhattan distance: 75.31%, Gestalt distance: 77.78%, mean distance: 76.54% (relative success w.r.t. the optimal ordering) ...
- Evaluating policy #7/16: kl-UCB ...
  Estimated order after 10000 steps: [2 3 6 0 4 1 5 7 8] ...
  ==> Optimal arm identification: 100.00%, Manhattan distance: 60.49%, Gestalt distance: 55.56%, mean distance: 58.02% (relative success w.r.t. the optimal ordering) ...
- Evaluating policy #8/16: kl-UCB$^{++}$($T=10000$) ...
  Estimated order after 10000 steps: [0 3 6 4 1 5 7 2 8] ...
  ==> Optimal arm identification: 100.00%, Manhattan distance: 60.49%, Gestalt distance: 55.56%, mean distance: 58.02% (relative success w.r.t. the optimal ordering) ...
- Evaluating policy #9/16: kl-UCB-switch($T=10000$) ...
  Estimated order after 10000 steps: [0 2 3 4 1 6 5 7 8] ...
  ==> Optimal arm identification: 100.00%, Manhattan distance: 80.25%, Gestalt distance: 77.78%, mean distance: 79.01% (relative success w.r.t. the optimal ordering) ...
- Evaluating policy #10/16: kl-UCB-switch($T=10000$, delayed f) ...
  Estimated order after 10000 steps: [3 2 1 0 5 4 7 6 8] ...
  ==> Optimal arm identification: 100.00%, Manhattan distance: 70.37%, Gestalt distance: 44.44%, mean distance: 57.41% (relative success w.r.t. the optimal ordering) ...
- Evaluating policy #11/16: kl-UCB-switch ...
  Estimated order after 10000 steps: [0 1 3 4 5 6 2 7 8] ...
  ==> Optimal arm identification: 100.00%, Manhattan distance: 80.25%, Gestalt distance: 88.89%, mean distance: 84.57% (relative success w.r.t. the optimal ordering) ...
- Evaluating policy #12/16: kl-UCB-switch(delayed f) ...
  Estimated order after 10000 steps: [5 0 1 3 4 2 7 6 8] ...
  ==> Optimal arm identification: 100.00%, Manhattan distance: 70.37%, Gestalt distance: 66.67%, mean distance: 68.52% (relative success w.r.t. the optimal ordering) ...
- Evaluating policy #13/16: BayesUCB ...
  Estimated order after 10000 steps: [2 4 5 0 1 3 6 7 8] ...
  ==> Optimal arm identification: 100.00%, Manhattan distance: 60.49%, Gestalt distance: 66.67%, mean distance: 63.58% (relative success w.r.t. the optimal ordering) ...
- Evaluating policy #14/16: AdBandits($T=10000$, $\alpha=0.5$) ...
  Estimated order after 10000 steps: [2 8 0 4 3 5 1 7 6] ...
  ==> Optimal arm identification: 77.78%, Manhattan distance: 50.62%, Gestalt distance: 22.22%, mean distance: 36.42% (relative success w.r.t. the optimal ordering) ...
- Evaluating policy #15/16: ApprFHG($T=10500$) ...
  Estimated order after 10000 steps: [1 2 0 3 4 5 6 7 8] ...
  ==> Optimal arm identification: 100.00%, Manhattan distance: 90.12%, Gestalt distance: 88.89%, mean distance: 89.51% (relative success w.r.t. the optimal ordering) ...
- Evaluating policy #16/16: $\mathrm{UCB}_{d=d_{lb}}$($c=0$) ...
  Estimated order after 10000 steps: [2 0 1 7 5 3 4 6 8] ...
  ==> Optimal arm identification: 100.00%, Manhattan distance: 65.43%, Gestalt distance: 66.67%, mean distance: 66.05% (relative success w.r.t. the optimal ordering) ...
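The 'relative success' scores above can be reproduced with simple definitions. This is a sketch with assumed formulas that happen to match the printed figures: the identification score is the mean of the arm placed last in the estimated order (i.e. estimated best) divided by the best mean, the Manhattan score is 1 - sum_i |order_i - i| / (K^2 / 2), the Gestalt score is Python's difflib.SequenceMatcher ratio against the identity order, and the mean score averages the last two. For policy #1 this gives exactly 100.00%, 70.37%, 77.78% and 74.07%:

    import numpy as np
    from difflib import SequenceMatcher

    means = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
    order = np.array([0, 1, 4, 5, 6, 2, 3, 7, 8])  # estimated order of UCB(alpha=1)
    K = len(order)
    ideal = np.arange(K)

    # Optimal arm identification: mean of the arm estimated best, relative to mu*
    oai = means[order[-1]] / means.max()

    # Manhattan distance to the identity permutation, turned into a success score
    manhattan = 1.0 - np.sum(np.abs(order - ideal)) / (K ** 2 / 2.0)

    # Gestalt (Ratcliff-Obershelp) similarity between the two sequences
    gestalt = SequenceMatcher(None, list(order), list(ideal)).ratio()

    print("{:.2%} {:.2%} {:.2%} {:.2%}".format(
        oai, manhattan, gestalt, (manhattan + gestalt) / 2))
    # 100.00% 70.37% 77.78% 74.07%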
Giving the vector of final regrets ...
For policy #0 called 'UCB($\alpha=1$)' ... Last regrets (for all repetitions): Min = Mean = Median = Max of R_T = 101, STD = 0
For policy #1 called 'MOSS' ... Last regrets (for all repetitions): Min = Mean = Median = Max of R_T = 84.9, STD = 0
For policy #2 called 'MOSS-H($T=10000$)' ... Last regrets (for all repetitions): Min = Mean = Median = Max of R_T = 77, STD = 0
For policy #3 called 'MOSS-Anytime($\alpha=1.35$)' ... Last regrets (for all repetitions): Min = Mean = Median = Max of R_T = 85.7, STD = 0
For policy #4 called 'DMED$^+$(Bern)' ... Last regrets (for all repetitions): Min = Mean = Median = Max of R_T = 51.1, STD = 0
For policy #5 called 'Thompson' ... Last regrets (for all repetitions): Min = Mean = Median = Max of R_T = 51, STD = 0
For policy #6 called 'kl-UCB' ... Last regrets (for all repetitions): Min = Mean = Median = Max of R_T = 54.4, STD = 0
For policy #7 called 'kl-UCB$^{++}$($T=10000$)' ... Last regrets (for all repetitions): Min = Mean = Median = Max of R_T = 52.2, STD = 0
For policy #8 called 'kl-UCB-switch($T=10000$)' ... Last regrets (for all repetitions): Min = Mean = Median = Max of R_T = 52.6, STD = 0
For policy #9 called 'kl-UCB-switch($T=10000$, delayed f)' ... Last regrets (for all repetitions): Min = Mean = Median = Max of R_T = 59.5, STD = 0
For policy #10 called 'kl-UCB-switch' ... Last regrets (for all repetitions): Min = Mean = Median = Max of R_T = 43.6, STD = 0
For policy #11 called 'kl-UCB-switch(delayed f)' ... Last regrets (for all repetitions): Min = Mean = Median = Max of R_T = 52, STD = 0
For policy #12 called 'BayesUCB' ... Last regrets (for all repetitions): Min = Mean = Median = Max of R_T = 52.5, STD = 0
For policy #13 called 'AdBandits($T=10000$, $\alpha=0.5$)' ... Last regrets (for all repetitions): Min = Mean = Median = Max of R_T = 33.5, STD = 0
For policy #14 called 'ApprFHG($T=10500$)' ... Last regrets (for all repetitions): Min = Mean = Median = Max of R_T = 101, STD = 0
For policy #15 called '$\mathrm{UCB}_{d=d_{lb}}$($c=0$)' ... Last regrets (for all repetitions): Min = Mean = Median = Max of R_T = 26, STD = 0
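For reference, the 'last regret' of one repetition is R_T = T * mu* minus the reward accumulated over the run (most likely computed from the means of the pulled arms rather than from the random rewards, which is consistent with the non-integer values above), and the five statistics are taken over the repetitions; with a single repetition they all coincide and the STD is 0. A minimal sketch with illustrative pull counts:

    import numpy as np

    means = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
    T, mu_star = 10000, means.max()

    # pulls[k] = number of times arm k was pulled in one repetition
    # (illustrative numbers only, summing to T = 10000)
    pulls = np.array([25, 30, 35, 45, 60, 85, 140, 380, 9200])

    last_regret = T * mu_star - np.dot(pulls, means)  # R_T for this repetition (200.0 here)

    # The printed summary is then taken over the vector of last regrets,
    # one entry per repetition (a single one in this experiment):
    last_regrets = np.array([last_regret])
    print(last_regrets.min(), last_regrets.mean(), np.median(last_regrets),
          last_regrets.max(), last_regrets.std())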
Giving the final ranks ...
Final ranking for this environment #0 :
- Policy '$\mathrm{UCB}_{d=d_{lb}}$($c=0$)' was ranked 1 / 16 for this simulation (last regret = 26).
- Policy 'AdBandits($T=10000$, $\alpha=0.5$)' was ranked 2 / 16 for this simulation (last regret = 33.5).
- Policy 'kl-UCB-switch' was ranked 3 / 16 for this simulation (last regret = 43.6).
- Policy 'Thompson' was ranked 4 / 16 for this simulation (last regret = 50.9).
- Policy 'DMED$^+$(Bern)' was ranked 5 / 16 for this simulation (last regret = 51.1).
- Policy 'kl-UCB-switch(delayed f)' was ranked 6 / 16 for this simulation (last regret = 52).
- Policy 'kl-UCB$^{++}$($T=10000$)' was ranked 7 / 16 for this simulation (last regret = 52.2).
- Policy 'BayesUCB' was ranked 8 / 16 for this simulation (last regret = 52.5).
- Policy 'kl-UCB-switch($T=10000$)' was ranked 9 / 16 for this simulation (last regret = 52.6).
- Policy 'kl-UCB' was ranked 10 / 16 for this simulation (last regret = 54.4).
- Policy 'kl-UCB-switch($T=10000$, delayed f)' was ranked 11 / 16 for this simulation (last regret = 59.5).
- Policy 'MOSS-H($T=10000$)' was ranked 12 / 16 for this simulation (last regret = 77).
- Policy 'MOSS' was ranked 13 / 16 for this simulation (last regret = 84.9).
- Policy 'MOSS-Anytime($\alpha=1.35$)' was ranked 14 / 16 for this simulation (last regret = 85.7).
- Policy 'UCB($\alpha=1$)' was ranked 15 / 16 for this simulation (last regret = 100.6).
- Policy 'ApprFHG($T=10500$)' was ranked 16 / 16 for this simulation (last regret = 101.1).
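The ranking above is simply the list of policies sorted by their mean last regret; a small sketch reproducing it from the values printed in the ranking (policy names abbreviated, indices following the #0..#15 numbering used above):

    import numpy as np

    names = ["UCB(alpha=1)", "MOSS", "MOSS-H", "MOSS-Anytime", "DMED+(Bern)",
             "Thompson", "kl-UCB", "kl-UCB++", "kl-UCB-switch(T=10000)",
             "kl-UCB-switch(T=10000, delayed f)", "kl-UCB-switch",
             "kl-UCB-switch(delayed f)", "BayesUCB", "AdBandits",
             "ApprFHG(T=10500)", "UCB-dagger"]
    mean_last_regrets = np.array([100.6, 84.9, 77, 85.7, 51.1, 50.9, 54.4, 52.2,
                                  52.6, 59.5, 43.6, 52, 52.5, 33.5, 101.1, 26])

    for rank, idx in enumerate(np.argsort(mean_last_regrets), 1):
        print("- Policy '{}' was ranked {} / {} (last regret = {})".format(
            names[idx], rank, len(names), mean_last_regrets[idx]))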
Giving the mean and std running times ...
For policy #1 called 'MOSS' ... 1.04 s ± 0 ns per loop (mean ± std. dev. of 1 run)
For policy #0 called 'UCB($\alpha=1$)' ... 1.05 s ± 0 ns per loop (mean ± std. dev. of 1 run)
For policy #3 called 'MOSS-Anytime($\alpha=1.35$)' ... 1.07 s ± 0 ns per loop (mean ± std. dev. of 1 run)
For policy #2 called 'MOSS-H($T=10000$)' ... 1.09 s ± 0 ns per loop (mean ± std. dev. of 1 run)
For policy #5 called 'Thompson' ... 1.17 s ± 0 ns per loop (mean ± std. dev. of 1 run)
For policy #14 called 'ApprFHG($T=10500$)' ... 1.28 s ± 0 ns per loop (mean ± std. dev. of 1 run)
For policy #4 called 'DMED$^+$(Bern)' ... 2.02 s ± 0 ns per loop (mean ± std. dev. of 1 run)
For policy #8 called 'kl-UCB-switch($T=10000$)' ... 2.36 s ± 0 ns per loop (mean ± std. dev. of 1 run)
For policy #9 called 'kl-UCB-switch($T=10000$, delayed f)' ... 2.4 s ± 0 ns per loop (mean ± std. dev. of 1 run)
For policy #10 called 'kl-UCB-switch' ... 2.43 s ± 0 ns per loop (mean ± std. dev. of 1 run)
For policy #6 called 'kl-UCB' ... 2.45 s ± 0 ns per loop (mean ± std. dev. of 1 run)
For policy #13 called 'AdBandits($T=10000$, $\alpha=0.5$)' ... 2.48 s ± 0 ns per loop (mean ± std. dev. of 1 run)
For policy #11 called 'kl-UCB-switch(delayed f)' ... 2.49 s ± 0 ns per loop (mean ± std. dev. of 1 run)
For policy #12 called 'BayesUCB' ... 2.65 s ± 0 ns per loop (mean ± std. dev. of 1 run)
For policy #7 called 'kl-UCB$^{++}$($T=10000$)' ... 2.81 s ± 0 ns per loop (mean ± std. dev. of 1 run)
For policy #15 called '$\mathrm{UCB}_{d=d_{lb}}$($c=0$)' ... 3 s ± 0 ns per loop (mean ± std. dev. of 1 run)
Giving the mean and std memory consumption ...
For policy #10 called 'kl-UCB-switch' ... 16 B (mean of 1 run)
For policy #8 called 'kl-UCB-switch($T=10000$)' ... 24 B (mean of 1 run)
For policy #9 called 'kl-UCB-switch($T=10000$, delayed f)' ... 24 B (mean of 1 run)
For policy #7 called 'kl-UCB$^{++}$($T=10000$)' ... 40 B (mean of 1 run)
For policy #0 called 'UCB($\alpha=1$)' ... 260 B (mean of 1 run)
For policy #1 called 'MOSS' ... nan YiB (mean of 1 run)
For policy #2 called 'MOSS-H($T=10000$)' ... nan YiB (mean of 1 run)
For policy #3 called 'MOSS-Anytime($\alpha=1.35$)' ... nan YiB (mean of 1 run)
For policy #4 called 'DMED$^+$(Bern)' ... nan YiB (mean of 1 run)
For policy #5 called 'Thompson' ... nan YiB (mean of 1 run)
For policy #6 called 'kl-UCB' ... nan YiB (mean of 1 run)
For policy #11 called 'kl-UCB-switch(delayed f)' ... nan YiB (mean of 1 run)
For policy #12 called 'BayesUCB' ... nan YiB (mean of 1 run)
For policy #13 called 'AdBandits($T=10000$, $\alpha=0.5$)' ... nan YiB (mean of 1 run)
For policy #14 called 'ApprFHG($T=10500$)' ... nan YiB (mean of 1 run)
For policy #15 called '$\mathrm{UCB}_{d=d_{lb}}$($c=0$)' ... nan YiB (mean of 1 run)
Done for simulations main.py ...
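The 'Top 20 lines ranked by memory consumption' reports that follow are the kind of per-line summary produced by Python's standard tracemalloc module; this is a generic sketch of how such a report is typically generated, not SMPyBandits' exact code:

    import tracemalloc

    tracemalloc.start()

    # ... run the simulation here ...

    snapshot = tracemalloc.take_snapshot()
    top_stats = snapshot.statistics("lineno")   # aggregate allocations per source line

    for index, stat in enumerate(top_stats[:20], 1):
        frame = stat.traceback[0]
        print("#{}: {}:{}: {:.1f} KiB".format(
            index, frame.filename, frame.lineno, stat.size / 1024))

    others = sum(stat.size for stat in top_stats[20:])
    print("{} others: {:.1f} KiB".format(len(top_stats) - 20, others / 1024))
    print("Total allocated size: {:.1f} KiB".format(
        sum(stat.size for stat in top_stats) / 1024))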
Top 20 lines ranked by memory consumption:
#1: Environment/Evaluator.py:131: 11 MiB    self.allPulls[env] = np.zeros((self.nbPolicies, self.envs[env].nbArms, self.horizon))
#2: python3.6/linecache.py:137: 2.1 MiB    lines = fp.readlines()
#3: python3.6/abc.py:133: 1.4 MiB    cls = super().__new__(mcls, name, bases, namespace, **kwargs)
#4: Environment/Evaluator.py:111: 1.2 MiB    self.rewards = np.zeros((self.nbPolicies, len(self.envs), self.horizon))  #: For each env, history of rewards, ie accumulated rewards
#5: Environment/Evaluator.py:114: 1.2 MiB    self.maxCumRewards = -np.inf + np.zeros((self.nbPolicies, len(self.envs), self.horizon))  #: For each env, history of maximum of rewards, to compute amplitude (+- STD)
#6: Environment/Evaluator.py:113: 1.2 MiB    self.minCumRewards = np.inf + np.zeros((self.nbPolicies, len(self.envs), self.horizon))  #: For each env, history of minimum of rewards, to compute amplitude (+- STD)
#7: Environment/Evaluator.py:129: 1.2 MiB    self.bestArmPulls[env] = np.zeros((self.nbPolicies, self.horizon))
#8: collections/__init__.py:429: 1.1 MiB    exec(class_definition, namespace)
#9: json/decoder.py:355: 579.6 KiB    obj, end = self.scan_once(s, idx)
#10: python3.6/_weakrefset.py:37: 511.8 KiB    self.data = set()
#11: python3.6/functools.py:67: 419.9 KiB    getattr(wrapper, attr).update(getattr(wrapped, attr, {}))
#12: python3.6/_weakrefset.py:38: 403.6 KiB    def _remove(item, selfref=ref(self)):
#13: misc/doccer.py:68: 402.3 KiB    return docstring % indented
#14: matplotlib/font_manager.py:965: 379.5 KiB    r.__dict__.update(o)
#15: python3.6/_weakrefset.py:48: 371 KiB    self._iterating = set()
#16: :5: 350.9 KiB
#17: traitlets/traitlets.py:735: 339.8 KiB    return super(MetaHasDescriptors, mcls).__new__(mcls, name, bases, classdict)
#18: collections/__init__.py:423: 337.3 KiB    for index, name in enumerate(field_names))
#19: typing/templates.py:654: 317.2 KiB    class Template(cls):
#20: stats/_distn_infrastructure.py:694: 232.4 KiB    exec_(parse_arg_template % dct, ns)
61036 others: 35.5 MiB
Total allocated size: 60.5 MiB
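The remaining output comes from a second comparison, a 20-arm problem run for 16 repetitions, whose policies include aggregation algorithms (Aggregator[all non Aggr], Exp4[all non Aggr]) and sparse bandit policies. As a sketch of how such an aggregated entry is typically declared: the Aggregator receives its slave policies as a list of the same {'archtype': ..., 'params': {...}} dictionaries, apparently through a 'children' parameter (inferred from the Policies/Aggregator.py line visible in the memory reports below), so treat the exact keyword and the chosen classes as indicative.

    # Sketch of an aggregated entry such as "Aggregator[all non Aggr]",
    # extending the configuration sketched earlier.  Names are illustrative.
    from Policies import Aggregator, UCBalpha, Thompson, klUCB

    non_aggr_children = [
        {"archtype": UCBalpha, "params": {"alpha": 1}},
        {"archtype": Thompson, "params": {}},
        {"archtype": klUCB, "params": {}},
        # ... the other non-aggregating policies of the comparison
    ]

    configuration["policies"].append({
        "archtype": Aggregator,
        # each child is later instantiated as child['archtype'](nbArms, **params),
        # cf. Policies/Aggregator.py:109 in the memory report below
        "params": {"children": non_aggr_children},
    })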
Estimated order by the policy Aggregator[all non Aggr] after 10000 steps: [16 14 10 11 18 1 3 0 4 7 8 5 19 15 12 17 6 2 13 9] ...
  ==> Optimal arm identification: -51.79%, Manhattan distance: 30.00%, Gestalt distance: 20.00%, mean distance: 25.00% (relative success w.r.t. the optimal ordering) ...
- Evaluating policy #14/16: Exp4[all non Aggr] ...
  Estimated order after 10000 steps: [0 2 1 6 5 3 7 4 9 8 10 12 13 14 11 15 16 17 18 19] ...
  ==> Optimal arm identification: 100.00%, Manhattan distance: 90.00%, Gestalt distance: 70.00%, mean distance: 80.00% (relative success w.r.t. the optimal ordering) ...
- Evaluating policy #15/16: Aggregator[Sparse-KLUCB for s=1..20] ...
  Estimated order after 10000 steps: [15 16 17 18 19 1 2 5 6 7 4 9 8 12 11 14 10 13 0 3] ...
  ==> Optimal arm identification: -83.93%, Manhattan distance: 25.00%, Gestalt distance: 25.00%, mean distance: 25.00% (relative success w.r.t. the optimal ordering) ...
- Evaluating policy #16/16: Exp4[Sparse-KLUCB for s=1..20] ...
  Estimated order after 10000 steps: [15 16 17 18 19 1 8 10 4 5 6 7 12 11 9 14 13 0 3 2] ...
  ==> Optimal arm identification: -89.29%, Manhattan distance: 20.00%, Gestalt distance: 25.00%, mean distance: 22.50% (relative success w.r.t. the optimal ordering) ...
Giving the vector of final regrets ...
For policy #0 called 'EmpiricalMeans' ... Last regrets (for all repetitions): Min R_T = 210, Mean R_T = 210, Median R_T = 210, Max R_T = 213, STD R_T = 0.839
For policy #1 called 'UCB($\alpha=1$)' ... Last regrets (for all repetitions): Min R_T = 3.81e+03, Mean R_T = 3.93e+03, Median R_T = 3.92e+03, Max R_T = 4.04e+03, STD R_T = 49
For policy #2 called 'Thompson' ... Last regrets (for all repetitions): Min R_T = 1.45e+03, Mean R_T = 1.86e+03, Median R_T = 1.85e+03, Max R_T = 2.62e+03, STD R_T = 280
For policy #3 called 'Thompson(Gauss)' ... Last regrets (for all repetitions): Min R_T = 4.09e+03, Mean R_T = 4.21e+03, Median R_T = 4.21e+03, Max R_T = 4.28e+03, STD R_T = 48.6
For policy #4 called 'Sparse($s=5$)[BayesUCB, UCB for K and J]' ... Last regrets (for all repetitions): Min R_T = 806, Mean R_T = 1.34e+03, Median R_T = 1.48e+03, Max R_T = 1.83e+03, STD R_T = 309
For policy #5 called 'Sparse-kl-UCB($s=5$, Bern)' ... Last regrets (for all repetitions): Min R_T = 1.11e+03, Mean R_T = 1.18e+03, Median R_T = 1.16e+03, Max R_T = 1.33e+03, STD R_T = 50.3
For policy #6 called 'SparseUCB($s=5$, $\alpha=1$)' ... Last regrets (for all repetitions): Min R_T = 2.13e+03, Mean R_T = 2.53e+03, Median R_T = 2.62e+03, Max R_T = 2.9e+03, STD R_T = 278
For policy #7 called 'OSSB($\varepsilon=0$, $\gamma=0$, Gauss)' ... Last regrets (for all repetitions): Min R_T = 210, Mean R_T = 211, Median R_T = 210, Max R_T = 218, STD R_T = 1.91
For policy #8 called 'OSSB($\varepsilon=0$, $\gamma=0$, Gauss, $s=5$)' ... Last regrets (for all repetitions): Min R_T = 213, Mean R_T = 481, Median R_T = 231, Max R_T = 2.63e+03, STD R_T = 662
For policy #9 called 'OSSB($\varepsilon=0.01$, $\gamma=0$, Gauss, $s=5$)' ... Last regrets (for all repetitions): Min R_T = 214, Mean R_T = 269, Median R_T = 230, Max R_T = 764, STD R_T = 129
For policy #10 called 'OSSB($\varepsilon=0$, $\gamma=0.1$, Gauss, $s=5$)' ... Last regrets (for all repetitions): Min R_T = 222, Mean R_T = 313, Median R_T = 249, Max R_T = 702, STD R_T = 128
For policy #11 called 'OSSB($\varepsilon=0.01$, $\gamma=0.1$, Gauss, $s=5$)' ... Last regrets (for all repetitions): Min R_T = 214, Mean R_T = 246, Median R_T = 229, Max R_T = 450, STD R_T = 54.5
For policy #12 called 'Aggregator[all non Aggr]' ... Last regrets (for all repetitions): Min R_T = 227, Mean R_T = 364, Median R_T = 260, Max R_T = 1.96e+03, STD R_T = 414
For policy #13 called 'Exp4[all non Aggr]' ... Last regrets (for all repetitions): Min R_T = 210, Mean R_T = 753, Median R_T = 264, Max R_T = 3.93e+03, STD R_T = 947
For policy #14 called 'Aggregator[Sparse-KLUCB for s=1..20]' ... Last regrets (for all repetitions): Min R_T = 1.13e+03, Mean R_T = 1.26e+03, Median R_T = 1.28e+03, Max R_T = 1.34e+03, STD R_T = 60.9
For policy #15 called 'Exp4[Sparse-KLUCB for s=1..20]' ... Last regrets (for all repetitions): Min R_T = 1.05e+03, Mean R_T = 1.18e+03, Median R_T = 1.18e+03, Max R_T = 1.27e+03, STD R_T = 51.8
Giving the final ranks ...
Final ranking for this environment #0 :
- Policy 'EmpiricalMeans' was ranked 1 / 16 for this simulation (last regret = 210.37).
- Policy 'OSSB($\varepsilon=0$, $\gamma=0$, Gauss)' was ranked 2 / 16 for this simulation (last regret = 210.66).
- Policy 'OSSB($\varepsilon=0.01$, $\gamma=0.1$, Gauss, $s=5$)' was ranked 3 / 16 for this simulation (last regret = 246.39).
- Policy 'OSSB($\varepsilon=0.01$, $\gamma=0$, Gauss, $s=5$)' was ranked 4 / 16 for this simulation (last regret = 268.63).
- Policy 'OSSB($\varepsilon=0$, $\gamma=0.1$, Gauss, $s=5$)' was ranked 5 / 16 for this simulation (last regret = 313.38).
- Policy 'Aggregator[all non Aggr]' was ranked 6 / 16 for this simulation (last regret = 363.86).
- Policy 'OSSB($\varepsilon=0$, $\gamma=0$, Gauss, $s=5$)' was ranked 7 / 16 for this simulation (last regret = 481.31).
- Policy 'Exp4[all non Aggr]' was ranked 8 / 16 for this simulation (last regret = 752.4).
- Policy 'Sparse-kl-UCB($s=5$, Bern)' was ranked 9 / 16 for this simulation (last regret = 1173.3).
- Policy 'Exp4[Sparse-KLUCB for s=1..20]' was ranked 10 / 16 for this simulation (last regret = 1177.7).
- Policy 'Aggregator[Sparse-KLUCB for s=1..20]' was ranked 11 / 16 for this simulation (last regret = 1255.5).
- Policy 'Sparse($s=5$)[BayesUCB, UCB for K and J]' was ranked 12 / 16 for this simulation (last regret = 1340.4).
- Policy 'Thompson' was ranked 13 / 16 for this simulation (last regret = 1857.3).
- Policy 'SparseUCB($s=5$, $\alpha=1$)' was ranked 14 / 16 for this simulation (last regret = 2528.8).
- Policy 'UCB($\alpha=1$)' was ranked 15 / 16 for this simulation (last regret = 3916.8).
- Policy 'Thompson(Gauss)' was ranked 16 / 16 for this simulation (last regret = 4207.9).
Giving the mean and std running times ...
For policy #0 called 'EmpiricalMeans' ... 676 ms ± 30.4 ms per loop (mean ± std. dev. of 16 runs)
For policy #1 called 'UCB($\alpha=1$)' ... 971 ms ± 93.6 ms per loop (mean ± std. dev. of 16 runs)
For policy #2 called 'Thompson' ... 1.71 s ± 210 ms per loop (mean ± std. dev. of 16 runs)
For policy #7 called 'OSSB($\varepsilon=0$, $\gamma=0$, Gauss)' ... 2.45 s ± 291 ms per loop (mean ± std. dev. of 16 runs)
For policy #6 called 'SparseUCB($s=5$, $\alpha=1$)' ... 2.92 s ± 175 ms per loop (mean ± std. dev. of 16 runs)
For policy #10 called 'OSSB($\varepsilon=0$, $\gamma=0.1$, Gauss, $s=5$)' ... 3.32 s ± 569 ms per loop (mean ± std. dev. of 16 runs)
For policy #8 called 'OSSB($\varepsilon=0$, $\gamma=0$, Gauss, $s=5$)' ... 3.41 s ± 770 ms per loop (mean ± std. dev. of 16 runs)
For policy #9 called 'OSSB($\varepsilon=0.01$, $\gamma=0$, Gauss, $s=5$)' ... 3.43 s ± 809 ms per loop (mean ± std. dev. of 16 runs)
For policy #4 called 'Sparse($s=5$)[BayesUCB, UCB for K and J]' ... 3.7 s ± 598 ms per loop (mean ± std. dev. of 16 runs)
For policy #11 called 'OSSB($\varepsilon=0.01$, $\gamma=0.1$, Gauss, $s=5$)' ... 4.01 s ± 673 ms per loop (mean ± std. dev. of 16 runs)
For policy #5 called 'Sparse-kl-UCB($s=5$, Bern)' ... 12.6 s ± 1.25 s per loop (mean ± std. dev. of 16 runs)
For policy #3 called 'Thompson(Gauss)' ... 39.2 s ± 7.11 s per loop (mean ± std. dev. of 16 runs)
For policy #13 called 'Exp4[all non Aggr]' ... 1min 11s ± 3.38 s per loop (mean ± std. dev. of 16 runs)
For policy #12 called 'Aggregator[all non Aggr]' ... 1min 17s ± 10.6 s per loop (mean ± std. dev. of 16 runs)
For policy #14 called 'Aggregator[Sparse-KLUCB for s=1..20]' ... 2min 17s ± 17.8 s per loop (mean ± std. dev. of 16 runs)
For policy #15 called 'Exp4[Sparse-KLUCB for s=1..20]' ... 3min 26s ± 58.8 s per loop (mean ± std. dev. of 16 runs)
Giving the mean and std memory consumption ...
For policy #4 called 'Sparse($s=5$)[BayesUCB, UCB for K and J]' ... 1.3 KiB ± 0 B (mean ± std. dev. of 16 runs)
For policy #9 called 'OSSB($\varepsilon=0.01$, $\gamma=0$, Gauss, $s=5$)' ... 1.3 KiB ± 1.3 KiB (mean ± std. dev. of 16 runs)
For policy #8 called 'OSSB($\varepsilon=0$, $\gamma=0$, Gauss, $s=5$)' ... 1.3 KiB ± 1.3 KiB (mean ± std. dev. of 16 runs)
For policy #11 called 'OSSB($\varepsilon=0.01$, $\gamma=0.1$, Gauss, $s=5$)' ... 1.3 KiB ± 1.3 KiB (mean ± std. dev. of 16 runs)
For policy #10 called 'OSSB($\varepsilon=0$, $\gamma=0.1$, Gauss, $s=5$)' ... 1.4 KiB ± 1.4 KiB (mean ± std. dev. of 16 runs)
For policy #7 called 'OSSB($\varepsilon=0$, $\gamma=0$, Gauss)' ... 1.5 KiB ± 1.3 KiB (mean ± std. dev. of 16 runs)
For policy #2 called 'Thompson' ... 1.8 KiB ± 558.9 B (mean ± std. dev. of 16 runs)
For policy #6 called 'SparseUCB($s=5$, $\alpha=1$)' ... 2.1 KiB ± 423.3 B (mean ± std. dev. of 16 runs)
For policy #5 called 'Sparse-kl-UCB($s=5$, Bern)' ... 2.2 KiB ± 304.8 B (mean ± std. dev. of 16 runs)
For policy #0 called 'EmpiricalMeans' ... 2.3 KiB ± 212.4 B (mean ± std. dev. of 16 runs)
For policy #1 called 'UCB($\alpha=1$)' ... 2.3 KiB ± 253.9 B (mean ± std. dev. of 16 runs)
For policy #3 called 'Thompson(Gauss)' ... 2.4 KiB ± 541.6 B (mean ± std. dev. of 16 runs)
For policy #14 called 'Aggregator[Sparse-KLUCB for s=1..20]' ... 2.5 KiB ± 277.1 B (mean ± std. dev. of 16 runs)
For policy #15 called 'Exp4[Sparse-KLUCB for s=1..20]' ... 2.5 KiB ± 104.9 B (mean ± std. dev. of 16 runs)
For policy #12 called 'Aggregator[all non Aggr]' ... 3.3 KiB ± 261.5 B (mean ± std. dev. of 16 runs)
For policy #13 called 'Exp4[all non Aggr]' ... 3.4 KiB ± 161.1 B (mean ± std. dev. of 16 runs)
Top 20 lines ranked by memory consumption:
#1: Policies/BasePolicy.py:31: 39 KiB    self.pulls = np.zeros(nbArms, dtype=int)  #: Number of pulls of each arms
#2: Policies/BasePolicy.py:32: 31.4 KiB    self.rewards = np.zeros(nbArms)  #: Cumulated rewards of each arms
#3: Policies/BasePolicy.py:25: 18.1 KiB    self.nbArms = nbArms  #: Number of arms
#4: Policies/IndexPolicy.py:30: 15.1 KiB    self.index = np.zeros(nbArms)  #: Numerical index for each arms
#5: core/numeric.py:298: 12.8 KiB    a = empty(shape, dtype, order)
#6: lib/function_base.py:2715: 12.1 KiB    self.otypes = otypes
#7: Posterior/Beta.py:72: 11.4 KiB    self._a = a
#8: Policies/BayesianIndexPolicy.py:20: 10.7 KiB    self.posterior[arm] = posterior(*args, **kwargs)
#9: Policies/Aggregator.py:109: 10.4 KiB    self.children.append(child['archtype'](nbArms, **localparams))
#10: lib/function_base.py:2720: 9.4 KiB    self.excluded = set(excluded)
#11: joblib/pool.py:259: 9.2 KiB    return (loads, (dumps(a, protocol=HIGHEST_PROTOCOL),))
#12: Posterior/Gauss.py:81: 8.6 KiB    self._mu = float(mu)
#13: Posterior/Beta.py:75: 6.9 KiB    self.N = [a, b]  #: List of two parameters [a, b]
#14: python3.6/threading.py:347: 6.1 KiB    waiters_to_notify = _deque(_islice(all_waiters, n))
#15: python3.6/threading.py:884: 5.7 KiB    self._bootstrap_inner()
#16: Environment/Evaluator.py:167: 5.7 KiB    self.policies.append(policy['archtype'](env.nbArms, **policy['params']))
#17: Policies/SparseWrapper.py:122: 5.4 KiB    # now for the underlying policy
#18: lib/function_base.py:2703: 4.7 KiB    self.__doc__ = pyfunc.__doc__
#19: joblib/pool.py:371: 4.4 KiB    CustomizablePickler(buffer, self._reducers).dump(obj)
#20: Environment/Evaluator.py:237: 4 KiB    for repeatId in tqdm(range(self.repetitions), desc="Repeat||")
359 others: 182.5 KiB
Total allocated size: 413.8 KiB
Done for simulations main.py ...
Top 20 lines ranked by memory consumption:
#1: python3.6/linecache.py:137: 1011.7 KiB    lines = fp.readlines()
#2: Policies/BasePolicy.py:31: 39 KiB    self.pulls = np.zeros(nbArms, dtype=int)  #: Number of pulls of each arms
#3: Policies/BasePolicy.py:32: 31.4 KiB    self.rewards = np.zeros(nbArms)  #: Cumulated rewards of each arms
#4: python3.6/tracemalloc.py:65: 22 KiB    return (self.size, self.count, self.traceback)
#5: Policies/BasePolicy.py:25: 18.1 KiB    self.nbArms = nbArms  #: Number of arms
#6: Policies/IndexPolicy.py:30: 15.1 KiB    self.index = np.zeros(nbArms)  #: Numerical index for each arms
#7: core/numeric.py:298: 12.8 KiB    a = empty(shape, dtype, order)
#8: lib/function_base.py:2715: 12.1 KiB    self.otypes = otypes
#9: Posterior/Beta.py:72: 11.4 KiB    self._a = a
#10: Policies/BayesianIndexPolicy.py:20: 10.7 KiB    self.posterior[arm] = posterior(*args, **kwargs)
#11: Policies/Aggregator.py:109: 10.4 KiB    self.children.append(child['archtype'](nbArms, **localparams))
#12: lib/function_base.py:2720: 9.4 KiB    self.excluded = set(excluded)
#13: joblib/pool.py:259: 9 KiB    return (loads, (dumps(a, protocol=HIGHEST_PROTOCOL),))
#14: Posterior/Gauss.py:81: 8.6 KiB    self._mu = float(mu)
#15: Posterior/Beta.py:75: 6.9 KiB    self.N = [a, b]  #: List of two parameters [a, b]
#16: python3.6/threading.py:347: 6.1 KiB    waiters_to_notify = _deque(_islice(all_waiters, n))
#17: python3.6/threading.py:884: 5.7 KiB    self._bootstrap_inner()
#18: Environment/Evaluator.py:167: 5.7 KiB    self.policies.append(policy['archtype'](env.nbArms, **policy['params']))
#19: Policies/SparseWrapper.py:122: 5.4 KiB    # now for the underlying policy
#20: lib/function_base.py:2703: 4.7 KiB    self.__doc__ = pyfunc.__doc__
418 others: 217.4 KiB
Total allocated size: 1.4 MiB