Loaded experiments configuration from 'configuration.py' :
configuration['policies'] = [{'archtype': , 'params': {'alpha': 1}}, {'archtype': , 'params': {}}, {'archtype': , 'params': {'horizon': 1000}}, {'archtype': , 'params': {'alpha': 1.35}}, {'archtype': , 'params': {}}, {'archtype': , 'params': {'posterior': }}, {'archtype': , 'params': {'klucb': }}, {'archtype': , 'params': {'horizon': 1000, 'klucb': }}, {'archtype': , 'params': {'horizon': 1000, 'klucb': , 'threshold': 'best'}}, {'archtype': , 'params': {'horizon': 1000, 'klucb': , 'threshold': 'delayed'}}, {'archtype': , 'params': {'klucb': , 'threshold': 'best'}}, {'archtype': , 'params': {'klucb': , 'threshold': 'delayed'}}, {'archtype': , 'params': {'posterior': }}, {'archtype': , 'params': {'alpha': 0.5, 'horizon': 1000}}, {'archtype': , 'params': {'alpha': 0.5, 'horizon': 1100}}, {'archtype': , 'params': {}}]
====> TURNING NOPLOTS MODE ON <=====
====> TURNING DEBUG MODE ON <=====
plots/ is already a directory here...
Number of policies in this comparison: 16
Time horizon: 1000
Number of repetitions: 16
Sampling rate for plotting, delta_t_plot: 1
Number of jobs for parallelization: 1

Creating a new MAB problem ...
  Reading arms of this MAB problem from a dictionary 'configuration' = {'arm_type': , 'params': [0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6, 0.7000000000000001, 0.8, 0.9]} ...
  - with 'arm_type' =
  - with 'params' = [0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6, 0.7000000000000001, 0.8, 0.9]
  - with 'arms' = [B(0.1), B(0.2), B(0.3), B(0.4), B(0.5), B(0.6), B(0.7), B(0.8), B(0.9)]
  - with 'means' = [0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9]
  - with 'nbArms' = 9
  - with 'maxArm' = 0.9
  - with 'minArm' = 0.1
This MAB problem has:
  - a [Lai & Robbins] complexity constant C(mu) = 7.52 ...
  - an Optimal Arm Identification factor H_OI(mu) = 48.89% ...
  - with 'arms' represented as: $[B(0.1), B(0.2), B(0.3), B(0.4), B(0.5), B(0.6), B(0.7), B(0.8), B(0.9)^*]$
Number of environments to try: 1

Evaluating environment: MAB(nbArms: 9, arms: [B(0.1), B(0.2), B(0.3), B(0.4), B(0.5), B(0.6), B(0.7), B(0.8), B(0.9)], minArm: 0.1, maxArm: 0.9)
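For Bernoulli arms, the [Lai & Robbins] complexity constant reported above is the sum, over the sub-optimal arms, of (mu* - mu_k) / kl(mu_k, mu*), where kl is the binary Kullback-Leibler divergence. A quick stand-alone check of the value 7.52, written independently of SMPyBandits' own helpers:

```python
import numpy as np

def kl_bernoulli(p, q, eps=1e-15):
    """Binary Kullback-Leibler divergence kl(p, q), clipped away from 0 and 1."""
    p, q = np.clip(p, eps, 1 - eps), np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

means = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
best = means.max()
# Lai & Robbins lower-bound constant: sum of gap / kl(mu_k, mu*) over sub-optimal arms
C_mu = sum((best - mu) / kl_bernoulli(mu, best) for mu in means if mu < best)
print("C(mu) = {:.3g}".format(C_mu))   # -> 7.52, as printed in the log above
```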
- Adding policy #1 = {'archtype': , 'params': {'alpha': 1}} ...
  Creating this policy from a dictionary 'self.cfg['policies'][0]' = {'archtype': , 'params': {'alpha': 1}} ...
- Adding policy #2 = {'archtype': , 'params': {}} ...
  Creating this policy from a dictionary 'self.cfg['policies'][1]' = {'archtype': , 'params': {}} ...
- Adding policy #3 = {'archtype': , 'params': {'horizon': 1000}} ...
  Creating this policy from a dictionary 'self.cfg['policies'][2]' = {'archtype': , 'params': {'horizon': 1000}} ...
- Adding policy #4 = {'archtype': , 'params': {'alpha': 1.35}} ...
  Creating this policy from a dictionary 'self.cfg['policies'][3]' = {'archtype': , 'params': {'alpha': 1.35}} ...
- Adding policy #5 = {'archtype': , 'params': {}} ...
  Creating this policy from a dictionary 'self.cfg['policies'][4]' = {'archtype': , 'params': {}} ...
- Adding policy #6 = {'archtype': , 'params': {'posterior': }} ...
  Creating this policy from a dictionary 'self.cfg['policies'][5]' = {'archtype': , 'params': {'posterior': }} ...
- Adding policy #7 = {'archtype': , 'params': {'klucb': }} ...
  Creating this policy from a dictionary 'self.cfg['policies'][6]' = {'archtype': , 'params': {'klucb': }} ...
- Adding policy #8 = {'archtype': , 'params': {'horizon': 1000, 'klucb': }} ...
  Creating this policy from a dictionary 'self.cfg['policies'][7]' = {'archtype': , 'params': {'horizon': 1000, 'klucb': }} ...
- Adding policy #9 = {'archtype': , 'params': {'horizon': 1000, 'klucb': , 'threshold': 'best'}} ...
  Creating this policy from a dictionary 'self.cfg['policies'][8]' = {'archtype': , 'params': {'horizon': 1000, 'klucb': , 'threshold': 'best'}} ...
- Adding policy #10 = {'archtype': , 'params': {'horizon': 1000, 'klucb': , 'threshold': 'delayed'}} ...
  Creating this policy from a dictionary 'self.cfg['policies'][9]' = {'archtype': , 'params': {'horizon': 1000, 'klucb': , 'threshold': 'delayed'}} ...
- Adding policy #11 = {'archtype': , 'params': {'klucb': , 'threshold': 'best'}} ...
  Creating this policy from a dictionary 'self.cfg['policies'][10]' = {'archtype': , 'params': {'klucb': , 'threshold': 'best'}} ...
- Adding policy #12 = {'archtype': , 'params': {'klucb': , 'threshold': 'delayed'}} ...
  Creating this policy from a dictionary 'self.cfg['policies'][11]' = {'archtype': , 'params': {'klucb': , 'threshold': 'delayed'}} ...
- Adding policy #13 = {'archtype': , 'params': {'posterior': }} ...
  Creating this policy from a dictionary 'self.cfg['policies'][12]' = {'archtype': , 'params': {'posterior': }} ...
- Adding policy #14 = {'archtype': , 'params': {'alpha': 0.5, 'horizon': 1000}} ...
  Creating this policy from a dictionary 'self.cfg['policies'][13]' = {'archtype': , 'params': {'alpha': 0.5, 'horizon': 1000}} ...
- Adding policy #15 = {'archtype': , 'params': {'alpha': 0.5, 'horizon': 1100}} ...
  Creating this policy from a dictionary 'self.cfg['policies'][14]' = {'archtype': , 'params': {'alpha': 0.5, 'horizon': 1100}} ...
- Adding policy #16 = {'archtype': , 'params': {}} ...
  Creating this policy from a dictionary 'self.cfg['policies'][15]' = {'archtype': , 'params': {}} ...
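Each 'archtype' entry above is a policy class object (its printed representation does not appear in this log) and 'params' holds the keyword arguments passed to it. For illustration only, a `policies` list of this shape in a `configuration.py` could look like the sketch below; the class names used here (`UCBalpha`, `MOSS`, `Thompson`, `klUCB`, and the `Bernoulli` arm) are standard SMPyBandits classes matching some of the policy names printed during evaluation, not a verbatim copy of the configuration that produced this run:

```python
# Illustrative sketch only -- not the exact configuration used for this log.
from SMPyBandits.Arms import Bernoulli
from SMPyBandits.Policies import UCBalpha, MOSS, Thompson, klUCB

HORIZON = 1000
configuration = {
    "horizon": HORIZON,          # time horizon T
    "repetitions": 16,           # number of independent repetitions
    "n_jobs": 1,                 # number of jobs for parallelization
    "verbosity": 6,
    # One Bernoulli MAB problem with means 0.1, 0.2, ..., 0.9
    "environment": [{"arm_type": Bernoulli, "params": [0.1 * k for k in range(1, 10)]}],
    "policies": [
        {"archtype": UCBalpha, "params": {"alpha": 1}},
        {"archtype": MOSS, "params": {}},
        {"archtype": Thompson, "params": {}},
        {"archtype": klUCB, "params": {}},
        # ... the remaining entries follow the same {"archtype": ..., "params": {...}} pattern
    ],
}
```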
- Evaluating policy #1/16: UCB($\alpha=1$) ...
Estimated order by the policy UCB($\alpha=1$) after 1000 steps: [2 0 1 3 5 4 7 6 8] ...
 ==> Optimal arm identification: 100.00% (relative success)...
 ==> Manhattan distance from optimal ordering: 80.25% (relative success)...
 ==> Gestalt distance from optimal ordering: 66.67% (relative success)...
 ==> Mean distance from optimal ordering: 73.46% (relative success)...

- Evaluating policy #2/16: MOSS ...
Estimated order by the policy MOSS after 1000 steps: [0 2 4 1 5 7 3 6 8] ...
 ==> Optimal arm identification: 100.00% (relative success)...
 ==> Manhattan distance from optimal ordering: 70.37% (relative success)...
 ==> Gestalt distance from optimal ordering: 66.67% (relative success)...
 ==> Mean distance from optimal ordering: 68.52% (relative success)...

- Evaluating policy #3/16: MOSS-H($T=1000$) ...
Estimated order by the policy MOSS-H($T=1000$) after 1000 steps: [1 5 6 7 0 2 3 4 8] ...
 ==> Optimal arm identification: 100.00% (relative success)...
 ==> Manhattan distance from optimal ordering: 35.80% (relative success)...
 ==> Gestalt distance from optimal ordering: 55.56% (relative success)...
 ==> Mean distance from optimal ordering: 45.68% (relative success)...

- Evaluating policy #4/16: MOSS-Anytime($\alpha=1.35$) ...
Estimated order by the policy MOSS-Anytime($\alpha=1.35$) after 1000 steps: [0 3 1 2 5 6 4 7 8] ...
 ==> Optimal arm identification: 100.00% (relative success)...
 ==> Manhattan distance from optimal ordering: 80.25% (relative success)...
 ==> Gestalt distance from optimal ordering: 77.78% (relative success)...
 ==> Mean distance from optimal ordering: 79.01% (relative success)...

- Evaluating policy #5/16: DMED$^+$(Bern) ...
Estimated order by the policy DMED$^+$(Bern) after 1000 steps: [8 0 1 6 4 7 3 5 2] ...
 ==> Optimal arm identification: 33.33% (relative success)...
 ==> Manhattan distance from optimal ordering: 35.80% (relative success)...
 ==> Gestalt distance from optimal ordering: 44.44% (relative success)...
 ==> Mean distance from optimal ordering: 40.12% (relative success)...

- Evaluating policy #6/16: Thompson ...
Estimated order by the policy Thompson after 1000 steps: [4 3 2 1 0 5 7 6 8] ...
 ==> Optimal arm identification: 100.00% (relative success)...
 ==> Manhattan distance from optimal ordering: 65.43% (relative success)...
 ==> Gestalt distance from optimal ordering: 44.44% (relative success)...
 ==> Mean distance from optimal ordering: 54.94% (relative success)...

- Evaluating policy #7/16: kl-UCB ...
Estimated order by the policy kl-UCB after 1000 steps: [3 0 1 4 5 2 6 7 8] ...
 ==> Optimal arm identification: 100.00% (relative success)...
 ==> Manhattan distance from optimal ordering: 75.31% (relative success)...
 ==> Gestalt distance from optimal ordering: 77.78% (relative success)...
 ==> Mean distance from optimal ordering: 76.54% (relative success)...

- Evaluating policy #8/16: kl-UCB$^{++}$($T=1000$) ...
Estimated order by the policy kl-UCB$^{++}$($T=1000$) after 1000 steps: [3 4 5 6 0 1 2 7 8] ...
 ==> Optimal arm identification: 100.00% (relative success)...
 ==> Manhattan distance from optimal ordering: 40.74% (relative success)...
 ==> Gestalt distance from optimal ordering: 66.67% (relative success)...
 ==> Mean distance from optimal ordering: 53.70% (relative success)...

- Evaluating policy #9/16: kl-UCB-switch($T=1000$) ...
Estimated order by the policy kl-UCB-switch($T=1000$) after 1000 steps: [2 4 3 0 1 7 6 5 8] ...
 ==> Optimal arm identification: 100.00% (relative success)...
 ==> Manhattan distance from optimal ordering: 60.49% (relative success)...
 ==> Gestalt distance from optimal ordering: 44.44% (relative success)...
 ==> Mean distance from optimal ordering: 52.47% (relative success)...

- Evaluating policy #10/16: kl-UCB-switch($T=1000$, delayed f) ...
Estimated order by the policy kl-UCB-switch($T=1000$, delayed f) after 1000 steps: [0 1 2 3 4 5 7 6 8] ...
 ==> Optimal arm identification: 100.00% (relative success)...
 ==> Manhattan distance from optimal ordering: 95.06% (relative success)...
 ==> Gestalt distance from optimal ordering: 88.89% (relative success)...
 ==> Mean distance from optimal ordering: 91.98% (relative success)...

- Evaluating policy #11/16: kl-UCB-switch ...
Estimated order by the policy kl-UCB-switch after 1000 steps: [1 4 0 2 3 5 7 6 8] ...
 ==> Optimal arm identification: 100.00% (relative success)...
 ==> Manhattan distance from optimal ordering: 75.31% (relative success)...
 ==> Gestalt distance from optimal ordering: 66.67% (relative success)...
 ==> Mean distance from optimal ordering: 70.99% (relative success)...

- Evaluating policy #12/16: kl-UCB-switch(delayed f) ...
Estimated order by the policy kl-UCB-switch(delayed f) after 1000 steps: [0 1 6 2 3 5 7 4 8] ...
 ==> Optimal arm identification: 100.00% (relative success)...
 ==> Manhattan distance from optimal ordering: 75.31% (relative success)...
 ==> Gestalt distance from optimal ordering: 77.78% (relative success)...
 ==> Mean distance from optimal ordering: 76.54% (relative success)...

- Evaluating policy #13/16: BayesUCB ...
Estimated order by the policy BayesUCB after 1000 steps: [4 0 1 2 3 6 5 7 8] ...
 ==> Optimal arm identification: 100.00% (relative success)...
 ==> Manhattan distance from optimal ordering: 75.31% (relative success)...
 ==> Gestalt distance from optimal ordering: 77.78% (relative success)...
 ==> Mean distance from optimal ordering: 76.54% (relative success)...

- Evaluating policy #14/16: AdBandits($T=1000$, $\alpha=0.5$) ...
Estimated order by the policy AdBandits($T=1000$, $\alpha=0.5$) after 1000 steps: [7 0 5 8 4 1 2 3 6] ...
 ==> Optimal arm identification: 77.78% (relative success)...
 ==> Manhattan distance from optimal ordering: 25.93% (relative success)...
 ==> Gestalt distance from optimal ordering: 55.56% (relative success)...
 ==> Mean distance from optimal ordering: 40.74% (relative success)...

- Evaluating policy #15/16: ApprFHG($T=1100$) ...
Estimated order by the policy ApprFHG($T=1100$) after 1000 steps: [0 1 3 2 6 4 5 7 8] ...
 ==> Optimal arm identification: 100.00% (relative success)...
 ==> Manhattan distance from optimal ordering: 85.19% (relative success)...
 ==> Gestalt distance from optimal ordering: 77.78% (relative success)...
 ==> Mean distance from optimal ordering: 81.48% (relative success)...

- Evaluating policy #16/16: $\mathrm{UCB}_{d=d_{lb}}$($c=0$) ...
Estimated order by the policy $\mathrm{UCB}_{d=d_{lb}}$($c=0$) after 1000 steps: [1 3 2 0 4 7 5 6 8] ...
 ==> Optimal arm identification: 100.00% (relative success)...
 ==> Manhattan distance from optimal ordering: 75.31% (relative success)...
 ==> Gestalt distance from optimal ordering: 66.67% (relative success)...
 ==> Mean distance from optimal ordering: 70.99% (relative success)...
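The four "relative success" scores printed for each estimated order (arms sorted by increasing estimated mean, so the believed-best arm comes last) can be recomputed with a few lines. Below is a minimal re-implementation, consistent with the numbers above, of the kind of scores computed by the `weightedDistance`, `manhattan`, `gestalt` and `meanDistance` helpers that appear in the profiled `delayed_play` function at the end of this log; the function names and the exact normalizations are reconstructed here, not copied from SMPyBandits:

```python
import difflib
import numpy as np

means = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])

def manhattan_score(order):
    """1 minus the total displacement of the permutation, normalized by n^2 / 2."""
    n = len(order)
    return 1 - np.abs(np.asarray(order) - np.arange(n)).sum() / (n ** 2 / 2)

def gestalt_score(order):
    """Ratcliff-Obershelp (gestalt) similarity between the order and the identity."""
    n = len(order)
    return difflib.SequenceMatcher(None, list(order), list(range(n))).ratio()

def identification_score(order, means):
    """Mean of the arm ranked last (believed best), relative to the true best mean."""
    return means[order[-1]] / means.max()

# Example: the order estimated by kl-UCB-switch(T=1000, delayed f) above
order = [0, 1, 2, 3, 4, 5, 7, 6, 8]
print("{:.2%}".format(identification_score(order, means)))                    # 100.00%
print("{:.2%}".format(manhattan_score(order)))                                # 95.06%
print("{:.2%}".format(gestalt_score(order)))                                  # 88.89%
print("{:.2%}".format((manhattan_score(order) + gestalt_score(order)) / 2))   # 91.98%
```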
Giving the vector of final regrets ...

For policy #0 called 'UCB($\alpha=1$)' ...
  Last regrets (for all repetitions) have:
  Min of last regrets R_T = 47.2
  Mean of last regrets R_T = 56.9
  Median of last regrets R_T = 53.1
  Max of last regrets R_T = 68.8
  STD of last regrets R_T = 8.18
For policy #1 called 'MOSS' ...
  Last regrets (for all repetitions) have:
  Min of last regrets R_T = 30.5
  Mean of last regrets R_T = 44.1
  Median of last regrets R_T = 42.7
  Max of last regrets R_T = 55.5
  STD of last regrets R_T = 6.68
For policy #2 called 'MOSS-H($T=1000$)' ...
  Last regrets (for all repetitions) have:
  Min of last regrets R_T = 32.9
  Mean of last regrets R_T = 49
  Median of last regrets R_T = 46.3
  Max of last regrets R_T = 76.4
  STD of last regrets R_T = 10.9
For policy #3 called 'MOSS-Anytime($\alpha=1.35$)' ...
  Last regrets (for all repetitions) have:
  Min of last regrets R_T = 33.5
  Mean of last regrets R_T = 47.2
  Median of last regrets R_T = 46.5
  Max of last regrets R_T = 72.8
  STD of last regrets R_T = 9.75
For policy #4 called 'DMED$^+$(Bern)' ...
  Last regrets (for all repetitions) have:
  Min of last regrets R_T = 23.1
  Mean of last regrets R_T = 35.7
  Median of last regrets R_T = 34.8
  Max of last regrets R_T = 51.1
  STD of last regrets R_T = 7.69
For policy #5 called 'Thompson' ...
  Last regrets (for all repetitions) have:
  Min of last regrets R_T = 23.2
  Mean of last regrets R_T = 30.6
  Median of last regrets R_T = 28.6
  Max of last regrets R_T = 43.3
  STD of last regrets R_T = 6.18
For policy #6 called 'kl-UCB' ...
  Last regrets (for all repetitions) have:
  Min of last regrets R_T = 20.9
  Mean of last regrets R_T = 37.4
  Median of last regrets R_T = 37.8
  Max of last regrets R_T = 63.9
  STD of last regrets R_T = 9.45
For policy #7 called 'kl-UCB$^{++}$($T=1000$)' ...
  Last regrets (for all repetitions) have:
  Min of last regrets R_T = 24.3
  Mean of last regrets R_T = 34.9
  Median of last regrets R_T = 30.2
  Max of last regrets R_T = 61.6
  STD of last regrets R_T = 10
For policy #8 called 'kl-UCB-switch($T=1000$)' ...
  Last regrets (for all repetitions) have:
  Min of last regrets R_T = 17.9
  Mean of last regrets R_T = 37.2
  Median of last regrets R_T = 36.1
  Max of last regrets R_T = 64.5
  STD of last regrets R_T = 9.95
For policy #9 called 'kl-UCB-switch($T=1000$, delayed f)' ...
  Last regrets (for all repetitions) have:
  Min of last regrets R_T = 23.5
  Mean of last regrets R_T = 34.8
  Median of last regrets R_T = 33.3
  Max of last regrets R_T = 49.3
  STD of last regrets R_T = 6.94
For policy #10 called 'kl-UCB-switch' ...
  Last regrets (for all repetitions) have:
  Min of last regrets R_T = 25.3
  Mean of last regrets R_T = 32.4
  Median of last regrets R_T = 30.5
  Max of last regrets R_T = 43
  STD of last regrets R_T = 5.29
For policy #11 called 'kl-UCB-switch(delayed f)' ...
  Last regrets (for all repetitions) have:
  Min of last regrets R_T = 22.2
  Mean of last regrets R_T = 34.8
  Median of last regrets R_T = 33.2
  Max of last regrets R_T = 48.6
  STD of last regrets R_T = 8.47
For policy #12 called 'BayesUCB' ...
  Last regrets (for all repetitions) have:
  Min of last regrets R_T = 10.1
  Mean of last regrets R_T = 20.9
  Median of last regrets R_T = 20.1
  Max of last regrets R_T = 35.5
  STD of last regrets R_T = 5.93
For policy #13 called 'AdBandits($T=1000$, $\alpha=0.5$)' ...
  Last regrets (for all repetitions) have:
  Min of last regrets R_T = 12.2
  Mean of last regrets R_T = 20.5
  Median of last regrets R_T = 19.6
  Max of last regrets R_T = 33.6
  STD of last regrets R_T = 5.06
For policy #14 called 'ApprFHG($T=1100$)' ...
  Last regrets (for all repetitions) have:
  Min of last regrets R_T = 41
  Mean of last regrets R_T = 51.9
  Median of last regrets R_T = 48
  Max of last regrets R_T = 73
  STD of last regrets R_T = 10.1
For policy #15 called '$\mathrm{UCB}_{d=d_{lb}}$($c=0$)' ...
  Last regrets (for all repetitions) have:
  Min of last regrets R_T = 17.8
  Mean of last regrets R_T = 24.8
  Median of last regrets R_T = 24.6
  Max of last regrets R_T = 30.5
  STD of last regrets R_T = 3.61
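Each block above is a five-number summary, over the 16 repetitions, of one policy's cumulated regret R_T at the final time step T = 1000. A minimal sketch of how such a summary can be printed, assuming `last_regrets` is a 1-D array holding the 16 final-regret values of one policy (the exact SMPyBandits helper and its formatting are not shown in this log):

```python
import numpy as np

def print_last_regrets_stats(name, last_regrets):
    """Summarise the final regrets R_T of one policy over all repetitions."""
    last_regrets = np.asarray(last_regrets)
    print("For policy called {!r} ...".format(name))
    print("  Last regrets (for all repetitions) have:")
    print("  Min of last regrets R_T = {:.3g}".format(np.min(last_regrets)))
    print("  Mean of last regrets R_T = {:.3g}".format(np.mean(last_regrets)))
    print("  Median of last regrets R_T = {:.3g}".format(np.median(last_regrets)))
    print("  Max of last regrets R_T = {:.3g}".format(np.max(last_regrets)))
    print("  STD of last regrets R_T = {:.3g}".format(np.std(last_regrets)))
```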
Giving the final ranks ...

Final ranking for this environment #0 :
- Policy 'AdBandits($T=1000$, $\alpha=0.5$)' was ranked 1 / 16 for this simulation (last regret = 20.463).
- Policy 'BayesUCB' was ranked 2 / 16 for this simulation (last regret = 20.75).
- Policy '$\mathrm{UCB}_{d=d_{lb}}$($c=0$)' was ranked 3 / 16 for this simulation (last regret = 24.831).
- Policy 'Thompson' was ranked 4 / 16 for this simulation (last regret = 30.525).
- Policy 'kl-UCB-switch' was ranked 5 / 16 for this simulation (last regret = 32.244).
- Policy 'kl-UCB-switch($T=1000$, delayed f)' was ranked 6 / 16 for this simulation (last regret = 34.663).
- Policy 'kl-UCB-switch(delayed f)' was ranked 7 / 16 for this simulation (last regret = 34.819).
- Policy 'kl-UCB$^{++}$($T=1000$)' was ranked 8 / 16 for this simulation (last regret = 34.881).
- Policy 'DMED$^+$(Bern)' was ranked 9 / 16 for this simulation (last regret = 35.675).
- Policy 'kl-UCB-switch($T=1000$)' was ranked 10 / 16 for this simulation (last regret = 37.138).
- Policy 'kl-UCB' was ranked 11 / 16 for this simulation (last regret = 37.338).
- Policy 'MOSS' was ranked 12 / 16 for this simulation (last regret = 43.756).
- Policy 'MOSS-Anytime($\alpha=1.35$)' was ranked 13 / 16 for this simulation (last regret = 47.119).
- Policy 'MOSS-H($T=1000$)' was ranked 14 / 16 for this simulation (last regret = 49).
- Policy 'ApprFHG($T=1100$)' was ranked 15 / 16 for this simulation (last regret = 51.888).
- Policy 'UCB($\alpha=1$)' was ranked 16 / 16 for this simulation (last regret = 56.638).
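The ranking simply orders the policies by increasing average final regret. One way to reproduce such a table from a matrix of final regrets (one row per policy, one column per repetition) is sketched below, with `names` and `last_regrets` as assumed inputs; the value printed here is the mean over repetitions, which may differ slightly in its last digits from the regret estimate SMPyBandits itself reports in the ranking:

```python
import numpy as np

def final_ranking(names, last_regrets):
    """Rank policies by increasing mean final regret R_T."""
    mean_regrets = np.mean(np.asarray(last_regrets), axis=1)   # average over repetitions
    for rank, i in enumerate(np.argsort(mean_regrets), start=1):
        print("- Policy {!r} was ranked {} / {} for this simulation (last regret = {:.5g}).".format(
            names[i], rank, len(names), mean_regrets[i]))
```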
Giving the mean and std running times ...

For policy #2 called 'MOSS-H($T=1000$)' ...
    628 ms ± 13.4 ms per loop (mean ± std. dev. of 16 runs)
For policy #3 called 'MOSS-Anytime($\alpha=1.35$)' ...
    679 ms ± 24.6 ms per loop (mean ± std. dev. of 16 runs)
For policy #5 called 'Thompson' ...
    716 ms ± 54 ms per loop (mean ± std. dev. of 16 runs)
For policy #14 called 'ApprFHG($T=1100$)' ...
    722 ms ± 85.4 ms per loop (mean ± std. dev. of 16 runs)
For policy #1 called 'MOSS' ...
    811 ms ± 127 ms per loop (mean ± std. dev. of 16 runs)
For policy #10 called 'kl-UCB-switch' ...
    814 ms ± 27.3 ms per loop (mean ± std. dev. of 16 runs)
For policy #7 called 'kl-UCB$^{++}$($T=1000$)' ...
    819 ms ± 10.4 ms per loop (mean ± std. dev. of 16 runs)
For policy #4 called 'DMED$^+$(Bern)' ...
    819 ms ± 87.5 ms per loop (mean ± std. dev. of 16 runs)
For policy #12 called 'BayesUCB' ...
    821 ms ± 77.1 ms per loop (mean ± std. dev. of 16 runs)
For policy #9 called 'kl-UCB-switch($T=1000$, delayed f)' ...
    828 ms ± 33.3 ms per loop (mean ± std. dev. of 16 runs)
For policy #0 called 'UCB($\alpha=1$)' ...
    832 ms ± 196 ms per loop (mean ± std. dev. of 16 runs)
For policy #8 called 'kl-UCB-switch($T=1000$)' ...
    839 ms ± 50.6 ms per loop (mean ± std. dev. of 16 runs)
For policy #11 called 'kl-UCB-switch(delayed f)' ...
    863 ms ± 65.3 ms per loop (mean ± std. dev. of 16 runs)
For policy #15 called '$\mathrm{UCB}_{d=d_{lb}}$($c=0$)' ...
    871 ms ± 75.2 ms per loop (mean ± std. dev. of 16 runs)
For policy #13 called 'AdBandits($T=1000$, $\alpha=0.5$)' ...
    872 ms ± 73.4 ms per loop (mean ± std. dev. of 16 runs)
For policy #6 called 'kl-UCB' ...
    904 ms ± 86.6 ms per loop (mean ± std. dev. of 16 runs)

Done for simulations main.py ...
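Each timing above is the mean ± standard deviation, over the 16 repetitions, of the wall-clock time of one full play of the policy (stored per repetition as `result.running_time` in the profiled `delayed_play` function shown below). A minimal sketch of the %timeit-style display, with `running_times` as an assumed array of per-repetition durations in seconds:

```python
import numpy as np

def print_running_times(name, running_times):
    """Display mean +/- std of per-repetition running times, in milliseconds."""
    times_ms = 1e3 * np.asarray(running_times)
    print("For policy called {!r} ...".format(name))
    print("    {:.3g} ms ± {:.3g} ms per loop (mean ± std. dev. of {} runs)".format(
        np.mean(times_ms), np.std(times_ms), len(times_ms)))
```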
Filename: /home/lilian/ownCloud/owncloud.crans.org/Crans/These_2016-17/src/SMPyBandits/SMPyBandits/Environment/Evaluator.py

Line #    Mem usage    Increment   Line Contents
================================================
   728  180.559 MiB  180.148 MiB   @profile
   729                             def delayed_play(env, policy, horizon,
   730                                              random_shuffle=random_shuffle, random_invert=random_invert, nb_random_events=nb_random_events,
   731                                              seed=None, allrewards=None, repeatId=0,
   732                                              useJoblib=False):
   733                                 """Helper function for the parallelization."""
   734  180.559 MiB    0.000 MiB       start_time = time.time()
   735  180.559 MiB    0.000 MiB       start_memory = getCurrentMemory(thread=useJoblib)
   736                                 # Give a unique seed to random & numpy.random for each call of this function
   737  180.559 MiB    0.000 MiB       try:
   738  180.559 MiB    0.000 MiB           if seed is not None:
   739                                         random.seed(seed)
   740                                         np.random.seed(seed)
   741                                 except (ValueError, SystemError):
   742                                     print("Warning: setting random.seed and np.random.seed seems to not be available. Are you using Windows?")  # XXX
   743                                 # We have to deepcopy because this function is Parallel-ized
   744  180.559 MiB    0.000 MiB       if random_shuffle or random_invert:
   745                                     env = deepcopy(env)    # XXX this uses a LOT of RAM memory!!!
   746  180.559 MiB    0.000 MiB       means = env.means
   747  180.559 MiB    0.000 MiB       if env.isDynamic:
   748                                     means = env.newRandomArms()
   749  180.559 MiB    0.258 MiB       policy = deepcopy(policy)  # XXX this uses a LOT of RAM memory!!!
   750
   751  180.559 MiB    0.000 MiB       indexes_bestarm = np.nonzero(np.isclose(env.means, env.maxArm))[0]
   752
   753                                 # Start game
   754  180.559 MiB    0.000 MiB       policy.startGame()
   755  180.559 MiB    0.000 MiB       result = Result(env.nbArms, horizon, indexes_bestarm=indexes_bestarm, means=means)  # One Result object, for every policy
   756
   757                                 # XXX Experimental support for random events: shuffling or inverting the list of arms, at these time steps
   758  180.559 MiB    0.000 MiB       t_events = [i * int(horizon / float(nb_random_events)) for i in range(nb_random_events)]
   759  180.559 MiB    0.000 MiB       if nb_random_events is None or nb_random_events <= 0:
   760                                     random_shuffle = False
   761                                     random_invert = False
   762
   763  180.559 MiB    0.000 MiB       prettyRange = tqdm(range(horizon), desc="Time t") if repeatId == 0 else range(horizon)
   764  180.559 MiB    0.000 MiB       for t in prettyRange:
   765  180.559 MiB    0.008 MiB           choice = policy.choice()
   766
   767                                     # XXX do this quicker!?
   768  180.559 MiB    0.000 MiB           if allrewards is None:
   769  180.559 MiB    0.000 MiB               reward = env.draw(choice, t)
   770                                     else:
   771                                         reward = allrewards[choice, repeatId, t]
   772
   773  180.559 MiB    0.000 MiB           policy.getReward(choice, reward)
   774
   775                                     # Finally we store the results
   776  180.559 MiB    0.145 MiB           result.store(t, choice, reward)
   777
   778                                     # XXX Experimental : shuffle the arms at the middle of the simulation
   779  180.559 MiB    0.000 MiB           if random_shuffle and t in t_events:
   780                                         indexes_bestarm = env.new_order_of_arm(shuffled(env.arms))
   781                                         result.change_in_arms(t, indexes_bestarm)
   782                                         if repeatId == 0:
   783                                             print("\nShuffling the arms at time t = {} ...".format(t))  # DEBUG
   784                                     # XXX Experimental : invert the order of the arms at the middle of the simulation
   785  180.559 MiB    0.000 MiB           if random_invert and t in t_events:
   786                                         indexes_bestarm = env.new_order_of_arm(env.arms[::-1])
   787                                         result.change_in_arms(t, indexes_bestarm)
   788                                         if repeatId == 0:
   789                                             print("\nInverting the order of the arms at time t = {} ...".format(t))  # DEBUG
   790
   791                                 # Print the quality of estimation of arm ranking for this policy, just for 1st repetition
   792  180.559 MiB    0.000 MiB       if repeatId == 0 and hasattr(policy, 'estimatedOrder'):
   793  180.426 MiB    0.000 MiB           order = policy.estimatedOrder()
   794  180.426 MiB    0.000 MiB           print("\nEstimated order by the policy {} after {} steps: {} ...".format(policy, horizon, order))
   795  180.426 MiB    0.000 MiB           print(" ==> Optimal arm identification: {:.2%} (relative success)...".format(weightedDistance(order, env.means, n=1)))
   796  180.426 MiB    0.000 MiB           print(" ==> Manhattan distance from optimal ordering: {:.2%} (relative success)...".format(manhattan(order)))
   797                                     # print(" ==> Kendell Tau distance from optimal ordering: {:.2%} (relative success)...".format(kendalltau(order)))
   798                                     # print(" ==> Spearman distance from optimal ordering: {:.2%} (relative success)...".format(spearmanr(order)))
   799  180.426 MiB    0.000 MiB           print(" ==> Gestalt distance from optimal ordering: {:.2%} (relative success)...".format(gestalt(order)))
   800  180.426 MiB    0.000 MiB           print(" ==> Mean distance from optimal ordering: {:.2%} (relative success)...".format(meanDistance(order)))
   801
   802                                 # Finally, store running time and consumed memory
   803  180.559 MiB    0.000 MiB       result.running_time = time.time() - start_time
   804  180.559 MiB    0.000 MiB       result.memory_consumption = getCurrentMemory(thread=useJoblib) - start_memory
   805  180.559 MiB    0.000 MiB       return result
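The line-by-line table above is standard `memory_profiler` output: `delayed_play` is decorated with `@profile`, and the per-line "Mem usage" / "Increment" columns are recorded while the function runs. A minimal, self-contained sketch of producing such a report for a toy function (the function below is a hypothetical stand-in, not part of SMPyBandits):

```python
# Requires:  pip install memory_profiler
from memory_profiler import profile

@profile
def build_big_list(n=10**6):
    """A toy function whose per-line memory usage will be reported."""
    data = [x * x for x in range(n)]   # the increment on this line shows the list's footprint
    total = sum(data)
    return total

if __name__ == "__main__":
    build_big_list()
    # Because `profile` is imported explicitly, running the script normally
    # (python this_script.py) prints the line-by-line report when the function returns;
    # `python -m memory_profiler this_script.py` works as well.
```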