Loaded experiments configuration from 'configuration.py' :
configuration['policies'] = [{'archtype': <class '...'>, 'params': {'alpha': 1}}, {'archtype': <class '...'>, 'params': {}}, {'archtype': <class '...'>, 'params': {'horizon': 1000}}, {'archtype': <class '...'>, 'params': {'alpha': 1.35}}, {'archtype': <class '...'>, 'params': {}}, {'archtype': <class '...'>, 'params': {'posterior': <class '...'>}}, {'archtype': <class '...'>, 'params': {'klucb': <function ...>}}, {'archtype': <class '...'>, 'params': {'horizon': 1000, 'klucb': <function ...>}}, {'archtype': <class '...'>, 'params': {'horizon': 1000, 'klucb': <function ...>, 'threshold': 'best'}}, {'archtype': <class '...'>, 'params': {'horizon': 1000, 'klucb': <function ...>, 'threshold': 'delayed'}}, {'archtype': <class '...'>, 'params': {'klucb': <function ...>, 'threshold': 'best'}}, {'archtype': <class '...'>, 'params': {'klucb': <function ...>, 'threshold': 'delayed'}}, {'archtype': <class '...'>, 'params': {'posterior': <class '...'>}}, {'archtype': <class '...'>, 'params': {'alpha': 0.5, 'horizon': 1000}}, {'archtype': <class '...'>, 'params': {'alpha': 0.5, 'horizon': 1100}}, {'archtype': <class '...'>, 'params': {}}]
====> TURNING NOPLOTS MODE ON <=====
====> TURNING DEBUG MODE ON <=====
plots/ is already a directory here...
Number of policies in this comparison: 16
Time horizon: 1000
Number of repetitions: 16
Sampling rate for plotting, delta_t_plot: 1
Number of jobs for parallelization: 1

Creating a new MAB problem ...
  Reading arms of this MAB problem from a dictionary 'configuration' = {'arm_type': <class '...'>, 'params': [0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6, 0.7000000000000001, 0.8, 0.9]} ...
 - with 'arm_type' = <class '...'>
 - with 'params' = [0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6, 0.7000000000000001, 0.8, 0.9]
 - with 'arms' = [B(0.1), B(0.2), B(0.3), B(0.4), B(0.5), B(0.6), B(0.7), B(0.8), B(0.9)]
 - with 'means' = [0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9]
 - with 'nbArms' = 9
 - with 'maxArm' = 0.9
 - with 'minArm' = 0.1
This MAB problem has:
 - a [Lai & Robbins] complexity constant C(mu) = 7.52 ...
 - an Optimal Arm Identification factor H_OI(mu) = 48.89% ...
 - with 'arms' represented as: $[B(0.1), B(0.2), B(0.3), B(0.4), B(0.5), B(0.6), B(0.7), B(0.8), B(0.9)^*]$
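For readers unfamiliar with these two constants, the short sketch below reproduces them from the vector of means alone. It is illustrative only: the binary KL divergence and the H_OI formula are assumptions chosen to match the values printed above (7.52 and 48.89%), not code taken from SMPyBandits.

```python
import numpy as np

means = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])
mu_star = means.max()
suboptimal = means[means < mu_star]

def klBern(p, q, eps=1e-15):
    """Binary Kullback-Leibler divergence kl(p, q) between Bernoulli distributions."""
    p, q = np.clip(p, eps, 1 - eps), np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

# [Lai & Robbins] lower-bound constant: sum of gap / kl(mu_k, mu*) over sub-optimal arms
C_mu = np.sum((mu_star - suboptimal) / klBern(suboptimal, mu_star))
print("C(mu) = {:.2f}".format(C_mu))      # -> C(mu) = 7.52

# Optimal Arm Identification factor, assumed here to be the average of 1 - (mu* - mu_k)
# over the sub-optimal arms (this reproduces the 48.89% printed above)
H_OI = np.sum(1.0 - (mu_star - suboptimal)) / means.size
print("H_OI(mu) = {:.2%}".format(H_OI))   # -> H_OI(mu) = 48.89%
```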
Number of environments to try: 1

Evaluating environment: MAB(nbArms: 9, arms: [B(0.1), B(0.2), B(0.3), B(0.4), B(0.5), B(0.6), B(0.7), B(0.8), B(0.9)], minArm: 0.1, maxArm: 0.9)
- Adding policy #1 = {'archtype': <class '...'>, 'params': {'alpha': 1}} ...
  Creating this policy from a dictionary 'self.cfg['policies'][0]' = {'archtype': <class '...'>, 'params': {'alpha': 1}} ...
- Adding policy #2 = {'archtype': <class '...'>, 'params': {}} ...
  Creating this policy from a dictionary 'self.cfg['policies'][1]' = {'archtype': <class '...'>, 'params': {}} ...
- Adding policy #3 = {'archtype': <class '...'>, 'params': {'horizon': 1000}} ...
  Creating this policy from a dictionary 'self.cfg['policies'][2]' = {'archtype': <class '...'>, 'params': {'horizon': 1000}} ...
- Adding policy #4 = {'archtype': <class '...'>, 'params': {'alpha': 1.35}} ...
  Creating this policy from a dictionary 'self.cfg['policies'][3]' = {'archtype': <class '...'>, 'params': {'alpha': 1.35}} ...
- Adding policy #5 = {'archtype': <class '...'>, 'params': {}} ...
  Creating this policy from a dictionary 'self.cfg['policies'][4]' = {'archtype': <class '...'>, 'params': {}} ...
- Adding policy #6 = {'archtype': <class '...'>, 'params': {'posterior': <class '...'>}} ...
  Creating this policy from a dictionary 'self.cfg['policies'][5]' = {'archtype': <class '...'>, 'params': {'posterior': <class '...'>}} ...
- Adding policy #7 = {'archtype': <class '...'>, 'params': {'klucb': <function ...>}} ...
  Creating this policy from a dictionary 'self.cfg['policies'][6]' = {'archtype': <class '...'>, 'params': {'klucb': <function ...>}} ...
- Adding policy #8 = {'archtype': <class '...'>, 'params': {'horizon': 1000, 'klucb': <function ...>}} ...
  Creating this policy from a dictionary 'self.cfg['policies'][7]' = {'archtype': <class '...'>, 'params': {'horizon': 1000, 'klucb': <function ...>}} ...
- Adding policy #9 = {'archtype': <class '...'>, 'params': {'horizon': 1000, 'klucb': <function ...>, 'threshold': 'best'}} ...
  Creating this policy from a dictionary 'self.cfg['policies'][8]' = {'archtype': <class '...'>, 'params': {'horizon': 1000, 'klucb': <function ...>, 'threshold': 'best'}} ...
- Adding policy #10 = {'archtype': <class '...'>, 'params': {'horizon': 1000, 'klucb': <function ...>, 'threshold': 'delayed'}} ...
  Creating this policy from a dictionary 'self.cfg['policies'][9]' = {'archtype': <class '...'>, 'params': {'horizon': 1000, 'klucb': <function ...>, 'threshold': 'delayed'}} ...
- Adding policy #11 = {'archtype': <class '...'>, 'params': {'klucb': <function ...>, 'threshold': 'best'}} ...
  Creating this policy from a dictionary 'self.cfg['policies'][10]' = {'archtype': <class '...'>, 'params': {'klucb': <function ...>, 'threshold': 'best'}} ...
- Adding policy #12 = {'archtype': <class '...'>, 'params': {'klucb': <function ...>, 'threshold': 'delayed'}} ...
  Creating this policy from a dictionary 'self.cfg['policies'][11]' = {'archtype': <class '...'>, 'params': {'klucb': <function ...>, 'threshold': 'delayed'}} ...
- Adding policy #13 = {'archtype': <class '...'>, 'params': {'posterior': <class '...'>}} ...
  Creating this policy from a dictionary 'self.cfg['policies'][12]' = {'archtype': <class '...'>, 'params': {'posterior': <class '...'>}} ...
- Adding policy #14 = {'archtype': <class '...'>, 'params': {'alpha': 0.5, 'horizon': 1000}} ...
  Creating this policy from a dictionary 'self.cfg['policies'][13]' = {'archtype': <class '...'>, 'params': {'alpha': 0.5, 'horizon': 1000}} ...
- Adding policy #15 = {'archtype': <class '...'>, 'params': {'alpha': 0.5, 'horizon': 1100}} ...
  Creating this policy from a dictionary 'self.cfg['policies'][14]' = {'archtype': <class '...'>, 'params': {'alpha': 0.5, 'horizon': 1100}} ...
- Adding policy #16 = {'archtype': <class '...'>, 'params': {}} ...
  Creating this policy from a dictionary 'self.cfg['policies'][15]' = {'archtype': <class '...'>, 'params': {}} ...
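The class objects originally printed in each 'archtype' field were lost when this log was captured. As an illustration only, here is roughly what a configuration entry and its instantiation look like; the import path, the two class names (matched to the policy names printed below), and the exact constructor call are assumptions, not taken from the log.

```python
# Hypothetical reconstruction of the first two entries of configuration['policies'].
from SMPyBandits.Policies import UCBalpha, MOSS  # assumed import path

configuration = {
    "policies": [
        {"archtype": UCBalpha, "params": {"alpha": 1}},   # -> shows up below as UCB(alpha=1)
        {"archtype": MOSS,     "params": {}},             # -> shows up below as MOSS
        # ... the 14 other entries listed above
    ]
}

# Each policy is then built from its dictionary, essentially as archtype(nbArms, **params):
nbArms = 9
policies = [cfg["archtype"](nbArms, **cfg["params"]) for cfg in configuration["policies"]]
```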
- Evaluating policy #1/16: UCB($\alpha=1$) ...
Estimated order by the policy UCB($\alpha=1$) after 1000 steps: [2 0 1 4 3 6 5 7 8] ...
  ==> Optimal arm identification: 100.00% (relative success)...
  ==> Manhattan distance from optimal ordering: 80.25% (relative success)...
  ==> Gestalt distance from optimal ordering: 66.67% (relative success)...
  ==> Mean distance from optimal ordering: 73.46% (relative success)...
- Evaluating policy #2/16: MOSS ...
Estimated order by the policy MOSS after 1000 steps: [1 2 3 0 5 4 6 7 8] ...
  ==> Optimal arm identification: 100.00% (relative success)...
  ==> Manhattan distance from optimal ordering: 80.25% (relative success)...
  ==> Gestalt distance from optimal ordering: 77.78% (relative success)...
  ==> Mean distance from optimal ordering: 79.01% (relative success)...
- Evaluating policy #3/16: MOSS-H($T=1000$) ...
Estimated order by the policy MOSS-H($T=1000$) after 1000 steps: [1 4 5 2 6 0 3 7 8] ...
  ==> Optimal arm identification: 100.00% (relative success)...
  ==> Manhattan distance from optimal ordering: 55.56% (relative success)...
  ==> Gestalt distance from optimal ordering: 66.67% (relative success)...
  ==> Mean distance from optimal ordering: 61.11% (relative success)...
- Evaluating policy #4/16: MOSS-Anytime($\alpha=1.35$) ...
Estimated order by the policy MOSS-Anytime($\alpha=1.35$) after 1000 steps: [0 1 2 3 5 6 4 7 8] ...
  ==> Optimal arm identification: 100.00% (relative success)...
  ==> Manhattan distance from optimal ordering: 90.12% (relative success)...
  ==> Gestalt distance from optimal ordering: 88.89% (relative success)...
  ==> Mean distance from optimal ordering: 89.51% (relative success)...
- Evaluating policy #5/16: DMED$^+$(Bern) ...
Estimated order by the policy DMED$^+$(Bern) after 1000 steps: [5 4 8 3 6 0 1 2 7] ...
  ==> Optimal arm identification: 88.89% (relative success)...
  ==> Manhattan distance from optimal ordering: 20.99% (relative success)...
  ==> Gestalt distance from optimal ordering: 44.44% (relative success)...
  ==> Mean distance from optimal ordering: 32.72% (relative success)...
- Evaluating policy #6/16: Thompson ...
Estimated order by the policy Thompson after 1000 steps: [3 6 5 1 2 0 4 7 8] ...
  ==> Optimal arm identification: 100.00% (relative success)...
  ==> Manhattan distance from optimal ordering: 45.68% (relative success)...
  ==> Gestalt distance from optimal ordering: 55.56% (relative success)...
  ==> Mean distance from optimal ordering: 50.62% (relative success)...
- Evaluating policy #7/16: kl-UCB ...
Estimated order by the policy kl-UCB after 1000 steps: [0 1 2 3 4 6 5 7 8] ...
  ==> Optimal arm identification: 100.00% (relative success)...
  ==> Manhattan distance from optimal ordering: 95.06% (relative success)...
  ==> Gestalt distance from optimal ordering: 88.89% (relative success)...
  ==> Mean distance from optimal ordering: 91.98% (relative success)...
- Evaluating policy #8/16: kl-UCB$^{++}$($T=1000$) ...
Estimated order by the policy kl-UCB$^{++}$($T=1000$) after 1000 steps: [4 0 5 3 1 2 6 7 8] ...
  ==> Optimal arm identification: 100.00% (relative success)...
  ==> Manhattan distance from optimal ordering: 65.43% (relative success)...
  ==> Gestalt distance from optimal ordering: 66.67% (relative success)...
  ==> Mean distance from optimal ordering: 66.05% (relative success)...
- Evaluating policy #9/16: kl-UCB-switch($T=1000$) ...
Estimated order by the policy kl-UCB-switch($T=1000$) after 1000 steps: [3 0 1 2 5 4 6 7 8] ...
  ==> Optimal arm identification: 100.00% (relative success)...
  ==> Manhattan distance from optimal ordering: 80.25% (relative success)...
  ==> Gestalt distance from optimal ordering: 77.78% (relative success)...
  ==> Mean distance from optimal ordering: 79.01% (relative success)...
- Evaluating policy #10/16: kl-UCB-switch($T=1000$, delayed f) ...
Estimated order by the policy kl-UCB-switch($T=1000$, delayed f) after 1000 steps: [4 5 0 2 6 1 3 7 8] ...
  ==> Optimal arm identification: 100.00% (relative success)...
  ==> Manhattan distance from optimal ordering: 50.62% (relative success)...
  ==> Gestalt distance from optimal ordering: 55.56% (relative success)...
  ==> Mean distance from optimal ordering: 53.09% (relative success)...
- Evaluating policy #11/16: kl-UCB-switch ...
Estimated order by the policy kl-UCB-switch after 1000 steps: [2 3 1 0 6 7 4 8 5] ...
  ==> Optimal arm identification: 66.67% (relative success)...
  ==> Manhattan distance from optimal ordering: 55.56% (relative success)...
  ==> Gestalt distance from optimal ordering: 55.56% (relative success)...
  ==> Mean distance from optimal ordering: 55.56% (relative success)...
- Evaluating policy #12/16: kl-UCB-switch(delayed f) ...
Estimated order by the policy kl-UCB-switch(delayed f) after 1000 steps: [1 5 0 2 3 4 6 7 8] ...
  ==> Optimal arm identification: 100.00% (relative success)...
  ==> Manhattan distance from optimal ordering: 75.31% (relative success)...
  ==> Gestalt distance from optimal ordering: 77.78% (relative success)...
  ==> Mean distance from optimal ordering: 76.54% (relative success)...
- Evaluating policy #13/16: BayesUCB ...
Estimated order by the policy BayesUCB after 1000 steps: [4 0 1 3 2 5 7 6 8] ...
  ==> Optimal arm identification: 100.00% (relative success)...
  ==> Manhattan distance from optimal ordering: 75.31% (relative success)...
  ==> Gestalt distance from optimal ordering: 66.67% (relative success)...
  ==> Mean distance from optimal ordering: 70.99% (relative success)...
- Evaluating policy #14/16: AdBandits($T=1000$, $\alpha=0.5$) ...
Estimated order by the policy AdBandits($T=1000$, $\alpha=0.5$) after 1000 steps: [3 1 2 4 8 6 0 7 5] ...
  ==> Optimal arm identification: 66.67% (relative success)...
  ==> Manhattan distance from optimal ordering: 55.56% (relative success)...
  ==> Gestalt distance from optimal ordering: 44.44% (relative success)...
  ==> Mean distance from optimal ordering: 50.00% (relative success)...
- Evaluating policy #15/16: ApprFHG($T=1100$) ...
Estimated order by the policy ApprFHG($T=1100$) after 1000 steps: [3 0 4 1 2 5 6 7 8] ...
  ==> Optimal arm identification: 100.00% (relative success)...
  ==> Manhattan distance from optimal ordering: 75.31% (relative success)...
  ==> Gestalt distance from optimal ordering: 77.78% (relative success)...
  ==> Mean distance from optimal ordering: 76.54% (relative success)...
- Evaluating policy #16/16: $\mathrm{UCB}_{d=d_{lb}}$($c=0$) ...
Estimated order by the policy $\mathrm{UCB}_{d=d_{lb}}$($c=0$) after 1000 steps: [0 2 5 1 4 7 3 6 8] ...
  ==> Optimal arm identification: 100.00% (relative success)...
  ==> Manhattan distance from optimal ordering: 70.37% (relative success)...
  ==> Gestalt distance from optimal ordering: 55.56% (relative success)...
  ==> Mean distance from optimal ordering: 62.96% (relative success)...
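The four "relative success" scores above compare each policy's estimated ordering of the arms (worst to best) against the identity ordering [0, 1, ..., 8]. Their exact definitions are not shown in the log; the sketch below uses definitions reverse-engineered to reproduce the printed numbers (illustrated on kl-UCB's order), so treat the function bodies as assumptions rather than the Evaluator's actual code.

```python
import numpy as np
from difflib import SequenceMatcher

means = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9])

def optimal_arm_identification(order, means):
    """Mean of the arm ranked last (i.e. estimated best), relative to the true best mean."""
    return means[order[-1]] / means.max()

def manhattan(order):
    """One minus the total index displacement, normalised by its rough maximum n^2 / 2."""
    n = len(order)
    return 1.0 - np.abs(np.asarray(order) - np.arange(n)).sum() / (n ** 2 / 2.0)

def gestalt(order):
    """Ratcliff/Obershelp ('gestalt') similarity to the identity ordering, via difflib."""
    return SequenceMatcher(None, list(order), list(range(len(order)))).ratio()

order = [0, 1, 2, 3, 4, 6, 5, 7, 8]   # estimated order printed above for kl-UCB
print("{:.2%}".format(optimal_arm_identification(order, means)))   # 100.00%
print("{:.2%}".format(manhattan(order)))                           #  95.06%
print("{:.2%}".format(gestalt(order)))                             #  88.89%
print("{:.2%}".format((manhattan(order) + gestalt(order)) / 2))    #  91.98% ("mean distance")
```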
Giving the vector of final regrets ...
For policy #0 called 'UCB($\alpha=1$)' ...
  Last regrets (for all repetitions) have:
  Min of last regrets R_T = 41.5
  Mean of last regrets R_T = 57.1
  Median of last regrets R_T = 59.9
  Max of last regrets R_T = 69.8
  STD of last regrets R_T = 8.98
For policy #1 called 'MOSS' ...
  Last regrets (for all repetitions) have:
  Min of last regrets R_T = 37.1
  Mean of last regrets R_T = 45.8
  Median of last regrets R_T = 44.5
  Max of last regrets R_T = 64.7
  STD of last regrets R_T = 7.16
For policy #2 called 'MOSS-H($T=1000$)' ...
  Last regrets (for all repetitions) have:
  Min of last regrets R_T = 36.7
  Mean of last regrets R_T = 44.2
  Median of last regrets R_T = 44.1
  Max of last regrets R_T = 53.6
  STD of last regrets R_T = 5.7
For policy #3 called 'MOSS-Anytime($\alpha=1.35$)' ...
  Last regrets (for all repetitions) have:
  Min of last regrets R_T = 39.1
  Mean of last regrets R_T = 48.1
  Median of last regrets R_T = 47.3
  Max of last regrets R_T = 56.6
  STD of last regrets R_T = 5.01
For policy #4 called 'DMED$^+$(Bern)' ...
  Last regrets (for all repetitions) have:
  Min of last regrets R_T = 26
  Mean of last regrets R_T = 35.9
  Median of last regrets R_T = 33.5
  Max of last regrets R_T = 61
  STD of last regrets R_T = 9.54
For policy #5 called 'Thompson' ...
  Last regrets (for all repetitions) have:
  Min of last regrets R_T = 12.7
  Mean of last regrets R_T = 26.9
  Median of last regrets R_T = 26.3
  Max of last regrets R_T = 50.7
  STD of last regrets R_T = 9.21
For policy #6 called 'kl-UCB' ...
  Last regrets (for all repetitions) have:
  Min of last regrets R_T = 27.1
  Mean of last regrets R_T = 34.4
  Median of last regrets R_T = 32.3
  Max of last regrets R_T = 48.1
  STD of last regrets R_T = 5.8
For policy #7 called 'kl-UCB$^{++}$($T=1000$)' ...
  Last regrets (for all repetitions) have:
  Min of last regrets R_T = 26.8
  Mean of last regrets R_T = 34.6
  Median of last regrets R_T = 34
  Max of last regrets R_T = 43.8
  STD of last regrets R_T = 4.91
For policy #8 called 'kl-UCB-switch($T=1000$)' ...
  Last regrets (for all repetitions) have:
  Min of last regrets R_T = 20.2
  Mean of last regrets R_T = 32.9
  Median of last regrets R_T = 32.7
  Max of last regrets R_T = 47.2
  STD of last regrets R_T = 6.39
For policy #9 called 'kl-UCB-switch($T=1000$, delayed f)' ...
  Last regrets (for all repetitions) have:
  Min of last regrets R_T = 20.4
  Mean of last regrets R_T = 33.5
  Median of last regrets R_T = 34.4
  Max of last regrets R_T = 51.6
  STD of last regrets R_T = 8.18
For policy #10 called 'kl-UCB-switch' ...
  Last regrets (for all repetitions) have:
  Min of last regrets R_T = 19.3
  Mean of last regrets R_T = 34.7
  Median of last regrets R_T = 35.5
  Max of last regrets R_T = 48.9
  STD of last regrets R_T = 7.69
For policy #11 called 'kl-UCB-switch(delayed f)' ...
  Last regrets (for all repetitions) have:
  Min of last regrets R_T = 22.8
  Mean of last regrets R_T = 33.1
  Median of last regrets R_T = 30.5
  Max of last regrets R_T = 48
  STD of last regrets R_T = 7.16
For policy #12 called 'BayesUCB' ...
  Last regrets (for all repetitions) have:
  Min of last regrets R_T = 14.7
  Mean of last regrets R_T = 25.2
  Median of last regrets R_T = 21.9
  Max of last regrets R_T = 48.2
  STD of last regrets R_T = 9.55
For policy #13 called 'AdBandits($T=1000$, $\alpha=0.5$)' ...
  Last regrets (for all repetitions) have:
  Min of last regrets R_T = 10.6
  Mean of last regrets R_T = 19.1
  Median of last regrets R_T = 17.4
  Max of last regrets R_T = 49.7
  STD of last regrets R_T = 9.11
For policy #14 called 'ApprFHG($T=1100$)' ...
  Last regrets (for all repetitions) have:
  Min of last regrets R_T = 36.1
  Mean of last regrets R_T = 51.7
  Median of last regrets R_T = 50.6
  Max of last regrets R_T = 68.3
  STD of last regrets R_T = 7.01
For policy #15 called '$\mathrm{UCB}_{d=d_{lb}}$($c=0$)' ...
  Last regrets (for all repetitions) have:
  Min of last regrets R_T = 16.4
  Mean of last regrets R_T = 22.5
  Median of last regrets R_T = 23
  Max of last regrets R_T = 27.4
  STD of last regrets R_T = 3.21

Giving the final ranks ...
Final ranking for this environment #0 :
- Policy 'AdBandits($T=1000$, $\alpha=0.5$)' was ranked 1 / 16 for this simulation (last regret = 19.063).
- Policy '$\mathrm{UCB}_{d=d_{lb}}$($c=0$)' was ranked 2 / 16 for this simulation (last regret = 22.513).
- Policy 'BayesUCB' was ranked 3 / 16 for this simulation (last regret = 25.038).
- Policy 'Thompson' was ranked 4 / 16 for this simulation (last regret = 26.894).
- Policy 'kl-UCB-switch($T=1000$)' was ranked 5 / 16 for this simulation (last regret = 32.856).
- Policy 'kl-UCB-switch(delayed f)' was ranked 6 / 16 for this simulation (last regret = 33.025).
- Policy 'kl-UCB-switch($T=1000$, delayed f)' was ranked 7 / 16 for this simulation (last regret = 33.506).
- Policy 'kl-UCB' was ranked 8 / 16 for this simulation (last regret = 34.388).
- Policy 'kl-UCB-switch' was ranked 9 / 16 for this simulation (last regret = 34.569).
- Policy 'kl-UCB$^{++}$($T=1000$)' was ranked 10 / 16 for this simulation (last regret = 34.65).
- Policy 'DMED$^+$(Bern)' was ranked 11 / 16 for this simulation (last regret = 35.875).
- Policy 'MOSS-H($T=1000$)' was ranked 12 / 16 for this simulation (last regret = 44.238).
- Policy 'MOSS' was ranked 13 / 16 for this simulation (last regret = 45.7).
- Policy 'MOSS-Anytime($\alpha=1.35$)' was ranked 14 / 16 for this simulation (last regret = 47.869).
- Policy 'ApprFHG($T=1100$)' was ranked 15 / 16 for this simulation (last regret = 51.613).
- Policy 'UCB($\alpha=1$)' was ranked 16 / 16 for this simulation (last regret = 57.056).
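For reference, the "last regret" $R_T$ summarised above, and used by the ranking (which appears to order policies by the mean of $R_T$ over the 16 repetitions), is the cumulative regret at the horizon $T = 1000$, i.e. the shortfall with respect to always playing the best arm ($\mu^* = 0.9$ here). Whether it is computed from the realised rewards or from the arm means is not shown in the log, but its standard definition is

$$ R_T \;=\; T\,\mu^* \,-\, \sum_{t=1}^{T} \mu_{A_t}, \qquad \mathbb{E}[R_T] \;=\; \sum_{k\,:\,\mu_k < \mu^*} (\mu^* - \mu_k)\,\mathbb{E}[N_k(T)], $$

where $A_t$ is the arm played at time $t$ and $N_k(T)$ the number of pulls of arm $k$ up to time $T$.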
Giving the mean and std running times ...
For policy #1 called 'MOSS' ...
    74.9 ms ± 11.9 ms per loop (mean ± std. dev. of 16 runs)
For policy #14 called 'ApprFHG($T=1100$)' ...
    76.6 ms ± 4.49 ms per loop (mean ± std. dev. of 16 runs)
For policy #3 called 'MOSS-Anytime($\alpha=1.35$)' ...
    77.1 ms ± 7.34 ms per loop (mean ± std. dev. of 16 runs)
For policy #2 called 'MOSS-H($T=1000$)' ...
    77.9 ms ± 5.61 ms per loop (mean ± std. dev. of 16 runs)
For policy #0 called 'UCB($\alpha=1$)' ...
    85.3 ms ± 13.1 ms per loop (mean ± std. dev. of 16 runs)
For policy #5 called 'Thompson' ...
    88.4 ms ± 10.5 ms per loop (mean ± std. dev. of 16 runs)
For policy #4 called 'DMED$^+$(Bern)' ...
    114 ms ± 17.4 ms per loop (mean ± std. dev. of 16 runs)
For policy #12 called 'BayesUCB' ...
    125 ms ± 6.89 ms per loop (mean ± std. dev. of 16 runs)
For policy #13 called 'AdBandits($T=1000$, $\alpha=0.5$)' ...
    131 ms ± 6.86 ms per loop (mean ± std. dev. of 16 runs)
For policy #15 called '$\mathrm{UCB}_{d=d_{lb}}$($c=0$)' ...
    142 ms ± 15.5 ms per loop (mean ± std. dev. of 16 runs)
For policy #11 called 'kl-UCB-switch(delayed f)' ...
    144 ms ± 5.23 ms per loop (mean ± std. dev. of 16 runs)
For policy #8 called 'kl-UCB-switch($T=1000$)' ...
    161 ms ± 13.3 ms per loop (mean ± std. dev. of 16 runs)
For policy #10 called 'kl-UCB-switch' ...
    169 ms ± 30.4 ms per loop (mean ± std. dev. of 16 runs)
For policy #6 called 'kl-UCB' ...
    171 ms ± 13.6 ms per loop (mean ± std. dev. of 16 runs)
For policy #7 called 'kl-UCB$^{++}$($T=1000$)' ...
    174 ms ± 14.4 ms per loop (mean ± std. dev. of 16 runs)
For policy #9 called 'kl-UCB-switch($T=1000$, delayed f)' ...
    209 ms ± 35.9 ms per loop (mean ± std. dev. of 16 runs)

Done for simulations main.py ...
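The per-line timing report that follows comes from the line_profiler / kernprof tool (that is what the `@profile` decorator and the `main.py.lprof` file indicate; the usual invocation is something like `kernprof -l -v main.py`). As a self-contained illustration of how such a report is produced, here is a minimal sketch that profiles a stand-in function; it is not SMPyBandits code.

```python
# Minimal line_profiler usage sketch (pip install line_profiler).
from line_profiler import LineProfiler

def dummy_play(horizon=1000):
    """Dummy stand-in for Evaluator.delayed_play, just to have a hot loop to time."""
    total = 0.0
    for t in range(horizon):      # the hot loop, analogous to `for t in prettyRange` below
        total += (t % 9) * 0.1
    return total

profiler = LineProfiler()
profiled = profiler(dummy_play)   # wrap the function so that every line is timed
profiled(horizon=1000)
profiler.print_stats()            # prints a "Line # / Hits / Time / Per Hit / % Time" table
```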
Wrote profile results to main.py.lprof
Timer unit: 1e-06 s

Total time: 28.982 s
File: ./SMPyBandits/Environment/Evaluator.py
Function: delayed_play at line 728

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   728                                           @profile
   729                                           def delayed_play(env, policy, horizon,
   730                                                            random_shuffle=random_shuffle, random_invert=random_invert, nb_random_events=nb_random_events,
   731                                                            seed=None, allrewards=None, repeatId=0,
   732                                                            useJoblib=False):
   733                                               """Helper function for the parallelization."""
   734       256       1204.0      4.7      0.0      start_time = time.time()
   735       256       3584.0     14.0      0.0      start_memory = getCurrentMemory(thread=useJoblib)
   736                                               # Give a unique seed to random & numpy.random for each call of this function
   737       256        424.0      1.7      0.0      try:
   738       256        478.0      1.9      0.0          if seed is not None:
   739                                                       random.seed(seed)
   740                                                       np.random.seed(seed)
   741                                               except (ValueError, SystemError):
   742                                                   print("Warning: setting random.seed and np.random.seed seems to not be available. Are you using Windows?")  # XXX
   743                                               # We have to deepcopy because this function is Parallel-ized
   744       256        417.0      1.6      0.0      if random_shuffle or random_invert:
   745                                                   env = deepcopy(env)  # XXX this uses a LOT of RAM memory!!!
   746       256        503.0      2.0      0.0      means = env.means
   747       256        520.0      2.0      0.0      if env.isDynamic:
   748                                                   means = env.newRandomArms()
   749       256      97505.0    380.9      0.3      policy = deepcopy(policy)  # XXX this uses a LOT of RAM memory!!!
   750
   751       256      29591.0    115.6      0.1      indexes_bestarm = np.nonzero(np.isclose(env.means, env.maxArm))[0]
   752
   753                                               # Start game
   754       256       3716.0     14.5      0.0      policy.startGame()
   755       256      58523.0    228.6      0.2      result = Result(env.nbArms, horizon, indexes_bestarm=indexes_bestarm, means=means)  # One Result object, for every policy
   756
   757                                               # XXX Experimental support for random events: shuffling or inverting the list of arms, at these time steps
   758       256       4265.0     16.7      0.0      t_events = [i * int(horizon / float(nb_random_events)) for i in range(nb_random_events)]
   759       256        436.0      1.7      0.0      if nb_random_events is None or nb_random_events <= 0:
   760                                                   random_shuffle = False
   761                                                   random_invert = False
   762
   763       256       5776.0     22.6      0.0      prettyRange = tqdm(range(horizon), desc="Time t") if repeatId == 0 else range(horizon)
   764    256256     426426.0      1.7      1.5      for t in prettyRange:
   765    256000   22452432.0     87.7     77.5          choice = policy.choice()
   766
   767                                                   # XXX do this quicker!?
   768    256000     441122.0      1.7      1.5          if allrewards is None:
   769    256000    2259099.0      8.8      7.8              reward = env.draw(choice, t)
   770                                                   else:
   771                                                       reward = allrewards[choice, repeatId, t]
   772
   773    256000    1555841.0      6.1      5.4          policy.getReward(choice, reward)
   774
   775                                                   # Finally we store the results
   776    256000     934676.0      3.7      3.2          result.store(t, choice, reward)
   777
   778                                                   # XXX Experimental : shuffle the arms at the middle of the simulation
   779    256000     346394.0      1.4      1.2          if random_shuffle and t in t_events:
   780                                                       indexes_bestarm = env.new_order_of_arm(shuffled(env.arms))
   781                                                       result.change_in_arms(t, indexes_bestarm)
   782                                                       if repeatId == 0:
   783                                                           print("\nShuffling the arms at time t = {} ...".format(t))  # DEBUG
   784                                                   # XXX Experimental : invert the order of the arms at the middle of the simulation
   785    256000     335128.0      1.3      1.2          if random_invert and t in t_events:
   786                                                       indexes_bestarm = env.new_order_of_arm(env.arms[::-1])
   787                                                       result.change_in_arms(t, indexes_bestarm)
   788                                                       if repeatId == 0:
   789                                                           print("\nInverting the order of the arms at time t = {} ...".format(t))  # DEBUG
   790
   791                                               # Print the quality of estimation of arm ranking for this policy, just for 1st repetition
   792       256        414.0      1.6      0.0      if repeatId == 0 and hasattr(policy, 'estimatedOrder'):
   793        16       2012.0    125.8      0.0          order = policy.estimatedOrder()
   794        16       4648.0    290.5      0.0          print("\nEstimated order by the policy {} after {} steps: {} ...".format(policy, horizon, order))
   795        16       1487.0     92.9      0.0          print("  ==> Optimal arm identification: {:.2%} (relative success)...".format(weightedDistance(order, env.means, n=1)))
   796        16       1107.0     69.2      0.0          print("  ==> Manhattan distance from optimal ordering: {:.2%} (relative success)...".format(manhattan(order)))
   797                                                   # print("  ==> Kendell Tau distance from optimal ordering: {:.2%} (relative success)...".format(kendalltau(order)))
   798                                                   # print("  ==> Spearman distance from optimal ordering: {:.2%} (relative success)...".format(spearmanr(order)))
   799        16       3995.0    249.7      0.0          print("  ==> Gestalt distance from optimal ordering: {:.2%} (relative success)...".format(gestalt(order)))
   800        16       5051.0    315.7      0.0          print("  ==> Mean distance from optimal ordering: {:.2%} (relative success)...".format(meanDistance(order)))
   801
   802                                               # Finally, store running time and consumed memory
   803       256       1112.0      4.3      0.0      result.running_time = time.time() - start_time
   804       256       3729.0     14.6      0.0      result.memory_consumption = getCurrentMemory(thread=useJoblib) - start_memory
   805       256        346.0      1.4      0.0      return result