Loaded experiments configuration from 'configuration.py':
configuration['policies'] = [{'archtype': , 'params': {'alpha': 1}}, {'archtype': , 'params': {}}, {'archtype': , 'params': {'horizon': 10000}}, {'archtype': , 'params': {'alpha': 1.35}}, {'archtype': , 'params': {}}, {'archtype': , 'params': {'posterior': }}, {'archtype': , 'params': {'klucb': }}, {'archtype': , 'params': {'horizon': 10000, 'klucb': }}, {'archtype': , 'params': {'horizon': 10000, 'klucb': , 'threshold': 'best'}}, {'archtype': , 'params': {'horizon': 10000, 'klucb': , 'threshold': 'delayed'}}, {'archtype': , 'params': {'klucb': , 'threshold': 'best'}}, {'archtype': , 'params': {'klucb': , 'threshold': 'delayed'}}, {'archtype': , 'params': {'posterior': }}, {'archtype': , 'params': {'alpha': 0.5, 'horizon': 10000}}, {'archtype': , 'params': {'alpha': 0.5, 'horizon': 10500}}, {'archtype': , 'params': {}}]
====> TURNING NOPLOTS MODE ON <=====
====> TURNING DEBUG MODE ON <=====
plots/ is already a directory here...
Number of policies in this comparison: 16
Time horizon: 10000
Number of repetitions: 1
Sampling rate for plotting, delta_t_plot: 1
Number of jobs for parallelization: 1
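(Note: the class objects that followed each 'archtype' key were stripped when this log was captured, which is why the dictionaries above show empty values. For orientation, the relevant part of configuration.py has roughly the shape sketched below; the imported classes and the key names are inferred from the settings printed above, not copied from the actual file.)

```python
# Hedged sketch of the configuration implied by the settings above.
# UCBalpha / Thompson and the key names are assumptions, not the file's verbatim content.
from Policies import UCBalpha, Thompson

HORIZON = 10000

configuration = {
    "horizon": HORIZON,      # Time horizon: 10000
    "repetitions": 1,        # Number of repetitions: 1
    "n_jobs": 1,             # Number of jobs for parallelization: 1
    "delta_t_plot": 1,       # Sampling rate for plotting
    "policies": [
        {"archtype": UCBalpha, "params": {"alpha": 1}},
        {"archtype": Thompson, "params": {}},
        # ... 14 more entries of the same {'archtype': ..., 'params': {...}} shape
    ],
}
```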
Creating a new MAB problem ...
Reading arms of this MAB problem from a dictionary 'configuration' = {'arm_type': , 'params': [0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6, 0.7000000000000001, 0.8, 0.9]} ...
- with 'arm_type' =
- with 'params' = [0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6, 0.7000000000000001, 0.8, 0.9]
- with 'arms' = [B(0.1), B(0.2), B(0.3), B(0.4), B(0.5), B(0.6), B(0.7), B(0.8), B(0.9)]
- with 'means' = [0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9]
- with 'nbArms' = 9
- with 'maxArm' = 0.9
- with 'minArm' = 0.1
This MAB problem has:
- a [Lai & Robbins] complexity constant C(mu) = 7.52 ...
- an Optimal Arm Identification factor H_OI(mu) = 48.89% ...
- with 'arms' represented as: $[B(0.1), B(0.2), B(0.3), B(0.4), B(0.5), B(0.6), B(0.7), B(0.8), B(0.9)^*]$
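(The Lai & Robbins constant printed above can be checked directly from the nine Bernoulli means, using the standard lower-bound formula C(mu) = sum over the sub-optimal arms of (mu* - mu_k) / kl(mu_k, mu*), with kl the binary Kullback-Leibler divergence. The sketch below is a local reconstruction, not the library's own code.)

```python
import numpy as np

def klBern(p, q, eps=1e-15):
    """Binary Kullback-Leibler divergence kl(p, q), clamped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

means = np.arange(0.1, 1.0, 0.1)   # the 9 Bernoulli means of this problem
best = max(means)

# Lai & Robbins complexity constant: sum of gap / kl over the sub-optimal arms
C_mu = sum((best - mu) / klBern(mu, best) for mu in means if mu < best)
print(round(C_mu, 2))              # -> 7.52, matching the value above
```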
Number of environments to try: 1
Evaluating environment: MAB(nbArms: 9, arms: [B(0.1), B(0.2), B(0.3), B(0.4), B(0.5), B(0.6), B(0.7), B(0.8), B(0.9)], minArm: 0.1, maxArm: 0.9)
- Adding policy #1 = {'archtype': , 'params': {'alpha': 1}} ...
Creating this policy from a dictionary 'self.cfg['policies'][0]' = {'archtype': , 'params': {'alpha': 1}} ...
- Adding policy #2 = {'archtype': , 'params': {}} ...
Creating this policy from a dictionary 'self.cfg['policies'][1]' = {'archtype': , 'params': {}} ...
- Adding policy #3 = {'archtype': , 'params': {'horizon': 10000}} ...
Creating this policy from a dictionary 'self.cfg['policies'][2]' = {'archtype': , 'params': {'horizon': 10000}} ...
- Adding policy #4 = {'archtype': , 'params': {'alpha': 1.35}} ...
Creating this policy from a dictionary 'self.cfg['policies'][3]' = {'archtype': , 'params': {'alpha': 1.35}} ...
- Adding policy #5 = {'archtype': , 'params': {}} ...
Creating this policy from a dictionary 'self.cfg['policies'][4]' = {'archtype': , 'params': {}} ...
- Adding policy #6 = {'archtype': , 'params': {'posterior': }} ...
Creating this policy from a dictionary 'self.cfg['policies'][5]' = {'archtype': , 'params': {'posterior': }} ...
- Adding policy #7 = {'archtype': , 'params': {'klucb': }} ...
Creating this policy from a dictionary 'self.cfg['policies'][6]' = {'archtype': , 'params': {'klucb': }} ...
- Adding policy #8 = {'archtype': , 'params': {'horizon': 10000, 'klucb': }} ...
Creating this policy from a dictionary 'self.cfg['policies'][7]' = {'archtype': , 'params': {'horizon': 10000, 'klucb': }} ...
- Adding policy #9 = {'archtype': , 'params': {'horizon': 10000, 'klucb': , 'threshold': 'best'}} ...
Creating this policy from a dictionary 'self.cfg['policies'][8]' = {'archtype': , 'params': {'horizon': 10000, 'klucb': , 'threshold': 'best'}} ...
- Adding policy #10 = {'archtype': , 'params': {'horizon': 10000, 'klucb': , 'threshold': 'delayed'}} ...
Creating this policy from a dictionary 'self.cfg['policies'][9]' = {'archtype': , 'params': {'horizon': 10000, 'klucb': , 'threshold': 'delayed'}} ...
- Adding policy #11 = {'archtype': , 'params': {'klucb': , 'threshold': 'best'}} ...
Creating this policy from a dictionary 'self.cfg['policies'][10]' = {'archtype': , 'params': {'klucb': , 'threshold': 'best'}} ...
- Adding policy #12 = {'archtype': , 'params': {'klucb': , 'threshold': 'delayed'}} ...
Creating this policy from a dictionary 'self.cfg['policies'][11]' = {'archtype': , 'params': {'klucb': , 'threshold': 'delayed'}} ...
- Adding policy #13 = {'archtype': , 'params': {'posterior': }} ...
Creating this policy from a dictionary 'self.cfg['policies'][12]' = {'archtype': , 'params': {'posterior': }} ...
- Adding policy #14 = {'archtype': , 'params': {'alpha': 0.5, 'horizon': 10000}} ...
Creating this policy from a dictionary 'self.cfg['policies'][13]' = {'archtype': , 'params': {'alpha': 0.5, 'horizon': 10000}} ...
- Adding policy #15 = {'archtype': , 'params': {'alpha': 0.5, 'horizon': 10500}} ...
Creating this policy from a dictionary 'self.cfg['policies'][14]' = {'archtype': , 'params': {'alpha': 0.5, 'horizon': 10500}} ...
- Adding policy #16 = {'archtype': , 'params': {}} ...
Creating this policy from a dictionary 'self.cfg['policies'][15]' = {'archtype': , 'params': {}} ...
- Evaluating policy #1/16: UCB($\alpha=1$) ...
Estimated order by the policy UCB($\alpha=1$) after 10000 steps: [0 1 4 5 6 2 3 7 8] ...
==> Optimal arm identification: 100.00% (relative success)...
==> Manhattan distance from optimal ordering: 70.37% (relative success)...
==> Gestalt distance from optimal ordering: 77.78% (relative success)...
==> Mean distance from optimal ordering: 74.07% (relative success)...
- Evaluating policy #2/16: MOSS ...
Estimated order by the policy MOSS after 10000 steps: [0 1 2 4 6 7 5 3 8] ...
==> Optimal arm identification: 100.00% (relative success)...
==> Manhattan distance from optimal ordering: 75.31% (relative success)...
==> Gestalt distance from optimal ordering: 77.78% (relative success)...
==> Mean distance from optimal ordering: 76.54% (relative success)...
- Evaluating policy #3/16: MOSS-H($T=10000$) ...
Estimated order by the policy MOSS-H($T=10000$) after 10000 steps: [1 0 5 2 6 7 3 4 8] ...
==> Optimal arm identification: 100.00% (relative success)...
==> Manhattan distance from optimal ordering: 60.49% (relative success)...
==> Gestalt distance from optimal ordering: 55.56% (relative success)...
==> Mean distance from optimal ordering: 58.02% (relative success)...
- Evaluating policy #4/16: MOSS-Anytime($\alpha=1.35$) ...
Estimated order by the policy MOSS-Anytime($\alpha=1.35$) after 10000 steps: [1 0 3 2 5 6 7 4 8] ...
==> Optimal arm identification: 100.00% (relative success)...
==> Manhattan distance from optimal ordering: 75.31% (relative success)...
==> Gestalt distance from optimal ordering: 66.67% (relative success)...
==> Mean distance from optimal ordering: 70.99% (relative success)...
- Evaluating policy #5/16: DMED$^+$(Bern) ...
Estimated order by the policy DMED$^+$(Bern) after 10000 steps: [7 3 8 1 6 4 5 0 2] ...
==> Optimal arm identification: 33.33% (relative success)...
==> Manhattan distance from optimal ordering: 16.05% (relative success)...
==> Gestalt distance from optimal ordering: 33.33% (relative success)...
==> Mean distance from optimal ordering: 24.69% (relative success)...
- Evaluating policy #6/16: Thompson ...
Estimated order by the policy Thompson after 10000 steps: [1 3 2 4 5 0 6 7 8] ...
==> Optimal arm identification: 100.00% (relative success)...
==> Manhattan distance from optimal ordering: 75.31% (relative success)...
==> Gestalt distance from optimal ordering: 77.78% (relative success)...
==> Mean distance from optimal ordering: 76.54% (relative success)...
- Evaluating policy #7/16: kl-UCB ...
Estimated order by the policy kl-UCB after 10000 steps: [2 3 6 0 4 1 5 7 8] ...
==> Optimal arm identification: 100.00% (relative success)...
==> Manhattan distance from optimal ordering: 60.49% (relative success)...
==> Gestalt distance from optimal ordering: 55.56% (relative success)...
==> Mean distance from optimal ordering: 58.02% (relative success)...
- Evaluating policy #8/16: kl-UCB$^{++}$($T=10000$) ...
Estimated order by the policy kl-UCB$^{++}$($T=10000$) after 10000 steps: [0 3 6 4 1 5 7 2 8] ...
==> Optimal arm identification: 100.00% (relative success)...
==> Manhattan distance from optimal ordering: 60.49% (relative success)...
==> Gestalt distance from optimal ordering: 55.56% (relative success)...
==> Mean distance from optimal ordering: 58.02% (relative success)...
- Evaluating policy #9/16: kl-UCB-switch($T=10000$) ...
Estimated order by the policy kl-UCB-switch($T=10000$) after 10000 steps: [0 2 3 4 1 6 5 7 8] ...
==> Optimal arm identification: 100.00% (relative success)...
==> Manhattan distance from optimal ordering: 80.25% (relative success)...
==> Gestalt distance from optimal ordering: 77.78% (relative success)...
==> Mean distance from optimal ordering: 79.01% (relative success)...
- Evaluating policy #10/16: kl-UCB-switch($T=10000$, delayed f) ...
Estimated order by the policy kl-UCB-switch($T=10000$, delayed f) after 10000 steps: [3 2 1 0 5 4 7 6 8] ...
==> Optimal arm identification: 100.00% (relative success)...
==> Manhattan distance from optimal ordering: 70.37% (relative success)...
==> Gestalt distance from optimal ordering: 44.44% (relative success)...
==> Mean distance from optimal ordering: 57.41% (relative success)...
- Evaluating policy #11/16: kl-UCB-switch ...
Estimated order by the policy kl-UCB-switch after 10000 steps: [0 1 3 4 5 6 2 7 8] ...
==> Optimal arm identification: 100.00% (relative success)...
==> Manhattan distance from optimal ordering: 80.25% (relative success)...
==> Gestalt distance from optimal ordering: 88.89% (relative success)...
==> Mean distance from optimal ordering: 84.57% (relative success)...
- Evaluating policy #12/16: kl-UCB-switch(delayed f) ...
Estimated order by the policy kl-UCB-switch(delayed f) after 10000 steps: [5 0 1 3 4 2 7 6 8] ...
==> Optimal arm identification: 100.00% (relative success)...
==> Manhattan distance from optimal ordering: 70.37% (relative success)...
==> Gestalt distance from optimal ordering: 66.67% (relative success)...
==> Mean distance from optimal ordering: 68.52% (relative success)...
- Evaluating policy #13/16: BayesUCB ...
Estimated order by the policy BayesUCB after 10000 steps: [2 4 5 0 1 3 6 7 8] ...
==> Optimal arm identification: 100.00% (relative success)...
==> Manhattan distance from optimal ordering: 60.49% (relative success)...
==> Gestalt distance from optimal ordering: 66.67% (relative success)...
==> Mean distance from optimal ordering: 63.58% (relative success)...
- Evaluating policy #14/16: AdBandits($T=10000$, $\alpha=0.5$) ...
Estimated order by the policy AdBandits($T=10000$, $\alpha=0.5$) after 10000 steps: [2 8 0 4 3 5 1 7 6] ...
==> Optimal arm identification: 77.78% (relative success)...
==> Manhattan distance from optimal ordering: 50.62% (relative success)...
==> Gestalt distance from optimal ordering: 22.22% (relative success)...
==> Mean distance from optimal ordering: 36.42% (relative success)...
- Evaluating policy #15/16: ApprFHG($T=10500$) ...
Estimated order by the policy ApprFHG($T=10500$) after 10000 steps: [1 2 0 3 4 5 6 7 8] ...
==> Optimal arm identification: 100.00% (relative success)...
==> Manhattan distance from optimal ordering: 90.12% (relative success)...
==> Gestalt distance from optimal ordering: 88.89% (relative success)...
==> Mean distance from optimal ordering: 89.51% (relative success)...
- Evaluating policy #16/16: $\mathrm{UCB}_{d=d_{lb}}$($c=0$) ...
Estimated order by the policy $\mathrm{UCB}_{d=d_{lb}}$($c=0$) after 10000 steps: [2 0 1 7 5 3 4 6 8] ...
==> Optimal arm identification: 100.00% (relative success)...
==> Manhattan distance from optimal ordering: 65.43% (relative success)...
==> Gestalt distance from optimal ordering: 66.67% (relative success)...
==> Mean distance from optimal ordering: 66.05% (relative success)...
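(The per-policy scores above appear to be computed as follows: the best arm, index 8, must come last in the estimated ordering for a 100% optimal arm identification; the Manhattan score is one minus the L1 distance between the estimated and ideal orderings divided by n²/2; the Gestalt score is difflib.SequenceMatcher's ratio. The sketch below reproduces the 70.37% / 77.78% / 74.07% printed for UCB($\alpha=1$), but it is a reconstruction, not the evaluator's own code.)

```python
import difflib
import numpy as np

def ordering_scores(order):
    """Relative-success scores of an estimated ordering against the ideal one."""
    order = np.asarray(order)
    n = len(order)
    ideal = np.arange(n)
    # Manhattan (L1) distance between permutations, normalised by roughly its maximum n^2/2
    manhattan = 1.0 - np.sum(np.abs(order - ideal)) / (n ** 2 / 2.0)
    # Gestalt pattern matching (Ratcliff-Obershelp), as implemented by difflib
    gestalt = difflib.SequenceMatcher(None, list(order), list(ideal)).ratio()
    return manhattan, gestalt, (manhattan + gestalt) / 2.0

print(ordering_scores([0, 1, 4, 5, 6, 2, 3, 7, 8]))   # UCB(alpha=1)'s estimated order
# -> (0.7037..., 0.7777..., 0.7407...) i.e. 70.37%, 77.78%, 74.07%
```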
Giving the vector of final regrets ...
For policy #0 called 'UCB($\alpha=1$)' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 101
Mean of last regrets R_T = 101
Median of last regrets R_T = 101
Max of last regrets R_T = 101
STD of last regrets R_T = 0
For policy #1 called 'MOSS' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 84.9
Mean of last regrets R_T = 84.9
Median of last regrets R_T = 84.9
Max of last regrets R_T = 84.9
STD of last regrets R_T = 0
For policy #2 called 'MOSS-H($T=10000$)' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 77
Mean of last regrets R_T = 77
Median of last regrets R_T = 77
Max of last regrets R_T = 77
STD of last regrets R_T = 0
For policy #3 called 'MOSS-Anytime($\alpha=1.35$)' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 85.7
Mean of last regrets R_T = 85.7
Median of last regrets R_T = 85.7
Max of last regrets R_T = 85.7
STD of last regrets R_T = 0
For policy #4 called 'DMED$^+$(Bern)' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 51.1
Mean of last regrets R_T = 51.1
Median of last regrets R_T = 51.1
Max of last regrets R_T = 51.1
STD of last regrets R_T = 0
For policy #5 called 'Thompson' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 51
Mean of last regrets R_T = 51
Median of last regrets R_T = 51
Max of last regrets R_T = 51
STD of last regrets R_T = 0
For policy #6 called 'kl-UCB' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 54.4
Mean of last regrets R_T = 54.4
Median of last regrets R_T = 54.4
Max of last regrets R_T = 54.4
STD of last regrets R_T = 0
For policy #7 called 'kl-UCB$^{++}$($T=10000$)' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 52.2
Mean of last regrets R_T = 52.2
Median of last regrets R_T = 52.2
Max of last regrets R_T = 52.2
STD of last regrets R_T = 0
For policy #8 called 'kl-UCB-switch($T=10000$)' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 52.6
Mean of last regrets R_T = 52.6
Median of last regrets R_T = 52.6
Max of last regrets R_T = 52.6
STD of last regrets R_T = 0
For policy #9 called 'kl-UCB-switch($T=10000$, delayed f)' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 59.5
Mean of last regrets R_T = 59.5
Median of last regrets R_T = 59.5
Max of last regrets R_T = 59.5
STD of last regrets R_T = 0
For policy #10 called 'kl-UCB-switch' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 43.6
Mean of last regrets R_T = 43.6
Median of last regrets R_T = 43.6
Max of last regrets R_T = 43.6
STD of last regrets R_T = 0
For policy #11 called 'kl-UCB-switch(delayed f)' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 52
Mean of last regrets R_T = 52
Median of last regrets R_T = 52
Max of last regrets R_T = 52
STD of last regrets R_T = 0
For policy #12 called 'BayesUCB' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 52.5
Mean of last regrets R_T = 52.5
Median of last regrets R_T = 52.5
Max of last regrets R_T = 52.5
STD of last regrets R_T = 0
For policy #13 called 'AdBandits($T=10000$, $\alpha=0.5$)' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 33.5
Mean of last regrets R_T = 33.5
Median of last regrets R_T = 33.5
Max of last regrets R_T = 33.5
STD of last regrets R_T = 0
For policy #14 called 'ApprFHG($T=10500$)' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 101
Mean of last regrets R_T = 101
Median of last regrets R_T = 101
Max of last regrets R_T = 101
STD of last regrets R_T = 0
For policy #15 called '$\mathrm{UCB}_{d=d_{lb}}$($c=0$)' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 26
Mean of last regrets R_T = 26
Median of last regrets R_T = 26
Max of last regrets R_T = 26
STD of last regrets R_T = 0
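(For reference, the "last regret" R_T is the cumulative regret at the horizon: the shortfall of the accumulated reward with respect to always playing the best arm, of mean 0.9 here. A minimal sketch, with illustrative names:)

```python
import numpy as np

def last_regret(rewards, max_mean=0.9):
    """Cumulative regret R_T at the end of one run, given the T rewards received."""
    return len(rewards) * max_mean - np.sum(rewards)

# e.g. a run that accumulates 8899 reward over T = 10000 steps has
# R_T = 10000 * 0.9 - 8899 = 101, the order of magnitude reported for UCB(alpha=1) above.
```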
Giving the final ranks ...
Final ranking for this environment #0:
- Policy '$\mathrm{UCB}_{d=d_{lb}}$($c=0$)' was ranked 1 / 16 for this simulation (last regret = 26).
- Policy 'AdBandits($T=10000$, $\alpha=0.5$)' was ranked 2 / 16 for this simulation (last regret = 33.5).
- Policy 'kl-UCB-switch' was ranked 3 / 16 for this simulation (last regret = 43.6).
- Policy 'Thompson' was ranked 4 / 16 for this simulation (last regret = 50.9).
- Policy 'DMED$^+$(Bern)' was ranked 5 / 16 for this simulation (last regret = 51.1).
- Policy 'kl-UCB-switch(delayed f)' was ranked 6 / 16 for this simulation (last regret = 52).
- Policy 'kl-UCB$^{++}$($T=10000$)' was ranked 7 / 16 for this simulation (last regret = 52.2).
- Policy 'BayesUCB' was ranked 8 / 16 for this simulation (last regret = 52.5).
- Policy 'kl-UCB-switch($T=10000$)' was ranked 9 / 16 for this simulation (last regret = 52.6).
- Policy 'kl-UCB' was ranked 10 / 16 for this simulation (last regret = 54.4).
- Policy 'kl-UCB-switch($T=10000$, delayed f)' was ranked 11 / 16 for this simulation (last regret = 59.5).
- Policy 'MOSS-H($T=10000$)' was ranked 12 / 16 for this simulation (last regret = 77).
- Policy 'MOSS' was ranked 13 / 16 for this simulation (last regret = 84.9).
- Policy 'MOSS-Anytime($\alpha=1.35$)' was ranked 14 / 16 for this simulation (last regret = 85.7).
- Policy 'UCB($\alpha=1$)' was ranked 15 / 16 for this simulation (last regret = 100.6).
- Policy 'ApprFHG($T=10500$)' was ranked 16 / 16 for this simulation (last regret = 101.1).
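(The ranking above is simply the policies sorted by their mean last regret, best first. A short sketch of how it can be reproduced from the values printed above, with an abbreviated, illustrative dictionary:)

```python
mean_last_regrets = {"UCB(alpha=1)": 100.6, "MOSS": 84.9, "kl-UCB-switch": 43.6}  # etc.
ranking = sorted(mean_last_regrets.items(), key=lambda kv: kv[1])
for rank, (name, regret) in enumerate(ranking, 1):
    print("- Policy '%s' was ranked %d / %d for this simulation (last regret = %g)."
          % (name, rank, len(ranking), regret))
```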
Giving the mean and std running times ...
For policy #1 called 'MOSS' ...
1.04 s ± 0 ns per loop (mean ± std. dev. of 1 run)
For policy #0 called 'UCB($\alpha=1$)' ...
1.05 s ± 0 ns per loop (mean ± std. dev. of 1 run)
For policy #3 called 'MOSS-Anytime($\alpha=1.35$)' ...
1.07 s ± 0 ns per loop (mean ± std. dev. of 1 run)
For policy #2 called 'MOSS-H($T=10000$)' ...
1.09 s ± 0 ns per loop (mean ± std. dev. of 1 run)
For policy #5 called 'Thompson' ...
1.17 s ± 0 ns per loop (mean ± std. dev. of 1 run)
For policy #14 called 'ApprFHG($T=10500$)' ...
1.28 s ± 0 ns per loop (mean ± std. dev. of 1 run)
For policy #4 called 'DMED$^+$(Bern)' ...
2.02 s ± 0 ns per loop (mean ± std. dev. of 1 run)
For policy #8 called 'kl-UCB-switch($T=10000$)' ...
2.36 s ± 0 ns per loop (mean ± std. dev. of 1 run)
For policy #9 called 'kl-UCB-switch($T=10000$, delayed f)' ...
2.4 s ± 0 ns per loop (mean ± std. dev. of 1 run)
For policy #10 called 'kl-UCB-switch' ...
2.43 s ± 0 ns per loop (mean ± std. dev. of 1 run)
For policy #6 called 'kl-UCB' ...
2.45 s ± 0 ns per loop (mean ± std. dev. of 1 run)
For policy #13 called 'AdBandits($T=10000$, $\alpha=0.5$)' ...
2.48 s ± 0 ns per loop (mean ± std. dev. of 1 run)
For policy #11 called 'kl-UCB-switch(delayed f)' ...
2.49 s ± 0 ns per loop (mean ± std. dev. of 1 run)
For policy #12 called 'BayesUCB' ...
2.65 s ± 0 ns per loop (mean ± std. dev. of 1 run)
For policy #7 called 'kl-UCB$^{++}$($T=10000$)' ...
2.81 s ± 0 ns per loop (mean ± std. dev. of 1 run)
For policy #15 called '$\mathrm{UCB}_{d=d_{lb}}$($c=0$)' ...
3 s ± 0 ns per loop (mean ± std. dev. of 1 run)
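(Running times are printed in the same "mean ± std. dev." style as IPython's %timeit; with a single repetition the standard deviation is necessarily 0 ns. A minimal sketch of how such a per-policy measurement can be collected, with illustrative names and without %timeit's automatic unit scaling:)

```python
import time
import numpy as np

def time_policy(run_one_repetition, repetitions=1):
    """Wall-clock timing of one policy's simulation, reported like the lines above."""
    durations = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_one_repetition()          # plays the policy over the whole horizon once
        durations.append(time.perf_counter() - start)
    durations = np.asarray(durations)
    print("%.3g s ± %.3g s per loop (mean ± std. dev. of %d run%s)"
          % (durations.mean(), durations.std(), repetitions, "s" if repetitions > 1 else ""))
```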
Giving the mean and std memory consumption ...
For policy #10 called 'kl-UCB-switch' ...
16 B (mean of 1 run)
For policy #8 called 'kl-UCB-switch($T=10000$)' ...
24 B (mean of 1 run)
For policy #9 called 'kl-UCB-switch($T=10000$, delayed f)' ...
24 B (mean of 1 run)
For policy #7 called 'kl-UCB$^{++}$($T=10000$)' ...
40 B (mean of 1 run)
For policy #0 called 'UCB($\alpha=1$)' ...
260 B (mean of 1 run)
For policy #1 called 'MOSS' ...
nan YiB (mean of 1 run)
For policy #2 called 'MOSS-H($T=10000$)' ...
nan YiB (mean of 1 run)
For policy #3 called 'MOSS-Anytime($\alpha=1.35$)' ...
nan YiB (mean of 1 run)
For policy #4 called 'DMED$^+$(Bern)' ...
nan YiB (mean of 1 run)
For policy #5 called 'Thompson' ...
nan YiB (mean of 1 run)
For policy #6 called 'kl-UCB' ...
nan YiB (mean of 1 run)
For policy #11 called 'kl-UCB-switch(delayed f)' ...
nan YiB (mean of 1 run)
For policy #12 called 'BayesUCB' ...
nan YiB (mean of 1 run)
For policy #13 called 'AdBandits($T=10000$, $\alpha=0.5$)' ...
nan YiB (mean of 1 run)
For policy #14 called 'ApprFHG($T=10500$)' ...
nan YiB (mean of 1 run)
For policy #15 called '$\mathrm{UCB}_{d=d_{lb}}$($c=0$)' ...
nan YiB (mean of 1 run)
Done for simulations main.py ...
Top 20 lines ranked by memory consumption:
#1: Environment/Evaluator.py:131: 11 MiB
self.allPulls[env] = np.zeros((self.nbPolicies, self.envs[env].nbArms, self.horizon))
#2: python3.6/linecache.py:137: 2.1 MiB
lines = fp.readlines()
#3: python3.6/abc.py:133: 1.4 MiB
cls = super().__new__(mcls, name, bases, namespace, **kwargs)
#4: Environment/Evaluator.py:111: 1.2 MiB
self.rewards = np.zeros((self.nbPolicies, len(self.envs), self.horizon)) #: For each env, history of rewards, ie accumulated rewards
#5: Environment/Evaluator.py:114: 1.2 MiB
self.maxCumRewards = -np.inf + np.zeros((self.nbPolicies, len(self.envs), self.horizon)) #: For each env, history of maximum of rewards, to compute amplitude (+- STD)
#6: Environment/Evaluator.py:113: 1.2 MiB
self.minCumRewards = np.inf + np.zeros((self.nbPolicies, len(self.envs), self.horizon)) #: For each env, history of minimum of rewards, to compute amplitude (+- STD)
#7: Environment/Evaluator.py:129: 1.2 MiB
self.bestArmPulls[env] = np.zeros((self.nbPolicies, self.horizon))
#8: collections/__init__.py:429: 1.1 MiB
exec(class_definition, namespace)
#9: json/decoder.py:355: 579.6 KiB
obj, end = self.scan_once(s, idx)
#10: python3.6/_weakrefset.py:37: 511.8 KiB
self.data = set()
#11: python3.6/functools.py:67: 419.9 KiB
getattr(wrapper, attr).update(getattr(wrapped, attr, {}))
#12: python3.6/_weakrefset.py:38: 403.6 KiB
def _remove(item, selfref=ref(self)):
#13: misc/doccer.py:68: 402.3 KiB
return docstring % indented
#14: matplotlib/font_manager.py:965: 379.5 KiB
r.__dict__.update(o)
#15: python3.6/_weakrefset.py:48: 371 KiB
self._iterating = set()
#16: :5: 350.9 KiB
#17: traitlets/traitlets.py:735: 339.8 KiB
return super(MetaHasDescriptors, mcls).__new__(mcls, name, bases, classdict)
#18: collections/__init__.py:423: 337.3 KiB
for index, name in enumerate(field_names))
#19: typing/templates.py:654: 317.2 KiB
class Template(cls):
#20: stats/_distn_infrastructure.py:694: 232.4 KiB
exec_(parse_arg_template % dct, ns)
61036 others: 35.5 MiB
Total allocated size: 60.5 MiB
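(The 11 MiB hot spot at Environment/Evaluator.py:131 is consistent with the array allocated on that line: a float64 array of shape (nbPolicies, nbArms, horizon) = (16, 9, 10000) takes 16 · 9 · 10000 · 8 bytes ≈ 11 MiB. A quick check:)

```python
import numpy as np

allPulls = np.zeros((16, 9, 10000))   # (nbPolicies, nbArms, horizon), float64
print(allPulls.nbytes / 2**20)        # -> 10.98..., i.e. the ~11 MiB reported above
```

(This closes the first simulation's output. The lines that follow come from a second, separate run of main.py, a 16-policy comparison on a sparse Gaussian-style bandit with 16 repetitions, whose beginning is not included in this excerpt.)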
- Evaluating policy #13/16: Aggregator[all non Aggr] ...
Estimated order by the policy Aggregator[all non Aggr] after 10000 steps: [16 14 10 11 18 1 3 0 4 7 8 5 19 15 12 17 6 2 13 9] ...
==> Optimal arm identification: -51.79% (relative success)...
==> Manhattan distance from optimal ordering: 30.00% (relative success)...
==> Gestalt distance from optimal ordering: 20.00% (relative success)...
==> Mean distance from optimal ordering: 25.00% (relative success)...
- Evaluating policy #14/16: Exp4[all non Aggr] ...
Estimated order by the policy Exp4[all non Aggr] after 10000 steps: [ 0 2 1 6 5 3 7 4 9 8 10 12 13 14 11 15 16 17 18 19] ...
==> Optimal arm identification: 100.00% (relative success)...
==> Manhattan distance from optimal ordering: 90.00% (relative success)...
==> Gestalt distance from optimal ordering: 70.00% (relative success)...
==> Mean distance from optimal ordering: 80.00% (relative success)...
- Evaluating policy #15/16: Aggregator[Sparse-KLUCB for s=1..20] ...
Estimated order by the policy Aggregator[Sparse-KLUCB for s=1..20] after 10000 steps: [15 16 17 18 19 1 2 5 6 7 4 9 8 12 11 14 10 13 0 3] ...
==> Optimal arm identification: -83.93% (relative success)...
==> Manhattan distance from optimal ordering: 25.00% (relative success)...
==> Gestalt distance from optimal ordering: 25.00% (relative success)...
==> Mean distance from optimal ordering: 25.00% (relative success)...
- Evaluating policy #16/16: Exp4[Sparse-KLUCB for s=1..20] ...
Estimated order by the policy Exp4[Sparse-KLUCB for s=1..20] after 10000 steps: [15 16 17 18 19 1 8 10 4 5 6 7 12 11 9 14 13 0 3 2] ...
==> Optimal arm identification: -89.29% (relative success)...
==> Manhattan distance from optimal ordering: 20.00% (relative success)...
==> Gestalt distance from optimal ordering: 25.00% (relative success)...
==> Mean distance from optimal ordering: 22.50% (relative success)...
Giving the vector of final regrets ...
For policy #0 called 'EmpiricalMeans' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 210
Mean of last regrets R_T = 210
Median of last regrets R_T = 210
Max of last regrets R_T = 213
STD of last regrets R_T = 0.839
For policy #1 called 'UCB($\alpha=1$)' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 3.81e+03
Mean of last regrets R_T = 3.93e+03
Median of last regrets R_T = 3.92e+03
Max of last regrets R_T = 4.04e+03
STD of last regrets R_T = 49
For policy #2 called 'Thompson' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 1.45e+03
Mean of last regrets R_T = 1.86e+03
Median of last regrets R_T = 1.85e+03
Max of last regrets R_T = 2.62e+03
STD of last regrets R_T = 280
For policy #3 called 'Thompson(Gauss)' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 4.09e+03
Mean of last regrets R_T = 4.21e+03
Median of last regrets R_T = 4.21e+03
Max of last regrets R_T = 4.28e+03
STD of last regrets R_T = 48.6
For policy #4 called 'Sparse($s=5$)[BayesUCB, UCB for K and J]' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 806
Mean of last regrets R_T = 1.34e+03
Median of last regrets R_T = 1.48e+03
Max of last regrets R_T = 1.83e+03
STD of last regrets R_T = 309
For policy #5 called 'Sparse-kl-UCB($s=5$, Bern)' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 1.11e+03
Mean of last regrets R_T = 1.18e+03
Median of last regrets R_T = 1.16e+03
Max of last regrets R_T = 1.33e+03
STD of last regrets R_T = 50.3
For policy #6 called 'SparseUCB($s=5$, $\alpha=1$)' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 2.13e+03
Mean of last regrets R_T = 2.53e+03
Median of last regrets R_T = 2.62e+03
Max of last regrets R_T = 2.9e+03
STD of last regrets R_T = 278
For policy #7 called 'OSSB($\varepsilon=0$, $\gamma=0$, Gauss)' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 210
Mean of last regrets R_T = 211
Median of last regrets R_T = 210
Max of last regrets R_T = 218
STD of last regrets R_T = 1.91
For policy #8 called 'OSSB($\varepsilon=0$, $\gamma=0$, Gauss, $s=5$)' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 213
Mean of last regrets R_T = 481
Median of last regrets R_T = 231
Max of last regrets R_T = 2.63e+03
STD of last regrets R_T = 662
For policy #9 called 'OSSB($\varepsilon=0.01$, $\gamma=0$, Gauss, $s=5$)' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 214
Mean of last regrets R_T = 269
Median of last regrets R_T = 230
Max of last regrets R_T = 764
STD of last regrets R_T = 129
For policy #10 called 'OSSB($\varepsilon=0$, $\gamma=0.1$, Gauss, $s=5$)' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 222
Mean of last regrets R_T = 313
Median of last regrets R_T = 249
Max of last regrets R_T = 702
STD of last regrets R_T = 128
For policy #11 called 'OSSB($\varepsilon=0.01$, $\gamma=0.1$, Gauss, $s=5$)' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 214
Mean of last regrets R_T = 246
Median of last regrets R_T = 229
Max of last regrets R_T = 450
STD of last regrets R_T = 54.5
For policy #12 called 'Aggregator[all non Aggr]' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 227
Mean of last regrets R_T = 364
Median of last regrets R_T = 260
Max of last regrets R_T = 1.96e+03
STD of last regrets R_T = 414
For policy #13 called 'Exp4[all non Aggr]' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 210
Mean of last regrets R_T = 753
Median of last regrets R_T = 264
Max of last regrets R_T = 3.93e+03
STD of last regrets R_T = 947
For policy #14 called 'Aggregator[Sparse-KLUCB for s=1..20]' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 1.13e+03
Mean of last regrets R_T = 1.26e+03
Median of last regrets R_T = 1.28e+03
Max of last regrets R_T = 1.34e+03
STD of last regrets R_T = 60.9
For policy #15 called 'Exp4[Sparse-KLUCB for s=1..20]' ...
Last regrets (for all repetitions) have:
Min of last regrets R_T = 1.05e+03
Mean of last regrets R_T = 1.18e+03
Median of last regrets R_T = 1.18e+03
Max of last regrets R_T = 1.27e+03
STD of last regrets R_T = 51.8
Giving the final ranks ...
Final ranking for this environment #0:
- Policy 'EmpiricalMeans' was ranked 1 / 16 for this simulation (last regret = 210.37).
- Policy 'OSSB($\varepsilon=0$, $\gamma=0$, Gauss)' was ranked 2 / 16 for this simulation (last regret = 210.66).
- Policy 'OSSB($\varepsilon=0.01$, $\gamma=0.1$, Gauss, $s=5$)' was ranked 3 / 16 for this simulation (last regret = 246.39).
- Policy 'OSSB($\varepsilon=0.01$, $\gamma=0$, Gauss, $s=5$)' was ranked 4 / 16 for this simulation (last regret = 268.63).
- Policy 'OSSB($\varepsilon=0$, $\gamma=0.1$, Gauss, $s=5$)' was ranked 5 / 16 for this simulation (last regret = 313.38).
- Policy 'Aggregator[all non Aggr]' was ranked 6 / 16 for this simulation (last regret = 363.86).
- Policy 'OSSB($\varepsilon=0$, $\gamma=0$, Gauss, $s=5$)' was ranked 7 / 16 for this simulation (last regret = 481.31).
- Policy 'Exp4[all non Aggr]' was ranked 8 / 16 for this simulation (last regret = 752.4).
- Policy 'Sparse-kl-UCB($s=5$, Bern)' was ranked 9 / 16 for this simulation (last regret = 1173.3).
- Policy 'Exp4[Sparse-KLUCB for s=1..20]' was ranked 10 / 16 for this simulation (last regret = 1177.7).
- Policy 'Aggregator[Sparse-KLUCB for s=1..20]' was ranked 11 / 16 for this simulation (last regret = 1255.5).
- Policy 'Sparse($s=5$)[BayesUCB, UCB for K and J]' was ranked 12 / 16 for this simulation (last regret = 1340.4).
- Policy 'Thompson' was ranked 13 / 16 for this simulation (last regret = 1857.3).
- Policy 'SparseUCB($s=5$, $\alpha=1$)' was ranked 14 / 16 for this simulation (last regret = 2528.8).
- Policy 'UCB($\alpha=1$)' was ranked 15 / 16 for this simulation (last regret = 3916.8).
- Policy 'Thompson(Gauss)' was ranked 16 / 16 for this simulation (last regret = 4207.9).
Giving the mean and std running times ...
For policy #0 called 'EmpiricalMeans' ...
676 ms ± 30.4 ms per loop (mean ± std. dev. of 16 runs)
For policy #1 called 'UCB($\alpha=1$)' ...
971 ms ± 93.6 ms per loop (mean ± std. dev. of 16 runs)
For policy #2 called 'Thompson' ...
1.71 s ± 210 ms per loop (mean ± std. dev. of 16 runs)
For policy #7 called 'OSSB($\varepsilon=0$, $\gamma=0$, Gauss)' ...
2.45 s ± 291 ms per loop (mean ± std. dev. of 16 runs)
For policy #6 called 'SparseUCB($s=5$, $\alpha=1$)' ...
2.92 s ± 175 ms per loop (mean ± std. dev. of 16 runs)
For policy #10 called 'OSSB($\varepsilon=0$, $\gamma=0.1$, Gauss, $s=5$)' ...
3.32 s ± 569 ms per loop (mean ± std. dev. of 16 runs)
For policy #8 called 'OSSB($\varepsilon=0$, $\gamma=0$, Gauss, $s=5$)' ...
3.41 s ± 770 ms per loop (mean ± std. dev. of 16 runs)
For policy #9 called 'OSSB($\varepsilon=0.01$, $\gamma=0$, Gauss, $s=5$)' ...
3.43 s ± 809 ms per loop (mean ± std. dev. of 16 runs)
For policy #4 called 'Sparse($s=5$)[BayesUCB, UCB for K and J]' ...
3.7 s ± 598 ms per loop (mean ± std. dev. of 16 runs)
For policy #11 called 'OSSB($\varepsilon=0.01$, $\gamma=0.1$, Gauss, $s=5$)' ...
4.01 s ± 673 ms per loop (mean ± std. dev. of 16 runs)
For policy #5 called 'Sparse-kl-UCB($s=5$, Bern)' ...
12.6 s ± 1.25 s per loop (mean ± std. dev. of 16 runs)
For policy #3 called 'Thompson(Gauss)' ...
39.2 s ± 7.11 s per loop (mean ± std. dev. of 16 runs)
For policy #13 called 'Exp4[all non Aggr]' ...
1min 11s ± 3.38 s per loop (mean ± std. dev. of 16 runs)
For policy #12 called 'Aggregator[all non Aggr]' ...
1min 17s ± 10.6 s per loop (mean ± std. dev. of 16 runs)
For policy #14 called 'Aggregator[Sparse-KLUCB for s=1..20]' ...
2min 17s ± 17.8 s per loop (mean ± std. dev. of 16 runs)
For policy #15 called 'Exp4[Sparse-KLUCB for s=1..20]' ...
3min 26s ± 58.8 s per loop (mean ± std. dev. of 16 runs)
Giving the mean and std memory consumption ...
For policy #4 called 'Sparse($s=5$)[BayesUCB, UCB for K and J]' ...
1.3 KiB ± 0 B (mean ± std. dev. of 16 runs)
For policy #9 called 'OSSB($\varepsilon=0.01$, $\gamma=0$, Gauss, $s=5$)' ...
1.3 KiB ± 1.3 KiB (mean ± std. dev. of 16 runs)
For policy #8 called 'OSSB($\varepsilon=0$, $\gamma=0$, Gauss, $s=5$)' ...
1.3 KiB ± 1.3 KiB (mean ± std. dev. of 16 runs)
For policy #11 called 'OSSB($\varepsilon=0.01$, $\gamma=0.1$, Gauss, $s=5$)' ...
1.3 KiB ± 1.3 KiB (mean ± std. dev. of 16 runs)
For policy #10 called 'OSSB($\varepsilon=0$, $\gamma=0.1$, Gauss, $s=5$)' ...
1.4 KiB ± 1.4 KiB (mean ± std. dev. of 16 runs)
For policy #7 called 'OSSB($\varepsilon=0$, $\gamma=0$, Gauss)' ...
1.5 KiB ± 1.3 KiB (mean ± std. dev. of 16 runs)
For policy #2 called 'Thompson' ...
1.8 KiB ± 558.9 B (mean ± std. dev. of 16 runs)
For policy #6 called 'SparseUCB($s=5$, $\alpha=1$)' ...
2.1 KiB ± 423.3 B (mean ± std. dev. of 16 runs)
For policy #5 called 'Sparse-kl-UCB($s=5$, Bern)' ...
2.2 KiB ± 304.8 B (mean ± std. dev. of 16 runs)
For policy #0 called 'EmpiricalMeans' ...
2.3 KiB ± 212.4 B (mean ± std. dev. of 16 runs)
For policy #1 called 'UCB($\alpha=1$)' ...
2.3 KiB ± 253.9 B (mean ± std. dev. of 16 runs)
For policy #3 called 'Thompson(Gauss)' ...
2.4 KiB ± 541.6 B (mean ± std. dev. of 16 runs)
For policy #14 called 'Aggregator[Sparse-KLUCB for s=1..20]' ...
2.5 KiB ± 277.1 B (mean ± std. dev. of 16 runs)
For policy #15 called 'Exp4[Sparse-KLUCB for s=1..20]' ...
2.5 KiB ± 104.9 B (mean ± std. dev. of 16 runs)
For policy #12 called 'Aggregator[all non Aggr]' ...
3.3 KiB ± 261.5 B (mean ± std. dev. of 16 runs)
For policy #13 called 'Exp4[all non Aggr]' ...
3.4 KiB ± 161.1 B (mean ± std. dev. of 16 runs)
Top 20 lines ranked by memory consumption:
#1: Policies/BasePolicy.py:31: 39 KiB
self.pulls = np.zeros(nbArms, dtype=int) #: Number of pulls of each arms
#2: Policies/BasePolicy.py:32: 31.4 KiB
self.rewards = np.zeros(nbArms) #: Cumulated rewards of each arms
#3: Policies/BasePolicy.py:25: 18.1 KiB
self.nbArms = nbArms #: Number of arms
#4: Policies/IndexPolicy.py:30: 15.1 KiB
self.index = np.zeros(nbArms) #: Numerical index for each arms
#5: core/numeric.py:298: 12.8 KiB
a = empty(shape, dtype, order)
#6: lib/function_base.py:2715: 12.1 KiB
self.otypes = otypes
#7: Posterior/Beta.py:72: 11.4 KiB
self._a = a
#8: Policies/BayesianIndexPolicy.py:20: 10.7 KiB
self.posterior[arm] = posterior(*args, **kwargs)
#9: Policies/Aggregator.py:109: 10.4 KiB
self.children.append(child['archtype'](nbArms, **localparams))
#10: lib/function_base.py:2720: 9.4 KiB
self.excluded = set(excluded)
#11: joblib/pool.py:259: 9.2 KiB
return (loads, (dumps(a, protocol=HIGHEST_PROTOCOL),))
#12: Posterior/Gauss.py:81: 8.6 KiB
self._mu = float(mu)
#13: Posterior/Beta.py:75: 6.9 KiB
self.N = [a, b] #: List of two parameters [a, b]
#14: python3.6/threading.py:347: 6.1 KiB
waiters_to_notify = _deque(_islice(all_waiters, n))
#15: python3.6/threading.py:884: 5.7 KiB
self._bootstrap_inner()
#16: Environment/Evaluator.py:167: 5.7 KiB
self.policies.append(policy['archtype'](env.nbArms, **policy['params']))
#17: Policies/SparseWrapper.py:122: 5.4 KiB
# now for the underlying policy
#18: lib/function_base.py:2703: 4.7 KiB
self.__doc__ = pyfunc.__doc__
#19: joblib/pool.py:371: 4.4 KiB
CustomizablePickler(buffer, self._reducers).dump(obj)
#20: Environment/Evaluator.py:237: 4 KiB
for repeatId in tqdm(range(self.repetitions), desc="Repeat||")
359 others: 182.5 KiB
Total allocated size: 413.8 KiB
Done for simulations main.py ...
Top 20 lines ranked by memory consumption:
#1: python3.6/linecache.py:137: 1011.7 KiB
lines = fp.readlines()
#2: Policies/BasePolicy.py:31: 39 KiB
self.pulls = np.zeros(nbArms, dtype=int) #: Number of pulls of each arms
#3: Policies/BasePolicy.py:32: 31.4 KiB
self.rewards = np.zeros(nbArms) #: Cumulated rewards of each arms
#4: python3.6/tracemalloc.py:65: 22 KiB
return (self.size, self.count, self.traceback)
#5: Policies/BasePolicy.py:25: 18.1 KiB
self.nbArms = nbArms #: Number of arms
#6: Policies/IndexPolicy.py:30: 15.1 KiB
self.index = np.zeros(nbArms) #: Numerical index for each arms
#7: core/numeric.py:298: 12.8 KiB
a = empty(shape, dtype, order)
#8: lib/function_base.py:2715: 12.1 KiB
self.otypes = otypes
#9: Posterior/Beta.py:72: 11.4 KiB
self._a = a
#10: Policies/BayesianIndexPolicy.py:20: 10.7 KiB
self.posterior[arm] = posterior(*args, **kwargs)
#11: Policies/Aggregator.py:109: 10.4 KiB
self.children.append(child['archtype'](nbArms, **localparams))
#12: lib/function_base.py:2720: 9.4 KiB
self.excluded = set(excluded)
#13: joblib/pool.py:259: 9 KiB
return (loads, (dumps(a, protocol=HIGHEST_PROTOCOL),))
#14: Posterior/Gauss.py:81: 8.6 KiB
self._mu = float(mu)
#15: Posterior/Beta.py:75: 6.9 KiB
self.N = [a, b] #: List of two parameters [a, b]
#16: python3.6/threading.py:347: 6.1 KiB
waiters_to_notify = _deque(_islice(all_waiters, n))
#17: python3.6/threading.py:884: 5.7 KiB
self._bootstrap_inner()
#18: Environment/Evaluator.py:167: 5.7 KiB
self.policies.append(policy['archtype'](env.nbArms, **policy['params']))
#19: Policies/SparseWrapper.py:122: 5.4 KiB
# now for the underlying policy
#20: lib/function_base.py:2703: 4.7 KiB
self.__doc__ = pyfunc.__doc__
418 others: 217.4 KiB
Total allocated size: 1.4 MiB
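(The "Top 20 lines ranked by memory consumption" tables are the usual tracemalloc snapshot summary; the tracemalloc module itself shows up in the last table. Below is a self-contained sketch of how such a report is typically produced; display_top is a local helper in the spirit of the Python documentation recipe, not necessarily what main.py uses.)

```python
import linecache
import tracemalloc

def display_top(snapshot, limit=20):
    """Print the top `limit` source lines by allocated size, plus a summary."""
    top_stats = snapshot.statistics("lineno")
    print("Top %d lines ranked by memory consumption:" % limit)
    for index, stat in enumerate(top_stats[:limit], 1):
        frame = stat.traceback[0]
        print(" #%d: %s:%d: %.1f KiB" % (index, frame.filename, frame.lineno, stat.size / 1024))
        line = linecache.getline(frame.filename, frame.lineno).strip()
        if line:
            print("     %s" % line)
    others = top_stats[limit:]
    if others:
        print(" %d others: %.1f KiB" % (len(others), sum(s.size for s in others) / 1024))
    print(" Total allocated size: %.1f KiB" % (sum(s.size for s in top_stats) / 1024))

tracemalloc.start()
# ... run the simulations ...
display_top(tracemalloc.take_snapshot())
```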