- Setting dpi of all figures to 110 ...
- Setting 'figsize' of all figures to (19.8, 10.8) ...
Info: Using the regular tqdm() decorator ...
Info: numba.jit seems to be available.
Loaded experiments configuration from 'configuration.py' :
configuration = {'horizon': 10000, 'verbosity': 6, 'n_jobs': 2, 'finalRanksOnAverage': True, 'delta_t_save': 1, 'players': [rhoLearn(BayesUCB), rhoLearn(BayesUCB), rhoLearn(BayesUCB), rhoLearn(BayesUCB), rhoLearn(BayesUCB), rhoLearn(BayesUCB)], 'repetitions': 20, 'successive_players': [[Selfish(BayesUCB), Selfish(BayesUCB), Selfish(BayesUCB), Selfish(BayesUCB), Selfish(BayesUCB), Selfish(BayesUCB)], [rhoRand(BayesUCB), rhoRand(BayesUCB), rhoRand(BayesUCB), rhoRand(BayesUCB), rhoRand(BayesUCB), rhoRand(BayesUCB)], [rhoLearn(BayesUCB), rhoLearn(BayesUCB), rhoLearn(BayesUCB), rhoLearn(BayesUCB), rhoLearn(BayesUCB), rhoLearn(BayesUCB)]], 'averageOn': 0.001, 'collisionModel': <function onlyUniqUserGetsReward>, 'environment': [{'arm_type': <class Bernoulli>, 'params': [0.005, 0.01, 0.015, 0.02, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.78, 0.8, 0.82, 0.83, 0.84, 0.85]}]}
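For reference, a `configuration.py` producing a dictionary like the one above could look as follows. This is a minimal sketch only: the module paths and constructor signatures are assumptions based on the usual SMPyBandits layout and may differ in your version of the library.

```python
# Sketch of a 'configuration.py' for this experiment.
# ASSUMPTION: SMPyBandits-style modules (Arms, Policies, PolicyMultiPlayers,
# Environment.CollisionModels) and constructor signatures; adapt as needed.
from Arms import Bernoulli
from Policies import BayesUCB
from PolicyMultiPlayers import Selfish, rhoRand, rhoLearn
from Environment.CollisionModels import onlyUniqUserGetsReward

NB_PLAYERS = 6
MEANS = [0.005, 0.01, 0.015, 0.02, 0.3, 0.35, 0.4, 0.45,
         0.5, 0.55, 0.6, 0.78, 0.8, 0.82, 0.83, 0.84, 0.85]
NB_ARMS = len(MEANS)

configuration = {
    "horizon": 10000,
    "repetitions": 20,
    "n_jobs": 2,
    "verbosity": 6,
    "finalRanksOnAverage": True,
    "averageOn": 0.001,
    "delta_t_save": 1,
    "collisionModel": onlyUniqUserGetsReward,
    "environment": [{"arm_type": Bernoulli, "params": MEANS}],
    # Three multi-player schemes compared in turn, 6 children each:
    "successive_players": [
        Selfish(NB_PLAYERS, NB_ARMS, BayesUCB).children,
        rhoRand(NB_PLAYERS, NB_ARMS, BayesUCB).children,
        rhoLearn(NB_PLAYERS, NB_ARMS, BayesUCB, BayesUCB).children,
    ],
}
# The 'players' entry printed above is the rhoLearn(BayesUCB) team:
configuration["players"] = configuration["successive_players"][-1]
```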
plots/ is already a directory here...
Number of players in the multi-players game: 6
Time horizon: 10000
Number of repetitions: 20
Sampling rate for saving, delta_t_save: 1
Sampling rate for plotting, delta_t_plot: 1
Number of jobs for parallelization: 2
Using collision model onlyUniqUserGetsReward.
More details:
Simple collision model where only a player who is alone on an arm samples it and receives the reward.
- This is the default collision model, cf. https://arxiv.org/abs/0910.2065v3 collision model 1.
- The numpy array 'choices' is increased according to the number of users who collided (it is NOT binary).
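As a rough illustration, the rule described above can be sketched in a few lines. This is a standalone toy version, not the library's actual function; `play_one_round` and its signature are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def play_one_round(means, choices):
    """Sketch of the onlyUniqUserGetsReward rule: a player receives the
    Bernoulli reward of its chosen arm only if no other player chose the
    same arm; colliding players receive nothing (reward 0)."""
    choices = np.asarray(choices)
    # How many players picked each arm; entries > 1 signal collisions
    counts = np.bincount(choices, minlength=len(means))
    rewards = np.zeros(len(choices))
    for player, arm in enumerate(choices):
        if counts[arm] == 1:  # alone on this arm: sample it
            rewards[player] = float(rng.random() < means[arm])
    return rewards, counts
```

For example, with `means = [0.0, 1.0, 1.0]` and `choices = [1, 1, 2]`, players 0 and 1 collide on arm 1 and get reward 0, while player 2 is alone on arm 2 (mean 1.0) and gets reward 1.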
Creating a new MAB problem ...
Reading arms of this MAB problem from a dictionary 'configuration' = {'arm_type': <class Bernoulli>, 'params': [0.005, 0.01, 0.015, 0.02, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.78, 0.8, 0.82, 0.83, 0.84, 0.85]} ...
- with 'arm_type' = Bernoulli
- with 'params' = [0.005, 0.01, 0.015, 0.02, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.78, 0.8, 0.82, 0.83, 0.84, 0.85]
- with 'arms' = [B(0.005), B(0.01), B(0.015), B(0.02), B(0.3), B(0.35), B(0.4), B(0.45), B(0.5), B(0.55), B(0.6), B(0.78), B(0.8), B(0.82), B(0.83), B(0.84), B(0.85)]
- with 'nbArms' = 17
- with 'maxArm' = 0.85
- with 'minArm' = 0.005
This MAB problem has:
- a [Lai & Robbins] complexity constant C(mu) = 66.4 ...
- an Optimal Arm Identification factor H_OI(mu) = 56.88% ...
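The [Lai & Robbins] constant can be recomputed from the arm means alone: it is the sum, over suboptimal arms, of the gap to the best mean divided by the binary KL divergence to it. A minimal standalone check:

```python
from math import log

def kl_bern(p, q):
    """Binary Kullback-Leibler divergence kl(p, q) for Bernoulli arms."""
    return p * log(p / q) + (1 - p) * log((1 - p) / (1 - q))

def lai_robbins_constant(means):
    """C(mu) = sum over suboptimal arms k of (mu* - mu_k) / kl(mu_k, mu*)."""
    best = max(means)
    return sum((best - mu) / kl_bern(mu, best) for mu in means if mu < best)

means = [0.005, 0.01, 0.015, 0.02, 0.3, 0.35, 0.4, 0.45,
         0.5, 0.55, 0.6, 0.78, 0.8, 0.82, 0.83, 0.84, 0.85]
C = lai_robbins_constant(means)  # close to the 66.4 reported above
```

Note that the two near-optimal arms B(0.84) and B(0.83) alone contribute about 26 and 13 to C(mu): arms close to the best are what makes a problem statistically hard.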
Number of environments to try: 1
Evaluating environment: MAB(nbArms: 17, arms: [B(0.005), B(0.01), B(0.015), B(0.02), B(0.3), B(0.35), B(0.4), B(0.45), B(0.5), B(0.55), B(0.6), B(0.78), B(0.8), B(0.82), B(0.83), B(0.84), B(0.85)], minArm: 0.005, maxArm: 0.85)
- Adding player #1 = #1<$\rho^{\mathrm{Learn}}$[BayesUCB, rank ~ BayesUCB]> ...
Using this already created player 'player' = #1<$\rho^{\mathrm{Learn}}$[BayesUCB, rank ~ BayesUCB]> ...
- Adding player #2 = #2<$\rho^{\mathrm{Learn}}$[BayesUCB, rank ~ BayesUCB]> ...
Using this already created player 'player' = #2<$\rho^{\mathrm{Learn}}$[BayesUCB, rank ~ BayesUCB]> ...
- Adding player #3 = #3<$\rho^{\mathrm{Learn}}$[BayesUCB, rank ~ BayesUCB]> ...
Using this already created player 'player' = #3<$\rho^{\mathrm{Learn}}$[BayesUCB, rank ~ BayesUCB]> ...
- Adding player #4 = #4<$\rho^{\mathrm{Learn}}$[BayesUCB, rank ~ BayesUCB]> ...
Using this already created player 'player' = #4<$\rho^{\mathrm{Learn}}$[BayesUCB, rank ~ BayesUCB]> ...
- Adding player #5 = #5<$\rho^{\mathrm{Learn}}$[BayesUCB, rank ~ BayesUCB]> ...
Using this already created player 'player' = #5<$\rho^{\mathrm{Learn}}$[BayesUCB, rank ~ BayesUCB]> ...
- Adding player #6 = #6<$\rho^{\mathrm{Learn}}$[BayesUCB, rank ~ BayesUCB]> ...
Using this already created player 'player' = #6<$\rho^{\mathrm{Learn}}$[BayesUCB, rank ~ BayesUCB]> ...
Estimated order by the policy #1<$\rho^{\mathrm{Learn}}$[BayesUCB, rank: 6 ~ BayesUCB]> after 10000 steps: [ 0 1 2 3 5 9 10 7 8 4 6 12 15 11 16 13 14] ...
==> Optimal arm identification: 100.00% (relative success)...
==> Manhattan distance from optimal ordering: 79.24% (relative success)...
==> Kendall Tau distance from optimal ordering: 99.99% (relative success)...
==> Spearman distance from optimal ordering: 100.00% (relative success)...
==> Gestalt distance from optimal ordering: 58.82% (relative success)...
==> Mean distance from optimal ordering: 84.51% (relative success)...
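These "distance from optimal ordering" percentages are similarity scores between the estimated ranking and the identity ordering (arms are indexed by increasing mean, so the optimal order is 0, 1, ..., 16). Two of them can be re-derived as below; this is a sketch whose normalizations are assumptions, although for player #1's estimate it does reproduce the 79.24% and 58.82% in the log (`scipy.stats.kendalltau` and `spearmanr` give the other two).

```python
import numpy as np
from difflib import SequenceMatcher

def ordering_scores(order):
    """Two of the similarity scores above, re-derived as a sketch:
    - Manhattan: 1 minus the total index displacement, normalized by
      an assumed worst case of n^2 / 2;
    - Gestalt: Ratcliff/Obershelp sequence similarity (difflib)."""
    order = np.asarray(order)
    n = len(order)
    ideal = np.arange(n)
    manhattan = 1.0 - np.abs(order - ideal).sum() / (n ** 2 / 2.0)
    gestalt = SequenceMatcher(None, list(order), list(ideal)).ratio()
    return manhattan, gestalt
```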
Estimated order by the policy #2<$\rho^{\mathrm{Learn}}$[BayesUCB, rank: 6 ~ BayesUCB]> after 10000 steps: [10 6 0 1 2 3 5 7 9 8 4 15 16 12 14 11 13] ...
==> Optimal arm identification: 100.00% (relative success)...
==> Manhattan distance from optimal ordering: 66.78% (relative success)...
==> Kendall Tau distance from optimal ordering: 99.70% (relative success)...
==> Spearman distance from optimal ordering: 99.85% (relative success)...
==> Gestalt distance from optimal ordering: 52.94% (relative success)...
==> Mean distance from optimal ordering: 79.82% (relative success)...
Estimated order by the policy #3<$\rho^{\mathrm{Learn}}$[BayesUCB, rank: 6 ~ BayesUCB]> after 10000 steps: [ 8 4 3 0 1 2 6 7 10 5 9 11 13 15 12 16 14] ...
==> Optimal arm identification: 100.00% (relative success)...
==> Manhattan distance from optimal ordering: 75.09% (relative success)...
==> Kendall Tau distance from optimal ordering: 99.98% (relative success)...
==> Spearman distance from optimal ordering: 100.00% (relative success)...
==> Gestalt distance from optimal ordering: 58.82% (relative success)...
==> Mean distance from optimal ordering: 83.47% (relative success)...
Estimated order by the policy #4<$\rho^{\mathrm{Learn}}$[BayesUCB, rank: 6 ~ BayesUCB]> after 10000 steps: [ 0 1 2 3 5 8 10 7 9 4 6 13 11 15 16 12 14] ...
==> Optimal arm identification: 100.00% (relative success)...
==> Manhattan distance from optimal ordering: 79.24% (relative success)...
==> Kendall Tau distance from optimal ordering: 100.00% (relative success)...
==> Spearman distance from optimal ordering: 100.00% (relative success)...
==> Gestalt distance from optimal ordering: 58.82% (relative success)...
==> Mean distance from optimal ordering: 84.51% (relative success)...
Estimated order by the policy #5<$\rho^{\mathrm{Learn}}$[BayesUCB, rank: 6 ~ BayesUCB]> after 10000 steps: [ 0 1 2 3 4 10 6 9 5 8 7 16 15 14 12 13 11] ...
==> Optimal arm identification: 100.00% (relative success)...
==> Manhattan distance from optimal ordering: 77.85% (relative success)...
==> Kendall Tau distance from optimal ordering: 99.97% (relative success)...
==> Spearman distance from optimal ordering: 100.00% (relative success)...
==> Gestalt distance from optimal ordering: 47.06% (relative success)...
==> Mean distance from optimal ordering: 81.22% (relative success)...
Estimated order by the policy #6<$\rho^{\mathrm{Learn}}$[BayesUCB, rank: 6 ~ BayesUCB]> after 10000 steps: [ 4 9 10 5 0 1 2 3 7 8 6 14 12 15 13 11 16] ...
==> Optimal arm identification: 100.00% (relative success)...
==> Manhattan distance from optimal ordering: 62.63% (relative success)...
==> Kendall Tau distance from optimal ordering: 99.61% (relative success)...
==> Spearman distance from optimal ordering: 99.74% (relative success)...
==> Gestalt distance from optimal ordering: 52.94% (relative success)...
==> Mean distance from optimal ordering: 78.73% (relative success)...
Giving the final ranks ...
Final ranking for this environment #0 :
- Player #6, '#6<$\rho^{\mathrm{Learn}}$[BayesUCB, rank ~ BayesUCB]>' was ranked 1 / 6 for this simulation (last rewards = 7084.5).
- Player #2, '#2<$\rho^{\mathrm{Learn}}$[BayesUCB, rank ~ BayesUCB]>' was ranked 2 / 6 for this simulation (last rewards = 7050.55).
- Player #3, '#3<$\rho^{\mathrm{Learn}}$[BayesUCB, rank ~ BayesUCB]>' was ranked 3 / 6 for this simulation (last rewards = 6922.5).
- Player #1, '#1<$\rho^{\mathrm{Learn}}$[BayesUCB, rank ~ BayesUCB]>' was ranked 4 / 6 for this simulation (last rewards = 6891.45).
- Player #5, '#5<$\rho^{\mathrm{Learn}}$[BayesUCB, rank ~ BayesUCB]>' was ranked 5 / 6 for this simulation (last rewards = 6873.05).
- Player #4, '#4<$\rho^{\mathrm{Learn}}$[BayesUCB, rank ~ BayesUCB]>' was ranked 6 / 6 for this simulation (last rewards = 6812.1).
- Plotting the decentralized rewards
- Plotting the centralized fairness (STD)
- Plotting the centralized regret
Difference between regret and sum of three terms: [ -0.1495 -0.1595 -0.0795 ..., -62.97225 -62.96325 -62.92575]
- For 6 players, Anantharam et al. centralized lower-bound gave = 12 ...
- For 6 players, our lower bound gave = 71.8 ...
- For 6 players, the initial lower bound in Theorem 6 from [Anandkumar et al., 2010] gave = 54.3 ...
This MAB problem has:
- a [Lai & Robbins] complexity constant C(mu) = 66.4 for 1-player problem ...
- an Optimal Arm Identification factor H_OI(mu) = 56.88% ...
- [Anantharam et al] centralized lowerbound = 12,
- Our decentralized lowerbound = 71.8,
- [Anandkumar et al] decentralized lowerbound = 54.3
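The two main constants can be recomputed from the means: with M = 6 players, the centralized [Anantharam et al] bound sums (mu_M* - mu_k)/kl(mu_k, mu_M*) over the arms outside the M best, where mu_M* is the M-th best mean (0.78 here), and the decentralized variant is M times that. A standalone sketch (the exact constants used by the simulator may differ; the [Anandkumar et al] bound follows a different formula and is not reproduced here):

```python
from math import log

def kl_bern(p, q):
    """Binary Kullback-Leibler divergence kl(p, q) for Bernoulli arms."""
    return p * log(p / q) + (1 - p) * log((1 - p) / (1 - q))

def multiplayer_lowerbounds(means, nb_players):
    """Sketch: centralized lower-bound constant for M players, and the
    M-times-larger decentralized variant ('our' bound above)."""
    mu_M = sorted(means, reverse=True)[nb_players - 1]  # M-th best mean
    centralized = sum((mu_M - mu) / kl_bern(mu, mu_M)
                      for mu in means if mu < mu_M)
    return centralized, nb_players * centralized

means = [0.005, 0.01, 0.015, 0.02, 0.3, 0.35, 0.4, 0.45,
         0.5, 0.55, 0.6, 0.78, 0.8, 0.82, 0.83, 0.84, 0.85]
cent, decent = multiplayer_lowerbounds(means, 6)
# cent is close to the 12 reported, decent close to 71.8
```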
- Plotting the centralized regret
Difference between regret and sum of three terms: [ 4.49775 4.32675 3.93 ..., -62.97225 -62.96325 -62.92575]
- For 6 players, Anantharam et al. centralized lower-bound gave = 12 ...
- For 6 players, our lower bound gave = 71.8 ...
- For 6 players, the initial lower bound in Theorem 6 from [Anandkumar et al., 2010] gave = 54.3 ...
This MAB problem has:
- a [Lai & Robbins] complexity constant C(mu) = 66.4 for 1-player problem ...
- an Optimal Arm Identification factor H_OI(mu) = 56.88% ...
- [Anantharam et al] centralized lowerbound = 12,
- Our decentralized lowerbound = 71.8,
- [Anandkumar et al] decentralized lowerbound = 54.3
- Plotting the centralized regret
Difference between regret and sum of three terms: [ -0.1495 -0.1595 -0.0795 ..., -62.97225 -62.96325 -62.92575]
- For 6 players, Anantharam et al. centralized lower-bound gave = 12 ...
- For 6 players, our lower bound gave = 71.8 ...
- For 6 players, the initial lower bound in Theorem 6 from [Anandkumar et al., 2010] gave = 54.3 ...
This MAB problem has:
- a [Lai & Robbins] complexity constant C(mu) = 66.4 for 1-player problem ...
- an Optimal Arm Identification factor H_OI(mu) = 56.88% ...
- [Anantharam et al] centralized lowerbound = 12,
- Our decentralized lowerbound = 71.8,
- [Anandkumar et al] decentralized lowerbound = 54.3
- Plotting the centralized regret
Difference between regret and sum of three terms: [ 4.49775 4.32675 3.93 ..., -62.97225 -62.96325 -62.92575]
- For 6 players, Anantharam et al. centralized lower-bound gave = 12 ...
- For 6 players, our lower bound gave = 71.8 ...
- For 6 players, the initial lower bound in Theorem 6 from [Anandkumar et al., 2010] gave = 54.3 ...
This MAB problem has:
- a [Lai & Robbins] complexity constant C(mu) = 66.4 for 1-player problem ...
- an Optimal Arm Identification factor H_OI(mu) = 56.88% ...
- [Anantharam et al] centralized lowerbound = 12,
- Our decentralized lowerbound = 71.8,
- [Anandkumar et al] decentralized lowerbound = 54.3
- Plotting the cumulative number of switches
- Plotting the probability of picking the best arm
- Plotting the cumulated total number of collisions as a function of time
No upper bound for the non-cumulated number of collisions...
- Plotting the frequency of collision in each arm
- For #$0$: $B(0.005)$ ($0.0\%$), frequency of collisions is 1e-05 ...
- For #$1$: $B(0.01)$ ($0.0\%$), frequency of collisions is 1.5e-05 ...
- For #$2$: $B(0.015)$ ($0.0\%$), frequency of collisions is 2.58333e-05 ...
- For #$3$: $B(0.02)$ ($0.0\%$), frequency of collisions is 1.83333e-05 ...
- For #$4$: $B(0.3)$ ($0.0\%$), frequency of collisions is 0.0003425 ...
- For #$5$: $B(0.35)$ ($0.0\%$), frequency of collisions is 0.000103333 ...
- For #$6$: $B(0.4)$ ($0.0\%$), frequency of collisions is 0.000238333 ...
- For #$7$: $B(0.45)$ ($0.0\%$), frequency of collisions is 0.00037 ...
- For #$8$: $B(0.5)$ ($0.0\%$), frequency of collisions is 0.000385 ...
- For #$9$: $B(0.55)$ ($0.1\%$), frequency of collisions is 0.001145 ...
- For #$10$: $B(0.6)$ ($0.2\%$), frequency of collisions is 0.00191917 ...
- For #$11$: $B(0.78)$ ($3.3\%$), frequency of collisions is 0.0333225 ...
- For #$12$: $B(0.8)$ ($3.7\%$), frequency of collisions is 0.0368842 ...
- For #$13$: $B(0.82)$ ($1.5\%$), frequency of collisions is 0.0145975 ...
- For #$14$: $B(0.83)$ ($2.3\%$), frequency of collisions is 0.0229517 ...
- For #$15$: $B(0.84)$ ($1.7\%$), frequency of collisions is 0.0174483 ...
- For #$16$: $B(0.85)$ ($1.7\%$), frequency of collisions is 0.0170442 ...
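Collisions concentrate on the six best arms (B(0.78) through B(0.85)), which every player is competing for. Assuming the normalization is total collisions on an arm divided by (horizon × repetitions), these frequencies can be reproduced from raw collision counts; this is a sketch and the simulator's actual normalization may differ.

```python
import numpy as np

HORIZON, REPETITIONS = 10000, 20

def collision_frequencies(counts):
    """Per-arm collision frequency: total number of collisions observed
    on each arm, averaged over time steps and repetitions (assumed
    normalization; e.g. 2 collisions over 20 runs of 10000 steps
    gives the 1e-05 seen for arm #0 above)."""
    return np.asarray(counts, dtype=float) / (HORIZON * REPETITIONS)
```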
Done for simulations main_multiplayers.py ...
- [Anantharam et al] centralized lowerbound = 12,
- Our decentralized lowerbound = 71.8,
- [Anandkumar et al] decentralized lowerbound = 54.3
- Plotting the centralized regret
Difference between regret and sum of three terms: [ 0.1325 0.183 0.4865 ..., -42.1415 -42.218 -42.286 ]
- For 6 players, Anantharam et al. centralized lower-bound gave = 12 ...
- For 6 players, our lower bound gave = 71.8 ...
- For 6 players, the initial lower bound in Theorem 6 from [Anandkumar et al., 2010] gave = 54.3 ...
This MAB problem has:
- a [Lai & Robbins] complexity constant C(mu) = 66.4 for 1-player problem ...
- an Optimal Arm Identification factor H_OI(mu) = 56.88% ...
- [Anantharam et al] centralized lowerbound = 12,
- Our decentralized lowerbound = 71.8,
- [Anandkumar et al] decentralized lowerbound = 54.3
- Plotting the centralized regret
Difference between regret and sum of three terms: [ -3.06125 -3.10425 -2.6415 ..., -42.1415 -42.218 -42.286 ]
- For 6 players, Anantharam et al. centralized lower-bound gave = 12 ...
- For 6 players, our lower bound gave = 71.8 ...
- For 6 players, the initial lower bound in Theorem 6 from [Anandkumar et al., 2010] gave = 54.3 ...
This MAB problem has:
- a [Lai & Robbins] complexity constant C(mu) = 66.4 for 1-player problem ...
- an Optimal Arm Identification factor H_OI(mu) = 56.88% ...
- [Anantharam et al] centralized lowerbound = 12,
- Our decentralized lowerbound = 71.8,
- [Anandkumar et al] decentralized lowerbound = 54.3
- Plotting the cumulative number of switches
- Plotting the probability of picking the best arm
- Plotting the cumulated total number of collisions as a function of time
No upper bound for the non-cumulated number of collisions...
- Plotting the frequency of collision in each arm
- For #$0$: $B(0.005)$ ($0.0\%$), frequency of collisions is 1.41667e-05 ...
- For #$1$: $B(0.01)$ ($0.0\%$), frequency of collisions is 2.83333e-05 ...
- For #$2$: $B(0.015)$ ($0.0\%$), frequency of collisions is 1e-05 ...
- For #$3$: $B(0.02)$ ($0.0\%$), frequency of collisions is 1.41667e-05 ...
- For #$4$: $B(0.3)$ ($0.0\%$), frequency of collisions is 0.000333333 ...
- For #$5$: $B(0.35)$ ($0.0\%$), frequency of collisions is 0.000389167 ...
- For #$6$: $B(0.4)$ ($0.0\%$), frequency of collisions is 0.0002225 ...
- For #$7$: $B(0.45)$ ($0.1\%$), frequency of collisions is 0.0005525 ...
- For #$8$: $B(0.5)$ ($0.0\%$), frequency of collisions is 0.0002975 ...
- For #$9$: $B(0.55)$ ($0.2\%$), frequency of collisions is 0.00230333 ...
- For #$10$: $B(0.6)$ ($0.2\%$), frequency of collisions is 0.00214917 ...
- For #$11$: $B(0.78)$ ($4.3\%$), frequency of collisions is 0.0432367 ...
- For #$12$: $B(0.8)$ ($2.7\%$), frequency of collisions is 0.0268125 ...
- For #$13$: $B(0.82)$ ($2.5\%$), frequency of collisions is 0.0250258 ...
- For #$14$: $B(0.83)$ ($2.7\%$), frequency of collisions is 0.0272858 ...
- For #$15$: $B(0.84)$ ($2.1\%$), frequency of collisions is 0.0208783 ...
- For #$16$: $B(0.85)$ ($1.9\%$), frequency of collisions is 0.0188558 ...
Done for simulations main_multiplayers.py ...