- Setting dpi of all figures to 110 ...
- Setting 'figsize' of all figures to (19.8, 10.8) ...
Info: Using the regular tqdm() decorator ...
Info: numba.jit seems to be available.
Info: numba.jit seems to be available.
Loaded experiments configuration from 'configuration.py':
configuration = {
    'horizon': 10000,
    'verbosity': 6,
    'n_jobs': 2,
    'finalRanksOnAverage': True,
    'delta_t_save': 1,
    'players': [rhoLearn(BayesUCB), rhoLearn(BayesUCB), rhoLearn(BayesUCB), rhoLearn(BayesUCB), rhoLearn(BayesUCB), rhoLearn(BayesUCB)],
    'repetitions': 20,
    'successive_players': [
        [Selfish(BayesUCB), Selfish(BayesUCB), Selfish(BayesUCB), Selfish(BayesUCB), Selfish(BayesUCB), Selfish(BayesUCB)],
        [rhoRand(BayesUCB), rhoRand(BayesUCB), rhoRand(BayesUCB), rhoRand(BayesUCB), rhoRand(BayesUCB), rhoRand(BayesUCB)],
        [rhoLearn(BayesUCB), rhoLearn(BayesUCB), rhoLearn(BayesUCB), rhoLearn(BayesUCB), rhoLearn(BayesUCB), rhoLearn(BayesUCB)],
    ],
    'averageOn': 0.001,
    'collisionModel': onlyUniqUserGetsReward,
    'environment': [{'arm_type': Bernoulli, 'params': [0.005, 0.01, 0.015, 0.02, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.78, 0.8, 0.82, 0.83, 0.84, 0.85]}],
}
plots/ is already a directory here...
Number of players in the multi-players game: 6
Time horizon: 10000
Number of repetitions: 20
Sampling rate for saving, delta_t_save: 1
Sampling rate for plotting, delta_t_plot: 1
Number of jobs for parallelization: 2
Using collision model onlyUniqUserGetsReward. More details:
 Simple collision model where only a player alone on an arm samples it and receives the reward.
 - This is the default collision model, cf. https://arxiv.org/abs/0910.2065v3 (collision model 1).
 - The numpy array 'choices' is increased according to the number of users who collided (it is NOT binary).
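To make this collision rule concrete, here is a minimal sketch of the semantics described above (a player only receives a reward when it is alone on its arm); the function name and signature are illustrative, not the actual SMPyBandits implementation:

```python
import numpy as np

def only_uniq_user_gets_reward(choices, draws):
    """Sketch of the 'onlyUniqUserGetsReward' rule: player j receives the reward
    drawn from its chosen arm only if no other player chose the same arm.

    choices : array of shape (nbPlayers,), choices[j] = arm chosen by player j.
    draws   : array of shape (nbArms,), draws[k] = reward sampled from arm k at this step.
    Returns the array of rewards actually obtained by each player (0 on a collision).
    """
    choices = np.asarray(choices)
    rewards = np.zeros(len(choices))
    for j, arm in enumerate(choices):
        if np.count_nonzero(choices == arm) == 1:  # alone on this arm: sample it, get the reward
            rewards[j] = draws[arm]
        # otherwise: collision on this arm, nobody who chose it gets anything
    return rewards
```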
Creating a new MAB problem ...
 Reading arms of this MAB problem from a dictionary 'configuration' = {'arm_type': Bernoulli, 'params': [0.005, 0.01, 0.015, 0.02, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.78, 0.8, 0.82, 0.83, 0.84, 0.85]} ...
 - with 'arm_type' = Bernoulli
 - with 'params' = [0.005, 0.01, 0.015, 0.02, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.78, 0.8, 0.82, 0.83, 0.84, 0.85]
 - with 'arms' = [B(0.005), B(0.01), B(0.015), B(0.02), B(0.3), B(0.35), B(0.4), B(0.45), B(0.5), B(0.55), B(0.6), B(0.78), B(0.8), B(0.82), B(0.83), B(0.84), B(0.85)]
 - with 'nbArms' = 17
 - with 'maxArm' = 0.85
 - with 'minArm' = 0.005
This MAB problem has:
 - a [Lai & Robbins] complexity constant C(mu) = 66.4 ...
 - an Optimal Arm Identification factor H_OI(mu) = 56.88% ...
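The complexity constant C(mu) printed above is the Lai & Robbins constant that multiplies log(T) in the single-player regret lower bound. It can be recomputed from the Bernoulli arm means with the binary Kullback-Leibler divergence; the helper names below are illustrative, not the library's API:

```python
from math import log

def kl_bernoulli(p, q):
    """Binary KL divergence kl(p, q) between Bernoulli(p) and Bernoulli(q), for 0 < p, q < 1."""
    return p * log(p / q) + (1 - p) * log((1 - p) / (1 - q))

def lai_robbins_constant(means):
    """C(mu) = sum over suboptimal arms k of (mu* - mu_k) / kl(mu_k, mu*)."""
    mu_star = max(means)
    return sum((mu_star - mu) / kl_bernoulli(mu, mu_star) for mu in means if mu < mu_star)

means = [0.005, 0.01, 0.015, 0.02, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6,
         0.78, 0.8, 0.82, 0.83, 0.84, 0.85]
print(round(lai_robbins_constant(means), 1))  # ~ 66.4, as printed above
```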
Number of environments to try: 1

Evaluating environment: MAB(nbArms: 17, arms: [B(0.005), B(0.01), B(0.015), B(0.02), B(0.3), B(0.35), B(0.4), B(0.45), B(0.5), B(0.55), B(0.6), B(0.78), B(0.8), B(0.82), B(0.83), B(0.84), B(0.85)], minArm: 0.005, maxArm: 0.85)
- Adding player #1 = #1<$\rho^{\mathrm{Learn}}$[BayesUCB, rank ~ BayesUCB]> ...
 Using this already created player 'player' = #1<$\rho^{\mathrm{Learn}}$[BayesUCB, rank ~ BayesUCB]> ...
- Adding player #2 = #2<$\rho^{\mathrm{Learn}}$[BayesUCB, rank ~ BayesUCB]> ...
 Using this already created player 'player' = #2<$\rho^{\mathrm{Learn}}$[BayesUCB, rank ~ BayesUCB]> ...
- Adding player #3 = #3<$\rho^{\mathrm{Learn}}$[BayesUCB, rank ~ BayesUCB]> ...
 Using this already created player 'player' = #3<$\rho^{\mathrm{Learn}}$[BayesUCB, rank ~ BayesUCB]> ...
- Adding player #4 = #4<$\rho^{\mathrm{Learn}}$[BayesUCB, rank ~ BayesUCB]> ...
 Using this already created player 'player' = #4<$\rho^{\mathrm{Learn}}$[BayesUCB, rank ~ BayesUCB]> ...
- Adding player #5 = #5<$\rho^{\mathrm{Learn}}$[BayesUCB, rank ~ BayesUCB]> ...
 Using this already created player 'player' = #5<$\rho^{\mathrm{Learn}}$[BayesUCB, rank ~ BayesUCB]> ...
- Adding player #6 = #6<$\rho^{\mathrm{Learn}}$[BayesUCB, rank ~ BayesUCB]> ...
 Using this already created player 'player' = #6<$\rho^{\mathrm{Learn}}$[BayesUCB, rank ~ BayesUCB]> ...

Estimated order by the policy #1<$\rho^{\mathrm{Learn}}$[BayesUCB, rank: 6 ~ BayesUCB]> after 10000 steps: [ 0 1 2 3 5 9 10 7 8 4 6 12 15 11 16 13 14] ...
  ==> Optimal arm identification: 100.00% (relative success)...
  ==> Manhattan distance from optimal ordering: 79.24% (relative success)...
  ==> Kendall Tau distance from optimal ordering: 99.99% (relative success)...
  ==> Spearman distance from optimal ordering: 100.00% (relative success)...
  ==> Gestalt distance from optimal ordering: 58.82% (relative success)...
  ==> Mean distance from optimal ordering: 84.51% (relative success)...
Estimated order by the policy #2<$\rho^{\mathrm{Learn}}$[BayesUCB, rank: 6 ~ BayesUCB]> after 10000 steps: [10 6 0 1 2 3 5 7 9 8 4 15 16 12 14 11 13] ...
  ==> Optimal arm identification: 100.00% (relative success)...
  ==> Manhattan distance from optimal ordering: 66.78% (relative success)...
  ==> Kendall Tau distance from optimal ordering: 99.70% (relative success)...
  ==> Spearman distance from optimal ordering: 99.85% (relative success)...
  ==> Gestalt distance from optimal ordering: 52.94% (relative success)...
  ==> Mean distance from optimal ordering: 79.82% (relative success)...
Estimated order by the policy #3<$\rho^{\mathrm{Learn}}$[BayesUCB, rank: 6 ~ BayesUCB]> after 10000 steps: [ 8 4 3 0 1 2 6 7 10 5 9 11 13 15 12 16 14] ...
  ==> Optimal arm identification: 100.00% (relative success)...
  ==> Manhattan distance from optimal ordering: 75.09% (relative success)...
  ==> Kendall Tau distance from optimal ordering: 99.98% (relative success)...
  ==> Spearman distance from optimal ordering: 100.00% (relative success)...
  ==> Gestalt distance from optimal ordering: 58.82% (relative success)...
  ==> Mean distance from optimal ordering: 83.47% (relative success)...
Estimated order by the policy #4<$\rho^{\mathrm{Learn}}$[BayesUCB, rank: 6 ~ BayesUCB]> after 10000 steps: [ 0 1 2 3 5 8 10 7 9 4 6 13 11 15 16 12 14] ...
  ==> Optimal arm identification: 100.00% (relative success)...
  ==> Manhattan distance from optimal ordering: 79.24% (relative success)...
  ==> Kendall Tau distance from optimal ordering: 100.00% (relative success)...
  ==> Spearman distance from optimal ordering: 100.00% (relative success)...
  ==> Gestalt distance from optimal ordering: 58.82% (relative success)...
  ==> Mean distance from optimal ordering: 84.51% (relative success)...
Estimated order by the policy #5<$\rho^{\mathrm{Learn}}$[BayesUCB, rank: 6 ~ BayesUCB]> after 10000 steps: [ 0 1 2 3 4 10 6 9 5 8 7 16 15 14 12 13 11] ...
  ==> Optimal arm identification: 100.00% (relative success)...
  ==> Manhattan distance from optimal ordering: 77.85% (relative success)...
  ==> Kendall Tau distance from optimal ordering: 99.97% (relative success)...
  ==> Spearman distance from optimal ordering: 100.00% (relative success)...
  ==> Gestalt distance from optimal ordering: 47.06% (relative success)...
  ==> Mean distance from optimal ordering: 81.22% (relative success)...
Estimated order by the policy #6<$\rho^{\mathrm{Learn}}$[BayesUCB, rank: 6 ~ BayesUCB]> after 10000 steps: [ 4 9 10 5 0 1 2 3 7 8 6 14 12 15 13 11 16] ...
  ==> Optimal arm identification: 100.00% (relative success)...
  ==> Manhattan distance from optimal ordering: 62.63% (relative success)...
  ==> Kendall Tau distance from optimal ordering: 99.61% (relative success)...
  ==> Spearman distance from optimal ordering: 99.74% (relative success)...
  ==> Gestalt distance from optimal ordering: 52.94% (relative success)...
  ==> Mean distance from optimal ordering: 78.73% (relative success)...
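Each block above compares a player's estimated ordering of the arms (by index) against the optimal ordering 0, 1, ..., 16, since the arms are listed by increasing mean; the "Mean distance" line is simply the average of the four scores. As a rough illustration, the Manhattan and Gestalt scores can be recomputed as follows (helper names are mine, not the library's; the Kendall Tau and Spearman scores appear to come from rank-correlation statistics whose exact normalization is not reproduced here):

```python
from difflib import SequenceMatcher
import numpy as np

def manhattan_similarity(estimated, nb_arms):
    """1 - (sum of |position errors|) / (K^2 / 2): equals 100% for a perfect ordering."""
    optimal = np.arange(nb_arms)
    return 1.0 - np.abs(optimal - np.asarray(estimated)).sum() / (nb_arms ** 2 / 2.0)

def gestalt_similarity(estimated, nb_arms):
    """Ratcliff-Obershelp (Gestalt) similarity between the two sequences, via difflib."""
    return SequenceMatcher(None, list(range(nb_arms)), list(estimated)).ratio()

# Order estimated by player #1 after 10000 steps, taken from the output above:
order_1 = [0, 1, 2, 3, 5, 9, 10, 7, 8, 4, 6, 12, 15, 11, 16, 13, 14]
print("Manhattan: {:.2%}".format(manhattan_similarity(order_1, 17)))  # ~ 79.24%
print("Gestalt:   {:.2%}".format(gestalt_similarity(order_1, 17)))    # ~ 58.82%
```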
Giving the final ranks ...

Final ranking for this environment #0 :
- Player #6, '#6<$\rho^{\mathrm{Learn}}$[BayesUCB, rank ~ BayesUCB]>' was ranked 1 / 6 for this simulation (last rewards = 7084.5).
- Player #2, '#2<$\rho^{\mathrm{Learn}}$[BayesUCB, rank ~ BayesUCB]>' was ranked 2 / 6 for this simulation (last rewards = 7050.55).
- Player #3, '#3<$\rho^{\mathrm{Learn}}$[BayesUCB, rank ~ BayesUCB]>' was ranked 3 / 6 for this simulation (last rewards = 6922.5).
- Player #1, '#1<$\rho^{\mathrm{Learn}}$[BayesUCB, rank ~ BayesUCB]>' was ranked 4 / 6 for this simulation (last rewards = 6891.45).
- Player #5, '#5<$\rho^{\mathrm{Learn}}$[BayesUCB, rank ~ BayesUCB]>' was ranked 5 / 6 for this simulation (last rewards = 6873.05).
- Player #4, '#4<$\rho^{\mathrm{Learn}}$[BayesUCB, rank ~ BayesUCB]>' was ranked 6 / 6 for this simulation (last rewards = 6812.1).

- Plotting the decentralized rewards
- Plotting the centralized fairness (STD)
- Plotting the centralized regret
Difference between regret and sum of three terms: [ -0.1495 -0.1595 -0.0795 ..., -62.97225 -62.96325 -62.92575]
 - For 6 players, Anantharam et al. centralized lower-bound gave = 12 ...
 - For 6 players, our lower bound gave = 71.8 ...
 - For 6 players, the initial lower bound in Theorem 6 from [Anandkumar et al., 2010] gave = 54.3 ...
This MAB problem has:
 - a [Lai & Robbins] complexity constant C(mu) = 66.4 for the 1-player problem ...
 - an Optimal Arm Identification factor H_OI(mu) = 56.88% ...
 - [Anantharam et al] centralized lowerbound = 12,
 - Our decentralized lowerbound = 71.8,
 - [Anandkumar et al] decentralized lowerbound = 54.3
- Plotting the centralized regret
Difference between regret and sum of three terms: [ 4.49775 4.32675 3.93 ..., -62.97225 -62.96325 -62.92575]
 - For 6 players, Anantharam et al. centralized lower-bound gave = 12 ...
 - For 6 players, our lower bound gave = 71.8 ...
 - For 6 players, the initial lower bound in Theorem 6 from [Anandkumar et al., 2010] gave = 54.3 ...
This MAB problem has:
 - a [Lai & Robbins] complexity constant C(mu) = 66.4 for the 1-player problem ...
 - an Optimal Arm Identification factor H_OI(mu) = 56.88% ...
 - [Anantharam et al] centralized lowerbound = 12,
 - Our decentralized lowerbound = 71.8,
 - [Anandkumar et al] decentralized lowerbound = 54.3
- Plotting the centralized regret
Difference between regret and sum of three terms: [ -0.1495 -0.1595 -0.0795 ..., -62.97225 -62.96325 -62.92575]
 - For 6 players, Anantharam et al. centralized lower-bound gave = 12 ...
 - For 6 players, our lower bound gave = 71.8 ...
 - For 6 players, the initial lower bound in Theorem 6 from [Anandkumar et al., 2010] gave = 54.3 ...
This MAB problem has:
 - a [Lai & Robbins] complexity constant C(mu) = 66.4 for the 1-player problem ...
 - an Optimal Arm Identification factor H_OI(mu) = 56.88% ...
 - [Anantharam et al] centralized lowerbound = 12,
 - Our decentralized lowerbound = 71.8,
 - [Anandkumar et al] decentralized lowerbound = 54.3
- Plotting the centralized regret
Difference between regret and sum of three terms: [ 4.49775 4.32675 3.93 ..., -62.97225 -62.96325 -62.92575]
 - For 6 players, Anantharam et al. centralized lower-bound gave = 12 ...
 - For 6 players, our lower bound gave = 71.8 ...
 - For 6 players, the initial lower bound in Theorem 6 from [Anandkumar et al., 2010] gave = 54.3 ...
This MAB problem has:
 - a [Lai & Robbins] complexity constant C(mu) = 66.4 for the 1-player problem ...
 - an Optimal Arm Identification factor H_OI(mu) = 56.88% ...
 - [Anantharam et al] centralized lowerbound = 12,
 - Our decentralized lowerbound = 71.8,
 - [Anandkumar et al] decentralized lowerbound = 54.3
- Plotting the cumulative number of switches
- Plotting the probability of picking the best arm
- Plotting the cumulated total number of collisions as a function of time
No upper bound for the non-cumulated number of collisions...
- Plotting the frequency of collision in each arm
 - For #$0$: $B(0.005)$ ($0.0\%$), frequency of collisions is 1e-05 ...
 - For #$1$: $B(0.01)$ ($0.0\%$), frequency of collisions is 1.5e-05 ...
 - For #$2$: $B(0.015)$ ($0.0\%$), frequency of collisions is 2.58333e-05 ...
 - For #$3$: $B(0.02)$ ($0.0\%$), frequency of collisions is 1.83333e-05 ...
 - For #$4$: $B(0.3)$ ($0.0\%$), frequency of collisions is 0.0003425 ...
 - For #$5$: $B(0.35)$ ($0.0\%$), frequency of collisions is 0.000103333 ...
 - For #$6$: $B(0.4)$ ($0.0\%$), frequency of collisions is 0.000238333 ...
 - For #$7$: $B(0.45)$ ($0.0\%$), frequency of collisions is 0.00037 ...
 - For #$8$: $B(0.5)$ ($0.0\%$), frequency of collisions is 0.000385 ...
 - For #$9$: $B(0.55)$ ($0.1\%$), frequency of collisions is 0.001145 ...
 - For #$10$: $B(0.6)$ ($0.2\%$), frequency of collisions is 0.00191917 ...
 - For #$11$: $B(0.78)$ ($3.3\%$), frequency of collisions is 0.0333225 ...
 - For #$12$: $B(0.8)$ ($3.7\%$), frequency of collisions is 0.0368842 ...
 - For #$13$: $B(0.82)$ ($1.5\%$), frequency of collisions is 0.0145975 ...
 - For #$14$: $B(0.83)$ ($2.3\%$), frequency of collisions is 0.0229517 ...
 - For #$15$: $B(0.84)$ ($1.7\%$), frequency of collisions is 0.0174483 ...
 - For #$16$: $B(0.85)$ ($1.7\%$), frequency of collisions is 0.0170442 ...
Done for simulations main_multiplayers.py ...

This MAB problem has:
 - a [Lai & Robbins] complexity constant C(mu) = 66.4 for the 1-player problem ...
 - an Optimal Arm Identification factor H_OI(mu) = 56.88% ...
 - [Anantharam et al] centralized lowerbound = 12,
 - Our decentralized lowerbound = 71.8,
 - [Anandkumar et al] decentralized lowerbound = 54.3
- Plotting the centralized regret
Difference between regret and sum of three terms: [ 0.1325 0.183 0.4865 ..., -42.1415 -42.218 -42.286 ]
 - For 6 players, Anantharam et al. centralized lower-bound gave = 12 ...
 - For 6 players, our lower bound gave = 71.8 ...
 - For 6 players, the initial lower bound in Theorem 6 from [Anandkumar et al., 2010] gave = 54.3 ...
This MAB problem has:
 - a [Lai & Robbins] complexity constant C(mu) = 66.4 for the 1-player problem ...
 - an Optimal Arm Identification factor H_OI(mu) = 56.88% ...
 - [Anantharam et al] centralized lowerbound = 12,
 - Our decentralized lowerbound = 71.8,
 - [Anandkumar et al] decentralized lowerbound = 54.3
- Plotting the centralized regret
Difference between regret and sum of three terms: [ -3.06125 -3.10425 -2.6415 ..., -42.1415 -42.218 -42.286 ]
 - For 6 players, Anantharam et al. centralized lower-bound gave = 12 ...
 - For 6 players, our lower bound gave = 71.8 ...
 - For 6 players, the initial lower bound in Theorem 6 from [Anandkumar et al., 2010] gave = 54.3 ...
This MAB problem has:
 - a [Lai & Robbins] complexity constant C(mu) = 66.4 for the 1-player problem ...
 - an Optimal Arm Identification factor H_OI(mu) = 56.88% ...
 - [Anantharam et al] centralized lowerbound = 12,
 - Our decentralized lowerbound = 71.8,
 - [Anandkumar et al] decentralized lowerbound = 54.3
- Plotting the cumulative number of switches
- Plotting the probability of picking the best arm
- Plotting the cumulated total number of collisions as a function of time
No upper bound for the non-cumulated number of collisions...
- Plotting the frequency of collision in each arm
 - For #$0$: $B(0.005)$ ($0.0\%$), frequency of collisions is 1.41667e-05 ...
 - For #$1$: $B(0.01)$ ($0.0\%$), frequency of collisions is 2.83333e-05 ...
 - For #$2$: $B(0.015)$ ($0.0\%$), frequency of collisions is 1e-05 ...
 - For #$3$: $B(0.02)$ ($0.0\%$), frequency of collisions is 1.41667e-05 ...
 - For #$4$: $B(0.3)$ ($0.0\%$), frequency of collisions is 0.000333333 ...
 - For #$5$: $B(0.35)$ ($0.0\%$), frequency of collisions is 0.000389167 ...
 - For #$6$: $B(0.4)$ ($0.0\%$), frequency of collisions is 0.0002225 ...
 - For #$7$: $B(0.45)$ ($0.1\%$), frequency of collisions is 0.0005525 ...
 - For #$8$: $B(0.5)$ ($0.0\%$), frequency of collisions is 0.0002975 ...
 - For #$9$: $B(0.55)$ ($0.2\%$), frequency of collisions is 0.00230333 ...
 - For #$10$: $B(0.6)$ ($0.2\%$), frequency of collisions is 0.00214917 ...
 - For #$11$: $B(0.78)$ ($4.3\%$), frequency of collisions is 0.0432367 ...
 - For #$12$: $B(0.8)$ ($2.7\%$), frequency of collisions is 0.0268125 ...
 - For #$13$: $B(0.82)$ ($2.5\%$), frequency of collisions is 0.0250258 ...
 - For #$14$: $B(0.83)$ ($2.7\%$), frequency of collisions is 0.0272858 ...
 - For #$15$: $B(0.84)$ ($2.1\%$), frequency of collisions is 0.0208783 ...
 - For #$16$: $B(0.85)$ ($1.9\%$), frequency of collisions is 0.0188558 ...
Done for simulations main_multiplayers.py ...
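For reference, the lower-bound constants quoted throughout the output (12 for the centralized [Anantharam et al.] bound and 71.8 for the decentralized one) can also be recomputed from the arm means: the centralized constant sums, over the arms outside the M best, the gap to the M-th best mean divided by the Bernoulli KL divergence, and the decentralized constant is M times larger. A minimal sketch with illustrative helper names (not the SMPyBandits API), which reproduces those two values up to rounding:

```python
from math import log

def kl_bernoulli(p, q):
    """Binary KL divergence between Bernoulli(p) and Bernoulli(q)."""
    return p * log(p / q) + (1 - p) * log((1 - p) / (1 - q))

def centralized_lowerbound_constant(means, nb_players):
    """Sum over the arms outside the M best of (mu_M* - mu_k) / kl(mu_k, mu_M*),
    where mu_M* is the M-th best mean (0.78 in this problem)."""
    sorted_means = sorted(means, reverse=True)
    mu_M_star = sorted_means[nb_players - 1]
    return sum((mu_M_star - mu) / kl_bernoulli(mu, mu_M_star)
               for mu in sorted_means[nb_players:])

means = [0.005, 0.01, 0.015, 0.02, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6,
         0.78, 0.8, 0.82, 0.83, 0.84, 0.85]
M = 6
centralized = centralized_lowerbound_constant(means, M)
print(centralized, M * centralized)  # ~ 12 and ~ 71.8, consistent with the constants printed above
```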