PoliciesMultiPlayers: contains various collision-avoidance protocols for the multi-player setting.
Selfish: a multi-player policy where every player is selfish and makes no attempt to handle collisions.
CentralizedNotFair: a multi-player policy which uses a centralized intelligence to assign each user to a FIXED arm.
CentralizedFair: a multi-player policy which uses a centralized intelligence to assign each user an offset; each user then plays an orthogonal arm given by (offset + t) % nbArms.
OracleNotFair: a multi-player policy with full knowledge and a centralized intelligence, which assigns each user to a FIXED arm among the best arms.
OracleFair: a multi-player policy which uses a centralized intelligence to assign each user an offset; each user then plays an orthogonal arm given by (offset + t) % nbBestArms, among the best arms.
ALOHA: implementation of generic collision-avoidance algorithms, relying on a single-player bandit policy (e.g. Thompson, etc.). And variants:
rhoCentralized is a semi-centralized version where orthogonal ranks 1..M are given to the players, instead of just giving them the value of M, but a decentralized learning policy is still used to learn the best arms.
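The difference between the "fair" and "not fair" centralized assignments described above can be illustrated with a short sketch. This is not the package's implementation: the function names and the offset values are hypothetical, and it assumes the offsets are distinct and that there are no more players than arms, which is what keeps the chosen arms orthogonal.

```python
def centralized_fair_arms(offsets, t, nb_arms):
    """Arm played by each player at time t under the rule (offset + t) % nb_arms."""
    return [(offset + t) % nb_arms for offset in offsets]


def centralized_not_fair_arms(offsets, t, nb_arms):
    """Arm played by each player under a FIXED assignment: t is ignored."""
    return [offset % nb_arms for offset in offsets]


nb_arms = 5
offsets = [0, 1, 2]  # hypothetical distinct offsets, one per player

for t in range(nb_arms):
    arms = centralized_fair_arms(offsets, t, nb_arms)
    # Distinct offsets stay distinct after the shift, so no two players collide.
    assert len(set(arms)) == len(arms)
    print(t, arms)
```

Under the fair rule every player cycles through all the arms, so over nb_arms consecutive steps each player visits each arm exactly once. Under the fixed rule a player keeps the same arm for the whole game, whether it is good or bad.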
All policies have the same interface, as described in
BaseMPPolicy for decentralized policies,
BaseCentralizedPolicy for centralized policies,
in order to use them in any experiment with the following approach:
    my_policy_MP = Policy_MP(nbPlayers, nbArms)
    children = my_policy_MP.children  # get a list of usable single-player policies
    for one_policy in children:
        one_policy.startGame()  # start the game
    k_t = [None] * nbPlayers  # arm chosen by each player
    reward_t = [None] * nbArms  # reward sampled from each arm
    for t in range(T):
        for i in range(nbPlayers):
            k_t[i] = children[i].choice()  # choose one arm, for each player
        for k in range(nbArms):
            # indices of the players who chose arm k
            players_who_played_k = [i for i in range(nbPlayers) if k_t[i] == k]
            reward = reward_t[k] = sample_reward(k)  # placeholder: sample a reward from arm k
            if len(players_who_played_k) > 1:  # collision: nobody gets a reward
                reward = 0
            for i in players_who_played_k:
                children[i].getReward(k, reward)
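To see the loop above run end to end, here is a minimal self-contained sketch. It does not use the package itself: `ToyFixedArmPlayer`, `ToyPolicyMP` and `simulate` are hypothetical stand-ins that mimic only the `children`, `startGame()`, `choice()` and `getReward()` interface described above. The Bernoulli rewards are also an assumption made for illustration.

```python
import random


class ToyFixedArmPlayer:
    """Hypothetical single-player policy that always plays one fixed arm."""

    def __init__(self, arm):
        self.arm = arm
        self.total_reward = 0.0

    def startGame(self):
        self.total_reward = 0.0

    def choice(self):
        return self.arm

    def getReward(self, arm, reward):
        self.total_reward += reward


class ToyPolicyMP:
    """Hypothetical multi-player policy exposing a `children` list."""

    def __init__(self, nbPlayers, nbArms):
        # Player i is pinned to arm i % nbArms, so collisions occur
        # as soon as nbPlayers > nbArms.
        self.children = [ToyFixedArmPlayer(i % nbArms) for i in range(nbPlayers)]


def simulate(nbPlayers, nbArms, means, T, seed=0):
    """Run the generic multi-player loop and return each player's total reward."""
    rng = random.Random(seed)
    children = ToyPolicyMP(nbPlayers, nbArms).children
    for child in children:
        child.startGame()
    for t in range(T):
        k_t = [child.choice() for child in children]
        for k in range(nbArms):
            players_who_played_k = [i for i in range(nbPlayers) if k_t[i] == k]
            # Assumed reward model: Bernoulli with mean means[k].
            reward = 1.0 if rng.random() < means[k] else 0.0
            if len(players_who_played_k) > 1:
                reward = 0.0  # collision: every colliding player gets zero
            for i in players_who_played_k:
                children[i].getReward(k, reward)
    return [child.total_reward for child in children]


print(simulate(nbPlayers=2, nbArms=3, means=[1.0, 1.0, 0.5], T=10))
print(simulate(nbPlayers=4, nbArms=3, means=[1.0, 1.0, 0.5], T=10))
```

In the first call the two players sit on different arms with mean 1.0, so each collects a reward at every step. In the second call players 0 and 3 are both pinned to arm 0 and collide at every step, so both end with a total reward of zero.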
- PoliciesMultiPlayers.ALOHA module
- PoliciesMultiPlayers.BaseCentralizedPolicy module
- PoliciesMultiPlayers.BaseMPPolicy module
- PoliciesMultiPlayers.CentralizedCycling module
- PoliciesMultiPlayers.CentralizedFixed module
- PoliciesMultiPlayers.CentralizedIMP module
- PoliciesMultiPlayers.CentralizedMultiplePlay module
- PoliciesMultiPlayers.ChildPointer module
- PoliciesMultiPlayers.DepRound module
- PoliciesMultiPlayers.EstimateM module
- PoliciesMultiPlayers.OracleFair module
- PoliciesMultiPlayers.OracleNotFair module
- PoliciesMultiPlayers.RandTopM module
- PoliciesMultiPlayers.RandTopMEst module
- PoliciesMultiPlayers.Scenario1 module
- PoliciesMultiPlayers.Selfish module
- PoliciesMultiPlayers.rhoCentralized module
- PoliciesMultiPlayers.rhoEst module
- PoliciesMultiPlayers.rhoLearn module
- PoliciesMultiPlayers.rhoLearnEst module
- PoliciesMultiPlayers.rhoLearnExp3 module
- PoliciesMultiPlayers.rhoRand module
- PoliciesMultiPlayers.rhoRandALOHA module
- PoliciesMultiPlayers.rhoRandRand module
- PoliciesMultiPlayers.rhoRandRotating module
- PoliciesMultiPlayers.rhoRandSticky module
- PoliciesMultiPlayers.with_proba module