Policies.GreedyOracle module
author: Julien Seznec
Oracle and near-minimax policy for rotting bandits without noise.
Reference: [Heidari et al., 2016] Hoda Heidari, Michael Kearns, Aaron Roth. Tight Policy Regret Bounds for Improving and Decaying Bandits. International Joint Conference on Artificial Intelligence (IJCAI) 2016, p. 1562. https://www.ijcai.org/Proceedings/16/Papers/224.pdf
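In the noiseless rotting setting of this reference, each arm's reward is a deterministic, non-increasing function of how many times that arm has been pulled. The sketch below is illustrative only; the decay functions and the pull helper are invented for the example, not part of this module:

    # Hypothetical noiseless rotting arms: the reward of arm i is a
    # deterministic, non-increasing function of its own pull count n.
    decays = [
        lambda n: 1.0 / (n + 1),           # arm 0: hyperbolic decay
        lambda n: max(0.7 - 0.1 * n, 0.0)  # arm 1: linear decay, floored at 0
    ]
    pulls = [0, 0]

    def pull(arm):
        """Play `arm` once and return its (deterministic) current reward."""
        reward = decays[arm](pulls[arm])
        pulls[arm] += 1
        return reward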
class Policies.GreedyOracle.GreedyPolicy(nbArms)[source]
Bases: Policies.IndexPolicy.IndexPolicy
Greedy Policy for rotting bandits (A2 in the reference above). Selects the arm with the best last observed value; a minimal usage sketch follows this entry.
Reference: [Heidari et al., 2016]
__init__(nbArms)[source]
New generic index policy.
- nbArms: the number of arms,
- lower, amplitude: lower value and known amplitude of the rewards.
getReward(arm, reward)[source]
Give a reward: increase t and the pull count of that arm, and update its cumulated sum of rewards (rewards are normalized to [0, 1]).
computeAllIndex()[source]
Compute the current indexes for all arms, possibly in a vectorized manner; by default it is not vectorized automatically.
__module__ = 'Policies.GreedyOracle'
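A minimal usage sketch for GreedyPolicy, assuming the standard SMPyBandits policy interface (startGame / choice / getReward) and reusing the illustrative decay functions defined above:

    from Policies.GreedyOracle import GreedyPolicy  # SMPyBandits package layout (assumed)

    policy = GreedyPolicy(nbArms=2)
    policy.startGame()                 # standard SMPyBandits reset (assumed API)

    for _ in range(10):
        arm = policy.choice()          # plays the arm with the best last observed value
        reward = pull(arm)             # noiseless rotting reward from the sketch above
        policy.getReward(arm, reward)  # updates t, pulls, and the stored last value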
class Policies.GreedyOracle.GreedyOracle(nbArms, arms)[source]
Bases: Policies.IndexPolicy.IndexPolicy
Greedy Oracle for rotting bandits (A0 in the reference above). Looks one step forward and selects the arm with the best next value; this is the optimal policy for the noiseless rotting bandits problem. A one-step-lookahead sketch follows this entry.
Reference: [Heidari et al., 2016]
__init__(nbArms, arms)[source]
New generic index policy.
- nbArms: the number of arms,
- lower, amplitude: lower value and known amplitude of the rewards.
__module__ = 'Policies.GreedyOracle'
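Because the oracle receives the arms themselves, it can evaluate what each arm would pay on its next pull and play the best one. The helper below is a hypothetical illustration of that one-step lookahead (A0), not the internals of the class, and it reuses decays and pulls from the first sketch:

    # Illustrative one-step lookahead (A0); not the class internals, just the idea.
    def oracle_choice(decays, pulls):
        """Return the arm whose *next* reward would be the largest."""
        return max(range(len(decays)), key=lambda i: decays[i](pulls[i]))

    arm = oracle_choice(decays, pulls)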