Policies.UCBimproved module
The UCB-Improved policy for bounded bandits, which requires knowing the horizon, as an example of a successive elimination algorithm.
Reference: [[Auer et al, 2010](https://link.springer.com/content/pdf/10.1007/s10998-010-3055-6.pdf)].
- Policies.UCBimproved.ALPHA = 0.5: Default value for parameter \(\alpha\).
- Policies.UCBimproved.n_m(horizon, delta_m) [source]: Function \(\lceil \frac{2 \log(T \Delta_m^2)}{\Delta_m^2} \rceil\).
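A minimal re-implementation of this helper, for illustration only; in the paper, \(n_m\) is the total number of times each active arm has been pulled by the end of round \(m\):

```python
from math import ceil, log

def n_m(horizon, delta_m):
    """ceil(2 * log(T * delta_m^2) / delta_m^2), as in the formula above."""
    return ceil(2 * log(horizon * delta_m ** 2) / delta_m ** 2)

# For instance, with T = 1000 and delta_m = 0.5:
# n_m(1000, 0.5) == ceil(2 * log(250) / 0.25) == 45
```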
- class Policies.UCBimproved.UCBimproved(nbArms, horizon=None, alpha=0.5, lower=0.0, amplitude=1.0) [source]
  Bases: Policies.SuccessiveElimination.SuccessiveElimination
  The UCB-Improved policy for bounded bandits, which requires knowing the horizon, as an example of a successive elimination algorithm.
  Reference: [[Auer et al, 2010](https://link.springer.com/content/pdf/10.1007/s10998-010-3055-6.pdf)].
- __init__(nbArms, horizon=None, alpha=0.5, lower=0.0, amplitude=1.0) [source]: New generic index policy.
  - nbArms: the number of arms,
  - lower, amplitude: lower value and known amplitude of the rewards.
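A usage sketch, assuming the standard SMPyBandits policy interface: startGame() and getReward() are inherited from the parent policy classes and are not documented in this section.

```python
import numpy as np
from Policies.UCBimproved import UCBimproved

nbArms, horizon = 5, 10000
means = [0.1, 0.3, 0.5, 0.7, 0.9]              # unknown to the policy
policy = UCBimproved(nbArms, horizon=horizon, alpha=0.5)
policy.startGame()                             # assumed inherited from the base policy

rng = np.random.default_rng(0)
for t in range(horizon):
    arm = policy.choice()                      # choose among the currently active arms
    reward = float(rng.random() < means[arm])  # Bernoulli rewards, bounded in [0, 1]
    policy.getReward(arm, reward)              # assumed inherited from the base policy
```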
- horizon = None: Parameter \(T\) = known horizon of the experiment.
- alpha = None: Parameter \(\alpha\).
- activeArms = None: Set of active arms.
- estimate_delta = None: Current estimate of the gap \(\Delta_0\).
- current_m = None: Current round \(m\).
- max_m = None: Bound on the round number, \(\lfloor \frac{1}{2} \log_2(\frac{T}{e}) \rfloor\).
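A one-line sketch of this bound, using the notation above:

```python
from math import e, floor, log2

def max_m(horizon):
    """floor(0.5 * log2(T / e)): last elimination round of UCB-Improved."""
    return floor(0.5 * log2(horizon / e))

# e.g. max_m(10000) == 5; if estimate_delta starts at 1 as in the paper,
# it is halved at most 5 times, down to 2 ** -5 == 0.03125.
```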
- when_did_it_leave = None: Also keep in memory when each arm was kicked out of the activeArms set, so that a fake index can be given to it, for instance if we ask to order the arms.
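The attributes above drive the arm-elimination step of Auer et al.'s algorithm. The following is a rough sketch of one such step, written from the paper rather than from the module's actual code; names like empirical_means and the exact bookkeeping are hypothetical.

```python
from math import log, sqrt

from Policies.UCBimproved import n_m

def elimination_step(active_arms, empirical_means, estimate_delta, horizon, t, when_did_it_leave):
    """One elimination round of UCB-Improved, simplified from Auer et al. (2010).

    Arms whose upper confidence bound falls below the best lower confidence
    bound are dropped from the active set; the gap estimate is then halved.
    """
    # Confidence radius after n_m(T, delta_m) pulls of each active arm.
    radius = sqrt(log(horizon * estimate_delta ** 2) / (2 * n_m(horizon, estimate_delta)))
    best_lcb = max(empirical_means[a] - radius for a in active_arms)
    survivors = set()
    for a in active_arms:
        if empirical_means[a] + radius >= best_lcb:
            survivors.add(a)
        else:
            when_did_it_leave[a] = t          # remember when the arm was eliminated
    return survivors, estimate_delta / 2      # halved gap estimate for the next round
```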
- choice(recursive=False) [source]: In a policy based on successive elimination, choosing an arm amounts to choosing one from the set of active arms (self.activeArms), with the method choiceFromSubSet.
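A conceptual sketch of what restricting the choice to self.activeArms can look like; choiceFromSubSet itself is defined in the parent classes and may differ in tie-breaking and bookkeeping.

```python
import numpy as np

def choice_from_subset(index, active_arms):
    """Pick the arm with the largest index value among a subset of arms.

    This mirrors the idea behind choiceFromSubSet, not its exact implementation.
    """
    active = list(active_arms)
    return active[int(np.argmax([index[a] for a in active]))]

# Hypothetical example: 4 arms, only arms 1 and 3 still active.
# choice_from_subset([0.2, 0.9, 0.5, 0.7], [1, 3]) returns 1 (0.9 beats 0.7).
```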
- __module__ = 'Policies.UCBimproved'