Policies.OracleSequentiallyRestartPolicy module
An oracle policy for non-stationary bandits, restarting an underlying stationary bandit policy at each breakpoint.
It runs on top of a simple policy, e.g., UCB, and OracleSequentiallyRestartPolicy is a wrapper:

>>> policy = OracleSequentiallyRestartPolicy(nbArms, UCB)
>>> # use policy as usual, with policy.startGame(), r = policy.choice(), policy.getReward(arm, r)
It uses the knowledge of the breakpoints to restart the underlying algorithm at each breakpoint.
It is very simple but impractical: in any real problem it is impossible to know the locations of the breakpoints, but it acts as an efficient baseline.
Warning
It is an efficient baseline, but there is no reason for it to be the best algorithm on a given problem (empirically)! I found that Policies.DiscountedThompson.DiscountedThompson is usually the most efficient.
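For a concrete picture of the interface, here is a hedged sketch of a full simulation loop. The horizon, breakpoint locations and Bernoulli reward are made up, the import line assumes this library's package layout, and passing the underlying algorithm as the keyword policy=UCB is an assumption based on the BaseWrapperPolicy parent (the example above passes it positionally):

>>> import numpy as np
>>> from Policies import UCB, OracleSequentiallyRestartPolicy
>>> horizon, changePoints = 2000, [500, 1500]        # made-up problem
>>> policy = OracleSequentiallyRestartPolicy(3, policy=UCB, changePoints=changePoints)
>>> policy.startGame()
>>> for t in range(horizon):
...     arm = policy.choice()                        # delegated to the underlying UCB
...     r = float(np.random.binomial(1, 0.5))        # dummy Bernoulli reward
...     policy.getReward(arm, r)                     # at t = 500 and t = 1500, the wrapper restarts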
- Policies.OracleSequentiallyRestartPolicy.PER_ARM_RESTART = True
  Should we reset only one arm's empirical average, or all of them? Default is True for this algorithm.
- Policies.OracleSequentiallyRestartPolicy.FULL_RESTART_WHEN_REFRESH = False
  Should we fully restart the algorithm, or simply reset one arm's empirical average? Default is False, which is usually more efficient!
- Policies.OracleSequentiallyRestartPolicy.RESET_FOR_ALL_CHANGE = False
  True if the algorithm resets one/all arm memories when a change occurs on any arm. False if the algorithm only resets one arm's memories when a change occurs on that arm (this needs to know listOfMeans) (default, it should be more efficient).
- Policies.OracleSequentiallyRestartPolicy.RESET_FOR_SUBOPTIMAL_CHANGE = True
  True if the algorithm resets the memories of an arm whenever its mean changes, no matter if it stays optimal/suboptimal (default, it should be more efficient). False if the algorithm resets memories only when a change makes the previously best arm become suboptimal.
- class Policies.OracleSequentiallyRestartPolicy.OracleSequentiallyRestartPolicy(nbArms, changePoints=None, listOfMeans=None, reset_for_all_change=False, reset_for_suboptimal_change=True, full_restart_when_refresh=False, per_arm_restart=True, *args, **kwargs)[source]
  Bases: Policies.BaseWrapperPolicy.BaseWrapperPolicy
  An oracle policy for non-stationary bandits, restarting an underlying stationary bandit policy at each breakpoint.
- __init__(nbArms, changePoints=None, listOfMeans=None, reset_for_all_change=False, reset_for_suboptimal_change=True, full_restart_when_refresh=False, per_arm_restart=True, *args, **kwargs)[source]
  New policy.
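Below is a hedged construction sketch showing the oracle information being passed explicitly. The 3-arm means and breakpoint times are invented; the assumed layout is one row of listOfMeans per stationary segment (so len(listOfMeans) == len(changePoints) + 1), and the policy=UCB keyword is assumed from the BaseWrapperPolicy parent. Both assumptions should be checked against the environment and parent class actually used:

>>> changePoints = [1000, 2000]                  # invented breakpoint times
>>> listOfMeans = [[0.1, 0.5, 0.9],              # segment 0: means of arms 0, 1, 2 (assumed layout)
...                [0.9, 0.5, 0.1],              # segment 1
...                [0.5, 0.5, 0.9]]              # segment 2
>>> policy = OracleSequentiallyRestartPolicy(
...     3, policy=UCB,
...     changePoints=changePoints, listOfMeans=listOfMeans,
...     reset_for_all_change=False,              # only reset the arms concerned by a change
...     reset_for_suboptimal_change=True,        # ... whenever their own mean changes
...     per_arm_restart=True)                    # reset per-arm statistics, not the whole algorithm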
- reset_for_all_change = None
- reset_for_suboptimal_change = None
- changePoints = None
  Locations of the break points (or change points) of the switching bandit problem, for each arm. If None, an empty list is used.
- all_rewards = None
  Keep in memory all the rewards obtained since the last restart on that arm.
- last_pulls = None
  Keep in memory the time at which each arm was last seen. Starts at -1 (never seen).
- compute_optimized_changePoints(changePoints=None, listOfMeans=None)[source]
  Compute the list of change points for each arm.
  - If reset_for_all_change is True, all change points concern all arms (suboptimal)!
  - If reset_for_all_change is False:
    - If reset_for_suboptimal_change is True, all change points where the mean of an arm changes concern it (still suboptimal)!
    - If reset_for_suboptimal_change is False, only the change points where an arm goes from optimal to suboptimal, or from suboptimal to optimal, concern it (optimal!).
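For illustration, here is a minimal sketch of the per-arm filtering that this docstring describes. It is not the library's implementation: it assumes changePoints is a flat list of times, that listOfMeans[i][arm] is the mean of arm during the i-th stationary segment (so len(listOfMeans) == len(changePoints) + 1), and every name except the two documented flags is made up:

def compute_optimized_changePoints_sketch(changePoints, listOfMeans,
                                          reset_for_all_change=False,
                                          reset_for_suboptimal_change=True):
    """Per-arm change points following the three rules above (hypothetical sketch)."""
    nbArms = len(listOfMeans[0])
    changePoints_per_arm = [[] for _ in range(nbArms)]
    for arm in range(nbArms):
        for i, tau in enumerate(changePoints):
            if reset_for_all_change:
                concerns_arm = True  # every breakpoint concerns every arm (suboptimal)
            else:
                mean_before, mean_after = listOfMeans[i][arm], listOfMeans[i + 1][arm]
                if reset_for_suboptimal_change:
                    # any change of this arm's own mean concerns it (still suboptimal)
                    concerns_arm = (mean_before != mean_after)
                else:
                    # only a change of this arm's optimality status concerns it (optimal!)
                    was_best = (mean_before == max(listOfMeans[i]))
                    is_best = (mean_after == max(listOfMeans[i + 1]))
                    concerns_arm = (was_best != is_best)
            if concerns_arm:
                changePoints_per_arm[arm].append(tau)
    return changePoints_per_arm

Under these assumptions, reset_for_all_change=True gives every arm the full list of breakpoints, while reset_for_all_change=False combined with reset_for_suboptimal_change=False gives the smallest per-arm lists, hence the fewest restarts.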
- __module__ = 'Policies.OracleSequentiallyRestartPolicy'