Policies.DoublingTrickWrapper module

A policy that acts as a wrapper on another policy P, assumed to be horizon dependent (has to known \(T\)), by implementing a “doubling trick”:

  • starts to assume that \(T=T_0=1000\), and run the policy \(P(T_0)\), from \(t=1\) to \(t=T_0\),
  • if \(t > T_0\), then the “doubling trick” is performed, by either re-initializing or just changing the parameter horizon of the policy P, for instance with \(T_2 = 10 \times T_0\),
  • and keep doing this until \(t = T\).

Note

This is implemented in a very generic way, with simply a function next_horizon(horizon) that gives the next horizon to try when crossing the current guess. It can be a simple linear function (next_horizon(horizon) = horizon + 100), a geometric growth to have the “real” doubling trick (next_horizon(horizon) = horizon * 10), or even functions growing exponentially fast (next_horizon(horizon) = horizon ** 1.1, next_horizon(horizon) = horizon ** 1.5, next_horizon(horizon) = horizon ** 2).

Note

My guess is that this “doubling trick” wrapping policy can only be efficient (for stochastic problems) if:

  • the underlying policy P is a very efficient horizon-dependent algorithm, e.g., the Policies.ApproximatedFHGittins,
  • the growth function next_horizon is growing faster than any geometric rate, so that the number of refresh is \(o(\log T)\) and not \(O(\log T)\).

See also

Reference: [[What the Doubling Trick Can or Can’t Do for Multi-Armed Bandits, Lilian Besson and Emilie Kaufmann, 2018]](https://hal.inria.fr/hal-01736357), to be presented soon.

Warning

Interface: If FULL_RESTART=False (default), the underlying algorithm is recreated at every breakpoint, instead its attribute horizon or _horizon is updated. Be sure that this is enough to really change the internal value used by the policy. Some policy use T only once to compute others parameters, which should be updated as well. A manual implementation of the __setattr__ method can help.

Policies.DoublingTrickWrapper.default_horizonDependent_policy

alias of Policies.UCBH.UCBH

Policies.DoublingTrickWrapper.FULL_RESTART = False

Default constant to know what to do when restarting the underlying policy with a new horizon parameter.

  • True means that a new policy, initialized from scratch, will be created at every breakpoint.
  • False means that the same policy object is used but just its attribute horizon is updated (default).
Policies.DoublingTrickWrapper.DEFAULT_FIRST_HORIZON = 200

Default horizon, used for the first step.

Policies.DoublingTrickWrapper.ARITHMETIC_STEP = 200

Default stepsize for the arithmetic horizon progression.

Policies.DoublingTrickWrapper.next_horizon__arithmetic[source]

The arithmetic horizon progression function:

\[\begin{split}T &\mapsto T + 100,\\ T_i &:= T_0 + 100 \times i.\end{split}\]
Policies.DoublingTrickWrapper.GEOMETRIC_STEP = 2

Default multiplicative constant for the geometric horizon progression.

Policies.DoublingTrickWrapper.next_horizon__geometric[source]

The geometric horizon progression function:

\[\begin{split}T &\mapsto T \times 2,\\ T_i &:= T_0 2^i.\end{split}\]
Policies.DoublingTrickWrapper.EXPONENTIAL_STEP = 1.5

Default exponential constant for the exponential horizon progression.

Policies.DoublingTrickWrapper.next_horizon__exponential[source]

The exponential horizon progression function:

\[\begin{split}T &\mapsto \left\lfloor T^{1.5} \right\rfloor,\\ T_i &:= \left\lfloor T_0^{1.5^i} \right\rfloor.\end{split}\]
Policies.DoublingTrickWrapper.SLOW_EXPONENTIAL_STEP = 1.1

Default exponential constant for the slow exponential horizon progression.

Policies.DoublingTrickWrapper.next_horizon__exponential_slow[source]

The exponential horizon progression function:

\[\begin{split}T &\mapsto \left\lfloor T^{1.1} \right\rfloor,\\ T_i &:= \left\lfloor T_0^{1.1^i} \right\rfloor.\end{split}\]
Policies.DoublingTrickWrapper.FAST_EXPONENTIAL_STEP = 2

Default exponential constant for the fast exponential horizon progression.

Policies.DoublingTrickWrapper.next_horizon__exponential_fast[source]

The exponential horizon progression function:

\[\begin{split}T &\mapsto \lfloor T^{2} \rfloor,\\ T_i &:= \lfloor T_0^{2^i} \rfloor.\end{split}\]
Policies.DoublingTrickWrapper.ALPHA = 2

Default constant \(\alpha\) for the generic exponential sequence.

Policies.DoublingTrickWrapper.BETA = 2

Default constant \(\beta\) for the generic exponential sequence.

Policies.DoublingTrickWrapper.next_horizon__exponential_generic(i, horizon)[source]

The generic exponential horizon progression function:

\[T_i := \left\lfloor \frac{T_0}{a} a^{b^i} \right\rfloor.\]
Policies.DoublingTrickWrapper.default_next_horizon

The exponential horizon progression function:

\[\begin{split}T &\mapsto \left\lfloor T^{1.1} \right\rfloor,\\ T_i &:= \left\lfloor T_0^{1.1^i} \right\rfloor.\end{split}\]
Policies.DoublingTrickWrapper.breakpoints(next_horizon, first_horizon, horizon, debug=False)[source]

Return the list of restart point (breakpoints), if starting from first_horizon to horizon with growth function next_horizon.

  • Also return the gap between the last guess for horizon and the true horizon. This gap should not be too large.
  • Nicely print all the values if debug=True.
  • First examples:
>>> first_horizon = 1000
>>> horizon = 30000
>>> breakpoints(next_horizon__arithmetic, first_horizon, horizon)  # doctest: +ELLIPSIS
([1000, 1200, 1400, ..., 29800, 30000], 0)
>>> breakpoints(next_horizon__geometric, first_horizon, horizon)
([1000, 2000, 4000, 8000, 16000, 32000], 2000)
>>> breakpoints(next_horizon__exponential, first_horizon, horizon)
([1000, 31622], 1622)
>>> breakpoints(next_horizon__exponential_slow, first_horizon, horizon)
([1000, 1995, 4265, 9838, 24671, 67827], 37827)
>>> breakpoints(next_horizon__exponential_fast, first_horizon, horizon)
([1000, 1000000], 970000)
  • Second examples:
>>> first_horizon = 5000
>>> horizon = 1000000
>>> breakpoints(next_horizon__arithmetic, first_horizon, horizon)  # doctest: +ELLIPSIS
([5000, 5200, ..., 999600, 999800, 1000000], 0)
>>> breakpoints(next_horizon__geometric, first_horizon, horizon)
([5000, 10000, 20000, 40000, 80000, 160000, 320000, 640000, 1280000], 280000)
>>> breakpoints(next_horizon__exponential, first_horizon, horizon)
([5000, 353553, 210223755], 209223755)
>>> breakpoints(next_horizon__exponential_slow, first_horizon, horizon)
([5000, 11718, 29904, 83811, 260394, 906137, 3572014], 2572014)
>>> breakpoints(next_horizon__exponential_fast, first_horizon, horizon)
([5000, 25000000], 24000000)
  • Third examples:
>>> first_horizon = 10
>>> horizon = 1123456
>>> breakpoints(next_horizon__arithmetic, first_horizon, horizon)  # doctest: +ELLIPSIS
([10, 210, 410, ..., 1123210, 1123410, 1123610], 154)
>>> breakpoints(next_horizon__geometric, first_horizon, horizon)
([10, 20, 40, 80, 160, 320, 640, 1280, 2560, 5120, 10240, 20480, 40960, 81920, 163840, 327680, 655360, 1310720], 187264)
>>> breakpoints(next_horizon__exponential, first_horizon, horizon)
([10, 31, 172, 2255, 107082, 35040856], 33917400)
>>> breakpoints(next_horizon__exponential_slow, first_horizon, horizon)
([10, 12, 15, 19, 25, 34, 48, 70, 107, 170, 284, 499, 928, 1837, 3895, 8903, 22104, 60106, 180638, 606024, 2294768], 1171312)
>>> breakpoints(next_horizon__exponential_fast, first_horizon, horizon)
([10, 100, 10000, 100000000], 98876544)
Policies.DoublingTrickWrapper.constant_c_for_the_functions_f = 0.5

The constant c in front of the function f.

Policies.DoublingTrickWrapper.function_f__for_geometric_sequences(i, c=0.5)[source]

For the geometric doubling sequences, \(f(i) = c \times \log(i)\).

Policies.DoublingTrickWrapper.function_f__for_exponential_sequences(i, c=0.5)[source]

For the exponential doubling sequences, \(f(i) = c \times i\).

Policies.DoublingTrickWrapper.function_f__for_generic_sequences(i, c=0.5, d=0.5, e=0.0)[source]

For a certain generic family of doubling sequences, \(f(i) = c \times i^{d} \times (\log(i))^{e}\).

Warning

d should most probably be smaller than 1.

Policies.DoublingTrickWrapper.function_f__for_intermediate_sequences(i)[source]
Policies.DoublingTrickWrapper.function_f__for_intermediate2_sequences(i)[source]
Policies.DoublingTrickWrapper.function_f__for_intermediate3_sequences(i)[source]
Policies.DoublingTrickWrapper.function_f__for_intermediate4_sequences(i)[source]
Policies.DoublingTrickWrapper.function_f__for_intermediate5_sequences(i)[source]
Policies.DoublingTrickWrapper.alpha_for_Ti = 0.5

Value of the parameter \(\alpha\) for the Ti_from_f() function.

Policies.DoublingTrickWrapper.Ti_from_f(f, alpha=0.5, *args, **kwargs)[source]

For any non-negative and increasing function \(f: i \mapsto f(i)\), the corresponding sequence is defined by:

\[\forall i\in\mathbb{N},\; T_i := \lfloor \exp(\alpha \times \exp(f(i))) \rfloor.\]

Warning

\(f(i)\) can need other parameters, see the examples above. They can be given as *args or **kwargs to Ti_from_f().

Warning

it should be computed otherwise, I should give \(i \mapsto \exp(f(i))\) instead of \(f: i \mapsto f(i)\). I need to try as much as possible to reduce the risk of overflow errors!

Policies.DoublingTrickWrapper.Ti_geometric(i, horizon, alpha=0.5, first_horizon=200, *args, **kwargs)[source]

Sequence \(T_i\) generated from the function \(f\) = function_f__for_geometric_sequences().

Policies.DoublingTrickWrapper.Ti_exponential(i, horizon, alpha=0.5, first_horizon=200, *args, **kwargs)[source]

Sequence \(T_i\) generated from the function \(f\) = function_f__for_exponential_sequences().

Policies.DoublingTrickWrapper.Ti_intermediate_sqrti(i, horizon, alpha=0.5, first_horizon=200, *args, **kwargs)[source]

Sequence \(T_i\) generated from the function \(f\) = function_f__for_intermediate_sequences().

Policies.DoublingTrickWrapper.Ti_intermediate_i13(i, horizon, alpha=0.5, first_horizon=200, *args, **kwargs)[source]

Sequence \(T_i\) generated from the function \(f\) = function_f__for_intermediate2_sequences().

Policies.DoublingTrickWrapper.Ti_intermediate_i23(i, horizon, alpha=0.5, first_horizon=200, *args, **kwargs)[source]

Sequence \(T_i\) generated from the function \(f\) = function_f__for_intermediate3_sequences().

Policies.DoublingTrickWrapper.Ti_intermediate_i12_logi12(i, horizon, alpha=0.5, first_horizon=200, *args, **kwargs)[source]

Sequence \(T_i\) generated from the function \(f\) = function_f__for_intermediate4_sequences().

Policies.DoublingTrickWrapper.Ti_intermediate_i_by_logi(i, horizon, alpha=0.5, first_horizon=200, *args, **kwargs)[source]

Sequence \(T_i\) generated from the function \(f\) = function_f__for_intermediate5_sequences().

Policies.DoublingTrickWrapper.last_term_operator_LT(Ti, max_i=10000)[source]

For a certain function representing a doubling sequence, \(T: i \mapsto T_i\), this last_term_operator_LT() function returns the function \(L: T \mapsto L_T\), defined as:

\[\forall T\in\mathbb{N},\; L_T := \min\{ i \in\mathbb{N},\; T \leq T_i \}.\]

\(L_T\) is the only integer which satisfies \(T_{L_T - 1} < T \leq T_{L_T}\).

Policies.DoublingTrickWrapper.plot_doubling_sequences(i_min=1, i_max=30, list_of_f=(<function function_f__for_geometric_sequences>, <function function_f__for_intermediate_sequences>, <function function_f__for_intermediate2_sequences>, <function function_f__for_intermediate3_sequences>, <function function_f__for_intermediate4_sequences>, <function function_f__for_exponential_sequences>), label_of_f=('Geometric doubling (d=0, e=1)', 'Intermediate doubling (d=1/2, e=0)', 'Intermediate doubling (d=1/3, e=0)', 'Intermediate doubling (d=2/3, e=0)', 'Intermediate doubling (d=1/2, e=1/2)', 'Exponential doubling (d=1, e=0)'), *args, **kwargs)[source]

Display a plot to illustrate the values of the \(T_i\) as a function of \(i\) for some i.

  • Can accept many functions f (and labels).
Policies.DoublingTrickWrapper.plot_quality_first_upper_bound(Tmin=10, Tmax=100000000, nbTs=100, gamma=0.0, delta=1.0, list_of_f=(<function function_f__for_geometric_sequences>, <function function_f__for_intermediate_sequences>, <function function_f__for_intermediate2_sequences>, <function function_f__for_intermediate3_sequences>, <function function_f__for_intermediate4_sequences>, <function function_f__for_exponential_sequences>), label_of_f=('Geometric doubling (d=0, e=1)', 'Intermediate doubling (d=1/2, e=0)', 'Intermediate doubling (d=1/3, e=0)', 'Intermediate doubling (d=2/3, e=0)', 'Intermediate doubling (d=1/2, e=1/2)', 'Exponential doubling (d=1, e=0)'), show_Ti_m_Tim1=True, *args, **kwargs)[source]

Display a plot to compare numerically between the following sum \(S\) and the upper-bound we hope to have, \(T^{\gamma} (\log T)^{\delta}\), as a function of \(T\) for some values between \(T_{\min}\) and \(T_{\max}\):

\[S := \sum_{i=0}^{L_T} (T_i - T_{i-1})^{\gamma} (\log (T_i - T_{i-1}))^{\delta}.\]
  • Can accept many functions f (and labels).
  • Can use \(T_i\) instead of \(T_i - T_{i-1}\) if show_Ti_m_Tim1=False (default is to use the smaller possible bound, with difference of sequence lengths, \(T_i - T_{i-1}\)).

Warning

This is still ON GOING WORK.

Policies.DoublingTrickWrapper.MAX_NB_OF_TRIALS = 500

If the sequence Ti does not grow enough, artificially increase i until T_inext > T_i

class Policies.DoublingTrickWrapper.DoublingTrickWrapper(nbArms, full_restart=False, policy=<class 'Policies.UCBH.UCBH'>, next_horizon=CPUDispatcher(<function next_horizon__exponential_slow>), first_horizon=200, *args, **kwargs)[source]

Bases: Policies.BaseWrapperPolicy.BaseWrapperPolicy

A policy that acts as a wrapper on another policy P, assumed to be horizon dependent (has to known \(T\)), by implementing a “doubling trick”.

  • Reference: [[What the Doubling Trick Can or Can’t Do for Multi-Armed Bandits, Lilian Besson and Emilie Kaufmann, 2018]](https://hal.inria.fr/hal-01736357), to be presented soon.
__init__(nbArms, full_restart=False, policy=<class 'Policies.UCBH.UCBH'>, next_horizon=CPUDispatcher(<function next_horizon__exponential_slow>), first_horizon=200, *args, **kwargs)[source]

New policy.

full_restart = None

Constant to know how to refresh the underlying policy.

__module__ = 'Policies.DoublingTrickWrapper'
next_horizon_name = None

Pretty string of the name of this growing function

horizon = None

Last guess for the horizon

__str__()[source]

-> str

startGame()[source]

Initialize the policy for a new game.

getReward(arm, reward)[source]

Pass the reward, as usual, update t and sometimes restart the underlying policy.