# Policies.DoublingTrickWrapper module¶

A policy that acts as a wrapper on another policy P, assumed to be horizon dependent (has to known $$T$$), by implementing a “doubling trick”:

• starts to assume that $$T=T_0=1000$$, and run the policy $$P(T_0)$$, from $$t=1$$ to $$t=T_0$$,

• if $$t > T_0$$, then the “doubling trick” is performed, by either re-initializing or just changing the parameter horizon of the policy P, for instance with $$T_2 = 10 \times T_0$$,

• and keep doing this until $$t = T$$.

Note

This is implemented in a very generic way, with simply a function next_horizon(horizon) that gives the next horizon to try when crossing the current guess. It can be a simple linear function (next_horizon(horizon) = horizon + 100), a geometric growth to have the “real” doubling trick (next_horizon(horizon) = horizon * 10), or even functions growing exponentially fast (next_horizon(horizon) = horizon ** 1.1, next_horizon(horizon) = horizon ** 1.5, next_horizon(horizon) = horizon ** 2).

Note

My guess is that this “doubling trick” wrapping policy can only be efficient (for stochastic problems) if:

• the underlying policy P is a very efficient horizon-dependent algorithm, e.g., the Policies.ApproximatedFHGittins,

• the growth function next_horizon is growing faster than any geometric rate, so that the number of refresh is $$o(\log T)$$ and not $$O(\log T)$$.

Reference: [[What the Doubling Trick Can or Can’t Do for Multi-Armed Bandits, Lilian Besson and Emilie Kaufmann, 2018]](https://hal.inria.fr/hal-01736357), to be presented soon.

Warning

Interface: If FULL_RESTART=False (default), the underlying algorithm is recreated at every breakpoint, instead its attribute horizon or _horizon is updated. Be sure that this is enough to really change the internal value used by the policy. Some policy use T only once to compute others parameters, which should be updated as well. A manual implementation of the __setattr__ method can help.

Policies.DoublingTrickWrapper.default_horizonDependent_policy

alias of Policies.UCBH.UCBH

Policies.DoublingTrickWrapper.FULL_RESTART = False

Default constant to know what to do when restarting the underlying policy with a new horizon parameter.

• True means that a new policy, initialized from scratch, will be created at every breakpoint.

• False means that the same policy object is used but just its attribute horizon is updated (default).

Policies.DoublingTrickWrapper.DEFAULT_FIRST_HORIZON = 200

Default horizon, used for the first step.

Policies.DoublingTrickWrapper.ARITHMETIC_STEP = 200

Default stepsize for the arithmetic horizon progression.

Policies.DoublingTrickWrapper.next_horizon__arithmetic[source]

The arithmetic horizon progression function:

$\begin{split}T &\mapsto T + 100,\\ T_i &:= T_0 + 100 \times i.\end{split}$
Policies.DoublingTrickWrapper.GEOMETRIC_STEP = 2

Default multiplicative constant for the geometric horizon progression.

Policies.DoublingTrickWrapper.next_horizon__geometric[source]

The geometric horizon progression function:

$\begin{split}T &\mapsto T \times 2,\\ T_i &:= T_0 2^i.\end{split}$
Policies.DoublingTrickWrapper.EXPONENTIAL_STEP = 1.5

Default exponential constant for the exponential horizon progression.

Policies.DoublingTrickWrapper.next_horizon__exponential[source]

The exponential horizon progression function:

$\begin{split}T &\mapsto \left\lfloor T^{1.5} \right\rfloor,\\ T_i &:= \left\lfloor T_0^{1.5^i} \right\rfloor.\end{split}$
Policies.DoublingTrickWrapper.SLOW_EXPONENTIAL_STEP = 1.1

Default exponential constant for the slow exponential horizon progression.

Policies.DoublingTrickWrapper.next_horizon__exponential_slow[source]

The exponential horizon progression function:

$\begin{split}T &\mapsto \left\lfloor T^{1.1} \right\rfloor,\\ T_i &:= \left\lfloor T_0^{1.1^i} \right\rfloor.\end{split}$
Policies.DoublingTrickWrapper.FAST_EXPONENTIAL_STEP = 2

Default exponential constant for the fast exponential horizon progression.

Policies.DoublingTrickWrapper.next_horizon__exponential_fast[source]

The exponential horizon progression function:

$\begin{split}T &\mapsto \lfloor T^{2} \rfloor,\\ T_i &:= \lfloor T_0^{2^i} \rfloor.\end{split}$
Policies.DoublingTrickWrapper.ALPHA = 2

Default constant $$\alpha$$ for the generic exponential sequence.

Policies.DoublingTrickWrapper.BETA = 2

Default constant $$\beta$$ for the generic exponential sequence.

Policies.DoublingTrickWrapper.next_horizon__exponential_generic(i, horizon)[source]

The generic exponential horizon progression function:

$T_i := \left\lfloor \frac{T_0}{a} a^{b^i} \right\rfloor.$
Policies.DoublingTrickWrapper.default_next_horizon

The exponential horizon progression function:

$\begin{split}T &\mapsto \left\lfloor T^{1.1} \right\rfloor,\\ T_i &:= \left\lfloor T_0^{1.1^i} \right\rfloor.\end{split}$
Policies.DoublingTrickWrapper.breakpoints(next_horizon, first_horizon, horizon, debug=False)[source]

Return the list of restart point (breakpoints), if starting from first_horizon to horizon with growth function next_horizon.

• Also return the gap between the last guess for horizon and the true horizon. This gap should not be too large.

• Nicely print all the values if debug=True.

• First examples:

>>> first_horizon = 1000
>>> horizon = 30000
>>> breakpoints(next_horizon__arithmetic, first_horizon, horizon)
([1000, 1200, 1400, ..., 29800, 30000], 0)
>>> breakpoints(next_horizon__geometric, first_horizon, horizon)
([1000, 2000, 4000, 8000, 16000, 32000], 2000)
>>> breakpoints(next_horizon__exponential, first_horizon, horizon)
([1000, 31622], 1622)
>>> breakpoints(next_horizon__exponential_slow, first_horizon, horizon)
([1000, 1995, 4265, 9838, 24671, 67827], 37827)
>>> breakpoints(next_horizon__exponential_fast, first_horizon, horizon)
([1000, 1000000], 970000)

• Second examples:

>>> first_horizon = 5000
>>> horizon = 1000000
>>> breakpoints(next_horizon__arithmetic, first_horizon, horizon)
([5000, 5200, ..., 999600, 999800, 1000000], 0)
>>> breakpoints(next_horizon__geometric, first_horizon, horizon)
([5000, 10000, 20000, 40000, 80000, 160000, 320000, 640000, 1280000], 280000)
>>> breakpoints(next_horizon__exponential, first_horizon, horizon)
([5000, 353553, 210223755], 209223755)
>>> breakpoints(next_horizon__exponential_slow, first_horizon, horizon)
([5000, 11718, 29904, 83811, 260394, 906137, 3572014], 2572014)
>>> breakpoints(next_horizon__exponential_fast, first_horizon, horizon)
([5000, 25000000], 24000000)

• Third examples:

>>> first_horizon = 10
>>> horizon = 1123456
>>> breakpoints(next_horizon__arithmetic, first_horizon, horizon)
([10, 210, 410, ..., 1123210, 1123410, 1123610], 154)
>>> breakpoints(next_horizon__geometric, first_horizon, horizon)
([10, 20, 40, 80, 160, 320, 640, 1280, 2560, 5120, 10240, 20480, 40960, 81920, 163840, 327680, 655360, 1310720], 187264)
>>> breakpoints(next_horizon__exponential, first_horizon, horizon)
([10, 31, 172, 2255, 107082, 35040856], 33917400)
>>> breakpoints(next_horizon__exponential_slow, first_horizon, horizon)
([10, 12, 15, 19, 25, 34, 48, 70, 107, 170, 284, 499, 928, 1837, 3895, 8903, 22104, 60106, 180638, 606024, 2294768], 1171312)
>>> breakpoints(next_horizon__exponential_fast, first_horizon, horizon)
([10, 100, 10000, 100000000], 98876544)

Policies.DoublingTrickWrapper.constant_c_for_the_functions_f = 0.5

The constant c in front of the function f.

Policies.DoublingTrickWrapper.function_f__for_geometric_sequences(i, c=0.5)[source]

For the geometric doubling sequences, $$f(i) = c \times \log(i)$$.

Policies.DoublingTrickWrapper.function_f__for_exponential_sequences(i, c=0.5)[source]

For the exponential doubling sequences, $$f(i) = c \times i$$.

Policies.DoublingTrickWrapper.function_f__for_generic_sequences(i, c=0.5, d=0.5, e=0.0)[source]

For a certain generic family of doubling sequences, $$f(i) = c \times i^{d} \times (\log(i))^{e}$$.

Warning

d should most probably be smaller than 1.

Policies.DoublingTrickWrapper.function_f__for_intermediate_sequences(i)[source]
Policies.DoublingTrickWrapper.function_f__for_intermediate2_sequences(i)[source]
Policies.DoublingTrickWrapper.function_f__for_intermediate3_sequences(i)[source]
Policies.DoublingTrickWrapper.function_f__for_intermediate4_sequences(i)[source]
Policies.DoublingTrickWrapper.function_f__for_intermediate5_sequences(i)[source]
Policies.DoublingTrickWrapper.alpha_for_Ti = 0.5

Value of the parameter $$\alpha$$ for the Ti_from_f() function.

Policies.DoublingTrickWrapper.Ti_from_f(f, alpha=0.5, *args, **kwargs)[source]

For any non-negative and increasing function $$f: i \mapsto f(i)$$, the corresponding sequence is defined by:

$\forall i\in\mathbb{N},\; T_i := \lfloor \exp(\alpha \times \exp(f(i))) \rfloor.$

Warning

$$f(i)$$ can need other parameters, see the examples above. They can be given as *args or **kwargs to Ti_from_f().

Warning

it should be computed otherwise, I should give $$i \mapsto \exp(f(i))$$ instead of $$f: i \mapsto f(i)$$. I need to try as much as possible to reduce the risk of overflow errors!

Policies.DoublingTrickWrapper.Ti_geometric(i, horizon, alpha=0.5, first_horizon=200, *args, **kwargs)[source]

Sequence $$T_i$$ generated from the function $$f$$ = function_f__for_geometric_sequences().

Policies.DoublingTrickWrapper.Ti_exponential(i, horizon, alpha=0.5, first_horizon=200, *args, **kwargs)[source]

Sequence $$T_i$$ generated from the function $$f$$ = function_f__for_exponential_sequences().

Policies.DoublingTrickWrapper.Ti_intermediate_sqrti(i, horizon, alpha=0.5, first_horizon=200, *args, **kwargs)[source]

Sequence $$T_i$$ generated from the function $$f$$ = function_f__for_intermediate_sequences().

Policies.DoublingTrickWrapper.Ti_intermediate_i13(i, horizon, alpha=0.5, first_horizon=200, *args, **kwargs)[source]

Sequence $$T_i$$ generated from the function $$f$$ = function_f__for_intermediate2_sequences().

Policies.DoublingTrickWrapper.Ti_intermediate_i23(i, horizon, alpha=0.5, first_horizon=200, *args, **kwargs)[source]

Sequence $$T_i$$ generated from the function $$f$$ = function_f__for_intermediate3_sequences().

Policies.DoublingTrickWrapper.Ti_intermediate_i12_logi12(i, horizon, alpha=0.5, first_horizon=200, *args, **kwargs)[source]

Sequence $$T_i$$ generated from the function $$f$$ = function_f__for_intermediate4_sequences().

Policies.DoublingTrickWrapper.Ti_intermediate_i_by_logi(i, horizon, alpha=0.5, first_horizon=200, *args, **kwargs)[source]

Sequence $$T_i$$ generated from the function $$f$$ = function_f__for_intermediate5_sequences().

Policies.DoublingTrickWrapper.last_term_operator_LT(Ti, max_i=10000)[source]

For a certain function representing a doubling sequence, $$T: i \mapsto T_i$$, this last_term_operator_LT() function returns the function $$L: T \mapsto L_T$$, defined as:

$\forall T\in\mathbb{N},\; L_T := \min\{ i \in\mathbb{N},\; T \leq T_i \}.$

$$L_T$$ is the only integer which satisfies $$T_{L_T - 1} < T \leq T_{L_T}$$.

Policies.DoublingTrickWrapper.plot_doubling_sequences(i_min=1, i_max=30, list_of_f=(<function function_f__for_geometric_sequences>, <function function_f__for_intermediate_sequences>, <function function_f__for_intermediate2_sequences>, <function function_f__for_intermediate3_sequences>, <function function_f__for_intermediate4_sequences>, <function function_f__for_exponential_sequences>), label_of_f=('Geometric doubling (d=0, e=1)', 'Intermediate doubling (d=1/2, e=0)', 'Intermediate doubling (d=1/3, e=0)', 'Intermediate doubling (d=2/3, e=0)', 'Intermediate doubling (d=1/2, e=1/2)', 'Exponential doubling (d=1, e=0)'), *args, **kwargs)[source]

Display a plot to illustrate the values of the $$T_i$$ as a function of $$i$$ for some i.

• Can accept many functions f (and labels).

Policies.DoublingTrickWrapper.plot_quality_first_upper_bound(Tmin=10, Tmax=100000000, nbTs=100, gamma=0.0, delta=1.0, list_of_f=(<function function_f__for_geometric_sequences>, <function function_f__for_intermediate_sequences>, <function function_f__for_intermediate2_sequences>, <function function_f__for_intermediate3_sequences>, <function function_f__for_intermediate4_sequences>, <function function_f__for_exponential_sequences>), label_of_f=('Geometric doubling (d=0, e=1)', 'Intermediate doubling (d=1/2, e=0)', 'Intermediate doubling (d=1/3, e=0)', 'Intermediate doubling (d=2/3, e=0)', 'Intermediate doubling (d=1/2, e=1/2)', 'Exponential doubling (d=1, e=0)'), show_Ti_m_Tim1=True, *args, **kwargs)[source]

Display a plot to compare numerically between the following sum $$S$$ and the upper-bound we hope to have, $$T^{\gamma} (\log T)^{\delta}$$, as a function of $$T$$ for some values between $$T_{\min}$$ and $$T_{\max}$$:

$S := \sum_{i=0}^{L_T} (T_i - T_{i-1})^{\gamma} (\log (T_i - T_{i-1}))^{\delta}.$
• Can accept many functions f (and labels).

• Can use $$T_i$$ instead of $$T_i - T_{i-1}$$ if show_Ti_m_Tim1=False (default is to use the smaller possible bound, with difference of sequence lengths, $$T_i - T_{i-1}$$).

Warning

This is still ON GOING WORK.

Policies.DoublingTrickWrapper.MAX_NB_OF_TRIALS = 500

If the sequence Ti does not grow enough, artificially increase i until T_inext > T_i

class Policies.DoublingTrickWrapper.DoublingTrickWrapper(nbArms, full_restart=False, policy=<class 'Policies.UCBH.UCBH'>, next_horizon=CPUDispatcher(<function next_horizon__exponential_slow>), first_horizon=200, *args, **kwargs)[source]

A policy that acts as a wrapper on another policy P, assumed to be horizon dependent (has to known $$T$$), by implementing a “doubling trick”.

• Reference: [[What the Doubling Trick Can or Can’t Do for Multi-Armed Bandits, Lilian Besson and Emilie Kaufmann, 2018]](https://hal.inria.fr/hal-01736357), to be presented soon.

__init__(nbArms, full_restart=False, policy=<class 'Policies.UCBH.UCBH'>, next_horizon=CPUDispatcher(<function next_horizon__exponential_slow>), first_horizon=200, *args, **kwargs)[source]

New policy.

full_restart = None

Constant to know how to refresh the underlying policy.

next_horizon_name = None

Pretty string of the name of this growing function

__module__ = 'Policies.DoublingTrickWrapper'
horizon = None

Last guess for the horizon

__str__()[source]

-> str

startGame()[source]

Initialize the policy for a new game.

getReward(arm, reward)[source]

Pass the reward, as usual, update t and sometimes restart the underlying policy.