# Policies.EpsilonGreedy module¶

The epsilon-greedy random policies: the naive one and some variants.

• At every time step, with probability $$\varepsilon(t)$$ the policy explores by drawing an arm uniformly at random; otherwise it exploits, based on the accumulated rewards (not the empirical means). A minimal sketch follows the warning below.

Warning

Unless $$\varepsilon(t)$$ is optimally tuned for the specific problem at hand, none of these policies can hope to be efficient.
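
A minimal sketch of this decision rule, assuming numpy and illustrative names (`epsilon_greedy_choice` and `rewards` are not part of this module's API):

```python
import numpy as np

rng = np.random.default_rng()

def epsilon_greedy_choice(rewards, epsilon):
    """One epsilon-greedy step over a vector of accumulated rewards."""
    rewards = np.asarray(rewards)
    if rng.random() < epsilon:
        # Exploration: draw any arm uniformly at random.
        return rng.integers(len(rewards))
    # Exploitation: the arm with the highest *accumulated* reward
    # (not the highest empirical mean), ties broken uniformly at random.
    best = np.flatnonzero(rewards == np.max(rewards))
    return rng.choice(best)
```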

Policies.EpsilonGreedy.random() → x in the interval [0, 1).
class Policies.EpsilonGreedy.EpsilonGreedy(nbArms, epsilon=0.1, lower=0.0, amplitude=1.0)[source]

The epsilon-greedy random policy.

• At every time step, with probability $$\varepsilon(t)$$ the policy explores by drawing an arm uniformly at random; otherwise it exploits, based on the accumulated rewards (not the empirical means).

__init__(nbArms, epsilon=0.1, lower=0.0, amplitude=1.0)[source]

New policy.

property epsilon

Constant $$\varepsilon(t) = \varepsilon_0$$: the naive policy keeps it fixed over time.
__str__() → str[source]

choice()[source]

With probability $$\varepsilon$$, explore (uniform choice); otherwise exploit, based on just the accumulated rewards (not the empirical mean rewards).

choiceWithRank(rank=1)[source]

With probability $$\varepsilon$$, explore (uniform choice); otherwise exploit the arm of the given rank, based on just the accumulated rewards (not the empirical mean rewards).
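
A hedged sketch of this rank-based variant, again with illustrative names and ignoring the exact tie-breaking of the real implementation:

```python
import numpy as np

rng = np.random.default_rng()

def choice_with_rank(rewards, epsilon, rank=1):
    """Explore w.p. epsilon; otherwise play the arm whose accumulated
    reward is the rank-th largest (rank=1 picks the best arm)."""
    if rng.random() < epsilon:
        return rng.integers(len(rewards))
    # Arm indices sorted by decreasing accumulated reward.
    order = np.argsort(rewards)[::-1]
    return order[rank - 1]
```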

choiceFromSubSet(availableArms='all')[source]

Not defined.

choiceMultiple(nb=1)[source]

Not defined.

__module__ = 'Policies.EpsilonGreedy'
class Policies.EpsilonGreedy.EpsilonDecreasing(nbArms, epsilon=0.1, lower=0.0, amplitude=1.0)[source]

The epsilon-decreasing random policy.

• $$\varepsilon(t) = \min(1, \varepsilon_0 / \max(1, t))$$

__init__(nbArms, epsilon=0.1, lower=0.0, amplitude=1.0)[source]

New policy.

__str__() → str[source]

property epsilon

Decreasing $$\varepsilon(t) = \min(1, \varepsilon_0 / \max(1, t))$$.
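
A quick numerical check of this schedule (plain Python, illustrative only, not the class's actual code):

```python
def epsilon_decreasing(epsilon0, t):
    """epsilon(t) = min(1, epsilon0 / max(1, t))."""
    return min(1.0, epsilon0 / max(1, t))

# With the default epsilon0 = 0.1:
# epsilon(1) = 0.1, epsilon(10) = 0.01, epsilon(100) = 0.001.
```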

__module__ = 'Policies.EpsilonGreedy'
Policies.EpsilonGreedy.C = 0.1

Constant C in the MEGA formula

Policies.EpsilonGreedy.D = 0.5

Constant D in the MEGA formula

Policies.EpsilonGreedy.epsilon0(c, d, nbArms)[source]

MEGA heuristic:

$$\varepsilon_0 = \frac{c K^2}{d^2 (K - 1)}$$
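
For example, with the module's default constants c = C = 0.1 and d = D = 0.5, and K = 9 arms (an illustrative check; the `epsilon0` function below just mirrors the formula, not the module's exact code):

```python
def epsilon0(c, d, nbArms):
    """MEGA heuristic: epsilon_0 = c * K**2 / (d**2 * (K - 1))."""
    K = nbArms
    return (c * K ** 2) / (d ** 2 * (K - 1))

print(epsilon0(0.1, 0.5, 9))   # 0.1 * 81 / (0.25 * 8) = 4.05
```

Values above 1 are harmless here, since the decreasing schedule caps $$\varepsilon(t)$$ at 1.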
class Policies.EpsilonGreedy.EpsilonDecreasingMEGA(nbArms, c=0.1, d=0.5, lower=0.0, amplitude=1.0)[source]

The epsilon-decreasing random policy, using MEGA’s heuristic for a good choice of the $$\varepsilon_0$$ value.

• $$\varepsilon(t) = \min(1, \varepsilon_0 / \max(1, t))$$

• $$\varepsilon_0 = \frac{c K^2}{d^2 (K - 1)}$$

__init__(nbArms, c=0.1, d=0.5, lower=0.0, amplitude=1.0)[source]

New policy.

__str__() → str[source]

property epsilon

Decreasing $$\varepsilon(t) = \min(1, \varepsilon_0 / \max(1, t))$$.

__module__ = 'Policies.EpsilonGreedy'
class Policies.EpsilonGreedy.EpsilonFirst(nbArms, horizon, epsilon=0.01, lower=0.0, amplitude=1.0)[source]

The epsilon-first random policy. Ref: https://en.wikipedia.org/wiki/Multi-armed_bandit#Semi-uniform_strategies

__init__(nbArms, horizon, epsilon=0.01, lower=0.0, amplitude=1.0)[source]

New policy.

horizon = None

Parameter $$T$$ = known horizon of the experiment.

__str__() → str[source]

property epsilon

$$\varepsilon(t) = 1$$ while $$t \leq \varepsilon_0 T$$ (pure exploration), then $$\varepsilon(t) = 0$$ afterwards (pure exploitation).
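
A hedged sketch of this explore-then-commit schedule (illustrative names; T = 10000 is an arbitrary example, not a default):

```python
def epsilon_first(epsilon0, horizon, t):
    """Pure exploration while t <= epsilon0 * T, pure exploitation after."""
    return 1.0 if t <= epsilon0 * horizon else 0.0

# With the default epsilon0 = 0.01 and a known horizon T = 10000,
# the policy explores uniformly for the first 100 steps only.
```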

__module__ = 'Policies.EpsilonGreedy'
Policies.EpsilonGreedy.EPSILON = 0.1

Default value for epsilon for EpsilonDecreasing

Policies.EpsilonGreedy.DECREASINGRATE = 1e-06

Default value for the decreasing-rate constant

class Policies.EpsilonGreedy.EpsilonExpDecreasing(nbArms, epsilon=0.1, decreasingRate=1e-06, lower=0.0, amplitude=1.0)[source]

The epsilon exp-decreasing random policy.

• $$\varepsilon(t) = \varepsilon_0 \exp(- t \cdot \mathrm{decreasingRate})$$.

__init__(nbArms, epsilon=0.1, decreasingRate=1e-06, lower=0.0, amplitude=1.0)[source]

New policy.

__module__ = 'Policies.EpsilonGreedy'
__str__() → str[source]

property epsilon

Decreasing $$\varepsilon(t) = \min(1, \varepsilon_0 \exp(- t \cdot \mathrm{decreasingRate}))$$.
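
A hedged sketch of this exponential schedule (plain Python, illustrative only):

```python
import math

def epsilon_exp_decreasing(epsilon0, decreasing_rate, t):
    """epsilon(t) = min(1, epsilon0 * exp(-t * decreasingRate))."""
    return min(1.0, epsilon0 * math.exp(-t * decreasing_rate))

# With the defaults epsilon0 = 0.1 and decreasingRate = 1e-6,
# the decay is very slow: epsilon(1e6) = 0.1 * exp(-1) ≈ 0.0368.
```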