# Policies.ProbabilityPursuit module

The basic Probability Pursuit algorithm.

• We use the simple version of the pursuit algorithm, as described in the seminal book by Sutton and Barto (1998), https://webdocs.cs.ualberta.ca/~sutton/book/the-book.html.

• Initially, the probabilities are uniform over the $$K$$ arms: $$p_k(0) = 1/K$$.

• At each time step $$t$$, the probabilities are all recomputed, following this equation:

$$p_k(t+1) = \begin{cases} (1 - \beta) p_k(t) + \beta \times 1 & \text{if}\; \hat{\mu}_k(t) = \max_j \hat{\mu}_j(t), \\ (1 - \beta) p_k(t) + \beta \times 0 & \text{otherwise}. \end{cases}$$

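• $$\hat{\mu}_k(t)$$ is the empirical mean of the rewards obtained from arm $$k$$ up to time $$t$$, and $$\beta \in (0, 1)$$ is a learning rate (default BETA = 0.5).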

• Then the arm $$A(t+1)$$ is drawn at random from the distribution $$(p_k(t+1))_{1 \leq k \leq K}$$.

• References: [Kuleshov & Precup - JMLR, 2000](http://www.cs.mcgill.ca/~vkules/bandits.pdf#page=6), [Sutton & Barto, 1998]
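To make the update rule concrete, here is a minimal NumPy sketch of one pursuit step, followed by a worked example with $$K = 3$$ arms and $$\beta = 0.5$$. The helper name `pursuit_update` is hypothetical (it is not part of this module), and breaking ties with `np.argmax` (the first maximal arm wins) is an assumption of the sketch.

```python
import numpy as np


def pursuit_update(p, mu_hat, beta=0.5):
    """One Probability Pursuit step (hypothetical helper).

    p      -- current probabilities p_k(t), shape (K,), summing to 1
    mu_hat -- empirical means mu_hat_k(t), shape (K,)
    beta   -- learning rate in (0, 1)
    """
    p = np.asarray(p, dtype=float)
    # Indicator vector: 1 for the greedy arm, 0 elsewhere.
    # np.argmax keeps the first maximal index on ties (an assumption here).
    target = np.zeros_like(p)
    target[np.argmax(mu_hat)] = 1.0
    # p_k(t+1) = (1 - beta) * p_k(t) + beta * target_k
    return (1.0 - beta) * p + beta * target


K = 3
p0 = np.full(K, 1.0 / K)  # uniform start: p_k(0) = 1/K
p1 = pursuit_update(p0, mu_hat=[0.8, 0.2, 0.5], beta=0.5)
print(p1)  # [0.66666667 0.16666667 0.16666667]
```

Since $$\sum_k (1-\beta) p_k(t) + \beta = (1-\beta) + \beta = 1$$, the update keeps $$(p_k(t+1))_k$$ a probability distribution as long as exactly one arm receives the bonus $$\beta$$.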

Policies.ProbabilityPursuit.BETA = 0.5

Default value for the beta parameter.

class Policies.ProbabilityPursuit.ProbabilityPursuit(nbArms, beta=0.5, prior='uniform', lower=0.0, amplitude=1.0)

The basic Probability Pursuit algorithm.

__init__(nbArms, beta=0.5, prior='uniform', lower=0.0, amplitude=1.0)

Create a new Probability Pursuit policy.

probabilities = None

Probabilities $$p_k(t)$$ of each arm.

startGame()

Reinitialize probabilities.

property beta

Constant parameter $$\beta(t) = \beta(0)$$.

__str__()

-> str

getReward(arm, reward)

Give a reward: accumulate the reward on arm k, then update the probabilities $$p_k(t)$$ of all arms.
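The steps performed by getReward can be sketched as a small self-contained class. `PursuitSketch` is a hypothetical stand-in, not the module's class: it ignores the `prior`, `lower` and `amplitude` arguments, assumes rewards and pull counts are stored in NumPy arrays, gives an empirical mean of 0 to arms never pulled, and, when several arms tie for the maximum, splits the bonus $$\beta$$ equally among them so that the probabilities still sum to 1 (the equation above, applied literally, would give $$\beta$$ to each tied arm).

```python
import numpy as np


class PursuitSketch:
    """Hypothetical, simplified stand-in for the ProbabilityPursuit policy."""

    def __init__(self, nbArms, beta=0.5):
        self.nbArms = nbArms
        self.beta = beta
        self.startGame()

    def startGame(self):
        # Uniform initial probabilities p_k(0) = 1/K and empty statistics.
        self.probabilities = np.full(self.nbArms, 1.0 / self.nbArms)
        self.rewards = np.zeros(self.nbArms)
        self.pulls = np.zeros(self.nbArms, dtype=int)

    def getReward(self, arm, reward):
        # 1. Accumulate the reward on the pulled arm.
        self.rewards[arm] += reward
        self.pulls[arm] += 1
        # 2. Empirical means mu_hat_k(t); never-pulled arms get 0 (assumption).
        means = np.divide(self.rewards, self.pulls,
                          out=np.zeros(self.nbArms), where=self.pulls > 0)
        # 3. Target vector: the bonus goes to the greedy arm(s); ties share it
        #    equally (assumption), so the result stays a distribution.
        best = means == means.max()
        target = best / best.sum()
        # 4. Pursuit update: p_k(t+1) = (1 - beta) * p_k(t) + beta * target_k.
        self.probabilities = (1 - self.beta) * self.probabilities + self.beta * target


policy = PursuitSketch(3, beta=0.5)
policy.getReward(2, 1.0)
print(policy.probabilities)  # [0.16666667 0.16666667 0.66666667]
```

Recomputing all empirical means on each call keeps the sketch short; an incremental update of only the pulled arm's mean would avoid the $$O(K)$$ division per step.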

choice()

Draw one arm at random, with probabilities $$(p_k(t))_{1 \leq k \leq K}$$, using numpy.random.choice().
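A minimal sketch of this sampling step, assuming NumPy. It uses the `numpy.random.Generator.choice` method, which takes the same `p` argument as the `numpy.random.choice()` function named above; the probability vector is an arbitrary example, not a value produced by the policy.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded generator, for reproducibility
probabilities = np.array([0.7, 0.2, 0.1])  # example p_k(t), K = 3

# One random arm, drawn with P(arm = k) = p_k(t).
arm = rng.choice(len(probabilities), p=probabilities)

# Sanity check: empirical frequencies over many draws approach p_k(t).
draws = rng.choice(len(probabilities), size=10_000, p=probabilities)
freqs = np.bincount(draws, minlength=len(probabilities)) / draws.size
print(arm, freqs)  # freqs is approximately [0.7, 0.2, 0.1]
```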

choiceWithRank(rank=1)

Multiple (rank >= 1) random selections, with probabilities $$(p_k(t))_{1 \leq k \leq K}$$, using numpy.random.choice(); the last arm drawn (a less probable one) is returned.
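One possible reading of choiceWithRank, sketched with NumPy. The helper `choice_with_rank` is hypothetical, and drawing the `rank` arms without replacement is an assumption of this sketch; with `replace=False`, NumPy requires at least `rank` arms with nonzero probability.

```python
import numpy as np


def choice_with_rank(probabilities, rank=1, rng=None):
    """Draw `rank` distinct arms with probabilities p_k(t); return the last one.

    Hypothetical helper: sampling without replacement is an assumption.
    """
    rng = np.random.default_rng() if rng is None else rng
    if rank == 1:
        # Same as a single draw.
        return int(rng.choice(len(probabilities), p=probabilities))
    # Sequential draws without replacement, proportional to p_k(t).
    arms = rng.choice(len(probabilities), size=rank, replace=False, p=probabilities)
    return int(arms[rank - 1])  # the last arm drawn


print(choice_with_rank([0.5, 0.3, 0.2], rank=2, rng=np.random.default_rng(0)))
```

Later draws tend to land on arms with smaller $$p_k(t)$$, since the more probable arms are usually taken first.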

choiceFromSubSet(availableArms='all')

Draw one arm at random among availableArms, with probabilities $$(p_k(t))_{1 \leq k \leq K}$$, using numpy.random.choice().
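A sketch of choiceFromSubSet under stated assumptions: the hypothetical helper `choice_from_subset` restricts $$p_k(t)$$ to the available arms and renormalizes, and falls back to a uniform draw among them if their total probability is 0. Neither choice is confirmed by the documentation above.

```python
import numpy as np


def choice_from_subset(probabilities, availableArms='all', rng=None):
    """Draw one arm among availableArms, proportionally to p_k(t).

    Hypothetical helper: renormalizing over the subset and the uniform
    fallback for a zero-mass subset are assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(probabilities, dtype=float)
    if isinstance(availableArms, str) and availableArms == 'all':
        return int(rng.choice(len(p), p=p))
    arms = np.asarray(availableArms)
    sub = p[arms]  # probabilities restricted to the available arms
    total = sub.sum()
    if total <= 0:
        return int(rng.choice(arms))  # uniform fallback (assumption)
    return int(rng.choice(arms, p=sub / total))


print(choice_from_subset([0.6, 0.3, 0.1], availableArms=[1, 2]))
```

With `availableArms=[1, 2]` above, arm 1 is drawn with probability $$0.3 / 0.4 = 0.75$$ and arm 2 with probability $$0.25$$.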

choiceMultiple(nb=1)

Multiple (nb >= 1) random selections, with probabilities $$(p_k(t))_{1 \leq k \leq K}$$, using numpy.random.choice().
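A minimal sketch of choiceMultiple with NumPy, assuming the `nb` arms are drawn without replacement (so they are distinct); the probability vector and `nb = 2` are arbitrary examples.

```python
import numpy as np

rng = np.random.default_rng(7)
probabilities = np.array([0.4, 0.3, 0.2, 0.1])  # example p_k(t), K = 4
nb = 2

# nb distinct arms, drawn sequentially according to p_k(t);
# replace=False (no repeated arm) is an assumption of this sketch.
arms = rng.choice(len(probabilities), size=nb, replace=False, p=probabilities)
print(arms)  # nb distinct arm indices
```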