Policies.ProbabilityPursuit module¶
The basic Probability Pursuit algorithm.
We use the simple version of the pursuit algorithm, as described in the seminal book by Sutton and Barto (1998), https://webdocs.cs.ualberta.ca/~sutton/book/the-book.html.
Initially, a uniform probability is set on each arm, \(p_k(0) = 1/K\).
At each time step \(t\), the probabilities are all recomputed, following this equation:
\[\begin{split}p_k(t+1) = \begin{cases} (1 - \beta) p_k(t) + \beta \times 1 & \text{if}\; \hat{\mu}_k(t) = \max_j \hat{\mu}_j(t) \\ (1 - \beta) p_k(t) + \beta \times 0 & \text{otherwise}. \end{cases}\end{split}\]
Here \(\beta \in (0, 1)\) is a learning rate; its default value is BETA = 0.5.
And then arm \(A_k(t+1)\) is randomly selected from the distribution \((p_k(t+1))_{1 \leq k \leq K}\).
References: [Kuleshov & Precup - JMLR, 2000](http://www.cs.mcgill.ca/~vkules/bandits.pdf#page=6), [Sutton & Barto, 1998]
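The update rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not the library's code; the name `pursuit_step` and the sample means are invented for the example:

```python
import numpy as np

K = 3                     # number of arms (illustrative)
beta = 0.5                # learning rate, matching the default BETA
p = np.full(K, 1.0 / K)   # uniform initial probabilities, p_k(0) = 1/K

def pursuit_step(p, means, beta=beta):
    """One pursuit update: move probability mass toward the empirically best arm."""
    target = np.zeros_like(p)
    target[np.argmax(means)] = 1.0           # indicator of the arm with maximal empirical mean
    return (1.0 - beta) * p + beta * target  # a convex combination, so it still sums to 1

means = np.array([0.1, 0.7, 0.3])  # pretend empirical means after some plays
p = pursuit_step(p, means)
# arm 1 gains probability: (1 - 0.5) * 1/3 + 0.5 = 2/3; the others drop to 1/6
rng = np.random.default_rng()
arm = rng.choice(K, p=p)           # next arm drawn from (p_k(t+1))
```

Because the update is a convex combination of the old distribution and an indicator vector, the probabilities always remain a valid distribution.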
-
Policies.ProbabilityPursuit.BETA= 0.5¶ Default value for the \(\beta\) parameter.
-
class
Policies.ProbabilityPursuit.ProbabilityPursuit(nbArms, beta=0.5, prior='uniform', lower=0.0, amplitude=1.0)[source]¶ Bases:
Policies.BasePolicy.BasePolicy

The basic Probability Pursuit algorithm.
References: [Kuleshov & Precup - JMLR, 2000](http://www.cs.mcgill.ca/~vkules/bandits.pdf#page=6), [Sutton & Barto, 1998]
-
probabilities= None¶ Probabilities of each arm
-
property
beta¶ Constant parameter \(\beta(t) = \beta(0)\).
-
getReward(arm, reward)[source]¶ Give a reward: accumulate the reward on that arm \(k\), then update the probabilities \(p_k(t)\) of every arm.
-
choice()[source]¶ One random selection, with probabilities \((p_k(t))_{1 \leq k \leq K}\), thanks to
numpy.random.choice().
-
choiceWithRank(rank=1)[source]¶ Multiple (rank >= 1) random selections, with probabilities \((p_k(t))_{1 \leq k \leq K}\), thanks to
numpy.random.choice(), and select the last one (the least probable of those drawn).
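One plausible reading of this behaviour, sketched with numpy.random.choice(). This is an illustration under assumed semantics (distinct draws, keep the last), not the library's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.6, 0.3, 0.1])   # current arm probabilities (illustrative)
rank = 2
# Draw `rank` distinct arms weighted by probability, then keep the last one
# drawn, which tends to be a less probable arm than the first.
drawn = rng.choice(len(p), size=rank, replace=False, p=p)
arm = drawn[-1]
```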
-
choiceFromSubSet(availableArms='all')[source]¶ One random selection, from availableArms, with probabilities \((p_k(t))_{1 \leq k \leq K}\), thanks to
numpy.random.choice().
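A plausible way to restrict the draw to a subset of arms, shown as a sketch. The renormalisation over the subset is an assumption for the example, not necessarily the library's code:

```python
import numpy as np

rng = np.random.default_rng(1)
p = np.array([0.5, 0.3, 0.2])            # current arm probabilities (illustrative)
available = [1, 2]                       # hypothetical subset of available arm indices
sub = p[available] / p[available].sum()  # renormalise over the subset (assumption)
arm = available[rng.choice(len(available), p=sub)]
```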
-
__module__= 'Policies.ProbabilityPursuit'¶