# Policies.UCBVtuned module¶

The UCBV-Tuned policy for bounded bandits, with a tuned variance correction term. Reference: [Auer et al. 02].

class Policies.UCBVtuned.UCBVtuned(nbArms, lower=0.0, amplitude=1.0)[source]

The UCBV-Tuned policy for bounded bandits, with a tuned variance correction term. Reference: [Auer et al. 02].

__str__() -> str[source]

computeIndex(arm)[source]

Compute the current index, at time t and after $$N_k(t)$$ pulls of arm k:

$\begin{split}\hat{\mu}_k(t) &= \frac{X_k(t)}{N_k(t)}, \\ V_k(t) &= \frac{Z_k(t)}{N_k(t)} - \hat{\mu}_k(t)^2, \\ V'_k(t) &= V_k(t) + \sqrt{\frac{2 \log(t)}{N_k(t)}}, \\ I_k(t) &= \hat{\mu}_k(t) + \sqrt{\frac{\log(t) V'_k(t)}{N_k(t)}}.\end{split}$

Where $$V'_k(t)$$ is another estimator of the variance of the rewards of arm k, built from $$X_k(t) = \sum_{\sigma=1}^{t} 1(A(\sigma) = k) r_k(\sigma)$$, the sum of rewards from arm k, and $$Z_k(t) = \sum_{\sigma=1}^{t} 1(A(\sigma) = k) r_k(\sigma)^2$$, the sum of squared rewards from arm k.
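As a minimal sketch of the formula above (not the module's actual code; the function and argument names are assumptions, and rewards are assumed bounded in [0, 1]):

```python
import math

def ucbv_tuned_index(sum_rewards, sum_squared_rewards, pulls, t):
    """Hypothetical helper computing the UCBV-Tuned index of one arm.

    sum_rewards         -- X_k(t), sum of rewards from arm k
    sum_squared_rewards -- Z_k(t), sum of squared rewards from arm k
    pulls               -- N_k(t), number of pulls of arm k
    t                   -- current time step
    """
    if pulls == 0:
        return float('inf')  # unexplored arms get a maximal index
    mean = sum_rewards / pulls                         # hat{mu}_k(t)
    variance = sum_squared_rewards / pulls - mean**2   # V_k(t)
    variance += math.sqrt(2.0 * math.log(t) / pulls)   # V'_k(t)
    return mean + math.sqrt(math.log(t) * variance / pulls)  # I_k(t)
```

Note how the exploration bonus shrinks as $$N_k(t)$$ grows: for a fixed empirical mean and variance, a more-pulled arm gets a smaller index.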

computeAllIndex()[source]

Compute the current indexes for all arms, in a vectorized manner.

__module__ = 'Policies.UCBVtuned'