Policies.Experimentals.UnsupervisedLearning module

An experimental “on-line” policy, using algorithms from Unsupervised Learning.

Basically, it works like this (a short code sketch follows the list):

  • Start with a purely random exploration phase (uniform exploration), to get some data about each arm,

  • Then, fit some unsupervised learning model on each arm, to build a model of its distribution (e.g., a simple Gaussian, with mean and variance obtained from the data).

  • And then, at each time step, use each model to generate a prediction for the output of its arm, and play the arm with the highest prediction.

    • If needed, refit the models once in a while, to incorporate all the collected data.

    • If needed, use a robust estimate (e.g., mean of 100 samples) to choose the arm to play, instead of only one sample.
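For instance, here is a minimal sketch of that loop in plain numpy, assuming a hypothetical pull(arm) function that returns one reward from the chosen arm, and simple per-arm Gaussian models (an illustration only, not the actual implementation):

import numpy as np

def unsupervised_policy(pull, K, horizon, T_0=100, fit_every=1000, meanOf=100):
    # pull(k) is a hypothetical function returning one reward from arm k.
    observations = [[] for _ in range(K)]
    models = [(0.0, 1.0)] * K  # per-arm (mean, std) of a fitted 1D Gaussian

    for t in range(horizon):
        if t < K * T_0:
            arm = t % K  # uniform exploration: each arm is pulled T_0 times
        else:
            if (t - K * T_0) % fit_every == 0:
                # refit each Gaussian model on all the data collected for its arm
                models = [(np.mean(obs), np.std(obs)) for obs in observations]
            # robust estimate: average meanOf samples drawn from each fitted model
            estimates = [np.mean(np.random.normal(mu, sigma, meanOf))
                         for (mu, sigma) in models]
            arm = int(np.argmax(estimates))  # play the most promising arm
        observations[arm].append(pull(arm))
    return observations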

Warning

This is still experimental! It is NOT efficient in terms of storage, and NOT efficient either in terms of performance on a Bandit problem (i.e., regret, best arm identification, etc.).

Warning

It is NOT really an on-line policy, as both the memory consumption and the time complexity of each step increase with time!

This module also provides two simple Unsupervised Learning algorithms, SimpleGaussianKernel and SimpleBernoulliKernel, see below.

class Policies.Experimentals.UnsupervisedLearning.FittingModel(*args, **kwargs)[source]

Bases: object

Base class for any fitting model

__init__(*args, **kwargs)[source]

Nothing to do here.

__repr__()[source]

Return repr(self).

fit(data)[source]

Nothing to do here.

sample(shape=1)[source]

Always 0., for instance.

score_samples(data)[source]

Always 1., for instance.

score(data)[source]

Log-likelihood of the point (or the vector of data), under the current model.

__module__ = 'Policies.Experimentals.UnsupervisedLearning'
__weakref__

list of weak references to the object (if defined)

class Policies.Experimentals.UnsupervisedLearning.SimpleGaussianKernel(loc=0.0, scale=1.0, *args, **kwargs)[source]

Bases: Policies.Experimentals.UnsupervisedLearning.FittingModel

Basic Unsupervised Learning algorithm, which simply fits a 1D Gaussian on some 1D data.

  • It works quite well, for Gaussian as well as Constant, Uniform and Bernoulli arms.

  • It fails (more or less) dramatically on Exponential, Binomial and Poisson arms.

>>> K = SimpleGaussianKernel(loc=0.5, scale=0.1)
>>> K
N(0.5, 0.1)
>>> data = [0.33, 0.34, 0.40, 0.37]
>>> K.fit(data)
N(0.36, 0.0274)
>>> np.random.seed(0)  # reproducibility
>>> K.sample()  
0.4083...
>>> np.mean(K.sample((100, 100)))  
0.3594...
__init__(loc=0.0, scale=1.0, *args, **kwargs)[source]

Starts with \(\mathcal{N}(0, 1)\), by default.

__str__()[source]

Return str(self).

fit(data)[source]

Use the mean and variance from the 1D vector data (of shape n_samples or (n_samples, 1)).

sample(shape=1)[source]

Return one or more samples from the current Gaussian model.

score_samples(data)[source]

Likelihood of the point (or the vector of data), under the current Gaussian model, component-wise.
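Concretely, and consistently with the doctest above, fit, sample and score_samples amount to something like the following numpy sketch (an illustration of the behaviour, not the actual source):

import numpy as np

data = np.asarray([0.33, 0.34, 0.40, 0.37]).ravel()

# fit: estimate loc and scale from the data (gives N(0.36, 0.0274), as above)
loc, scale = np.mean(data), np.std(data)

# sample: draw from N(loc, scale); shape can be an int or a tuple
one_sample = np.random.normal(loc, scale)
many_samples = np.random.normal(loc, scale, size=(100, 100))

# score_samples: component-wise Gaussian likelihood of each point
x = np.asarray([0.3, 0.5])
likelihoods = np.exp(-0.5 * ((x - loc) / scale) ** 2) / (scale * np.sqrt(2 * np.pi))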

__module__ = 'Policies.Experimentals.UnsupervisedLearning'
class Policies.Experimentals.UnsupervisedLearning.SimpleBernoulliKernel(p=None, *args, **kwargs)[source]

Bases: Policies.Experimentals.UnsupervisedLearning.FittingModel

Basic Unsupervised Learning algorithm, which simply fits a 1D Bernoulli distribution on some 1D data.

  • It works quite well, for Bernoulli as well as Constant arms.

  • It fails (more or less) dramatically on Gaussian, Uniform, Exponential, Binomial and Poisson arms.

>>> K = SimpleBernoulliKernel(lower=0, amplitude=1)
>>> K.mu
0.5
>>> data = [0.33, 0.34, 0.40, 0.37]
>>> K.fit(data)
B(0.36)
>>> np.random.seed(0)  # reproducibility
>>> K.sample()
0.0
>>> np.mean(K.sample((100, 100)))  
0.3619...
__init__(p=None, *args, **kwargs)[source]

Starts with \(\mathcal{B}(\mu)\), where \(\mu = p\) or \(\mu = \mathrm{lower} + \mathrm{amplitude} / 2\), by default.

lower = None

Known lower bounds on the rewards.

amplitude = None

Known amplitude of the rewards.

mu = None

Mean of the Bernoulli arm.

__str__()[source]

Return str(self).

fit(data)[source]

Use the mean of the 1D vector data (of shape n_samples or (n_samples, 1)).

sample(shape=1)[source]

Return one or more samples from the current Bernoulli model.

score_samples(data)[source]

Likelihood of the point (or the vector of data), under the current Bernoulli model, component-wise.
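A rough numpy sketch of the assumed behaviour of fit and sample, consistent with the doctest above (the rescaling with the known lower and amplitude is an assumption, not taken from the source):

import numpy as np

lower, amplitude = 0.0, 1.0
data = np.asarray([0.33, 0.34, 0.40, 0.37]).ravel()

# fit: rescale the empirical mean into [0, 1] to get the Bernoulli parameter mu
mu = (np.mean(data) - lower) / amplitude  # here 0.36, as in B(0.36) above

# sample: draw 0/1 values and map them back to [lower, lower + amplitude]
samples = lower + amplitude * np.random.binomial(1, mu, size=(100, 100))
print(np.mean(samples))  # about 0.36, as in the doctest above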

__module__ = 'Policies.Experimentals.UnsupervisedLearning'
Policies.Experimentals.UnsupervisedLearning.T0 = 100

Default value for the parameter T_0.

Policies.Experimentals.UnsupervisedLearning.FIT_EVERY = 1000

Default value for the parameter fit_every.

Policies.Experimentals.UnsupervisedLearning.MEAN_OF = 100

Default value for the parameter meanOf.

class Policies.Experimentals.UnsupervisedLearning.UnsupervisedLearning(nbArms, estimator=<class 'Policies.Experimentals.UnsupervisedLearning.SimpleGaussianKernel'>, T_0=100, fit_every=1000, meanOf=100, lower=0.0, amplitude=1.0, *args, **kwargs)[source]

Bases: object

Generic policy using an Unsupervised Learning algorithm, for instance from scikit-learn.

Warning

This is still experimental!

Note

The algorithm I designed is not obvious, but here are some explanations (a usage sketch follows the list):

  • Initially : create \(K\) Unsupervised Learning algorithms \(\mathcal{U}_k(0)\), \(k\in\{1,\dots,K\}\), for instance KernelDensity estimators.

  • For the first \(K \times T_0\) time steps, each arm \(k \in \{1, \dots, K\}\) is sampled exactly \(T_0\) times, to get a lot of initial observations for each arm.

  • With these first \(T_0\) (e.g., \(50\)) observations, train a first version of the Unsupervised Learning algorithms \(\mathcal{U}_k(t)\), \(k\in\{1,\dots,K\}\).

  • Then, for the following time steps, \(t \geq T_0 + 1\) :

    • Once in a while (every \(T_1 =\) fit_every steps, e.g., \(100\)), retrain all the Unsupervised Learning algorithms :

      • For each arm \(k\in\{1,\dots,K\}\),

      • Use all the previous observations of that arm to train the model \(\mathcal{U}_k(t)\).

    • Otherwise, use the previously trained model to choose the arm \(A(t) \in \{1,\dots,K\}\) to play next (see choice() below).
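For instance, a typical interaction loop with this policy looks like the following sketch, where the Bernoulli arm means, the horizon and the reward simulation are made up for the illustration:

import numpy as np
from Policies.Experimentals.UnsupervisedLearning import (
    UnsupervisedLearning, SimpleGaussianKernel)

means = [0.1, 0.5, 0.9]  # made-up Bernoulli arms, for illustration only
policy = UnsupervisedLearning(nbArms=len(means), estimator=SimpleGaussianKernel,
                              T_0=100, fit_every=1000, meanOf=100)
policy.startGame()

horizon = 5000
for t in range(horizon):
    arm = policy.choice()  # random exploration first, then model-based choices
    reward = float(np.random.rand() < means[arm])  # simulated Bernoulli reward
    policy.getReward(arm, reward)  # stored; the models are refit periodically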

__init__(nbArms, estimator=<class 'Policies.Experimentals.UnsupervisedLearning.SimpleGaussianKernel'>, T_0=100, fit_every=1000, meanOf=100, lower=0.0, amplitude=1.0, *args, **kwargs)[source]

Create a new UnsupervisedLearning policy.

nbArms = None

Number of arms of the MAB problem.

t = None

Current time.

T_0 = None

Number of random exploration of each arm at the beginning.

fit_every = None

Frequency of refit of the unsupervised models.

meanOf = None

Number of samples used to estimate the best arm.

givenEstimator = None

The class to use to create the estimator.

ests = None

List of estimators (i.e., objects with a fit and a sample method).

observations = None

List of observations for each arm. This is the main weakness of this policy: it stores every observation, so the storage space grows linearly with the number of observations so far (i.e., with the time index t). Such a linear memory complexity is really bad!

lower = None

Known lower bounds on the rewards.

amplitude = None

Known amplitude of the rewards.

__str__()[source]

Return str(self).

startGame()[source]

Reinitialize the estimators.

getReward(armId, reward)[source]

Store this observation reward for that arm armId.

choice()[source]

Choose an arm, according to this algorithm:

  • If \(t < T_0 \times K\), choose arm \(t \;\mathrm{mod}\; K\), in order to select each arm exactly \(T_0\) times initially.

  • Otherwise, get a random sample, \(s_k(t)\) from the \(K\) Unsupervised Learning algorithms \(\mathcal{U}_k(t)\), \(k\in\{1,\dots,K\}\) :

\[\forall k\in\{1,\dots,K\}, \;\; s_k(t) \sim \mathcal{U}_k(t).\]
  • Choose the arm \(A(t)\) with highest sample :

\[A(t) \in \arg\max_{k\in\{1,\dots,K\}} s_k(t).\]
  • Play that arm \(A(t)\), receive a reward \(r_{A(t)}(t)\) from its (unknown) distribution, and store it.

Note

A more robust (and thus more accurate) variant is to draw a bunch of samples and use their mean as \(s_k(t)\) (a short code sketch follows) :

  • Get a bunch of \(M\) random samples (e.g., \(50\)), \(s_k^i(t)\) from the \(K\) Unsupervised Learning algorithms \(\mathcal{U}_k(t)\), \(k\in\{1,\dots,K\}\) :

\[\forall k\in\{1,\dots,K\}, \;\; \forall i\in\{1,\dots,M\}, \;\; s_k^i(t) \sim \mathcal{U}_k(t).\]
  • Average them to get \(\hat{s_k}(t)\) :

\[\forall k\in\{1,\dots,K\}, \;\; \hat{s_k}(t) := \frac{1}{M} \sum_{i=1}^{M} s_k^i(t).\]
  • Choose the arm \(A(t)\) with highest mean sample :

\[A(t) \in \arg\max_{k\in\{1,\dots,K\}} \hat{s_k}(t).\]

Note that if \(M = 1\), this is equivalent to the naive approach.
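In numpy terms, this robust rule amounts to something like the following sketch, using three fitted SimpleGaussianKernel objects as stand-ins for the \(\mathcal{U}_k(t)\) (the loc and scale values are toy values, for illustration only):

import numpy as np
from Policies.Experimentals.UnsupervisedLearning import SimpleGaussianKernel

# one fitted model U_k(t) per arm; the loc/scale values are made up
models = [SimpleGaussianKernel(loc=mu, scale=0.1) for mu in (0.2, 0.5, 0.8)]

M = 100  # meanOf: number of samples averaged per arm
s_hat = [np.mean(model.sample(M)) for model in models]  # robust estimates
arm = int(np.argmax(s_hat))  # play the arm with the highest mean sample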

fit(data)[source]

Fit each of the K models, with the data accumulated up-to now.

sample()[source]

Return a vector of random samples, one from each of the K models.

sample_with_mean(meanOf=None)[source]

Return a vector of random samples, one from each of the K models, each obtained by averaging a lot of samples (to reduce the variance).

score(obs)[source]

Return a vector of scores, for each of the K models on its observation.

estimatedOrder()[source]

Return the estimated order of the arms, as a permutation of [0..K-1] that would order the arms by increasing means.
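This presumably amounts to an argsort of the per-arm mean samples, along the following lines (an assumption about the behaviour, not the actual source):

import numpy as np
from Policies.Experimentals.UnsupervisedLearning import SimpleGaussianKernel

# permutation of [0..K-1] sorting the arms by increasing estimated mean (assumed)
models = [SimpleGaussianKernel(loc=mu, scale=0.1) for mu in (0.8, 0.2, 0.5)]
order = np.argsort([np.mean(model.sample(100)) for model in models])
# here order is almost surely [1, 2, 0]: arm 1 looks worst, arm 0 looks best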

__module__ = 'Policies.Experimentals.UnsupervisedLearning'
__weakref__

list of weak references to the object (if defined)