Policies.Experimentals.BlackBoxOpt module

An experimental “on-line” policy, using algorithms from black-box Bayesian optimization, based on [scikit-optimize](https://scikit-optimize.github.io/).

Warning

This is still experimental! It is NOT efficient in terms of storage, and NOT efficient either in terms of performance on a bandit problem (i.e., regret, best-arm identification, etc.).

Policies.Experimentals.BlackBoxOpt.default_estimator(*args, **kwargs)[source]

Default estimator object.

Policies.Experimentals.BlackBoxOpt.default_optimizer(nbArms, est, *args, **kwargs)[source]

Default optimizer object.
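
Here is a minimal sketch of how such defaults could be built with scikit-optimize; the exact estimator, kernel, and options used by this module may differ, so treat it as an illustration only:

```python
from skopt import Optimizer
from skopt.learning import GaussianProcessRegressor
from skopt.space import Integer

def sketch_default_estimator(*args, **kwargs):
    # A Gaussian-process surrogate, consistent with the class description below.
    return GaussianProcessRegressor(*args, **kwargs)

def sketch_default_optimizer(nbArms, est, *args, **kwargs):
    # One integer dimension: the index of the arm to pull, in {0, ..., nbArms - 1}.
    return Optimizer(
        dimensions=[Integer(0, nbArms - 1)],
        base_estimator=est(*args, **kwargs),
    )
```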

class Policies.Experimentals.BlackBoxOpt.BlackBoxOpt(nbArms, opt=<function default_optimizer>, est=<function default_estimator>, lower=0.0, amplitude=1.0, *args, **kwargs)[source]

Bases: BasePolicy.BasePolicy

Black-box Bayesian optimizer for Multi-Armed Bandit, using Gaussian processes.

Warning

This is still experimental! It works fine, but it is EXTREMELY SLOW!
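
A hypothetical usage sketch, interacting with a toy Bernoulli bandit (the arm means and the random draws are illustrative, not part of this module):

```python
import numpy as np
from Policies.Experimentals.BlackBoxOpt import BlackBoxOpt

means = [0.1, 0.5, 0.9]                  # hypothetical arm means
policy = BlackBoxOpt(nbArms=len(means))
policy.startGame()

rng = np.random.default_rng(0)
for t in range(100):                     # expect this loop to be very slow
    arm = policy.choice()                        # ask the black-box optimizer
    reward = float(rng.random() < means[arm])    # draw a Bernoulli reward
    policy.getReward(arm, reward)                # feed it back (as a loss)
```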

__init__(nbArms, opt=<function default_optimizer>, est=<function default_estimator>, lower=0.0, amplitude=1.0, *args, **kwargs)[source]

New policy.

nbArms = None

Number of arms of the MAB problem.

t = None

Current time.

opt = None

The black-box optimizer to use, initialized from the other arguments.

lower = None

Known lower bound on the rewards.

amplitude = None

Known amplitude of the rewards.

__str__()[source]

-> str

startGame()[source]

Reinitialize the black-box optimizer.

getReward(armId, reward)[source]

Store the observed reward for the arm armId.

  • In fact, skopt.Optimizer is a minimizer, so the loss = 1 - reward is stored: rewards are maximized by minimizing the losses (see the sketch below).
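
A minimal sketch of that bookkeeping, assuming self.opt is a skopt.Optimizer over the single integer dimension of arm indices (the rescaling by lower and amplitude mirrors the attributes documented above):

```python
def getReward_sketch(self, armId, reward):
    # Rescale the reward to [0, 1] using the known lower bound and amplitude,
    # then feed the corresponding loss to the minimizer.
    reward = (reward - self.lower) / self.amplitude
    self.opt.tell([armId], 1.0 - reward)
```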

choice()[source]

Choose an arm according to the black-box optimizer.
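
Again a rough sketch, assuming self.opt is a skopt.Optimizer over one integer dimension indexing the arms:

```python
def choice_sketch(self):
    # ask() returns one value per dimension; here, a single arm index.
    asked = self.opt.ask()
    return int(asked[0])
```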

__module__ = 'Policies.Experimentals.BlackBoxOpt'