Policies.Experimentals.BlackBoxOpt module

An experimental “on-line” policy, using algorithms from black-box Bayesian optimization, based on [scikit-optimize](https://scikit-optimize.github.io/).

Warning

This is still experimental! It is NOT efficient in terms of storage, and NOT efficient either in terms of performance on a bandit problem (i.e., regret, best-arm identification, etc.).

Policies.Experimentals.BlackBoxOpt.default_estimator(*args, **kwargs)[source]

Default estimator object.

Policies.Experimentals.BlackBoxOpt.default_optimizer(nbArms, est, *args, **kwargs)[source]

Default optimizer object.
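
Here is a minimal sketch of how such defaults could be built with scikit-optimize; the exact estimator, kernel, and options used by this module may differ, so treat it as an illustration only:

```python
from skopt import Optimizer
from skopt.learning import GaussianProcessRegressor
from skopt.space import Integer

def sketch_default_estimator(*args, **kwargs):
    # A Gaussian-process surrogate, consistent with the class description below.
    return GaussianProcessRegressor(*args, **kwargs)

def sketch_default_optimizer(nbArms, est, *args, **kwargs):
    # One integer dimension: the index of the arm to pull, in {0, ..., nbArms - 1}.
    return Optimizer(
        dimensions=[Integer(0, nbArms - 1)],
        base_estimator=est(*args, **kwargs),
    )
```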

class Policies.Experimentals.BlackBoxOpt.BlackBoxOpt(nbArms, opt=<function default_optimizer>, est=<function default_estimator>, lower=0.0, amplitude=1.0, *args, **kwargs)[source]

Bases: BasePolicy.BasePolicy

Black-box Bayesian optimizer for Multi-Armed Bandit, using Gaussian processes.

Warning

This is still experimental! It works fine, but it is EXTREMELY SLOW!
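
A hypothetical usage sketch, interacting with a toy Bernoulli bandit (the arm means and the random draws are illustrative, not part of this module):

```python
import numpy as np
from Policies.Experimentals.BlackBoxOpt import BlackBoxOpt

means = [0.1, 0.5, 0.9]                  # hypothetical arm means
policy = BlackBoxOpt(nbArms=len(means))
policy.startGame()

rng = np.random.default_rng(0)
for t in range(100):                     # expect this loop to be very slow
    arm = policy.choice()                        # ask the black-box optimizer
    reward = float(rng.random() < means[arm])    # draw a Bernoulli reward
    policy.getReward(arm, reward)                # feed it back (as a loss)
```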

__init__(nbArms, opt=<function default_optimizer>, est=<function default_estimator>, lower=0.0, amplitude=1.0, *args, **kwargs)[source]

New policy.

nbArms = None

Number of arms of the MAB problem.

t = None

Current time.

opt = None

The black-box optimizer to use, initialized from the other arguments.

lower = None

Known lower bound on the rewards.

amplitude = None

Known amplitude of the rewards.

__str__()[source]

-> str

startGame()[source]

Reinitialize the black-box optimizer.

getReward(armId, reward)[source]

Store the observed reward for the arm armId.

  • In fact, skopt.Optimizer is a minimizer, so the loss = 1 - reward is stored: rewards are maximized by minimizing the losses (see the sketch below).
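
A minimal sketch of that bookkeeping, assuming self.opt is a skopt.Optimizer over the single integer dimension of arm indices (the rescaling by lower and amplitude mirrors the attributes documented above):

```python
def getReward_sketch(self, armId, reward):
    # Rescale the reward to [0, 1] using the known lower bound and amplitude,
    # then feed the corresponding loss to the minimizer.
    reward = (reward - self.lower) / self.amplitude
    self.opt.tell([armId], 1.0 - reward)
```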

choice()[source]

Choose an arm according to the black-box optimizer.
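
Again a rough sketch, assuming self.opt is a skopt.Optimizer over one integer dimension indexing the arms:

```python
def choice_sketch(self):
    # ask() returns one value per dimension; here, a single arm index.
    asked = self.opt.ask()
    return int(asked[0])
```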

__module__ = 'Policies.Experimentals.BlackBoxOpt'