{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": "true" }, "source": [ "# Table of Contents\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "# An example of a small Single-Player simulation\n", "\n", "First, be sure to be in the main folder, or to have [SMPyBandits](https://github.com/SMPyBandits/SMPyBandits) installed, and import `Evaluator` from `Environment` package:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: SMPyBandits in ./venv3/lib/python3.6/site-packages (0.9.4)\n", "Collecting watermark\n", " Using cached https://files.pythonhosted.org/packages/81/c8/9d3dd05a91bdbda03e24e52870b653c9345a0461294a7290f321bba18fad/watermark-1.7.0-py3-none-any.whl\n", "Requirement already satisfied: numpy in ./venv3/lib/python3.6/site-packages (from SMPyBandits) (1.15.4)\n", "Requirement already satisfied: matplotlib>=2 in ./venv3/lib/python3.6/site-packages (from SMPyBandits) (3.0.2)\n", "Requirement already satisfied: scikit-learn in ./venv3/lib/python3.6/site-packages (from SMPyBandits) (0.20.0)\n", "Requirement already satisfied: scikit-optimize in ./venv3/lib/python3.6/site-packages (from SMPyBandits) (0.5.2)\n", "Requirement already satisfied: seaborn in ./venv3/lib/python3.6/site-packages (from SMPyBandits) (0.9.0)\n", "Requirement already satisfied: joblib in ./venv3/lib/python3.6/site-packages (from SMPyBandits) (0.13.0)\n", "Requirement already satisfied: scipy>0.9 in ./venv3/lib/python3.6/site-packages (from SMPyBandits) (1.1.0)\n", "Requirement already satisfied: ipython in ./venv3/lib/python3.6/site-packages (from watermark) (7.1.1)\n", "Requirement already satisfied: kiwisolver>=1.0.1 in ./venv3/lib/python3.6/site-packages (from matplotlib>=2->SMPyBandits) (1.0.1)\n", "Requirement already satisfied: cycler>=0.10 in ./venv3/lib/python3.6/site-packages (from matplotlib>=2->SMPyBandits) (0.10.0)\n", "Requirement already satisfied: python-dateutil>=2.1 in ./venv3/lib/python3.6/site-packages (from matplotlib>=2->SMPyBandits) (2.7.5)\n", "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in ./venv3/lib/python3.6/site-packages (from matplotlib>=2->SMPyBandits) (2.3.0)\n", "Requirement already satisfied: pandas>=0.15.2 in ./venv3/lib/python3.6/site-packages (from seaborn->SMPyBandits) (0.23.4)\n", "Requirement already satisfied: pygments in ./venv3/lib/python3.6/site-packages (from ipython->watermark) (2.2.0)\n", "Requirement already satisfied: pickleshare in ./venv3/lib/python3.6/site-packages (from ipython->watermark) (0.7.5)\n", "Requirement already satisfied: pexpect; sys_platform != \"win32\" in ./venv3/lib/python3.6/site-packages (from ipython->watermark) (4.6.0)\n", "Requirement already satisfied: jedi>=0.10 in ./venv3/lib/python3.6/site-packages (from ipython->watermark) (0.13.1)\n", "Requirement already satisfied: traitlets>=4.2 in ./venv3/lib/python3.6/site-packages (from ipython->watermark) (4.3.2)\n", "Requirement already satisfied: backcall in ./venv3/lib/python3.6/site-packages (from ipython->watermark) (0.1.0)\n", "Requirement already satisfied: prompt-toolkit<2.1.0,>=2.0.0 in ./venv3/lib/python3.6/site-packages (from ipython->watermark) (2.0.7)\n", "Requirement already satisfied: setuptools>=18.5 in ./venv3/lib/python3.6/site-packages (from ipython->watermark) (40.6.2)\n", "Requirement already satisfied: decorator in ./venv3/lib/python3.6/site-packages (from ipython->watermark) (4.3.0)\n", "Requirement already satisfied: six in ./venv3/lib/python3.6/site-packages (from cycler>=0.10->matplotlib>=2->SMPyBandits) (1.11.0)\n", "Requirement already satisfied: pytz>=2011k in ./venv3/lib/python3.6/site-packages (from pandas>=0.15.2->seaborn->SMPyBandits) (2018.7)\n", "Requirement already satisfied: ptyprocess>=0.5 in ./venv3/lib/python3.6/site-packages (from pexpect; sys_platform != \"win32\"->ipython->watermark) (0.6.0)\n", "Requirement already satisfied: parso>=0.3.0 in ./venv3/lib/python3.6/site-packages (from jedi>=0.10->ipython->watermark) (0.3.1)\n", "Requirement already satisfied: ipython-genutils in ./venv3/lib/python3.6/site-packages (from traitlets>=4.2->ipython->watermark) (0.2.0)\n", "Requirement already satisfied: wcwidth in ./venv3/lib/python3.6/site-packages (from prompt-toolkit<2.1.0,>=2.0.0->ipython->watermark) (0.1.7)\n", "Installing collected packages: watermark\n", "Successfully installed watermark-1.7.0\n", "Info: Using the Jupyter notebook version of the tqdm() decorator, tqdm_notebook() ...\n", "Lilian Besson \n", "\n", "CPython 3.6.6\n", "IPython 7.1.1\n", "\n", "SMPyBandits 0.9.4\n", "\n", "compiler : GCC 8.0.1 20180414 (experimental) [trunk revision 259383\n", "system : Linux\n", "release : 4.15.0-38-generic\n", "machine : x86_64\n", "processor : x86_64\n", "CPU cores : 4\n", "interpreter: 64bit\n" ] } ], "source": [ "!pip install SMPyBandits watermark\n", "%load_ext watermark\n", "%watermark -v -m -p SMPyBandits -a \"Lilian Besson\"" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Local imports\n", "from SMPyBandits.Environment import Evaluator, tqdm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We also need arms, for instance `Bernoulli`-distributed arm:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Import arms\n", "from SMPyBandits.Arms import Bernoulli" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And finally we need some single-player Reinforcement Learning algorithms:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Import algorithms\n", "from SMPyBandits.Policies import *" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For instance, this imported the [`UCB` algorithm](https://en.wikipedia.org/wiki/Multi-armed_bandit#Bandit_strategies) is the `UCBalpha` class:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "code_folding": [ 0 ] }, "outputs": [], "source": [ "# Just improving the ?? in Jupyter. Thanks to https://nbviewer.jupyter.org/gist/minrk/7715212\n", "from __future__ import print_function\n", "from IPython.core import page\n", "def myprint(s):\n", " try:\n", " print(s['text/plain'])\n", " except (KeyError, TypeError):\n", " print(s)\n", "page.page = myprint" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[0;31mInit signature:\u001b[0m \u001b[0mUCBalpha\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnbArms\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0malpha\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m4\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlower\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m0.0\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mamplitude\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m1.0\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mDocstring:\u001b[0m \n", "The UCB1 (UCB-alpha) index policy, modified to take a random permutation order for the initial exploration of each arm (reduce collisions in the multi-players setting).\n", "Reference: [Auer et al. 02].\n", "\u001b[0;31mInit docstring:\u001b[0m\n", "New generic index policy.\n", "\n", "- nbArms: the number of arms,\n", "- lower, amplitude: lower value and known amplitude of the rewards.\n", "\u001b[0;31mFile:\u001b[0m /tmp/SMPyBandits/notebooks/venv3/lib/python3.6/site-packages/SMPyBandits/Policies/UCBalpha.py\n", "\u001b[0;31mType:\u001b[0m type\n", "\n" ] } ], "source": [ "UCBalpha?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With more details, here the code:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[0;31mInit signature:\u001b[0m \u001b[0mUCBalpha\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnbArms\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0malpha\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m4\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlower\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m0.0\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mamplitude\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m1.0\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mSource:\u001b[0m \n", "\u001b[0;32mclass\u001b[0m \u001b[0mUCBalpha\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mUCB\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;34m\"\"\" The UCB1 (UCB-alpha) index policy, modified to take a random permutation order for the initial exploration of each arm (reduce collisions in the multi-players setting).\u001b[0m\n", "\u001b[0;34m Reference: [Auer et al. 02].\u001b[0m\n", "\u001b[0;34m \"\"\"\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m__init__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mnbArms\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0malpha\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mALPHA\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlower\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m0.\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mamplitude\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m1.\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0msuper\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mUCBalpha\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__init__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mnbArms\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlower\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mlower\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mamplitude\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mamplitude\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32massert\u001b[0m \u001b[0malpha\u001b[0m \u001b[0;34m>=\u001b[0m \u001b[0;36m0\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"Error: the alpha parameter for UCBalpha class has to be >= 0.\"\u001b[0m \u001b[0;31m# DEBUG\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0malpha\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0malpha\u001b[0m \u001b[0;31m#: Parameter alpha\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m__str__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0;34mr\"UCB($\\alpha={:.3g}$)\"\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mformat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0malpha\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mcomputeIndex\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0marm\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;34mr\"\"\" Compute the current index, at time t and after :math:`N_k(t)` pulls of arm k:\u001b[0m\n", "\u001b[0;34m\u001b[0m\n", "\u001b[0;34m .. math:: I_k(t) = \\frac{X_k(t)}{N_k(t)} + \\sqrt{\\frac{\\alpha \\log(t)}{2 N_k(t)}}.\u001b[0m\n", "\u001b[0;34m \"\"\"\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpulls\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0marm\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m<\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mfloat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'+inf'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrewards\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0marm\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m/\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpulls\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0marm\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0msqrt\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0malpha\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0mlog\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mt\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m/\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0;36m2\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpulls\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0marm\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mcomputeAllIndex\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0;34m\"\"\" Compute the current indexes for all arms, in a vectorized manner.\"\"\"\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mindexes\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrewards\u001b[0m \u001b[0;34m/\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpulls\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msqrt\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0malpha\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlog\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mt\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m/\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0;36m2\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpulls\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mindexes\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpulls\u001b[0m \u001b[0;34m<\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mfloat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'+inf'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\n", "\u001b[0;34m\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mindex\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mindexes\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mFile:\u001b[0m /tmp/SMPyBandits/notebooks/venv3/lib/python3.6/site-packages/SMPyBandits/Policies/UCBalpha.py\n", "\u001b[0;31mType:\u001b[0m type\n", "\n" ] } ], "source": [ "UCBalpha??" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## Creating the problem" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Parameters for the simulation\n", "- $T = 10000$ is the time horizon,\n", "- $N = 10$ is the number of repetitions,\n", "- `N_JOBS = 4` is the number of cores used to parallelize the code." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "HORIZON = 10000\n", "REPETITIONS = 10\n", "N_JOBS = 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Some MAB problem with Bernoulli arms\n", "We consider in this example $3$ problems, with `Bernoulli` arms, of different means." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "ENVIRONMENTS = [ # 1) Bernoulli arms\n", " { # A very easy problem, but it is used in a lot of articles\n", " \"arm_type\": Bernoulli,\n", " \"params\": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]\n", " },\n", " { # An other problem, best arm = last, with three groups: very bad arms (0.01, 0.02), middle arms (0.3 - 0.6) and very good arms (0.78, 0.8, 0.82)\n", " \"arm_type\": Bernoulli,\n", " \"params\": [0.01, 0.02, 0.3, 0.4, 0.5, 0.6, 0.795, 0.8, 0.805]\n", " },\n", " { # A very hard problem, as used in [Cappé et al, 2012]\n", " \"arm_type\": Bernoulli,\n", " \"params\": [0.01, 0.01, 0.01, 0.02, 0.02, 0.02, 0.05, 0.05, 0.1]\n", " },\n", " ]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Some RL algorithms\n", "We compare Thompson Sampling against $\\mathrm{UCB}_1$, and $\\mathrm{kl}-\\mathrm{UCB}$." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "POLICIES = [\n", " # --- UCB1 algorithm\n", " {\n", " \"archtype\": UCBalpha,\n", " \"params\": {\n", " \"alpha\": 1\n", " }\n", " },\n", " {\n", " \"archtype\": UCBalpha,\n", " \"params\": {\n", " \"alpha\": 0.5 # Smallest theoretically acceptable value\n", " }\n", " },\n", " # --- Thompson algorithm\n", " {\n", " \"archtype\": Thompson,\n", " \"params\": {}\n", " },\n", " # --- KL algorithms, here only klUCB\n", " {\n", " \"archtype\": klUCB,\n", " \"params\": {}\n", " },\n", " # --- BayesUCB algorithm\n", " {\n", " \"archtype\": BayesUCB,\n", " \"params\": {}\n", " },\n", " ]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Complete configuration for the problem:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'horizon': 10000,\n", " 'repetitions': 10,\n", " 'n_jobs': 1,\n", " 'verbosity': 6,\n", " 'environment': [{'arm_type': SMPyBandits.Arms.Bernoulli.Bernoulli,\n", " 'params': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]},\n", " {'arm_type': SMPyBandits.Arms.Bernoulli.Bernoulli,\n", " 'params': [0.01, 0.02, 0.3, 0.4, 0.5, 0.6, 0.795, 0.8, 0.805]},\n", " {'arm_type': SMPyBandits.Arms.Bernoulli.Bernoulli,\n", " 'params': [0.01, 0.01, 0.01, 0.02, 0.02, 0.02, 0.05, 0.05, 0.1]}],\n", " 'policies': [{'archtype': SMPyBandits.Policies.UCBalpha.UCBalpha,\n", " 'params': {'alpha': 1}},\n", " {'archtype': SMPyBandits.Policies.UCBalpha.UCBalpha,\n", " 'params': {'alpha': 0.5}},\n", " {'archtype': SMPyBandits.Policies.Thompson.Thompson, 'params': {}},\n", " {'archtype': SMPyBandits.Policies.klUCB.klUCB, 'params': {}},\n", " {'archtype': SMPyBandits.Policies.BayesUCB.BayesUCB, 'params': {}}]}" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "configuration = {\n", " # --- Duration of the experiment\n", " \"horizon\": HORIZON,\n", " # --- Number of repetition of the experiment (to have an average)\n", " \"repetitions\": REPETITIONS,\n", " # --- Parameters for the use of joblib.Parallel\n", " \"n_jobs\": N_JOBS, # = nb of CPU cores\n", " \"verbosity\": 6, # Max joblib verbosity\n", " # --- Arms\n", " \"environment\": ENVIRONMENTS,\n", " # --- Algorithms\n", " \"policies\": POLICIES,\n", "}\n", "configuration" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## Creating the `Evaluator` object" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of policies in this comparison: 5\n", "Time horizon: 10000\n", "Number of repetitions: 10\n", "Sampling rate for plotting, delta_t_plot: 1\n", "Number of jobs for parallelization: 1\n", "Using this dictionary to create a new environment:\n", " {'arm_type':