{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": "true" }, "source": [ "# Table of Contents\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "# Lai & Robbins lower-bound for stochastic bandit with full restart points\n", "\n", "First, make sure you are in the main folder or have installed [`SMPyBandits`](https://github.com/SMPyBandits/SMPyBandits), then import `Evaluator` from the `Environment` package:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: SMPyBandits in ./venv3/lib/python3.6/site-packages (0.9.4)\n", "Requirement already satisfied: watermark in ./venv3/lib/python3.6/site-packages (1.7.0)\n", "Requirement already satisfied: seaborn in ./venv3/lib/python3.6/site-packages (from SMPyBandits) (0.9.0)\n", "Requirement already satisfied: scikit-learn in ./venv3/lib/python3.6/site-packages (from SMPyBandits) (0.20.0)\n", "Requirement already satisfied: numpy in ./venv3/lib/python3.6/site-packages (from SMPyBandits) (1.15.4)\n", "Requirement already satisfied: scipy>0.9 in ./venv3/lib/python3.6/site-packages (from SMPyBandits) (1.1.0)\n", "Requirement already satisfied: joblib in ./venv3/lib/python3.6/site-packages (from SMPyBandits) (0.13.0)\n", "Requirement already satisfied: scikit-optimize in ./venv3/lib/python3.6/site-packages (from SMPyBandits) (0.5.2)\n", "Requirement already satisfied: matplotlib>=2 in ./venv3/lib/python3.6/site-packages (from SMPyBandits) (3.0.2)\n", "Requirement already satisfied: ipython in ./venv3/lib/python3.6/site-packages (from watermark) (7.1.1)\n", "Requirement already satisfied: pandas>=0.15.2 in ./venv3/lib/python3.6/site-packages (from seaborn->SMPyBandits) (0.23.4)\n", "Requirement already satisfied: python-dateutil>=2.1 in ./venv3/lib/python3.6/site-packages (from matplotlib>=2->SMPyBandits) (2.7.5)\n", "Requirement already 
satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in ./venv3/lib/python3.6/site-packages (from matplotlib>=2->SMPyBandits) (2.3.0)\n", "Requirement already satisfied: cycler>=0.10 in ./venv3/lib/python3.6/site-packages (from matplotlib>=2->SMPyBandits) (0.10.0)\n", "Requirement already satisfied: kiwisolver>=1.0.1 in ./venv3/lib/python3.6/site-packages (from matplotlib>=2->SMPyBandits) (1.0.1)\n", "Requirement already satisfied: decorator in ./venv3/lib/python3.6/site-packages (from ipython->watermark) (4.3.0)\n", "Requirement already satisfied: setuptools>=18.5 in ./venv3/lib/python3.6/site-packages (from ipython->watermark) (40.6.2)\n", "Requirement already satisfied: backcall in ./venv3/lib/python3.6/site-packages (from ipython->watermark) (0.1.0)\n", "Requirement already satisfied: traitlets>=4.2 in ./venv3/lib/python3.6/site-packages (from ipython->watermark) (4.3.2)\n", "Requirement already satisfied: jedi>=0.10 in ./venv3/lib/python3.6/site-packages (from ipython->watermark) (0.13.1)\n", "Requirement already satisfied: pexpect; sys_platform != \"win32\" in ./venv3/lib/python3.6/site-packages (from ipython->watermark) (4.6.0)\n", "Requirement already satisfied: pygments in ./venv3/lib/python3.6/site-packages (from ipython->watermark) (2.2.0)\n", "Requirement already satisfied: prompt-toolkit<2.1.0,>=2.0.0 in ./venv3/lib/python3.6/site-packages (from ipython->watermark) (2.0.7)\n", "Requirement already satisfied: pickleshare in ./venv3/lib/python3.6/site-packages (from ipython->watermark) (0.7.5)\n", "Requirement already satisfied: pytz>=2011k in ./venv3/lib/python3.6/site-packages (from pandas>=0.15.2->seaborn->SMPyBandits) (2018.7)\n", "Requirement already satisfied: six>=1.5 in ./venv3/lib/python3.6/site-packages (from python-dateutil>=2.1->matplotlib>=2->SMPyBandits) (1.11.0)\n", "Requirement already satisfied: ipython-genutils in ./venv3/lib/python3.6/site-packages (from traitlets>=4.2->ipython->watermark) (0.2.0)\n", "Requirement already satisfied: 
parso>=0.3.0 in ./venv3/lib/python3.6/site-packages (from jedi>=0.10->ipython->watermark) (0.3.1)\n", "Requirement already satisfied: ptyprocess>=0.5 in ./venv3/lib/python3.6/site-packages (from pexpect; sys_platform != \"win32\"->ipython->watermark) (0.6.0)\n", "Requirement already satisfied: wcwidth in ./venv3/lib/python3.6/site-packages (from prompt-toolkit<2.1.0,>=2.0.0->ipython->watermark) (0.1.7)\n", "Info: Using the Jupyter notebook version of the tqdm() decorator, tqdm_notebook() ...\n", "Lilian Besson \n", "\n", "CPython 3.6.6\n", "IPython 7.1.1\n", "\n", "SMPyBandits 0.9.4\n", "\n", "compiler : GCC 8.0.1 20180414 (experimental) [trunk revision 259383\n", "system : Linux\n", "release : 4.15.0-38-generic\n", "machine : x86_64\n", "processor : x86_64\n", "CPU cores : 4\n", "interpreter: 64bit\n" ] } ], "source": [ "!pip install SMPyBandits watermark\n", "%load_ext watermark\n", "%watermark -v -m -p SMPyBandits -a \"Lilian Besson\"" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Local imports\n", "from SMPyBandits.Environment import Evaluator, tqdm\n", "from SMPyBandits.Environment.plotsettings import legend, makemarkers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We also need arms, for instance `Bernoulli`-distributed arms:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Import arms\n", "from SMPyBandits.Arms import Bernoulli" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we need some single-player reinforcement learning algorithms:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# Import algorithms\n", "from SMPyBandits.Policies import *" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "import matplotlib as mpl\n", "mpl.rcParams['figure.figsize'] = (12.4, 7)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", 
"## Creating the problem" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Parameters for the simulation\n", "- $T = 20000$ is the time horizon,\n", "- $N = 40$ is the number of repetitions,\n", "- `N_JOBS = 4` is the number of cores used to parallelize the code." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "HORIZON = 20000\n", "REPETITIONS = 40\n", "N_JOBS = 4" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Some MAB problem with Bernoulli arms\n", "In this example we consider a single problem with $9$ `Bernoulli` arms, whose means are evenly spaced from $0.1$ to $0.9$." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "ENVIRONMENTS = [ # 1) Bernoulli arms\n", " { # A very easy problem, but it is used in a lot of articles\n", " \"arm_type\": Bernoulli,\n", " \"params\": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]\n", " }\n", " ]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Some RL algorithms\n", "We compare several policies, each wrapped in the [`DoublingTrickWrapper`](https://smpybandits.github.io/docs/Policies.DoublingTrickWrapper.html#module-Policies.DoublingTrickWrapper) policy with full restarts, all sharing the same growing schemes for the horizon." 
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "NEXT_HORIZONS = [\n", " # next_horizon__arithmetic,\n", " next_horizon__geometric,\n", " # next_horizon__exponential,\n", " # next_horizon__exponential_slow,\n", " next_horizon__exponential_generic\n", "]" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "POLICIES = [\n", " # --- Doubling trick algorithm\n", " {\n", " \"archtype\": DoublingTrickWrapper,\n", " \"params\": {\n", " \"next_horizon\": next_horizon,\n", " \"full_restart\": full_restart,\n", " \"policy\": policy,\n", " }\n", " }\n", " for policy in [\n", " UCBH,\n", " MOSSH,\n", " klUCBPlusPlus,\n", " ApproximatedFHGittins,\n", " ]\n", " for full_restart in [\n", " True,\n", " # False,\n", " ]\n", " for next_horizon in NEXT_HORIZONS\n", "]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Complete configuration for the problem:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'horizon': 20000,\n", " 'repetitions': 40,\n", " 'n_jobs': 4,\n", " 'verbosity': 6,\n", " 'environment': [{'arm_type': SMPyBandits.Arms.Bernoulli.Bernoulli,\n", " 'params': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]}],\n", " 'policies': [{'archtype': SMPyBandits.Policies.DoublingTrickWrapper.DoublingTrickWrapper,\n", " 'params': {'next_horizon': CPUDispatcher(