{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": "true" }, "source": [ "# Table of Contents\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "# Demonstrations of Single-Player Simulations for Non-Stationary-Bandits\n", "\n", "This notebook shows how to 1) **define**, 2) **launch**, and 3) **plot the results** of numerical simulations of piecewise stationary (multi-armed) bandits problems using my framework [SMPyBandits](https://github.com/SMPyBandits/SMPyBandits).\n", "For more details on the maths behind this problem, see this page in the documentation: [SMPyBandits.GitHub.io/NonStationaryBandits.html](https://smpybandits.github.io/NonStationaryBandits.html).\n", "\n", "First, be sure to be in the main folder, or to have [SMPyBandits](https://github.com/SMPyBandits/SMPyBandits) installed, and import `Evaluator` from `Environment` package.\n", "\n", "WARNING\n", "If you are running this notebook locally, in the [`notebooks`](https://github.com/SMPyBandits/SMPyBandits/tree/master/notebooks) folder in the [`SMPyBandits`](https://github.com/SMPyBandits/SMPyBandits/) source, you need to do:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import sys\n", "sys.path.insert(0, '..')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you are running this notebook elsewhere, `SMPyBandits` can be `pip install`ed easily:\n", "(this is especially true if you run this notebook from Google Colab or MyBinder)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Info: Using the Jupyter notebook version of the tqdm() decorator, tqdm_notebook() ...\n" ] } ], "source": [ "try:\n", " import SMPyBandits\n", "except ImportError:\n", " !pip3 install SMPyBandits" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's just check the versions of the installed modules:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "scrolled": true }, "outputs": [], "source": [ "!pip3 install watermark > /dev/null" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Lilian Besson \n", "\n", "CPython 3.6.7\n", "IPython 7.2.0\n", "\n", "SMPyBandits 0.9.4\n", "numpy 1.15.4\n", "matplotlib 3.0.2\n", "\n", "compiler : GCC 8.2.0\n", "system : Linux\n", "release : 4.15.0-42-generic\n", "machine : x86_64\n", "processor : x86_64\n", "CPU cores : 4\n", "interpreter: 64bit\n" ] } ], "source": [ "%load_ext watermark\n", "%watermark -v -m -p SMPyBandits,numpy,matplotlib -a \"Lilian Besson\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now import all the modules we need for this demonstration." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [], "source": [ "FIGSIZE = (19.80, 10.80)\n", "DPI = 160" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [], "source": [ "# Large figures for pretty notebooks\n", "import matplotlib as mpl\n", "mpl.rcParams['figure.figsize'] = FIGSIZE\n", "mpl.rcParams['figure.dpi'] = DPI" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [], "source": [ "# Local imports\n", "from SMPyBandits.Environment import Evaluator, tqdm" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [], "source": [ "# Large figures for pretty notebooks\n", "import matplotlib as mpl\n", "mpl.rcParams['figure.figsize'] = FIGSIZE\n", "mpl.rcParams['figure.dpi'] = DPI" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [], "source": [ "# Large figures for pretty notebooks\n", "import matplotlib as mpl\n", "mpl.rcParams['figure.figsize'] = FIGSIZE\n", "mpl.rcParams['figure.dpi'] = DPI" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We also need arms, for instance `Bernoulli`-distributed arm:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# Import arms\n", "from SMPyBandits.Arms import Bernoulli" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And finally we need some single-player Reinforcement Learning algorithms:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "# Import algorithms\n", "from SMPyBandits.Policies import *" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## Creating the problem" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Parameters for the simulation\n", "- $T = 2000$ is the time horizon,\n", "- $N = 100$ is the number of repetitions, or 1 to debug the simulations,\n", "- `N_JOBS = 4` is the number of cores used to parallelize the code,\n", "- $5$ piecewise stationary sequences will have length 400" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Using 4 jobs in parallel...\n" ] } ], "source": [ "from multiprocessing import cpu_count\n", "CPU_COUNT = cpu_count()\n", "N_JOBS = CPU_COUNT if CPU_COUNT <= 4 else CPU_COUNT - 4\n", "\n", "print(\"Using {} jobs in parallel...\".format(N_JOBS))" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Using T = 2000, and N = 50 repetitions\n" ] } ], "source": [ "HORIZON = 2000\n", "REPETITIONS = 50\n", "\n", "print(\"Using T = {}, and N = {} repetitions\".format(HORIZON, REPETITIONS))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Two MAB problems with Bernoulli arms and piecewise stationary means\n", "\n", "We consider in this example $2$ problems, with `Bernoulli` arms, of different piecewise stationary means.\n", "\n", "1. The first problem has changes on only one arm at every breakpoint times,\n", "2. The second problem has changes on all arms at every breakpoint times." ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [], "source": [ "ENVIRONMENTS = []" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [], "source": [ "ENVIRONMENT_0 = { # A simple piece-wise stationary problem\n", " \"arm_type\": Bernoulli,\n", " \"params\": {\n", " \"listOfMeans\": [\n", " [0.2, 0.5, 0.9], # 0 to 399\n", " [0.2, 0.2, 0.9], # 400 to 799\n", " [0.2, 0.2, 0.1], # 800 to 1199\n", " [0.7, 0.2, 0.1], # 1200 to 1599\n", " [0.7, 0.5, 0.1], # 1600 to end\n", " ],\n", " \"changePoints\": [\n", " int(0 * HORIZON / 2000.0),\n", " int(400 * HORIZON / 2000.0),\n", " int(800 * HORIZON / 2000.0),\n", " int(1200 * HORIZON / 2000.0),\n", " int(1600 * HORIZON / 2000.0),\n", " ],\n", " }\n", "}" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [], "source": [ "# Pb 2 changes are on all or almost arms at a time\n", "ENVIRONMENT_1 = { # A simple piece-wise stationary problem\n", " \"arm_type\": Bernoulli,\n", " \"params\": {\n", " \"listOfMeans\": [\n", " [0.4, 0.5, 0.9], # 0 to 399\n", " [0.5, 0.4, 0.7], # 400 to 799\n", " [0.6, 0.3, 0.5], # 800 to 1199\n", " [0.7, 0.2, 0.3], # 1200 to 1599\n", " [0.8, 0.1, 0.1], # 1600 to end\n", " ],\n", " \"changePoints\": [\n", " int(0 * HORIZON / 2000.0),\n", " int(400 * HORIZON / 2000.0),\n", " int(800 * HORIZON / 2000.0),\n", " int(1200 * HORIZON / 2000.0),\n", " int(1600 * HORIZON / 2000.0),\n", " ],\n", " }\n", "}" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "==> Using K = 3 arms\n", "==> Using Upsilon_T = 5 change points\n", "==> Using the following [0, 400, 800, 1200, 1600] change points\n" ] } ], "source": [ "ENVIRONMENTS = [\n", " ENVIRONMENT_0,\n", " ENVIRONMENT_1,\n", "]\n", "\n", "list_nb_arms = [len(env[\"params\"][\"listOfMeans\"][0]) for env in ENVIRONMENTS]\n", "NB_ARMS = max(list_nb_arms)\n", "assert all(n == NB_ARMS for n in list_nb_arms), \"Error: it is NOT supported to have successive problems with a different number of arms!\"\n", "print(\"==> Using K = {} arms\".format(NB_ARMS))\n", "\n", "NB_BREAK_POINTS = max(len(env[\"params\"][\"changePoints\"]) for env in ENVIRONMENTS)\n", "print(\"==> Using Upsilon_T = {} change points\".format(NB_BREAK_POINTS))\n", "\n", "CHANGE_POINTS = np.unique(np.array(list(set.union(*(set(env[\"params\"][\"changePoints\"]) for env in ENVIRONMENTS)))))\n", "print(\"==> Using the following {} change points\".format(list(CHANGE_POINTS)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Some MAB algorithms\n", "\n", "We want compare some classical MAB algorithms ($\\mathrm{UCB}_1$, Thompson Sampling and $\\mathrm{kl}$-$\\mathrm{UCB}$) that are designed to solve stationary problems against other algorithms designed to solve piecewise-stationary problems." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Parameters of the algorithms" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "