complete_tree_exploration_for_MP_bandits module¶

Experimental code to perform complete tree exploration for Multi-Player bandits.

Algorithms:

Support Selfish 0-greedy, UCB, and klUCB in 3 different variants.
Support also RhoRand, RandTopM and MCTopM, even though they are not memory-less, by using another state representation (inlining the memory of each player, eg the ranks for RhoRand).

Features:

For the means of each arm, \(\mu_1, \dots, \mu_K\), this script can use exact formal computations with sympy, or fractions with Fraction, or float number.
The graph can contain all nodes from root to leafs, or only leafs (with summed probabilities), and possibly only the absorbing nodes are showed.
Support export of the tree to a GraphViz dot graph, and can save it to SVG/PNG and LaTeX (with Tikz) and PDF etc.
By default, the root is highlighted in green and the absorbing nodes are in red.

Warning

I still have to fix these issues:

TODO : right now, it is not so efficient, could it be improved? I don’t think I can do anything in a smarter way, in pure Python.

Requirements:

‘sympy’ module to use formal means \(\mu_1, \dots, \mu_K\) instead of numbers,
‘numpy’ module for computations on indexes (e.g., np.where),
‘graphviz’ module to generate the graph and save it,
‘dot2tex’ module to generate nice LaTeX (with Tikz) graph and save it to PDF.

Note

To use the ‘dot2tex’ module, only Python2 is supported. However, I maintain an unpublished port of ‘dot2tex’ for Python3, see [here](https://github.com/Naereen/dot2tex), that you can download, and install manually (sudo python3 setup.py install) to have ‘dot2tex’ for Python3 also.

About:

Date: 16/09/2017.
Author: Lilian Besson, (C) 2017
Licence: MIT Licence (http://lbesson.mit-license.org).

complete_tree_exploration_for_MP_bandits.oo = inf¶: Shortcut for float(‘+inf’).

complete_tree_exploration_for_MP_bandits.PLOT_DIR = 'plots/trees'¶: Directory for the plots

complete_tree_exploration_for_MP_bandits.tupleit1(anarray)[source]¶: Convert a non-hashable 1D numpy array to a hashable tuple.

complete_tree_exploration_for_MP_bandits.tupleit2(anarray)[source]¶: Convert a non-hashable 2D numpy array to a hashable tuple-of-tuples.

complete_tree_exploration_for_MP_bandits.prod(iterator)[source]¶: Product of the values in this iterator.

complete_tree_exploration_for_MP_bandits.WIDTH = 200¶: Default value for the width parameter for wraptext() and wraplatex().

complete_tree_exploration_for_MP_bandits.wraptext(text, width=200)[source]¶: Wrap the text, using textwrap module, and width.

complete_tree_exploration_for_MP_bandits.mybool(s)[source]¶

complete_tree_exploration_for_MP_bandits.ONLYLEAFS = True¶: By default, aim at the most concise graph representation by only showing the leafs.

complete_tree_exploration_for_MP_bandits.ONLYABSORBING = False¶: By default, don’t aim at the most concise graph representation by only showing the absorbing leafs.

complete_tree_exploration_for_MP_bandits.CONCISE = True¶: By default, only show \(\tilde{S}\) and \(N\) in the graph representations, not all the 4 vectors.

complete_tree_exploration_for_MP_bandits.FULLHASH = False¶: Use only Stilde, N for hashing the states.

complete_tree_exploration_for_MP_bandits.FORMAT = 'svg'¶: Format used to save the graphs.

complete_tree_exploration_for_MP_bandits.FixedArm[source]¶: Fake player j that always targets at arm j.

complete_tree_exploration_for_MP_bandits.UniformExploration[source]¶: Fake player j that always targets all arms.

complete_tree_exploration_for_MP_bandits.ConstantRank[source]¶: Constant rank no matter what.

complete_tree_exploration_for_MP_bandits.choices_from_indexes[source]¶: For deterministic index policies, if more than one index is maximum, return the list of positions attaining this maximum (ties), or only one position.

complete_tree_exploration_for_MP_bandits.Selfish_0Greedy_U[source]¶: Selfish policy + 0-Greedy index + U feedback.

complete_tree_exploration_for_MP_bandits.Selfish_0Greedy_Utilde[source]¶: Selfish policy + 0-Greedy index + Utilde feedback.

complete_tree_exploration_for_MP_bandits.Selfish_0Greedy_Ubar[source]¶: Selfish policy + 0-Greedy index + Ubar feedback.

complete_tree_exploration_for_MP_bandits.Selfish_UCB_U[source]¶: Selfish policy + UCB_0.5 index + U feedback.

complete_tree_exploration_for_MP_bandits.Selfish_UCB[source]¶: Selfish policy + UCB_0.5 index + Utilde feedback.

complete_tree_exploration_for_MP_bandits.Selfish_UCB_Utilde¶: Selfish policy + UCB_0.5 index + Utilde feedback.

complete_tree_exploration_for_MP_bandits.Selfish_UCB_Ubar[source]¶: Selfish policy + UCB_0.5 index + Ubar feedback.

complete_tree_exploration_for_MP_bandits.Selfish_KLUCB_U[source]¶: Selfish policy + Bernoulli KL-UCB index + U feedback.

complete_tree_exploration_for_MP_bandits.Selfish_KLUCB[source]¶: Selfish policy + Bernoulli KL-UCB index + Utilde feedback.

complete_tree_exploration_for_MP_bandits.Selfish_KLUCB_Utilde¶: Selfish policy + Bernoulli KL-UCB index + Utilde feedback.

complete_tree_exploration_for_MP_bandits.Selfish_KLUCB_Ubar[source]¶: Selfish policy + Bernoulli KL-UCB index + Ubar feedback.

complete_tree_exploration_for_MP_bandits.choices_from_indexes_with_rank[source]¶: For deterministic index policies, if more than one index is maximum, return the list of positions attaining the rank-th largest index (with more than one if ties, or only one position).

complete_tree_exploration_for_MP_bandits.RhoRand_UCB_U[source]¶: RhoRand policy + UCB_0.5 index + U feedback.

complete_tree_exploration_for_MP_bandits.RhoRand_UCB_Utilde[source]¶: RhoRand policy + UCB_0.5 index + Utilde feedback.

complete_tree_exploration_for_MP_bandits.RhoRand_UCB_Ubar[source]¶: RhoRand policy + UCB_0.5 index + Ubar feedback.

complete_tree_exploration_for_MP_bandits.RhoRand_KLUCB_U[source]¶: RhoRand policy + Bernoulli KL-UCB index + U feedback.

complete_tree_exploration_for_MP_bandits.RhoRand_KLUCB_Utilde[source]¶: RhoRand policy + Bernoulli KL-UCB index + Utilde feedback.

complete_tree_exploration_for_MP_bandits.RhoRand_KLUCB_Ubar[source]¶: RhoRand policy + Bernoulli KL-UCB index + Ubar feedback.

complete_tree_exploration_for_MP_bandits.RandomNewRank[source]¶: RhoRand chooses a new uniform rank in {1,..,M} in case of collision, or keep the same.

complete_tree_exploration_for_MP_bandits.default_policy¶: RhoRand policy + UCB_0.5 index + U feedback.

complete_tree_exploration_for_MP_bandits.default_update_memory¶: RhoRand chooses a new uniform rank in {1,..,M} in case of collision, or keep the same.

complete_tree_exploration_for_MP_bandits.RandTopM_UCB_U[source]¶: RandTopM policy + UCB_0.5 index + U feedback.

complete_tree_exploration_for_MP_bandits.RandTopM_UCB_Utilde[source]¶: RandTopM policy + UCB_0.5 index + Utilde feedback.

complete_tree_exploration_for_MP_bandits.RandTopM_UCB_Ubar[source]¶: RandTopM policy + UCB_0.5 index + Ubar feedback.

complete_tree_exploration_for_MP_bandits.RandTopM_KLUCB_U[source]¶: RandTopM policy + Bernoulli KL-UCB index + U feedback.

complete_tree_exploration_for_MP_bandits.RandTopM_KLUCB_Utilde[source]¶: RandTopM policy + Bernoulli KL-UCB index + Utilde feedback.

complete_tree_exploration_for_MP_bandits.RandTopM_KLUCB_Ubar[source]¶: RandTopM policy + Bernoulli KL-UCB index + Ubar feedback.

complete_tree_exploration_for_MP_bandits.RandTopM_RandomNewChosenArm[source]¶: RandTopM chooses a new arm after a collision or if the chosen arm lies outside of its estimatedBestArms set, uniformly from the set of estimated M best arms, or keep the same.

complete_tree_exploration_for_MP_bandits.write_to_tuple[source]¶: Tuple cannot be written, this hack fixes that.

complete_tree_exploration_for_MP_bandits.MCTopM_UCB_U[source]¶: MCTopM policy + UCB_0.5 index + U feedback.

complete_tree_exploration_for_MP_bandits.MCTopM_UCB_Utilde[source]¶: MCTopM policy + UCB_0.5 index + Utilde feedback.

complete_tree_exploration_for_MP_bandits.MCTopM_UCB_Ubar[source]¶: MCTopM policy + UCB_0.5 index + Ubar feedback.

complete_tree_exploration_for_MP_bandits.MCTopM_KLUCB_U[source]¶: MCTopM policy + Bernoulli KL-UCB index + U feedback.

complete_tree_exploration_for_MP_bandits.MCTopM_KLUCB_Utilde[source]¶: MCTopM policy + Bernoulli KL-UCB index + Utilde feedback.

complete_tree_exploration_for_MP_bandits.MCTopM_KLUCB_Ubar[source]¶: MCTopM policy + Bernoulli KL-UCB index + Ubar feedback.

complete_tree_exploration_for_MP_bandits.MCTopM_RandomNewChosenArm[source]¶: RandTopMC chooses a new arm after if the chosen arm lies outside of its estimatedBestArms set, uniformly from the set of estimated M best arms, or keep the same.

complete_tree_exploration_for_MP_bandits.symbol_means(K)[source]¶: Better to work directly with symbols and instantiate the results after.

complete_tree_exploration_for_MP_bandits.random_uniform_means(K)[source]¶: If needed, generate an array of K (numerical) uniform means in [0, 1].

complete_tree_exploration_for_MP_bandits.uniform_means(nbArms=3, delta=0.1, lower=0.0, amplitude=1.0)[source]¶

Return a list of means of arms, well spaced:

in [lower, lower + amplitude],
sorted in increasing order,
starting from lower + amplitude * delta, up to lower + amplitude * (1 - delta),
and there is nbArms arms.

>>> np.array(uniform_means(2, 0.1))
array([ 0.1,  0.9])
>>> np.array(uniform_means(3, 0.1))
array([ 0.1,  0.5,  0.9])
>>> np.array(uniform_means(9, 1 / (1. + 9)))
array([ 0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9])

complete_tree_exploration_for_MP_bandits.proba2float(proba, values=None, K=None, names=None)[source]¶: Replace mu_k by a numerical value and evaluation the formula.

complete_tree_exploration_for_MP_bandits.simplify(proba)[source]¶: Try to simplify the expression of the probability.

complete_tree_exploration_for_MP_bandits.proba2str(proba, latex=False, html_in_var_names=False)[source]¶: Pretty print a proba, either a number, a Fraction, or a sympy expression.

complete_tree_exploration_for_MP_bandits.tex2pdf(filename)[source]¶: Naive call to command line pdflatex, twice.

class complete_tree_exploration_for_MP_bandits.State(S, Stilde, N, Ntilde, mus, players, depth=0)[source]¶

Bases: object

Not space-efficient representation of a state in the system we model.

S, Stilde, N, Ntilde: are arrays of size (M, K),
depth, t, M, K: integers, to avoid recomputing them,
mus: the problem parameters (only for Bernoulli arms),
players: is a list of algorithms,
probas: list of transition probabilities,
children: list of all possible next states (transitions).

__init__(S, Stilde, N, Ntilde, mus, players, depth=0)[source]¶: Create a new state. Arrays S, Stilde, N, Ntilde are copied to avoid modify previous values!

S = None¶: sensing feedback

Stilde = None¶: number of sensing trials

N = None¶: number of succesful transmissions

Ntilde = None¶: number of trials without collisions

depth = None¶: current depth of the exploration tree

t = None¶: current time step. Simply = sum(N[0]) = sum(N[i]) for all player i, but easier to compute it once and store it

M = None¶: number of players

K = None¶: number of arms (channels)

children = None¶: list of next state, representing all the possible transitions

probas = None¶: probabilities of transitions

__str__(concise=True)[source]¶: Return str(self).

to_node(concise=True)[source]¶: Print the state as a small string to be attached to a GraphViz node.

to_dot(title='', name='', comment='', latex=False, html_in_var_names=False, ext='svg', onlyleafs=True, onlyabsorbing=False, concise=True)[source]¶

Convert the state to a .dot graph, using GraphViz. See http://graphviz.readthedocs.io/ for more details.

onlyleafs: only print the root and the leafs, to see a concise representation of the tree.
onlyabsorbing: only print the absorbing leafs, to see a really concise representation of the tree.
concise: weather to use the short representation of states (using \(\tilde{S}\) and \(N\)) or the long one (using the 4 variables).
html_in_var_names: experimental use of <SUB>..</SUB> and <SUP>..</SUP> in the label for the tree.
latex: experimental use of _{..} and ^{..} in the label for the tree, to use with dot2tex.

saveto(filename, view=True, title='', name='', comment='', latex=False, html_in_var_names=False, ext='svg', onlyleafs=True, onlyabsorbing=False, concise=True)[source]¶

copy()[source]¶: Get a new copy of that state with same S, Stilde, N, Ntilde but no probas and no children (and depth=0).

__hash__(full=False)[source]¶: Hash the matrix Stilde and N of the state.

is_absorbing()[source]¶: Try to detect if this state is absorbing, ie only one transition is possible, and again infinitely for the only child.

Warning

Still very experimental!

has_absorbing_child_whole_subtree()[source]¶: Try to detect if this state has an absorbing child in the whole subtree.

explore_from_node_to_depth(depth=1)[source]¶: Compute recursively the one_depth children of the root and its children.

compute_one_depth()[source]¶: Use all_deltas to store all the possible transitions and their probabilities. Increase depth by 1 at the end.

all_absorbing_states(depth=1)[source]¶

Generator that yields all the absorbing nodes of the tree, one by one.

It might not find any,
It does so without merging common nodes, in order to find the first absorbing node as quick as possible.

absorbing_states_one_depth()[source]¶: Use all_deltas to yield all the absorbing one-depth child and their probabilities.

find_N_absorbing_states(N=1, maxdepth=8)[source]¶: Find at least N absorbing states, by considering a large depth.

all_deltas()[source]¶

Generator that yields functions transforming state to another state.

It is memory efficient as it is a generator.
Do not convert that to a list or it might use all your system memory: each returned value is a function with code and variables inside!

pretty_print_result_recursively()[source]¶: Print all the transitions, depth by depth (recursively).

get_all_leafs()[source]¶: Recurse and get all the leafs. Many different state can be present in the list of leafs, with possibly different probabilities (each correspond to a trajectory).

get_unique_leafs()[source]¶: Compute all the leafs (deepest children) and merge the common one to compute their full probabilities.

proba_reaching_absorbing_state()[source]¶: Compute the probability of reaching a leaf that is an absorbing state.

__dict__ = mappingproxy({'__module__': 'complete_tree_exploration_for_MP_bandits', '__doc__': 'Not space-efficient representation of a state in the system we model.\n\n - S, Stilde, N, Ntilde: are arrays of size (M, K),\n - depth, t, M, K: integers, to avoid recomputing them,\n - mus: the problem parameters (only for Bernoulli arms),\n - players: is a list of algorithms,\n - probas: list of transition probabilities,\n - children: list of all possible next states (transitions).\n ', '__init__': <function State.__init__>, '__str__': <function State.__str__>, 'to_node': <function State.to_node>, 'to_dot': <function State.to_dot>, 'saveto': <function State.saveto>, 'copy': <function State.copy>, '__hash__': <function State.__hash__>, 'is_absorbing': <function State.is_absorbing>, 'has_absorbing_child_whole_subtree': <function State.has_absorbing_child_whole_subtree>, 'explore_from_node_to_depth': <function State.explore_from_node_to_depth>, 'compute_one_depth': <function State.compute_one_depth>, 'all_absorbing_states': <function State.all_absorbing_states>, 'absorbing_states_one_depth': <function State.absorbing_states_one_depth>, 'find_N_absorbing_states': <function State.find_N_absorbing_states>, 'all_deltas': <function State.all_deltas>, 'pretty_print_result_recursively': <function State.pretty_print_result_recursively>, 'get_all_leafs': <function State.get_all_leafs>, 'get_unique_leafs': <function State.get_unique_leafs>, 'proba_reaching_absorbing_state': <function State.proba_reaching_absorbing_state>, '__dict__': <attribute '__dict__' of 'State' objects>, '__weakref__': <attribute '__weakref__' of 'State' objects>})¶

__module__ = 'complete_tree_exploration_for_MP_bandits'¶

__weakref__¶: list of weak references to the object (if defined)

class complete_tree_exploration_for_MP_bandits.StateWithMemory(S, Stilde, N, Ntilde, mus, players, update_memories, memories=None, depth=0)[source]¶

Bases: complete_tree_exploration_for_MP_bandits.State

State with a memory for each player, to represent and play with RhoRand etc.

__init__(S, Stilde, N, Ntilde, mus, players, update_memories, memories=None, depth=0)[source]¶: Create a new state. Arrays S, Stilde, N, Ntilde are copied to avoid modify previous values!

memories = None¶: Personal memory for all players, can be a rank in {1,..,M} for rhoRand, or anything else.

__str__(concise=False)[source]¶: Return str(self).

to_node(concise=True)[source]¶: Print the state as a small string to be attached to a GraphViz node.

copy()[source]¶: Get a new copy of that state with same S, Stilde, N, Ntilde but no probas and no children (and depth=0).

__hash__(full=False)[source]¶: Hash the matrix Stilde and N of the state and memories of the players (ie. ranks for RhoRand).

is_absorbing()[source]¶: Try to detect if this state is absorbing, ie only one transition is possible, and again infinitely for the only child.

Warning

Still very experimental!

all_deltas()[source]¶

Generator that yields functions transforming state to another state.

It is memory efficient as it is a generator.
Do not convert that to a list or it might use all your system memory: each returned value is a function with code and variables inside!

__module__ = 'complete_tree_exploration_for_MP_bandits'¶

complete_tree_exploration_for_MP_bandits.main(depth=1, players=None, update_memories=None, mus=None, M=2, K=2, S=None, Stilde=None, N=None, Ntilde=None, find_only_N=None)[source]¶: Compute all the transitions, and print them.

complete_tree_exploration_for_MP_bandits.test(depth=1, M=2, K=2, S=None, Stilde=None, N=None, Ntilde=None, mus=None, debug=True, all_players=None, all_update_memories=None, find_only_N=None)[source]¶: Test the main exploration function for various all_players.