# complete_tree_exploration_for_MP_bandits module¶

Experimental code to perform complete tree exploration for Multi-Player bandits.

Algorithms:

• Support Selfish 0-greedy, UCB, and klUCB in 3 different variants.

• Support also RhoRand, RandTopM and MCTopM, even though they are not memory-less, by using another state representation (inlining the memory of each player, eg the ranks for RhoRand).

Features:

• For the means of each arm, $$\mu_1, \dots, \mu_K$$, this script can use exact formal computations with sympy, or fractions with Fraction, or float number.

• The graph can contain all nodes from root to leafs, or only leafs (with summed probabilities), and possibly only the absorbing nodes are showed.

• Support export of the tree to a GraphViz dot graph, and can save it to SVG/PNG and LaTeX (with Tikz) and PDF etc.

• By default, the root is highlighted in green and the absorbing nodes are in red.

Warning

I still have to fix these issues:

• TODO : right now, it is not so efficient, could it be improved? I don’t think I can do anything in a smarter way, in pure Python.

Requirements:

• ‘sympy’ module to use formal means $$\mu_1, \dots, \mu_K$$ instead of numbers,

• ‘numpy’ module for computations on indexes (e.g., np.where),

• ‘graphviz’ module to generate the graph and save it,

• ‘dot2tex’ module to generate nice LaTeX (with Tikz) graph and save it to PDF.

Note

To use the ‘dot2tex’ module, only Python2 is supported. However, I maintain an unpublished port of ‘dot2tex’ for Python3, see [here](https://github.com/Naereen/dot2tex), that you can download, and install manually (sudo python3 setup.py install) to have ‘dot2tex’ for Python3 also.

complete_tree_exploration_for_MP_bandits.oo = inf

Shortcut for float(‘+inf’).

complete_tree_exploration_for_MP_bandits.PLOT_DIR = 'plots/trees'

Directory for the plots

complete_tree_exploration_for_MP_bandits.tupleit1(anarray)[source]

Convert a non-hashable 1D numpy array to a hashable tuple.

complete_tree_exploration_for_MP_bandits.tupleit2(anarray)[source]

Convert a non-hashable 2D numpy array to a hashable tuple-of-tuples.

complete_tree_exploration_for_MP_bandits.prod(iterator)[source]

Product of the values in this iterator.

complete_tree_exploration_for_MP_bandits.WIDTH = 200

Default value for the width parameter for wraptext() and wraplatex().

complete_tree_exploration_for_MP_bandits.wraptext(text, width=200)[source]

Wrap the text, using textwrap module, and width.

complete_tree_exploration_for_MP_bandits.mybool(s)[source]
complete_tree_exploration_for_MP_bandits.ONLYLEAFS = True

By default, aim at the most concise graph representation by only showing the leafs.

complete_tree_exploration_for_MP_bandits.ONLYABSORBING = False

By default, don’t aim at the most concise graph representation by only showing the absorbing leafs.

complete_tree_exploration_for_MP_bandits.CONCISE = True

By default, only show $$\tilde{S}$$ and $$N$$ in the graph representations, not all the 4 vectors.

complete_tree_exploration_for_MP_bandits.FULLHASH = False

Use only Stilde, N for hashing the states.

complete_tree_exploration_for_MP_bandits.FORMAT = 'svg'

Format used to save the graphs.

complete_tree_exploration_for_MP_bandits.FixedArm[source]

Fake player j that always targets at arm j.

complete_tree_exploration_for_MP_bandits.UniformExploration[source]

Fake player j that always targets all arms.

complete_tree_exploration_for_MP_bandits.ConstantRank[source]

Constant rank no matter what.

complete_tree_exploration_for_MP_bandits.choices_from_indexes[source]

For deterministic index policies, if more than one index is maximum, return the list of positions attaining this maximum (ties), or only one position.

complete_tree_exploration_for_MP_bandits.Selfish_0Greedy_U[source]

Selfish policy + 0-Greedy index + U feedback.

complete_tree_exploration_for_MP_bandits.Selfish_0Greedy_Utilde[source]

Selfish policy + 0-Greedy index + Utilde feedback.

complete_tree_exploration_for_MP_bandits.Selfish_0Greedy_Ubar[source]

Selfish policy + 0-Greedy index + Ubar feedback.

complete_tree_exploration_for_MP_bandits.Selfish_UCB_U[source]

Selfish policy + UCB_0.5 index + U feedback.

complete_tree_exploration_for_MP_bandits.Selfish_UCB[source]

Selfish policy + UCB_0.5 index + Utilde feedback.

complete_tree_exploration_for_MP_bandits.Selfish_UCB_Utilde

Selfish policy + UCB_0.5 index + Utilde feedback.

complete_tree_exploration_for_MP_bandits.Selfish_UCB_Ubar[source]

Selfish policy + UCB_0.5 index + Ubar feedback.

complete_tree_exploration_for_MP_bandits.Selfish_KLUCB_U[source]

Selfish policy + Bernoulli KL-UCB index + U feedback.

complete_tree_exploration_for_MP_bandits.Selfish_KLUCB[source]

Selfish policy + Bernoulli KL-UCB index + Utilde feedback.

complete_tree_exploration_for_MP_bandits.Selfish_KLUCB_Utilde

Selfish policy + Bernoulli KL-UCB index + Utilde feedback.

complete_tree_exploration_for_MP_bandits.Selfish_KLUCB_Ubar[source]

Selfish policy + Bernoulli KL-UCB index + Ubar feedback.

complete_tree_exploration_for_MP_bandits.choices_from_indexes_with_rank[source]

For deterministic index policies, if more than one index is maximum, return the list of positions attaining the rank-th largest index (with more than one if ties, or only one position).

complete_tree_exploration_for_MP_bandits.RhoRand_UCB_U[source]

RhoRand policy + UCB_0.5 index + U feedback.

complete_tree_exploration_for_MP_bandits.RhoRand_UCB_Utilde[source]

RhoRand policy + UCB_0.5 index + Utilde feedback.

complete_tree_exploration_for_MP_bandits.RhoRand_UCB_Ubar[source]

RhoRand policy + UCB_0.5 index + Ubar feedback.

complete_tree_exploration_for_MP_bandits.RhoRand_KLUCB_U[source]

RhoRand policy + Bernoulli KL-UCB index + U feedback.

complete_tree_exploration_for_MP_bandits.RhoRand_KLUCB_Utilde[source]

RhoRand policy + Bernoulli KL-UCB index + Utilde feedback.

complete_tree_exploration_for_MP_bandits.RhoRand_KLUCB_Ubar[source]

RhoRand policy + Bernoulli KL-UCB index + Ubar feedback.

complete_tree_exploration_for_MP_bandits.RandomNewRank[source]

RhoRand chooses a new uniform rank in {1,..,M} in case of collision, or keep the same.

complete_tree_exploration_for_MP_bandits.default_policy

RhoRand policy + UCB_0.5 index + U feedback.

complete_tree_exploration_for_MP_bandits.default_update_memory

RhoRand chooses a new uniform rank in {1,..,M} in case of collision, or keep the same.

complete_tree_exploration_for_MP_bandits.RandTopM_UCB_U[source]

RandTopM policy + UCB_0.5 index + U feedback.

complete_tree_exploration_for_MP_bandits.RandTopM_UCB_Utilde[source]

RandTopM policy + UCB_0.5 index + Utilde feedback.

complete_tree_exploration_for_MP_bandits.RandTopM_UCB_Ubar[source]

RandTopM policy + UCB_0.5 index + Ubar feedback.

complete_tree_exploration_for_MP_bandits.RandTopM_KLUCB_U[source]

RandTopM policy + Bernoulli KL-UCB index + U feedback.

complete_tree_exploration_for_MP_bandits.RandTopM_KLUCB_Utilde[source]

RandTopM policy + Bernoulli KL-UCB index + Utilde feedback.

complete_tree_exploration_for_MP_bandits.RandTopM_KLUCB_Ubar[source]

RandTopM policy + Bernoulli KL-UCB index + Ubar feedback.

complete_tree_exploration_for_MP_bandits.RandTopM_RandomNewChosenArm[source]

RandTopM chooses a new arm after a collision or if the chosen arm lies outside of its estimatedBestArms set, uniformly from the set of estimated M best arms, or keep the same.

complete_tree_exploration_for_MP_bandits.write_to_tuple[source]

Tuple cannot be written, this hack fixes that.

complete_tree_exploration_for_MP_bandits.MCTopM_UCB_U[source]

MCTopM policy + UCB_0.5 index + U feedback.

complete_tree_exploration_for_MP_bandits.MCTopM_UCB_Utilde[source]

MCTopM policy + UCB_0.5 index + Utilde feedback.

complete_tree_exploration_for_MP_bandits.MCTopM_UCB_Ubar[source]

MCTopM policy + UCB_0.5 index + Ubar feedback.

complete_tree_exploration_for_MP_bandits.MCTopM_KLUCB_U[source]

MCTopM policy + Bernoulli KL-UCB index + U feedback.

complete_tree_exploration_for_MP_bandits.MCTopM_KLUCB_Utilde[source]

MCTopM policy + Bernoulli KL-UCB index + Utilde feedback.

complete_tree_exploration_for_MP_bandits.MCTopM_KLUCB_Ubar[source]

MCTopM policy + Bernoulli KL-UCB index + Ubar feedback.

complete_tree_exploration_for_MP_bandits.MCTopM_RandomNewChosenArm[source]

RandTopMC chooses a new arm after if the chosen arm lies outside of its estimatedBestArms set, uniformly from the set of estimated M best arms, or keep the same.

complete_tree_exploration_for_MP_bandits.symbol_means(K)[source]

Better to work directly with symbols and instantiate the results after.

complete_tree_exploration_for_MP_bandits.random_uniform_means(K)[source]

If needed, generate an array of K (numerical) uniform means in [0, 1].

complete_tree_exploration_for_MP_bandits.uniform_means(nbArms=3, delta=0.1, lower=0.0, amplitude=1.0)[source]

Return a list of means of arms, well spaced:

• in [lower, lower + amplitude],

• sorted in increasing order,

• starting from lower + amplitude * delta, up to lower + amplitude * (1 - delta),

• and there is nbArms arms.

>>> np.array(uniform_means(2, 0.1))
array([ 0.1,  0.9])
>>> np.array(uniform_means(3, 0.1))
array([ 0.1,  0.5,  0.9])
>>> np.array(uniform_means(9, 1 / (1. + 9)))
array([ 0.1,  0.2,  0.3,  0.4,  0.5,  0.6,  0.7,  0.8,  0.9])

complete_tree_exploration_for_MP_bandits.proba2float(proba, values=None, K=None, names=None)[source]

Replace mu_k by a numerical value and evaluation the formula.

complete_tree_exploration_for_MP_bandits.simplify(proba)[source]

Try to simplify the expression of the probability.

complete_tree_exploration_for_MP_bandits.proba2str(proba, latex=False, html_in_var_names=False)[source]

Pretty print a proba, either a number, a Fraction, or a sympy expression.

complete_tree_exploration_for_MP_bandits.tex2pdf(filename)[source]

Naive call to command line pdflatex, twice.

class complete_tree_exploration_for_MP_bandits.State(S, Stilde, N, Ntilde, mus, players, depth=0)[source]

Bases: object

Not space-efficient representation of a state in the system we model.

• S, Stilde, N, Ntilde: are arrays of size (M, K),

• depth, t, M, K: integers, to avoid recomputing them,

• mus: the problem parameters (only for Bernoulli arms),

• players: is a list of algorithms,

• probas: list of transition probabilities,

• children: list of all possible next states (transitions).

__init__(S, Stilde, N, Ntilde, mus, players, depth=0)[source]

Create a new state. Arrays S, Stilde, N, Ntilde are copied to avoid modify previous values!

S = None

sensing feedback

Stilde = None

number of sensing trials

N = None

number of succesful transmissions

Ntilde = None

number of trials without collisions

depth = None

current depth of the exploration tree

t = None

current time step. Simply = sum(N) = sum(N[i]) for all player i, but easier to compute it once and store it

M = None

number of players

K = None

number of arms (channels)

children = None

list of next state, representing all the possible transitions

probas = None

probabilities of transitions

__str__(concise=True)[source]

Return str(self).

to_node(concise=True)[source]

Print the state as a small string to be attached to a GraphViz node.

to_dot(title='', name='', comment='', latex=False, html_in_var_names=False, ext='svg', onlyleafs=True, onlyabsorbing=False, concise=True)[source]

Convert the state to a .dot graph, using GraphViz. See http://graphviz.readthedocs.io/ for more details.

• onlyleafs: only print the root and the leafs, to see a concise representation of the tree.

• onlyabsorbing: only print the absorbing leafs, to see a really concise representation of the tree.

• concise: weather to use the short representation of states (using $$\tilde{S}$$ and $$N$$) or the long one (using the 4 variables).

• html_in_var_names: experimental use of <SUB>..</SUB> and <SUP>..</SUP> in the label for the tree.

• latex: experimental use of _{..} and ^{..} in the label for the tree, to use with dot2tex.

saveto(filename, view=True, title='', name='', comment='', latex=False, html_in_var_names=False, ext='svg', onlyleafs=True, onlyabsorbing=False, concise=True)[source]
copy()[source]

Get a new copy of that state with same S, Stilde, N, Ntilde but no probas and no children (and depth=0).

__hash__(full=False)[source]

Hash the matrix Stilde and N of the state.

is_absorbing()[source]

Try to detect if this state is absorbing, ie only one transition is possible, and again infinitely for the only child.

Warning

Still very experimental!

has_absorbing_child_whole_subtree()[source]

Try to detect if this state has an absorbing child in the whole subtree.

explore_from_node_to_depth(depth=1)[source]

Compute recursively the one_depth children of the root and its children.

compute_one_depth()[source]

Use all_deltas to store all the possible transitions and their probabilities. Increase depth by 1 at the end.

all_absorbing_states(depth=1)[source]

Generator that yields all the absorbing nodes of the tree, one by one.

• It might not find any,

• It does so without merging common nodes, in order to find the first absorbing node as quick as possible.

absorbing_states_one_depth()[source]

Use all_deltas to yield all the absorbing one-depth child and their probabilities.

find_N_absorbing_states(N=1, maxdepth=8)[source]

Find at least N absorbing states, by considering a large depth.

all_deltas()[source]

Generator that yields functions transforming state to another state.

• It is memory efficient as it is a generator.

• Do not convert that to a list or it might use all your system memory: each returned value is a function with code and variables inside!

pretty_print_result_recursively()[source]

Print all the transitions, depth by depth (recursively).

get_all_leafs()[source]

Recurse and get all the leafs. Many different state can be present in the list of leafs, with possibly different probabilities (each correspond to a trajectory).

get_unique_leafs()[source]

Compute all the leafs (deepest children) and merge the common one to compute their full probabilities.

proba_reaching_absorbing_state()[source]

Compute the probability of reaching a leaf that is an absorbing state.

__dict__ = mappingproxy({'__module__': 'complete_tree_exploration_for_MP_bandits', '__doc__': 'Not space-efficient representation of a state in the system we model.\n\n - S, Stilde, N, Ntilde: are arrays of size (M, K),\n - depth, t, M, K: integers, to avoid recomputing them,\n - mus: the problem parameters (only for Bernoulli arms),\n - players: is a list of algorithms,\n - probas: list of transition probabilities,\n - children: list of all possible next states (transitions).\n ', '__init__': <function State.__init__>, '__str__': <function State.__str__>, 'to_node': <function State.to_node>, 'to_dot': <function State.to_dot>, 'saveto': <function State.saveto>, 'copy': <function State.copy>, '__hash__': <function State.__hash__>, 'is_absorbing': <function State.is_absorbing>, 'has_absorbing_child_whole_subtree': <function State.has_absorbing_child_whole_subtree>, 'explore_from_node_to_depth': <function State.explore_from_node_to_depth>, 'compute_one_depth': <function State.compute_one_depth>, 'all_absorbing_states': <function State.all_absorbing_states>, 'absorbing_states_one_depth': <function State.absorbing_states_one_depth>, 'find_N_absorbing_states': <function State.find_N_absorbing_states>, 'all_deltas': <function State.all_deltas>, 'pretty_print_result_recursively': <function State.pretty_print_result_recursively>, 'get_all_leafs': <function State.get_all_leafs>, 'get_unique_leafs': <function State.get_unique_leafs>, 'proba_reaching_absorbing_state': <function State.proba_reaching_absorbing_state>, '__dict__': <attribute '__dict__' of 'State' objects>, '__weakref__': <attribute '__weakref__' of 'State' objects>})
__module__ = 'complete_tree_exploration_for_MP_bandits'
__weakref__

list of weak references to the object (if defined)

class complete_tree_exploration_for_MP_bandits.StateWithMemory(S, Stilde, N, Ntilde, mus, players, update_memories, memories=None, depth=0)[source]

State with a memory for each player, to represent and play with RhoRand etc.

__init__(S, Stilde, N, Ntilde, mus, players, update_memories, memories=None, depth=0)[source]

Create a new state. Arrays S, Stilde, N, Ntilde are copied to avoid modify previous values!

memories = None

Personal memory for all players, can be a rank in {1,..,M} for rhoRand, or anything else.

__str__(concise=False)[source]

Return str(self).

to_node(concise=True)[source]

Print the state as a small string to be attached to a GraphViz node.

copy()[source]

Get a new copy of that state with same S, Stilde, N, Ntilde but no probas and no children (and depth=0).

__hash__(full=False)[source]

Hash the matrix Stilde and N of the state and memories of the players (ie. ranks for RhoRand).

is_absorbing()[source]

Try to detect if this state is absorbing, ie only one transition is possible, and again infinitely for the only child.

Warning

Still very experimental!

all_deltas()[source]

Generator that yields functions transforming state to another state.

• It is memory efficient as it is a generator.

• Do not convert that to a list or it might use all your system memory: each returned value is a function with code and variables inside!

__module__ = 'complete_tree_exploration_for_MP_bandits'
complete_tree_exploration_for_MP_bandits.main(depth=1, players=None, update_memories=None, mus=None, M=2, K=2, S=None, Stilde=None, N=None, Ntilde=None, find_only_N=None)[source]

Compute all the transitions, and print them.

complete_tree_exploration_for_MP_bandits.test(depth=1, M=2, K=2, S=None, Stilde=None, N=None, Ntilde=None, mus=None, debug=True, all_players=None, all_update_memories=None, find_only_N=None)[source]

Test the main exploration function for various all_players.