On the complexity of graphs (networks) by information content, and conditional (mutual) information given other graphs 

Lloyd Allison,
www.allisons.org/ll/MML/, May 2014,


To encode a graph, one encodes some representation of the graph, notably the adjacency matrix or the adjacency lists representation. Since every non-redundant representation contains the same information, that is, the graph, an optimal encoding of any representation must give the same minimum message length (MML) as any other. However, one representation might be easier (more convenient) to compress than another.
A representation may contain more information than
just the graph; in particular
it may imply a certain distinguishable ordering of the Vertices.
There are V! possible orderings of the Vertices,
although not all are necessarily distinguishable.
A Vertex ordering (numbering) is arbitrary:
changing it does not change the graph,
so such a representation contains
up to log_2(V!) bits of extra, non-graph information,
less when some orderings are indistinguishable
because of the graph's automorphisms.
An adjacency matrix representation of a graph implies
a particular ordering of the Vertices.
Any distinguishable ordering could be used,
each corresponding to a different adjacency matrix.
Alternatively, some canonical
ordering [various],
and adjacency matrix,
could be used, for example,
the one with its rows in lexicographically non-descending order.
An optimal encoding of such a canonical matrix must somehow take advantage of that property.
Alternatively, to calculate the information content of the graph,
encode any one adjacency matrix and
subtract the information in the arbitrary choice of Vertex ordering,
that is, log_2 of the number of distinguishable orderings
(V!/|A|, where A is the graph's automorphism group).
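As a small illustration (a sketch, not the article's code), the subtracted quantity, the information in an arbitrary Vertex ordering, is log_2(V!/|A|) bits, where A is the graph's automorphism group and is here assumed to be supplied:

```haskell
-- information, in bits, carried by an arbitrary Vertex ordering of a graph
-- with V Vertices and an automorphism group of size autG (assumed given)
orderingInfo :: Int -> Int -> Double
orderingInfo v autG =
  logBase 2 (fromIntegral (product [1 .. toInteger v]))  -- log2(V!)
    - logBase 2 (fromIntegral autG)                      -- less log2|A|
```

For example, for the cycle C4 of the examples at the end, orderingInfo 4 8 is log_2 3, about 1.58 bits, and for K4, orderingInfo 4 24 is zero.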
An adjacency matrix of a graph can be trivially encoded using 1 bit per possible Edge. This is only optimal if half of the possible Edges exist, and the existence of an Edge is independent of other Edges, and of labels if present.
When the Edge density is not known a priori, the very least that should be done is to model the entries of the adjacency matrix by a binomial distribution. This is optimal if the existence of an Edge is independent of other Edges. The hypothesis has one parameter, the expected Edge density, which is the same in all parts of the graph. This does not mean that all Vertices have the same degree. Note, subtly different is the hypothesis that the Edge density is homogeneous (and we do not care what the density is), which can be realised by using the adaptive coding of binomial data and which saves a fraction of a bit in total.
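The adaptive code for binomial data can be sketched as follows (an illustration, not the article's code): however the n binary entries are ordered, k of them being ones, the total length is log_2(n+1) + log_2 C(n,k) bits.

```haskell
log2 :: Double -> Double
log2 = logBase 2

-- log2 of (n choose k), via sums of logs (avoids big intermediate integers)
log2choose :: Int -> Int -> Double
log2choose n k = lf n - lf k - lf (n - k)
  where lf m = sum [log2 (fromIntegral i) | i <- [2 .. m]]

-- message length, in bits, of n binary entries containing k ones,
-- under the adaptive binomial code
adaptiveLen :: Int -> Int -> Double
adaptiveLen n k = log2 (fromIntegral (n + 1)) + log2choose n k
```

The incremental view, predicting the next entry to be a one with probability (ones so far + 1)/(entries so far + 2), gives the same total, whatever the order of the entries.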
For a directed graph, if it is believed that the degrees of Vertices may differ and are independent, and the existence of Edges are independent, an appropriate model is to use one binomial distribution per row (or column) of the adjacency matrix. The model has one parameter per Vertex, being the out (or in) Edge density of the Vertex. (Many simple variations are possible, for example, a model could have two density parameters, for "low" and "high" degree vertices, plus a binomial distribution over into which of these "classes" Vertices fall.)
The situation is more complicated for an undirected graph where an adjacency matrix is necessarily symmetric (equivalently upper or lower triangular). The existence of an undirected Edge (u,v) depends on the neighbourliness of both u and v. These cannot be independent: consider "u is adjacent to all, and v is adjacent to none." It is tempting to average the contributions of u and v.
An adjacency-lists representation of a graph
contains the same information as an adjacency matrix so,
if optimally encoded, they must have the same length.
It takes log_2(V) bits to specify a given Vertex
in some ordering.
The adjacency list, L, of a given Vertex does not contain that Vertex and
contains no Vertex twice,
so it might as well be sorted ascending, say
(otherwise it contains an extra log_2(|L|!) bits
of non-graph information).
The information content of L is then that of stating its length,
d say, and which one of the C(V-1, d) possible sorted lists of d Vertices it is.
For comparison,
the information in one row of the adjacency matrix (see above),
using the adaptive coding for a binomial distribution and
assuming a uniform prior on the number of 1s, is [WB69]
log_2(V) + log_2 C(V-1, d) bits, for a row of V-1 entries containing d 1s.
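The two costs, sorted adjacency list versus adaptive-coded matrix row, can be compared numerically; a sketch with assumed helper names, for a Vertex with d neighbours out of the other V-1:

```haskell
log2 :: Double -> Double
log2 = logBase 2

-- log2 of (n choose k), via sums of logs
log2choose :: Int -> Int -> Double
log2choose n k = lf n - lf k - lf (n - k)
  where lf m = sum [log2 (fromIntegral i) | i <- [2 .. m]]

-- sorted adjacency list: state d (uniform over 0..V-1), then the subset
listBits :: Int -> Int -> Double
listBits v d = log2 (fromIntegral v) + log2choose (v - 1) d

-- one matrix row of V-1 entries, adaptive binomial code
rowBits :: Int -> Int -> Double
rowBits v d = log2 (fromIntegral ((v - 1) + 1)) + log2choose (v - 1) d
```

For all V and d the two agree, as they must for optimal encodings of the same information: stating d uniformly costs log_2(V) bits, the same as the adaptive code's log_2(n+1) term with n = V-1.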
The above examples rely on global properties of a graph and do not attempt to take advantage of any kind of repeated pattern.
More interesting are attempts to base graph compression on recognising patterns (motifs) that are repeated, possibly approximately, that is, with variations. This is particularly so where a graph has semantics, where its structure carries meaning.
The idea is to extend the Lempel-Ziv style of sequence compression to graphs: traverse the graph in some arbitrary order. At each step in the traversal there is the next element, the next unknown: a Vertex or an Edge. Seek scored matches to the recently traversed, and hence known, context in the common knowledge, on the basis that what happened "then" might happen "now". Form a probability distribution over the possible values for the next element (Vertex (label, arity), Edge (label, looping)) using the scored (weighted) matches. The common knowledge consists of zero or more given graphs. (For simplicity, the prototype does not search for matches to the context within the traversed part of the graph being compressed, but this is clearly possible in principle.)
The method described below calculates the information content of an undirected graph. It is clear that, in principle, a "similar method" can be devised for directed graphs. (The probability distribution calculated at each step could be used, with an arithmetic compressor, say, to perform actual data compression.)
The prototype does not attempt to "recover" the extra information from the arbitrary Vertex ordering (see automorphism group earlier).
It is sufficient to consider a single connected component; a disconnected graph can be treated one component at a time, including earlier components in the common knowledge, if desired.
-- e.g., ...
data Entity = Utility | House deriving (Eq, Enum, Bounded, Show, Read)
data Connection = Elec | Gas | H2O deriving (Eq, Enum, Bounded, Show, Read)

k33 = v00
 where -- complete, bipartite, labelled
  v00 = Vertex Utility 0 [Edge Elec v00 v10, Edge Elec v00 v11, Edge Elec v00 v12]
  v01 = Vertex Utility 1 [Edge Gas v01 v10, Edge Gas v01 v11, Edge Gas v01 v12]
  v02 = Vertex Utility 2 [Edge H2O v02 v10, Edge H2O v02 v11, Edge H2O v02 v12]
  v10 = Vertex House 3 [Edge Elec v10 v00, Edge Gas v10 v01, Edge H2O v10 v02]
  v11 = Vertex House 4 [Edge Elec v11 v00, Edge Gas v11 v01, Edge H2O v11 v02]
  v12 = Vertex House 5 [Edge Elec v12 v00, Edge Gas v12 v01, Edge H2O v12 v02]
An undirected graph is represented by an adjacency-lists data structure; if Vertices u and v are adjacent, the data structure links u to v and also links v to u.
The information-content calculations rely on a graph traversal. The traversal arrives at a Vertex via an incoming Edge and follows an Edge out from a Vertex, so the immediate context of an Edge is a Vertex, and the immediate context of a Vertex is an Edge (except for the initial Vertex, which has the empty context).
traverse_undirected fv fe v = ...

During traversal, a Vertex has a 'status', one of
UNVISITED | VISITING | VISITED
An Edge involving a VISITING Vertex is either 'open' (untraversed) or 'closed' (traversed). Note, an Edge is necessarily closed if both of its Vertices are VISITED. (Nothing at all is known of an Edge both of whose Vertices are UNVISITED.)
The traversal maintains a list of VISITING Vertices and
a list of VISITED Vertices,
which are passed to fv and fe;
fv is also given the optional (reversed) incoming edge to the Vertex.
The VISITING list acts as a stack during traversal and
when a Vertex is popped it becomes VISITED.
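The traversal just described might be realised as follows; the article's own traverse_undirected is elided, so this depth-first version, and its handling of parallel Edges, is an assumption:

```haskell
-- A runnable sketch of the traversal; details are assumptions.
data Entity = Utility | House deriving (Eq, Show)
data Connection = Elec | Gas | H2O deriving (Eq, Show)
data Vertex = Vertex Entity Int [Edge]
data Edge = Edge Connection Vertex Vertex   -- label, from-Vertex, to-Vertex

vid :: Vertex -> Int
vid (Vertex _ n _) = n

-- fv is applied to each Vertex as it becomes VISITING, fe to each Edge as it
-- is closed; both are given the current VISITING and VISITED lists, and fv
-- also the optional reversed incoming Edge ([] for the initial Vertex).
traverse_undirected :: ([Vertex] -> [Vertex] -> [Edge] -> Vertex -> a)
                    -> ([Vertex] -> [Vertex] -> Edge   -> a)
                    -> Vertex -> [a]
traverse_undirected fv fe root = snd (visit [] [] [] root)
 where
  among vs w = vid w `elem` map vid vs
  visit visiting visited back_e v@(Vertex _ _ es) =
    let vtg = v : visiting                        -- v becomes VISITING
        r0  = fv visiting visited back_e v
        revIncoming (Edge _ _ w) = case back_e of
          [Edge _ _ p] -> vid w == vid p          -- sketch: assumes no parallel Edges
          _            -> False
        loop vtd acc [] = (vtd, acc)
        loop vtd acc (e@(Edge l _ w) : rest)
          | revIncoming e = loop vtd acc rest     -- closed from the far end
          | among vtd w   = loop vtd acc rest     -- likewise
          | among vtg w   = loop vtd (acc ++ [fe vtg vtd e]) rest
                                                  -- back Edge: close it
          | otherwise     =                       -- tree Edge: close it, recurse
              let (vtd', sub) = visit vtg vtd [Edge l w v] w
              in loop vtd' (acc ++ fe vtg vtd e : sub) rest
        (vtdFinal, rs) = loop visited [] es
    in (v : vtdFinal, r0 : rs)                    -- pop v: it becomes VISITED
```

Applied to a small cyclic graph, it reports each Vertex, and closes each Edge, exactly once.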
predict vertexLabel2maxDegree v1 v2s =
 let
  unwrap (Left (Vertex vLabel _ ves, scored)) = ...
  unwrap (Right (Edge eLabel _ v2, loop_candidates, scored)) = ...
 in map unwrap (traverse_and_match v1 v2s)
For the graph accessible from Vertex v1, calculate the negative log probability of each Vertex and each Edge as it is encountered.
traverse_and_match v1 v2s =
 let given_Vs = concat (map vertex_list v2s)
 in traverse_undirected (v_matches given_Vs) (e_matches given_Vs) v1
'traverse_undirected' (see earlier) is a general purpose traversal function, taking a function to apply to each Vertex, a function to apply to each Edge, and the "root" of a graph.
v_matches v2s visiting visited back_e v1 =
 -- Return scored matches to (v1's) context from amongst the v2s.
 -- v1 is as yet unknown during a traversal; we are trying to predict it.
 -- back_e is our "context", [e1], [v1 -e1-> prev_v], unless it is [].
 ...
   v20 @ (Vertex _ _ v20es) <- v2s, e2 <- v20es,
   let [e1] = back_e
       (_, score) = match_Edge ... ... ... e1 e2,
 ...
During traversal one arrives at a Vertex by an Edge, back_e (reversed), the most recent part of the 'context'. (The initial Vertex is an exception, having no context, [].) Any Vertex, v2, in the given v2s is a potential prediction for v1, with a score depending on how much graph around v2 matches the context.
And an Edge has a Vertex for immediate context:

e_matches v2s visiting visited (e1 @ (Edge e1l v10 _)) =
 -- Return scored matches to (e1 and its) context from amongst the v2s' Edges.
 -- e1 (v10 -e1-> v11) and v11 are as yet unknown during the traversal, except
 -- that v10 is known "context". We are trying to predict e1 including whether
 -- it closes a loop or not; we cannot and do not use actual knowledge of v11.
 ...
   v20 @ (Vertex _ _ v20es) <- v2s, e2 @ (Edge _ _ v21) <- v20es,
   match_Vertex ... ... ... v10 [e1] v20 [e2]
 ...
Similarly, during traversal one arrives at an Edge, e1, from a Vertex, v10, the most recent part of the Edge's "context". Any Edge e2, in (the graphs accessible from) v2s is a potential prediction for e1, with a score depending on how much graph around e2 matches the context.
Note that UNVISITED Vertices and open Edges of the graph being traversed are as yet unknown and cannot be used in matching and scoring, although the existence of an open Edge (but nothing more) is known.
match_Vertex pv pe corr (v1 @ (Vertex _ _ v1e)) back_e1 (v2 @ (Vertex _ _ v2e)) back_e2 = ...
Calculate how much of the graph rooted at v1 matches the graph rooted at v2. This could be done in a great many ways depending on the application. At the simplest, v1 and v2 must have the same label, as a start. Beyond that, v1's Edges (and beyond) and some permutation of v2's Edges (and beyond) may match.
match_Edge pv pe corr (e1 @ (Edge _ _ v11)) (e2 @ (Edge _ _ v21)) = ...
At the simplest, Edges e1 and e2 must have the same label, as a start. Beyond that, their "to" Vertices (and beyond) may match.
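In that spirit, a deliberately tiny sketch of the flavour of the matching (simplified types; the parameters pv, pe and corr, and the permutation matching of the article's functions, are all omitted here): labels must be equal to match at all, each matched element scores one, and recursion is cut off at a fixed depth because the structures are cyclic.

```haskell
data V = V String [E]   -- hypothetical, simplified Vertex: label and Edges
data E = E String V     -- hypothetical, simplified Edge: label and "to" Vertex

-- score how much of the graph around the first argument matches the second,
-- exploring to a fixed depth d
matchV :: Int -> V -> V -> Double
matchV 0 _ _ = 0
matchV d (V l1 es1) (V l2 es2)
  | l1 /= l2  = 0
  | otherwise = 1 + sum [ maximum (0 : [matchE (d - 1) e1 e2 | e2 <- es2])
                        | e1 <- es1 ]   -- greedily best match for each Edge

matchE :: Int -> E -> E -> Double
matchE d (E l1 v1) (E l2 v2)
  | l1 /= l2  = 0
  | otherwise = 1 + matchV d v1 v2
```

For instance, a carbonyl group matched against itself scores 3 (the Carbon, its Edge and the Oxygen beyond), but only 2 against a C=N group, because matching stops at the differing atom.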
Note that an open Edge can match any Edge in the background graphs, but it contributes zero to the score, and matching does not continue beyond it.
near_k33 = v00
 where
  v00 = Vertex Utility 0 [Edge Elec v00 v10, Edge Elec v00 v11, Edge Elec v00 v12]
  v01 = Vertex Utility 1 [Edge Gas v01 v10, Edge Gas v01 v12, Edge Gas v01 v13]
  v02 = Vertex Utility 2 [Edge H2O v02 v10, Edge H2O v02 v11, Edge H2O v02 v12, Edge H2O v02 v13]
  v10 = Vertex House 3 [Edge Elec v10 v00, Edge Gas v10 v01, Edge H2O v10 v02]
  v11 = Vertex House 4 [Edge Elec v11 v00, Edge H2O v11 v02]
  v12 = Vertex House 5 [Edge Elec v12 v00, Edge Gas v12 v01, Edge H2O v12 v02]
  v13 = Vertex House 6 [Edge Gas v13 v01, Edge H2O v13 v02]
For present purposes, a chemical compound is a graph where each Vertex is labelled with a chemical element, and each Edge is labelled with a bond. SMILES [Wei88] is a language for defining chemical compounds. Some functions were written to manipulate SMILES:
[Renderings of viagra, cialis, valium and xanax. Carbon labels are generally omitted in renderings.]

Chemical compounds are of bounded degree. Most small compounds are planar but exceptions are conceivable. Hydrogen atoms can be specified in SMILES but they are usually omitted and, if so, they, and the bonds to them, are left out of the resulting graph.
The information-content routines were "told" that the graphs were of bounded degree, and were given the maximum degree of each element. Apart from that, they were not given any hard-wired chemical information. It is of course possible to include example graphs as background knowledge.
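For illustration only, the supplied maximum degrees might resemble the standard valences; the article's actual table is not shown, so these numbers are an assumption:

```haskell
-- hypothetical example of per-element maximum degrees (standard valences);
-- not the article's actual table
maxDegree :: String -> Int
maxDegree "C" = 4
maxDegree "N" = 3
maxDegree "O" = 2
maxDegree "S" = 6   -- sulphur can exceed 2, counting bond multiplicity
maxDegree "H" = 1
maxDegree _   = 8   -- fallback assumption for other elements
```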
The following table shows the results of experiments on viagra, cialis, valium and xanax [daylight.com], singly and in various combinations.

(The diagonal elements are small, but not tiny, because there is no great prior expectation of XX.) 
(Note that viagra and cialis are less helpful to valium than cialis alone.) 
degree: 

(in?)dependence of: Vertex label, Vertex degree, Edge label, cycle length
label matching: equal or "close" (perhaps squared difference for continuous?)
subgraph matching: equal or approximate (even varieties of approximate tree matching are hard)
Examples of graphs, automorphism groups and distinguishable Vertex orderings.
K4:  A = 4!
     all 4! Vertex orderings are indistinguishable
     (ditto for the completely disconnected graph).

C4:  A = 8 (4 rotations × reflection),
     so 4!/8 = 3 distinguishable Vertex orderings

     12   12   13
     43   34   42

     and their implied adjacency matrices

     _101   _110   _011
     1_10   1_01   0_11
     01_1   10_1   11_0
     101_   011_   110_

G (drawn with two diagonal Edges):
     A = 4 (reflection × reflection),
     so 4!/4 = 6 distinguishable Vertex orderings

     12   12   13   21   21   31
     43   34   42   43   34   24

G = WB
    BW   also has the same A = 4.
