Networks / BALANCE
Partitioning Signed Graphs


Download.

See: Doreian P., Mrvar A.: A partitioning approach to structural balance. Social Networks, 18(1996)2, 149-168.


Balanced and clusterable signed graphs

The problem of balance can be formulated in the following way:
Is it possible to cluster vertices of a signed graph 
(graph with positive and negative lines) into two or more groups,
so that every positive relation occurs between vertices in the same
group, and every negative relation occurs between vertices in
different groups?
A signed graph is called balanced if all vertices can be clustered 
into exactly two groups. If all vertices can be clustered into more than 
two groups the graph is called clusterable. 

A path in a signed graph is called positive if it has an even number of 
negative edges and is called negative otherwise. 

Theorems:
 
1.  A signed graph is balanced iff for every two vertices in graph all paths    
    joining them have the same sign.
 
2.  A signed graph is balanced iff every closed semiwalk is positive.

3.  A signed graph is clusterable iff it contains no closed semiwalk
4.  with exactly one negative arc.


Example:
Consider a group of people again. Pairs of individuals are friends 
or enemies. Suppose anyone would tell the rumour which has only two basic
forms, one true and one false. Everyone in this system will tell the 
rumour to a friend in the same form he had received it, but would change
the form if he were to pass the rumour to his enemy. If the system is balanced, 
each person will hear only one version of the rumour. Also the person who 
started the rumour will hear it returned to him in the same form as he 
originally knew it.

Using these theorems an algorithm can be constructed to verify if a
signed graph is clusterable: we must just examine signs of all paths
and signs of all closed semiwalks in that graph.

But we want more. We want to find an optimal solution (clusters
in a signed graph) if it exists. If there is no optimal solution 
we want to find a solution which has the least 
errors. What are actually errors of a given solution?
An error is every negative link between two vertices in the same
cluster and every positive link between two vertices which belong to
different clusters. These ideas can be used in an optimizational approach
to the problem of balance of signed graphs.

Optimizational approach to the problem of balanced and clusterable graphs

Local optimization

Local optimization is a procedure, which enables us to find 
a minimum of a function. The only difficulty is that we cannot distinguish
between local and global minimum. But if we repeat this procedure several 
times we can come very close to global minimum or we can even find the 
global minimum. 

We start a local optimization procedure with a random initial feasible 
solution. Then we examine all neighbours of a given solution. If we find 
a solution with smaller value of criterion function we move to that solution.
The process is finished when there is no better solution in a neighbourhood 
of a given solution. This solution is then called a local minimum.

Solving the problem of balance using local optimization

Set of all feasible solutions is the set of all clusterings with a given 
number of clusters. 
Criterion function is number of errors -- number of negative 
links between two vertices in the same cluster plus number of positive 
links between two vertices which belong to different clusters.
Neighbours of a given clustering are clusterings which
can be obtained from a given clustering by:
        moving one vertex from one to another cluster
        interchanging two vertices from different clusters


Input to program:

Words that begin with * are reserved -- they must be present in input
file. % means comment: Everything following that sign will be ignored 
by the program.
Vertices are defined using *Vertex statement.
First, number of vertices is defined, then all vertices are described in 
the following way: vertex number, vertex name, (x, y) coordinates in the 
plane (x and y are numbers between  0 and 1) and radium of a circle that 
represents the vertex.
Then directed and undirected links are defined using *Arcs and
*Edges statement.  *Arcs 10 11 -1 for example stands for:
Vertex 10 is connected with vertex 11 with negative arc. 
(1 means positive link, --1 means negative link).
Clustering which is obtained using the program is appended
at the end of the file as properties *Properties of this graph.
Example: 

*Network Sample6
%Network from Roberts p. 76 a
*Vertices 11
1  a 0.5667 0.1250 15
2  b 0.6833 0.2500 15
3  c 0.5167 0.5000 15
4  d 0.6833 0.5000 15
5  e 0.6833 0.6750 15
6  f 0.5167 0.7000 15
7  g 0.3167 0.5000 15
8  h 0.3167 0.7000 15
9  i 0.3167 0.8750 15
10 j 0.1333 0.8750 15
11 k 0.1333 0.7000 15
%
*Arcs
10 11 -1
10  9 -1
9  11  1
8   9 -1
11  8 -1
8   7 -1
6   8 -1
7   3 -1
3   6 -1
5   3  1
4   5 -1
3   4 -1
3   1  1
1   2  1
2   3  1
*Edges
%
*Properties
{ { a, b, c, e, h, j},  { d, f, g, i, k} }  0
{ { a, b, c, e, h},  { d, g, i, k},  { f, j} }  0


The difference between the first and the second type of input file is
only in representation of links which can be also defined using a matrix form
*Matrix.

Example: 

*Network Sample2
% Batagelj page 6
*Vertices 9
1 1 0.1833 0.6250 25
2 2 0.3500 0.8750 25
3 3 0.5167 0.8750 25
4 4 0.6833 0.8750 25
5 5 0.8500 0.6250 25
6 6 0.8500 0.2500 25
7 7 0.6833 0.1250 25
8 8 0.3500 0.1250 25
9 9 0.1833 0.2500 25
*Matrix
 0 -1  0  0  1  0  0  0  1
-1  0  1  0  0  0 -1  0  0
 0  1  0  1  0  0  0  0  0
 0  0  1  0 -1  0 -1  0  0
 1  0  0 -1  0  1  0  0  0
 0  0  0  0  1  0 -1  0  1
 0 -1  0 -1  0 -1  0  1  0
 0  0  0  0  0  0  1  0 -1
 1  0  0  0  0  1  0 -1  0
%
*Properties
{ { 1, 5, 6, 9},  { 2, 3, 4},  { 7, 8} }  0

This is the shortest representation of a signed graph.
Numbers in each line mean:
Vertex number, x and y coordinates of vertex (between 0 and 1) and
list of all neighbours of the given vertex. If one connection is negative then
the neighbour has a negative sign. 

Example: 
1 0.2500  0.125  2  6  -7 
2 0.0833  0.375  1  3  -8
3 0.2500  0.625  2  4  -9
4 0.4167  0.625  3  5 -10
5 0.5833  0.375  4  6 -11
6 0.4167  0.125  1  5 -12
7 0.2500  0.250  9 11  -1
8 0.1667  0.375 10 12  -2
9 0.2500  0.500  7 11  -3
10 0.4167 0.500  8 12  -4
11 0.5000 0.375  7  9  -5
12 0.4167 0.250  8 10  -6

How to use program CLUSTER

Program Cluster is implementation of local optimization approach to 
the problem of balanced and clusterable signed graphs. It is written
in Turbo Pascal 6.0. Here are some basic instructions how to use the program. 
To run the program you should type for example:
cluster sample6 2 10 0.5 g
   a       b    c  d  e  f
Six parameters (a, b, c, d, e, f) are: 

a Name of the program.
b Name of the input file with graph description (one of three
           possible types as defined above). The default extension is .net.
           If you have some other extension you should type it (for example 
           Network.dat). Instead of using only sign (+ or --) to represent
           relationship, also integer values can be used to express the
           strength of relationship (friendship for example). Larger values
           represent stronger relationship. These values are also used
           when criterion function is computed.

c Number of clusters you want to divide your graph to.
           (for example 2 or 3 or 4, ...)
d Number of repetitions of local optimization procedure -- 
           (from randomly selected initial partition to local minimum).
           The best solution from all repetitions is selected.
           For more complicated graphs more repetitions are needed to find 
           a good solution. If the optimal solution (solution without 
           errors) is found before the last repetition is finished the 
           program is stopped.
e Factor alpha that tells us the ratio between importance of
          negative and positive errors:
          Total.error=alpha*(neg.errors) + (1-alpha)*(pos.errors) 

          0 < alpha < 0.5 positive errors are more important than negative
          alpha=0.5 positive and negative errors are equally important
          0.5 < alpha < 1 negative errors are more important than positive
          alpha > 1 all links are considered as positive
          alpha < 0 all links are considered as negative
          
f         Graphical display graphical mode (VGA is needed).
          If the last parameter is not 'g' non-graphical representation
          will be used.

g         using labels/numbers

The local optimization procedure starts with randomly selected initial
partition. But you can also suggest the program which initial partition
should be used (for example you get this partition using some other
approach and you want to optimize it). You can do that by
using another input file with the same name as file with description
of the graph and extension .inp. For example if you run the program
as written above (name of the graph is sample6.net) and the file
sample6.inp is found on the directory the initial partition will
be that one which is recommended in file sample6.inp. 

1 2 3 5 8 10
4 6 7 9 11
So the number of lines must be equal to the number of clusters in which 
we want to cluster our graph (in this example 2). Values in the same line 
correspond to the vertices' numbers that belong to the same initial cluster.

If graphical representation is used the resulting clusters are displayed
so that vertices that correspond to the same cluster are coloured
with the same colour. If the solution is not optimal, the wrong relations
are displayed with thick lines. 
If type 1 or type 2 input files are used the resulting clustering is 
written as properties in the same (input) file together with number of errors
and value of alpha. The result is written only in case that this 
solution is found the first time (it is not written yet as properties).
If type 3 input file is used the result is written to the file with the same 
name as input file and extension .out.