Partitioning Signed Graphs

Download.

See: Doreian P., Mrvar A.: *A partitioning approach to structural balance.*
Social Networks, **18**(1996)2, 149-168.

Balanced and clusterable signed graphs The problem of balance can be formulated in the following way: Is it possible to cluster vertices of a signed graph (graph with positive and negative lines) into two or more groups, so that every positive relation occurs between vertices in the same group, and every negative relation occurs between vertices in different groups? A signed graph is called balanced if all vertices can be clustered into exactly two groups. If all vertices can be clustered into more than two groups the graph is called clusterable. A path in a signed graph is called positive if it has an even number of negative edges and is called negative otherwise. Theorems: 1. A signed graph is balanced iff for every two vertices in graph all paths joining them have the same sign. 2. A signed graph is balanced iff every closed semiwalk is positive. 3. A signed graph is clusterable iff it contains no closed semiwalk 4. with exactly one negative arc. Example: Consider a group of people again. Pairs of individuals are friends or enemies. Suppose anyone would tell the rumour which has only two basic forms, one true and one false. Everyone in this system will tell the rumour to a friend in the same form he had received it, but would change the form if he were to pass the rumour to his enemy. If the system is balanced, each person will hear only one version of the rumour. Also the person who started the rumour will hear it returned to him in the same form as he originally knew it. Using these theorems an algorithm can be constructed to verify if a signed graph is clusterable: we must just examine signs of all paths and signs of all closed semiwalks in that graph. But we want more. We want to find an optimal solution (clusters in a signed graph) if it exists. If there is no optimal solution we want to find a solution which has the least errors. What are actually errors of a given solution? An error is every negative link between two vertices in the same cluster and every positive link between two vertices which belong to different clusters. These ideas can be used in an optimizational approach to the problem of balance of signed graphs. Optimizational approach to the problem of balanced and clusterable graphs Local optimization Local optimization is a procedure, which enables us to find a minimum of a function. The only difficulty is that we cannot distinguish between local and global minimum. But if we repeat this procedure several times we can come very close to global minimum or we can even find the global minimum. We start a local optimization procedure with a random initial feasible solution. Then we examine all neighbours of a given solution. If we find a solution with smaller value of criterion function we move to that solution. The process is finished when there is no better solution in a neighbourhood of a given solution. This solution is then called a local minimum. Solving the problem of balance using local optimization Set of all feasible solutions is the set of all clusterings with a given number of clusters. Criterion function is number of errors -- number of negative links between two vertices in the same cluster plus number of positive links between two vertices which belong to different clusters. Neighbours of a given clustering are clusterings which can be obtained from a given clustering by: moving one vertex from one to another cluster interchanging two vertices from different clusters Input to program: Words that begin with * are reserved -- they must be present in input file. % means comment: Everything following that sign will be ignored by the program. Vertices are defined using *Vertex statement. First, number of vertices is defined, then all vertices are described in the following way: vertex number, vertex name, (x, y) coordinates in the plane (x and y are numbers between 0 and 1) and radium of a circle that represents the vertex. Then directed and undirected links are defined using *Arcs and *Edges statement. *Arcs 10 11 -1 for example stands for: Vertex 10 is connected with vertex 11 with negative arc. (1 means positive link, --1 means negative link). Clustering which is obtained using the program is appended at the end of the file as properties *Properties of this graph. Example: *Network Sample6 %Network from Roberts p. 76 a *Vertices 11 1 a 0.5667 0.1250 15 2 b 0.6833 0.2500 15 3 c 0.5167 0.5000 15 4 d 0.6833 0.5000 15 5 e 0.6833 0.6750 15 6 f 0.5167 0.7000 15 7 g 0.3167 0.5000 15 8 h 0.3167 0.7000 15 9 i 0.3167 0.8750 15 10 j 0.1333 0.8750 15 11 k 0.1333 0.7000 15 % *Arcs 10 11 -1 10 9 -1 9 11 1 8 9 -1 11 8 -1 8 7 -1 6 8 -1 7 3 -1 3 6 -1 5 3 1 4 5 -1 3 4 -1 3 1 1 1 2 1 2 3 1 *Edges % *Properties { { a, b, c, e, h, j}, { d, f, g, i, k} } 0 { { a, b, c, e, h}, { d, g, i, k}, { f, j} } 0 The difference between the first and the second type of input file is only in representation of links which can be also defined using a matrix form *Matrix. Example: *Network Sample2 % Batagelj page 6 *Vertices 9 1 1 0.1833 0.6250 25 2 2 0.3500 0.8750 25 3 3 0.5167 0.8750 25 4 4 0.6833 0.8750 25 5 5 0.8500 0.6250 25 6 6 0.8500 0.2500 25 7 7 0.6833 0.1250 25 8 8 0.3500 0.1250 25 9 9 0.1833 0.2500 25 *Matrix 0 -1 0 0 1 0 0 0 1 -1 0 1 0 0 0 -1 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 -1 0 -1 0 0 1 0 0 -1 0 1 0 0 0 0 0 0 0 1 0 -1 0 1 0 -1 0 -1 0 -1 0 1 0 0 0 0 0 0 0 1 0 -1 1 0 0 0 0 1 0 -1 0 % *Properties { { 1, 5, 6, 9}, { 2, 3, 4}, { 7, 8} } 0 This is the shortest representation of a signed graph. Numbers in each line mean: Vertex number, x and y coordinates of vertex (between 0 and 1) and list of all neighbours of the given vertex. If one connection is negative then the neighbour has a negative sign. Example: 1 0.2500 0.125 2 6 -7 2 0.0833 0.375 1 3 -8 3 0.2500 0.625 2 4 -9 4 0.4167 0.625 3 5 -10 5 0.5833 0.375 4 6 -11 6 0.4167 0.125 1 5 -12 7 0.2500 0.250 9 11 -1 8 0.1667 0.375 10 12 -2 9 0.2500 0.500 7 11 -3 10 0.4167 0.500 8 12 -4 11 0.5000 0.375 7 9 -5 12 0.4167 0.250 8 10 -6 How to use program CLUSTER Program Cluster is implementation of local optimization approach to the problem of balanced and clusterable signed graphs. It is written in Turbo Pascal 6.0. Here are some basic instructions how to use the program. To run the program you should type for example: cluster sample6 2 10 0.5 g a b c d e f Six parameters (a, b, c, d, e, f) are: a Name of the program. b Name of the input file with graph description (one of three possible types as defined above). The default extension is .net. If you have some other extension you should type it (for example Network.dat). Instead of using only sign (+ or --) to represent relationship, also integer values can be used to express the strength of relationship (friendship for example). Larger values represent stronger relationship. These values are also used when criterion function is computed. c Number of clusters you want to divide your graph to. (for example 2 or 3 or 4, ...) d Number of repetitions of local optimization procedure -- (from randomly selected initial partition to local minimum). The best solution from all repetitions is selected. For more complicated graphs more repetitions are needed to find a good solution. If the optimal solution (solution without errors) is found before the last repetition is finished the program is stopped. e Factor alpha that tells us the ratio between importance of negative and positive errors: Total.error=alpha*(neg.errors) + (1-alpha)*(pos.errors) 0 < alpha < 0.5 positive errors are more important than negative alpha=0.5 positive and negative errors are equally important 0.5 < alpha < 1 negative errors are more important than positive alpha > 1 all links are considered as positive alpha < 0 all links are considered as negative f Graphical display graphical mode (VGA is needed). If the last parameter is not 'g' non-graphical representation will be used. g using labels/numbers The local optimization procedure starts with randomly selected initial partition. But you can also suggest the program which initial partition should be used (for example you get this partition using some other approach and you want to optimize it). You can do that by using another input file with the same name as file with description of the graph and extension .inp. For example if you run the program as written above (name of the graph is sample6.net) and the file sample6.inp is found on the directory the initial partition will be that one which is recommended in file sample6.inp. 1 2 3 5 8 10 4 6 7 9 11 So the number of lines must be equal to the number of clusters in which we want to cluster our graph (in this example 2). Values in the same line correspond to the vertices' numbers that belong to the same initial cluster. If graphical representation is used the resulting clusters are displayed so that vertices that correspond to the same cluster are coloured with the same colour. If the solution is not optimal, the wrong relations are displayed with thick lines. If type 1 or type 2 input files are used the resulting clustering is written as properties in the same (input) file together with number of errors and value of alpha. The result is written only in case that this solution is found the first time (it is not written yet as properties). If type 3 input file is used the result is written to the file with the same name as input file and extension .out.