Date: Thu, 30 Sep 1993 11:29 EST
From: PITPAT@vms.cis.pitt.edu
Change 1: some clarifications 10 Oct 1993, Vladimir Batagelj
-----------------------------------------------------------------------------

S T R A N

very preliminary help notes, by Patrick Doreian
*** ------
and some comments, by Vladimir Batagelj
*** ------

*** ------
The latest version of the programs for PC will be available by
anonymous FTP from
   uek.uni-lj.si:pub/vlado   or   ftp.mat.uni-lj.si:pub/datana
*** ------

This disk contains four executable files, CLUSE.INF and two examples,
SHARP and MULTI. SHARP.NET comes from With the Boys, a book on Little
League teams by Alan Fine. MULTI.NET came with MODEL. I have not played
with it, but it is an example of having more than one relation.

I do not know the practical limits with regard to the number of nodes
or the number of relations that can be examined. Experiment, but there
are trade-offs: larger networks with fewer relations versus more
relations for fewer nodes.

*** ------
In the July 1993 version we have:
   max # of units = 20,  max # of networks = 8 ;  ALLREQ, STRUCA, REGULA
   max # of units = 50,  max # of networks = 1 ;  MODEL
*** ------

STRAN is the overall 'complete' framework/program for doing structural
analyses. There is no specific program called STRAN on the diskette,
but the *.EXE files are all part of "STRAN".

The following is a brief description of the partitioning methods via
the direct approach advocated by Doreian, Batagelj and Ferligoj.

The versions on this diskette were created in the beginning of July on
the last days of my stay in Slovenia. As we were revising code on close
to an hourly basis, I am not entirely sure which version I brought
back. (Indeed, the creation date is for the day I left.) There are
differences between the version we used prior to the European Network
Conference in Munich this summer and the one on this disk.
Which is another way of saying that I am not able to duplicate all of
the results I generated while in Slovenia. (There were experiments with
different specifications of error functions.) This makes me a little
nervous and is, perhaps, a caution that I will not always be able to
give an intelligent response to all of the questions that will arise
from using these programs. I am neither the magician nor the witch
doctor.......

To run, you need to have the following files (some of which will get
generated during an analysis):

*** ------
You need only the NET file - all other files are selected/defined
during program execution. The file extensions are 'free', but I
recommend the use of the default extensions (NET, LST, ENV, ANA) or
some other system.
*** ------

1. A network matrix with the extension NET (e.g. SHARP.NET)

2. An environment file with the extension ENV (e.g. SHARP.ENV)
   Note: there is no need to have a tailor-made environment file for
   each analysis, as during a run you can redirect from an environment
   file left over from some other run.

3. An output file that can be specified during a run.
   Note: we tend to have these with a LST extension, but this is not
   required. I frequently use an OUT extension and believe that any
   extension is OK. (Except those reserved for other purposes like
   NET, ANA etc.)

4. A file with an ANA extension. This can be used as a file receiving
   partition information (for an analysis) or as an input file for
   giving specific partitions. (The value of the criterion function can
   be computed for the partition, and this partition can be the start
   point for invoking an optimizing program.)

*** ------
The clustering on the ANA file has the following form

   *CLUSTERING Oct-10-1993
   Sharpstone
   MODEL RANDOM 13 3 0 0 2.0000000
   1 1 2 1 2 2 1 3 3 3 3 3 3

The third item on the third line (13) is the number of units, followed
by the number of clusters (3). The last number on this line is the
value of the criterion function (clustering error).
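For checking a clustering record outside STRAN, a minimal Python sketch
of a reader may help. It is not part of STRAN. It assumes the record
layout suggested by the description (keyword line with the date, a
title line, the size line, then one cluster number per unit); the
names Clustering and parse_clustering are made up for this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Clustering:
    date: str
    title: str
    n_units: int
    n_clusters: int
    criterion: float
    assignment: List[int]


def parse_clustering(text: str) -> Clustering:
    """Read one *CLUSTERING record (layout is an assumption, see above)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    keyword, _, date = lines[0].partition(" ")
    if keyword.upper() != "*CLUSTERING":
        raise ValueError("record must start with *CLUSTERING")
    title = lines[1]
    size = lines[2].split()
    n_units = int(size[2])        # third item: number of units
    n_clusters = int(size[3])     # next item: number of clusters
    criterion = float(size[-1])   # last item: criterion value
    # Remaining lines hold the cluster number of each unit, in order.
    assignment = [int(tok) for ln in lines[3:] for tok in ln.split()]
    if len(assignment) != n_units:
        raise ValueError("expected one cluster number per unit")
    if any(not 1 <= c <= n_clusters for c in assignment):
        raise ValueError("cluster number out of range")
    return Clustering(date.strip(), title, n_units, n_clusters,
                      criterion, assignment)


SAMPLE = """\
*CLUSTERING Oct-10-1993
Sharpstone
MODEL RANDOM 13 3 0 0 2.0000000
1 1 2 1 2 2 1 3 3 3 3 3 3
"""

record = parse_clustering(SAMPLE)
print(record.n_units, record.n_clusters, record.criterion)
```

The two range checks catch the most likely slips when a record is typed
in by hand: a missing unit or a cluster number larger than the declared
number of clusters.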
You can edit clusterings in the ANA file to test clusterings obtained
from other sources/programs. For example

   *CLUSTERING Sep-14-1992
   Sharpstone - alternating clustering
   MODEL MANUAL 13 4 0 0 0
   1 2 3 4 1 2 3 4 1 2 3 4 1
*** ------

5. A CLUSE.INF file that is read during an analysis. It simply gives a
   name for an environment file (which can be overruled.) However,
   during some failing analysis this file can be changed in a way which
   screws up subsequent runs. (To use a technical term.) Make sure it
   does not mutate into a form that does not just state an environment
   file. For example, CLUSE.INF on the disk simply states:

      sharp.env

*** ------
In case of problems, delete the CLUSE.INF and ENV files, and retry
*** ------

On this disk are examples of these files for a specific set of
empirical examples.

Programs:

ALLREQ.EXE
   This will do an exhaustive search of all partitions of a (small)
   network and compute the criterion function for regular equivalence.
   The partitions (or a set of them) are reported in terms of the
   smallest value of the criterion function.

MODEL.EXE
   The 'current' version of the software where all of the generalized
   partition patterns can be specified, in any combination, and then
   can be modelled - as described in the recently circulated manuscript
   (which will appear in JMS.)

STRUCA.EXE
   A program that focuses only on structural equivalence and was (is) a
   precursor to MODEL.

REGULA.EXE
   Similarly, this is an earlier program confined to regular
   equivalence. The criterion function may be 'old' in the sense that
   we used to count the number of not 1-covered rows and columns. We
   now count the number of 0's in not 1-covered rows and columns.

*** ------
ALLREQ, REGULA and STRUCA use the 'old' criterion functions
*** ------

Using ALLREQ

1. The program is invoked by typing ALLREQ. You will be told you are in
   the world of STRAN and you then hit enter.

2. The current ENV file is then listed and you are invited to specify a
   new ENV file.
   If you do not need to do this, hit enter. If you do want/need to
   change this file, type in the new environment file's name. If it is
   there, and is in an appropriate form, it becomes the current file.
   You then hit enter.

3. The screen then reports the current LST file. The interaction
   sequence is the same as in step 2. Note: if the LST file has output
   already, the subsequent output will be appended to it.

4. The screen reports the current NET file. Again, the interaction
   sequence is the same. If the NET file is in error, or cannot be
   found, you get dumped into the welcoming arms of DOS.

5. Next, it is the turn of the ANA file. Same interaction sequence. If
   there are partitions there already, the current run will append
   output to it. If nothing is amiss you then get to the real stuff.

6. Into the real stuff. A heading will be provided and you get the
   opportunity to provide an additional title.

   a. You get the following options:

         0   stop
         1   report the short output
         2   list all

      We strongly recommend using 1 (unless you want to stop).

   b. You are next asked to specify the number of clusters in the
      partition. This can run from 1 to n if there are n nodes. These
      extremes are trivial, so choose numbers from 2 to (n-1). You will
      get on the screen a continuous running count of the partitions
      searched and the lowest values of the criterion function thus far
      encountered. When all partitions with the specified number of
      clusters have been examined, you go back to 6a and can specify
      another number of clusters. Etc. When done, respond with 0 to
      stop.

   c. The output can be viewed in an editor or word processor of your
      choice.

Using MODEL

1. Invoke by typing MODEL

2. You will then enter exactly the interaction sequences described
   above: i.e. set up the ENV, LST, NET, ANA files. With nothing amiss
   you get the same opportunity to add a heading, as before.

3. Now the options for MODEL kick in:

      error type
         1   constant
         2   size

   We recommend 1. [If you go with 2 you are playing in a new playpen.
   And I will say only "beware, there be dragons out there". (With 2
   there is a way of weighting items for calculating the criterion
   function values... I have not learned its details.)]

4. Next, you get minimal dom/fun/reg size. We recommend you choose 2.
   This precludes having a trivial 'block' like 00000100000 from being
   declared as 1-covered. Thus the minimal block size for such a
   declaration is 2 rows (or 2 columns). I guess that you can choose
   higher numbers here but I have not done so....yet.

*** ------
You can choose higher numbers
*** ------

5. Next, you have the option for specifying the weighting/aggregation
   rule for computing ties between blocks:

      options:
         0   without weighting (probably the only one that makes sense
             for binary data)
         1   average (mean)
         2   median

*** ------
Use 1 for at least interval scales and binary data, and 2 for ordinal
scales. You can omit the value matrix by selecting 0.
*** ------

6. You then get a table that has four columns:

      index   weight      status           block type
        0     default 1   on/off/all etc   null
        1        "           "             complete
        2        "           "             row-dominant
        3        "           "             column-dominant
        4        "           "             regular
        5        "           "             row-regular
        6        "           "             column-regular
        7        "           "             row-functional
        8        "           "             column-functional
        9        "           "             density

   The default is to have the weight 1 for each type of pattern. In the
   current version, the default is that the null, complete and regular
   patterns are ON with the rest OFF. (Look at SHARP.LST which is an
   output file for the SHARP.NET example.) This is consistent with
   regular equivalence. However, these can be changed and the next
   interaction sequence takes you into doing that. (I do not know how
   to change the default setting in a way that persists until the next
   completely new analysis run.)

   You will get the options

      -1   stop (the defaults are acceptable)
      10   to change the priority ordering of the patterns. The default
           is to have the priorities as listed in the order above.
           To change, respond by typing 10 and then, in response to
           prompts, give the priority ordering you want by typing the
           patterns in the order you want.
       i   change the ith weight. You enter the index of the item you
           want to change (e.g. 4 to change the status of the regular
           pattern) [You enter the value and not i.] Once in there,
              0   turns the pattern OFF
              1   pattern for diagonal blocks only
              2   pattern for off-diagonal blocks
              3   all blocks to conform to the pattern
           and -1 to terminate setting parameters

*** ------
In the last version of MODEL, MODEL1, there are two densities:
density of diagonal blocks and density of off-diagonal blocks
*** ------

7. If there is an ANA file that contains partitions and you named it as
   the ANA file, then MODEL will read this and take you into an
   examination of the partitions in that ANA file. If you named a
   nonexistent ANA file, you go into using MODEL directly, as it were,
   to return partitions for the network in the NET file. So, if that is
   what you want to do, do not name an extant ANA file. Output in the
   form of a listed set of partitions will go into a new ANA file with
   that name. And if you wanted to examine an extant partition, for
   example one gotten from the indirect approach in UCINET, put it in
   an ANA file, declare it and continue.

   MODEL will read the first partition in the ANA file and will give
   you 3 options:

      1   Optimize from that partition in the ANA file
      2   Skip the optimization (and just use MODEL to compute the
          value of the criterion function.)
      3   Stop.

   If there is more than one partition in the ANA file, MODEL will stop
   and you need to hit ENTER to go to the next partition. Etc.

8. This is the optimize part, either reached directly (step 6) or
   through the use of an ANA file (in step 7.) You get three options:

      0   relocate (move a node from one cluster to another)
      1   relocate and transpose (interchange nodes between two
          clusters.)

   We recommend option 1.
   (If you really are into speed reading you can follow the changes
   that are made.
   Each new random initial partition triggers a high value of the
   criterion function and you see the changes and their impact on the
   criterion function. With that said, your time is better spent doing
   something else as MODEL grinds on.)

9. Choose the number of random starting partitions. Remember that this
   is a local optimization procedure. Choose a high number depending on
   your patience, experience or whether you can go off and do something
   else. I use hundreds of random starts and sometimes more.

   In here you have further options:

      0       save partitions
      1 ask   You are asked if you want to save the specific
              partition. This can get very tedious.
      2 opt   Save optimal partitions
      3 all   Save all partitions

   These options seem useful if you are looking at extant partitions in
   an ANA file. (At the moment I am not sure of the difference between
   0 and 3.)

*** ------
0  DO NOT save partitions to the ANA file

If, after the current run, you select the same number of clusters,
previous results are preserved (new results are merged with them).
Therefore the usual strategy is to try first with some reasonable
number of repetitions (20-100) and then select the 'real' number of
repetitions according to the sample.
*** ------

10. After the repetitions are done you go back to step 8. Enter 0 if
    you want to stop. If you want to explore something else, and do not
    exit, the subsequent output will be appended to the LST file, and
    partitions will be stored in the ANA file.

Using STRUCA

This follows the same form as MODEL, only much simpler. You have to do
all of the initial specification of the ENV, LST, NET and ANA files as
before and you follow the prompts (which are only a small subset of
MODEL's prompts.) The only difference is that you get to specify the
relative importance of errors. Recall, this is structural equivalence,
so the sources of errors are 1's where there should be 0's (i.e. in
what should be a null block) and 0's where there should be 1's (i.e. in
a complete block.)
The diagonal blocks are taken care of automatically. Most runs weight
them equally. For further discussion see Batagelj, Ferligoj and Doreian
in Social Networks (1992, special issue on block models.)

Using REGULA

This is specific to regular equivalence and is simpler to use than
MODEL. (But it may have an old criterion function - not in itself bad.
I have not explored this.) The options are a subset of those for MODEL.
Invoke by typing REGULA and follow the prompts. (See Batagelj, Doreian
and Ferligoj in the Social Networks special issue on block models for a
further discussion.)

INPUT

This is best discussed with an example - SHARP.NET

*NETS SHARP 13 1
Sharpstone Little League team
justin    harry     whit      brian     paul      ian       mike
jim       dan       ray       cliff     mason     roy
*NET SHARP 0 0
Sharpstone
0 1 1 0 0 0 1 0 0 0 0 0 0
1 0 1 0 0 0 1 0 0 0 0 0 0
1 1 0 1 0 0 0 0 0 0 0 0 0
1 1 0 0 1 0 0 0 0 0 0 0 0
1 0 1 0 0 1 0 0 0 0 0 0 0
1 1 1 0 0 0 0 0 0 0 0 0 0
1 1 0 0 0 1 0 0 0 0 0 0 0
1 0 1 1 0 0 0 0 0 0 0 0 0
1 1 1 0 0 0 0 0 0 0 0 0 0
1 1 1 0 0 0 0 0 0 0 0 0 0
1 1 1 0 0 0 0 0 0 0 0 0 0
1 1 0 1 0 0 0 0 0 0 0 0 0
0 0 1 0 1 0 0 1 0 0 0 0 0

1. The first line is a declaration of the dimensions of the problem, in
   this case, 13 nodes in a single network.

2. A header/label line to remind you what you analyzed when you come
   back to the output later.

3. Next come the labels of the nodes in 7A10 format. Use as many lines
   as you need but put no more than 7 labels on a line.

*** ------
exactly 7 labels, except on the last line
*** ------

4. The next line is magic to me.....I just do it.

*** ------
Elements of this line are:
   *NET   keyword
   name   name of the network
   form   0 - binary matrix, 1 - integer matrix,
          2 - real matrix,   3 - graph
   sym    0 - general graph, 1 - symmetric graph
*** ------

5. Next, a heading for the (first and, in this case, the only)
   relation: the content of the ties.

6. The data in adjacency matrix form.
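The six parts of a NET file listed above can be read back with a short
script, which is handy for checking a hand-typed file before giving it
to STRAN. The following Python sketch is not part of STRAN. It assumes
that 7A10 means seven labels per line in fields of exactly 10
characters, and it handles only the matrix forms (form 0, 1 or 2), not
the graph form. The name parse_nets and the toy data are made up.

```python
import math
from typing import Any, Dict, List

LABEL_WIDTH = 10      # "A10": each label sits in a 10-character field
LABELS_PER_LINE = 7   # "7A10": seven labels per line


def parse_nets(text: str) -> Dict[str, Any]:
    """Parse a *NETS file whose networks are given as full matrices."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    head = lines[0].split()
    if head[0].upper() != "*NETS":
        raise ValueError("file must start with *NETS")
    n_units, n_nets = int(head[2]), int(head[3])

    # Labels are fixed-width fields, so a label may contain spaces.
    n_label_lines = math.ceil(n_units / LABELS_PER_LINE)
    labels: List[str] = []
    for ln in lines[2:2 + n_label_lines]:
        for start in range(0, len(ln), LABEL_WIDTH):
            field = ln[start:start + LABEL_WIDTH].strip()
            if field:
                labels.append(field)
    if len(labels) != n_units:
        raise ValueError("number of labels does not match number of units")

    pos = 2 + n_label_lines
    networks = []
    for _ in range(n_nets):
        net_head = lines[pos].split()
        if net_head[0].upper() != "*NET":
            raise ValueError("expected a *NET line")
        form, sym = int(net_head[2]), int(net_head[3])
        if form == 3:
            raise NotImplementedError("graph form is not handled here")
        heading = lines[pos + 1].strip()
        pos += 2
        convert = float if form == 2 else int
        values: list = []
        while len(values) < n_units * n_units:
            values.extend(convert(tok) for tok in lines[pos].split())
            pos += 1
        matrix = [values[r * n_units:(r + 1) * n_units]
                  for r in range(n_units)]
        networks.append({"heading": heading, "form": form,
                         "sym": sym, "matrix": matrix})
    return {"name": head[1], "title": lines[1].strip(),
            "labels": labels, "networks": networks}


# Toy input (made-up data, three units, one binary relation).
SAMPLE = "\n".join([
    "*NETS TOY 3 1",
    "Toy network (made-up data)",
    "".join(f"{lab:<10}" for lab in ["ann", "bob c", "cy"]),
    "*NET TOY 0 0",
    "likes",
    "0 1 0",
    "0 0 1",
    "1 1 0",
])

toy = parse_nets(SAMPLE)
print(toy["labels"])
print(toy["networks"][0]["matrix"])
```

Slicing the label lines by position rather than splitting on blanks is
deliberate: with fixed-width fields, labels that contain a space or
that fill all 10 characters still come out correctly.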
*** ------ In MODEL the network SHARP can be equivalently given also in the form *NET SHARP 3 0 Sharpstone -1 2 3 7 -2 1 3 7 -3 1 2 4 -4 1 2 5 -5 1 3 6 -6 1 2 3 -7 1 2 6 -8 1 3 4 -9 1 2 3 -10 1 2 3 -11 1 2 3 -12 1 2 4 -13 3 5 8 0 *** ------ It is possible to analyze multiple matrices/relations and here is an example with 6 matrices: *** ------ But only with programs ALLREQ, STRUCA and REGULA. Program MODEL analyzes only the first network in NETS. *** ------ *NETS STUDENTS 11 6 STUDENT GOVERNMENT OF UNIVERSITY OF LJUBLJANA (Hlebec 1992) minister 1p.ministerminister 2minister 3minister 4minister 5minister 6 minister 7adviser 1 adviser 2 adviser 3 *NET STUDENTS 0 0 discussion, recall 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 0 1 0 1 1 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 1 0 1 1 1 0 0 0 0 1 0 1 1 0 1 1 0 0 0 0 0 0 1 0 0 0 1 1 0 1 0 1 0 1 0 0 1 0 0 0 1 0 0 0 1 0 0 1 1 0 0 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 *NET STUDENTS 0 0 discussion, recognition 0 1 1 0 0 1 0 0 0 1 0 0 0 0 1 0 1 0 1 0 0 0 1 1 0 1 1 1 1 1 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 1 0 1 1 1 0 1 0 0 1 0 1 1 0 1 1 0 0 0 0 0 0 1 0 0 0 1 1 0 1 0 1 0 1 0 0 1 0 0 0 1 0 1 1 0 0 0 1 1 0 0 1 1 1 1 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 *NET STUDENTS 0 0 asking for an opinion, recall 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 1 0 1 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 0 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 1 1 1 1 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 *NET STUDENTS 0 0 asking for an opinion, recognition 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 1 0 1 0 1 0 1 1 1 1 1 0 1 1 0 0 0 0 0 0 1 0 0 0 1 1 0 0 0 1 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 1 1 0 0 0 1 1 1 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 *NET STUDENTS 0 0 being asked for an opinion, recall (transposed) 0 1 1 0 0 0 0 0 0 1 1 1 0 1 1 1 1 1 1 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0 1 1 0 0 1 1 1 0 0 1 0 1 0 0 0 1 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 
0 0 0 1 0 0 0 1 1 0 1 0 1 1 1 0 1 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 *NET STUDENTS 0 0 being asked for an opinion, recognition (transposed) 0 1 1 1 0 0 0 1 0 1 1 1 0 1 1 0 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 1 1 0 0 1 1 0 0 0 1 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 1 1 0 0 1 0 1 0 1 1 0 0 0 1 0 1 0 1 1 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 ENVIRONMENT file This is, for me, like magic and my behavior is simply ritual (and done as it works.) (Vlado cautions against tinkering with ENV files. I kinda believe him.) Here is an example - SHARP.ENV. (Don't ask me about the numerical parameters.) *** ------ I do know their meaning, but I will not explain it because ENV files are STRAN system files - I will encode them in the next version. You do not need an ENV file to start the program. It is created/changed during program execution. *** ------ *Cluse/PC Sharpstone Little League team Sep-28-1993 20:42:45 0 0 1 0 0 0 0 60 80 0 1 1 1 C:\STRAN\TEST.RAW sharp.net C:\STRAN\TEST.NAM C:\STRAN\TEST.VAR C:\STRAN\TEST.REL sharp.lst C:\STRAN\TEST.DBG sharp.ana C:\STRAN\TEST1.DIS C:\STRAN\TEST2.DIS C:\STRAN\TEST3.DIS C:\STRAN\TEST4.DIS C:\STRAN\TEST5.DIS C:\STRAN\TEST6.DIS C:\STRAN\TEST7.DIS C:\STRAN\TEST8.DIS C:\STRAN\TEST9.DIS *EOD The main thing to note is that lurking in here is the specification of the 'current' files that are assumed: The 'current' network file will be SHARP.NET, the 'current' output file will be SHARP.LST and the 'current' partition storing file will be SHARP.ANA When you indicate a 'new' file in using a STRAN program, this change will be recorded into the environment file. So even if you want to analyze a different matrix with different dimensions, any ENV file can be used. (It is a bit like working with sour dough bread or your own yogurt, all you need is a starter file.) OUTPUT This is fairly obvious. See the included examples. 
The main thing to note with MODEL is that at the end of the output, or
of a section of the output, you are given the connection pattern in the
model matrix. You are also given the count of errors, partitioned in a
way consistent with the model matrix.

Experiment, play and have fun!
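To get a feel for what a model matrix and its error counts look like,
here is a small Python sketch for the structural equivalence case
described under Using STRUCA. It is not STRAN's code and not its exact
criterion function: it assumes equal weights for both kinds of error,
simply ignores the diagonal cells, and calls a block null or complete
according to whichever gives fewer errors. The name block_errors and
the toy data are made up.

```python
from typing import List, Tuple


def block_errors(matrix: List[List[int]],
                 clusters: List[int]) -> Tuple[List[List[str]],
                                               List[List[int]], int]:
    """Return (model matrix, error counts per block, total error)."""
    k = max(clusters)
    # Units (0-based) belonging to each cluster 1..k.
    members = [[i for i, c in enumerate(clusters) if c == g]
               for g in range(1, k + 1)]
    model = [["" for _ in range(k)] for _ in range(k)]
    errors = [[0 for _ in range(k)] for _ in range(k)]
    for a, rows in enumerate(members):
        for b, cols in enumerate(members):
            # Assumption: diagonal cells are left out of the count.
            cells = [matrix[i][j] for i in rows for j in cols if i != j]
            ones = sum(cells)
            zeros = len(cells) - ones
            if ones <= zeros:
                model[a][b], errors[a][b] = "null", ones   # 1's are errors
            else:
                model[a][b], errors[a][b] = "com", zeros   # 0's are errors
    total = sum(sum(row) for row in errors)
    return model, errors, total


# Toy network (made-up data): four units in two clusters.
TOY = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
model, errors, total = block_errors(TOY, [1, 1, 2, 2])
for row_model, row_err in zip(model, errors):
    print(row_model, row_err)
print("total error:", total)
```

In the toy data the block from cluster 2 to cluster 1 holds three 1's
and one 0, so it is called complete with one error. All other blocks
fit their pattern exactly.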