Network data

Steven Corman: News analysis

Ulrik Brandes: Steven Corman from Arizona State is generating networks from text documents and uses betweenness to identify the important issues in the text. You have met him in Dagstuhl. These networks are large, and I am sure he'd be willing to provide data. One data set he is using is Reuters news ticker information after the September 11 terrorist attack.

The advantage would be that no-one has worked on this kind of data before (so we can do whatever we like without caring about other people's work), and ... are data that everyone at Sunbelt can relate to.

CRA Analyses of News Stories on the Terrorist Attack: This page contains descriptive analyses of media coverage of the terrorist attacks on the United States using Centering Resonance Analysis (CRA). CRA is a new text analysis technique developed by Steve Corman and Kevin Dooley at Arizona State University. It uses natural language processing and network text analysis techniques to produce abstract representations of texts that are useful for a number of purposes. The emphasis in this demo is on visualization of the CRA networks for text understanding. The terrorist attacks were chosen because many people are intimately aware of and interested in the details of that event.
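As a rough illustration of using betweenness to rank the words of a text network, the sketch below computes unnormalized betweenness on a tiny undirected word graph with a breadth-first accumulation scheme. It is not the CRA procedure itself: the edge list, the word labels, and the function names `from_edges` and `betweenness` are hypothetical and chosen only for this example.

```python
from collections import deque

def from_edges(edges):
    """Build a symmetric adjacency dict (node -> set of neighbours)."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency

def betweenness(adjacency):
    """Unnormalized betweenness for an undirected, unweighted graph."""
    score = {v: 0.0 for v in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from source
        order = []
        predecessors = {v: [] for v in adjacency}
        paths = {v: 0 for v in adjacency}
        distance = {v: -1 for v in adjacency}
        paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    paths[w] += paths[v]
                    predecessors[w].append(v)
        # Accumulate dependencies, farthest vertices first
        dependency = {v: 0.0 for v in adjacency}
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += paths[v] / paths[w] * (1 + dependency[w])
            if w != source:
                score[w] += dependency[w]
    # Every unordered pair was counted once from each endpoint
    return {v: s / 2 for v, s in score.items()}

# Hypothetical word network: an edge means two words are linked in the text
word_edges = [("price", "market"), ("market", "oil"),
              ("market", "bank"), ("bank", "rate")]
scores = betweenness(from_edges(word_edges))
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph the word "market" lies on the most shortest paths, so it comes first in `ranking`.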

Analysis of Dictionaries

Vladimir Batagelj, Andrej Mrvar: Several dictionaries of selected fields exist on the Internet: FOLDOC - the Free On-line Dictionary of Computing, Creative Music Online Dictionary of Musical Terms, ArtLex: Dictionary of Visual Art.

Every dictionary entry points to other entries in the dictionary, thus determining a directed network on entries. For example, from the ODLIS: Online Dictionary of Library and Information Science we produced the corresponding network in Pajek format (2909 vertices and 18419 arcs). The resulting network provides us with a new view of the field's terminology.
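The step from dictionary cross-references to a directed network can be sketched as follows. This is a minimal example, assuming the cross-references have already been extracted into a mapping from each entry to the entries it points to; the function name `to_pajek` and the toy entries are hypothetical, and only the basic `*Vertices` / `*Arcs` sections of the Pajek format are written.

```python
def to_pajek(cross_refs):
    """Return Pajek .net text for a dict mapping entry -> referenced entries."""
    # Every entry that occurs as a source or as a target becomes a vertex
    targets = {t for refs in cross_refs.values() for t in refs}
    labels = sorted(set(cross_refs) | targets)
    # Pajek numbers vertices from 1
    index = {label: i for i, label in enumerate(labels, start=1)}
    lines = [f"*Vertices {len(labels)}"]
    lines += [f'{index[label]} "{label}"' for label in labels]
    lines.append("*Arcs")
    for source in sorted(cross_refs):
        # Repeated references to the same entry are written once
        for target in sorted(set(cross_refs[source])):
            lines.append(f"{index[source]} {index[target]}")
    return "\n".join(lines) + "\n"

# Hypothetical toy dictionary: each entry lists the entries it points to
toy_dictionary = {
    "catalog": ["index", "record", "index"],
    "index": ["record"],
    "record": [],
}
pajek_text = to_pajek(toy_dictionary)
```

Writing `pajek_text` to a file with the `.net` extension should give a network that Pajek can read.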

A very interesting view of words can also be obtained from WordNet and from the Lexical FreeNet Connected thesaurus, where you find different relations between words.

BibTeX Bibliographies

Vladimir Batagelj: Cooperation (collaboration) networks can be produced from bibliographies in different fields. They are often available in BibTeX format: 1, 2, 3.

The cooperation network among authors can be produced manually, as I did for one network: Imrich W, Klavžar S. (1999) Graph products (2-mode network file in Pajek format). But it wouldn't be too hard to write a program for this task. What can be said about such networks? With more effort (programming a wrapper or working manually) it is also possible to produce citation networks on selected subjects from ISI or NEC.
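A program for this task might look roughly like the sketch below. It assumes each `author` field sits on a single line and separates names with `and`, as is usual in BibTeX; real bibliographies with multi-line fields or inconsistent name spellings would need a proper parser and name matching. The function name `coauthor_edges` and the sample entries are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Matches a single-line field such as: author = {A and B},
AUTHOR_FIELD = re.compile(
    r'\bauthor\s*=\s*[{"](.*?)[}"]\s*,?\s*$',
    re.IGNORECASE | re.MULTILINE,
)

def coauthor_edges(bibtex_text):
    """Count, for each pair of authors, the number of works they share."""
    edges = Counter()
    for match in AUTHOR_FIELD.finditer(bibtex_text):
        names = re.split(r"\s+and\s+", match.group(1))
        # Normalize whitespace and drop duplicates within one entry
        authors = sorted({" ".join(n.split()) for n in names if n.strip()})
        for pair in combinations(authors, 2):
            edges[pair] += 1
    return edges

# Hypothetical bibliography with placeholder names
sample_bib = """
@book{first,
  author = {Author A and Author B and Author C},
  title = {First work},
}
@article{second,
  author = "Author B and Author A",
  title = {Second work},
}
"""
edges = coauthor_edges(sample_bib)
```

Each key of `edges` is an alphabetically ordered pair of authors and each value is the number of joint works, which can serve as the line value in a one-mode cooperation network.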

Gerhard Wuehrer: Tourist data

Ulrik Brandes: "Also, I just recalled that I have access to another interesting data set: An Austrian professor for marketing approached me last Sunbelt with data on the usage of a savings card for tourists sold in an Austrian state. This is two-mode data (about 25,000 cards that can be used to obtain savings in about 150 places) with many attributes. Because each card has an identification, it is known which card has been used on what date, how many times, and in which place over a two-year period. Also, there is geographic information about points of usage and travel distance between them."

The advantage of Wuehrer's data set is that it is an entirely new application, where we are free to do whatever analysis we can think of - probably with more creativity and more surprising insights than in an application already studied from a network perspective. Or in other words: it saves us the time to find out and compare against what has already been done in that area.
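One possible first look at such two-mode card usage data is sketched below: count how often each card was used at each place, then project onto places by counting the cards used at both. The record layout `(card, place, date)`, the function names, and the sample values are assumptions made for illustration and do not reflect the actual data set.

```python
from collections import Counter, defaultdict
from itertools import combinations

def two_mode_counts(records):
    """Count how often each card was used at each place."""
    return Counter((card, place) for card, place, _date in records)

def project_places(records):
    """Link two places by the number of distinct cards used at both."""
    places_by_card = defaultdict(set)
    for card, place, _date in records:
        places_by_card[card].add(place)
    weights = Counter()
    for places in places_by_card.values():
        for pair in combinations(sorted(places), 2):
            weights[pair] += 1
    return weights

# Hypothetical usage records: (card id, place, date of use)
usage = [
    ("card1", "museum", "day 1"),
    ("card1", "cable car", "day 2"),
    ("card1", "museum", "day 3"),
    ("card2", "museum", "day 1"),
    ("card2", "cable car", "day 5"),
    ("card3", "spa", "day 9"),
]
card_place = two_mode_counts(usage)
place_place = project_places(usage)
```

Here `card_place` is the valued two-mode network and `place_place` is its one-mode projection on places; the dates are ignored in this sketch but could be used to slice the data by period.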

Large German Social Network

Lothar Krempel: I can offer at least two datasets.
The first is a coauthor network in German sociology. These data are from Jürgen Guedler's dissertation. They are based on databases from the Informations Zentrum für Sozialwissenschaften in Bonn, Germany.
In the very large database he finds 15 connected components. Among these there is a big 680 x 680 component, which is connected but quite sparse, so it is difficult to draw nicely. Additional attributes are also available.

People and Events

Lothar Krempel: Another network I can offer is two-mode: 1260 people connected through 60 events. These data are from a joint article with Carola Lipp and are based on a database of historical documents. Additional attributes are available.

South Pole Station

Jeff Johnson: As far as data go, you are welcome to use my over-time data for winter-over crews at the South Pole Station. This involves three years of data with 22 to 28 crew members per year and 8 time points per year. It is a small data set in terms of the number of nodes, but it does have the over-time dimension. Also, the data are ratings of social interaction on a scale from 0 to 10.


Vladimir Batagelj: Genealogies (in GED format - you can use Pajek to transform them into graphs and some property vectors) are available on the Internet in large quantities (several links are no longer active). There are some specific problems in the visualization of genealogies and their substructures. In some discussions with genealogists I found that not much has been done on (special) methods for the analysis of genealogies.

Besides family genealogies there also exist (scientifically more interesting) genealogies of local communities. European Royalties, Genealogical Data from NSF Project on Empirical Kinship Networks.

Airline traffic

Vladimir Batagelj: Two sets of data about airline traffic in the USA through time are available: Airline Market and Segment Data and Airline Origin and Destination Data. A temporal valued network can be built from them.
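Building a temporal valued network from such records can be sketched as below: one valued arc list per time period, with arc values summed over all records of that period. The record layout `(period, origin, destination, passengers)`, the airport codes, and the numbers are hypothetical and are not taken from the data sets named above.

```python
from collections import defaultdict

def temporal_network(segments):
    """Group segment records into one valued directed network per period."""
    slices = defaultdict(lambda: defaultdict(int))
    for period, origin, destination, passengers in segments:
        # The arc value is the total number of passengers in that period
        slices[period][(origin, destination)] += passengers
    return {period: dict(arcs) for period, arcs in sorted(slices.items())}

# Hypothetical segment records: (period, origin, destination, passengers)
segments = [
    (1, "AAA", "BBB", 120),
    (1, "AAA", "BBB", 80),
    (1, "BBB", "CCC", 50),
    (2, "AAA", "BBB", 90),
]
network = temporal_network(segments)
```

Each value of `network` is the valued arc list for one period, so changes in traffic on a given route can be followed by comparing the same arc across periods.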

Movie Database

Vladimir Batagelj: I was somewhat surprised when I noticed that The Internet Movie Database is available for download. I don't know how it relates to databases such as TvGuide and AllMovie. See also: EachMovie collaborative filtering data set.


Vladimir Batagelj: Here are some additional datasets from the web.


The Effects of Family Disruption on Social Mobility, Irish Educational Transitions Data

(Text) Datamining

Datamining -> Data, Data Mining Competitions, Reuters Corpus, New Reuters Corpus/XML, 20 Newsgroups.

Popular Kids

The Role of Sports as a Social Determinant for Children

Catasto Data

Catasto Data

Amsterdam Merchants

Amsterdam merchants, 1578-1630

International trade

International Trade Data; International Trade Patterns

International relations

International Relations Data Site


Cheswick's Internet scanning database