Pajek datasets

KDD Cup 2003
High Energy Particle Physics (HEP) literature

Dataset   hep-th

Description first version: directed network with 27240 vertices and 342437 arcs (39 loops).
updated version: directed network with 27770 vertices and 352807 arcs (39 loops).
date-new.vec integer vector on 27770 vertices.
year-new.vec integer vector on 27770 vertices.
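
The network files can be read with a few lines of Python. The following is a minimal sketch, assuming the common Pajek layout of a `*Vertices` section (number, quoted label) followed by an `*Arcs` section (source, target); the function name `parse_pajek_net` and the three-paper sample, including its labels, are hypothetical and not taken from the dataset. It also counts loops, i.e. arcs whose source and target coincide.

```python
import shlex


def parse_pajek_net(text):
    """Parse a minimal Pajek .net text: *Vertices section, then *Arcs section.

    Returns (labels, arcs) where labels maps vertex number -> label
    and arcs is a list of (source, target) pairs.
    """
    labels = {}
    arcs = []
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comment lines
        if line.startswith("*"):
            section = line.split()[0].lower()  # e.g. "*vertices", "*arcs"
            continue
        parts = shlex.split(line)  # handles quoted labels
        if section == "*vertices":
            labels[int(parts[0])] = parts[1] if len(parts) > 1 else parts[0]
        elif section == "*arcs":
            arcs.append((int(parts[0]), int(parts[1])))
    return labels, arcs


# Hypothetical sample, not real data from the dataset.
sample = """*Vertices 3
1 "9901001"
2 "9901002"
3 "9902003"
*Arcs
1 2
3 1
3 3
"""

labels, arcs = parse_pajek_net(sample)
loops = [(x, y) for x, y in arcs if x == y]  # a loop is an arc from a vertex to itself
print(len(labels), "vertices,", len(arcs), "arcs,", len(loops), "loops")
```

Real Pajek files may contain further sections or per-line attributes that this sketch ignores.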


complete dataset (ZIP, 2607K)

Background Citation data from KDD Cup 2003, a knowledge discovery and data mining competition held in conjunction with the Ninth Annual ACM SIGKDD Conference.

The Stanford Linear Accelerator Center SPIRES-HEP database has been comprehensively cataloguing the High Energy Particle Physics (HEP) literature online since 1974, and indexes more than 500,000 high-energy physics related articles including their full citation tree.

The network is the citation graph of the hep-th portion of the arXiv. The unit names are the arXiv IDs of papers; the relation is X cites Y. Note that revised papers may have updated citations, so citations may refer to future papers, i.e. a paper may cite another paper that was published after it.
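
Such "future" citations can be located once a date is known for each paper. The following is a minimal sketch, assuming the arcs are available as (X, Y) pairs meaning X cites Y and the dates as a mapping from vertex number to a day count; the function name `future_citations` and the toy data are hypothetical.

```python
def future_citations(arcs, day):
    """Return arcs (x, y) where the citing paper x is dated earlier than the cited paper y.

    Loops (x == y) are skipped, since a paper cannot be dated before itself.
    """
    return [(x, y) for x, y in arcs if x != y and day[x] < day[y]]


# Toy data, not taken from the dataset: arc (x, y) means "x cites y".
arcs = [(1, 2), (2, 3), (3, 1), (2, 2)]
day = {1: 100, 2: 50, 3: 75}  # day count per paper

result = future_citations(arcs, day)
print(result)
```

Here papers 2 and 3 each cite a paper with a later date, so both arcs are reported, while the loop on paper 2 is ignored.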

The SLAC/SPIRES dates for all hep-th papers are given. Some older papers were uploaded years after their initial publication, so the arXiv submission date from the abstracts may not correspond to the publication date. An alternative date has been provided from SLAC/SPIRES that may be a better estimate of the initial publication date of these old papers.

The first version of the data was updated on May 12, 2003. The files are:
first-version network: X cites Y relation, first version.
updated network: X cites Y relation, updated version.
date-new.vec SLAC date of each paper, transformed to the number of days since August 1, 1991, updated version.
year-new.vec year from the SLAC date of each paper, updated version.
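
The day counts can be turned back into calendar dates. The following is a minimal sketch, assuming that a Pajek vector file consists of a `*Vertices N` header followed by one integer per line, and that a value of 0 corresponds to August 1, 1991 itself (the description above does not say whether counting starts at 0 or 1). The names `parse_vec` and `days_to_date` and the sample values are hypothetical.

```python
from datetime import date, timedelta

EPOCH = date(1991, 8, 1)  # assumption: day 0 is August 1, 1991


def parse_vec(text):
    """Parse a Pajek-style vector: '*Vertices N' header, then N integer values."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    if header[0].lower() != "*vertices":
        raise ValueError("expected a '*Vertices N' header")
    n = int(header[1])
    values = [int(v) for v in lines[1:1 + n]]
    if len(values) != n:
        raise ValueError("fewer values than announced in the header")
    return values


def days_to_date(days):
    """Convert a day count since EPOCH into a calendar date."""
    return EPOCH + timedelta(days=days)


# Hypothetical sample values, not taken from date-new.vec.
sample_vec = "*Vertices 3\n0\n31\n366\n"

days = parse_vec(sample_vec)
dates = [days_to_date(d) for d in days]
years = [d.year for d in dates]  # comparable to what year-new.vec stores
print(dates, years)
```

If the file actually counts from 1, subtract one from each value before converting.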


  1. KDD Cup 2003
  2. arXiv
Transformed into Pajek format by V. Batagelj, 26. July 2003