Pajek datasets
KDD Cup 2003 High Energy Particle Physics (HEP) literature
Dataset hep-th
Description
hepth.net directed network with 27240 vertices and 342437 arcs (39 loops).
hepthnew.net directed network with 27770 vertices and 352807 arcs (39 loops).
datenew.vec integer vector on 27770 vertices.
yearnew.vec integer vector on 27770 vertices.
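The .net and .vec files are plain-text Pajek files. Below is a minimal Python sketch for reading them. It assumes the common Pajek layout: a `*Vertices n` header, then `id "label"` lines, then an `*Arcs` section with one `i j [weight]` arc per line. It also assumes a `.vec` file is a `*Vertices n` header followed by one value per line. Other Pajek sections, such as `*Edges` or `*Arcslist`, are not handled, and all helper names are hypothetical. Self-loops (arcs from a vertex to itself) are counted separately, since the descriptions report 39 of them per network.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical helpers; the file layout handled here is an assumption.
_LABELED_VERTEX = re.compile(r'(\d+)\s+"([^"]*)"')


def parse_pajek_net(text: str) -> Tuple[Dict[int, str], List[Tuple[int, int]]]:
    """Return ({vertex id: label}, [(tail, head), ...]) from Pajek .net text."""
    labels: Dict[int, str] = {}
    arcs: List[Tuple[int, int]] = []
    section = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("*"):
            # Section header such as '*Vertices 27240' or '*Arcs'
            section = line.split()[0].lower()
            continue
        if section == "*vertices":
            match = _LABELED_VERTEX.match(line)
            if match:
                labels[int(match.group(1))] = match.group(2)
            else:
                # Unquoted label (or none): fall back to whitespace splitting
                parts = line.split()
                labels[int(parts[0])] = parts[1] if len(parts) > 1 else parts[0]
        elif section == "*arcs":
            parts = line.split()
            arcs.append((int(parts[0]), int(parts[1])))  # any weight is ignored
    return labels, arcs


def parse_pajek_vec(text: str) -> List[int]:
    """Return the integer values of a Pajek .vec file, in vertex order 1..n."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    if header[0].lower() != "*vertices":
        raise ValueError("expected a '*Vertices n' header")
    n = int(header[1])
    return [int(value) for value in lines[1 : n + 1]]


def count_loops(arcs: List[Tuple[int, int]]) -> int:
    """Number of self-loops, i.e. arcs from a vertex to itself."""
    return sum(1 for tail, head in arcs if tail == head)


if __name__ == "__main__":
    # Made-up three-vertex sample, not taken from the dataset
    sample_net = """*Vertices 3
1 "9201001"
2 "9201002"
3 "9201003"
*Arcs
1 2
1 3
3 3
"""
    labels, arcs = parse_pajek_net(sample_net)
    print(len(labels), len(arcs), count_loops(arcs))  # 3 3 1
    print(parse_pajek_vec("*Vertices 3\n120\n45\n800\n"))  # [120, 45, 800]
```

The vector values are parsed with `int` because both .vec files are described as integer vectors. A real-valued vector would need `float` instead.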
Download
complete dataset (ZIP, 2607K)
Background
Citation data from KDD Cup 2003, a knowledge discovery and data mining
competition held in conjunction with the Ninth Annual ACM SIGKDD Conference.
The Stanford Linear Accelerator Center SPIRES-HEP database has been
comprehensively cataloguing the High Energy Particle Physics (HEP) literature
online since 1974, and indexes more than 500,000 high-energy physics-related
articles, including their full citation tree.
The network is a citation graph of the hep-th (high-energy physics theory)
section of arXiv. Vertex (unit) names are the arXiv IDs of the papers; an arc
from X to Y means that paper X cites paper Y.
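Each arc points from the citing paper X to the cited paper Y. A paper's citation count is therefore its in-degree, and the length of its reference list is its out-degree. The following small sketch shows that bookkeeping. It assumes the arcs are already available as (X, Y) pairs of vertex numbers, for example read from the .net file, and the function name is hypothetical. Whether a self-loop should count as a citation is a judgment call, so the sketch skips loops by default.

```python
from collections import Counter
from typing import Iterable, Tuple


def degree_counts(
    arcs: Iterable[Tuple[int, int]], skip_loops: bool = True
) -> Tuple[Counter, Counter]:
    """Return (times_cited, references_made) per vertex.

    An arc (x, y) means 'x cites y': y gains one citation (in-degree)
    and x gains one reference (out-degree).
    """
    times_cited: Counter = Counter()
    references_made: Counter = Counter()
    for x, y in arcs:
        if skip_loops and x == y:
            continue  # a paper citing itself is ignored by default
        references_made[x] += 1
        times_cited[y] += 1
    return times_cited, references_made


if __name__ == "__main__":
    cited, refs = degree_counts([(1, 2), (1, 3), (2, 3), (3, 3)])
    print(cited.most_common(1))  # [(3, 2)]
    print(refs[1])               # 2
```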
Note that revised papers may have updated citations. As a result, a citation
may point to a "future" paper, i.e., a paper may cite another paper that was
published after it.
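One way to see how often this happens is to compare the dates at both ends of each arc. The sketch below flags arcs whose cited paper has a later date than the citing paper. It assumes `days[v - 1]` holds the date value of vertex v, because Pajek numbers vertices from 1. It also assumes the network and the date vector share the same vertex numbering. hepthnew.net and datenew.vec both cover 27770 vertices, while hepth.net has 27240, so the updated vectors presumably belong with hepthnew.net. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def future_citations(
    arcs: Sequence[Tuple[int, int]], days: Sequence[int]
) -> List[Tuple[int, int]]:
    """Return arcs (x, y) where the cited paper y is dated after the citing paper x.

    days[v - 1] is the date value (e.g. days since a fixed epoch) of vertex v.
    """
    if arcs and max(max(x, y) for x, y in arcs) > len(days):
        raise ValueError("date vector is shorter than the network")
    return [(x, y) for x, y in arcs if days[y - 1] > days[x - 1]]


if __name__ == "__main__":
    arcs = [(1, 2), (2, 1), (3, 1)]
    days = [100, 50, 200]
    flagged = future_citations(arcs, days)
    print(flagged)                   # [(2, 1)]
    print(len(flagged) / len(arcs))  # share of arcs that cite a later paper
```

The length check catches the most likely mismatch: pairing a vector with a network that has more vertices than the vector has entries.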
SLAC/SPIRES dates are given for all hep-th papers. Some older papers were
uploaded years after their initial publication, so the arXiv submission date
in the abstract may not match the publication date. An alternative date from
SLAC/SPIRES is therefore provided, which may be a better estimate of the
initial publication date of these older papers.
The first version of the data was updated on May 12, 2003.
hepth.net X cites Y relation, first version.
hepthnew.net X cites Y relation, updated version.
datenew.vec SLAC date of each paper, converted to the number of days since
August 1, 1991; updated version.
yearnew.vec year from the SLAC date of each paper; updated version.
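datenew.vec stores each date as a day count from August 1, 1991, so converting a value back to a calendar date takes one addition. The minimal sketch below assumes that an offset of 0 means August 1, 1991 itself. It also checks that the year implied by datenew.vec agrees with yearnew.vec, assuming both vectors list vertices in the same order. The helper names are hypothetical.

```python
from datetime import date, timedelta
from typing import List, Sequence

# Assumption: offset 0 corresponds to August 1, 1991.
EPOCH = date(1991, 8, 1)


def offset_to_date(days: int) -> date:
    """Calendar date for a day offset; negative offsets give pre-epoch dates."""
    return EPOCH + timedelta(days=days)


def date_to_offset(d: date) -> int:
    """Inverse of offset_to_date."""
    return (d - EPOCH).days


def year_mismatches(days: Sequence[int], years: Sequence[int]) -> List[int]:
    """1-based vertex numbers whose day offset and year value disagree."""
    return [
        v
        for v, (offset, year) in enumerate(zip(days, years), start=1)
        if offset_to_date(offset).year != year
    ]


if __name__ == "__main__":
    print(offset_to_date(153))                 # 1992-01-01
    print(date_to_offset(date(2003, 5, 12)))   # offset of the May 12, 2003 update
    print(year_mismatches([0, 153, 200], [1991, 1992, 1991]))  # [3]
```

Python's `timedelta` accepts negative day counts, so any dates earlier than the epoch convert the same way.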
References
KDD Cup 2003
arXiv
Transformed into Pajek format by V. Batagelj, 26 July 2003.
