Pajek datasets

Roget's Thesaurus, 1879

Dataset: Roget

Description: directed network with 1022 vertices and 5075 arcs (1 loop); an arc from word X to word Y means that X is related to Y.

Download (ZIP, 17K)


The network is based on the file roget.dat from the Stanford GraphBase that contains cross-references in Roget's Thesaurus, 1879.

Dr. Peter Mark Roget (1779-1869): philologist, scientist, physician. The name Roget could soon become a virtual synonym for the word "synonym". For those who use Roget's Thesaurus, it is one of the three most important books ever printed, along with The Bible and Webster's Dictionary. In order to communicate one's exact intention, or one's precise meaning, the Thesaurus, being a list of synonyms or verbal equivalents, is a necessary tool. The first draft of the Thesaurus was written in 1805, two years before Webster started on his dictionary. However, for a period of 47 years Dr. Roget used his manuscript as his personal, secret treasure trove. Not until he was 73 years old did he decide to reveal and publish this great manuscript.

Since 1852, Roget's Thesaurus has never been out of print. In fact, each succeeding edition has increased the popularity of the work. The original 15,000 words included in the 1805 manuscript have increased to over a quarter of a million in the 1992 edition (the tenth printing). With such an increase in size, it is encouraging to notice that the basic content still remains intact. For example, where the 1805 Thesaurus gives for the word existence: "Ens, entity, being, existence, essence...", the 1992 Thesaurus gives for existence: "existence, being, entity, ens,...essence..."

Each vertex of the graph corresponds to one of the 1022 categories in the 1879 edition of Peter Mark Roget's Thesaurus of English Words and Phrases, edited by John Lewis Roget. An arc goes from one category to another if Roget gave a reference to the latter among the words and phrases of the former, or if the two categories were directly related to each other by their positions in Roget's book. For example, the vertex for category 312 ('ascent') has arcs to the vertices for categories 224 ('obliquity'), 313 ('descent'), and 316 ('leap'), because Roget gave explicit cross-references from 312 to 224 and 316, and because category 312 was implicitly paired with 313 in his scheme.
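The arc structure described above can be illustrated with a minimal sketch in Python. It is not the tool used to produce the dataset. The sample text is a hand-made toy in the general Pajek layout (a `*Vertices` section followed by `*Arcs` or `*Arcslist`), not an excerpt from the distributed file, and the vertex numbers in it are invented for the example; only the 'ascent' arcs are taken from the description. The names `SAMPLE_NET`, `parse_pajek` and `count_loops` are hypothetical.

```python
import re
from collections import defaultdict

# Toy network (assumption: NOT an excerpt of the real file; ids are made up).
SAMPLE_NET = '''*Vertices 4
1 "obliquity"
2 "ascent"
3 "descent"
4 "leap"
*Arcslist
2 1 3 4
'''

# A vertex line is assumed to look like: <number> "<label>" [optional extras]
VERTEX_RE = re.compile(r'^(\d+)\s+"([^"]*)"')


def parse_pajek(text):
    """Return (labels, arcs) read from Pajek-style text.

    labels maps vertex number -> label; arcs maps source -> set of targets.
    Only *Vertices, *Arcs and *Arcslist sections are handled in this sketch.
    """
    labels = {}
    arcs = defaultdict(set)
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith('%'):   # skip blanks and comments
            continue
        if line.startswith('*'):               # section header
            section = line.split()[0].lower()
            continue
        if section == '*vertices':
            match = VERTEX_RE.match(line)
            if match:
                labels[int(match.group(1))] = match.group(2)
            else:                              # unquoted label fallback
                fields = line.split()
                labels[int(fields[0])] = fields[1] if len(fields) > 1 else fields[0]
        elif section == '*arcs':               # one arc per line: source target [weight]
            fields = line.split()
            arcs[int(fields[0])].add(int(fields[1]))
        elif section == '*arcslist':           # source followed by all its targets
            fields = line.split()
            arcs[int(fields[0])].update(int(t) for t in fields[1:])
    return labels, arcs


def count_loops(arcs):
    """Count arcs that start and end at the same vertex."""
    return sum(1 for source, targets in arcs.items() if source in targets)


labels, arcs = parse_pajek(SAMPLE_NET)
print(sorted(labels[t] for t in arcs[2]))      # categories referenced from 'ascent'
print(count_loops(arcs))
```

Applied to the full network, the same two functions would be expected to report 1022 labels and a single loop, matching the description, provided the downloaded file follows the layout assumed here.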


  1. The original Roget's Thesaurus was published in 1852.
  2. Peter's son John Lewis Roget published the second, improved edition in 1879.
  3. The Project Gutenberg Roget's Thesaurus (1911 edition) was put into electronic format in 1991.
  4. The graph roget.dat of cross-references, based on the second edition, was produced for the Stanford GraphBase (SGB) in 1992/3.
  5. MICRA (Pat Cassidy) prepared the electronic version of the 1911 Roget's Thesaurus that is widely available on the internet.
  6. SGB roget.dat was transformed into Pajek format by A. Mrvar, 5. December 1996.


  1. Peter Mark Roget: Roget's Thesaurus of English Words and Phrases
  2. Project Gutenberg: Roget's Thesaurus
  3. Donald E. Knuth: The Stanford GraphBase: A Platform for Combinatorial Computing. New York: ACM Press, 1993
  4. The Stanford GraphBase: roget.dat, version 15.6.1993
  5. Pat Cassidy: MICRA / Factotum
  6. CIDE (Collaborative International Dictionary of English), GNU 1996-2002

23. January 2004