Pajek datasets


Amazon
"Customers who bought this item also bought these items ..."

Dataset   Amazon

Description

AmazonBk.net directed network with 216737 vertices and 982296 arcs.
AmazonBkLong.nam long names (author and title).
AmazonBkShort.nam short names (Amazon IDs).
AmazonCD.net directed network with 79244 vertices and 526271 arcs.
AmazonCDLong.nam long names (author and title).
AmazonCDShort.nam short names (Amazon IDs).
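The .net files can be read with a few lines of code. The following is a minimal sketch, assuming the files use the plain Pajek layout with a `*Vertices n` section followed by an `*Arcs` section holding one arc per line (source, target, optional weight); the exact layout of the Amazon files has not been checked here, and `parse_pajek_net` is a hypothetical helper name.

```python
def parse_pajek_net(text):
    """Parse a minimal Pajek .net text into (n_vertices, arcs).

    Assumes a '*Vertices n' header and an '*Arcs' section with
    'source target [weight]' on each line; other sections are skipped.
    """
    n_vertices = 0
    arcs = []
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("*"):
            parts = line.split()
            section = parts[0].lower()
            if section == "*vertices":
                n_vertices = int(parts[1])
            continue
        if section == "*arcs":
            fields = line.split()
            # Keep only the endpoints; an optional weight is ignored.
            arcs.append((int(fields[0]), int(fields[1])))
    return n_vertices, arcs


# Tiny made-up sample, not taken from the Amazon data.
sample = """*Vertices 3
1 "a"
2 "b"
3 "c"
*Arcs
1 2
2 3 1
3 1
"""
n, arcs = parse_pajek_net(sample)
```

For the real files, the text would be read from the unzipped AmazonBk.net or AmazonCD.net and passed to the same function.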

Download

AmazonBk.net (ZIP, 204K); the original files are also included

AmazonCD.net (ZIP, 204K); the original files are also included

Background

Amazon opened its virtual doors in July 1995 with the mission of using the Internet to make book buying as fast and easy as possible. The company's principal corporate offices are located in Seattle, Washington. It is one of the leading online shopping sites and offers a wide selection of products, including books, CDs, videos, DVDs, toys and games, electronics, kitchenware, and computers.

The vertices in the Amazon networks are books or CDs. The arcs are determined by the list of products (CDs or books) shown under the heading "Customers who bought this CD/book also bought": each item has an arc to every product on its list.

Using a relatively simple program written in Python, we 'harvested' the books network from June 16 to June 27, 2004, and the CDs network from July 7 to July 23, 2004. We harvested only the portion of each network reachable from the selected starting book or CD.
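The idea of harvesting only what is reachable from a starting item can be sketched as a graph traversal. This is an illustration, not the original program: the traversal order (breadth-first here) is an assumption, and `also_bought` is a hypothetical callback standing in for whatever fetched the "also bought" list of an item.

```python
from collections import deque


def harvest(start, also_bought):
    """Collect every item reachable from `start` and the arcs between them.

    `also_bought(item)` is a hypothetical function returning the list of
    items shown for `item`; breadth-first order is an assumption.
    """
    seen = {start}
    arcs = []
    queue = deque([start])
    while queue:
        item = queue.popleft()
        for other in also_bought(item):
            arcs.append((item, other))
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen, arcs


# Made-up toy data: "E" points to "A" but cannot be reached from "A".
toy = {"A": ["B", "C"], "B": ["A"], "C": ["D"], "D": [], "E": ["A"]}
vertices, arcs = harvest("A", lambda item: toy.get(item, []))
```

In the toy data, item "E" is never visited, because nothing reachable from "A" lists it. Items that merely point into the harvested portion are likewise absent from the networks.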

The books network has 216737 vertices and 982296 arcs (number of arcs = 983374, loops = 831, multiple lines = 289). The CDs network has 79244 vertices and 526271 arcs.
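Loops and multiple lines can be counted directly from an arc list. The sketch below assumes that a loop is an arc whose source equals its target and that each extra copy of an already present arc counts as one multiple line; whether Pajek counts them in exactly this way is not confirmed here, and `count_loops_and_multiples` is a hypothetical helper.

```python
from collections import Counter


def count_loops_and_multiples(arcs):
    """Return (loops, multiples) for a list of (source, target) arcs."""
    loops = sum(1 for src, dst in arcs if src == dst)
    counts = Counter(arcs)
    # Every copy beyond the first of the same (source, target) pair.
    multiples = sum(c - 1 for c in counts.values() if c > 1)
    return loops, multiples


# Made-up arcs: one loop (2, 2) and two extra copies of (1, 2).
arcs = [(1, 2), (1, 2), (2, 2), (2, 3), (1, 2)]
loops, multiples = count_loops_and_multiples(arcs)
```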

By construction, both networks have limited out-degree and are weakly connected. 178281 books have out-degree 5, and 55373 CDs have out-degree 8.
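Both properties can be checked on an arc list. The following is a minimal sketch, assuming vertices are numbered 1 to n as in Pajek files and that there is at least one vertex; the function names are hypothetical, and every arc (including repeated ones) counts toward the out-degree.

```python
from collections import Counter, deque


def out_degree_histogram(n_vertices, arcs):
    """Map each out-degree value to the number of vertices having it."""
    out_deg = Counter(src for src, _ in arcs)
    return Counter(out_deg.get(v, 0) for v in range(1, n_vertices + 1))


def is_weakly_connected(n_vertices, arcs):
    """True if the network is connected when arc directions are ignored."""
    neighbours = {v: set() for v in range(1, n_vertices + 1)}
    for src, dst in arcs:
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    seen = {1}
    queue = deque([1])
    while queue:
        v = queue.popleft()
        for w in neighbours[v] - seen:
            seen.add(w)
            queue.append(w)
    return len(seen) == n_vertices


# Made-up example with four vertices.
arcs = [(1, 2), (1, 3), (2, 3), (4, 1)]
hist = out_degree_histogram(4, arcs)
connected = is_weakly_connected(4, arcs)
```

Applied to the books network, the histogram entry for out-degree 5 would be the place to compare against the 178281 figure quoted above.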

History

  1. June-July 2004: harvesting of original data from Amazon.


24. November 2004