Pajek datasets
Amazon "Customers who bought this item also bought these items ..."
Dataset Amazon
Description
- AmazonBk.net: directed network with 216737 vertices and 982296 arcs.
- AmazonBkLong.nam: long names (author and title).
- AmazonBkShort.nam: short names (Amazon IDs).
- AmazonCD.net: directed network with 79244 vertices and 526271 arcs.
- AmazonCDLong.nam: long names (author and title).
- AmazonCDShort.nam: short names (Amazon IDs).
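The .net files are in Pajek's network format. As a minimal sketch of how such a file might be read, the Python function below parses a small in-memory example. It assumes the common layout of a `*Vertices n` section with numbered, optionally quoted labels, followed by an `*Arcs` section with one `source target` pair per line; the actual files may use other sections or extra columns. The name `parse_pajek_net` and the sample data are hypothetical.

```python
def parse_pajek_net(text):
    """Parse a small Pajek .net document into (labels, arcs).

    Assumed layout: a '*Vertices' section, then an '*Arcs' section.
    Any other section is ignored.
    """
    labels = {}    # vertex number -> label
    arcs = []      # (source, target) pairs, in file order
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("*"):
            # Section headers such as '*Vertices 3' or '*Arcs'
            section = line.split()[0].lower()
            continue
        if section == "*vertices":
            parts = line.split(None, 1)
            number = int(parts[0])
            label = parts[1].strip() if len(parts) > 1 else str(number)
            if label.startswith('"'):
                # Quoted label: keep the text between the first pair of quotes
                end = label.find('"', 1)
                label = label[1:end] if end != -1 else label[1:]
            else:
                label = label.split()[0]
            labels[number] = label
        elif section == "*arcs":
            fields = line.split()
            # Any third column (a line value) is ignored in this sketch
            arcs.append((int(fields[0]), int(fields[1])))
    return labels, arcs


# Hypothetical three-vertex sample, not taken from the dataset
SAMPLE = """*Vertices 3
1 "id-one"
2 "id-two"
3 "id-three"
*Arcs
1 2
1 3
3 1
"""

labels, arcs = parse_pajek_net(SAMPLE)
print(len(labels), "vertices,", len(arcs), "arcs")
```

For the real files, the text would be read from the unzipped AmazonBk.net or AmazonCD.net; at roughly a million arcs, a list of integer pairs still fits comfortably in memory.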
Download
AmazonBk.net (ZIP, 204K); the original files are also included
AmazonCD.net (ZIP, 204K); the original files are also included
Background
Amazon opened its virtual doors in July 1995 with the mission of using the Internet to make book buying as fast and easy as possible.
The company's principal corporate offices are located in Seattle, Washington.
It is one of the leading online shopping sites and offers a huge selection of products, including
books, CDs, videos, DVDs, toys and games, electronics, kitchenware, computers and more.
The vertices in the Amazon networks are books or CDs. The arcs
are determined by the list of products (CDs or books) shown on each product's page under the heading
"Customers who bought this CD/book also bought",
so an arc runs from a product to each product in its list.
Using a relatively simple program written in Python,
we 'harvested' the books network from June 16 to June 27, 2004, and the CDs
network from July 7 to July 23, 2004.
We harvested only the portion of each network reachable from the selected
starting book/CD.
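The harvesting program itself is not reproduced here. As a rough sketch of the idea of collecting only what is reachable from a starting item, the Python code below performs a breadth-first traversal. The traversal order, the function name `harvest`, and the callback `also_bought` (which stands in for fetching the "also bought" list of one product) are assumptions made for illustration, as is the toy data.

```python
from collections import deque


def harvest(start, also_bought):
    """Collect the part of the network reachable from `start`.

    `also_bought(item)` is a hypothetical callback returning the list of
    items shown for `item`. Returns (vertices, arcs) in discovery order.
    """
    seen = {start}
    vertices = [start]
    arcs = []
    queue = deque([start])
    while queue:
        item = queue.popleft()
        for other in also_bought(item):
            # Every listed item yields an arc, even if it was seen before
            arcs.append((item, other))
            if other not in seen:
                seen.add(other)
                vertices.append(other)
                queue.append(other)
    return vertices, arcs


# Toy stand-in for the product pages; "E" points into the network
# but is never listed by anything reachable from "A".
TOY_LISTS = {
    "A": ["B", "C"],
    "B": ["A"],
    "C": ["D"],
    "D": [],
    "E": ["A"],
}

vertices, arcs = harvest("A", lambda item: TOY_LISTS.get(item, []))
print(vertices)
print(arcs)
```

In the toy data, item "E" is not harvested because no arc leads to it from the starting item. A network gathered this way is weakly connected by construction, since every harvested vertex is linked to the start by a chain of arcs.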
The books network has 216737 vertices and 982296 arcs
(number of arcs = 983374, loops = 831, multiple lines = 289).
The CDs network has 79244 vertices and 526271 arcs.
By construction, both networks have limited out-degree and are
weakly connected. 178281 books have out-degree 5, and
55373 CDs have out-degree 8.
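Statistics of this kind can be recomputed from a list of arcs. The Python sketch below, run on a toy arc list rather than the real data, builds an out-degree histogram, checks weak connectivity by ignoring arc direction, and counts loops and repeated arcs. The function names are hypothetical, and counting every copy of an arc beyond the first as a "multiple line" is only one possible reading of that figure.

```python
from collections import Counter


def out_degree_histogram(vertices, arcs):
    """Map each out-degree value to the number of vertices having it."""
    out_degree = {v: 0 for v in vertices}
    for source, _target in arcs:
        out_degree[source] += 1
    return Counter(out_degree.values())


def is_weakly_connected(vertices, arcs):
    """True if all vertices are linked when arc direction is ignored."""
    vertices = list(vertices)
    if not vertices:
        return True
    neighbours = {v: set() for v in vertices}
    for source, target in arcs:
        neighbours[source].add(target)
        neighbours[target].add(source)
    seen = {vertices[0]}
    stack = [vertices[0]]
    while stack:
        v = stack.pop()
        for w in neighbours[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(vertices)


def count_loops_and_multiples(arcs):
    """Count loops (v, v) and copies of an arc beyond the first (assumed definition)."""
    loops = sum(1 for source, target in arcs if source == target)
    copies = Counter(arcs)
    multiples = sum(n - 1 for n in copies.values())
    return loops, multiples


# Toy data: one loop (3, 3) and one repeated arc (1, 2)
toy_vertices = [1, 2, 3, 4]
toy_arcs = [(1, 2), (1, 3), (2, 1), (3, 3), (1, 2), (4, 1)]

histogram = out_degree_histogram(toy_vertices, toy_arcs)
connected = is_weakly_connected(toy_vertices, toy_arcs)
loops, multiples = count_loops_and_multiples(toy_arcs)
print(dict(histogram), connected, loops, multiples)
```

On the real networks, the histogram entry for out-degree 5 (books) or 8 (CDs) would be the value to compare with the counts quoted above.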
History
- June-July 2004: harvesting of original data from Amazon.
24. November 2004