Example datasets

Here you can find a list of available example datasets for Neo4j and learn how to import and explore them.

Datasets

When getting started with Neo4j, it’s helpful to use example datasets relevant to your domain and use case. For each dataset, we aim to provide a description, the graph model, and some use-case queries.

Built-in examples

Neo4j Browser comes with two built-in example datasets, which you can create and explore using interactive slideshows.

The "Movies" example is launched via the :guide movie-graph command and contains a small graph of movies and people related to those movies as actors, directors, producers, etc.

The "Northwind" example is run via :guide northwind-graph and contains a traditional retail system with products, orders, customers, suppliers, and employees. It walks you through the import of the data and increasingly complex queries using the available data.

Other Browser guides

Other example datasets that you can run within your own Neo4j Browser are:

Game of Thrones Interactions — :play got
UK company registration, property ownership, political donations — :play ukcompanies
Stack Overflow users, tags and Q&A data — :play stackoverflow
BBC Good Foods recipe data — :play recipes
Airbnb listings data — :play listings
Football (Soccer) transfer data — :play football_transfers

AuraDB Free example datasets

When creating a new database in Neo4j AuraDB, besides the default empty database, you can also select one of the starter datasets:

Movies
Graph based Recommendations
Graphs for Cybersecurity
StackOverflow Data

You can explore them by following the Browser guide instructions and test the data with the suggested Cypher® queries.

In addition, you have a few options to load graph data into Aura from other Neo4j instances:

Load a dump from a Neo4j Sandbox backup.
Load a dump from the Neo4j Graph Example repository.
Load a dump from Neo4j Desktop.

For more information, you can read the blog post Week 10 — Getting Dumps and Example Projects into Aura Free and watch the corresponding video from the series Discover Neo4j Aura Free with Michael and Alex.

Neo4j Sandbox

To explore a wide variety of datasets in an online setup without a local installation, you can use the Neo4j Sandbox.

Each sandbox is available for at least three days after creation and can also be remotely accessed from applications using any Neo4j driver.

Except for the "blank" sandbox, all sandboxes come prepopulated with domain data and focus on use-case-specific queries.

All sandboxes provide access to Neo4j Browser, Neo4j Bloom, APOC, Graph Data Science, neosemantics (n10s) and a GraphQL integration.

Neo4j Graph Example repository on GitHub

The data, Browser guides, code examples (JavaScript, Java, Python, Go, C#), Cypher queries, and Bloom perspectives for each sandbox are all available in the GitHub repository.

The use cases include:

movie recommendations (Repo)
network management (Repo)
investigative data from the ICIJ (Panama Papers) (Repo)
crime investigation (Repo)
social networks optionally using your own Twitter account (Repo).

Neo4j dataset demo server

Access information

To explore more graph databases, you can access the demo server at https://demo.neo4jlabs.com:7473.
This server hosts a number of datasets with read-only access for public use.
The username and password are the same as the database name.
For instance, for the recommendations database, both the username and the password are recommendations.
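
As a rough sketch, the credential rule above can be turned into a cypher-shell invocation. The helper name demo_shell_command is made up for illustration, and the Bolt address neo4j+s://demo.neo4jlabs.com:7687 is an assumption (this page only lists the HTTPS port 7473). The function only prints the command, so you can inspect it before running it yourself.

```shell
#!/bin/sh
# Hypothetical helper: print a cypher-shell command for a hosted demo database.
# Assumption: the demo server accepts Bolt connections at this address.
DEMO_ADDRESS="neo4j+s://demo.neo4jlabs.com:7687"

demo_shell_command() {
  db="$1"
  # Username and password are both the same as the database name.
  printf 'cypher-shell -a %s -d %s -u %s -p %s\n' "$DEMO_ADDRESS" "$db" "$db" "$db"
}

# Dry run: show the command for the recommendations database.
demo_shell_command recommendations
```

Copy the printed line into a terminal where cypher-shell is on your PATH to open an interactive session against the read-only database.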

Hosted databases

You can open any of the following databases by clicking the link. Don’t forget to copy the username and password.

recommendations

movies

northwind

fincen

twitter

stackoverflow

gameofthrones

neoflix

wordnet

slack

Means of data import

Loading data from source data

The most reliable way to get a dataset into Neo4j is to import it from the raw sources. That way, you do not depend on the database version of a dump, which you might otherwise have to upgrade. That’s why we provide raw data (CSV, JSON, XML) for several of the datasets, accompanied by import scripts in Cypher.

You can run the Cypher script using a command-line client such as cypher-shell.

Run Cypher Shell from the "Terminal" of your Graph Database in Neo4j Desktop

./bin/cypher-shell -u neo4j -p "password" -f import-file.cypher

You can also drag and drop or paste the script into Neo4j Browser (check that the multi-statement editor is enabled in the settings) and run it from there.

CSV data can be imported using either the LOAD CSV clause in Cypher or, for initial bulk imports of large datasets, neo4j-admin database import.
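
To make the LOAD CSV route more concrete, here is a minimal sketch that writes a small import script and prints the cypher-shell command that would run it. The file people.csv, its id and name columns, and the Person label are hypothetical placeholders, not part of any dataset listed here. Note that the snippet overwrites import-file.cypher in the current directory.

```shell
#!/bin/sh
# Write a small Cypher import script.
# people.csv, its id/name columns, and the Person label are hypothetical.
cat > import-file.cypher <<'EOF'
// Assumes people.csv has a header row and sits in the Neo4j import directory.
LOAD CSV WITH HEADERS FROM 'file:///people.csv' AS row
MERGE (p:Person {id: row.id})
SET p.name = row.name;
EOF

# Dry run: print the command that would execute the script.
# "password" is a placeholder for your own credentials.
echo './bin/cypher-shell -u neo4j -p "password" -f import-file.cypher'
```

MERGE is used here instead of CREATE so that rerunning the script does not duplicate nodes with the same id.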

To load JSON or XML files, you need to have the APOC Core library installed. It comes with a number of procedures for importing data, including from other databases.

To load XLS files, you can use the APOC Extended library. Note that APOC Extended is not officially supported.

Using a dump of a Neo4j database

Other datasets are provided as a dump of a Neo4j database. Follow the link http://github.com/neo4j-graph-examples to find dump files for many graph example datasets.

Community Edition (replace the default database)

Stop your Neo4j server.
Import the file using the ./bin/neo4j-admin database load --overwrite-destination true --from-stdin neo4j < file.dump command.
Start the Neo4j server.
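
The three steps above can be sketched as a small dry-run helper. The function name print_load_steps is hypothetical; it only prints the commands, assuming you run them from the Neo4j home directory, so you can review them before executing anything.

```shell
#!/bin/sh
# Hypothetical helper: print the stop/load/start sequence for a dump file.
# Nothing is executed; the commands are only printed for review.
print_load_steps() {
  dump_file="$1"
  database="${2:-neo4j}"   # Community Edition replaces the default database
  printf './bin/neo4j stop\n'
  printf './bin/neo4j-admin database load --overwrite-destination true --from-stdin %s < %s\n' "$database" "$dump_file"
  printf './bin/neo4j start\n'
}

# Dry run for a dump named file.dump.
print_load_steps file.dump
```

Because --overwrite-destination true replaces the existing database, back up any data you want to keep before running the printed commands.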

Enterprise Edition (also Neo4j Desktop)

Import the file using the ./bin/neo4j-admin database load --overwrite-destination true --from-stdin <dbname> < file.dump command.
Make the new database known to the system database with CREATE DATABASE dbname, which also starts it automatically.

Some datasets might have been created with a Neo4j version older than yours. In that case, you might need to upgrade the database by using the neo4j-admin database migrate command. Note that neo4j-admin database migrate can only be run on a stopped database. For more details, see Operations manual → Tools.

Large data dumps

Stack Overflow

This is a graph import of the Stack Overflow archive with 16.4M questions, 52k tags, and 8.9M users (Stack Overflow Dump (6.2GB)). This graph is fairly large: for global graph queries, you need a page cache of 6G and a heap of 16G to work with it.

Here is an article explaining the data model and some exploratory analysis we ran on the data.

The database is also available on the demo server, as outlined above.
