Example datasets
Here you can find a list of available example datasets for Neo4j and learn how to import and explore them.
Datasets
For getting started with Neo4j, it’s helpful to use example datasets relevant to your domain and use case. For each we want to provide a description, the graph model and some use case queries.
Built-in examples
Neo4j Browser comes with two built-in databases, which you can create and explore using interactive slideshows.
The "Movies" example is launched via the :guide movie-graph command and contains a small graph of movies and people related to those movies as actors, directors, producers, etc.
The "Northwind" example is run via the :guide northwind-graph command and contains a traditional retail system with products, orders, customers, suppliers, and employees.
It walks you through the import of the data and increasingly complex queries using the available data.
Other Browser guides
Other example datasets that you can run within your own Neo4j Browser are:
-
Game of Thrones Interactions —
:play got
-
UK company registration, property ownership, political donations —
:play ukcompanies
-
Stack Overflow users, tags and Q&A data —
:play stackoverflow
-
BBC Good Foods recipe data —
:play recipes
-
Airbnb listings data —
:play listings
-
Football (Soccer) transfer data —
:play football_transfers
AuraDB Free example datasets
When creating a new database in Neo4j AuraDB, besides the default empty database, you can also select one of the starter datasets:
-
Movies
-
Graph based Recommendations
-
Graphs for Cybersecurity
-
StackOverflow Data
You can explore them by following the Browser guide instructions and test the data with the suggested Cypher® queries.
In addition, you have a few options to load graph data into Aura from other Neo4j instances:
-
Load a dump from Neo4j Sandbox backup.
-
Load a dump from Neo4j Graph Example repository.
-
Load a dump from Neo4j Desktop.
For more information, you can read the blog post Week 10 — Getting Dumps and Example Projects into Aura Free and watch the corresponding video from the series Discover Neo4j Aura Free with Michael and Alex.
Neo4j Sandbox
To explore a wide variety of datasets in an online setup without a local installation, you can use the Neo4j Sandbox.
Each sandbox is available for at least three days after creation and can also be remotely accessed from applications using any Neo4j driver.
Except for the "blank" sandbox, all sandboxes come prepopulated with domain data and focus on use-case-specific queries.
All sandboxes provide access to Neo4j Browser, Neo4j Bloom, APOC, Graph Data Science, neosemantics (n10s) and a GraphQL integration.
Neo4j Graph Example repository on GitHub
The data, Browser guides, code examples (JavaScript, Java, Python, Go, C#), Cypher queries, and Bloom perspectives for each sandbox are all available in the GitHub repository.
The use cases include:
-
social networks optionally using your own Twitter account (Repo).
Neo4j dataset demo server
Access information
To explore more graph datasets, you can access the demo server at https://demo.neo4jlabs.com:7473
This server hosts a number of datasets with read-only access for public use.
The username and password are the same as the database name.
For instance, for the recommendations database, the username is recommendations and the password is recommendations too.
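The naming rule above can be written down in a few lines of Python. This is only a sketch: the helper demo_credentials and the DemoLogin type are hypothetical names introduced for illustration, and the code derives the login from the database name without opening a connection.

```python
from typing import NamedTuple

# Address taken from the access information above.
DEMO_SERVER_URL = "https://demo.neo4jlabs.com:7473"


class DemoLogin(NamedTuple):
    """Login details for one dataset on the demo server."""
    url: str
    database: str
    username: str
    password: str


def demo_credentials(database: str) -> DemoLogin:
    """Derive the read-only login from the database name.

    Hypothetical helper: on the demo server, the username and the
    password are both identical to the database name.
    """
    name = database.strip()
    if not name:
        raise ValueError("database name must not be empty")
    return DemoLogin(DEMO_SERVER_URL, name, name, name)


if __name__ == "__main__":
    print(demo_credentials("recommendations"))
```

The returned values can be typed into the login form or passed to whichever client you use to connect.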
Means of data import
Loading data from source data
The most reliable way to get a dataset into Neo4j is to import it from the raw sources. Then you are independent of database versions, which you otherwise might have to upgrade. That’s why we provided raw data (CSV, JSON, XML) for several of the datasets, accompanied by import scripts in Cypher.
You can run the Cypher script using a command-line client like cypher-shell:
./bin/cypher-shell -u neo4j -p "password" -f import-file.cypher
You can also drag and drop or paste the script into Neo4j Browser (check that the multi-statement editor is enabled in the settings) and run it from there.
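If the import is part of a larger setup script, the cypher-shell call can be wrapped in Python. The following is a minimal sketch, assuming cypher-shell lives under ./bin as in the command above; the function names are hypothetical, and only the -u, -p, and -f flags already shown are used.

```python
import shlex
import subprocess
from typing import List


def cypher_shell_command(script: str, user: str, password: str,
                         executable: str = "./bin/cypher-shell") -> List[str]:
    """Build the argument list for the cypher-shell call shown above."""
    return [executable, "-u", user, "-p", password, "-f", script]


def run_import(script: str, user: str, password: str) -> None:
    """Run the script; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(cypher_shell_command(script, user, password), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this works without Neo4j.
    command = cypher_shell_command("import-file.cypher", "neo4j", "password")
    print(shlex.join(command))
```

Passing the arguments as a list avoids shell quoting problems when the password contains special characters.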
CSV data can be imported using either the LOAD CSV clause in Cypher or neo4j-admin database import for initial bulk imports of large datasets.
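To give an idea of what a LOAD CSV import looks like, the sketch below assembles a small statement in Python. The file name products.csv, the Product label, and the productID key are illustrative assumptions and not part of any dataset described here; the helper name is hypothetical as well. The statement assumes the file has a header row and is located in the import directory of your Neo4j installation.

```python
def load_csv_statement(file_name: str, label: str, key: str) -> str:
    """Return a LOAD CSV statement that merges one node per CSV row.

    Assumes a header row and a file reachable through a file:/// URL.
    All values read by LOAD CSV arrive as strings.
    """
    if "'" in file_name:
        raise ValueError("file name must not contain a single quote")
    for identifier in (label, key):
        if not identifier.isidentifier():
            raise ValueError(f"not a plain identifier: {identifier!r}")
    return (
        f"LOAD CSV WITH HEADERS FROM 'file:///{file_name}' AS row\n"
        f"MERGE (n:{label} {{{key}: row.{key}}})\n"
        f"SET n += row"
    )


if __name__ == "__main__":
    print(load_csv_statement("products.csv", "Product", "productID"))
```

The generated text can be saved to a file and run with cypher-shell or pasted into Neo4j Browser. MERGE on a key column keeps the import repeatable, since running it twice does not duplicate nodes.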
To load JSON or XML files, you need to have the APOC Core library installed, which also comes with a number of procedures for importing data from other databases.
To load XLS files, you can use the APOC Extended library. Note that the APOC Extended library is not officially supported.
Using a dump of a Neo4j database
Other datasets are provided as a dump of a Neo4j datastore. Follow the link http://github.com/neo4j-graph-examples to find dump files for many graph example datasets.
To replace the default neo4j database:
-
Stop your Neo4j server.
-
Import the file using the
./bin/neo4j-admin database load --overwrite-destination true --from-stdin neo4j < file.dump
command.
-
Start the Neo4j server.
To load the dump into an additional database (multiple databases require Enterprise Edition):
-
Import the file using the
./bin/neo4j-admin database load --overwrite-destination true --from-stdin <dbname> < file.dump
command.
-
Make the new database known to the system database with
CREATE DATABASE dbname
which will also automatically start it.
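The load commands above can be scripted as follows. This is a sketch under stated assumptions: neo4j-admin lives under ./bin, the database name is a plain alphanumeric identifier, and the function names are hypothetical. Only the flags already shown above are used, and the dump file is fed to the command on standard input, mirroring the < file.dump redirection.

```python
import subprocess
from typing import List


def load_dump_command(database: str = "neo4j",
                      executable: str = "./bin/neo4j-admin") -> List[str]:
    """Argument list for the load command above; the dump is read from stdin."""
    return [executable, "database", "load",
            "--overwrite-destination", "true", "--from-stdin", database]


def create_database_statement(database: str) -> str:
    """Cypher that registers (and starts) a freshly loaded database."""
    if not database.isalnum():
        raise ValueError("this sketch only handles plain alphanumeric names")
    return f"CREATE DATABASE {database}"


def load_dump(dump_path: str, database: str = "neo4j") -> None:
    """Pipe the dump file into neo4j-admin, like `< file.dump` in a shell."""
    with open(dump_path, "rb") as dump:
        subprocess.run(load_dump_command(database), stdin=dump, check=True)


if __name__ == "__main__":
    print(" ".join(load_dump_command("dbname")))
    print(create_database_statement("dbname"))
```

The CREATE DATABASE statement still has to be run against the system database, for example through cypher-shell or Neo4j Browser.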
The Neo4j version of some of the datasets might be older than your Neo4j version.
In that case, you might need to configure Neo4j to upgrade your database.
Large data dumps
Stack Overflow
This is a graph import of the Stack Overflow archive with 16.4M questions, 52k tags, and 8.9M users (Stack Overflow Dump (6.2GB)). This graph is pretty big: for global graph queries you’d need a page cache of 6G and a heap of 16G to work with it.
Here is an article explaining the data model and some exploratory analysis we ran on the data.
The database is available in the Demo Server as outlined above.
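If you run the Stack Overflow dump locally, the memory figures above translate into neo4j.conf entries. The sketch below renders them from Python; it assumes the Neo4j 5 setting names (server.memory.*), whereas Neo4j 4.x uses the dbms.memory.* prefix instead. The helper name is hypothetical.

```python
def memory_settings(pagecache_gb: int, heap_gb: int) -> str:
    """Render neo4j.conf lines for the given page cache and heap sizes.

    Assumes Neo4j 5 setting names; adjust the prefix for older versions.
    """
    if pagecache_gb <= 0 or heap_gb <= 0:
        raise ValueError("sizes must be positive")
    return "\n".join([
        f"server.memory.pagecache.size={pagecache_gb}g",
        f"server.memory.heap.initial_size={heap_gb}g",
        f"server.memory.heap.max_size={heap_gb}g",
    ])


if __name__ == "__main__":
    # Figures from the text: 6G page cache, 16G heap.
    print(memory_settings(6, 16))
```

Setting the initial and maximum heap to the same value is a common choice to avoid heap resizing at runtime; treat it as a starting point rather than a tuned configuration.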