LOAD CSV

LOAD CSV is used to import data from CSV files.

Introduction

  • The URL of the CSV file is specified by using FROM followed by an arbitrary expression evaluating to the URL in question.

  • It is required to specify a variable for the CSV data using AS.

  • CSV files can be stored on the database server and are then accessible using a file:/// URL. Alternatively, LOAD CSV also supports accessing CSV files via HTTPS, HTTP, and FTP.

  • LOAD CSV supports resources compressed with gzip and Deflate. Additionally LOAD CSV supports locally stored CSV files compressed with ZIP.

  • LOAD CSV will follow HTTP redirects but for security reasons it will not follow redirects that changes the protocol, for example if the redirect is going from HTTPS to HTTP.

  • LOAD CSV is often used in conjunction with the subquery CALL { …​ } IN TRANSACTIONS.

  • LOAD CSV is regulated by the load privileges.

Configuration settings for file URLs
dbms.security.allow_csv_import_from_file_urls

This setting determines if Cypher will allow the use of file:/// URLs when loading data using LOAD CSV. Such URLs identify files on the filesystem of the database server. Default is true. Setting dbms.security.allow_csv_import_from_file_urls=false will completely disable access to the file system for LOAD CSV.

server.directories.import

Sets the root directory for file:/// URLs used with the Cypher LOAD CSV clause. This should be set to a single directory relative to the Neo4j installation path on the database server. All requests to load from file:/// URLs will then be relative to the specified directory. The default value set in the config settings is import. This is a security measure which prevents the database from accessing files outside the standard import directory, similar to how a Unix chroot operates. Setting this to an empty field will allow access to all files within the Neo4j installation folder. Commenting out this setting will disable the security feature, allowing all files in the local system to be imported. This is definitely not recommended.

File URLs will be resolved relative to the server.directories.import directory. For example, a file URL will typically look like file:///myfile.csv or file:///myproject/myfile.csv.

  • When using file:/// URLs, spaces and other non-alphanumeric characters need to be URL encoded.[1]

  • If server.directories.import is set to the default value import, using the above URLs in LOAD CSV would read from <NEO4J_HOME>/import/myfile.csv and <NEO4J_HOME>/import/myproject/myfile.csv respectively.

  • If it is set to /data/csv, using the above URLs in LOAD CSV would read from <NEO4J_HOME>/data/csv/myfile.csv and <NEO4J_HOME>/data/csv/myproject/myfile.csv respectively.

The file location is relative to the import. The config setting server.directories.import only applies to local disc and not to remote URLs.

See the examples below for further details.

CSV file format

The CSV file to use with LOAD CSV must have the following characteristics:

  • the character encoding is UTF-8;

  • the end line termination is system dependent, e.g., it is \n on unix or \r\n on windows;

  • the default field terminator is ,;

  • the field terminator character can be change by using the option FIELDTERMINATOR available in the LOAD CSV command;

  • quoted strings are allowed in the CSV file and the quotes are dropped when reading the data;

  • the character for string quotation is double quote ";

  • if dbms.import.csv.legacy_quote_escaping is set to the default value of true, \ is used as an escape character;

  • a double quote must be in a quoted string and escaped, either with the escape character or a second double quote.

Import data from a CSV file

To import data from a CSV file into Neo4j, you can use LOAD CSV to get the data into your query. Then you write it to your database using the normal updating clauses of Cypher®.

artists.csv
1,ABBA,1992
2,Roxette,1986
3,Europe,1979
4,The Cardigans,1992
Query
LOAD CSV FROM 'file:///artists.csv' AS line
CREATE (:Artist {name: line[1], year: toInteger(line[2])})

A new node with the Artist label is created for each row in the CSV file. In addition, two columns from the CSV file are set as properties on the nodes.

Result
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 4
Properties set: 8
Labels added: 4

Import data from a remote CSV file

Accordingly, you can import data from a CSV file in a remote location into Neo4j. Note that this applies to all variations of CSV files (see examples below for other variations).

data.neo4j.com/bands/artists.csv
1,ABBA,1992
2,Roxette,1986
3,Europe,1979
4,The Cardigans,1992
Query
LOAD CSV FROM 'https://data.neo4j.com/bands/artists.csv' AS line
CREATE (:Artist {name: line[1], year: toInteger(line[2])})
Result
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 4
Properties set: 8
Labels added: 4

Import data from a CSV file containing headers

When your CSV file has headers, you can view each row in the file as a map instead of as an array of strings.

artists-with-headers.csv
Id,Name,Year
1,ABBA,1992
2,Roxette,1986
3,Europe,1979
4,The Cardigans,1992
Query
LOAD CSV WITH HEADERS FROM 'file:///artists-with-headers.csv' AS line
CREATE (:Artist {name: line.Name, year: toInteger(line.Year)})

This time, the file starts with a single row containing column names. Indicate this using WITH HEADERS and you can access specific fields by their corresponding column name.

Result
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 4
Properties set: 8
Labels added: 4

Import data from a CSV file with a custom field delimiter

Sometimes, your CSV file has other field delimiters than commas. You can specify which delimiter your file uses, using FIELDTERMINATOR. Hexadecimal representation of the unicode character encoding can be used if prepended by \u. The encoding must be written with four digits. For example, \u003B is equivalent to ; (SEMICOLON).

artists-fieldterminator.csv
1;ABBA;1992
2;Roxette;1986
3;Europe;1979
4;The Cardigans;1992
Query
LOAD CSV FROM 'file:///artists-fieldterminator.csv' AS line FIELDTERMINATOR ';'
CREATE (:Artist {name: line[1], year: toInteger(line[2])})

As values in this file are separated by a semicolon, a custom FIELDTERMINATOR is specified in the LOAD CSV clause.

Result
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 4
Properties set: 8
Labels added: 4

Importing large amounts of data

If the CSV file contains a significant number of rows (approaching hundreds of thousands or millions), CALL { ... } IN TRANSACTIONS can be used to instruct Neo4j to commit a transaction after a number of rows. This reduces the memory overhead of the transaction state.

The query clause CALL { …​ } IN TRANSACTIONS is only allowed in implicit (auto-commit or :auto) transactions. For more information, see Subqueries in transactions.

artists.csv
1,ABBA,1992
2,Roxette,1986
3,Europe,1979
4,The Cardigans,1992
Query
LOAD CSV FROM 'file:///artists.csv' AS line
CALL {
  WITH line
  CREATE (:Artist {name: line[1], year: toInteger(line[2])})
} IN TRANSACTIONS
Result
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 4
Properties set: 8
Labels added: 4
Transactions committed: 1

Setting the rate of CALL IN TRANSACTIONS

You can set the number of rows as in the example, where it is set to 500 rows.

artists.csv
1,ABBA,1992
2,Roxette,1986
3,Europe,1979
4,The Cardigans,1992
Query
LOAD CSV FROM 'file:///artists.csv' AS line
CALL {
  WITH line
  CREATE (:Artist {name: line[1], year: toInteger(line[2])})
} IN TRANSACTIONS OF 500 ROWS
Result
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 4
Properties set: 8
Labels added: 4
Transactions committed: 1

Import data containing escaped characters

In this example, we both have additional quotes around the values, as well as escaped quotes inside one value.

artists-with-escaped-char.csv
"1","The ""Symbol""","1992"
Query
LOAD CSV FROM 'file:///artists-with-escaped-char.csv' AS line
CREATE (a:Artist {name: line[1], year: toInteger(line[2])})
RETURN
  a.name AS name,
  a.year AS year,
  size(a.name) AS size

Note that strings are wrapped in quotes in the output here. You can see that when comparing to the length of the string in this case!

Table 1. Result
name year size

'The "Symbol"'

1992

12

Nodes created: 1
Properties set: 2
Labels added: 1

Using linenumber() with LOAD CSV

For certain scenarios, like debugging a problem with a csv file, it may be useful to get the current line number that LOAD CSV is operating on. The linenumber() function provides exactly that or null if called without a LOAD CSV context.

artists.csv
1,ABBA,1992
2,Roxette,1986
3,Europe,1979
4,The Cardigans,1992
Query
LOAD CSV FROM 'file:///artists.csv' AS line
RETURN linenumber() AS number, line
Result
+---------------------------------------+
| number | line                         |
+---------------------------------------+
| 1      | ["1","ABBA","1992"]          |
| 2      | ["2","Roxette","1986"]       |
| 3      | ["3","Europe","1979"]        |
| 4      | ["4","The Cardigans","1992"] |
+---------------------------------------+
4 rows

Using file() with LOAD CSV

For certain scenarios, like debugging a problem with a csv file, it may be useful to get the absolute path of the file that LOAD CSV is operating on. The file() function provides exactly that or null if called without a LOAD CSV context.

artists.csv
1,ABBA,1992
2,Roxette,1986
3,Europe,1979
4,The Cardigans,1992
Query
LOAD CSV FROM 'file:///artists.csv' AS line
RETURN DISTINCT file() AS path

Since LOAD CSV can temporary download a file to process it, it is important to note that file() will always return the path on disk. If LOAD CSV is invoked with a file:/// URL that points to your disk file() will return that same path.

Result
+------------------------------------------+
| path                                     |
+------------------------------------------+
| "/home/example/neo4j/import/artists.csv" |
+------------------------------------------+
1 row