arangoimport Options
Usage: arangoimport [<options>]
General
--auto-rate-limit
Introduced in: v3.7.11
Type: boolean
Adjust the data loading rate automatically, starting at --batch-size bytes per thread per second.
This option can be specified without a value to enable it.
Default: false
--backslash-escape
Type: boolean
Use backslash as the escape character for quotes. Used for CSV and TSV imports.
This option can be specified without a value to enable it.
Default: false
--batch-size
Type: uint64
The size for individual data batches (in bytes).
Default: 8388608
--check-configuration
Type: boolean
Check the configuration and exit.
This is a command, no value needs to be specified. The process terminates after executing the command.
--collection
Type: string
The name of the collection to import into.
Default: ""
--config
Type: string
The configuration file or "none".
Default: ""
--configuration
Type: string
The configuration file or "none".
Default: ""
--convert
Type: boolean
Convert the strings null, false, true, and strings containing numbers into non-string types. For CSV and TSV only.
This option can be specified without a value to enable it.
Default: true
--create-collection
Type: boolean
Create the collection if it does not exist yet.
This option can be specified without a value to enable it.
Default: false
--create-collection-type
Type: string
The type of the collection if it needs to be created (edge or document).
Default: "document"
Possible values: “document”, “edge”
--create-database
Type: boolean
Create the target database if it does not exist.
This option can be specified without a value to enable it.
Default: false
--datatype
Introduced in: v3.9.0
Type: string…
Force a specific datatype for an attribute (null/boolean/number/string) using the syntax "attribute=type". For CSV and TSV only. Takes precedence over --convert.
Default: []
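How --convert and --datatype could act on a single CSV/TSV field is sketched below in Python. This is a minimal model of the rules stated above, not arangoimport's parser. The function name convert_field is hypothetical. Three things are assumptions: "containing numbers" is read as JSON number syntax, values that do not fit a forced type are rejected, and only the spellings true and false are accepted for boolean.

```python
import re
from typing import Optional, Union

Value = Union[None, bool, int, float, str]

# Assumption: "numeric" means JSON number syntax (no leading zeros, no "nan"/"inf").
_JSON_NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")
_LITERALS = {"null": None, "false": False, "true": True}


def _parse_number(raw: str) -> Optional[Union[int, float]]:
    """Return raw as int or float if it is a JSON-style number, else None."""
    if not _JSON_NUMBER.fullmatch(raw):
        return None
    return float(raw) if any(c in raw for c in ".eE") else int(raw)


def convert_field(raw: str, convert: bool = True, forced_type: Optional[str] = None) -> Value:
    """Hypothetical model of --convert and --datatype for a single CSV/TSV value."""
    if forced_type is not None:  # --datatype takes precedence over --convert
        if forced_type == "string":
            return raw
        if forced_type == "null":
            return None
        if forced_type == "number":
            number = _parse_number(raw)
            if number is None:  # assumption: this sketch rejects non-numeric input
                raise ValueError(f"not a number: {raw!r}")
            return number
        if forced_type == "boolean":
            if raw not in ("true", "false"):  # assumption: only these spellings accepted
                raise ValueError(f"not a boolean: {raw!r}")
            return raw == "true"
        raise ValueError(f"unknown datatype: {forced_type!r}")
    if not convert:
        return raw
    if raw in _LITERALS:
        return _LITERALS[raw]
    number = _parse_number(raw)
    return raw if number is None else number
```

For example, with the default --convert true, the field 42 becomes the number 42, while --datatype zip=string would keep a zip column such as 01234 as a string either way.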
--define
Type: string…
Define a value for a @key@ entry in the configuration file using the syntax "key=value".
Default: []
--dump-dependencies
Type: boolean
Dump the dependency graph of the feature phases (internal) and exit.
This is a command, no value needs to be specified. The process terminates after executing the command.
--dump-options
Type: boolean
Dump all available startup options in JSON format and exit.
This is a command, no value needs to be specified. The process terminates after executing the command.
--file
Type: string
The file to import ("-" for stdin).
Default: ""
--from-collection-prefix
Type: string
The collection name prefix to prepend to all values in the _from attribute.
Default: ""
--headers-file
Introduced in: v3.8.0
Type: string
The file to read the CSV or TSV header from. If specified, no header is expected in the regular input file.
Default: ""
--ignore-missing
Type: boolean
Ignore missing columns in CSV and TSV input.
This option can be specified without a value to enable it.
Default: false
--latency
Type: boolean
Show 10-second latency statistics (values in microseconds).
This option can be specified without a value to enable it.
Default: false
--log
Deprecated in: v3.5.0
Type: string…
Set the topic-specific log level, using --log level for the general topic or --log topic=level for the specified topic (can be specified multiple times). Available log levels: fatal, error, warning, info, debug, trace.
Default: ["info"]
--merge-attributes
Introduced in: v3.9.1
Type: string…
Merge attributes into a new document attribute, for example "mergedAttribute=[someAttribute]-[otherAttribute]". For CSV and TSV only.
Default: []
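The template syntax can be read as follows: each name in square brackets is replaced by the value of that source attribute, and all other characters (such as the - in the example) are copied literally. A minimal Python sketch under that reading follows. The function name merge_attributes is hypothetical, and treating a missing source attribute as an empty string is an assumption.

```python
import re

# A placeholder is an attribute name enclosed in square brackets.
_PLACEHOLDER = re.compile(r"\[([^\[\]]+)\]")


def merge_attributes(row: dict, spec: str) -> dict:
    """Apply one "target=template" spec to a row (hypothetical model of --merge-attributes)."""
    target, sep, template = spec.partition("=")
    if not sep or not target:
        raise ValueError(f"expected 'attribute=template', got {spec!r}")

    def substitute(match):
        # Assumption: a missing source attribute contributes an empty string.
        return str(row.get(match.group(1), ""))

    merged = dict(row)  # the source attributes are kept
    merged[target] = _PLACEHOLDER.sub(substitute, template)
    return merged
```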
--on-duplicate
Type: string
The action to perform when a unique key constraint violation occurs. Possible values: ignore, replace, update, error.
Default: "error"
Possible values: “error”, “ignore”, “replace”, “update”
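The difference between the four actions can be illustrated with an in-memory stand-in for a collection keyed by _key. This Python model is a simplification, not how arangod processes imports. The names import_document and DuplicateKeyError are hypothetical. update is shown as a shallow merge, whereas the server may treat nested objects differently.

```python
class DuplicateKeyError(Exception):
    """Hypothetical stand-in for a unique constraint violation reported as an error."""


def import_document(collection: dict, doc: dict, on_duplicate: str = "error") -> None:
    """Insert doc into collection (a dict keyed by _key), resolving key conflicts."""
    key = doc["_key"]
    if key not in collection:
        collection[key] = dict(doc)
        return
    if on_duplicate == "error":
        raise DuplicateKeyError(key)
    if on_duplicate == "ignore":
        return  # the existing document stays unchanged
    if on_duplicate == "replace":
        collection[key] = dict(doc)  # attributes missing from doc are dropped
    elif on_duplicate == "update":
        collection[key] = {**collection[key], **doc}  # existing attributes are kept
    else:
        raise ValueError(f"unknown action: {on_duplicate!r}")
```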
--overwrite
Type: boolean
Overwrite the collection if it exists. WARNING: This removes any data from the collection!
This option can be specified without a value to enable it.
Default: false
--overwrite-collection-prefix
Type: boolean
If the collection name is already prefixed, overwrite the prefix. Only useful in combination with --from-collection-prefix or --to-collection-prefix.
This option can be specified without a value to enable it.
Default: false
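One way to read the three prefix options together is sketched below in Python. It assumes that the prefix is a collection name joined to the value with a slash, and that a value which already contains a slash is left unchanged unless --overwrite-collection-prefix is enabled. Both are assumptions drawn from the descriptions above, and apply_prefix is a hypothetical name.

```python
def apply_prefix(value: str, prefix: str, overwrite: bool = False) -> str:
    """Hypothetical model of prefixing a _from/_to value with a collection name."""
    if not prefix:
        return value  # no prefix configured
    _, sep, key = value.partition("/")
    if not sep:
        return f"{prefix}/{value}"  # plain key: prepend the collection name
    if overwrite:
        return f"{prefix}/{key}"  # already prefixed: replace the collection name
    return value  # already prefixed and overwriting disabled: keep as-is
```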
--progress
Type: boolean
Show the progress.
This option can be specified without a value to enable it.
Default: true
--quote
Type: string
Quote character(s). Used for CSV and TSV.
Default: "\""
--remove-attribute
Type: string…
Remove an attribute before inserting documents into the collection. For CSV, TSV, and JSON only.
Default: []
--separator
Type: string
The field separator. Used for CSV and TSV imports. Defaults to a comma (CSV) or a tabulation character (TSV).
Default: dynamic (a comma for CSV, a tab character for TSV)
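--separator, --quote, and --backslash-escape choose between two common conventions for quote characters inside quoted fields: doubling the quote ("") or prefixing it with a backslash (\"). Python's standard csv module supports both, so it serves here as an analogy; it is not arangoimport's parser, and parse_line is a hypothetical helper name.

```python
import csv
import io


def parse_line(line: str, separator: str = ",", quote: str = '"', backslash_escape: bool = False) -> list:
    """Split one CSV/TSV line with Python's csv module (an analogy, not arangoimport)."""
    options = {"delimiter": separator, "quotechar": quote}
    if backslash_escape:
        # \" inside a quoted field stands for a literal quote character.
        options.update(escapechar="\\", doublequote=False)
    else:
        # "" inside a quoted field stands for a literal quote character.
        options.update(doublequote=True)
    return next(csv.reader(io.StringIO(line), **options))
```

Both parse_line('1,"say ""hi""",x') and parse_line('1,"say \\"hi\\"",x', backslash_escape=True) yield the same three fields, with say "hi" as the middle one.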
--skip-lines
Type: uint64
The number of lines to skip at the beginning of the input file. For CSV and TSV only.
Default: 0
--skip-validation
Introduced in: v3.7.0
Type: boolean
Skip document schema validation during import.
This option can be specified without a value to enable it.
Default: false
--threads
Type: uint32
Number of parallel import threads.
Default: dynamic (e.g. 32)
--to-collection-prefix
Type: string
The collection name prefix to prepend to all values in the _to attribute.
Default: ""
--translate
Type: string…
Translate an attribute name using the syntax "from=to". For CSV and TSV only.
Default: []
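Each --translate value maps a column name in the input to an attribute name in the stored documents, for example from=_from and to=_to for edge data. The Python sketch below applies such "from=to" specs to a header row. translate_headers is a hypothetical name, and the sketch only illustrates the syntax.

```python
def translate_headers(headers: list, translations: list) -> list:
    """Rename header columns according to "from=to" specs (hypothetical model of --translate)."""
    mapping = {}
    for spec in translations:
        source, sep, target = spec.partition("=")
        if not sep or not source or not target:
            raise ValueError(f"expected 'from=to', got {spec!r}")
        mapping[source] = target
    # Columns without a translation keep their names.
    return [mapping.get(name, name) for name in headers]


print(translate_headers(["id", "from", "to"], ["from=_from", "to=_to"]))  # ['id', '_from', '_to']
```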
--type
Type: string
The format of the import file.
Default: "json"
Possible values: “auto”, “csv”, “json”, “jsonl”, “tsv”
--use-splice-syscall
Introduced in: v3.9.4
Type: boolean
Use the splice() syscall for file copying (may not be supported on all filesystems).
This option can be specified without a value to enable it.
Default: true
While the syscall is generally available since Linux 2.6.x, it is also required that the underlying filesystem supports the splice operation. This is not true for some encrypted filesystems (e.g. ecryptfs), on which splice() calls can fail.
You can set the --use-splice-syscall startup option to false to use a less efficient, but more portable file copying method instead, which should work on all filesystems.
--version
Type: boolean
Print the version and other related information, then exit.
This is a command, no value needs to be specified. The process terminates after executing the command.
--version-json
Introduced in: v3.9.0
Type: boolean
Print the version and other related information in JSON format, then exit.
This is a command, no value needs to be specified. The process terminates after executing the command.
Encryption
--encryption.key-generator
Enterprise Edition only
Type: string
A program providing the encryption key on stdout. If set, encryption at rest is enabled.
Default: ""
The program must output 32 bytes of data on the standard output and exit.
--encryption.keyfile
Enterprise Edition only
Type: string
The path to the file that contains the encryption key. Must contain 32 bytes of data. If set, encryption at rest is enabled.
Default: ""
You must secure the encryption key file so that only arangodump, arangorestore, and arangod can access it. You should also ensure that the file is not readable if someone steals your hardware, for example, by encrypting /mytmpfs or creating an in-memory file-system under /mytmpfs.
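A key file must contain exactly 32 bytes. The minimal Python sketch below creates one with owner-only permissions. The path used in the usage note is a hypothetical example, and a tool such as openssl rand can serve the same purpose. Create the key once and keep it, because data encrypted with one key cannot be read with another.

```python
import os


def create_keyfile(path: str) -> None:
    """Write 32 cryptographically secure random bytes to a new owner-only file."""
    key = os.urandom(32)
    # O_EXCL makes the call fail instead of silently overwriting an existing key.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as keyfile:
        keyfile.write(key)
```

Usage: create_keyfile("/mytmpfs/arangodb.key"), then pass that path to --encryption.keyfile.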
Log
--log.color
Type: boolean
Use colors for TTY logging.
This option can be specified without a value to enable it.
Default: dynamic (e.g. true)
--log.escape-control-chars
Introduced in: v3.9.0
Type: boolean
Escape control characters in log messages.
This option can be specified without a value to enable it.
Default: true
This option applies to the control characters that have hex codes below \x20, and also to the character DEL with hex code \x7f.
If you set this option to false, control characters are retained when they have a visible representation, and replaced with a space character when they do not. For example, the control character \n is visible, so a \n is displayed in the log. In contrast, the control character BEL is not visible, so a space is displayed instead.
If you set this option to true, the hex code for the character is displayed. For example, the BEL character is displayed as \x07.
The default value for this option is true to ensure compatibility with previous versions.
A side effect of turning off the escaping is that it reduces the CPU overhead for the logging. However, this is only noticeable if logging is set to a very verbose level (e.g. debug or trace).
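The two behaviors can be modeled roughly as below. This Python sketch encodes only the rules stated above. Which control characters count as having a visible representation is not spelled out, so treating just newline and tab as visible is an assumption. Applying the \xNN form to every control character when escaping is enabled is also an assumption. render_log_message is a hypothetical name.

```python
_VISIBLE_CONTROLS = {"\n", "\t"}  # assumption: control characters treated as visible


def render_log_message(message: str, escape_control_chars: bool = True) -> str:
    """Hypothetical model of --log.escape-control-chars."""
    out = []
    for ch in message:
        code = ord(ch)
        if code < 0x20 or code == 0x7F:
            if escape_control_chars:
                out.append(f"\\x{code:02x}")  # e.g. BEL (0x07) becomes \x07
            elif ch in _VISIBLE_CONTROLS:
                out.append(ch)  # visible: retained as-is
            else:
                out.append(" ")  # not visible: replaced with a space
        else:
            out.append(ch)
    return "".join(out)
```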
--log.escape-unicode-chars
Introduced in: v3.9.0
Type: boolean
Escape Unicode characters in log messages.
This option can be specified without a value to enable it.
Default: false
If you set this option to false, Unicode characters are retained and written to the log as-is. For example, 犬 is logged as 犬.
If you set this option to true, any Unicode characters are escaped, and the hex codes for all Unicode characters are logged instead. For example, 犬 is logged as \u72AC.
The default value for this option is false for compatibility with previous versions.
A side effect of turning off the escaping is that it reduces the CPU overhead for the logging. However, this is only noticeable if logging is set to a very verbose level (e.g. debug or trace).
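A rough Python model of the escaping follows. It assumes that every non-ASCII character is affected. It also assumes that characters outside the Basic Multilingual Plane are written as two UTF-16 surrogate escapes, as in JSON. Both are assumptions, and escape_unicode is a hypothetical name.

```python
def escape_unicode(message: str) -> str:
    """Replace every non-ASCII character with \\uXXXX escapes (hypothetical model)."""
    out = []
    for ch in message:
        if ord(ch) < 0x80:
            out.append(ch)
            continue
        # Encode as UTF-16 so code points above U+FFFF become two surrogate escapes.
        utf16 = ch.encode("utf-16-be")
        for i in range(0, len(utf16), 2):
            unit = int.from_bytes(utf16[i:i + 2], "big")
            out.append(f"\\u{unit:04X}")
    return "".join(out)


print(escape_unicode("\u72ac"))  # prints \u72AC (the character 犬)
```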
--log.file
Type: string
Shortcut for '--log.output file://<filename>'
Default: "-"
--log.file-group
Introduced in: v3.4.5
Type: string
The group to use for new log files. The user must be a member of this group.
Default: ""
--log.file-mode
Introduced in: v3.4.5
Type: string
The file mode to use for new log files. The umask is applied as well.
Default: ""
--log.force-direct
Type: boolean
Do not start a separate thread for logging.
This option can be specified without a value to enable it.
Default: false
You can use this option to disable logging in an extra logging thread. If set to true, any log messages are immediately printed in the thread that triggered the log message. This is non-optimal for performance but can aid debugging. If set to false, log messages are handed off to an extra logging thread, which asynchronously writes the log messages.
--log.foreground-tty
Type: boolean
Also log to TTY if backgrounded.
This option can be specified without a value to enable it.
Default: dynamic (e.g. false)
--log.hostname
Introduced in: v3.8.0
Type: string
The hostname to use in log messages. Leave empty for none, or use "auto" to determine a hostname automatically.
Default: ""
You can specify a hostname to be logged at the beginning of each log message (for regular logging) or inside the hostname attribute (for JSON-based logging).
The default value is an empty string, meaning no hostname is logged.
If you set this option to auto, the hostname is automatically determined.
--log.ids
Introduced in: v3.5.0
Type: boolean
Log unique message IDs.
This option can be specified without a value to enable it.
Default: true
Each log invocation in the ArangoDB source code contains a unique log ID, which can be used to quickly find the location in the source code that produced a specific log message.
Log IDs are printed as 5-digit hexadecimal identifiers in square brackets between the log level and the log topic:
2020-06-22T21:16:48Z [39028] INFO [144fe] {general} using storage engine 'rocksdb'
(where 144fe is the log ID).
--log.level
Type: string…
Set the topic-specific log level, using --log.level level for the general topic or --log.level topic=level for the specified topic (can be specified multiple times).
Available log levels: fatal, error, warning, info, debug, trace.
Available log topics: all, agency, agencycomm, agencystore, aql, audit-authentication, audit-authorization, audit-collection, audit-database, audit-document, audit-hotbackup, audit-service, audit-view, authentication, authorization, backup, bench, cache, cluster, clustercomm, collector, communication, config, crash, development, dump, engines, flush, general, graphs, heartbeat, httpclient, ldap, license, maintenance, memory, mmap, performance, pregel, queries, rep-state, replication, replication2, requests, restore, rocksdb, security, ssl, startup, statistics, supervision, syscall, threads, trx, ttl, v8, validation, views.
Default: ["info"]
ArangoDB’s log output is grouped by topics. --log.level can be specified multiple times at startup, for as many topics as needed. The log verbosity and output files can be adjusted per log topic.
arangod --log.level all=warning --log.level queries=trace --log.level startup=trace
This sets a global log level of warning and two topic-specific levels (trace for queries and trace for startup). Note that --log.level warning does not set a log level globally for all existing topics, but only for the general topic. Use the pseudo-topic all to set a global log level.
The same in a configuration file:
[log]
level = all=warning
level = queries=trace
level = startup=trace
The available log levels are:
fatal: Only log fatal errors.
error: Only log errors.
warning: Only log warnings and errors.
info: Log information messages, warnings, and errors.
debug: Log debug and information messages, warnings, and errors.
trace: Log trace, debug, and information messages, warnings, and errors.
Note that the debug and trace levels are very verbose.
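The rules above can be modeled in Python as follows: a bare level applies only to the general topic, the pseudo-topic all applies to every topic, and a message is logged if its level is at least as severe as the level configured for its topic. The function names are hypothetical, and the topic list is a small subset for illustration. Two further points are assumptions: every topic starts at info, and settings are applied in the order given, so later ones override earlier ones.

```python
LEVELS = ["fatal", "error", "warning", "info", "debug", "trace"]  # most to least severe


def resolve_levels(settings: list, topics: list) -> dict:
    """Turn --log.level values into a per-topic level map (hypothetical model)."""
    levels = {topic: "info" for topic in topics}  # assumption: every topic starts at info
    for setting in settings:
        topic, sep, level = setting.partition("=")
        if not sep:
            # A bare level such as "warning" only affects the general topic.
            topic, level = "general", setting
        if level not in LEVELS:
            raise ValueError(f"unknown log level: {level!r}")
        if topic == "all":
            levels = {name: level for name in levels}
        else:
            levels[topic] = level
    return levels


def is_logged(levels: dict, topic: str, message_level: str) -> bool:
    """A message is logged if it is at least as severe as its topic's level."""
    return LEVELS.index(message_level) <= LEVELS.index(levels[topic])
```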
Some relevant log topics available in ArangoDB 3 are:
agency: Information about the cluster Agency.
performance: Performance-related messages.
queries: Executed AQL queries, slow queries.
replication: Replication-related information.
requests: HTTP requests.
startup: Information about server startup and shutdown.
threads: Information about threads.
You can adjust the log levels at runtime via the PUT /_admin/log/level HTTP API endpoint.
Audit logging (Enterprise Edition): The server logs all audit events by default. Low-priority events, such as statistics operations, are logged with the debug log level. To keep such events from cluttering the log, set the appropriate log topics to the info log level.
--log.line-number
Type: boolean
Include the function name, file name, and line number of the source code that issues the log message. Format: [func@FileName.cpp:123]
This option can be specified without a value to enable it.
Default: false
--log.max-entry-length
Introduced in: v3.7.9
Type: uint32
The maximum length of a log entry (in bytes).
Default: 134217728
Note: This option does not apply to audit log messages. See --audit.max-entry-length instead.
Any log messages longer than the specified value are truncated, and the suffix ... is added to them.
The purpose of this option is to shorten long log messages in case there is not a lot of space for log files, and to keep rogue log messages from overusing resources.
The default value is 128 MiB, which is very high and effectively preserves compatibility with previous arangod versions, which did not restrict the maximum size of log messages.
--log.output
Type: string…
Log destination(s), e.g. file:///path/to/file (any occurrence of $PID is replaced with the process ID).
Default: []
This option allows you to direct the global or per-topic log messages to different outputs. The output definition can be one of the following:
"-" for stdout
"+" for stderr
syslog://<syslog-facility>
syslog://<syslog-facility>/<application-name>
file://<relative-or-absolute-path>
To set up a per-topic output configuration, use --log.output <topic>=<definition>:
--log.output queries=file://queries.log
The above example logs query-related messages to the file queries.log.
You can specify the option multiple times in order to configure the output for different log topics:
--log.level queries=trace --log.output queries=file:///queries.log
--log.level requests=info --log.output requests=file:///requests.log
The above example logs all query-related messages to the file queries.log and HTTP requests with a level of info or higher to the file requests.log.
Any occurrence of $PID in the log output value is replaced at runtime with the actual process ID. This enables logging to process-specific files:
--log.output 'file://arangod.log.$PID'
Note that the dollar sign may need extra escaping when specified on a command line in a shell such as Bash.
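The Python sketch below shows how a --log.output value can be split into an optional topic and an output definition, with $PID expanded. The function name is hypothetical. How a topic prefix is told apart from a bare definition is an assumption: here, a bare definition is one that starts with -, +, syslog://, or file://, as listed above.

```python
import os
from typing import Optional, Tuple

# Bare output definitions as listed above; anything else is read as "<topic>=<definition>".
_DEFINITION_STARTS = ("-", "+", "syslog://", "file://")


def parse_log_output(value: str, pid: Optional[int] = None) -> Tuple[Optional[str], str]:
    """Split a --log.output value into (topic or None, definition) and expand $PID."""
    if value.startswith(_DEFINITION_STARTS):
        topic, definition = None, value
    else:
        topic, sep, definition = value.partition("=")
        if not sep:
            raise ValueError(f"unrecognized log output: {value!r}")
    if pid is None:
        pid = os.getpid()
    return topic, definition.replace("$PID", str(pid))
```

In Bash, the single quotes in 'file://arangod.log.$PID' keep the shell from expanding $PID itself, so the literal text reaches the program.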
If you specify --log.file-mode <octalvalue>, then any newly created log file uses octalvalue as file mode. Please note that the umask value is applied as well.
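The umask clears permission bits from the requested mode, so the effective mode is the requested mode with every bit that is set in the umask removed. A short Python illustration follows, with a hypothetical helper name:

```python
def effective_mode(octal_value: str, umask: int) -> int:
    """Mode a new log file ends up with for --log.file-mode octal_value and the given umask."""
    requested = int(octal_value, 8)  # e.g. "640" -> 0o640
    return requested & ~umask & 0o7777


print(oct(effective_mode("666", 0o022)))  # 0o644
print(oct(effective_mode("640", 0o027)))  # 0o640
```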
If you specify --log.file-group <name>, then any newly created log file tries to use <name> as the group name. Note that you have to be a member of that group. Otherwise, the group ownership is not changed. This option is only available under Linux and macOS. It is not available under Windows.
The old --log.file option is still available for convenience. It is a shortcut for the more general option --log.output file://filename.
The old --log.requests-file option is still available. It is a shortcut for the more general option --log.output requests=file://...
--log.performance
Deprecated in: v3.5.0
Type: boolean
Shortcut for --log.level performance=trace.
This option can be specified without a value to enable it.
Default: false
--log.prefix
Type: string
Prefix log message with this string.
Default: ""
Example: arangod ... --log.prefix "-->"
2020-07-23T09:46:03Z --> [17493] INFO ...
--log.process
Introduced in: v3.8.0
Type: boolean
Show the process identifier (PID) in log messages.
This option can be specified without a value to enable it.
Default: true
--log.request-parameters
Type: boolean
Include full URLs and HTTP request parameters in trace logs.
This option can be specified without a value to enable it.
Default: true
--log.role
Type: boolean
Log the server role.
This option can be specified without a value to enable it.
Default: false
If you set this option to true, log messages contain a single character with the server’s role. The roles are:
U: Undefined / unclear (used at startup)
S: Single server
C: Coordinator
P: Primary / DB-Server
A: Agent
--log.shorten-filenames
Type: boolean
Shorten filenames in log output (use with --log.line-number)
This option can be specified without a value to enable it.
Default: true
--log.structured-param
Introduced in: v3.10.0
Type: string…
Toggle the usage of the log category parameter in structured log messages.
Default: []
Some log messages can be displayed together with additional information in a structured form. The following parameters are available:
database: The name of the database.
username: The name of the user.
url: The endpoint path.
pregelID: The ID of the Pregel job.
The format to enable or disable a parameter is <parameter>=<bool>, or <parameter> to enable it. You can specify the option multiple times to configure multiple parameters:
arangod --log.structured-param database=true --log.structured-param url --log.structured-param username=false
You can adjust the parameter settings at runtime using the /_admin/log/structured HTTP API.
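The Python sketch below folds several --log.structured-param values, in the <parameter>=<bool> or bare <parameter> forms, into one settings map. The function name is hypothetical. Accepting only true and false as boolean spellings is an assumption.

```python
STRUCTURED_PARAMS = {"database", "username", "url", "pregelID"}


def parse_structured_params(values: list) -> dict:
    """Fold --log.structured-param values into {parameter: enabled} (hypothetical model)."""
    settings = {}
    for value in values:
        name, sep, flag = value.partition("=")
        if name not in STRUCTURED_PARAMS:
            raise ValueError(f"unknown parameter: {name!r}")
        if not sep:
            settings[name] = True  # a bare parameter name enables it
        elif flag in ("true", "false"):  # assumption: only these spellings accepted
            settings[name] = flag == "true"
        else:
            raise ValueError(f"expected true or false, got {flag!r}")
    return settings
```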
--log.thread
Type: boolean
Show the thread identifier in log messages.
This option can be specified without a value to enable it.
Default: false
--log.thread-name
Type: boolean
Show thread name in log messages.
This option can be specified without a value to enable it.
Default: false
--log.time-format
Introduced in: v3.5.0
Type: string
The time format to use in logs.
Default: "utc-datestring"
Possible values: “local-datestring”, “timestamp”, “timestamp-micros”, “timestamp-millis”, “uptime”, “uptime-micros”, “uptime-millis”, “utc-datestring”, “utc-datestring-micros”, “utc-datestring-millis”
Overview of the different formats:
Format | Example | Description |
---|---|---|
timestamp | 1553766923 | Unix timestamps, in seconds |
timestamp-millis | 1553766923.123 | Unix timestamps, in seconds, with millisecond precision |
timestamp-micros | 1553766923.123456 | Unix timestamps, in seconds, with microsecond precision |
uptime | 987654 | seconds since server start |
uptime-millis | 987654.123 | seconds since server start, with millisecond precision |
uptime-micros | 987654.123456 | seconds since server start, with microsecond precision |
utc-datestring | 2019-03-28T09:55:23Z | UTC-based date and time in format YYYY-MM-DDTHH:MM:SSZ |
utc-datestring-millis | 2019-03-28T09:55:23.123Z | like utc-datestring, but with millisecond precision |
utc-datestring-micros | 2019-03-28T09:55:23.123456Z | like utc-datestring, but with microsecond precision |
local-datestring | 2019-03-28T10:55:23 | local date and time in format YYYY-MM-DDTHH:MM:SS |
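The timestamp and UTC date-string formats can all be derived from the same instant, which makes the relationship between the example values concrete. The Python sketch below covers only those formats. The function name is hypothetical, and truncating rather than rounding to milliseconds is an assumption.

```python
from datetime import datetime, timezone


def format_log_time(unix_micros: int, fmt: str) -> str:
    """Render a Unix time given in microseconds in one of the --log.time-format styles."""
    seconds, micros = divmod(unix_micros, 1_000_000)
    millis = micros // 1000  # assumption: sub-second digits are truncated
    if fmt == "timestamp":
        return str(seconds)
    if fmt == "timestamp-millis":
        return f"{seconds}.{millis:03d}"
    if fmt == "timestamp-micros":
        return f"{seconds}.{micros:06d}"
    base = datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    if fmt == "utc-datestring":
        return base + "Z"
    if fmt == "utc-datestring-millis":
        return f"{base}.{millis:03d}Z"
    if fmt == "utc-datestring-micros":
        return f"{base}.{micros:06d}Z"
    raise ValueError(f"format not covered by this sketch: {fmt!r}")
```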
--log.use-json-format
Introduced in: v3.8.0
Type: boolean
Use JSON as output format for logging.
This option can be specified without a value to enable it.
Default: false
You can use this option to switch the log output to the JSON format. Each log message then produces a separate line with JSON-encoded log data, which can be consumed by other applications.
The object attributes produced for each log message are:
Key | Value |
---|---|
time | date/time of log message, in format specified by --log.time-format |
prefix | only emitted if --log.prefix is set |
pid | process id, only emitted if --log.process is set |
tid | thread id, only emitted if --log.thread is set |
thread | thread name, only emitted if --log.thread-name is set |
role | server role (1 character), only emitted if --log.role is set |
level | log level (e.g. "WARN", "INFO") |
file | source file name of log message, only emitted if --log.line-number is set |
line | source file line of log message, only emitted if --log.line-number is set |
function | source file function name, only emitted if --log.line-number is set |
topic | log topic name |
id | log id (5-digit hexadecimal string), only emitted if --log.ids is set |
hostname | hostname if --log.hostname is set |
message | the actual log message payload |
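Because each log message becomes one JSON object per line, other programs can process the log with an ordinary JSON parser. The minimal Python consumer below keeps only messages of selected levels and uses .get() for the optional attributes. The function name and the sample lines are hypothetical. Only the attribute names and the level values "WARN" and "INFO" come from the table above.

```python
import json
from typing import Iterable, Iterator


def filter_json_log(lines: Iterable[str], levels: set) -> Iterator[dict]:
    """Yield parsed log entries whose level is in levels, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if entry.get("level") in levels:
            yield entry


sample = [
    '{"time":"2019-03-28T09:55:23Z","level":"WARN","topic":"general","message":"disk almost full"}',
    '{"time":"2019-03-28T09:55:24Z","level":"INFO","topic":"startup","message":"ready"}',
]
for entry in filter_json_log(sample, {"WARN"}):
    # Optional attributes such as "pid" or "id" are only present if enabled.
    print(entry["topic"], entry["message"], entry.get("id", "-"))
```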
--log.use-local-time
Deprecated in: v3.5.0
Type: boolean
Use the local timezone instead of UTC.
This option can be specified without a value to enable it.
Default: false
This option is deprecated. Use --log.time-format local-datestring instead.
--log.use-microtime
Deprecated in: v3.5.0
Type: boolean
Use Unix timestamps in seconds with microsecond precision.
This option can be specified without a value to enable it.
Default: false
This option is deprecated. Use --log.time-format timestamp-micros instead.
Random
--random.generator
Type: uint32
The random number generator to use (1 = MERSENNE, 2 = RANDOM, 3 = URANDOM, 4 = COMBINED (not available on Windows), 5 = WinCrypt (Windows only)).
Default: 1
Possible values: 1, 2, 3, 4
1: a pseudo-random number generator using an implementation of the Mersenne Twister MT19937 algorithm
2: use a blocking random (or pseudo-random) number generator
3: use the non-blocking random (or pseudo-random) number generator supplied by the operating system
4: a combination of the blocking random number generator and the Mersenne Twister (not available on Windows)
5: use WinCrypt (Windows only)
Server
--server.authentication
Type: boolean
Require authentication credentials when connecting (does not affect the server-side authentication settings).
This option can be specified without a value to enable it.
Default: false
--server.connection-timeout
Type: double
The connection timeout (in seconds).
Default: 5
--server.database
Type: string
The database name to use when connecting.
Default: "_system"
--server.endpoint
Type: string…
The endpoint to connect to. Use 'none' to start without a server. Use http+ssl:// as the scheme to connect to an SSL-secured server endpoint, otherwise http+tcp:// or unix://.
Default: ["http+tcp://127.0.0.1:8529"]
--server.max-packet-size
Type: uint64
The maximum packet size (in bytes) for client/server communication.
Default: 1073741824
--server.password
Type: string
The password to use when connecting. If not specified and authentication is required, the user is prompted for a password.
Default: ""
--server.request-timeout
Type: double
The request timeout (in seconds).
Default: 1200
--server.username
Type: string
The username to use when connecting.
Default: "root"
SSL
--ssl.protocol
Type: uint64
The SSL protocol (1 = SSLv2 (unsupported), 2 = SSLv2 or SSLv3 (negotiated), 3 = SSLv3, 4 = TLSv1, 5 = TLSv1.2, 6 = TLSv1.3, 9 = generic TLS (negotiated))
Default: 5
Possible values: 1, 2, 3, 4, 5, 6, 9
Temp
--temp.path
Type: string
The path for temporary files.
Default: ""
ArangoDB uses the path for storing temporary files, for extracting data from uploaded zip files (e.g. for Foxx services), and other things.
Ideally, the temporary path is set to an instance-specific subdirectory of the operating system’s temporary directory. To avoid data loss, the temporary path should not overlap with any directories that contain important data, for example, the instance’s database directory.
If you set the temporary path to the same directory as the instance’s database directory, a startup error is logged and the startup is aborted.