gqlalchemy.transformations.importing.loaders

The features below aren’t included in the default GQLAlchemy installation. To use them, make sure to install GQLAlchemy with the relevant optional dependencies.

ForeignKeyMapping Objects​

@dataclass(frozen=True)
class ForeignKeyMapping()

Class that contains the full description of a single foreign key in a table.

Attributes:

  • column_name - Column name that holds the foreign key.
  • reference_table - Name of the table that the foreign key references.
  • reference_key - Name of the column in the referenced table that the foreign key points to.

OneToManyMapping Objects​

@dataclass(frozen=True)
class OneToManyMapping()

Class that holds the full description of a single one-to-many mapping in a table.

Attributes:

  • foreign_key - Foreign key used for mapping.
  • label - Label which will be applied to the relationship created from this object.
  • from_entity - Direction of the relationship created from the mapping object.
  • parameters - Parameters that will be added to the relationship created from this object (Optional).
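As a rough illustration of how the two mapping classes above fit together, the sketch below rebuilds them as stand-in dataclasses using only the standard library. The stand-ins mirror the documented attribute names, but the field types, the default for from_entity, and the example table and column names (individuals, address, add_id) are assumptions, not the library's definitions.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Any, Dict, Optional


# Stand-in definitions that mirror the documented attributes; they are not
# the library's own classes.
@dataclass(frozen=True)
class ForeignKeyMapping:
    column_name: str
    reference_table: str
    reference_key: str


@dataclass(frozen=True)
class OneToManyMapping:
    foreign_key: ForeignKeyMapping
    label: str
    from_entity: bool = False  # assumed type and default
    parameters: Optional[Dict[str, Any]] = None


# Hypothetical tables: individuals.add_id references address.add_id.
lives_in = OneToManyMapping(
    foreign_key=ForeignKeyMapping("add_id", "address", "add_id"),
    label="LIVES_IN",
)

try:
    lives_in.label = "RESIDES_IN"  # frozen dataclasses reject assignment
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Because the classes are declared with frozen=True, a mapping cannot be changed after construction, which is what the final try block demonstrates.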

ManyToManyMapping Objects​

@dataclass(frozen=True)
class ManyToManyMapping()

Class that holds the full description of a single many-to-many mapping in a table. Many-to-many mappings are intended for associative tables.

Attributes:

  • foreign_key_from - Describes the source of the relationship.
  • foreign_key_to - Describes the destination of the relationship.
  • label - Label to be applied to the newly created relationship.
  • parameters - Parameters that will be added to the relationship created from this object (Optional).
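To make the associative-table idea concrete, here is a minimal sketch of how one row of such a table could be read as a relationship between two referenced rows. The ForeignKey and ManyToMany classes are simplified stand-ins, rows_to_relationships is a hypothetical helper, and the table and column names are invented for illustration; this is not the importer's actual translation logic.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class ForeignKey:  # stand-in for ForeignKeyMapping
    column_name: str
    reference_table: str
    reference_key: str


@dataclass(frozen=True)
class ManyToMany:  # stand-in for ManyToManyMapping
    foreign_key_from: ForeignKey
    foreign_key_to: ForeignKey
    label: str


Endpoint = Tuple[str, str, Any]  # (table, key column, key value)


def rows_to_relationships(
    rows: Iterable[Dict[str, Any]], mapping: ManyToMany
) -> Iterator[Tuple[Endpoint, str, Endpoint]]:
    """Yield (source, label, destination) for each associative-table row."""
    src, dst = mapping.foreign_key_from, mapping.foreign_key_to
    for row in rows:
        yield (
            (src.reference_table, src.reference_key, row[src.column_name]),
            mapping.label,
            (dst.reference_table, dst.reference_key, row[dst.column_name]),
        )


# Hypothetical associative table linking individuals to individuals.
friends = ManyToMany(
    ForeignKey("ind_id_1", "individuals", "ind_id"),
    ForeignKey("ind_id_2", "individuals", "ind_id"),
    "FRIENDS_WITH",
)
relationships = list(rows_to_relationships([{"ind_id_1": 1, "ind_id_2": 2}], friends))
```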

TableMapping Objects​

@dataclass
class TableMapping()

Class that holds the full description of all of the mappings for a single table.

Attributes:

  • table_name - Name of the table.
  • mapping - All of the mappings in the table (Optional).
  • indices - List of the indices to be created for this table (Optional).

NameMappings Objects​

@dataclass(frozen=True)
class NameMappings()

Class that contains the new label name and all of the column name mappings for a single table.

Attributes:

  • label - New label (Optional).
  • column_names_mapping - Dictionary containing key-value pairs in the form ("column name", "property name") (Optional).

NameMapper Objects​

class NameMapper()

Class that holds all name mappings for all of the collections.

get_label​

def get_label(collection_name: str) -> str

Returns the label for the given collection.

Arguments:

  • collection_name - Original collection name.

get_property_name​

def get_property_name(collection_name: str, column_name: str) -> str

Returns the property name for a column from the given collection.

Arguments:

  • collection_name - Original collection name.
  • column_name - Original column name.
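The two lookups above can be sketched with a plain dictionary. SimpleNameMapper is a hypothetical stand-in, and falling back to the original name when no mapping is configured is an assumption suggested by the Optional attributes of NameMappings, not behaviour confirmed on this page.

```python
from typing import Any, Dict


class SimpleNameMapper:
    """Hypothetical stand-in; not the library's NameMapper."""

    def __init__(self, mappings: Dict[str, Dict[str, Any]]) -> None:
        self._mappings = mappings

    def get_label(self, collection_name: str) -> str:
        # Assumed fallback: keep the collection name when no label is set.
        label = self._mappings.get(collection_name, {}).get("label")
        return label if label is not None else collection_name

    def get_property_name(self, collection_name: str, column_name: str) -> str:
        # Assumed fallback: keep the column name when it is not remapped.
        columns = self._mappings.get(collection_name, {}).get("column_names_mapping") or {}
        return columns.get(column_name, column_name)


mapper = SimpleNameMapper(
    {"individuals": {"label": "INDIVIDUAL", "column_names_mapping": {"ind_id": "id"}}}
)
label = mapper.get_label("individuals")
renamed = mapper.get_property_name("individuals", "ind_id")
unchanged = mapper.get_property_name("individuals", "name")
```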

FileSystemHandler Objects​

class FileSystemHandler(ABC)

Abstract class for defining FileSystemHandler.

Inherit this class, define a custom data source and initialize the connection.

get_path​

@abstractmethod
def get_path(collection_name: str) -> str

Returns the complete path in the specific file system. Used to locate a specific file in that file system.
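A custom handler only needs to implement get_path. The sketch below uses a simplified stand-in for the abstract base class, so its constructor may differ from the real one, and PrefixedHandler is a hypothetical subclass that joins a root prefix with the file name.

```python
import posixpath
from abc import ABC, abstractmethod


class FileSystemHandler(ABC):
    """Simplified stand-in for the documented abstract class."""

    @abstractmethod
    def get_path(self, collection_name: str) -> str:
        """Return the complete path of a file in this file system."""


class PrefixedHandler(FileSystemHandler):
    """Hypothetical handler that resolves files under a fixed root prefix."""

    def __init__(self, root: str) -> None:
        self._root = root

    def get_path(self, collection_name: str) -> str:
        # posixpath keeps forward slashes regardless of the host platform.
        return posixpath.join(self._root, collection_name)


handler = PrefixedHandler("data/exports")
resolved = handler.get_path("individuals.csv")
```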

S3FileSystemHandler Objects​

class S3FileSystemHandler(FileSystemHandler)

Handles connection to Amazon S3 service via PyArrow.

__init__​

def __init__(bucket_name: str, **kwargs)

Initializes connection and data bucket.

Arguments:

  • bucket_name - Name of the bucket on S3 from which to read the data.

Kwargs:

  • access_key - S3 access key.
  • secret_key - S3 secret key.
  • region - S3 region.
  • session_token - S3 session token (Optional).

Raises:

  • KeyError - kwargs doesn't contain necessary fields.
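The KeyError contract can be illustrated with a small validation helper. collect_s3_options is a hypothetical function, and treating access_key, secret_key, and region as mandatory is an assumption based on session_token being the only field marked Optional; the library's own checks and error messages may differ.

```python
from typing import Any, Dict

# Field names come from the Kwargs list; which ones are mandatory is assumed.
REQUIRED_FIELDS = ("access_key", "secret_key", "region")


def collect_s3_options(**kwargs: Any) -> Dict[str, Any]:
    """Return the connection options or raise KeyError if a field is missing."""
    missing = [name for name in REQUIRED_FIELDS if name not in kwargs]
    if missing:
        raise KeyError("Missing S3 connection fields: " + ", ".join(missing))
    options = {name: kwargs[name] for name in REQUIRED_FIELDS}
    options["session_token"] = kwargs.get("session_token")  # optional field
    return options


try:
    collect_s3_options(access_key="example-access-key", region="eu-west-1")
    error_message = ""
except KeyError as error:
    error_message = error.args[0]

complete = collect_s3_options(
    access_key="example-access-key", secret_key="example-secret", region="eu-west-1"
)
```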

get_path​

def get_path(collection_name: str) -> str

Get file path in file system.

Arguments:

  • collection_name - Name of the file to read.

AzureBlobFileSystemHandler Objects​

class AzureBlobFileSystemHandler(FileSystemHandler)

Handles connection to Azure Blob service via adlfs package.

__init__​

def __init__(container_name: str, **kwargs) -> None

Initializes connection and data container.

Arguments:

  • container_name - Name of the Blob container storing data.

Kwargs:

  • account_name - Account name from Azure Blob.
  • account_key - Account key for Azure Blob (Optional if sas_token is provided).
  • sas_token - Shared access signature token for authentication (Optional).

Raises:

  • KeyError - kwargs doesn't contain necessary fields.

get_path​

def get_path(collection_name: str) -> str

Get file path in file system.

Arguments:

  • collection_name - Name of the file to read.

LocalFileSystemHandler Objects​

class LocalFileSystemHandler(FileSystemHandler)

Handles a local filesystem.

__init__​

def __init__(path: str) -> None

Initializes an fsspec local file system and sets path to data.

Arguments:

  • path - Path to the local storage location.

get_path​

def get_path(collection_name: str) -> str

Get file path in the local file system.

Arguments:

  • collection_name - Name of the file to read.

DataLoader Objects​

class DataLoader(ABC)

Loads data of a given file type from a file system service into TableToGraphImporter.

__init__​

def __init__(file_extension: str,
             file_system_handler: FileSystemHandler) -> None

Arguments:

  • file_extension - File format to be read.
  • file_system_handler - Object for handling of the file system service.

load_data​

@abstractmethod
def load_data(collection_name: str, is_cross_table: bool = False) -> None

Override this method in the derived class. It is intended for reading data stored in a specific format.

Arguments:

  • collection_name - Name of the file to read.
  • is_cross_table - Indicates whether the collection contains an associative table (default=False).

Raises:

  • NotImplementedError - The method is not implemented in the derived class.
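A derived loader might look like the following sketch, which reads CSV rows with the standard library instead of PyArrow. The DataLoader class here is a stand-in for the documented abstract class, DirectoryHandler and CsvRowLoader are hypothetical, and building the file name as collection name plus extension is an assumption.

```python
import csv
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Dict, Iterator


class DataLoader(ABC):
    """Stand-in for the documented abstract class, not the library's own."""

    def __init__(self, file_extension: str, file_system_handler) -> None:
        self._file_extension = file_extension
        self._file_system_handler = file_system_handler

    @abstractmethod
    def load_data(self, collection_name: str, is_cross_table: bool = False):
        raise NotImplementedError


class DirectoryHandler:
    """Hypothetical handler exposing only get_path."""

    def __init__(self, root: str) -> None:
        self._root = root

    def get_path(self, collection_name: str) -> str:
        return os.path.join(self._root, collection_name)


class CsvRowLoader(DataLoader):
    def load_data(
        self, collection_name: str, is_cross_table: bool = False
    ) -> Iterator[Dict[str, str]]:
        # Assumption: the file name is the collection name plus the extension.
        file_name = f"{collection_name}.{self._file_extension}"
        path = self._file_system_handler.get_path(file_name)
        with open(path, newline="") as handle:
            yield from csv.DictReader(handle)


# Write a tiny CSV file and read it back row by row.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "individuals.csv"), "w", newline="") as handle:
        handle.write("ind_id,name\n1,Ana\n2,Ben\n")
    rows = list(CsvRowLoader("csv", DirectoryHandler(root)).load_data("individuals"))
```

Yielding rows from a generator keeps memory use flat for large files, which matches the generator-style load_data documented for PyArrowDataLoader.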

PyArrowFileTypeEnum Objects​

class PyArrowFileTypeEnum(Enum)

Enumerates file types supported by PyArrow.

PyArrowDataLoader Objects​

class PyArrowDataLoader(DataLoader)

Loads data using PyArrow.

PyArrow currently supports "parquet", "ipc"/"arrow"/"feather", "csv", and "orc"; see pyarrow.dataset.dataset for up-to-date information. Because ds.dataset, which load_data calls, accepts any fsspec subclass, this DataLoader works with fsspec-compatible filesystems.

__init__​

def __init__(file_extension_enum: PyArrowFileTypeEnum,
             file_system_handler: FileSystemHandler) -> None

Arguments:

  • file_extension_enum - The file format to be read.
  • file_system_handler - Object for handling of the file system service.

load_data​

def load_data(collection_name: str,
              is_cross_table: bool = False,
              columns: Optional[List[str]] = None) -> None

Generator for loading data.

Arguments:

  • collection_name - Name of the file to read.
  • is_cross_table - Flag signifying whether the collection is a cross (associative) table.
  • columns - Table columns to read.

TableToGraphImporter Objects​

class TableToGraphImporter(Importer)

Implements translation of table data to graph data, and imports it to Memgraph.

__init__​

def __init__(data_loader: DataLoader,
             data_configuration: Dict[str, Any],
             memgraph: Optional[Memgraph] = None) -> None

Arguments:

  • data_loader - Object for loading data.
  • data_configuration - Configuration for the translations.
  • memgraph - Connection to Memgraph (Optional).

translate​

def translate(drop_database_on_start: bool = True) -> None

Performs the translations.

Arguments:

  • drop_database_on_start - Indicates whether the database should be dropped before the translations start.
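Putting the pieces together, an import run might be wired up roughly as below. The layout of data_configuration is not described on this page, so the top-level keys (indices, name_mappings, one_to_many_relations) are an assumption to check against the GQLAlchemy import guide, and the table and column names are invented. run_import is a hypothetical helper; it is defined but not called here because it needs the optional dependencies and, presumably, a reachable Memgraph instance.

```python
from typing import Any, Dict

# Assumed layout of data_configuration; key names are not confirmed here.
data_configuration: Dict[str, Any] = {
    "indices": {"address": ["add_id"], "individuals": ["ind_id"]},
    "name_mappings": {
        "individuals": {"label": "INDIVIDUAL"},
        "address": {"label": "ADDRESS"},
    },
    "one_to_many_relations": {
        "address": [],
        "individuals": [
            {
                # Mirrors the ForeignKeyMapping attributes.
                "foreign_key": {
                    "column_name": "add_id",
                    "reference_table": "address",
                    "reference_key": "add_id",
                },
                "label": "LIVES_IN",
            }
        ],
    },
}


def run_import(path: str, configuration: Dict[str, Any]) -> None:
    """Hypothetical helper: import Parquet files from a local directory."""
    # Imported lazily so this sketch loads without the optional dependencies.
    from gqlalchemy.transformations.importing.loaders import (
        ParquetLocalFileSystemImporter,
    )

    importer = ParquetLocalFileSystemImporter(
        path=path, data_configuration=configuration
    )
    importer.translate(drop_database_on_start=True)
```

Passing drop_database_on_start=False would keep existing data in place, at the cost of mixing it with the newly imported graph.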

PyArrowImporter Objects​

class PyArrowImporter(TableToGraphImporter)

TableToGraphImporter wrapper for use with PyArrow for reading data.

__init__​

def __init__(file_system_handler: str,
             file_extension_enum: PyArrowFileTypeEnum,
             data_configuration: Dict[str, Any],
             memgraph: Optional[Memgraph] = None) -> None

Arguments:

  • file_system_handler - File system to read from.
  • file_extension_enum - File format to be read.
  • data_configuration - Configuration for the translations.
  • memgraph - Connection to Memgraph (Optional).

Raises:

  • ValueError - PyArrow doesn't support ORC on Windows.
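The ORC restriction can be pictured as a platform check performed before any data is read. check_orc_support is a hypothetical helper written for illustration; the library's own check and error message may differ.

```python
import platform
from typing import Optional


def check_orc_support(file_extension: str, system: Optional[str] = None) -> None:
    """Raise ValueError for ORC on Windows; `system` is injectable for testing."""
    current = system if system is not None else platform.system()
    if file_extension.lower() == "orc" and current == "Windows":
        raise ValueError("ORC files cannot be read with PyArrow on Windows.")


check_orc_support("orc", system="Linux")        # passes silently
check_orc_support("parquet", system="Windows")  # passes silently

try:
    check_orc_support("orc", system="Windows")
    rejected = False
except ValueError:
    rejected = True
```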

PyArrowS3Importer Objects​

class PyArrowS3Importer(PyArrowImporter)

PyArrowImporter wrapper for use with the Amazon S3 File System.

__init__​

def __init__(bucket_name: str,
             file_extension_enum: PyArrowFileTypeEnum,
             data_configuration: Dict[str, Any],
             memgraph: Optional[Memgraph] = None,
             **kwargs) -> None

Arguments:

  • bucket_name - Name of the bucket in S3 to read from.
  • file_extension_enum - File format to be read.
  • data_configuration - Configuration for the translations.
  • memgraph - Connection to Memgraph (Optional).
  • **kwargs - Specified for S3FileSystem.

PyArrowAzureBlobImporter Objects​

class PyArrowAzureBlobImporter(PyArrowImporter)

PyArrowImporter wrapper for use with the Azure Blob File System.

__init__​

def __init__(container_name: str,
             file_extension_enum: PyArrowFileTypeEnum,
             data_configuration: Dict[str, Any],
             memgraph: Optional[Memgraph] = None,
             **kwargs) -> None

Arguments:

  • container_name - Name of the container in Azure Blob to read from.
  • file_extension_enum - File format to be read.
  • data_configuration - Configuration for the translations.
  • memgraph - Connection to Memgraph (Optional).
  • **kwargs - Specified for AzureBlobFileSystem.

PyArrowLocalFileSystemImporter Objects​

class PyArrowLocalFileSystemImporter(PyArrowImporter)

PyArrowImporter wrapper for use with the Local File System.

__init__​

def __init__(path: str,
             file_extension_enum: PyArrowFileTypeEnum,
             data_configuration: Dict[str, Any],
             memgraph: Optional[Memgraph] = None) -> None

Arguments:

  • path - Full path to the directory to read from.
  • file_extension_enum - File format to be read.
  • data_configuration - Configuration for the translations.
  • memgraph - Connection to Memgraph (Optional).

ParquetS3FileSystemImporter Objects​

class ParquetS3FileSystemImporter(PyArrowS3Importer)

PyArrowS3Importer wrapper for use with the S3 file system and the parquet file type.

__init__​

def __init__(bucket_name: str,
             data_configuration: Dict[str, Any],
             memgraph: Optional[Memgraph] = None,
             **kwargs) -> None

Arguments:

  • bucket_name - Name of the bucket in S3 to read from.
  • data_configuration - Configuration for the translations.
  • memgraph - Connection to Memgraph (Optional).
  • **kwargs - Specified for S3FileSystem.

CSVS3FileSystemImporter Objects​

class CSVS3FileSystemImporter(PyArrowS3Importer)

PyArrowS3Importer wrapper for use with the S3 file system and the CSV file type.

__init__​

def __init__(bucket_name: str,
             data_configuration: Dict[str, Any],
             memgraph: Optional[Memgraph] = None,
             **kwargs) -> None

Arguments:

  • bucket_name - Name of the bucket in S3 to read from.
  • data_configuration - Configuration for the translations.
  • memgraph - Connection to Memgraph (Optional).
  • **kwargs - Specified for S3FileSystem.

ORCS3FileSystemImporter Objects​

class ORCS3FileSystemImporter(PyArrowS3Importer)

PyArrowS3Importer wrapper for use with the S3 file system and the ORC file type.

__init__​

def __init__(bucket_name: str,
             data_configuration: Dict[str, Any],
             memgraph: Optional[Memgraph] = None,
             **kwargs) -> None

Arguments:

  • bucket_name - Name of the bucket in S3 to read from.
  • data_configuration - Configuration for the translations.
  • memgraph - Connection to Memgraph (Optional).
  • **kwargs - Specified for S3FileSystem.

FeatherS3FileSystemImporter Objects​

class FeatherS3FileSystemImporter(PyArrowS3Importer)

PyArrowS3Importer wrapper for use with the S3 file system and the feather file type.

__init__​

def __init__(bucket_name: str,
             data_configuration: Dict[str, Any],
             memgraph: Optional[Memgraph] = None,
             **kwargs) -> None

Arguments:

  • bucket_name - Name of the bucket in S3 to read from.
  • data_configuration - Configuration for the translations.
  • memgraph - Connection to Memgraph (Optional).
  • **kwargs - Specified for S3FileSystem.

ParquetAzureBlobFileSystemImporter Objects​

class ParquetAzureBlobFileSystemImporter(PyArrowAzureBlobImporter)

PyArrowAzureBlobImporter wrapper for use with the Azure Blob file system and the parquet file type.

__init__​

def __init__(container_name: str,
             data_configuration: Dict[str, Any],
             memgraph: Optional[Memgraph] = None,
             **kwargs) -> None

Arguments:

  • container_name - Name of the container in Azure Blob storage to read from.
  • data_configuration - Configuration for the translations.
  • memgraph - Connection to Memgraph (Optional).
  • **kwargs - Specified for AzureBlobFileSystem.

CSVAzureBlobFileSystemImporter Objects​

class CSVAzureBlobFileSystemImporter(PyArrowAzureBlobImporter)

PyArrowAzureBlobImporter wrapper for use with the Azure Blob file system and the CSV file type.

__init__​

def __init__(container_name: str,
             data_configuration: Dict[str, Any],
             memgraph: Optional[Memgraph] = None,
             **kwargs) -> None

Arguments:

  • container_name - Name of the container in Azure Blob storage to read from.
  • data_configuration - Configuration for the translations.
  • memgraph - Connection to Memgraph (Optional).
  • **kwargs - Specified for AzureBlobFileSystem.

ORCAzureBlobFileSystemImporter Objects​

class ORCAzureBlobFileSystemImporter(PyArrowAzureBlobImporter)

PyArrowAzureBlobImporter wrapper for use with the Azure Blob file system and the ORC file type.

__init__​

def __init__(container_name,
             data_configuration: Dict[str, Any],
             memgraph: Optional[Memgraph] = None,
             **kwargs) -> None

Arguments:

  • container_name - Name of the container in Azure Blob storage to read from.
  • data_configuration - Configuration for the translations.
  • memgraph - Connection to Memgraph (Optional).
  • **kwargs - Specified for AzureBlobFileSystem.

FeatherAzureBlobFileSystemImporter Objects​

class FeatherAzureBlobFileSystemImporter(PyArrowAzureBlobImporter)

PyArrowAzureBlobImporter wrapper for use with the Azure Blob file system and the Feather file type.

__init__​

def __init__(container_name,
             data_configuration: Dict[str, Any],
             memgraph: Optional[Memgraph] = None,
             **kwargs) -> None

Arguments:

  • container_name - Name of the container in Azure Blob storage to read from.
  • data_configuration - Configuration for the translations.
  • memgraph - Connection to Memgraph (Optional).
  • **kwargs - Specified for AzureBlobFileSystem.

ParquetLocalFileSystemImporter Objects​

class ParquetLocalFileSystemImporter(PyArrowLocalFileSystemImporter)

PyArrowLocalFileSystemImporter wrapper for use with the local file system and the parquet file type.

__init__​

def __init__(path: str,
             data_configuration: Dict[str, Any],
             memgraph: Optional[Memgraph] = None) -> None

Arguments:

  • path - Full path to directory.
  • data_configuration - Configuration for the translations.
  • memgraph - Connection to Memgraph (Optional).

CSVLocalFileSystemImporter Objects​

class CSVLocalFileSystemImporter(PyArrowLocalFileSystemImporter)

PyArrowLocalFileSystemImporter wrapper for use with the local file system and the CSV file type.

__init__​

def __init__(path: str,
             data_configuration: Dict[str, Any],
             memgraph: Optional[Memgraph] = None) -> None

Arguments:

  • path - Full path to directory.
  • data_configuration - Configuration for the translations.
  • memgraph - Connection to Memgraph (Optional).

ORCLocalFileSystemImporter Objects​

class ORCLocalFileSystemImporter(PyArrowLocalFileSystemImporter)

PyArrowLocalFileSystemImporter wrapper for use with the local file system and the ORC file type.

__init__​

def __init__(path: str,
             data_configuration: Dict[str, Any],
             memgraph: Optional[Memgraph] = None) -> None

Arguments:

  • path - Full path to directory.
  • data_configuration - Configuration for the translations.
  • memgraph - Connection to Memgraph (Optional).

FeatherLocalFileSystemImporter Objects​

class FeatherLocalFileSystemImporter(PyArrowLocalFileSystemImporter)

PyArrowLocalFileSystemImporter wrapper for use with the local file system and the Feather/IPC/Arrow file type.

__init__​

def __init__(path: str,
             data_configuration: Dict[str, Any],
             memgraph: Optional[Memgraph] = None) -> None

Arguments:

  • path - Full path to directory.
  • data_configuration - Configuration for the translations.
  • memgraph - Connection to Memgraph (Optional).