gqlalchemy.transformations.importing.loaders
The features below aren’t included in the default GQLAlchemy installation. To use them, make sure to install GQLAlchemy with the relevant optional dependencies.
ForeignKeyMapping Objects
@dataclass(frozen=True)
class ForeignKeyMapping()
Class that contains the full description of a single foreign key in a table.
Attributes:
column_name
- Column name that holds the foreign key.
reference_table
- Name of a table from which the foreign key is taken.
reference_key
- Column name in the referenced table from which the foreign key is taken.
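To illustrate the three attributes above, the following sketch defines a stand-in frozen dataclass with the same field names. It is not imported from GQLAlchemy: the class name `ForeignKeyMappingSketch` and the example table and column names are hypothetical.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class ForeignKeyMappingSketch:
    """Stand-in mirroring the documented attributes (not the library class)."""

    column_name: str
    reference_table: str
    reference_key: str


# Hypothetical example: the "individuals" table has an "add_id" column
# that points at the "add_id" column of the "address" table.
fk = ForeignKeyMappingSketch(
    column_name="add_id",
    reference_table="address",
    reference_key="add_id",
)

# frozen=True makes instances immutable, so attribute assignment raises.
try:
    fk.column_name = "other"
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Because the dataclass is frozen, a mapping cannot be altered by accident after it has been built.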
OneToManyMapping Objects
@dataclass(frozen=True)
class OneToManyMapping()
Class that holds the full description of a single one-to-many mapping in a table.
Attributes:
foreign_key
- Foreign key used for mapping.
label
- Label which will be applied to the relationship created from this object.
from_entity
- Direction of the relationship created from the mapping object.
parameters
- Parameters that will be added to the relationship created from this object (Optional).
ManyToManyMapping Objects
@dataclass(frozen=True)
class ManyToManyMapping()
Class that holds the full description of a single many-to-many mapping in a table. A many-to-many mapping is intended for associative tables.
Attributes:
foreign_key_from
- Describes the source of the relationship.
foreign_key_to
- Describes the destination of the relationship.
label
- Label to be applied to the newly created relationship.
parameters
- Parameters that will be added to the relationship created from this object (Optional).
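The sketch below shows how a many-to-many mapping could describe an associative table, where each row links one source record to one destination record. The classes are stand-ins that mirror the documented attribute names. The helper `describe_edges`, the table names, and the `ENROLLED_IN` label are hypothetical and not part of GQLAlchemy.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class ForeignKeySketch:
    column_name: str
    reference_table: str
    reference_key: str


@dataclass(frozen=True)
class ManyToManySketch:
    foreign_key_from: ForeignKeySketch
    foreign_key_to: ForeignKeySketch
    label: str


Endpoint = Tuple[str, str, Any]  # (table, key column, key value)


def describe_edges(
    mapping: ManyToManySketch, rows: List[Dict[str, Any]]
) -> List[Tuple[Endpoint, str, Endpoint]]:
    """Turn each associative-table row into (source, label, destination)."""
    fk_from, fk_to = mapping.foreign_key_from, mapping.foreign_key_to
    edges = []
    for row in rows:
        source = (fk_from.reference_table, fk_from.reference_key, row[fk_from.column_name])
        destination = (fk_to.reference_table, fk_to.reference_key, row[fk_to.column_name])
        edges.append((source, mapping.label, destination))
    return edges


# Hypothetical associative table linking students to courses.
enrollment = ManyToManySketch(
    foreign_key_from=ForeignKeySketch("student_id", "students", "id"),
    foreign_key_to=ForeignKeySketch("course_id", "courses", "id"),
    label="ENROLLED_IN",
)
rows = [{"student_id": 1, "course_id": 10}, {"student_id": 2, "course_id": 10}]
edges = describe_edges(enrollment, rows)
```

Each row of the associative table yields exactly one relationship, which is why the mapping needs both a source and a destination foreign key.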
TableMapping Objects
@dataclass
class TableMapping()
Class that holds the full description of all of the mappings for a single table.
Attributes:
table_name
- Name of the table.
mapping
- All of the mappings in the table (Optional).
indices
- List of the indices to be created for this table (Optional).
NameMappings Objects
@dataclass(frozen=True)
class NameMappings()
Class that contains new label name and all of the column name mappings for a single table.
Attributes:
label
- New label (Optional).
column_names_mapping
- Dictionary containing key-value pairs in form ("column name", "property name") (Optional).
NameMapper Objects
class NameMapper()
Class that holds all name mappings for all of the collections.
get_label
def get_label(collection_name: str) -> str
Returns label for given collection.
Arguments:
collection_name
- Original collection name.
get_property_name
def get_property_name(collection_name: str, column_name: str) -> str
Returns property name for column from collection.
Arguments:
collection_name
- Original collection name.
column_name
- Original column name.
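The two lookups above can be sketched with stand-in classes. This sketch assumes that a missing label or column mapping falls back to the original name, which the text above does not state. `NameMappingsSketch` and `NameMapperSketch` are hypothetical names, not the library classes.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class NameMappingsSketch:
    label: Optional[str] = None
    column_names_mapping: Dict[str, str] = field(default_factory=dict)


class NameMapperSketch:
    """Stand-in that resolves labels and property names per collection."""

    def __init__(self, mappings: Dict[str, NameMappingsSketch]) -> None:
        self._mappings = mappings

    def get_label(self, collection_name: str) -> str:
        mapping = self._mappings.get(collection_name)
        # Assumption: fall back to the collection name when no label is set.
        if mapping is None or mapping.label is None:
            return collection_name
        return mapping.label

    def get_property_name(self, collection_name: str, column_name: str) -> str:
        mapping = self._mappings.get(collection_name)
        # Assumption: unmapped columns keep their original name.
        if mapping is None:
            return column_name
        return mapping.column_names_mapping.get(column_name, column_name)


mapper = NameMapperSketch(
    {"individuals": NameMappingsSketch(label="INDIVIDUAL", column_names_mapping={"ind_id": "id"})}
)
```

With this setup, `mapper.get_label("individuals")` resolves to the new label, while a collection without an entry keeps its original name.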
FileSystemHandler Objects
class FileSystemHandler(ABC)
Abstract class for defining FileSystemHandler.
Inherit this class, define a custom data source and initialize the connection.
get_path
@abstractmethod
def get_path(collection_name: str) -> str
Returns complete path in specific file system. Used to read the file system for a specific file.
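A minimal sketch of the inheritance pattern described above, using a stand-in abstract base class rather than the GQLAlchemy one. The names `FileSystemHandlerSketch` and `PrefixHandler` and the example root are hypothetical. A real handler would also initialize a connection to its data source.

```python
import posixpath
from abc import ABC, abstractmethod


class FileSystemHandlerSketch(ABC):
    """Stand-in for the abstract handler: subclasses must provide get_path."""

    @abstractmethod
    def get_path(self, collection_name: str) -> str:
        ...


class PrefixHandler(FileSystemHandlerSketch):
    """Hypothetical handler that resolves names under a fixed root."""

    def __init__(self, root: str) -> None:
        self._root = root

    def get_path(self, collection_name: str) -> str:
        return posixpath.join(self._root, collection_name)


handler = PrefixHandler("data/tables")
path = handler.get_path("individuals.csv")

# The abstract base cannot be instantiated until get_path is implemented.
try:
    FileSystemHandlerSketch()
    abstract_instantiated = True
except TypeError:
    abstract_instantiated = False
```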
S3FileSystemHandler Objects
class S3FileSystemHandler(FileSystemHandler)
Handles connection to Amazon S3 service via PyArrow.
__init__
def __init__(bucket_name: str, **kwargs)
Initializes connection and data bucket.
Arguments:
bucket_name
- Name of the bucket on S3 from which to read the data.
Kwargs:
access_key
- S3 access key.
secret_key
- S3 secret key.
region
- S3 region.
session_token
- S3 session token (Optional).
Raises:
KeyError
- kwargs doesn't contain necessary fields.
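The keyword-argument contract above can be sketched as a small validation helper. This is an illustration only: it assumes that `access_key`, `secret_key`, and `region` are all required because only `session_token` is marked optional. The function name `collect_s3_options` is hypothetical.

```python
from typing import Any, Dict

# Assumption: every kwarg not marked "(Optional)" is required.
REQUIRED_KEYS = ("access_key", "secret_key", "region")


def collect_s3_options(**kwargs: Any) -> Dict[str, Any]:
    """Validate the kwargs and fill in the optional session token."""
    missing = [key for key in REQUIRED_KEYS if key not in kwargs]
    if missing:
        raise KeyError(f"Missing required S3 options: {', '.join(missing)}")
    options = {key: kwargs[key] for key in REQUIRED_KEYS}
    options["session_token"] = kwargs.get("session_token")
    return options


options = collect_s3_options(access_key="AK", secret_key="SK", region="eu-west-1")

# Leaving out required fields raises KeyError, as in the Raises section.
try:
    collect_s3_options(access_key="AK")
    missing_detected = False
except KeyError:
    missing_detected = True
```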
get_path
def get_path(collection_name: str) -> str
Get file path in file system.
Arguments:
collection_name
- Name of the file to read.
AzureBlobFileSystemHandler Objects
class AzureBlobFileSystemHandler(FileSystemHandler)
Handles connection to Azure Blob service via adlfs package.
__init__
def __init__(container_name: str, **kwargs) -> None
Initializes connection and data container.
Arguments:
container_name
- Name of the Blob container storing data.
Kwargs:
account_name
- Account name from Azure Blob.
account_key
- Account key for Azure Blob (Optional if using sas_token).
sas_token
- Shared access signature token for authentication (Optional).
Raises:
KeyError
- kwargs doesn't contain necessary fields.
get_path
def get_path(collection_name: str) -> str
Get file path in file system.
Arguments:
collection_name
- Name of the file to read.
LocalFileSystemHandler Objects
class LocalFileSystemHandler(FileSystemHandler)
Handles a local filesystem.
__init__
def __init__(path: str) -> None
Initializes an fsspec local file system and sets path to data.
Arguments:
path
- Path to the local storage location.
get_path
def get_path(collection_name: str) -> str
Get file path in the local file system.
Arguments:
collection_name
- Name of the file to read.
DataLoader Objects
class DataLoader(ABC)
Implements loading of a data type from file system service to TableToGraphImporter.
__init__
def __init__(file_extension: str,
file_system_handler: FileSystemHandler) -> None
Arguments:
file_extension
- File format to be read.
file_system_handler
- Object for handling of the file system service.
load_data
@abstractmethod
def load_data(collection_name: str, is_cross_table: bool = False) -> None
Override this method in the derived class. Intended to be used for reading data from a specific data format.
Arguments:
collection_name
- Name of the file to read.
is_cross_table
- Indicate whether or not the collection contains associative table (default=False).
Raises:
NotImplementedError
- The method is not implemented in the extended class.
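A sketch of the override pattern, using a stand-in base class and the standard-library `csv` module instead of GQLAlchemy and PyArrow. It assumes that rows are yielded one at a time as dictionaries and that files are named `<collection_name>.<file_extension>`. Both are assumptions, and `DataLoaderSketch` and `CsvLoaderSketch` are hypothetical names.

```python
import csv
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Dict, Iterator


class DataLoaderSketch(ABC):
    """Stand-in base class holding the file extension and a root directory."""

    def __init__(self, file_extension: str, root: str) -> None:
        self._file_extension = file_extension
        self._root = root

    @abstractmethod
    def load_data(self, collection_name: str, is_cross_table: bool = False) -> Iterator[Dict[str, str]]:
        raise NotImplementedError("Subclasses must implement load_data.")


class CsvLoaderSketch(DataLoaderSketch):
    def __init__(self, root: str) -> None:
        super().__init__("csv", root)

    def load_data(self, collection_name: str, is_cross_table: bool = False) -> Iterator[Dict[str, str]]:
        # Assumption: one file per collection, named "<collection>.<extension>".
        path = os.path.join(self._root, f"{collection_name}.{self._file_extension}")
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle):
                yield dict(row)


# Write a tiny CSV file to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "individuals.csv"), "w", newline="") as handle:
        handle.write("ind_id,name\n1,Ana\n2,Ben\n")
    rows = list(CsvLoaderSketch(root).load_data("individuals"))
```

Yielding rows lazily keeps memory use flat for large files, since the caller consumes one row at a time.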
PyArrowFileTypeEnum Objects
class PyArrowFileTypeEnum(Enum)
Enumerates file types supported by PyArrow.
PyArrowDataLoader Objects
class PyArrowDataLoader(DataLoader)
Loads data using PyArrow.
PyArrow currently supports "parquet", "ipc"/"arrow"/"feather", "csv", and "orc", see pyarrow.dataset.dataset for up-to-date info. ds.dataset in load_data accepts any fsspec subclass, making this DataLoader compatible with fsspec-compatible filesystems.
__init__
def __init__(file_extension_enum: PyArrowFileTypeEnum,
file_system_handler: FileSystemHandler) -> None
Arguments:
file_extension_enum
- The file format to be read.
file_system_handler
- Object for handling of the file system service.
load_data
def load_data(collection_name: str,
is_cross_table: bool = False,
columns: Optional[List[str]] = None) -> None
Generator for loading data.
Arguments:
collection_name
- Name of the file to read.
is_cross_table
- Flag signifying whether it is a cross table.
columns
- Table columns to read.
TableToGraphImporter Objects
class TableToGraphImporter(Importer)
Implements translation of table data to graph data, and imports it to Memgraph.
__init__
def __init__(data_loader: DataLoader,
data_configuration: Dict[str, Any],
memgraph: Optional[Memgraph] = None) -> None
Arguments:
data_loader
- Object for loading data.
data_configuration
- Configuration for the translations.
memgraph
- Connection to Memgraph (Optional).
translate
def translate(drop_database_on_start: bool = True) -> None
Performs the translations.
Arguments:
drop_database_on_start
- Indicate whether or not the database should be dropped prior to the start of the translations.
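The sketch below shows one possible shape for the `data_configuration` dictionary passed to the importer. The nested fields follow the attribute names documented for the mapping classes, but the top-level keys (`indices`, `name_mappings`, `one_to_many_relations`, `many_to_many_relations`) are assumptions to check against the GQLAlchemy documentation. The table names, labels, and the helper `relationship_labels` are hypothetical.

```python
from typing import Any, Dict, List

# Assumed layout: top-level keys are not confirmed by the text above.
data_configuration: Dict[str, Any] = {
    "indices": {"address": ["add_id"], "individuals": ["ind_id"]},
    "name_mappings": {
        "individuals": {"label": "INDIVIDUAL"},
        "address": {"label": "ADDRESS"},
    },
    "one_to_many_relations": {
        "address": [],
        "individuals": [
            {
                "foreign_key": {
                    "column_name": "add_id",
                    "reference_table": "address",
                    "reference_key": "add_id",
                },
                "label": "LIVES_IN",
            }
        ],
    },
    "many_to_many_relations": {
        "visits": {
            "foreign_key_from": {
                "column_name": "ind_id",
                "reference_table": "individuals",
                "reference_key": "ind_id",
            },
            "foreign_key_to": {
                "column_name": "add_id",
                "reference_table": "address",
                "reference_key": "add_id",
            },
            "label": "HAS_VISITED",
        }
    },
}


def relationship_labels(configuration: Dict[str, Any]) -> List[str]:
    """Collect every relationship label declared in the configuration."""
    labels: List[str] = []
    for mappings in configuration.get("one_to_many_relations", {}).values():
        labels.extend(mapping["label"] for mapping in mappings)
    for mapping in configuration.get("many_to_many_relations", {}).values():
        labels.append(mapping["label"])
    return labels


labels = relationship_labels(data_configuration)
```

Listing the labels before running an import is a cheap way to spot a misspelled or missing relationship.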
PyArrowImporter Objects
class PyArrowImporter(TableToGraphImporter)
TableToGraphImporter wrapper for use with PyArrow for reading data.
__init__
def __init__(file_system_handler: str,
file_extension_enum: PyArrowFileTypeEnum,
data_configuration: Dict[str, Any],
memgraph: Optional[Memgraph] = None) -> None
Arguments:
file_system_handler
- File system to read from.
file_extension_enum
- File format to be read.
data_configuration
- Configuration for the translations.
memgraph
- Connection to Memgraph (Optional).
Raises:
ValueError
- PyArrow doesn't support ORC on Windows.
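The platform restriction in the Raises section can be sketched as a guard function. This only illustrates the idea and is not the library's implementation. The function name `ensure_format_supported` and its `system` parameter are hypothetical, and the check assumes the platform is detected with `platform.system()`.

```python
import platform
from typing import Optional


def ensure_format_supported(file_extension: str, system: Optional[str] = None) -> None:
    """Raise ValueError when ORC is requested on Windows."""
    # Allow the platform name to be injected so the check is easy to test.
    system = platform.system() if system is None else system
    if file_extension.lower() == "orc" and system == "Windows":
        raise ValueError("ORC is not supported by PyArrow on Windows.")


# ORC on Linux passes silently, while ORC on Windows is rejected.
ensure_format_supported("orc", system="Linux")

try:
    ensure_format_supported("orc", system="Windows")
    rejected = False
except ValueError:
    rejected = True
```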
PyArrowS3Importer Objects
class PyArrowS3Importer(PyArrowImporter)
PyArrowImporter wrapper for use with the Amazon S3 File System.
__init__
def __init__(bucket_name: str,
file_extension_enum: PyArrowFileTypeEnum,
data_configuration: Dict[str, Any],
memgraph: Optional[Memgraph] = None,
**kwargs) -> None
Arguments:
bucket_name
- Name of the bucket in S3 to read from.
file_extension_enum
- File format to be read.
data_configuration
- Configuration for the translations.
memgraph
- Connection to Memgraph (Optional).
**kwargs
- Specified for S3FileSystem.
PyArrowAzureBlobImporter Objects
class PyArrowAzureBlobImporter(PyArrowImporter)
PyArrowImporter wrapper for use with the Azure Blob File System.
__init__
def __init__(container_name: str,
file_extension_enum: PyArrowFileTypeEnum,
data_configuration: Dict[str, Any],
memgraph: Optional[Memgraph] = None,
**kwargs) -> None
Arguments:
container_name
- Name of the container in Azure Blob to read from.
file_extension_enum
- File format to be read.
data_configuration
- Configuration for the translations.
memgraph
- Connection to Memgraph (Optional).
**kwargs
- Specified for AzureBlobFileSystem.
PyArrowLocalFileSystemImporter Objects
class PyArrowLocalFileSystemImporter(PyArrowImporter)
PyArrowImporter wrapper for use with the Local File System.
__init__
def __init__(path: str,
file_extension_enum: PyArrowFileTypeEnum,
data_configuration: Dict[str, Any],
memgraph: Optional[Memgraph] = None) -> None
Arguments:
path
- Full path to the directory to read from.
file_extension_enum
- File format to be read.
data_configuration
- Configuration for the translations.
memgraph
- Connection to Memgraph (Optional).
ParquetS3FileSystemImporter Objects
class ParquetS3FileSystemImporter(PyArrowS3Importer)
PyArrowS3Importer wrapper for use with the S3 file system and the parquet file type.
__init__
def __init__(bucket_name: str,
data_configuration: Dict[str, Any],
memgraph: Optional[Memgraph] = None,
**kwargs) -> None
Arguments:
bucket_name
- Name of the bucket in S3 to read from.
data_configuration
- Configuration for the translations.
memgraph
- Connection to Memgraph (Optional).
**kwargs
- Specified for S3FileSystem.
CSVS3FileSystemImporter Objects
class CSVS3FileSystemImporter(PyArrowS3Importer)
PyArrowS3Importer wrapper for use with the S3 file system and the CSV file type.
__init__
def __init__(bucket_name: str,
data_configuration: Dict[str, Any],
memgraph: Optional[Memgraph] = None,
**kwargs) -> None
Arguments:
bucket_name
- Name of the bucket in S3 to read from.
data_configuration
- Configuration for the translations.
memgraph
- Connection to Memgraph (Optional).
**kwargs
- Specified for S3FileSystem.
ORCS3FileSystemImporter Objects
class ORCS3FileSystemImporter(PyArrowS3Importer)
PyArrowS3Importer wrapper for use with the S3 file system and the ORC file type.
__init__
def __init__(bucket_name: str,
data_configuration: Dict[str, Any],
memgraph: Optional[Memgraph] = None,
**kwargs) -> None
Arguments:
bucket_name
- Name of the bucket in S3 to read from.
data_configuration
- Configuration for the translations.
memgraph
- Connection to Memgraph (Optional).
**kwargs
- Specified for S3FileSystem.
FeatherS3FileSystemImporter Objects
class FeatherS3FileSystemImporter(PyArrowS3Importer)
PyArrowS3Importer wrapper for use with the S3 file system and the feather file type.
__init__
def __init__(bucket_name: str,
data_configuration: Dict[str, Any],
memgraph: Optional[Memgraph] = None,
**kwargs) -> None
Arguments:
bucket_name
- Name of the bucket in S3 to read from.
data_configuration
- Configuration for the translations.
memgraph
- Connection to Memgraph (Optional).
**kwargs
- Specified for S3FileSystem.
ParquetAzureBlobFileSystemImporter Objects
class ParquetAzureBlobFileSystemImporter(PyArrowAzureBlobImporter)
PyArrowAzureBlobImporter wrapper for use with the Azure Blob file system and the parquet file type.
__init__
def __init__(container_name: str,
data_configuration: Dict[str, Any],
memgraph: Optional[Memgraph] = None,
**kwargs) -> None
Arguments:
container_name
- Name of the container in Azure Blob storage to read from.
data_configuration
- Configuration for the translations.
memgraph
- Connection to Memgraph (Optional).
**kwargs
- Specified for AzureBlobFileSystem.
CSVAzureBlobFileSystemImporter Objects
class CSVAzureBlobFileSystemImporter(PyArrowAzureBlobImporter)
PyArrowAzureBlobImporter wrapper for use with the Azure Blob file system and the CSV file type.
__init__
def __init__(container_name: str,
data_configuration: Dict[str, Any],
memgraph: Optional[Memgraph] = None,
**kwargs) -> None
Arguments:
container_name
- Name of the container in Azure Blob storage to read from.
data_configuration
- Configuration for the translations.
memgraph
- Connection to Memgraph (Optional).
**kwargs
- Specified for AzureBlobFileSystem.
ORCAzureBlobFileSystemImporter Objects
class ORCAzureBlobFileSystemImporter(PyArrowAzureBlobImporter)
PyArrowAzureBlobImporter wrapper for use with the Azure Blob file system and the ORC file type.
__init__
def __init__(container_name,
data_configuration: Dict[str, Any],
memgraph: Optional[Memgraph] = None,
**kwargs) -> None
Arguments:
container_name
- Name of the container in Blob storage to read from.
data_configuration
- Configuration for the translations.
memgraph
- Connection to Memgraph (Optional).
**kwargs
- Specified for AzureBlobFileSystem.
FeatherAzureBlobFileSystemImporter Objects
class FeatherAzureBlobFileSystemImporter(PyArrowAzureBlobImporter)
PyArrowAzureBlobImporter wrapper for use with the Azure Blob file system and the Feather file type.
__init__
def __init__(container_name,
data_configuration: Dict[str, Any],
memgraph: Optional[Memgraph] = None,
**kwargs) -> None
Arguments:
container_name
- Name of the container in Blob storage to read from.
data_configuration
- Configuration for the translations.
memgraph
- Connection to Memgraph (Optional).
**kwargs
- Specified for AzureBlobFileSystem.
ParquetLocalFileSystemImporter Objects
class ParquetLocalFileSystemImporter(PyArrowLocalFileSystemImporter)
PyArrowLocalFileSystemImporter wrapper for use with the local file system and the parquet file type.
__init__
def __init__(path: str,
data_configuration: Dict[str, Any],
memgraph: Optional[Memgraph] = None) -> None
Arguments:
path
- Full path to directory.
data_configuration
- Configuration for the translations.
memgraph
- Connection to Memgraph (Optional).
**kwargs
- Specified for LocalFileSystem.
CSVLocalFileSystemImporter Objects
class CSVLocalFileSystemImporter(PyArrowLocalFileSystemImporter)
PyArrowLocalFileSystemImporter wrapper for use with the local file system and the CSV file type.
__init__
def __init__(path: str,
data_configuration: Dict[str, Any],
memgraph: Optional[Memgraph] = None) -> None
Arguments:
path
- Full path to directory.
data_configuration
- Configuration for the translations.
memgraph
- Connection to Memgraph (Optional).
**kwargs
- Specified for LocalFileSystem.
ORCLocalFileSystemImporter Objects
class ORCLocalFileSystemImporter(PyArrowLocalFileSystemImporter)
PyArrowLocalFileSystemImporter wrapper for use with the local file system and the ORC file type.
__init__
def __init__(path: str,
data_configuration: Dict[str, Any],
memgraph: Optional[Memgraph] = None) -> None
Arguments:
path
- Full path to directory.
data_configuration
- Configuration for the translations.
memgraph
- Connection to Memgraph (Optional).
**kwargs
- Specified for LocalFileSystem.
FeatherLocalFileSystemImporter Objects
class FeatherLocalFileSystemImporter(PyArrowLocalFileSystemImporter)
PyArrowLocalFileSystemImporter wrapper for use with the local file system and the Feather/IPC/Arrow file type.
__init__
def __init__(path: str,
data_configuration: Dict[str, Any],
memgraph: Optional[Memgraph] = None) -> None
Arguments:
path
- Full path to directory.
data_configuration
- Configuration for the translations.
memgraph
- Connection to Memgraph (Optional).
**kwargs
- Specified for LocalFileSystem.