gqlalchemy.transformations.importing.loaders
The features below aren't included in the default GQLAlchemy installation. To use them, make sure to install GQLAlchemy with the relevant optional dependencies.
ForeignKeyMapping Objects
@dataclass(frozen=True)
class ForeignKeyMapping()
Class that contains the full description of a single foreign key in a table.
Attributes:
column_name - Column name that holds the foreign key.
reference_table - Name of a table from which the foreign key is taken.
reference_key - Column name in the referenced table from which the foreign key is taken.
OneToManyMapping Objects
@dataclass(frozen=True)
class OneToManyMapping()
Class that holds the full description of a single one-to-many mapping in a table.
Attributes:
foreign_key - Foreign key used for mapping.
label - Label which will be applied to the relationship created from this object.
from_entity - Direction of the relationship created from the mapping object.
parameters - Parameters that will be added to the relationship created from this object (Optional).
ManyToManyMapping Objects
@dataclass(frozen=True)
class ManyToManyMapping()
Class that holds the full description of a single many-to-many mapping in a table. Many-to-many mappings are intended for associative tables.
Attributes:
foreign_key_from - Describes the source of the relationship.
foreign_key_to - Describes the destination of the relationship.
label - Label to be applied to the newly created relationship.
parameters - Parameters that will be added to the relationship created from this object (Optional).
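The three mapping classes above are frozen dataclasses, so instances cannot be changed once built. The following is a minimal stand-in sketch that only mirrors the documented attribute names; the field types and default values are assumptions, and in real code the classes would be imported from GQLAlchemy rather than redefined.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Any, Dict, Optional


# Stand-ins that mirror the documented attributes. The field types and
# defaults are assumptions, not taken from the GQLAlchemy source.
@dataclass(frozen=True)
class ForeignKeyMapping:
    column_name: str
    reference_table: str
    reference_key: str


@dataclass(frozen=True)
class OneToManyMapping:
    foreign_key: ForeignKeyMapping
    label: str
    from_entity: bool = False
    parameters: Optional[Dict[str, Any]] = None


@dataclass(frozen=True)
class ManyToManyMapping:
    foreign_key_from: ForeignKeyMapping
    foreign_key_to: ForeignKeyMapping
    label: str
    parameters: Optional[Dict[str, Any]] = None


# Hypothetical tables: individuals.add_id points at address.add_id.
lives_in = OneToManyMapping(
    foreign_key=ForeignKeyMapping("add_id", "address", "add_id"),
    label="LIVES_IN",
)

# frozen=True makes instances read-only after construction.
try:
    lives_in.label = "RESIDES_IN"
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False
```

Because the instances are immutable, a mapping that needs a different label has to be created as a new object.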
TableMapping Objects
@dataclass
class TableMapping()
Class that holds the full description of all of the mappings for a single table.
Attributes:
table_name - Name of the table.
mapping - All of the mappings in the table (Optional).
indices - List of the indices to be created for this table (Optional).
NameMappings Objects
@dataclass(frozen=True)
class NameMappings()
Class that contains the new label name and all of the column name mappings for a single table.
Attributes:
label - New label (Optional).
column_names_mapping - Dictionary containing key-value pairs in the form ("column name", "property name") (Optional).
NameMapper Objects
class NameMapper()
Class that holds all name mappings for all of the collections.
get_label
def get_label(collection_name: str) -> str
Returns the label for the given collection.
Arguments:
collection_name - Original collection name.
get_property_name
def get_property_name(collection_name: str, column_name: str) -> str
Returns the property name for a column from the collection.
Arguments:
collection_name - Original collection name.
column_name - Original column name.
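The lookups described above can be sketched as follows. This is an illustrative stand-in, not the library code: the constructor signature is hypothetical, and falling back to the original name when no mapping exists is an assumption rather than documented behavior.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class NameMappings:
    """Stand-in with the two documented attributes."""

    label: Optional[str] = None
    column_names_mapping: Dict[str, str] = field(default_factory=dict)


class NameMapper:
    """Stand-in; the constructor argument is a hypothetical choice."""

    def __init__(self, mappings: Dict[str, NameMappings]) -> None:
        self._mappings = mappings

    def get_label(self, collection_name: str) -> str:
        mapping = self._mappings.get(collection_name)
        if mapping is None or mapping.label is None:
            # Assumed fallback: keep the original collection name.
            return collection_name
        return mapping.label

    def get_property_name(self, collection_name: str, column_name: str) -> str:
        mapping = self._mappings.get(collection_name)
        if mapping is None:
            # Assumed fallback: keep the original column name.
            return column_name
        return mapping.column_names_mapping.get(column_name, column_name)


mapper = NameMapper(
    {"individuals": NameMappings("INDIVIDUAL", {"ind_id": "id"})}
)
label = mapper.get_label("individuals")
property_name = mapper.get_property_name("individuals", "ind_id")
untouched = mapper.get_property_name("address", "city")
```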
FileSystemHandler Objects
class FileSystemHandler(ABC)
Abstract class for defining FileSystemHandler.
Inherit this class, define a custom data source and initialize the connection.
get_path
@abstractmethod
def get_path(collection_name: str) -> str
Returns the complete path within the specific file system. The path is used to read a specific file from that file system.
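A custom handler only has to supply get_path. The sketch below uses a stand-in abstract base class so that it runs without GQLAlchemy installed; PrefixFileSystemHandler and its root argument are hypothetical names introduced for illustration.

```python
from abc import ABC, abstractmethod


class FileSystemHandler(ABC):
    """Stand-in for the documented abstract class."""

    @abstractmethod
    def get_path(self, collection_name: str) -> str:
        """Return the complete path for a collection."""


class PrefixFileSystemHandler(FileSystemHandler):
    """Hypothetical handler that resolves names under a fixed root."""

    def __init__(self, root: str) -> None:
        # Drop a trailing slash so that joined paths have exactly one.
        self._root = root.rstrip("/")

    def get_path(self, collection_name: str) -> str:
        return f"{self._root}/{collection_name}"


handler = PrefixFileSystemHandler("/data/exports/")
path = handler.get_path("individuals")

# The abstract base itself cannot be instantiated.
try:
    FileSystemHandler()
except TypeError:
    base_is_abstract = True
else:
    base_is_abstract = False
```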
S3FileSystemHandler Objects
class S3FileSystemHandler(FileSystemHandler)
Handles connection to Amazon S3 service via PyArrow.
__init__
def __init__(bucket_name: str, **kwargs)
Initializes connection and data bucket.
Arguments:
bucket_name - Name of the bucket on S3 from which to read the data.
Kwargs:
access_key - S3 access key.
secret_key - S3 secret key.
region - S3 region.
session_token - S3 session token (Optional).
Raises:
KeyError - Raised when kwargs doesn't contain the necessary fields.
get_path
def get_path(collection_name: str) -> str
Get file path in file system.
Arguments:
collection_name - Name of the file to read.
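The KeyError behavior can be illustrated with a small validation helper. This is a sketch under assumptions: collect_s3_credentials is a hypothetical function, and treating access_key, secret_key, and region as required follows only from the fact that session_token is the sole entry marked optional above.

```python
from typing import Any, Dict

# Assumed to be required because only session_token is marked optional.
REQUIRED_S3_KWARGS = ("access_key", "secret_key", "region")


def collect_s3_credentials(**kwargs: Any) -> Dict[str, Any]:
    """Hypothetical helper: validate and gather the documented kwargs."""
    missing = [name for name in REQUIRED_S3_KWARGS if name not in kwargs]
    if missing:
        raise KeyError(f"Missing S3 kwargs: {', '.join(missing)}")
    credentials = {name: kwargs[name] for name in REQUIRED_S3_KWARGS}
    # The session token is optional, so default it to None.
    credentials["session_token"] = kwargs.get("session_token")
    return credentials


credentials = collect_s3_credentials(
    access_key="example-access-key",
    secret_key="example-secret-key",
    region="eu-west-1",
)
```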
AzureBlobFileSystemHandler Objects
class AzureBlobFileSystemHandler(FileSystemHandler)
Handles connection to Azure Blob service via the adlfs package.
__init__
def __init__(container_name: str, **kwargs) -> None
Initializes connection and data container.
Arguments:
container_name - Name of the Blob container storing data.
Kwargs:
account_name - Account name from Azure Blob.
account_key - Account key for Azure Blob (Optional if sas_token is used).
sas_token - Shared access signature token for authentication (Optional).
Raises:
KeyError - Raised when kwargs doesn't contain the necessary fields.
get_path
def get_path(collection_name: str) -> str
Get file path in file system.
Arguments:
collection_name - Name of the file to read.
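The either/or relationship between account_key and sas_token can be sketched like this. The helper name is hypothetical, and requiring at least one of the two credentials is an assumption drawn from the "Optional" notes above, not a confirmed rule of the library.

```python
from typing import Any, Dict


def collect_azure_credentials(**kwargs: Any) -> Dict[str, Any]:
    """Hypothetical helper: validate the documented Azure Blob kwargs."""
    if "account_name" not in kwargs:
        raise KeyError("account_name")
    account_key = kwargs.get("account_key")
    sas_token = kwargs.get("sas_token")
    # Assumption: one of the two credentials has to be present.
    if account_key is None and sas_token is None:
        raise KeyError("Provide either account_key or sas_token")
    return {
        "account_name": kwargs["account_name"],
        "account_key": account_key,
        "sas_token": sas_token,
    }


with_sas = collect_azure_credentials(
    account_name="exampleaccount", sas_token="example-sas-token"
)
```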
LocalFileSystemHandler Objects
class LocalFileSystemHandler(FileSystemHandler)
Handles a local filesystem.
__init__
def __init__(path: str) -> None
Initializes an fsspec local file system and sets path to data.
Arguments:
path - Path to the local storage location.
get_path
def get_path(collection_name: str) -> str
Get file path in the local file system.
Arguments:
collection_name - Name of the file to read.
DataLoader Objects
class DataLoader(ABC)
Implements loading of a data type from a file system service to TableToGraphImporter.
__init__
def __init__(file_extension: str,
file_system_handler: FileSystemHandler) -> None
Arguments:
file_extension - File format to be read.
file_system_handler - Object for handling of the file system service.
load_data
@abstractmethod
def load_data(collection_name: str, is_cross_table: bool = False) -> None
Override this method in the derived class. It is intended for reading data from the given data format.
Arguments:
collection_name - Name of the file to read.
is_cross_table - Indicates whether or not the collection contains an associative table (default=False).
Raises:
NotImplementedError - The method is not implemented in the extended class.
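Overriding load_data in a derived class might look like the sketch below. It uses a stand-in base class and reads CSV text from memory so that it runs with the standard library alone; InMemoryCSVLoader, its files argument, and the choice to yield one dictionary per row are assumptions made for illustration.

```python
import csv
import io
from abc import ABC, abstractmethod
from typing import Dict, Iterator


class DataLoader(ABC):
    """Stand-in mirroring the documented constructor and abstract method."""

    def __init__(self, file_extension: str, file_system_handler) -> None:
        self._file_extension = file_extension
        self._file_system_handler = file_system_handler

    @abstractmethod
    def load_data(self, collection_name: str, is_cross_table: bool = False):
        raise NotImplementedError


class InMemoryCSVLoader(DataLoader):
    """Hypothetical loader that serves CSV text held in a dictionary."""

    def __init__(self, files: Dict[str, str]) -> None:
        super().__init__("csv", file_system_handler=None)
        self._files = files

    def load_data(
        self, collection_name: str, is_cross_table: bool = False
    ) -> Iterator[Dict[str, str]]:
        # Build the file name from the collection name and the extension.
        text = self._files[f"{collection_name}.{self._file_extension}"]
        # Yield one dictionary per row, keyed by the header line.
        yield from csv.DictReader(io.StringIO(text))


loader = InMemoryCSVLoader({"address.csv": "add_id,city\n1,Zagreb\n2,Split\n"})
rows = list(loader.load_data("address"))
```

Writing load_data as a generator keeps memory use flat for large tables, since rows are produced one at a time.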
PyArrowFileTypeEnum Objects
class PyArrowFileTypeEnum(Enum)
Enumerates file types supported by PyArrow.
PyArrowDataLoader Objects
class PyArrowDataLoader(DataLoader)
Loads data using PyArrow.
PyArrow currently supports "parquet", "ipc"/"arrow"/"feather", "csv", and "orc"; see pyarrow.dataset.dataset for up-to-date information. The ds.dataset call in load_data accepts any fsspec subclass, which makes this DataLoader compatible with fsspec-compatible file systems.
__init__
def __init__(file_extension_enum: PyArrowFileTypeEnum,
file_system_handler: FileSystemHandler) -> None
Arguments:
file_extension_enum - The file format to be read.
file_system_handler - Object for handling of the file system service.
load_data
def load_data(collection_name: str,
is_cross_table: bool = False,
columns: Optional[List[str]] = None) -> None
Generator for loading data.
Arguments:
collection_name - Name of the file to read.
is_cross_table - Flag signifying whether it is a cross table.
columns - Table columns to read.
TableToGraphImporter Objects
class TableToGraphImporter(Importer)
Implements translation of table data to graph data, and imports it to Memgraph.
__init__
def __init__(data_loader: DataLoader,
data_configuration: Dict[str, Any],
memgraph: Optional[Memgraph] = None) -> None
Arguments:
data_loader - Object for loading data.
data_configuration - Configuration for the translations.
memgraph - Connection to Memgraph (Optional).
translate
def translate(drop_database_on_start: bool = True) -> None
Performs the translations.
Arguments:
drop_database_on_start - Indicates whether or not the database should be dropped before the translations start.
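The data_configuration argument is a plain dictionary. This reference does not spell out its layout, so the sketch below is an assumption: the top-level keys ("indices", "name_mappings", "one_to_many_relations") are hypothetical here and should be checked against the installed GQLAlchemy version, while the nested foreign key fields reuse the ForeignKeyMapping attribute names documented above. The table and column names are invented.

```python
from typing import Any, Dict, Set

# Assumed layout; verify the top-level key names against your
# GQLAlchemy version before relying on them.
data_configuration: Dict[str, Any] = {
    "indices": {"address": ["add_id"], "individuals": ["ind_id"]},
    "name_mappings": {
        "individuals": {"label": "INDIVIDUAL"},
        "address": {"label": "ADDRESS"},
    },
    "one_to_many_relations": {
        "address": [],
        "individuals": [
            {
                "foreign_key": {
                    "column_name": "add_id",
                    "reference_table": "address",
                    "reference_key": "add_id",
                },
                "label": "LIVES_IN",
            }
        ],
    },
}


def referenced_tables(config: Dict[str, Any]) -> Set[str]:
    """Collect every table that a one-to-many foreign key points at."""
    return {
        relation["foreign_key"]["reference_table"]
        for relations in config.get("one_to_many_relations", {}).values()
        for relation in relations
    }


# Every referenced table should itself be described in the configuration.
unknown_tables = referenced_tables(data_configuration) - set(
    data_configuration["one_to_many_relations"]
)
```

A check like this catches a misspelled reference_table before the dictionary is handed to TableToGraphImporter and translate is called.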
PyArrowImporter Objects
class PyArrowImporter(TableToGraphImporter)
TableToGraphImporter wrapper for use with PyArrow for reading data.
__init__
def __init__(file_system_handler: str,
file_extension_enum: PyArrowFileTypeEnum,
data_configuration: Dict[str, Any],
memgraph: Optional[Memgraph] = None) -> None
Arguments:
file_system_handler - File system to read from.
file_extension_enum - File format to be read.
data_configuration - Configuration for the translations.
memgraph - Connection to Memgraph (Optional).
Raises:
ValueError - PyArrow doesn't support ORC on Windows.
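The ORC restriction amounts to a platform check at construction time. A minimal sketch, assuming the check is based on the operating system name: FileType is a hypothetical stand-in for PyArrowFileTypeEnum (its member names are invented), and check_file_type_supported is not a library function.

```python
import platform
from enum import Enum


class FileType(Enum):
    """Hypothetical stand-in for PyArrowFileTypeEnum."""

    Parquet = "parquet"
    CSV = "csv"
    ORC = "orc"
    Feather = "feather"


def check_file_type_supported(file_type: FileType, system: str = "") -> None:
    """Raise ValueError for ORC on Windows; otherwise do nothing."""
    # Allow the system name to be injected so the check is easy to test.
    system = system or platform.system()
    if file_type is FileType.ORC and system == "Windows":
        raise ValueError("ORC is not supported by PyArrow on Windows")


# Parquet passes on any platform.
check_file_type_supported(FileType.Parquet, system="Windows")

try:
    check_file_type_supported(FileType.ORC, system="Windows")
except ValueError:
    orc_rejected_on_windows = True
else:
    orc_rejected_on_windows = False
```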
PyArrowS3Importer Objects
class PyArrowS3Importer(PyArrowImporter)
PyArrowImporter wrapper for use with the Amazon S3 File System.
__init__
def __init__(bucket_name: str,
file_extension_enum: PyArrowFileTypeEnum,
data_configuration: Dict[str, Any],
memgraph: Optional[Memgraph] = None,
**kwargs) -> None
Arguments:
bucket_name - Name of the bucket in S3 to read from.
file_extension_enum - File format to be read.
data_configuration - Configuration for the translations.
memgraph - Connection to Memgraph (Optional).
**kwargs - Keyword arguments for S3FileSystem.
PyArrowAzureBlobImporter Objects
class PyArrowAzureBlobImporter(PyArrowImporter)
PyArrowImporter wrapper for use with the Azure Blob File System.
__init__
def __init__(container_name: str,
file_extension_enum: PyArrowFileTypeEnum,
data_configuration: Dict[str, Any],
memgraph: Optional[Memgraph] = None,
**kwargs) -> None
Arguments:
container_name - Name of the container in Azure Blob to read from.
file_extension_enum - File format to be read.
data_configuration - Configuration for the translations.
memgraph - Connection to Memgraph (Optional).
**kwargs - Keyword arguments for AzureBlobFileSystem.
PyArrowLocalFileSystemImporter Objects
class PyArrowLocalFileSystemImporter(PyArrowImporter)
PyArrowImporter wrapper for use with the Local File System.
__init__
def __init__(path: str,
file_extension_enum: PyArrowFileTypeEnum,
data_configuration: Dict[str, Any],
memgraph: Optional[Memgraph] = None) -> None
Arguments:
path - Full path to the directory to read from.
file_extension_enum - File format to be read.
data_configuration - Configuration for the translations.
memgraph - Connection to Memgraph (Optional).
ParquetS3FileSystemImporter Objects
class ParquetS3FileSystemImporter(PyArrowS3Importer)
PyArrowS3Importer wrapper for use with the S3 file system and the Parquet file type.
__init__
def __init__(bucket_name: str,
data_configuration: Dict[str, Any],
memgraph: Optional[Memgraph] = None,
**kwargs) -> None
Arguments:
bucket_name - Name of the bucket in S3 to read from.
data_configuration - Configuration for the translations.
memgraph - Connection to Memgraph (Optional).
**kwargs - Keyword arguments for S3FileSystem.
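Comparing this signature with PyArrowS3Importer shows the pattern behind the format-specific classes: each one drops the file_extension_enum argument and fixes it to a single file type. The sketch below illustrates that pattern with stand-in classes; S3Importer, ParquetS3Importer, and FileType are hypothetical names, and the bodies only record their arguments instead of opening a connection.

```python
from enum import Enum
from typing import Any, Dict


class FileType(Enum):
    """Hypothetical stand-in for PyArrowFileTypeEnum."""

    Parquet = "parquet"
    CSV = "csv"


class S3Importer:
    """Stand-in for PyArrowS3Importer that only stores its arguments."""

    def __init__(
        self,
        bucket_name: str,
        file_extension_enum: FileType,
        data_configuration: Dict[str, Any],
        memgraph: Any = None,
        **kwargs: Any,
    ) -> None:
        self.bucket_name = bucket_name
        self.file_extension_enum = file_extension_enum
        self.data_configuration = data_configuration
        self.memgraph = memgraph
        self.s3_kwargs = kwargs


class ParquetS3Importer(S3Importer):
    """Stand-in wrapper that fixes the file type to Parquet."""

    def __init__(
        self,
        bucket_name: str,
        data_configuration: Dict[str, Any],
        memgraph: Any = None,
        **kwargs: Any,
    ) -> None:
        super().__init__(
            bucket_name=bucket_name,
            file_extension_enum=FileType.Parquet,
            data_configuration=data_configuration,
            memgraph=memgraph,
            **kwargs,
        )


importer = ParquetS3Importer(
    "example-bucket",
    {},
    access_key="example-access-key",
    secret_key="example-secret-key",
    region="eu-west-1",
)
```

The extra keyword arguments pass through the wrapper untouched, which is why the S3 credentials are documented on the handler rather than on each importer.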
CSVS3FileSystemImporter Objects
class CSVS3FileSystemImporter(PyArrowS3Importer)
PyArrowS3Importer wrapper for use with the S3 file system and the CSV file type.
__init__
def __init__(bucket_name: str,
data_configuration: Dict[str, Any],
memgraph: Optional[Memgraph] = None,
**kwargs) -> None
Arguments:
bucket_name - Name of the bucket in S3 to read from.
data_configuration - Configuration for the translations.
memgraph - Connection to Memgraph (Optional).
**kwargs - Keyword arguments for S3FileSystem.
ORCS3FileSystemImporter Objects
class ORCS3FileSystemImporter(PyArrowS3Importer)
PyArrowS3Importer wrapper for use with the S3 file system and the ORC file type.
__init__
def __init__(bucket_name: str,
data_configuration: Dict[str, Any],
memgraph: Optional[Memgraph] = None,
**kwargs) -> None
Arguments:
bucket_name - Name of the bucket in S3 to read from.
data_configuration - Configuration for the translations.
memgraph - Connection to Memgraph (Optional).
**kwargs - Keyword arguments for S3FileSystem.
FeatherS3FileSystemImporter Objects
class FeatherS3FileSystemImporter(PyArrowS3Importer)
PyArrowS3Importer wrapper for use with the S3 file system and the Feather file type.
__init__
def __init__(bucket_name: str,
data_configuration: Dict[str, Any],
memgraph: Optional[Memgraph] = None,
**kwargs) -> None
Arguments:
bucket_name - Name of the bucket in S3 to read from.
data_configuration - Configuration for the translations.
memgraph - Connection to Memgraph (Optional).
**kwargs - Keyword arguments for S3FileSystem.
ParquetAzureBlobFileSystemImporter Objects
class ParquetAzureBlobFileSystemImporter(PyArrowAzureBlobImporter)
PyArrowAzureBlobImporter wrapper for use with the Azure Blob file system and the Parquet file type.
__init__
def __init__(container_name: str,
data_configuration: Dict[str, Any],
memgraph: Optional[Memgraph] = None,
**kwargs) -> None
Arguments:
container_name - Name of the container in Azure Blob storage to read from.
data_configuration - Configuration for the translations.
memgraph - Connection to Memgraph (Optional).
**kwargs - Keyword arguments for AzureBlobFileSystem.
CSVAzureBlobFileSystemImporter Objects
class CSVAzureBlobFileSystemImporter(PyArrowAzureBlobImporter)
PyArrowAzureBlobImporter wrapper for use with the Azure Blob file system and the CSV file type.
__init__
def __init__(container_name: str,
data_configuration: Dict[str, Any],
memgraph: Optional[Memgraph] = None,
**kwargs) -> None
Arguments:
container_name - Name of the container in Azure Blob storage to read from.
data_configuration - Configuration for the translations.
memgraph - Connection to Memgraph (Optional).
**kwargs - Keyword arguments for AzureBlobFileSystem.
ORCAzureBlobFileSystemImporter Objects
class ORCAzureBlobFileSystemImporter(PyArrowAzureBlobImporter)
PyArrowAzureBlobImporter wrapper for use with the Azure Blob file system and the ORC file type.
__init__
def __init__(container_name,
data_configuration: Dict[str, Any],
memgraph: Optional[Memgraph] = None,
**kwargs) -> None
Arguments:
container_name - Name of the container in Azure Blob storage to read from.
data_configuration - Configuration for the translations.
memgraph - Connection to Memgraph (Optional).
**kwargs - Keyword arguments for AzureBlobFileSystem.
FeatherAzureBlobFileSystemImporter Objects
class FeatherAzureBlobFileSystemImporter(PyArrowAzureBlobImporter)
PyArrowAzureBlobImporter wrapper for use with the Azure Blob file system and the Feather file type.
__init__
def __init__(container_name,
data_configuration: Dict[str, Any],
memgraph: Optional[Memgraph] = None,
**kwargs) -> None
Arguments:
container_name - Name of the container in Azure Blob storage to read from.
data_configuration - Configuration for the translations.
memgraph - Connection to Memgraph (Optional).
**kwargs - Keyword arguments for AzureBlobFileSystem.
ParquetLocalFileSystemImporter Objects
class ParquetLocalFileSystemImporter(PyArrowLocalFileSystemImporter)
PyArrowLocalFileSystemImporter wrapper for use with the local file system and the Parquet file type.
__init__
def __init__(path: str,
data_configuration: Dict[str, Any],
memgraph: Optional[Memgraph] = None) -> None
Arguments:
path - Full path to the directory to read from.
data_configuration - Configuration for the translations.
memgraph - Connection to Memgraph (Optional).
CSVLocalFileSystemImporter Objects
class CSVLocalFileSystemImporter(PyArrowLocalFileSystemImporter)
PyArrowLocalFileSystemImporter wrapper for use with the local file system and the CSV file type.
__init__
def __init__(path: str,
data_configuration: Dict[str, Any],
memgraph: Optional[Memgraph] = None) -> None
Arguments:
path - Full path to the directory to read from.
data_configuration - Configuration for the translations.
memgraph - Connection to Memgraph (Optional).
ORCLocalFileSystemImporter Objects
class ORCLocalFileSystemImporter(PyArrowLocalFileSystemImporter)
PyArrowLocalFileSystemImporter wrapper for use with the local file system and the ORC file type.
__init__
def __init__(path: str,
data_configuration: Dict[str, Any],
memgraph: Optional[Memgraph] = None) -> None
Arguments:
path - Full path to the directory to read from.
data_configuration - Configuration for the translations.
memgraph - Connection to Memgraph (Optional).
FeatherLocalFileSystemImporter Objects
class FeatherLocalFileSystemImporter(PyArrowLocalFileSystemImporter)
PyArrowLocalFileSystemImporter wrapper for use with the local file system and the Feather/IPC/Arrow file type.
__init__
def __init__(path: str,
data_configuration: Dict[str, Any],
memgraph: Optional[Memgraph] = None) -> None
Arguments:
path - Full path to the directory to read from.
data_configuration - Configuration for the translations.
memgraph - Connection to Memgraph (Optional).