Ingestion

datapizza.pipeline.pipeline.IngestionPipeline

A pipeline for ingesting data into a vector store.

__init__

__init__(
    modules=None, vector_store=None, collection_name=None
)

Initialize the ingestion pipeline.

Parameters:

modules (list[PipelineComponent], default: None): List of pipeline components.
vector_store (Vectorstore, default: None): Vector store that receives the ingested data.
collection_name (str, default: None): Name of the vector store collection that receives the ingested data.
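
To make the interplay of the three constructor arguments concrete, here is a toy model of the contract described on this page. It is a sketch, not the library's implementation: `ToyIngestionPipeline`, `InMemoryStore`, its `add` method, `read_file`, and `split_words` are all hypothetical names, and the assumptions that modules run in order and that metadata is merged into each chunk are illustrative guesses.

```python
from typing import Any


class InMemoryStore:
    """Hypothetical stand-in for a vector store; not a datapizza class."""

    def __init__(self) -> None:
        self.collections: dict[str, list[Any]] = {}

    def add(self, items: list[Any], collection_name: str) -> None:
        self.collections.setdefault(collection_name, []).extend(items)


class ToyIngestionPipeline:
    """Toy model of the documented contract; not the library's code."""

    def __init__(self, modules=None, vector_store=None, collection_name=None):
        self.modules = modules or []
        self.vector_store = vector_store
        self.collection_name = collection_name

    def run(self, file_path, metadata=None):
        # Assumption: each module receives the previous module's output.
        result = file_path
        for module in self.modules:
            result = module(result)
        # Assumption: metadata is merged into every resulting chunk.
        if metadata:
            result = [{**chunk, **metadata} for chunk in result]
        if self.vector_store is None:
            return result  # no store: hand the last result back
        self.vector_store.add(result, self.collection_name)
        return None  # store set: nothing is returned


def read_file(path):
    """Hypothetical parser module: pretend to read the file."""
    return f"text of {path}"


def split_words(text):
    """Hypothetical splitter module: one chunk per word."""
    return [{"text": word} for word in text.split()]
```

With no store, `ToyIngestionPipeline([read_file, split_words]).run("a.txt")` returns the chunk list; with a store and a collection name, the same call returns `None` and the chunks end up in the store.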

a_run async

a_run(file_path, metadata=None)

Run the ingestion pipeline asynchronously.

Parameters:

file_path (str | list[str], required): The file path or list of file paths to ingest.
metadata (dict, default: None): Metadata to add to the ingested chunks.

Returns:

list[Chunk] | None: Returns None if vector_store is set; otherwise returns the last result of the pipeline.
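
Because a_run is a coroutine, it has to be awaited or driven by an event loop. The sketch below shows one way to call it from synchronous code, relying only on the signature documented above. `ingest_blocking` and `StubPipeline` are hypothetical names; the stub exists only so the example runs without the library installed.

```python
import asyncio


def ingest_blocking(pipeline, file_path, metadata=None):
    """Run pipeline.a_run to completion from synchronous code."""
    # asyncio.run starts a fresh event loop; it raises RuntimeError if
    # called from a thread where an event loop is already running.
    return asyncio.run(pipeline.a_run(file_path, metadata=metadata))


class StubPipeline:
    """Hypothetical stand-in exposing only the documented a_run signature."""

    async def a_run(self, file_path, metadata=None):
        await asyncio.sleep(0)  # yield to the event loop once
        paths = [file_path] if isinstance(file_path, str) else list(file_path)
        return [{"source": p, **(metadata or {})} for p in paths]


chunks = ingest_blocking(StubPipeline(), ["a.pdf", "b.pdf"], {"team": "docs"})
```

Inside code that is already asynchronous, `await pipeline.a_run(...)` directly instead of wrapping it.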

from_yaml

from_yaml(config_path)

Load the ingestion pipeline from a YAML configuration file.

Parameters:

config_path (str, required): Path to the YAML configuration file.

Returns:

IngestionPipeline: The ingestion pipeline instance.
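
A small wrapper can make a missing configuration file fail early with a clear message. This is a sketch under assumptions: `load_pipeline` is a hypothetical helper, the import path is taken from the dotted name at the top of this page, and calling from_yaml on a default-constructed instance is assumed to work (this page does not say whether it is an instance method or a class method). The YAML schema itself is not described here, so none is shown.

```python
from pathlib import Path


def load_pipeline(config_path):
    """Check that the config file exists, then build the pipeline from it."""
    path = Path(config_path)
    if not path.is_file():
        raise FileNotFoundError(f"Pipeline config not found: {path}")
    # Imported lazily so the path check above works even where
    # datapizza is not installed.
    from datapizza.pipeline.pipeline import IngestionPipeline

    # Assumption: from_yaml can be called on a default-constructed instance.
    return IngestionPipeline().from_yaml(str(path))
```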

run

run(file_path, metadata=None)

Run the ingestion pipeline.

Parameters:

file_path (str | list[str], required): The file path or list of file paths to ingest.
metadata (dict, default: None): Metadata to add to the ingested chunks.

Returns:

list[Chunk] | None: Returns None if vector_store is set; otherwise returns the last result of the pipeline.
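
Since run can return either a list or None, calling code usually needs to handle both cases. The helper below is one possible way to normalize the result, using only the signature documented above. `run_and_collect` and the two stub classes are hypothetical and stand in for a real pipeline so the example is runnable on its own.

```python
def run_and_collect(pipeline, file_path, metadata=None):
    """Call pipeline.run and always hand back a list."""
    chunks = pipeline.run(file_path, metadata=metadata)
    # None is the documented result when vector_store is set.
    return [] if chunks is None else list(chunks)


class StubStoringPipeline:
    """Hypothetical stand-in for a pipeline configured with a vector store."""

    def run(self, file_path, metadata=None):
        return None


class StubReturningPipeline:
    """Hypothetical stand-in for a pipeline without a vector store."""

    def run(self, file_path, metadata=None):
        return ["chunk-1", "chunk-2"]


stored = run_and_collect(StubStoringPipeline(), "report.pdf")
returned = run_and_collect(StubReturningPipeline(), "report.pdf")
```

Note that an empty list here is ambiguous: it can mean the chunks went to the store or that the pipeline produced nothing, so check the pipeline's configuration if the distinction matters.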