Ingestion
datapizza.pipeline.pipeline.IngestionPipeline
A pipeline for ingesting data into a vector store.
__init__
Initialize the ingestion pipeline.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
modules | list[PipelineComponent] | List of pipeline components. Defaults to None. | None |
vector_store | Vectorstore | Vector store to store the ingested data. Defaults to None. | None |
collection_name | str | Name of the vector store collection to store the ingested data. Defaults to None. | None |
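
The constructor parameters above could be wired together as in the following minimal sketch. The import path is taken from the class path at the top of this page; `build_pipeline` is a hypothetical helper, and the check that `collection_name` accompanies `vector_store` is an assumption of this sketch rather than a documented rule.

```python
from typing import Any, Optional


def build_pipeline(
    modules: list,
    vector_store: Optional[Any] = None,
    collection_name: Optional[str] = None,
):
    """Construct an IngestionPipeline from the three documented parameters."""
    # Assumption of this sketch, not a documented rule: a vector store
    # without a collection name is treated as a configuration mistake.
    if vector_store is not None and collection_name is None:
        raise ValueError("collection_name is required when vector_store is set")

    # Imported lazily so the helper can be defined without the library installed.
    from datapizza.pipeline.pipeline import IngestionPipeline

    return IngestionPipeline(
        modules=modules,
        vector_store=vector_store,
        collection_name=collection_name,
    )
```

Keeping the construction in one helper makes it easy to switch between a pipeline that writes to a vector store and one that only returns its last result.
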
a_run
async
Run the ingestion pipeline asynchronously.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | str \| list[str] | The file path or list of file paths to ingest. | required |
metadata | dict | Metadata to add to the ingested chunks. Defaults to None. | None |
Returns:
Type | Description |
---|---|
list[Chunk] \| None | None if vector_store is set; otherwise the last result of the pipeline. |
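
Because `a_run` is a coroutine, it has to be awaited from an event loop. The sketch below shows one way to do that with `asyncio.run`. `FakeAsyncPipeline` is a hypothetical stand-in that only mimics the documented signature so the example runs without the library; a real `IngestionPipeline` instance would be passed in its place.

```python
import asyncio


class FakeAsyncPipeline:
    """Hypothetical stand-in exposing the documented a_run signature."""

    async def a_run(self, file_path, metadata=None):
        # Accept a single path or a list of paths, as documented.
        paths = [file_path] if isinstance(file_path, str) else list(file_path)
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        return [f"chunk from {p}" for p in paths]


async def ingest_async(pipeline, file_path, metadata=None):
    """Await a_run and normalize a None result to an empty list."""
    result = await pipeline.a_run(file_path, metadata=metadata)
    return [] if result is None else list(result)


chunks = asyncio.run(
    ingest_async(FakeAsyncPipeline(), ["a.txt", "b.txt"], {"lang": "en"})
)
```
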
from_yaml
Load the ingestion pipeline from a YAML configuration file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
config_path | str | Path to the YAML configuration file. | required |
Returns:
Name | Type | Description |
---|---|---|
IngestionPipeline | IngestionPipeline | The ingestion pipeline instance. |
run
Run the ingestion pipeline.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | str \| list[str] | The file path or list of file paths to ingest. | required |
metadata | dict | Metadata to add to the ingested chunks. Defaults to None. | None |
Returns:
Type | Description |
---|---|
list[Chunk] \| None | None if vector_store is set; otherwise the last result of the pipeline. |
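
Since `run` returns either a list of chunks or nothing, callers may want to handle both cases in one place. The following is a minimal sketch of that pattern. `FakePipeline` is a hypothetical stand-in that imitates the return contract described above, and the dictionaries it produces are placeholders, not the library's `Chunk` type.

```python
from typing import Any, Optional


def ingest(pipeline: Any, file_path, metadata: Optional[dict] = None) -> list:
    """Call pipeline.run and normalize its return value to a list."""
    result = pipeline.run(file_path, metadata=metadata)
    # Documented contract: nothing is returned when a vector store is set;
    # otherwise the last result of the pipeline comes back.
    return [] if result is None else list(result)


class FakePipeline:
    """Hypothetical stand-in that mimics the documented return contract."""

    def __init__(self, vector_store=None):
        self.vector_store = vector_store

    def run(self, file_path, metadata=None):
        paths = [file_path] if isinstance(file_path, str) else list(file_path)
        chunks = [{"source": p, "metadata": metadata or {}} for p in paths]
        # With a vector store, chunks would be persisted rather than returned.
        return None if self.vector_store is not None else chunks


in_memory = ingest(FakePipeline(), "report.pdf", {"team": "docs"})
persisted = ingest(FakePipeline(vector_store=object()), ["a.pdf", "b.pdf"])
```

Here `in_memory` holds one placeholder chunk, while `persisted` is an empty list because the stand-in returns nothing when a vector store is configured.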