QdrantVectorstore
datapizza.vectorstores.qdrant.QdrantVectorstore
Bases: Vectorstore
datapizza-ai implementation of a Qdrant vectorstore.
__init__
Initialize the QdrantVectorstore.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
host | str | The host to use for the Qdrant client. Defaults to None. | None |
port | int | The port to use for the Qdrant client. Defaults to 6333. | 6333 |
api_key | str | The API key to use for the Qdrant client. Defaults to None. | None |
**kwargs | | Additional keyword arguments to pass to the Qdrant client. | {} |
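A minimal sketch of keeping connection settings out of source code, assuming they are supplied through environment variables. The variable names (QDRANT_HOST, QDRANT_PORT, QDRANT_API_KEY), the "localhost" fallback, and the helper qdrant_settings_from_env are hypothetical and not part of datapizza-ai; only the host, port, and api_key parameter names come from the table above.

```python
import os


def qdrant_settings_from_env(env=None):
    """Build keyword arguments for QdrantVectorstore from environment variables.

    The variable names and the "localhost" fallback are illustrative
    assumptions, not library conventions.
    """
    env = os.environ if env is None else env
    settings = {
        "host": env.get("QDRANT_HOST", "localhost"),
        "port": int(env.get("QDRANT_PORT", "6333")),  # 6333 is the documented default
    }
    api_key = env.get("QDRANT_API_KEY")
    if api_key:  # only pass an API key when one is configured
        settings["api_key"] = api_key
    return settings


# Intended use (requires datapizza-ai and a reachable Qdrant server):
# from datapizza.vectorstores.qdrant import QdrantVectorstore
# vectorstore = QdrantVectorstore(**qdrant_settings_from_env())
```

Passing a plain dict as env makes the helper easy to exercise without touching the real environment.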
a_search
async
Asynchronously search for chunks in a collection by query vector.
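Because a_search is a coroutine, several queries can be awaited concurrently. The sketch below assumes a_search accepts the same collection_name, query_vector, and k arguments documented for search; this reference does not state that parity, and search_many is a hypothetical helper.

```python
import asyncio


async def search_many(vectorstore, collection_name, query_vectors, k=10):
    """Run one a_search call per query vector concurrently.

    Assumes a_search mirrors the documented search() parameters.
    Returns one result list per query vector, in input order.
    """
    tasks = [
        vectorstore.a_search(
            collection_name=collection_name,
            query_vector=vector,
            k=k,
        )
        for vector in query_vectors
    ]
    return await asyncio.gather(*tasks)


# Intended use, from synchronous code:
# results = asyncio.run(search_many(vectorstore, "documents", [[0.1, 0.2, 0.3]], k=5))
```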
add
Add a single chunk or list of chunks to the vectorstore.
Args:
    chunk (Chunk | list[Chunk]): The chunk or list of chunks to add.
    collection_name (str, optional): The name of the collection to add the chunks to. Defaults to None.
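Since add accepts a list of chunks, a large ingest can be split into fixed-size batches. This is a minimal sketch: add_in_batches and the default batch size of 256 are hypothetical choices, and it relies only on the add(chunks, collection_name=...) call form used in the Examples section.

```python
from itertools import islice


def add_in_batches(vectorstore, chunks, collection_name, batch_size=256):
    """Add chunks to a collection in batches; return the number of chunks sent."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(chunks)
    total = 0
    while True:
        # Take up to batch_size items without materializing the whole input
        batch = list(islice(iterator, batch_size))
        if not batch:
            break
        vectorstore.add(batch, collection_name=collection_name)
        total += len(batch)
    return total
```

Because the helper consumes an iterator, chunks can also be a generator that produces embeddings lazily.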
create_collection
Create a new collection in Qdrant with the specified vector configurations, if it doesn't already exist.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
collection_name | str | Name of the collection to create | required |
vector_config | list[VectorConfig] | List of vector configurations specifying dimensions and distance metrics | required |
**kwargs | | Additional arguments to pass to Qdrant's create_collection | {} |
dump_collection
Dump all points from a collection, fetching them in batches of page_size and yielding each one as a Chunk.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
collection_name | str | Name of the collection to dump. | required |
page_size | int | Number of points to retrieve per batch. | 100 |
with_vectors | bool | Whether to include vectors in the dumped chunks. | False |
Yields:
Name | Type | Description |
---|---|---|
Chunk | Chunk | A chunk object from the collection. |
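Since dump_collection yields chunks one at a time, a collection can be exported without holding it all in memory. The sketch below writes one JSON object per line; export_collection_jsonl is a hypothetical helper, and it assumes each chunk exposes id, text, and metadata (as in the Examples section) and that metadata is JSON-serializable.

```python
import json


def export_collection_jsonl(vectorstore, collection_name, out_file, page_size=100):
    """Write every chunk in a collection to out_file as one JSON object per line."""
    count = 0
    for chunk in vectorstore.dump_collection(
        collection_name=collection_name,
        page_size=page_size,
        with_vectors=False,  # vectors omitted to keep the export small
    ):
        record = {"id": chunk.id, "text": chunk.text, "metadata": chunk.metadata}
        out_file.write(json.dumps(record, ensure_ascii=False) + "\n")
        count += 1
    return count


# Intended use:
# with open("documents.jsonl", "w", encoding="utf-8") as fh:
#     export_collection_jsonl(vectorstore, "documents", fh)
```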
remove
Remove chunks from a collection by their IDs.
Args:
    collection_name (str): The name of the collection to remove the chunks from.
    ids (list[str]): The IDs of the chunks to remove.
    **kwargs: Additional keyword arguments to pass to the Qdrant client.
retrieve
Retrieve chunks from a collection by their IDs.
Args:
    collection_name (str): The name of the collection to retrieve the chunks from.
    ids (list[str]): The IDs of the chunks to retrieve.
    **kwargs: Additional keyword arguments to pass to the Qdrant client.
Returns:
    list[Chunk]: The list of chunks retrieved from the collection.
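A common follow-up to retrieve is checking which requested IDs actually came back. The sketch below is hypothetical: retrieve_by_id is not a library function, and it assumes retrieve simply omits IDs that are absent from the collection, which this reference does not state.

```python
def retrieve_by_id(vectorstore, collection_name, ids):
    """Return ({id: chunk}, [missing ids]) for the requested IDs.

    Assumes retrieve() omits IDs that are not in the collection.
    """
    ids = list(ids)  # allow any iterable, and iterate it twice safely
    chunks = vectorstore.retrieve(collection_name=collection_name, ids=ids)
    found = {chunk.id: chunk for chunk in chunks}
    missing = [chunk_id for chunk_id in ids if chunk_id not in found]
    return found, missing
```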
search
Search for chunks in a collection by query vector.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
collection_name | str | The name of the collection to search in. | required |
query_vector | list[float] | The query vector to search for. | required |
k | int | The number of results to return. Defaults to 10. | 10 |
vector_name | str | The name of the vector to search for. Defaults to None. | None |
**kwargs | | Additional keyword arguments to pass to the Qdrant client. | {} |
Returns:
Type | Description |
---|---|
list[Chunk] | The list of chunks found in the collection. |
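A query vector whose length differs from the collection's configured dimensions is an easy mistake to make. The sketch below validates the length before delegating to search with the documented parameters; checked_search and its expected_dimensions argument are hypothetical, and the caller is assumed to know the dimensions used when the collection was created.

```python
def checked_search(vectorstore, collection_name, query_vector,
                   expected_dimensions, k=10, vector_name=None):
    """Validate the query vector length, then delegate to search()."""
    if len(query_vector) != expected_dimensions:
        raise ValueError(
            f"query vector has {len(query_vector)} dimensions, "
            f"expected {expected_dimensions}"
        )
    return vectorstore.search(
        collection_name=collection_name,
        query_vector=query_vector,
        k=k,
        vector_name=vector_name,
    )
```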
Usage
from datapizza.vectorstores.qdrant import QdrantVectorstore

# Connect to Qdrant server
vectorstore = QdrantVectorstore(
    host="localhost",
    port=6333,
    api_key="your-api-key"  # Optional
)

# Or use in-memory/file storage
vectorstore = QdrantVectorstore(
    location=":memory:"  # Or path to file
)
Features
- Connect to Qdrant server or use local storage
- Support for both dense and sparse embeddings
- Named vector configurations for multi-vector collections
- Batch operations for efficient processing
- Collection management (create, delete, list)
- Chunk-based operations with metadata preservation
- Async support for all operations
- Point-level operations (add, update, remove, retrieve)
Examples
Basic Setup and Collection Creation
from datapizza.core.vectorstore import Distance, VectorConfig
from datapizza.type import EmbeddingFormat
from datapizza.vectorstores.qdrant import QdrantVectorstore

vectorstore = QdrantVectorstore(location=":memory:")

# Create collection with vector configuration
vector_config = [
    VectorConfig(
        name="text_embeddings",
        dimensions=3,
        format=EmbeddingFormat.DENSE,
        distance=Distance.COSINE
    )
]
vectorstore.create_collection(
    collection_name="documents",
    vector_config=vector_config
)

# Add chunks and search
import uuid
from datapizza.type import Chunk, DenseEmbedding
from datapizza.vectorstores.qdrant import QdrantVectorstore

# Create chunks with embeddings
chunks = [
    Chunk(
        id=str(uuid.uuid4()),
        text="First document content",
        metadata={"source": "doc1.txt"},
        embeddings=[DenseEmbedding(name="text_embeddings", vector=[0.1, 0.2, 0.3])]
    ),
    Chunk(
        id=str(uuid.uuid4()),
        text="Second document content",
        metadata={"source": "doc2.txt"},
        embeddings=[DenseEmbedding(name="text_embeddings", vector=[0.4, 0.5, 0.6])]
    )
]

# Add chunks to collection
vectorstore.add(chunks, collection_name="documents")

# Search for similar chunks
query_vector = [0.1, 0.2, 0.3]
results = vectorstore.search(
    collection_name="documents",
    query_vector=query_vector,
    k=5
)
for chunk in results:
    print(f"Text: {chunk.text}")
    print(f"Metadata: {chunk.metadata}")
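
To see what the Distance.COSINE setting implies for the two example chunks, the similarity can be computed by hand. This is a plain-Python illustration of cosine similarity, not Qdrant's internal scoring code, and it assumes results are ordered from most to least similar.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


query = [0.1, 0.2, 0.3]
first = [0.1, 0.2, 0.3]   # embedding of "First document content"
second = [0.4, 0.5, 0.6]  # embedding of "Second document content"

print(round(cosine_similarity(query, first), 4))   # 1.0
print(round(cosine_similarity(query, second), 4))  # 0.9746
```

The query is identical to the first embedding, so under these assumptions the first document would be expected to rank ahead of the second.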