ImageRAGPrompt

Specialized prompting utilities for Retrieval-Augmented Generation (RAG) with image content. The ImageRAGPrompt class provides multimodal content integration for vision-language models.

datapizza.modules.prompt.ImageRAGPrompt

Bases: Prompt

Create a memory for an image RAG system.

__init__

__init__(
    user_prompt_template,
    image_prompt_presentation,
    each_image_template,
)

Parameters:

Name                       Type  Description                                Default
user_prompt_template       str   The Jinja template for the user prompt.    required
image_prompt_presentation  str   The Jinja template for the image prompt.   required
each_image_template        str   The Jinja template applied to each image.  required

format

format(chunks, user_query, retrieval_query, memory=None)

Creates a new memory object that includes:

  • Existing memory messages
  • User's query
  • Function call retrieval results
  • Chunks retrieval results

Parameters:

Name             Type           Description                                    Default
chunks           list[Chunk]    The chunks to add to the memory.               required
user_query       str            The user's query.                              required
retrieval_query  str            The query to search the vectorstore for.       required
memory           Memory | None  The memory object to add the new messages to.  None

Returns:

Name    Type    Description
memory  Memory  A new memory object with the new messages.
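
To make the ordering described for format concrete, here is a minimal stand-alone sketch. It does not use or reproduce the library's implementation: SketchMemory, sketch_format, and the role labels are hypothetical names chosen for illustration, and the chunks are plain strings rather than Chunk objects.

```python
from dataclasses import dataclass, field


@dataclass
class SketchMemory:
    """Hypothetical stand-in for Memory: an ordered list of (role, content) turns."""

    turns: list = field(default_factory=list)

    def copy(self):
        # Shallow copy so appended turns do not leak into the original object
        return SketchMemory(turns=list(self.turns))


def sketch_format(chunks, user_query, retrieval_query, memory=None):
    """Assemble turns in the documented order (illustrative only)."""
    # 1. Existing memory messages, copied so the caller's memory is untouched
    new_memory = memory.copy() if memory is not None else SketchMemory()
    # 2. The user's query
    new_memory.turns.append(("user", user_query))
    # 3. The retrieval function call, recorded with the retrieval query
    new_memory.turns.append(("function_call", f"search({retrieval_query!r})"))
    # 4. The retrieved chunks as the function call's result
    new_memory.turns.append(("function_result", list(chunks)))
    return new_memory


history = SketchMemory(turns=[("user", "Hi"), ("assistant", "Hello!")])
result = sketch_format(
    chunks=["chart.png", "caption text"],
    user_query="What does this chart show?",
    retrieval_query="quarterly revenue chart",
    memory=history,
)
for role, content in result.turns:
    print(role, content)
```

In this sketch the caller's history is left unchanged, which mirrors the description that format returns a new memory object rather than modifying the one passed in.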

Overview

from datapizza.modules.prompt.image_rag import ImageRAGPrompt

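# Initialize the image RAG prompt handler; the three names below are placeholders for Jinja template strings you define
image_rag = ImageRAGPrompt(
    user_prompt_template=user_template,
    image_prompt_presentation=presentation_template,
    each_image_template=image_template,
)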

Features:

  • Image-aware RAG prompt construction
  • Multimodal content integration
  • Context preservation for image-text interactions
  • Optimized prompting for vision-language models

Usage Examples

Basic Image RAG Usage

from datapizza.modules.prompt.image_rag import ImageRAGPrompt

# Initialize image RAG prompt; the three names below are placeholders
# for Jinja template strings you define
image_rag = ImageRAGPrompt(
    user_prompt_template=user_template,
    image_prompt_presentation=presentation_template,
    each_image_template=image_template,
)

# Build the multimodal RAG memory with the documented format method.
# retrieved_chunks is a placeholder for the list[Chunk] returned by your retriever.
memory = image_rag.format(
    chunks=retrieved_chunks,
    user_query="What does this chart show?",
    retrieval_query="chart",
)