Captioners

Captioners are pipeline components that generate captions and descriptions for media content such as images, figures, and tables. They use LLM clients to analyze visual content and produce descriptive text.

LLMCaptioner

datapizza.modules.captioners.LLMCaptioner

Bases: NodeCaptioner

Captioner that uses an LLM client to caption a node.

__init__

__init__(
    client,
    max_workers=3,
    system_prompt_table="Generate concise captions for tables.",
    system_prompt_figure="Generate descriptive captions for figures.",
)

Args:

client: The LLM client to use.
max_workers: The maximum number of workers to use. In sync mode this is the number of threads spawned; in async mode it is the number of batches.
system_prompt_table: The system prompt to use for table captioning.
system_prompt_figure: The system prompt to use for figure captioning.
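The two meanings of max_workers can be illustrated with a small, self-contained sketch. It does not use datapizza at all: fake_caption, fake_caption_async, caption_all_sync, and caption_all_async are hypothetical stand-ins, and the async batching strategy (split the items into max_workers batches, run the batches concurrently, process each batch in order) is only one plausible reading of "number of batches", not the library's confirmed behavior.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def fake_caption(item):
    """Stand-in for a blocking LLM call (hypothetical, not a datapizza API)."""
    return f"caption for {item}"


async def fake_caption_async(item):
    """Stand-in for an awaitable LLM call (hypothetical, not a datapizza API)."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"caption for {item}"


def caption_all_sync(items, max_workers=3):
    """Sync mode: at most max_workers threads run at the same time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(fake_caption, items))


async def caption_all_async(items, max_workers=3):
    """Async mode (assumed): items are split into max_workers batches."""
    # Batch i holds the items at indices i, i + max_workers, i + 2 * max_workers, ...
    batches = [items[i::max_workers] for i in range(max_workers)]

    async def run_batch(batch):
        # Items inside one batch are awaited one after another.
        return [await fake_caption_async(item) for item in batch]

    # The batches themselves run concurrently.
    batch_results = await asyncio.gather(*(run_batch(b) for b in batches))

    # Put the captions back into the original input order.
    captions = [None] * len(items)
    for batch_index, results in enumerate(batch_results):
        for position, caption in enumerate(results):
            captions[batch_index + position * max_workers] = caption
    return captions


items = ["fig-1", "tab-1", "fig-2", "tab-2", "fig-3"]
sync_captions = caption_all_sync(items, max_workers=2)
async_captions = asyncio.run(caption_all_async(items, max_workers=2))
```

Under this reading, max_workers bounds the number of in-flight requests in both modes, which is usually the quantity to tune against a provider's rate limits.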

a_caption `async`

a_caption(node)

Asynchronously caption a node.

Args:

node: The node to caption.

Returns:

Type	Description
`Node`	The same node with the caption.

a_caption_media `async`

a_caption_media(media, system_prompt=None)

Asynchronously caption a media item.

Args:

media: The media to caption.
system_prompt: Optional system prompt to guide the captioning.

Returns:

Type	Description
`str`	The string caption.

caption

caption(node)

Caption a node.

Args:

node: The node to caption.

Returns:

Type	Description
`Node`	The same node with the caption.

caption_media

caption_media(media, system_prompt=None)

Caption a media item.

Args:

media: The media to caption.
system_prompt: Optional system prompt to guide the captioning.

Returns:

Type	Description
`str`	The string caption.

A captioner that uses language models to generate captions for media nodes (figures and tables) within document hierarchies.

from datapizza.clients.openai import OpenAIClient
from datapizza.modules.captioners import LLMCaptioner
from datapizza.type import Media, MediaNode, NodeType

# Replace the placeholder with a real OpenAI API key.
client = OpenAIClient(api_key="OPENAI_API_KEY", model="gpt-4o")
captioner = LLMCaptioner(
    client=client,
    max_workers=3,
    system_prompt_table="Describe this table in detail.",
    system_prompt_figure="Describe this figure/image in detail.",
)

# A figure node backed by a local PNG file.
document_node = MediaNode(
    node_type=NodeType.FIGURE,
    children=[],
    metadata={},
    media=Media(
        source_type="path",
        source="gogole.png",
        extension="png",
        media_type="image",
    ),
)

# Calling the captioner on the node runs the captioning.
captioned_document = captioner(document_node)
print(captioned_document)

Parameters:

client (Client): The LLM client to use for caption generation
max_workers (int): Maximum number of concurrent workers for parallel processing (default: 3)
system_prompt_table (str, optional): System prompt for table captioning
system_prompt_figure (str, optional): System prompt for figure captioning

Features:

Automatically finds all media nodes (figures and tables) in a document hierarchy
Generates captions using configurable system prompts
Supports concurrent processing for better performance
Creates new paragraph nodes containing the original content plus generated captions
Preserves original node metadata and structure
Supports both sync and async processing
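As a rough illustration of the first feature, finding media nodes in a hierarchy amounts to a tree traversal that keeps only figure and table nodes. The sketch below is self-contained and assumes a simplified tree: the Node dataclass, the NodeType enum, and find_media_nodes are hypothetical stand-ins defined here, not datapizza's own classes or internals.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeType(Enum):
    """Hypothetical stand-in for the library's node types."""
    SECTION = "section"
    PARAGRAPH = "paragraph"
    FIGURE = "figure"
    TABLE = "table"


@dataclass
class Node:
    """Hypothetical minimal tree node with a type, content, and children."""
    node_type: NodeType
    content: str = ""
    children: list = field(default_factory=list)


MEDIA_TYPES = {NodeType.FIGURE, NodeType.TABLE}


def find_media_nodes(root):
    """Return all figure and table nodes under root, in document order."""
    found = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.node_type in MEDIA_TYPES:
            found.append(node)
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(node.children))
    return found


document = Node(
    NodeType.SECTION,
    children=[
        Node(NodeType.PARAGRAPH, "intro"),
        Node(NodeType.FIGURE, "fig-1"),
        Node(NodeType.SECTION, children=[Node(NodeType.TABLE, "tab-1")]),
    ],
)
media_nodes = find_media_nodes(document)
print([n.content for n in media_nodes])
```

An explicit stack is used instead of recursion so that deeply nested documents cannot hit Python's recursion limit.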

Supported Node Types:

FIGURE: Images and visual figures
TABLE: Tables and tabular data
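Because figures and tables each have their own system prompt, captioning a node presumably starts by choosing the prompt that matches the node type. A minimal sketch of that choice follows; select_system_prompt is a hypothetical helper, node types are represented as plain lowercase strings for simplicity, and the default prompt texts are the ones shown in the __init__ signature.

```python
DEFAULT_TABLE_PROMPT = "Generate concise captions for tables."
DEFAULT_FIGURE_PROMPT = "Generate descriptive captions for figures."


def select_system_prompt(
    node_type,
    system_prompt_table=DEFAULT_TABLE_PROMPT,
    system_prompt_figure=DEFAULT_FIGURE_PROMPT,
):
    """Pick the system prompt for a node type (hypothetical helper)."""
    if node_type == "table":
        return system_prompt_table
    if node_type == "figure":
        return system_prompt_figure
    # Only figures and tables are listed as supported node types.
    raise ValueError(f"Unsupported node type for captioning: {node_type!r}")


table_prompt = select_system_prompt("table")
figure_prompt = select_system_prompt(
    "figure", system_prompt_figure="Describe this figure/image in detail."
)
```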

Output Format:

The captioner creates new paragraph nodes with content in the format:

{original_content} <{node_type}> [{generated_caption}]
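The template above can be reproduced with a plain f-string. In this sketch, format_captioned_content is a hypothetical helper, and rendering the node type as a lowercase word such as "figure" is an assumption; the library may render it differently.

```python
def format_captioned_content(original_content, node_type, generated_caption):
    """Build a string following the documented output template."""
    return f"{original_content} <{node_type}> [{generated_caption}]"


example = format_captioned_content(
    "Figure 2.", "figure", "A bar chart of monthly sales"
)
print(example)  # Figure 2. <figure> [A bar chart of monthly sales]
```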
