NodeSplitter

datapizza.modules.splitters.NodeSplitter

Bases: Splitter

A splitter that traverses a document tree starting from the root node. If the root node's content fits within max_char, it becomes a single chunk. Otherwise, the splitter recursively processes the node's children: each child that fits within max_char becomes a chunk, and any child that does not fit is split further by descending into its own children.
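
The traversal described above can be sketched in a few lines. This is only an illustration of the idea, not the library's implementation: `SimpleNode` and `split_node` are hypothetical names, the chunks are plain strings rather than `Chunk` objects, and two behaviors are assumptions (content exactly equal to the limit counts as fitting, and an oversized leaf is emitted as-is because it has no children to descend into).

```python
from dataclasses import dataclass, field


@dataclass
class SimpleNode:
    """Hypothetical stand-in for a document tree node (not the datapizza Node class)."""

    text: str = ""
    children: list["SimpleNode"] = field(default_factory=list)

    @property
    def content(self) -> str:
        # A leaf carries its own text; an inner node's content is its children's content joined.
        if not self.children:
            return self.text
        return "\n".join(child.content for child in self.children)


def split_node(node: SimpleNode, max_char: int = 5000) -> list[str]:
    """Return chunk texts by descending the tree until each piece fits."""
    content = node.content
    # Assumption: a node that fits, or a leaf that cannot be divided further, becomes one chunk.
    if len(content) <= max_char or not node.children:
        return [content]
    chunks: list[str] = []
    for child in node.children:
        # Children that fit become chunks; larger ones are split recursively.
        chunks.extend(split_node(child, max_char))
    return chunks


doc = SimpleNode(children=[
    SimpleNode(text="a" * 40),
    SimpleNode(children=[SimpleNode(text="b" * 30), SimpleNode(text="c" * 30)]),
])
chunks = split_node(doc, max_char=50)
print([len(c) for c in chunks])
```

In this toy tree the root is too large for a 50-character limit, so the first child (40 characters) becomes a chunk directly, while the second child (61 characters) is split one level deeper into its two leaves.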

__init__

__init__(max_char=5000)

Initialize the NodeSplitter.

Parameters:

Name	Type	Description	Default
`max_char`	`int`	The maximum number of characters per chunk	`5000`

split

split(node)

Split the node into chunks.

Parameters:

Name	Type	Description	Default
`node`	`Node`	The node to split	required

Returns:

Type	Description
`list[Chunk]`	A list of chunks

Usage

from datapizza.modules.splitters import NodeSplitter

splitter = NodeSplitter(
    max_char=800,
)

node_chunks = splitter.split(document_node)

Features

Maintains Node object structure and hierarchy
Preserves metadata from original nodes
Respects node boundaries when possible
Supports both structure-preserving and flattened chunking
Handles nested node relationships intelligently

Examples

Basic Node Splitting

from datapizza.modules.parsers import TextParser
from datapizza.modules.splitters import NodeSplitter

# Parse text into nodes
parser = TextParser()
document = parser.parse("""
This is the first section of the document.
It contains important information about the topic.

This is the second section with more details.
It provides additional context and examples.

The final section concludes the document.
It summarizes the key points discussed.
""")

splitter = NodeSplitter(
    max_char=150,
)

chunks = splitter.split(document)

# Examine the structured chunks
for i, chunk in enumerate(chunks, start=1):
    print(f"Chunk {i}:")
    print(f"  Content length: {len(chunk.text)}")
    print(f"  Content preview: {chunk.text[:80]}...")
    print("---")