TextSplitter

datapizza.modules.splitters.TextSplitter

Bases: Splitter

A basic text splitter that operates directly on strings rather than Node objects. Unlike the splitters that work with Node types, it takes raw text and splits it into chunks of at most max_char characters, with a configurable number of overlapping characters between chunks.

__init__

__init__(max_char=5000, overlap=0)

Initialize the TextSplitter.

Parameters:

Name      Type  Description                                          Default
max_char  int   The maximum number of characters per chunk           5000
overlap   int   The number of characters to overlap between chunks   0
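
To make the interaction between max_char and overlap concrete, here is a minimal sketch of a fixed-size sliding window over characters. It is not the TextSplitter source: sliding_chunks is a hypothetical helper, it returns plain strings instead of Chunk objects, and the assumption that each chunk starts max_char - overlap characters after the previous one may not match how the library actually places boundaries.

```python
def sliding_chunks(text: str, max_char: int = 5000, overlap: int = 0) -> list[str]:
    """Hypothetical helper: fixed-size character windows with overlap.

    Assumption: consecutive windows start (max_char - overlap) characters apart.
    """
    if max_char <= 0:
        raise ValueError("max_char must be positive")
    if not 0 <= overlap < max_char:
        raise ValueError("overlap must satisfy 0 <= overlap < max_char")

    step = max_char - overlap  # distance between the starts of two windows
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_char])
        if start + max_char >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


pieces = sliding_chunks("abcdefghij", max_char=4, overlap=1)
print(pieces)  # ['abcd', 'defg', 'ghij']
```

Under this assumption, the last character of each piece is repeated as the first character of the next one, and an overlap equal to or larger than max_char is rejected because the window would never advance.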

split

split(text)

Split the text into chunks.

Parameters:

Name  Type  Description        Default
text  str   The text to split  required

Returns:

Type         Description
list[Chunk]  A list of chunks

Usage

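from datapizza.modules.splitters import TextSplitter

# Placeholder input; any string can be passed to split().
text_content = "Replace this with the text you want to split. " * 50

splitter = TextSplitter(
    max_char=500,
    overlap=50
)

chunks = splitter.split(text_content)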

Features

  • Simple, straightforward text splitting algorithm
  • Configurable chunk size and overlap
  • Lightweight implementation for basic splitting needs
  • Preserves character-level accuracy in chunk boundaries
  • Minimal overhead for high-performance applications

Examples

Basic Usage

from datapizza.modules.splitters import TextSplitter

splitter = TextSplitter(max_char=50, overlap=5)

text = """
This is a sample text that we want to split into smaller chunks.
The TextSplitter will divide this content based on the specified
chunk size and overlap parameters. This ensures that information
is preserved while creating manageable pieces of content.
"""

chunks = splitter.split(text)

for i, chunk in enumerate(chunks):
    print(f"Chunk {i+1}: {len(chunk.text)} chars")
    print(f"Content: {chunk.text}")
    print("---")
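
When inspecting output like the above, it can help to measure how many characters two consecutive chunks actually share. The sketch below is one way to do that on plain strings; shared_overlap is a hypothetical helper, not part of datapizza, and the sample list stands in for [chunk.text for chunk in chunks]. Whether TextSplitter produces exactly `overlap` shared characters at every boundary is an assumption to verify, not something this page states.

```python
def shared_overlap(left: str, right: str) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for size in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:size]):
            return size
    return 0


# Stand-in data; with the real splitter this would be
# texts = [chunk.text for chunk in chunks]
texts = ["abcd", "defg", "ghij"]

# Compare each chunk with the one that follows it.
overlaps = [shared_overlap(a, b) for a, b in zip(texts, texts[1:])]
print(overlaps)  # [1, 1]
```

Note that repeated text can inflate the measured value, since the helper reports the longest match rather than the configured overlap.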