RAG Chunking Comparison

Research tool comparing semantic vs naive chunking strategies

Semantic vs Naive Chunking Analysis

This research tool demonstrates the impact of different chunking strategies on RAG system performance. Compare semantic chunking (similarity-based) against naive chunking (fixed-size) using comprehensive RAGAS metrics and statistical analysis.

OpenAI API Key

Your API key is stored locally in your browser and never sent to our servers.

Don't have an API key? You can still use the application with simulated results.

Get an API key from OpenAI.

Document Input

Upload a file or drag and drop.

PDF, TXT, or MD files up to 4.0 MB


Or try a sample document:

Experiment Configuration

Focused: very high similarity, small precise chunks (Sim: 0.85, Max: 250)
Balanced: moderate settings for most documents (Sim: 0.70, Max: 400)
Contextual: lower threshold for narrative documents (Sim: 0.55, Max: 600)
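As a sketch, the presets above could be expressed as a plain configuration mapping. The key names (`similarity`, `max_tokens`) and the `get_preset` helper are illustrative, not the app's actual schema:

```python
# Hypothetical config for the three presets above; keys are
# illustrative stand-ins, not the application's real schema.
PRESETS = {
    "focused":    {"similarity": 0.85, "max_tokens": 250},
    "balanced":   {"similarity": 0.70, "max_tokens": 400},
    "contextual": {"similarity": 0.55, "max_tokens": 600},
}

def get_preset(name):
    """Look up a preset by name, case-insensitively."""
    return PRESETS[name.lower()]
```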

Semantic Chunking Parameters

Similarity Threshold: 0.70

Higher values (0.7-0.9) create focused chunks better suited to Q&A; lower values (0.5-0.7) preserve more context.

Max Chunk Size: 400 (maximum tokens per semantic chunk)

Min Chunk Size: 75 (minimum tokens per semantic chunk)
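A minimal sketch of similarity-based chunking under these parameters: sentences are embedded, and consecutive sentences are merged while their similarity stays at or above the threshold and the chunk stays under the token cap. The toy bag-of-words embedding and whitespace token counting are stand-ins for a real embedding model and tokenizer, and the minimum-chunk-size rule is omitted for brevity:

```python
# Sketch of semantic (similarity-based) chunking. A real system would
# embed sentences with a model (e.g. OpenAI embeddings); a toy
# bag-of-words embedding stands in so this sketch is self-contained.
import math
import re
from collections import Counter

def embed(sentence):
    # Toy embedding: lowercase word counts (assumption, not the app's model).
    return Counter(re.findall(r"\w+", sentence.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_chunks(text, threshold=0.70, max_tokens=400):
    """Merge consecutive sentences while they stay similar enough."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], []

    def flush():
        if current:
            chunks.append(" ".join(current))
            current.clear()

    for sent in sentences:
        if not current:
            current.append(sent)
            continue
        # Compare to the previous sentence; real implementations often
        # compare against the running chunk centroid instead.
        sim = cosine(embed(current[-1]), embed(sent))
        size = len(" ".join(current + [sent]).split())
        # Start a new chunk when the topic drifts or the chunk grows too large.
        if sim < threshold or size > max_tokens:
            flush()
        current.append(sent)
    flush()
    return chunks
```

With a lower threshold, topically related sentences stay together while an unrelated sentence starts a fresh chunk, which is the behavior the threshold slider controls.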

Naive Chunking Parameters

Chunk Size: 400 (fixed size for naive chunks, in tokens)

Overlap: 50 (token overlap between consecutive chunks)
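Naive chunking is simpler: slide a fixed-size window over the token stream, stepping forward by chunk size minus overlap so each chunk repeats the tail of its predecessor. A sketch, approximating tokens with whitespace-split words (a real implementation would use the model's tokenizer, e.g. tiktoken):

```python
# Minimal sketch of naive fixed-size chunking with overlap. Tokens are
# approximated by whitespace-split words; a real implementation would
# count tokens with the model's tokenizer (e.g. tiktoken).

def naive_chunks(text, chunk_size=400, overlap=50):
    """Split text into fixed-size chunks with token overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    tokens = text.split()
    step = chunk_size - overlap  # how far the window advances each step
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

With the defaults above (400/50), the window advances 350 tokens per step, so each chunk repeats the last 50 tokens of the previous one regardless of sentence or topic boundaries, which is exactly the contrast with semantic chunking that this tool measures.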

Model Configuration

Current Configuration

Similarity: 0.7
Max Tokens: 400
Chunk Size: 400
Model: gpt-3.5-turbo