RAG Chunking Comparison

Research tool comparing semantic vs naive chunking strategies

Semantic vs Naive Chunking Analysis

This research tool demonstrates the impact of different chunking strategies on RAG system performance. Compare semantic chunking (similarity-based) against naive chunking (fixed-size) using comprehensive RAGAS metrics and statistical analysis.

OpenAI API Key

Your API key is stored locally in your browser and never sent to our servers.

Don't have an API key? You can still use the application with simulated results.

Get an API key from OpenAI.

Document Input

Upload a file or drag and drop.

PDF, TXT, or MD files up to 4.0 MB


Or try a sample document:

Experiment Configuration

Focused: very high similarity, small precise chunks (Sim: 0.85, Max: 250)
Balanced: moderate settings for most documents (Sim: 0.70, Max: 400)
Contextual: lower threshold for narrative documents (Sim: 0.55, Max: 600)
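As a sketch, the presets above could be expressed as a plain configuration mapping. The key names (`similarity`, `max_tokens`) and the `get_preset` helper are illustrative, not the app's actual schema:

```python
# Hypothetical config for the three presets above; keys are
# illustrative stand-ins, not the application's real schema.
PRESETS = {
    "focused":    {"similarity": 0.85, "max_tokens": 250},
    "balanced":   {"similarity": 0.70, "max_tokens": 400},
    "contextual": {"similarity": 0.55, "max_tokens": 600},
}

def get_preset(name):
    """Look up a preset by name, case-insensitively."""
    return PRESETS[name.lower()]
```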

Semantic Chunking Parameters

Similarity Threshold: 0.70

Higher values (0.7-0.9) create focused chunks better suited to Q&A; lower values (0.5-0.7) preserve more context.

Max Chunk Size: 400 (maximum tokens per semantic chunk)

Min Chunk Size: 75 (minimum tokens per semantic chunk)
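A minimal sketch of similarity-based chunking under these parameters: sentences are embedded, and consecutive sentences are merged while their similarity stays at or above the threshold and the chunk stays under the token cap. The toy bag-of-words embedding and whitespace token counting are stand-ins for a real embedding model and tokenizer, and the minimum-chunk-size rule is omitted for brevity:

```python
# Sketch of semantic (similarity-based) chunking. A real system would
# embed sentences with a model (e.g. OpenAI embeddings); a toy
# bag-of-words embedding stands in so this sketch is self-contained.
import math
import re
from collections import Counter

def embed(sentence):
    # Toy embedding: lowercase word counts (assumption, not the app's model).
    return Counter(re.findall(r"\w+", sentence.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def semantic_chunks(text, threshold=0.70, max_tokens=400):
    """Merge consecutive sentences while they stay similar enough."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], []

    def flush():
        if current:
            chunks.append(" ".join(current))
            current.clear()

    for sent in sentences:
        if not current:
            current.append(sent)
            continue
        # Compare to the previous sentence; real implementations often
        # compare against the running chunk centroid instead.
        sim = cosine(embed(current[-1]), embed(sent))
        size = len(" ".join(current + [sent]).split())
        # Start a new chunk when the topic drifts or the chunk grows too large.
        if sim < threshold or size > max_tokens:
            flush()
        current.append(sent)
    flush()
    return chunks
```

With a lower threshold, topically related sentences stay together while an unrelated sentence starts a fresh chunk, which is the behavior the threshold slider controls.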

Naive Chunking Parameters

Chunk Size: 400 (fixed size for naive chunks, in tokens)

Overlap: 50 (token overlap between consecutive chunks)
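Naive chunking is simpler: slide a fixed-size window over the token stream, stepping forward by chunk size minus overlap so each chunk repeats the tail of its predecessor. A sketch, approximating tokens with whitespace-split words (a real implementation would use the model's tokenizer, e.g. tiktoken):

```python
# Minimal sketch of naive fixed-size chunking with overlap. Tokens are
# approximated by whitespace-split words; a real implementation would
# count tokens with the model's tokenizer (e.g. tiktoken).

def naive_chunks(text, chunk_size=400, overlap=50):
    """Split text into fixed-size chunks with token overlap."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    tokens = text.split()
    step = chunk_size - overlap  # how far the window advances each step
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
    return chunks
```

With the defaults above (400/50), the window advances 350 tokens per step, so each chunk repeats the last 50 tokens of the previous one regardless of sentence or topic boundaries, which is exactly the contrast with semantic chunking that this tool measures.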

Model Configuration

Current Configuration

Similarity: 0.7
Max Tokens: 400
Chunk Size: 400
Model: gpt-3.5-turbo