v2.5.3 — Indexing Is 5.6x Faster. Here's What We Found.
171 minutes → 30.8 minutes. 5.6x faster. Same 6,512 files.
That's the headline number for v2.5.3. But the story behind it is more interesting than the number itself — because the fix wasn't "make everything faster." It was "find the one thing that was making everything slow."
The diagnosis: one Excel file was eating 85% of indexing time
We profiled real-world indexing on a library of 6,512 work documents — IPO filings, contracts, financial reports, spreadsheets. The old indexing rate was 37.9 files per minute, which meant a full index took almost 3 hours.
When we broke down the timing by file, the answer was immediately obvious: Excel files (.xlsx) consumed 85.7% of total indexing time. One 45MB spreadsheet alone took over 2 hours to process. That single file was the bottleneck for the entire library.
The root cause was the parser attempting to extract and index every cell from massive spreadsheets — including machine-generated data dumps that no human would ever search through. The fix was surgical:
- 10MB size cap — files above this threshold are indexed by metadata only, not full-text parsed
- Text-only extraction — formulas, styling data, and internal cell references are stripped before indexing
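The gating logic above is simple enough to sketch. This is a minimal illustration, not the shipping implementation — the function names and the `"metadata-only"` / `"full-text"` plan labels are assumptions:

```python
SIZE_CAP_BYTES = 10 * 1024 * 1024  # the 10MB threshold from the release notes

def plan_indexing(size_bytes: int) -> str:
    """Decide how a file should be indexed based on its size on disk."""
    if size_bytes > SIZE_CAP_BYTES:
        return "metadata-only"  # oversized files skip full-text parsing entirely
    return "full-text"

def extract_cell_text(cell_value) -> str:
    """Text-only extraction: keep the displayed value, ignore formulas
    and styling (which are stripped before the cell reaches the index)."""
    return str(cell_value) if cell_value is not None else ""
```

Under this scheme, the 45MB spreadsheet from the profile would be routed to metadata-only indexing and never hit the cell parser at all.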
Result: the same 6,512-file library now indexes in 30.8 minutes instead of 171.6 minutes.
Search response: faster and more consistent
We also fixed several issues that made search feel slower than it actually was:
- Duplicate query execution — a debounce bug caused search to fire twice on the Enter key. Fixed.
- Korean IME false triggers — mid-composition keystrokes were firing searches before you finished typing. Now requires 2+ completed syllables.
- Re-entry guard — the same query can no longer run twice in parallel. If a query is already in flight, the duplicate request is dropped.
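The Korean IME guard can be approximated with a Unicode range check: fully composed Hangul syllables occupy U+AC00–U+D7A3, while keystrokes mid-composition produce lone jamo in separate blocks. A sketch under that assumption (function names are illustrative):

```python
def completed_hangul_syllables(text: str) -> int:
    """Count fully composed Hangul syllables (U+AC00-U+D7A3).
    Lone jamo typed mid-composition fall outside this range."""
    return sum(1 for ch in text if "\uac00" <= ch <= "\ud7a3")

def _contains_hangul(text: str) -> bool:
    """True if any char is Hangul jamo, compatibility jamo, or a syllable."""
    return any(
        "\u1100" <= ch <= "\u11ff"      # Hangul Jamo
        or "\u3130" <= ch <= "\u318f"   # Hangul Compatibility Jamo
        or "\uac00" <= ch <= "\ud7a3"   # Hangul Syllables
        for ch in text
    )

def should_fire_search(query: str) -> bool:
    """Gate Korean queries on 2+ completed syllables; others fire normally."""
    if not _contains_hangul(query):
        return bool(query.strip())
    return completed_hangul_syllables(query) >= 2
```

A lone jamo like `ㄱ` (an unfinished keystroke) no longer triggers a search, while a two-syllable query does.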
Target: P95 search response ≤ 150ms, down from a 336ms baseline.
Search ranking: body content finally competes with filenames
Previously, a single keyword match in a filename scored 5x higher than the same keyword appearing dozens of times in a document body. That meant a file named report.docx would always outrank a 50-page document full of the word "report" in its actual content.
v2.5.3 reduces the filename boost from 5.0x to 2.5x and increases the folder path signal. Documents in relevant folders now surface more naturally, and body content gets a fair shot at the top of results.
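As a toy illustration of why this matters: the 2.5x filename boost is the documented change, but the folder boost value and the linear per-hit scoring shape below are assumptions for the sake of the example:

```python
FILENAME_BOOST = 2.5  # reduced from 5.0 in v2.5.3
FOLDER_BOOST = 1.5    # illustrative value; the notes only say "increased"
BODY_WEIGHT = 1.0

def score(filename_hits: int, folder_hits: int, body_hits: int) -> float:
    """Toy linear ranking: weighted sum of keyword hits per field."""
    return (filename_hits * FILENAME_BOOST
            + folder_hits * FOLDER_BOOST
            + body_hits * BODY_WEIGHT)
```

With the old 5.0x boost, a single filename hit outscored four body hits; at 2.5x, three body hits are enough to beat it, so content-heavy documents can surface above a lucky filename match.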
Parser quality: 8 formats verified on 80 real documents
We tested every supported parser against real work documents — not synthetic test files, but actual IPO filings, contracts, and financial reports. Overall quality score: 4.3 out of 5.
Specific fixes:
- DOCX — fixed internal numbering IDs leaking into search snippets
- HWP — removed `<>` tag noise from table cell extraction
- PDF — added garbled-text detection for CMap encoding failures. Instead of silently indexing unreadable characters, files with encoding issues are now correctly flagged as unindexable
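One common way to detect the symptom of a CMap failure is to measure the share of replacement or private-use characters in the extracted text. This heuristic is a sketch of the general technique, not the shipped detector:

```python
def looks_garbled(text: str, threshold: float = 0.3) -> bool:
    """Flag extracted text where too many characters are Unicode
    replacement chars (U+FFFD) or private-use codepoints (U+E000-U+F8FF),
    both typical symptoms of a failed CMap lookup."""
    if not text:
        return True  # nothing extractable at all
    bad = sum(1 for ch in text
              if ch == "\ufffd" or "\ue000" <= ch <= "\uf8ff")
    return bad / len(text) > threshold
```

A file whose pages trip this check would be flagged as unindexable rather than silently stuffed into the index as noise.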
Under the hood
- SHA256 model verification — the BGE-M3 embedding model (2.3 GB) is now verified against SHA256 hashes on download. Previously, hash fields were empty and verification was silently skipped.
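Verifying a 2.3 GB download is typically done by streaming the file through the hash in chunks rather than loading it whole. A minimal sketch — the function name and the fail-loud behavior on a missing hash are illustrative:

```python
import hashlib

def verify_download(path: str, expected_sha256: str) -> bool:
    """Stream the file and compare its SHA256 digest to the expected hash.
    An empty expected hash fails loudly instead of silently skipping
    verification (the pre-v2.5.3 behavior)."""
    if not expected_sha256:
        raise ValueError("missing expected hash; refusing to skip verification")
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            h.update(chunk)
    return h.hexdigest() == expected_sha256.lower()
```

The key fix is the guard clause: an empty hash field is now an error, not a free pass.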
- Embedding phase skip — dense (vector) search is temporarily disabled while we prepare v2.6. Embedding generation is now correctly skipped during indexing, so no CPU time is wasted building vectors that aren't used yet.
Get v2.5.3
Download from localsynapse.com. If you're upgrading from a previous version, re-indexing is recommended to get the full benefit of the parser and speed improvements.
Free. Open source (Apache 2.0). Windows and macOS.