v2.5.3 — Indexing Is 5.6x Faster. Here's What We Found.
171 minutes → 30.8 minutes. 5.6x faster. Same 6,512 files.
That's the headline number for v2.5.3. But the story behind it is more interesting than the number itself — because the fix wasn't "make everything faster." It was "find the one thing that was making everything slow."
The diagnosis: one Excel file was eating 85% of indexing time
We profiled real-world indexing on a library of 6,512 work documents — IPO filings, contracts, financial reports, spreadsheets. The old indexing rate was 37.9 files per minute, which meant a full index took almost 3 hours.
When we broke down the timing by file, the answer was immediately obvious: Excel files (.xlsx) consumed 85.7% of total indexing time. One 45MB spreadsheet alone took over 2 hours to process. That single file was the bottleneck for the entire library.
The root cause was the parser attempting to extract and index every cell from massive spreadsheets — including machine-generated data dumps that no human would ever search through. The fix was surgical:
- 10MB size cap — files above this threshold are indexed by metadata only, not full-text parsed
- Text-only extraction — formulas, styling data, and internal cell references are stripped before indexing
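The gating logic above is simple enough to sketch. This is a minimal illustration, not the shipping implementation — the function names and the `"metadata-only"` / `"full-text"` plan labels are assumptions:

```python
SIZE_CAP_BYTES = 10 * 1024 * 1024  # the 10MB threshold from the release notes

def plan_indexing(size_bytes: int) -> str:
    """Decide how a file should be indexed based on its size on disk."""
    if size_bytes > SIZE_CAP_BYTES:
        return "metadata-only"  # oversized files skip full-text parsing entirely
    return "full-text"

def extract_cell_text(cell_value) -> str:
    """Text-only extraction: keep the displayed value, ignore formulas
    and styling (which are stripped before the cell reaches the index)."""
    return str(cell_value) if cell_value is not None else ""
```

Under this scheme, the 45MB spreadsheet from the profile would be routed to metadata-only indexing and never hit the cell parser at all.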
Result: the same 6,512-file library now indexes in 30.8 minutes instead of 171.6 minutes.
Search response: faster and more consistent
We also fixed several issues that made search feel slower than it actually was:
- Duplicate query execution — a debounce bug caused search to fire twice on the Enter key. Fixed.
- Korean IME false triggers — mid-composition keystrokes were firing searches before you finished typing. Now requires 2+ completed syllables.
- Re-entry guard — the same query can no longer run twice in parallel. If a query is already in flight, the duplicate request is dropped.
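The Korean IME guard can be approximated with a Unicode range check: fully composed Hangul syllables occupy U+AC00–U+D7A3, while keystrokes mid-composition produce lone jamo in separate blocks. A sketch under that assumption (function names are illustrative):

```python
def completed_hangul_syllables(text: str) -> int:
    """Count fully composed Hangul syllables (U+AC00-U+D7A3).
    Lone jamo typed mid-composition fall outside this range."""
    return sum(1 for ch in text if "\uac00" <= ch <= "\ud7a3")

def _contains_hangul(text: str) -> bool:
    """True if any char is Hangul jamo, compatibility jamo, or a syllable."""
    return any(
        "\u1100" <= ch <= "\u11ff"      # Hangul Jamo
        or "\u3130" <= ch <= "\u318f"   # Hangul Compatibility Jamo
        or "\uac00" <= ch <= "\ud7a3"   # Hangul Syllables
        for ch in text
    )

def should_fire_search(query: str) -> bool:
    """Gate Korean queries on 2+ completed syllables; others fire normally."""
    if not _contains_hangul(query):
        return bool(query.strip())
    return completed_hangul_syllables(query) >= 2
```

A lone jamo like `ㄱ` (an unfinished keystroke) no longer triggers a search, while a two-syllable query does.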
Target: P95 search response ≤ 150ms, down from a 336ms baseline.
Search ranking: body content finally competes with filenames
Previously, a single keyword match in a filename scored 5x higher than the same keyword appearing dozens of times in a document body. That meant a file named report.docx would always outrank a 50-page document full of the word "report" in its actual content.
v2.5.3 reduces the filename boost from 5.0x to 2.5x and increases the folder path signal. Documents in relevant folders now surface more naturally, and body content gets a fair shot at the top of results.
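As a toy illustration of why this matters: the 2.5x filename boost is the documented change, but the folder boost value and the linear per-hit scoring shape below are assumptions for the sake of the example:

```python
FILENAME_BOOST = 2.5  # reduced from 5.0 in v2.5.3
FOLDER_BOOST = 1.5    # illustrative value; the notes only say "increased"
BODY_WEIGHT = 1.0

def score(filename_hits: int, folder_hits: int, body_hits: int) -> float:
    """Toy linear ranking: weighted sum of keyword hits per field."""
    return (filename_hits * FILENAME_BOOST
            + folder_hits * FOLDER_BOOST
            + body_hits * BODY_WEIGHT)
```

With the old 5.0x boost, a single filename hit outscored four body hits; at 2.5x, three body hits are enough to beat it, so content-heavy documents can surface above a lucky filename match.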
Parser quality: 8 formats verified on 80 real documents
We tested every supported parser against real work documents — not synthetic test files, but actual IPO filings, contracts, and financial reports. Overall quality score: 4.3 out of 5.
Specific fixes:
- DOCX — fixed internal numbering IDs leaking into search snippets
- HWP — removed `<>` tag noise from table cell extraction
- PDF — added garbled-text detection for CMap encoding failures. Instead of silently indexing unreadable characters, files with encoding issues are now correctly flagged as unindexable
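One common way to detect the symptom of a CMap failure is to measure the share of replacement or private-use characters in the extracted text. This heuristic is a sketch of the general technique, not the shipped detector:

```python
def looks_garbled(text: str, threshold: float = 0.3) -> bool:
    """Flag extracted text where too many characters are Unicode
    replacement chars (U+FFFD) or private-use codepoints (U+E000-U+F8FF),
    both typical symptoms of a failed CMap lookup."""
    if not text:
        return True  # nothing extractable at all
    bad = sum(1 for ch in text
              if ch == "\ufffd" or "\ue000" <= ch <= "\uf8ff")
    return bad / len(text) > threshold
```

A file whose pages trip this check would be flagged as unindexable rather than silently stuffed into the index as noise.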
Under the hood
- SHA256 model verification — the BGE-M3 embedding model (2.3 GB) is now verified against SHA256 hashes on download. Previously, hash fields were empty and verification was silently skipped.
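Verifying a 2.3 GB download is typically done by streaming the file through the hash in chunks rather than loading it whole. A minimal sketch — the function name and the fail-loud behavior on a missing hash are illustrative:

```python
import hashlib

def verify_download(path: str, expected_sha256: str) -> bool:
    """Stream the file and compare its SHA256 digest to the expected hash.
    An empty expected hash fails loudly instead of silently skipping
    verification (the pre-v2.5.3 behavior)."""
    if not expected_sha256:
        raise ValueError("missing expected hash; refusing to skip verification")
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
            h.update(chunk)
    return h.hexdigest() == expected_sha256.lower()
```

The key fix is the guard clause: an empty hash field is now an error, not a free pass.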
- Embedding phase skip — dense (vector) search is temporarily disabled while we prepare v2.6. Embedding generation is now correctly skipped during indexing, so no CPU time is wasted building vectors that aren't used yet.
Get v2.5.3
Download from localsynapse.com. If you're upgrading from a previous version, re-indexing is recommended to get the full benefit of the parser and speed improvements.
Free. Open source (Apache 2.0). Windows and macOS.