Search & OCR
doclens provides unified full-text search across all text-based formats with keyboard navigation and visual highlighting.
Search Bar
The built-in search bar appears in the header toolbar. Type a query and results are highlighted across the entire document. Navigate with:
| Shortcut | Action |
|---|---|
| Enter | Next match |
| Shift + Enter | Previous match |
| Escape | Clear search |
The match counter shows the current position and the total (for example, "3 of 27"), with up/down arrows for navigation.
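If you build your own search input, the same shortcuts can be reproduced with a small key-to-action mapping. The sketch below is illustrative only: `searchActionForKey`, `KeyPress`, and `SearchAction` are hypothetical names, not part of doclens, and the key strings follow the standard `KeyboardEvent.key` values.

```typescript
// Hypothetical helper (not a doclens API): maps a key press to the
// search action listed in the shortcut table.
type SearchAction = 'next' | 'prev' | 'clear' | null;

interface KeyPress {
  key: string;       // same values as KeyboardEvent.key
  shiftKey: boolean; // true while Shift is held
}

function searchActionForKey(press: KeyPress): SearchAction {
  if (press.key === 'Escape') return 'clear';
  if (press.key === 'Enter') return press.shiftKey ? 'prev' : 'next';
  return null; // every other key is left to the input itself
}
```

In an `onKeyDown` handler, the returned action can be dispatched to `nextMatch`, `prevMatch`, or `clearSearch` from `useSearch`.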
Pre-Search on Load
Pass initialSearchTerms to highlight terms as soon as the document loads:
<DocViewer
  document={{ uri: '/contract.pdf' }}
  initialSearchTerms={['liability', 'indemnification', 'termination']}
/>
- The viewer auto-scrolls to the first match
- Each term gets a distinct highlight color (up to 5 built-in colors)
- Colors are customizable via --dv-highlight-term-1 through --dv-highlight-term-5
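The variable names look like CSS custom properties, so one way to override them from script is `setProperty` on an element's style. This is a sketch under that assumption: `applyHighlightColors` and `StyleTarget` are hypothetical helpers, and which element doclens reads the variables from is not stated here.

```typescript
// Minimal structural type so the sketch accepts any
// CSSStyleDeclaration-like object (hypothetical, not a doclens type).
interface StyleTarget {
  setProperty(name: string, value: string): void;
}

// Writes up to five colors to --dv-highlight-term-1 ... --dv-highlight-term-5
// and returns how many were applied.
function applyHighlightColors(style: StyleTarget, colors: string[]): number {
  const usable = colors.slice(0, 5); // only five slots are documented
  usable.forEach((color, i) => {
    style.setProperty(`--dv-highlight-term-${i + 1}`, color);
  });
  return usable.length;
}
```

In a browser this could be called as `applyHighlightColors(document.documentElement.style, ['#ffe066', '#a5d8ff'])`, assuming the viewer inherits the variables from the root element.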
Programmatic Search
Drive search from your own UI with the engine API or the useSearch React hook:
import { useSearch } from 'doclens';

function SearchPanel() {
  const { query, count, activeIndex, search, nextMatch, prevMatch, clearSearch } = useSearch();
  return (
    <div>
      <input value={query} onChange={(e) => search(e.target.value)} />
      <span>{activeIndex + 1} / {count}</span>
      <button onClick={prevMatch}>Prev</button>
      <button onClick={nextMatch}>Next</button>
      <button onClick={clearSearch}>Clear</button>
    </div>
  );
}
Search Events
<DocViewer
  document={{ uri: '/report.pdf' }}
  onSearchChange={(query, results) => {
    console.log(`"${query}": ${results.length} matches`);
    // results[i].text: matched text
    // results[i].page: page number (PDF)
    // results[i].index: result index
  }}
/>
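As an example of consuming these results, the handler could build a per-page match summary. The sketch below assumes the result shape described in the comments above. `SearchHit` and `countMatchesByPage` are hypothetical names, and treating `page` as optional (with 0 as the fallback bucket) is an assumption for non-PDF formats.

```typescript
// Shape inferred from the handler comments; `page` is optional here
// because it is documented for PDFs only (assumption).
interface SearchHit {
  text: string;
  page?: number;
  index: number;
}

// Counts matches per page; hits without a page number go under key 0.
function countMatchesByPage(results: SearchHit[]): Record<number, number> {
  const counts: Record<number, number> = {};
  for (const hit of results) {
    const page = hit.page ?? 0;
    counts[page] = (counts[page] ?? 0) + 1;
  }
  return counts;
}
```

Calling `countMatchesByPage(results)` inside `onSearchChange` gives a lookup that a page thumbnail strip or outline could use to show match badges.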
OCR (Optical Character Recognition)
doclens supports OCR via Tesseract.js for finding text inside images and scanned PDF pages.
Enable OCR
<DocViewer document={{ uri: '/scanned-invoice.pdf' }} enableOCR />
npm install tesseract.js # required for OCR
How It Works
- Eager processing: OCR starts as soon as the PDF loads, not when you search. By the time you type a query, OCR data is usually ready.
- Mixed-content PDFs: pages with both embedded text and images are handled correctly. The PDF engine finds text-layer matches while OCR finds text in images. Results are merged with spatial deduplication (OCR results that overlap existing text-layer matches are skipped).
- Sorted navigation: all results (text-layer and OCR) are sorted by page number and vertical position, so navigation follows the natural reading order.
- Progressive UX: if OCR is still processing when you search, the count shows "1 of 27+" with a spinner. Once OCR finishes, results update automatically (e.g., to "1 of 44"). If you wait for OCR to complete before searching, all results appear immediately.
- Image files: standalone images (PNG, JPG, etc.) are also OCR-processed when enableOCR is true, enabling search within photos, screenshots, and scanned documents.
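The merge and sort steps described above can be sketched as follows. This is not the doclens implementation: `BoxedMatch`, `boxesOverlap`, and `mergeMatches` are hypothetical, the bounding-box fields are assumed, "overlap" is taken to mean any rectangle intersection on the same page, and the left-to-right tie-breaker is an added assumption.

```typescript
// Hypothetical match shape; the real doclens result type is not shown here.
interface BoxedMatch {
  page: number;
  x: number;      // left edge
  y: number;      // top edge, growing downward
  width: number;
  height: number;
  source: 'text' | 'ocr';
}

// True when two boxes on the same page intersect.
function boxesOverlap(a: BoxedMatch, b: BoxedMatch): boolean {
  return (
    a.page === b.page &&
    a.x < b.x + b.width &&
    b.x < a.x + a.width &&
    a.y < b.y + b.height &&
    b.y < a.y + a.height
  );
}

function mergeMatches(textLayer: BoxedMatch[], ocr: BoxedMatch[]): BoxedMatch[] {
  // Spatial deduplication: skip OCR matches that overlap a text-layer match.
  const uniqueOcr = ocr.filter((o) => !textLayer.some((t) => boxesOverlap(t, o)));
  // Reading order: page first, then top to bottom, then left to right.
  return textLayer
    .concat(uniqueOcr)
    .sort((a, b) => a.page - b.page || a.y - b.y || a.x - b.x);
}
```

Text-layer matches win on overlap in this sketch because their positions come from the PDF itself rather than from recognition.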
OCR Progress Events
<DocViewer
  document={{ uri: '/scanned.pdf' }}
  enableOCR
  onSearchChange={(query, results) => {
    // Called initially with text-layer results,
    // then again when OCR results are ready
  }}
/>
The engine also emits ocrProgress events:
engine.on('ocrProgress', ({ processing }) => {
  if (processing) {
    showSpinner();
  } else {
    hideSpinner();
  }
});
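A custom search UI can combine the `processing` flag with the match count to reproduce the "1 of 27+" style label. The helper below is a sketch: `formatMatchCount` is a hypothetical name, `activeIndex` is assumed to be zero-based (as in the `useSearch` example), and the "0 of 0" output for an empty result set is an assumption.

```typescript
// Builds the counter label. `ocrProcessing` mirrors the `processing`
// flag delivered by the ocrProgress event.
function formatMatchCount(
  activeIndex: number,
  count: number,
  ocrProcessing: boolean
): string {
  const suffix = ocrProcessing ? '+' : ''; // "+" signals more matches may arrive
  if (count === 0) return `0 of 0${suffix}`;
  return `${activeIndex + 1} of ${count}${suffix}`;
}
```

While OCR is running, `formatMatchCount(0, 27, true)` yields the provisional label, and the same call with the final count and `false` yields the settled one.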
Without Tesseract.js
If tesseract.js is not installed, OCR is skipped silently and no errors are thrown. Text-layer search in PDFs and search in all other formats continue to work normally.