Search & OCR

doclens provides unified full-text search across all text-based formats with keyboard navigation and visual highlighting.

Search Bar

The built-in search bar appears in the header toolbar. Type a query to highlight every match across the entire document, then navigate with these shortcuts:

Shortcut	Action
`Enter`	Next match
`Shift + Enter`	Previous match
`Escape`	Clear search

The match counter shows the current position and the total number of matches (for example, "3 of 27"), with up/down arrows for navigation.

Pre-Search on Load

Pass initialSearchTerms to highlight terms as soon as the document loads:

<DocViewer
  document={{ uri: '/contract.pdf' }}
  initialSearchTerms={['liability', 'indemnification', 'termination']}
/>

The viewer auto-scrolls to the first match
Each term gets a distinct highlight color (up to 5 built-in colors)
Colors are customizable via --dv-highlight-term-1 through --dv-highlight-term-5
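
As a rough illustration of overriding these colors from code, the sketch below builds the five variable names listed above and sets them on an element-like target. The helper names (`highlightColorVars`, `applyHighlightColors`) and the color values are hypothetical and not part of doclens; only the `--dv-highlight-term-N` variable names come from this page.

```typescript
// Hypothetical helpers: only the CSS variable names are taken from the docs.
const MAX_TERM_COLORS = 5;

// Maps up to five colors to --dv-highlight-term-1 ... --dv-highlight-term-5.
// Any colors beyond the fifth are ignored.
function highlightColorVars(colors: string[]): Record<string, string> {
  const vars: Record<string, string> = {};
  colors.slice(0, MAX_TERM_COLORS).forEach((color, i) => {
    vars[`--dv-highlight-term-${i + 1}`] = color;
  });
  return vars;
}

// Minimal structural type so the helper accepts any element-like object.
interface StyleTarget {
  style: { setProperty(name: string, value: string): void };
}

// Writes each variable onto the target's inline style.
function applyHighlightColors(target: StyleTarget, colors: string[]): void {
  const vars = highlightColorVars(colors);
  Object.keys(vars).forEach((name) => {
    target.style.setProperty(name, vars[name]);
  });
}
```

In a browser, `applyHighlightColors(document.documentElement, ['#ffe58f', '#b7eb8f'])` would set the first two colors on the root element. This assumes the viewer inherits the variables from an ancestor, which is the usual behavior for CSS custom properties; a plain stylesheet rule setting the same variables would work equally well.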

Programmatic Search

To drive search from your own UI, use the engine API or the React useSearch hook:

import { useSearch } from 'doclens';

function SearchPanel() {
  const { query, count, activeIndex, search, nextMatch, prevMatch, clearSearch } = useSearch();

  return (
    <div>
      <input value={query} onChange={(e) => search(e.target.value)} />
      <span>{activeIndex + 1} / {count}</span>
      <button onClick={prevMatch}>Prev</button>
      <button onClick={nextMatch}>Next</button>
      <button onClick={clearSearch}>Clear</button>
    </div>
  );
}
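
A custom panel like the one above does not get the built-in search bar's shortcuts (Enter, Shift + Enter, Escape) for free. One way to mirror them is a small pure function that maps a key event to an action. This is only a sketch: `searchActionForKey` and the `SearchAction` type are hypothetical names, not doclens exports.

```typescript
// Hypothetical helper mirroring the built-in shortcuts in a custom panel.
type SearchAction = 'next' | 'prev' | 'clear' | null;

// Structural subset of a DOM or React keyboard event.
interface KeyLike {
  key: string;
  shiftKey: boolean;
}

function searchActionForKey(e: KeyLike): SearchAction {
  if (e.key === 'Enter') return e.shiftKey ? 'prev' : 'next';
  if (e.key === 'Escape') return 'clear';
  return null; // any other key is left to the input itself
}

// Possible wiring inside SearchPanel (nextMatch, prevMatch, clearSearch
// come from useSearch as shown above):
//
//   <input onKeyDown={(e) => {
//     const action = searchActionForKey(e);
//     if (action === 'next') nextMatch();
//     else if (action === 'prev') prevMatch();
//     else if (action === 'clear') clearSearch();
//   }} />
```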

Search Events

<DocViewer
  document={{ uri: '/report.pdf' }}
  onSearchChange={(query, results) => {
    console.log(`"${query}": ${results.length} matches`);
    // results[i].text — matched text
    // results[i].page — page number (PDF)
    // results[i].index — result index
  }}
/>
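
The results array can feed simple summaries, such as a per-page match count for a sidebar. The sketch below assumes the result shape implied by the comments above (`text`, `page`, `index`) and further assumes that `page` may be absent for non-PDF formats; `SearchResultLike` and `countMatchesByPage` are hypothetical names.

```typescript
// Assumed result shape, inferred from the onSearchChange comments above.
interface SearchResultLike {
  text: string;
  page?: number; // assumed to be set only for PDF results
  index: number;
}

// Counts matches per page, skipping results that carry no page number.
function countMatchesByPage(results: SearchResultLike[]): Map<number, number> {
  const counts = new Map<number, number>();
  for (const r of results) {
    if (r.page === undefined) continue;
    counts.set(r.page, (counts.get(r.page) ?? 0) + 1);
  }
  return counts;
}
```

Calling `countMatchesByPage(results)` inside the `onSearchChange` callback would give a map from page number to match count that can be rendered next to page thumbnails.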

OCR (Optical Character Recognition)

doclens supports OCR via Tesseract.js for finding text inside images and scanned PDF pages.

Enable OCR

<DocViewer document={{ uri: '/scanned-invoice.pdf' }} enableOCR />

npm install tesseract.js  # required for OCR

How It Works

Eager processing — OCR starts as soon as the PDF loads, not when you search. By the time you type a query, OCR data is usually ready.
Mixed-content PDFs — pages with both embedded text and images are handled correctly. The PDF engine finds text-layer matches while OCR finds text in images. Results are merged with spatial deduplication (OCR results that overlap existing text-layer matches are skipped).
Sorted navigation — all results (text-layer and OCR) are sorted by page number and vertical position, so navigation follows the natural reading order.
Progressive UX — if OCR is still processing when you search, the count shows "1 of 27+" with a spinner. Once OCR finishes, results update automatically (e.g., to "1 of 44"). If you wait for OCR to complete before searching, all results appear immediately.
Image files — standalone images (PNG, JPG, etc.) are also OCR-processed when enableOCR is true, enabling search within photos, screenshots, and scanned documents.
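
The merge and sort steps described above can be pictured with a short sketch. This is an illustration of the general technique (drop OCR hits whose box overlaps a text-layer hit on the same page, then sort by page and vertical position), not doclens' actual implementation. The `Match` and `Rect` types, the function names, and the top-left coordinate origin are all assumptions.

```typescript
// Illustrative types; doclens' internal structures may differ.
interface Rect { x: number; y: number; width: number; height: number; }
interface Match { page: number; rect: Rect; source: 'text' | 'ocr'; }

// Axis-aligned rectangle intersection test.
function overlaps(a: Rect, b: Rect): boolean {
  return a.x < b.x + b.width && b.x < a.x + a.width &&
         a.y < b.y + b.height && b.y < a.y + a.height;
}

function mergeMatches(textMatches: Match[], ocrMatches: Match[]): Match[] {
  // Spatial deduplication: skip OCR hits that overlap a text-layer hit
  // on the same page.
  const keptOcr = ocrMatches.filter(
    (o) => !textMatches.some((t) => t.page === o.page && overlaps(t.rect, o.rect)),
  );
  // Reading order: page first, then vertical position
  // (assumes y grows downward from the top of the page).
  return [...textMatches, ...keptOcr].sort(
    (a, b) => a.page - b.page || a.rect.y - b.rect.y,
  );
}
```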

OCR Progress Events

<DocViewer
  document={{ uri: '/scanned.pdf' }}
  enableOCR
  onSearchChange={(query, results) => {
    // Called initially with text-layer results,
    // then again when OCR results are ready
  }}
/>

The engine also emits ocrProgress events:

engine.on('ocrProgress', ({ processing }) => {
  if (processing) {
    showSpinner();
  } else {
    hideSpinner();
  }
});
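
A custom counter can combine the `processing` flag with the values from `useSearch` to reproduce the "1 of 27+" style shown by the built-in bar. The formatter below is a minimal sketch; `formatMatchCount` is a hypothetical helper, and it assumes `activeIndex` is zero-based, as the earlier `activeIndex + 1` example suggests.

```typescript
// Hypothetical formatter for a custom match counter.
// A trailing "+" signals that OCR may still add more matches.
function formatMatchCount(
  activeIndex: number,
  count: number,
  ocrProcessing: boolean,
): string {
  const position = count > 0 ? activeIndex + 1 : 0;
  return `${position} of ${count}${ocrProcessing ? '+' : ''}`;
}
```

With the first match active, `formatMatchCount(0, 27, true)` yields "1 of 27+" while OCR is running, and `formatMatchCount(0, 44, false)` yields "1 of 44" once it has finished.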

Without Tesseract.js

If tesseract.js is not installed, OCR is skipped silently and no errors are thrown. Text-layer search in PDFs and search in all other formats continue to work normally.
