Text & Search

pdfnova uses PDFium's text extraction for character-level precision. PDFium is the engine behind Chrome's built-in PDF viewer, so accuracy is comparable to Chrome's "Find in PDF" feature.

Extract Plain Text

const page = doc.getPage(0);
const text = page.getText();
console.log(text); // "Annual Report 2025\nQuarterly revenue grew by..."

Text Spans

Get text with position data for each word/run:

const spans = page.getTextSpans();
for (const span of spans) {
  console.log(span.text, span.x, span.y, span.fontSize);
}

Each TextSpan contains:

| Property | Type | Description |
| --- | --- | --- |
| text | string | The text content |
| x | number | Left position (PDF points) |
| y | number | Bottom position (PDF points) |
| width | number | Span width |
| height | number | Span height |
| fontSize | number | Font size in points |
| charIndex | number | Starting character index |
| charCount | number | Number of characters |
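
Because each span carries its own position, spans can be regrouped into visual lines without re-parsing the page text. The following is a minimal sketch, not part of pdfnova: the `TextSpan` interface is a local mirror of the fields listed above, and `groupSpansIntoLines` and its `tolerance` parameter are hypothetical names introduced for illustration. It assumes unrotated, left-to-right text.

```typescript
// Local mirror of the documented TextSpan fields (assumed shape, not imported from pdfnova).
interface TextSpan {
  text: string;
  x: number;
  y: number;
  width: number;
  height: number;
  fontSize: number;
  charIndex: number;
  charCount: number;
}

// Hypothetical helper: cluster spans whose bottom edges lie within `tolerance`
// PDF points of the first span on the line, then read each cluster left to right.
function groupSpansIntoLines(spans: TextSpan[], tolerance = 2): string[] {
  // PDF y grows upward, so the largest y is the top of the page.
  const sorted = [...spans].sort((a, b) => b.y - a.y || a.x - b.x);
  const lines: { y: number; spans: TextSpan[] }[] = [];
  for (const span of sorted) {
    const current = lines[lines.length - 1];
    if (current !== undefined && Math.abs(current.y - span.y) <= tolerance) {
      current.spans.push(span);
    } else {
      lines.push({ y: span.y, spans: [span] });
    }
  }
  return lines.map((line) =>
    line.spans
      .sort((a, b) => a.x - b.x)
      .map((s) => s.text)
      .join(" "),
  );
}

// Sample data standing in for page.getTextSpans().
const sampleSpans: TextSpan[] = [
  { text: "Report", x: 120, y: 700, width: 50, height: 14, fontSize: 14, charIndex: 7, charCount: 6 },
  { text: "Annual", x: 72, y: 700.5, width: 44, height: 14, fontSize: 14, charIndex: 0, charCount: 6 },
  { text: "Quarterly", x: 72, y: 680, width: 60, height: 12, fontSize: 12, charIndex: 19, charCount: 9 },
];
const textLines = groupSpansIntoLines(sampleSpans);
console.log(textLines); // [ "Annual Report", "Quarterly" ]
```

The tolerance absorbs small baseline differences between spans on the same line; multi-column layouts would need an additional split on x.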

Character Boxes

For pixel-perfect text selection or highlighting, get individual character bounding boxes:

const boxes = page.getCharBoxes();
for (const box of boxes) {
  console.log(`"${box.char}" at (${box.left}, ${box.bottom}) - (${box.right}, ${box.top})`);
}
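
For a selection that covers several characters, the per-character boxes can be merged into one rectangle. This is a rough sketch rather than a pdfnova API: `CharBox` mirrors the fields used in the snippet above, and `unionCharBoxes` is a hypothetical helper. The result is only meaningful when all characters sit on the same line.

```typescript
// Local mirror of the fields used above (assumed shape, not imported from pdfnova).
interface CharBox {
  char: string;
  left: number;
  bottom: number;
  right: number;
  top: number;
}

interface Rect {
  left: number;
  bottom: number;
  right: number;
  top: number;
}

// Hypothetical helper: smallest rectangle that contains every box, or null for no boxes.
function unionCharBoxes(boxes: CharBox[]): Rect | null {
  if (boxes.length === 0) return null;
  let left = Infinity;
  let bottom = Infinity;
  let right = -Infinity;
  let top = -Infinity;
  for (const box of boxes) {
    left = Math.min(left, box.left);
    bottom = Math.min(bottom, box.bottom);
    right = Math.max(right, box.right);
    top = Math.max(top, box.top);
  }
  return { left, bottom, right, top };
}

// Sample data standing in for a slice of page.getCharBoxes().
const sampleBoxes: CharBox[] = [
  { char: "P", left: 72, bottom: 700, right: 80, top: 712 },
  { char: "D", left: 80, bottom: 700, right: 89, top: 712 },
  { char: "F", left: 89, bottom: 700, right: 96, top: 711.5 },
];
const selectionRect = unionCharBoxes(sampleBoxes);
console.log(selectionRect); // { left: 72, bottom: 700, right: 96, top: 712 }
```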

Text Layer

Build a transparent, selectable text overlay on top of a rendered canvas:

const container = document.getElementById("page-container")!;

// Render the page
const canvas = document.createElement("canvas");
await page.render(canvas, { scale: 2 });
container.appendChild(canvas);

// Build text layer on top
const textLayer = page.createTextLayer(container);
// textLayer is a div with positioned spans matching the rendered text

The text layer consists of absolutely positioned spans (CSS position: absolute) scaled to match the rendered page, which enables native text selection, copy/paste, and accessibility.
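
To illustrate the positioning idea, the sketch below maps one span's PDF coordinates to CSS offsets. It is not how createTextLayer is implemented; `spanToOverlayStyle`, `SpanBox`, and `OverlayStyle` are hypothetical names, and the sketch assumes the canvas is displayed at its rendered pixel size and that the page is unrotated.

```typescript
// Subset of the documented TextSpan fields needed for positioning (assumed shape).
interface SpanBox {
  x: number;
  y: number;
  width: number;
  height: number;
  fontSize: number;
}

interface OverlayStyle {
  left: string;
  top: string;
  width: string;
  height: string;
  fontSize: string;
}

// Hypothetical helper: PDF points (origin bottom-left) to CSS pixels (origin top-left).
function spanToOverlayStyle(span: SpanBox, pageHeight: number, scale: number): OverlayStyle {
  const px = (points: number): string => `${points * scale}px`;
  return {
    left: px(span.x),
    // The span's top edge is y + height; CSS top is measured down from the page top.
    top: px(pageHeight - (span.y + span.height)),
    width: px(span.width),
    height: px(span.height),
    fontSize: px(span.fontSize),
  };
}

// A span near the top of a US Letter page (792 points tall), rendered at scale 2.
const overlayStyle = spanToOverlayStyle(
  { x: 72, y: 700, width: 100, height: 14, fontSize: 12 },
  792,
  2,
);
console.log(overlayStyle);
// { left: "144px", top: "156px", width: "200px", height: "28px", fontSize: "24px" }
```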

Search a Single Page

const results = page.search("revenue", { caseSensitive: true });
for (const match of results) {
  console.log(`Found "${match.text}" at char index ${match.charIndex}`);
  console.log("Highlight rects:", match.rects);
}

Search the Entire Document

const allResults = doc.search("quarterly revenue", { wholeWord: true });
for (const match of allResults) {
  console.log(`Page ${match.pageIndex + 1}: "${match.text}"`);
}

Search Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| caseSensitive | boolean | false | Match case exactly |
| wholeWord | boolean | false | Match whole words only |
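
To make the two options concrete, here is a small sketch that applies the same semantics to a plain string. It is an illustration only: pdfnova's search runs inside PDFium, which may define word boundaries differently, and `findMatches` is a hypothetical function that treats ASCII letters, digits, and underscores as word characters.

```typescript
interface SearchOptions {
  caseSensitive?: boolean;
  wholeWord?: boolean;
}

// Hypothetical illustration of the option semantics; returns match start indices.
function findMatches(text: string, query: string, options: SearchOptions = {}): number[] {
  const { caseSensitive = false, wholeWord = false } = options;
  if (query.length === 0) return [];
  // Lowercasing is adequate for ASCII; some Unicode characters change length when lowercased.
  const haystack = caseSensitive ? text : text.toLowerCase();
  const needle = caseSensitive ? query : query.toLowerCase();
  const isWordChar = (ch: string | undefined): boolean =>
    ch !== undefined && /[A-Za-z0-9_]/.test(ch);
  const hits: number[] = [];
  let from = 0;
  while (true) {
    const at = haystack.indexOf(needle, from);
    if (at === -1) break;
    const end = at + needle.length;
    // A whole-word match has no word character directly before or after it.
    const bounded = !isWordChar(haystack[at - 1]) && !isWordChar(haystack[end]);
    if (!wholeWord || bounded) hits.push(at);
    from = at + 1;
  }
  return hits;
}

const sampleText = "Revenue grew. Quarterly revenue beat prerevenue estimates.";
console.log(findMatches(sampleText, "revenue")); // [ 0, 24, 40 ]
console.log(findMatches(sampleText, "revenue", { caseSensitive: true })); // [ 24, 40 ]
console.log(findMatches(sampleText, "revenue", { wholeWord: true })); // [ 0, 24 ]
```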

Search Result

Each SearchResult contains:

| Property | Type | Description |
| --- | --- | --- |
| pageIndex | number | 0-based page number |
| matchIndex | number | Global match counter |
| charIndex | number | Character index in the page text |
| charCount | number | Number of characters matched |
| rects | TextRect[] | Bounding rectangles for highlighting |
| text | string | The matched text |
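
Document-wide results are often easier to work with once bucketed by page, for example to show a per-page hit count. A minimal sketch, assuming the result shape listed above: the interfaces are local mirrors, the `TextRect` fields are inferred from how rects are used for highlighting, and `groupByPage` is a hypothetical helper.

```typescript
// Local mirrors of the documented shapes (assumed, not imported from pdfnova).
interface TextRect {
  left: number;
  bottom: number;
  right: number;
  top: number;
}

interface SearchResult {
  pageIndex: number;
  matchIndex: number;
  charIndex: number;
  charCount: number;
  rects: TextRect[];
  text: string;
}

// Hypothetical helper: bucket matches by their 0-based pageIndex, preserving order.
function groupByPage(results: SearchResult[]): Map<number, SearchResult[]> {
  const byPage = new Map<number, SearchResult[]>();
  for (const result of results) {
    const bucket = byPage.get(result.pageIndex);
    if (bucket !== undefined) {
      bucket.push(result);
    } else {
      byPage.set(result.pageIndex, [result]);
    }
  }
  return byPage;
}

// Sample data standing in for doc.search("revenue").
const sampleResults: SearchResult[] = [
  { pageIndex: 0, matchIndex: 0, charIndex: 29, charCount: 7, rects: [], text: "revenue" },
  { pageIndex: 0, matchIndex: 1, charIndex: 310, charCount: 7, rects: [], text: "Revenue" },
  { pageIndex: 3, matchIndex: 2, charIndex: 12, charCount: 7, rects: [], text: "revenue" },
];
const resultsByPage = groupByPage(sampleResults);
resultsByPage.forEach((matches, pageIndex) => {
  console.log(`Page ${pageIndex + 1}: ${matches.length} match(es)`);
});
```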

Highlighting Search Results

Use the rects from search results to draw highlights:

const results = doc.search("revenue");
const ctx = canvas.getContext("2d")!;
const scale = 2;

ctx.fillStyle = "rgba(255, 235, 59, 0.4)";
for (const match of results.filter((r) => r.pageIndex === 0)) {
  for (const rect of match.rects) {
    ctx.fillRect(
      rect.left * scale,
      (page.height - rect.top) * scale,
      (rect.right - rect.left) * scale,
      (rect.top - rect.bottom) * scale,
    );
  }
}
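
The coordinate conversion in the loop above can be factored into a pure function, which makes the y-axis flip easier to test in isolation. This is a sketch under the same assumptions as the snippet (PDF origin at bottom-left, canvas origin at top-left, unrotated page); `toCanvasRect` and `CanvasRect` are hypothetical names, not pdfnova exports.

```typescript
// Local mirror of the rect fields used above (assumed shape).
interface TextRect {
  left: number;
  bottom: number;
  right: number;
  top: number;
}

interface CanvasRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Hypothetical helper: convert a PDF-space rect to canvas pixels at the given render scale.
function toCanvasRect(rect: TextRect, pageHeight: number, scale: number): CanvasRect {
  return {
    x: rect.left * scale,
    // Canvas y grows downward, so measure from the page top to the rect's top edge.
    y: (pageHeight - rect.top) * scale,
    width: (rect.right - rect.left) * scale,
    height: (rect.top - rect.bottom) * scale,
  };
}

// A 100 x 12 point rect on a 792-point-tall page, rendered at scale 2.
const highlight = toCanvasRect({ left: 72, bottom: 700, right: 172, top: 712 }, 792, 2);
console.log(highlight); // { x: 144, y: 160, width: 200, height: 24 }
```

The returned values can be passed straight to `ctx.fillRect(highlight.x, highlight.y, highlight.width, highlight.height)`.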

Bookmarks / Table of Contents

const outline = doc.outline;
for (const item of outline) {
  console.log(`${item.title} → page ${item.pageIndex + 1}`);
  for (const child of item.children) {
    console.log(`  ${child.title} → page ${child.pageIndex + 1}`);
  }
}
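
The loop above prints two levels. For outlines nested more deeply, a recursive walk handles any depth. A minimal sketch, assuming each item has the `title`, `pageIndex`, and `children` fields used above; `OutlineItem` is a local mirror, `flattenOutline` is a hypothetical helper, and `children` is treated as optional in case leaf items omit it.

```typescript
// Local mirror of the outline item fields used above (assumed shape).
interface OutlineItem {
  title: string;
  pageIndex: number;
  children?: OutlineItem[];
}

interface FlatEntry {
  title: string;
  pageIndex: number;
  depth: number;
}

// Hypothetical helper: depth-first walk that records each item's nesting depth.
function flattenOutline(items: OutlineItem[], depth = 0): FlatEntry[] {
  const flat: FlatEntry[] = [];
  for (const item of items) {
    flat.push({ title: item.title, pageIndex: item.pageIndex, depth });
    flat.push(...flattenOutline(item.children ?? [], depth + 1));
  }
  return flat;
}

// Sample data standing in for doc.outline.
const sampleOutline: OutlineItem[] = [
  {
    title: "Introduction",
    pageIndex: 0,
    children: [
      { title: "Scope", pageIndex: 1, children: [{ title: "Definitions", pageIndex: 2 }] },
    ],
  },
  { title: "Results", pageIndex: 5, children: [] },
];
const flatOutline = flattenOutline(sampleOutline);
for (const entry of flatOutline) {
  console.log(`${"  ".repeat(entry.depth)}${entry.title} → page ${entry.pageIndex + 1}`);
}
```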

Links

Extract hyperlinks from a page:

const links = page.getLinks();
for (const link of links) {
  console.log(`${link.url} at page ${link.pageIndex}`);
}