Text & Search
pdfnova uses PDFium's text extraction for character-level precision. PDFium is the engine behind Chrome's PDF viewer, so extraction is as accurate as Chrome's "Find in PDF" feature.
Extract Plain Text
const page = doc.getPage(0);
const text = page.getText();
console.log(text); // "Annual Report 2025\nQuarterly revenue grew by..."
Text Spans
Get text with position data for each word/run:
const spans = page.getTextSpans();
for (const span of spans) {
  console.log(span.text, span.x, span.y, span.fontSize);
}
Each TextSpan contains:
| Property | Type | Description |
|---|---|---|
| text | string | The text content |
| x | number | Left position (PDF points) |
| y | number | Bottom position (PDF points) |
| width | number | Span width |
| height | number | Span height |
| fontSize | number | Font size in points |
| charIndex | number | Starting character index |
| charCount | number | Number of characters |
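Because spans carry positions, they can be regrouped into visual lines. The following is a minimal sketch, not part of pdfnova: `SpanLike` is a local type mirroring a few of the fields in the table above, `groupSpansIntoLines` and its `tolerance` parameter are hypothetical, and it assumes `y` grows upward from the bottom of the page.

```typescript
// Local mirror of the TextSpan fields used here (assumption: same names as the table).
interface SpanLike {
  text: string;
  x: number;
  y: number;
  fontSize: number;
}

// Hypothetical helper: spans whose bottom edges (y) are within `tolerance`
// points of the first span on a line are treated as part of that line.
function groupSpansIntoLines(spans: SpanLike[], tolerance = 2): string[] {
  // Assumes PDF y grows upward, so descending y means top of page first.
  const sorted = [...spans].sort((a, b) => b.y - a.y || a.x - b.x);
  const lines: { y: number; spans: SpanLike[] }[] = [];

  for (const span of sorted) {
    const current = lines[lines.length - 1];
    if (current && Math.abs(current.y - span.y) <= tolerance) {
      current.spans.push(span);
    } else {
      lines.push({ y: span.y, spans: [span] });
    }
  }

  // Within each line, order spans left to right and join their text.
  return lines.map((line) =>
    [...line.spans]
      .sort((a, b) => a.x - b.x)
      .map((s) => s.text)
      .join(" "),
  );
}
```

With real data, the result of `page.getTextSpans()` could be passed in directly, provided the returned objects expose these fields.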
Character Boxes
For pixel-perfect text selection or highlighting, get individual character bounding boxes:
const boxes = page.getCharBoxes();
for (const box of boxes) {
  console.log(`"${box.char}" at (${box.left}, ${box.bottom}) - (${box.right}, ${box.top})`);
}
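Two operations that selection code typically needs are hit-testing a point against the boxes and merging a run of boxes into one rectangle. A rough sketch follows, assuming the box fields shown in the snippet above (`char`, `left`, `bottom`, `right`, `top`) and bottom-left-origin PDF coordinates. `CharBoxLike`, `findCharAt`, and `unionBox` are illustrative names, not pdfnova APIs.

```typescript
// Local mirror of the character box shape (assumption based on the snippet above).
interface CharBoxLike {
  char: string;
  left: number;
  bottom: number;
  right: number;
  top: number;
}

// Return the index of the first box containing the point (x, y) in PDF
// coordinates, or -1 if no box contains it.
function findCharAt(boxes: CharBoxLike[], x: number, y: number): number {
  return boxes.findIndex(
    (b) => x >= b.left && x <= b.right && y >= b.bottom && y <= b.top,
  );
}

// Merge boxes[start..end] (inclusive) into one bounding rectangle.
// Returns null for an empty or out-of-range selection.
function unionBox(
  boxes: CharBoxLike[],
  start: number,
  end: number,
): { left: number; bottom: number; right: number; top: number } | null {
  const selected = boxes.slice(Math.max(0, start), end + 1);
  if (selected.length === 0) return null;
  return {
    left: Math.min(...selected.map((b) => b.left)),
    bottom: Math.min(...selected.map((b) => b.bottom)),
    right: Math.max(...selected.map((b) => b.right)),
    top: Math.max(...selected.map((b) => b.top)),
  };
}
```

A single merged rectangle only makes sense for characters on one line. A selection spanning several lines would need one rectangle per line.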
Text Layer
Build a transparent, selectable text overlay on top of a rendered canvas:
const container = document.getElementById("page-container")!;
// Render the page
const canvas = document.createElement("canvas");
await page.render(canvas, { scale: 2 });
container.appendChild(canvas);
// Build text layer on top
const textLayer = page.createTextLayer(container);
// textLayer is a div with positioned spans matching the rendered text
The text layer uses CSS position: absolute spans matched to the rendered scale, enabling native text selection, copy/paste, and accessibility.
Full-Text Search
Search a Single Page
const results = page.search("revenue", { caseSensitive: true });
for (const match of results) {
  console.log(`Found "${match.text}" at char index ${match.charIndex}`);
  console.log("Highlight rects:", match.rects);
}
Search the Entire Document
const allResults = doc.search("quarterly revenue", { wholeWord: true });
for (const match of allResults) {
  console.log(`Page ${match.pageIndex + 1}: "${match.text}"`);
}
Search Options
| Option | Type | Default | Description |
|---|---|---|---|
| caseSensitive | boolean | false | Match case exactly |
| wholeWord | boolean | false | Match whole words only |
Search Result
Each SearchResult contains:
| Property | Type | Description |
|---|---|---|
| pageIndex | number | 0-based page number |
| matchIndex | number | Global match counter |
| charIndex | number | Character index in the page text |
| charCount | number | Number of characters matched |
| rects | TextRect[] | Bounding rectangles for highlighting |
| text | string | The matched text |
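Since every result carries a `pageIndex`, document-wide results are easy to summarize per page, for example to show hit counts in a sidebar. This is a small sketch under the assumption that results have the shape in the table above. `SearchResultLike` and `countMatchesByPage` are hypothetical names, not part of pdfnova.

```typescript
// Local mirror of the SearchResult fields used here.
interface SearchResultLike {
  pageIndex: number;
  text: string;
}

// Count matches per 0-based page index.
function countMatchesByPage(results: SearchResultLike[]): Map<number, number> {
  const counts = new Map<number, number>();
  for (const result of results) {
    counts.set(result.pageIndex, (counts.get(result.pageIndex) ?? 0) + 1);
  }
  return counts;
}
```

The array returned by `doc.search(...)` could be passed in as-is if its items expose `pageIndex`.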
Highlighting Search Results
Use the rects from search results to draw highlights:
const results = doc.search("revenue");
const ctx = canvas.getContext("2d")!;
const scale = 2;
ctx.fillStyle = "rgba(255, 235, 59, 0.4)";
for (const match of results.filter((r) => r.pageIndex === 0)) {
  for (const rect of match.rects) {
    ctx.fillRect(
      rect.left * scale,
      (page.height - rect.top) * scale,
      (rect.right - rect.left) * scale,
      (rect.top - rect.bottom) * scale,
    );
  }
}
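The coordinate flip in the loop above can be factored into a small helper. PDF coordinates have their origin at the bottom-left, while the canvas origin is at the top-left, so the rectangle's top edge is subtracted from the page height before scaling. This is a sketch only: `RectLike`, `CanvasRect`, and `pdfRectToCanvas` are hypothetical names, and the rect fields are assumed to match those used above.

```typescript
// Local mirror of the rect fields used in the highlighting loop.
interface RectLike {
  left: number;
  top: number;
  right: number;
  bottom: number;
}

// Rectangle in canvas pixels, ready for ctx.fillRect(x, y, width, height).
interface CanvasRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Convert a PDF-space rect (bottom-left origin, points) to canvas space
// (top-left origin, pixels) for a page rendered at `scale`.
function pdfRectToCanvas(rect: RectLike, pageHeight: number, scale: number): CanvasRect {
  return {
    x: rect.left * scale,
    y: (pageHeight - rect.top) * scale,
    width: (rect.right - rect.left) * scale,
    height: (rect.top - rect.bottom) * scale,
  };
}
```

The `scale` passed here should be the same value used when rendering the page, otherwise the highlights will drift from the text.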
Bookmarks / Table of Contents
const outline = doc.outline;
for (const item of outline) {
  console.log(`${item.title} → page ${item.pageIndex + 1}`);
  for (const child of item.children) {
    console.log(`  ${child.title} → page ${child.pageIndex + 1}`);
  }
}
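The loop above prints two levels of the outline. For documents with deeper nesting, a recursive walk handles any depth. The sketch below assumes each item has `title`, `pageIndex`, and a `children` array, as the snippet above implies. `OutlineItemLike`, `FlatOutlineEntry`, and `flattenOutline` are illustrative names, not pdfnova APIs.

```typescript
// Local mirror of the outline item shape used in the snippet above.
interface OutlineItemLike {
  title: string;
  pageIndex: number;
  children: OutlineItemLike[];
}

// One row of a flattened table of contents.
interface FlatOutlineEntry {
  title: string;
  pageIndex: number;
  depth: number;
}

// Depth-first walk that records each item with its nesting depth (0 = top level).
function flattenOutline(items: OutlineItemLike[], depth = 0): FlatOutlineEntry[] {
  const flat: FlatOutlineEntry[] = [];
  for (const item of items) {
    flat.push({ title: item.title, pageIndex: item.pageIndex, depth });
    flat.push(...flattenOutline(item.children, depth + 1));
  }
  return flat;
}
```

The `depth` value can then drive indentation when rendering a table of contents, for example `"  ".repeat(entry.depth) + entry.title`.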
Links
Extract hyperlinks from a page:
const links = page.getLinks();
for (const link of links) {
  console.log(`${link.url} at page ${link.pageIndex}`);
}