---
title: API Reference | Developer Documentation
description: API reference for the @llamaindex/liteparse TypeScript library.
---

LiteParse is an open-source TypeScript library for PDF parsing with spatial text extraction, OCR, and bounding boxes.

## Example

```
import { LiteParse } from "@llamaindex/liteparse";

const parser = new LiteParse({ ocrEnabled: true });
const result = await parser.parse("document.pdf");
console.log(result.text);
```

## Classes

### LiteParse

Defined in: [core/parser.ts:58](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/parser.ts#L58)

Main document parser class. Handles PDF parsing, OCR, format conversion, and screenshot generation.

#### Examples

```
import { LiteParse } from "@llamaindex/liteparse";

const parser = new LiteParse();
const result = await parser.parse("document.pdf");
console.log(result.text);
```

```
const parser = new LiteParse({ outputFormat: "json", dpi: 300 });
const result = await parser.parse("document.pdf");
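// `json` is optional on ParseResult, so fall back to an empty list.
for (const page of result.json?.pages ?? []) {
  // `boundingBoxes` is deprecated; `textItems` carries the same coordinates.
  console.log(`Page ${page.page}: ${page.textItems.length} text items`);
}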
```

```
const parser = new LiteParse({
  ocrServerUrl: "http://localhost:8828/ocr",
  ocrLanguage: "en",
});
const result = await parser.parse("scanned-document.pdf");
```

#### Constructors

##### Constructor

> **new LiteParse**(`userConfig?`): [`LiteParse`](#liteparse)

Defined in: [core/parser.ts:68](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/parser.ts#L68)

Create a new LiteParse instance.

###### Parameters

###### userConfig?

`Partial`<[`LiteParseConfig`](#liteparseconfig)> = `{}`

Partial configuration to override defaults. See [LiteParseConfig](#liteparseconfig) for all options.
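As a rough illustration of how a partial override combines with defaults, the sketch below merges a user object over a defaults object with an object spread. `SketchConfig`, `SKETCH_DEFAULTS`, and `mergeConfig` are hypothetical names covering only a few of the documented options, and the spread-based merge is an assumption, not necessarily what the constructor does internally.

```typescript
// Hypothetical subset of the documented options, with their documented defaults.
interface SketchConfig {
  ocrEnabled: boolean;
  dpi: number;
  maxPages: number;
  outputFormat: "json" | "text";
}

const SKETCH_DEFAULTS: SketchConfig = {
  ocrEnabled: true,
  dpi: 150,
  maxPages: 1000,
  outputFormat: "json",
};

// Assumed merge strategy: later spread wins, so user values override defaults.
function mergeConfig(userConfig: Partial<SketchConfig> = {}): SketchConfig {
  return { ...SKETCH_DEFAULTS, ...userConfig };
}

const merged = mergeConfig({ dpi: 300 });
```

With a plain spread, a key explicitly set to `undefined` would replace the default. Whether LiteParse guards against that case is not stated in this reference.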

###### Returns

[`LiteParse`](#liteparse)

#### Methods

##### getConfig()

> **getConfig**(): [`LiteParseConfig`](#liteparseconfig)

Defined in: [core/parser.ts:492](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/parser.ts#L492)

Get a copy of the current configuration, including defaults merged with user overrides.

###### Returns

[`LiteParseConfig`](#liteparseconfig)

A shallow copy of the active [LiteParseConfig](#liteparseconfig).
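A shallow copy duplicates only the top-level fields, so nested objects such as `debug` would still be shared with the original. The sketch below shows that general behaviour with a stand-in `SketchConfig` type (hypothetical); it does not call LiteParse, and it assumes the copy behaves like an object spread.

```typescript
// Hypothetical stand-in for a config with one nested object.
interface SketchConfig {
  dpi: number;
  debug?: { enabled: boolean };
}

// A shallow copy: top-level fields are duplicated, nested objects are shared.
function shallowCopy(config: SketchConfig): SketchConfig {
  return { ...config };
}

const internal: SketchConfig = { dpi: 150, debug: { enabled: false } };
const copy = shallowCopy(internal);

copy.dpi = 300; // top-level change: `internal.dpi` stays 150
if (copy.debug) {
  copy.debug.enabled = true; // nested change: also visible via `internal.debug`
}
```

If you need to modify nested values from a returned config, cloning those nested objects first avoids accidental sharing.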

##### parse()

> **parse**(`input`, `quiet?`): `Promise`<[`ParseResult`](#parseresult)>

Defined in: [core/parser.ts:100](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/parser.ts#L100)

Parse a document and return the extracted text, page data, and optionally structured JSON.

Supports PDFs natively. Non-PDF formats (DOCX, XLSX, images, etc.) are automatically converted to PDF before parsing if the required system tools are installed.

###### Parameters

###### input

[`LiteParseInput`](#liteparseinput)

A file path, `Buffer`, or `Uint8Array` containing document bytes. When given raw bytes, PDF data is parsed directly with zero disk I/O. Non-PDF bytes are written to a temp file for format conversion.

###### quiet?

`boolean` = `false`

If `true`, suppresses progress logging to stderr.

###### Returns

`Promise`<[`ParseResult`](#parseresult)>

Parsed document data including text, per-page info, and optional JSON.

###### Throws

Error if the file cannot be found, converted, or parsed.

##### screenshot()

> **screenshot**(`input`, `pageNumbers?`, `quiet?`): `Promise`<[`ScreenshotResult`](#screenshotresult)\[]>

Defined in: [core/parser.ts:235](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/parser.ts#L235)

Generate screenshots of PDF pages as image buffers.

Uses PDFium for high-quality rendering. Each page is returned as a [ScreenshotResult](#screenshotresult) with the raw image buffer and dimensions.

Supports PDFs natively. Non-PDF formats (DOCX, XLSX, images, etc.) are automatically converted to PDF before rendering if the required system tools are installed. Text-based formats (TXT, CSV, etc.) cannot be screenshotted and will throw an error.

###### Parameters

###### input

[`LiteParseInput`](#liteparseinput)

A file path, `Buffer`, or `Uint8Array` containing document bytes.

###### pageNumbers?

`number`\[]

1-indexed page numbers to screenshot. If omitted, all pages are rendered.

###### quiet?

`boolean` = `false`

If `true`, suppresses progress logging to stderr.

###### Returns

`Promise`<[`ScreenshotResult`](#screenshotresult)\[]>

Array of screenshot results, one per rendered page.

###### Throws

Error if the input is a text-based format that cannot be rendered, or if the file cannot be found, converted, or rendered.

## Interfaces

### ~~BoundingBox~~

Defined in: [core/types.ts:281](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L281)

An axis-aligned bounding box defined by its top-left and bottom-right corners.

All coordinates are in PDF points.

#### Deprecated

Use [TextItem](#textitem) coordinates (`x`, `y`, `width`, `height`) instead. Will be removed in v2.0.
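For code that still consumes corner-style boxes, the conversion to and from the `x`, `y`, `width`, `height` form is simple arithmetic. The helper names and the `CornerBox` and `Rect` types below are hypothetical and mirror only the fields documented here.

```typescript
// Corner form, as in the deprecated BoundingBox.
interface CornerBox {
  x1: number;
  y1: number;
  x2: number;
  y2: number;
}

// Origin-plus-size form, as in TextItem.
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// (x1, y1) is the top-left corner, so it maps directly to (x, y).
function cornersToRect(box: CornerBox): Rect {
  return {
    x: box.x1,
    y: box.y1,
    width: box.x2 - box.x1,
    height: box.y2 - box.y1,
  };
}

// Inverse conversion, useful while migrating older call sites.
function rectToCorners(rect: Rect): CornerBox {
  return {
    x1: rect.x,
    y1: rect.y,
    x2: rect.x + rect.width,
    y2: rect.y + rect.height,
  };
}

const rect = cornersToRect({ x1: 72, y1: 100, x2: 172, y2: 112 });
```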

#### Properties

##### ~~x1~~

> **x1**: `number`

Defined in: [core/types.ts:283](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L283)

X coordinate of the top-left corner.

##### ~~x2~~

> **x2**: `number`

Defined in: [core/types.ts:287](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L287)

X coordinate of the bottom-right corner.

##### ~~y1~~

> **y1**: `number`

Defined in: [core/types.ts:285](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L285)

Y coordinate of the top-left corner.

##### ~~y2~~

> **y2**: `number`

Defined in: [core/types.ts:289](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L289)

Y coordinate of the bottom-right corner.

---

### GridDebugConfig

Defined in: [processing/gridDebugLogger.ts:13](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/processing/gridDebugLogger.ts#L13)

Configuration for grid projection debug logging.

When enabled, logs detailed information about how text elements are snapped, anchored, and projected during grid layout. Use filters to narrow output to specific elements you’re investigating.

#### Properties

##### enabled

> **enabled**: `boolean`

Defined in: [processing/gridDebugLogger.ts:18](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/processing/gridDebugLogger.ts#L18)

Enable debug logging for grid projection.

###### Default Value

`false`

##### lineFilter?

> `optional` **lineFilter**: `number`\[]

Defined in: [processing/gridDebugLogger.ts:29](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/processing/gridDebugLogger.ts#L29)

Only log elements on these line indices (0-based within the page).

##### outputPath?

> `optional` **outputPath**: `string`

Defined in: [processing/gridDebugLogger.ts:44](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/processing/gridDebugLogger.ts#L44)

Write log output to this file path. If not set, logs go to stderr.

##### pageFilter?

> `optional` **pageFilter**: `number`

Defined in: [processing/gridDebugLogger.ts:34](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/processing/gridDebugLogger.ts#L34)

Only log elements on this page number (1-indexed).

##### regionFilter?

> `optional` **regionFilter**: `object`

Defined in: [processing/gridDebugLogger.ts:39](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/processing/gridDebugLogger.ts#L39)

Only log elements within this bounding region (PDF coordinates).

###### x1

> **x1**: `number`

###### x2

> **x2**: `number`

###### y1

> **y1**: `number`

###### y2

> **y2**: `number`

##### textFilter?

> `optional` **textFilter**: `string`\[]

Defined in: [processing/gridDebugLogger.ts:24](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/processing/gridDebugLogger.ts#L24)

Only log elements whose text contains one of these substrings (case-insensitive). If empty, all elements are logged.

##### trace?

> `optional` **trace**: `boolean`

Defined in: [processing/gridDebugLogger.ts:69](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/processing/gridDebugLogger.ts#L69)

Enable trace mode for detailed render decision logging. When enabled, each render logs the full decision chain: initial targetX, lineMax computation, forward anchor checks, and which factor was the binding constraint. Forward anchor mutations are also traced with their triggering item. Respects textFilter/lineFilter/pageFilter.

###### Default Value

`false`

##### visualize?

> `optional` **visualize**: `boolean`

Defined in: [processing/gridDebugLogger.ts:52](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/processing/gridDebugLogger.ts#L52)

Generate PNG visualizations of the grid projection showing text boxes color-coded by snap type (left/right/center/floating/flowing) with anchor lines overlaid.

###### Default Value

`false`

##### visualizePath?

> `optional` **visualizePath**: `string`

Defined in: [processing/gridDebugLogger.ts:59](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/processing/gridDebugLogger.ts#L59)

Directory to save visualization PNGs. Each page produces a file named `page-{N}-grid.png`.

###### Default Value

`"./debug-output"`

---

### JsonTextItem

Defined in: [core/types.ts:316](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L316)

A text element from the JSON output with position, size, and font metadata.

#### Properties

##### confidence?

> `optional` **confidence**: `number`

Defined in: [core/types.ts:332](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L332)

OCR confidence score (`null` if OCR wasn’t used).

##### fontName?

> `optional` **fontName**: `string`

Defined in: [core/types.ts:328](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L328)

Font name.

##### fontSize?

> `optional` **fontSize**: `number`

Defined in: [core/types.ts:330](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L330)

Font size in PDF points.

##### height

> **height**: `number`

Defined in: [core/types.ts:326](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L326)

Height of the text item in PDF points.

##### text

> **text**: `string`

Defined in: [core/types.ts:318](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L318)

The text content of this item.

##### width

> **width**: `number`

Defined in: [core/types.ts:324](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L324)

Width of the text item in PDF points.

##### x

> **x**: `number`

Defined in: [core/types.ts:320](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L320)

X coordinate of the top-left corner, in PDF points.

##### y

> **y**: `number`

Defined in: [core/types.ts:322](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L322)

Y coordinate of the top-left corner, in PDF points.

---

### LiteParseConfig

Defined in: [core/types.ts:36](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L36)

Configuration options for the [LiteParse](#liteparse) parser.

All fields have sensible defaults. Pass a `Partial<LiteParseConfig>` to the constructor to override only the options you need.

#### Example

```
const parser = new LiteParse({
  ocrEnabled: true,
  ocrLanguage: "fra",
  dpi: 300,
  outputFormat: "json",
});
```

#### Properties

##### debug?

> `optional` **debug**: [`GridDebugConfig`](#griddebugconfig)

Defined in: [core/types.ts:160](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L160)

Debug configuration for grid projection. When enabled, logs detailed information about how text elements are snapped, anchored, and projected. Can also generate visual PNG overlays of the projection.

###### Example

```
const parser = new LiteParse({
  debug: {
    enabled: true,
    textFilter: ["Total", "Revenue"],
    pageFilter: 2,
    visualize: true,
    visualizePath: "./debug-output",
  }
});
```

##### dpi

> **dpi**: `number`

Defined in: [core/types.ts:100](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L100)

DPI (dots per inch) for rendering pages to images. Higher values improve OCR accuracy but increase processing time and memory usage.

###### Default Value

`150`
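Since a PDF point is 1/72 of an inch, the pixel size of a rendered page can be estimated from its size in points and the DPI. The sketch below assumes the renderer scales by `dpi / 72` and rounds to whole pixels. `pointsToPixels` is a hypothetical helper, not part of the library.

```typescript
// One PDF point is 1/72 inch, so pixels = points * dpi / 72.
const POINTS_PER_INCH = 72;

// Rounding to the nearest pixel is an assumption; a renderer may floor or ceil.
function pointsToPixels(points: number, dpi: number): number {
  return Math.round((points * dpi) / POINTS_PER_INCH);
}

// US Letter is 612 x 792 points.
const letterWidthPx = pointsToPixels(612, 150);  // 1275
const letterHeightPx = pointsToPixels(792, 150); // 1650
```

Doubling the DPI doubles each dimension and roughly quadruples the pixel count, which accounts for the memory cost mentioned above.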

##### maxPages

> **maxPages**: `number`

Defined in: [core/types.ts:85](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L85)

Maximum number of pages to parse from the document.

###### Default Value

`1000`

##### numWorkers

> **numWorkers**: `number`

Defined in: [core/types.ts:78](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L78)

Number of pages to OCR in parallel. Higher values use more memory but process faster on multi-core machines.

###### Default Value

Number of CPU cores minus 1, with a minimum of 1.
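The default above can be written as a one-line formula. `defaultNumWorkers` is a hypothetical helper that takes the core count as a parameter. On Node.js that count is commonly read from `os.cpus().length`, though how LiteParse obtains it is not stated here.

```typescript
// Leave one core free for the main thread, but never drop below one worker.
function defaultNumWorkers(cpuCores: number): number {
  return Math.max(1, cpuCores - 1);
}

const workersOnEightCores = defaultNumWorkers(8); // 7
const workersOnOneCore = defaultNumWorkers(1);    // 1
```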

##### ocrEnabled

> **ocrEnabled**: `boolean`

Defined in: [core/types.ts:51](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L51)

Whether to run OCR on pages with little or no native text. When enabled, LiteParse selectively OCRs only images and text-sparse regions.

###### Default Value

`true`

##### ocrLanguage

> **ocrLanguage**: `string` | `string`\[]

Defined in: [core/types.ts:43](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L43)

OCR language code(s). Uses ISO 639-3 codes for Tesseract (e.g., `"eng"`, `"fra"`) or ISO 639-1 for HTTP OCR servers (e.g., `"en"`, `"fr"`).

###### Default Value

`"en"`

##### ocrServerUrl?

> `optional` **ocrServerUrl**: `string`

Defined in: [core/types.ts:59](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L59)

URL of an HTTP OCR server implementing the LiteParse OCR API. If not provided, the built-in Tesseract.js engine is used.

###### See

[OCR API Specification](https://github.com/run-llama/liteparse/blob/main/OCR_API_SPEC.md)

##### outputFormat

> **outputFormat**: [`OutputFormat`](#outputformat-1)

Defined in: [core/types.ts:107](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L107)

Output format for parsed results.

###### Default Value

`"json"`

##### password?

> `optional` **password**: `string`

Defined in: [core/types.ts:140](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L140)

Password for opening encrypted/protected documents. Used for password-protected PDFs and office documents.

###### Default Value

`undefined`

##### ~~preciseBoundingBox~~

> **preciseBoundingBox**: `boolean`

Defined in: [core/types.ts:118](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L118)

Calculate precise bounding boxes for each text line. Disable for faster parsing when bounding boxes aren’t needed.

###### Deprecated

Controls the deprecated `boundingBoxes` output. Will be removed in v2.0. Text item coordinates (`x`, `y`, `width`, `height`) are always present regardless.

###### Default Value

`true`

##### preserveLayoutAlignmentAcrossPages

> **preserveLayoutAlignmentAcrossPages**: `boolean`

Defined in: [core/types.ts:132](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L132)

Maintain consistent text alignment across page boundaries.

###### Default Value

`false`

##### preserveVerySmallText

> **preserveVerySmallText**: `boolean`

Defined in: [core/types.ts:125](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L125)

Preserve very small text that would normally be filtered out.

###### Default Value

`false`

##### targetPages?

> `optional` **targetPages**: `string`

Defined in: [core/types.ts:92](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L92)

Specific pages to parse, as a comma-separated string of page numbers and ranges.

###### Example

```
"1-5,10,15-20"
```
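To make the string format concrete, here is a minimal sketch of how such a specification could expand into individual page numbers. `expandPageSpec` is a hypothetical function, not the library's parser; its handling of whitespace, duplicates, and invalid tokens is an assumption.

```typescript
// Expand a spec such as "1-5,10,15-20" into a sorted list of unique page numbers.
function expandPageSpec(spec: string): number[] {
  const pages = new Set<number>();
  for (const part of spec.split(",")) {
    const token = part.trim();
    if (token === "") continue; // tolerate stray commas (assumption)

    // A token is either a single number ("10") or an inclusive range ("15-20").
    const match = /^(\d+)(?:-(\d+))?$/.exec(token);
    if (!match) throw new Error(`Invalid page token: "${token}"`);

    const start = Number(match[1]);
    const end = match[2] !== undefined ? Number(match[2]) : start;
    if (start < 1 || end < start) {
      throw new Error(`Invalid page range: "${token}"`);
    }
    for (let page = start; page <= end; page++) pages.add(page);
  }
  return Array.from(pages).sort((a, b) => a - b);
}

const expanded = expandPageSpec("1-5,10,15-20");
```

For the example string this yields twelve pages: 1 through 5, 10, and 15 through 20.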

##### tessdataPath?

> `optional` **tessdataPath**: `string`

Defined in: [core/types.ts:70](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L70)

Path to a directory containing Tesseract `.traineddata` files. Used as both the language data source and cache directory for Tesseract.js.

If not set, falls back to the `TESSDATA_PREFIX` environment variable. If neither is set, Tesseract.js downloads data from cdn.jsdelivr.net.

###### Example

```
/opt/tessdata
```

---

### MarkupData

Defined in: [core/types.ts:207](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L207)

Markup annotation data associated with a text item.

#### Properties

##### highlight?

> `optional` **highlight**: `string`

Defined in: [core/types.ts:209](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L209)

Highlight color (e.g., `"yellow"`, `"#FFFF00"`), or `undefined` if not highlighted.

##### squiggly?

> `optional` **squiggly**: `boolean`

Defined in: [core/types.ts:213](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L213)

Whether the text has a squiggly underline.

##### strikeout?

> `optional` **strikeout**: `boolean`

Defined in: [core/types.ts:215](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L215)

Whether the text is struck out.

##### underline?

> `optional` **underline**: `boolean`

Defined in: [core/types.ts:211](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L211)

Whether the text is underlined.
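One way to use these fields is to collect the text of every highlighted item on a page. The sketch below works on local stand-in types (`SketchMarkup`, `SketchItem`) that mirror the documented fields. `highlightedText` is a hypothetical helper.

```typescript
// Local mirrors of the documented MarkupData and TextItem fields used here.
interface SketchMarkup {
  highlight?: string;
  underline?: boolean;
  squiggly?: boolean;
  strikeout?: boolean;
}

interface SketchItem {
  str: string;
  markup?: SketchMarkup;
}

// Keep only items whose markup carries a highlight color.
function highlightedText(items: SketchItem[]): string[] {
  return items
    .filter((item) => item.markup?.highlight !== undefined)
    .map((item) => item.str);
}

const highlighted = highlightedText([
  { str: "Total" },
  { str: "Revenue", markup: { highlight: "yellow" } },
  { str: "Draft", markup: { strikeout: true } },
]);
```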

---

### ParsedPage

Defined in: [core/types.ts:295](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L295)

Parsed data for a single page of a document.

#### Properties

##### ~~boundingBoxes?~~

> `optional` **boundingBoxes**: [`BoundingBox`](#boundingbox)\[]

Defined in: [core/types.ts:310](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L310)

###### Deprecated

Use [TextItem](#textitem) coordinates instead. Will be removed in v2.0. Present when [LiteParseConfig.preciseBoundingBox](#preciseboundingbox) is enabled.

##### height

> **height**: `number`

Defined in: [core/types.ts:301](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L301)

Page height in PDF points.

##### pageNum

> **pageNum**: `number`

Defined in: [core/types.ts:297](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L297)

1-indexed page number.

##### text

> **text**: `string`

Defined in: [core/types.ts:303](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L303)

Full text content of the page with spatial layout preserved.

##### textItems

> **textItems**: [`TextItem`](#textitem)\[]

Defined in: [core/types.ts:305](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L305)

Individual text elements extracted from the page.

##### width

> **width**: `number`

Defined in: [core/types.ts:299](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L299)

Page width in PDF points.

---

### ParseResult

Defined in: [core/types.ts:376](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L376)

The result of parsing a document with [LiteParse.parse](#parse).

#### Properties

##### json?

> `optional` **json**: [`ParseResultJson`](#parseresultjson-1)

Defined in: [core/types.ts:382](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L382)

Structured JSON data. Present when [LiteParseConfig.outputFormat](#outputformat) is `"json"`.

##### pages

> **pages**: [`ParsedPage`](#parsedpage)\[]

Defined in: [core/types.ts:378](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L378)

Per-page parsed data.

##### text

> **text**: `string`

Defined in: [core/types.ts:380](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L380)

Full document text, concatenated from all pages.

---

### ParseResultJson

Defined in: [core/types.ts:353](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L353)

Structured JSON representation of parsed document data. Returned when [LiteParseConfig.outputFormat](#outputformat) is `"json"`.

#### Properties

##### pages

> **pages**: `object`\[]

Defined in: [core/types.ts:355](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L355)

Array of page data.

###### ~~boundingBoxes~~

> **boundingBoxes**: [`BoundingBox`](#boundingbox)\[]

###### Deprecated

Use `textItems` coordinates instead. Will be removed in v2.0.

###### height

> **height**: `number`

Page height in PDF points.

###### page

> **page**: `number`

1-indexed page number.

###### text

> **text**: `string`

Full text content of the page.

###### textItems

> **textItems**: [`JsonTextItem`](#jsontextitem)\[]

Individual text elements with position and font metadata.

###### width

> **width**: `number`

Page width in PDF points.

---

### ScreenshotResult

Defined in: [core/types.ts:388](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L388)

The result of generating a screenshot with [LiteParse.screenshot](#screenshot).

#### Properties

##### height

> **height**: `number`

Defined in: [core/types.ts:394](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L394)

Image height in pixels.

##### imageBuffer

> **imageBuffer**: `Buffer`

Defined in: [core/types.ts:396](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L396)

Raw image data as a Node.js Buffer (PNG or JPG).
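Because the buffer may hold either format, a caller that needs a file extension can inspect the leading bytes. PNG files start with the signature `89 50 4E 47 0D 0A 1A 0A` and JPEG files start with `FF D8 FF`. The helper names below are hypothetical, and the sketch takes a `Uint8Array`, which a Node.js `Buffer` also satisfies.

```typescript
type ImageKind = "png" | "jpg" | "unknown";

// True if `bytes` begins with the given byte sequence.
function startsWithBytes(bytes: Uint8Array, prefix: number[]): boolean {
  if (bytes.length < prefix.length) return false;
  return prefix.every((value, index) => bytes[index] === value);
}

// Identify the image format from its magic bytes.
function sniffImageKind(bytes: Uint8Array): ImageKind {
  const pngSignature = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
  const jpegSignature = [0xff, 0xd8, 0xff];
  if (startsWithBytes(bytes, pngSignature)) return "png";
  if (startsWithBytes(bytes, jpegSignature)) return "jpg";
  return "unknown";
}

const kind = sniffImageKind(
  new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a, 0x00]),
);
```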

##### imagePath?

> `optional` **imagePath**: `string`

Defined in: [core/types.ts:398](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L398)

File path if the screenshot was saved to disk.

##### pageNum

> **pageNum**: `number`

Defined in: [core/types.ts:390](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L390)

1-indexed page number.

##### width

> **width**: `number`

Defined in: [core/types.ts:392](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L392)

Image width in pixels.

---

### SearchItemsOptions

Defined in: [core/types.ts:338](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L338)

Options for [searchItems](#searchitems).

#### Properties

##### caseSensitive?

> `optional` **caseSensitive**: `boolean`

Defined in: [core/types.ts:346](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L346)

Whether the search should be case-sensitive.

###### Default Value

`false`

##### phrase

> **phrase**: `string`

Defined in: [core/types.ts:340](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L340)

Find text items containing this phrase. Matches can span multiple adjacent items.

---

### TextItem

Defined in: [core/types.ts:169](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L169)

An individual text element extracted from a page, with position, size, and font metadata.

Coordinates are in PDF points. The origin is at the top-left of the page, x increases to the right, and y increases downward.
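Tools that work in native PDF user space usually place the origin at the bottom-left with y increasing upward, so a vertical flip is needed when passing coordinates to them. The sketch below converts a top-left-origin rectangle to the bottom-left-origin position of its lower-left corner. `toBottomLeftOrigin` and `TopLeftRect` are hypothetical names, and the page height is assumed to come from `ParsedPage.height`.

```typescript
// A rectangle in the top-left-origin system described above.
interface TopLeftRect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Flip the vertical axis: the box's bottom edge sits at (y + height) from the top,
// which is pageHeight - (y + height) from the bottom.
function toBottomLeftOrigin(
  rect: TopLeftRect,
  pageHeight: number,
): { x: number; y: number } {
  return { x: rect.x, y: pageHeight - (rect.y + rect.height) };
}

// A 12 pt tall item placed 100 pt below the top of a 792 pt page.
const flipped = toBottomLeftOrigin({ x: 72, y: 100, width: 200, height: 12 }, 792);
```

Here the lower-left corner lands at y = 680 in bottom-left coordinates, while x is unchanged.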

#### Properties

##### confidence?

> `optional` **confidence**: `number`

Defined in: [core/types.ts:201](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L201)

Confidence score from 0.0 to 1.0. Native PDF text defaults to 1.0; OCR text reflects the engine's confidence.
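A common use of this score is flagging OCR output for manual review. The sketch below filters items under a threshold using a local `SketchTextItem` type. Treating a missing `confidence` as 1.0 is an assumption, as are the 0.8 default threshold and the helper name.

```typescript
// Local mirror of the TextItem fields used here.
interface SketchTextItem {
  str: string;
  confidence?: number;
}

// Return items whose confidence falls below the threshold.
// Items without a score are treated as fully confident (assumption).
function lowConfidenceItems(
  items: SketchTextItem[],
  threshold: number = 0.8,
): SketchTextItem[] {
  return items.filter((item) => (item.confidence ?? 1.0) < threshold);
}

const flagged = lowConfidenceItems([
  { str: "Invoice" },
  { str: "lnv0ice", confidence: 0.5 },
  { str: "Total", confidence: 0.95 },
]);
```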

##### fontName?

> `optional` **fontName**: `string`

Defined in: [core/types.ts:185](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L185)

Font name (e.g., `"Helvetica"`, `"Times-Roman"`, `"OCR"` for OCR-detected text).

##### fontSize?

> `optional` **fontSize**: `number`

Defined in: [core/types.ts:187](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L187)

Font size in PDF points.

##### h

> **h**: `number`

Defined in: [core/types.ts:183](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L183)

Alias for [height](#height-3).

##### height

> **height**: `number`

Defined in: [core/types.ts:179](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L179)

Height of the text item in PDF points.

##### markup?

> `optional` **markup**: [`MarkupData`](#markupdata)

Defined in: [core/types.ts:195](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L195)

Markup annotations (highlights, underlines, etc.) applied to this text.

##### r?

> `optional` **r**: `number`

Defined in: [core/types.ts:189](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L189)

Rotation angle in degrees. One of `0`, `90`, `180`, or `270`.

##### rx?

> `optional` **rx**: `number`

Defined in: [core/types.ts:191](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L191)

X coordinate after rotation transformation.

##### ry?

> `optional` **ry**: `number`

Defined in: [core/types.ts:193](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L193)

Y coordinate after rotation transformation.

##### str

> **str**: `string`

Defined in: [core/types.ts:171](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L171)

The text content of this item.

##### w

> **w**: `number`

Defined in: [core/types.ts:181](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L181)

Alias for [width](#width-3).

##### width

> **width**: `number`

Defined in: [core/types.ts:177](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L177)

Width of the text item in PDF points.

##### x

> **x**: `number`

Defined in: [core/types.ts:173](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L173)

X coordinate of the top-left corner, in PDF points.

##### y

> **y**: `number`

Defined in: [core/types.ts:175](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L175)

Y coordinate of the top-left corner, in PDF points.

## Type Aliases

### LiteParseInput

> **LiteParseInput** = `string` | `Buffer` | `Uint8Array`

Defined in: [core/types.ts:18](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L18)

Accepted input types for [LiteParse.parse](#parse) and [LiteParse.screenshot](#screenshot).

- `string` — A file path to a document on disk.
- `Buffer | Uint8Array` — Raw file bytes (PDF bytes go straight to the parser with zero disk I/O; non-PDF bytes are written to a temp file for format conversion).
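The distinction between the input kinds can be sketched with a type check plus a look at the leading bytes, since PDF files begin with the ASCII marker `%PDF-`. Whether LiteParse detects PDFs this way is an assumption. `SketchInput`, `looksLikePdf`, and `describeInput` are hypothetical names, and `Uint8Array` also covers Node.js `Buffer` values.

```typescript
type SketchInput = string | Uint8Array;

// PDF files start with "%PDF-" (0x25 0x50 0x44 0x46 0x2D).
function looksLikePdf(bytes: Uint8Array): boolean {
  const magic = [0x25, 0x50, 0x44, 0x46, 0x2d];
  if (bytes.length < magic.length) return false;
  return magic.every((value, index) => bytes[index] === value);
}

// Classify an input the way the list above describes it.
function describeInput(input: SketchInput): "path" | "pdf-bytes" | "other-bytes" {
  if (typeof input === "string") return "path";
  return looksLikePdf(input) ? "pdf-bytes" : "other-bytes";
}

const pdfHeader = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d, 0x31, 0x2e, 0x37]);
const inputKind = describeInput(pdfHeader);
```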

---

### OutputFormat

> **OutputFormat** = `"json"` | `"text"`

Defined in: [core/types.ts:9](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/core/types.ts#L9)

## Functions

### searchItems()

> **searchItems**(`items`, `options`): [`JsonTextItem`](#jsontextitem)\[]

Defined in: [processing/searchItems.ts:26](https://github.com/run-llama/liteparse/blob/e1a13f0421526d0d515ee8388d2f5619a5ad8551/src/processing/searchItems.ts#L26)

Search text items for matches, returning synthetic merged items for each match.

For phrase searches, consecutive text items are concatenated and searched. When a phrase spans multiple items, the result is a single merged item with combined bounding box and the matched text. Font metadata is taken from the first matched item.

#### Parameters

##### items

[`JsonTextItem`](#jsontextitem)\[]

##### options

[`SearchItemsOptions`](#searchitemsoptions)

#### Returns

[`JsonTextItem`](#jsontextitem)\[]

#### Example

```
import { LiteParse, searchItems } from "@llamaindex/liteparse";

const parser = new LiteParse({ outputFormat: "json" });
const result = await parser.parse("report.pdf");

// `json` is optional on ParseResult, so fall back to an empty list.
for (const page of result.json?.pages ?? []) {
  const matches = searchItems(page.textItems, { phrase: "0°C to 70°C" });
  for (const match of matches) {
    console.log(`Found "${match.text}" at (${match.x}, ${match.y})`);
  }
}
```
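The "combined bounding box" of a multi-item match can be pictured as the smallest rectangle enclosing all matched items. The sketch below computes that union on a local `SketchJsonItem` type. It is one plausible reading of the description, not the library's actual implementation, and `unionBox` is a hypothetical name.

```typescript
// Local mirror of the JsonTextItem position fields.
interface SketchJsonItem {
  text: string;
  x: number;
  y: number;
  width: number;
  height: number;
}

// Smallest rectangle that encloses every item (top-left origin, y downward).
function unionBox(items: SketchJsonItem[]): {
  x: number;
  y: number;
  width: number;
  height: number;
} {
  if (items.length === 0) throw new Error("unionBox needs at least one item");
  const left = Math.min(...items.map((item) => item.x));
  const top = Math.min(...items.map((item) => item.y));
  const right = Math.max(...items.map((item) => item.x + item.width));
  const bottom = Math.max(...items.map((item) => item.y + item.height));
  return { x: left, y: top, width: right - left, height: bottom - top };
}

// Two adjacent items on the same line, e.g. "0°C to" followed by "70°C".
const mergedBox = unionBox([
  { text: "0°C to", x: 72, y: 100, width: 40, height: 12 },
  { text: "70°C", x: 115, y: 100, width: 30, height: 12 },
]);
```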
