LiteParse provides the lit CLI with three commands: parse, batch-parse, and screenshot.
Parse a single document.
lit parse [options] <file>
| Argument | Description |
| --- | --- |
| file | Path to the document file, or - to read from stdin |

| Option | Description | Default |
| --- | --- | --- |
| -o, --output <file> | Write output to a file instead of stdout | none |
| --format <format> | Output format: json or text | text |
| --ocr-server-url <url> | HTTP OCR server URL | none (uses Tesseract) |
| --no-ocr | Disable OCR entirely | none |
| --ocr-language <lang> | OCR language code | en |
| --num-workers <n> | Pages to OCR in parallel | CPU cores - 1 |
| --max-pages <n> | Maximum pages to parse | 10000 |
| --target-pages <pages> | Pages to parse (e.g., "1-5,10") | none (all pages) |
| --dpi <dpi> | Rendering DPI | 150 |
| --no-precise-bbox | Deprecated: Disable populating the output boundingBoxes array. Will be removed in v2.0. Text item coordinates (x, y, width, height) are always present regardless. | none |
| --preserve-small-text | Keep very small text | none |
| --password <password> | Password for encrypted/protected documents | none |
| --config <file> | JSON config file path | none |
| -q, --quiet | Suppress progress output | none |
# JSON output with bounding boxes
lit parse report.pdf --format json -o report.json
# Parse pages 1-5 only, no OCR
lit parse report.pdf --target-pages "1-5" --no-ocr
# High-DPI rendering with French OCR
lit parse report.pdf --dpi 300 --ocr-language fra
# Use an external OCR server
lit parse report.pdf --ocr-server-url http://localhost:8828/ocr
# Pipe output to another tool
lit parse report.pdf -q | wc -l
# Parse a remote file via stdin
curl -sL https://example.com/report.pdf | lit parse --no-ocr -
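For scripted use, the documented flags can be wrapped in a small shell function. The following is a minimal sketch and not part of LiteParse: `parse_pages_json` is a hypothetical helper name, and it assumes `lit` is on `PATH` and that the output directory may be created if it is missing. It uses only options listed for `lit parse`.

```shell
# Hypothetical helper (not shipped with LiteParse): parse selected pages
# of one document and write the result as JSON into an output directory.
# Usage: parse_pages_json <file> <pages> <output-dir>
parse_pages_json() {
  file=$1
  pages=$2
  out_dir=$3
  name=$(basename "$file")
  mkdir -p "$out_dir" || return 1
  # -q suppresses progress output; the JSON file is named after the input
  # with its last extension replaced by .json.
  lit parse "$file" --format json --target-pages "$pages" -q \
    -o "$out_dir/${name%.*}.json"
}
```

Under these assumptions, `parse_pages_json report.pdf "1-5" ./out` would run `lit parse` on pages 1-5 and write `./out/report.json`.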
Parse multiple documents in a directory.
lit batch-parse [options] <input-dir> <output-dir>
| Argument | Description |
| --- | --- |
| input-dir | Directory containing documents to parse |
| output-dir | Directory for output files |

| Option | Description | Default |
| --- | --- | --- |
| --format <format> | Output format: json or text | text |
| --ocr-server-url <url> | HTTP OCR server URL | none (uses Tesseract) |
| --no-ocr | Disable OCR entirely | none |
| --ocr-language <lang> | OCR language code | en |
| --num-workers <n> | Pages to OCR in parallel | CPU cores - 1 |
| --max-pages <n> | Maximum pages per file | 10000 |
| --dpi <dpi> | Rendering DPI | 150 |
| --no-precise-bbox | Deprecated: Disable populating the output boundingBoxes array. Will be removed in v2.0. Text item coordinates (x, y, width, height) are always present regardless. | none |
| --recursive | Search subdirectories | none |
| --extension <ext> | Only process this extension (e.g., ".pdf") | none (all supported) |
| --password <password> | Password for encrypted/protected documents (applied to all files) | none |
| --config <file> | JSON config file path | none |
| -q, --quiet | Suppress progress output | none |
# Parse all supported files in a directory
lit batch-parse ./documents ./output
# Recursively parse only PDFs
lit batch-parse ./documents ./output --recursive --extension ".pdf"
# Batch parse with JSON output and no OCR
lit batch-parse ./documents ./output --format json --no-ocr
# Use a config file for consistent settings
lit batch-parse ./documents ./output --config liteparse.config.json
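The description of `--extension` suggests it takes a single extension per run, so covering a few specific file types may need one run per extension. The following is a minimal sketch under that assumption: `batch_parse_exts` is a hypothetical helper name, `lit` is assumed to be on `PATH`, and repeated runs are assumed to be safe to write into the same output directory.

```shell
# Hypothetical helper (not shipped with LiteParse): run batch-parse once
# per extension, stopping at the first run that fails.
# Usage: batch_parse_exts <input-dir> <output-dir> <ext>...
batch_parse_exts() {
  in_dir=$1
  out_dir=$2
  shift 2
  for ext in "$@"; do
    # --recursive and --extension are the documented batch-parse options.
    lit batch-parse "$in_dir" "$out_dir" --recursive --extension "$ext" -q || return 1
  done
}
```

A call would look like `batch_parse_exts ./documents ./output ".pdf"`, with further extensions appended as extra arguments. Only ".pdf" appears in this reference, so any other extension should be checked against the supported formats first.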
Generate page images from a PDF.
lit screenshot [options] <file>
| Argument | Description |
| --- | --- |
| file | Path to the PDF file |

| Option | Description | Default |
| --- | --- | --- |
| -o, --output-dir <dir> | Output directory | ./screenshots |
| --target-pages <pages> | Pages to screenshot (e.g., "1,3,5" or "1-5") | none (all pages) |
| --dpi <dpi> | Rendering DPI | 150 |
| --format <format> | Image format: png or jpg | png |
| --password <password> | Password for encrypted/protected documents | none |
| --config <file> | JSON config file path | none |
| -q, --quiet | Suppress progress output | none |
# Screenshot all pages
lit screenshot document.pdf -o ./pages
# First 5 pages at high DPI
lit screenshot document.pdf --target-pages "1-5" --dpi 300 -o ./pages
# JPG format for smaller files
lit screenshot document.pdf --format jpg -o ./pages
# Specific pages only
lit screenshot document.pdf --target-pages "1,5,10" -o ./pages
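When page numbers come from another tool, they need to be turned into the comma-and-range form that `--target-pages` accepts (for example "1-5,10"). The following is a minimal sketch of one way to build that string: `pages_spec` is a hypothetical helper name, and it assumes the page numbers are positive integers given in ascending order without duplicates.

```shell
# Hypothetical helper (not shipped with LiteParse): collapse ascending
# page numbers into a spec such as "1-5,10" for --target-pages.
# Usage: pages_spec 1 2 3 4 5 10
pages_spec() {
  spec=""
  start=""
  prev=""
  for p in "$@"; do
    if [ -z "$start" ]; then
      start=$p
    elif [ "$p" -ne $((prev + 1)) ]; then
      # Gap found: close the current run before starting a new one.
      if [ "$start" -eq "$prev" ]; then run=$start; else run="$start-$prev"; fi
      spec="${spec:+$spec,}$run"
      start=$p
    fi
    prev=$p
  done
  # Close the final run, if any pages were given.
  if [ -n "$start" ]; then
    if [ "$start" -eq "$prev" ]; then run=$start; else run="$start-$prev"; fi
    spec="${spec:+$spec,}$run"
  fi
  printf '%s\n' "$spec"
}
```

The result can be passed straight to the option, as in `lit screenshot document.pdf --target-pages "$(pages_spec 1 2 3 4 5 10)" -o ./pages`.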
These options are available on all commands:
| Option | Description |
| --- | --- |
| -h, --help | Show help for a command |
| -V, --version | Show version number |