Generate Extraction Schema

POST/api/v2/extract/schema/generate

Generate a JSON schema and return a product configuration request.

Query Parameters
organization_id: optional string
project_id: optional string
Cookie Parameters
session: optional string
Body Parameters (JSON)
data_schema: optional map[map[unknown] or array of unknown or string or 2 more]

Optional schema to validate, refine, or extend

Accepts one of the following:
UnionMember0 = map[unknown]
UnionMember1 = array of unknown
UnionMember2 = string
UnionMember3 = number
UnionMember4 = boolean
file_id: optional string

Optional file ID to analyze for schema generation

name: optional string

Name for the generated configuration (auto-generated if omitted)

maxLength255
prompt: optional string

Natural language description of the data structure to extract
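As a rough illustration of how the body parameters above fit together, the following sketch builds (but does not send) a request using only the Python standard library. The helper name `build_generate_request` and the placeholder API key are assumptions made for this example; the URL and headers are the ones used in the curl example on this page.

```python
import json
import urllib.request

API_URL = "https://api.cloud.llamaindex.ai/api/v2/extract/schema/generate"


def build_generate_request(api_key, prompt=None, name=None, file_id=None, data_schema=None):
    """Build a POST request for the schema generation endpoint (hypothetical helper)."""
    # The reference lists maxLength 255 for `name`, so check it client-side.
    if name is not None and len(name) > 255:
        raise ValueError("name must be at most 255 characters")
    # Every body parameter is optional, so include only the ones that were supplied.
    candidates = {
        "prompt": prompt,
        "name": name,
        "file_id": file_id,
        "data_schema": data_schema,
    }
    body = {key: value for key, value in candidates.items() if value is not None}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


request = build_generate_request("YOUR_API_KEY", prompt="Invoice number, date, and line items")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the returned object to `urllib.request.urlopen` would send it; that step is left out here so the sketch runs without network access or a real key.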

ReturnsExpand Collapse
name: string

Human-readable name for this configuration.

maxLength255
minLength1
parameters: object { categories, product_type, splitting_strategy } or object { data_schema, product_type, cite_sources, 10 more } or object { product_type, rules, mode, parsing_configuration } or 2 more

Product-specific configuration parameters.

Accepts one of the following:
SplitV1 = object { categories, product_type, splitting_strategy }

Typed parameters for a split v1 product configuration.

categories: array of SplitCategory { name, description }

Categories to split documents into.

name: string

Name of the category.

maxLength200
minLength1
description: optional string

Optional description of what content belongs in this category.

maxLength2000
minLength1
product_type: "split_v1"

Product type.

splitting_strategy: optional object { allow_uncategorized }

Strategy for splitting documents.

allow_uncategorized: optional "include" or "forbid" or "omit"

Controls handling of pages that don't match any category. 'include': pages can be grouped as 'uncategorized' and included in results. 'forbid': all pages must be assigned to a defined category. 'omit': pages can be classified as 'uncategorized' but are excluded from results.

Accepts one of the following:
"include"
"forbid"
"omit"
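To make the three allow_uncategorized values concrete, here is a small client-side sketch of how the result shape differs between them. It is not the service's implementation: the function name and the page-to-category mapping are hypothetical, and raising an error for 'forbid' only stands in for the rule that every page must end up in a defined category.

```python
def apply_allow_uncategorized(page_categories, mode):
    """Illustrate 'include', 'forbid', and 'omit' on a page -> category mapping.

    `page_categories` maps a page number to a category name, or to None when
    no defined category matched (a hypothetical input shape for this sketch).
    """
    if mode not in ("include", "forbid", "omit"):
        raise ValueError(f"unknown mode: {mode!r}")
    result = {}
    for page, category in page_categories.items():
        if category is not None:
            result[page] = category
        elif mode == "include":
            # Unmatched pages are kept and grouped as 'uncategorized'.
            result[page] = "uncategorized"
        elif mode == "forbid":
            # Every page must belong to a defined category.
            raise ValueError(f"page {page} has no defined category")
        # mode == "omit": the page is classified as uncategorized but left out.
    return result


pages = {1: "invoice", 2: None, 3: "receipt"}
print(apply_allow_uncategorized(pages, "include"))
print(apply_allow_uncategorized(pages, "omit"))
```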
ExtractV2 = object { data_schema, product_type, cite_sources, 10 more }

Typed parameters for an extract v2 product configuration.

data_schema: map[map[unknown] or array of unknown or string or 2 more]

JSON Schema defining the fields to extract. Validate with the /schema/validate endpoint first.

Accepts one of the following:
UnionMember0 = map[unknown]
UnionMember1 = array of unknown
UnionMember2 = string
UnionMember3 = number
UnionMember4 = boolean
product_type: "extract_v2"

Product type.

cite_sources: optional boolean

Include citations in results

confidence_scores: optional boolean

Include confidence scores in results

extract_version: optional string

Extract algorithm version. Use 'latest' or a date string.

extraction_target: optional "per_doc" or "per_page" or "per_table_row"

Granularity of extraction: per_doc returns one object per document, per_page returns one object per page, per_table_row returns one object per table row

Accepts one of the following:
"per_doc"
"per_page"
"per_table_row"
lang: optional string

ISO 639-1 language code for the document

max_pages: optional number

Maximum number of pages to process. Omit for no limit.

minimum1
parse_config_id: optional string

Saved parse configuration ID to control how the document is parsed before extraction

parse_tier: optional string

Parse tier to use before extraction (fast, cost_effective, or agentic)

system_prompt: optional string

Custom system prompt to guide extraction behavior

target_pages: optional string

Comma-separated page numbers or ranges to process (1-based). Omit to process all pages.

tier: optional "cost_effective" or "agentic"

Extract tier: cost_effective (5 credits/page) or agentic (15 credits/page)

Accepts one of the following:
"cost_effective"
"agentic"
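The per-page credit figures above lend themselves to a quick estimate. The sketch below assumes that credits scale linearly with the number of pages processed and that nothing else is billed; the function name is hypothetical.

```python
# Credits per page for each extract tier, as stated in the tier description.
CREDITS_PER_PAGE = {"cost_effective": 5, "agentic": 15}


def estimate_extract_credits(page_count, tier):
    """Estimate credits for an extraction, assuming simple per-page billing."""
    if tier not in CREDITS_PER_PAGE:
        raise ValueError(f"unknown tier: {tier!r}")
    if page_count < 0:
        raise ValueError("page_count must not be negative")
    return CREDITS_PER_PAGE[tier] * page_count


print(estimate_extract_credits(12, "cost_effective"))  # 60
print(estimate_extract_credits(12, "agentic"))  # 180
```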
ClassifyV2 = object { product_type, rules, mode, parsing_configuration }

Typed parameters for a classify v2 product configuration.

product_type: "classify_v2"

Product type.

rules: array of object { description, type }

Classify rules to evaluate against the document (at least one required)

description: string

Natural language criteria for matching this rule

maxLength500
minLength10
type: string

Document type to assign when rule matches

maxLength50
minLength1
mode: optional "FAST"

Classify execution mode

parsing_configuration: optional object { lang, max_pages, target_pages }

Parsing configuration for classify jobs.

lang: optional string

ISO 639-1 language code for the document

max_pages: optional number

Maximum number of pages to process. Omit for no limit.

minimum1
target_pages: optional string

Comma-separated page numbers or ranges to process (1-based). Omit to process all pages.

ParseV2 = object { product_type, tier, version, 11 more }

Configuration for LlamaParse v2 document parsing.

Includes tier selection, processing options, output formatting, page targeting, and webhook delivery. Refer to the LlamaParse documentation for details on each field.

product_type: "parse_v2"

Product type.

tier: "fast" or "cost_effective" or "agentic" or "agentic_plus"

Parsing tier: 'fast' (rule-based, cheapest), 'cost_effective' (balanced), 'agentic' (AI-powered with custom prompts), or 'agentic_plus' (premium AI with highest accuracy)

Accepts one of the following:
"fast"
"cost_effective"
"agentic"
"agentic_plus"
version: "2025-12-11" or "2025-12-18" or "2025-12-31" or 31 more or string

Tier version. Use 'latest' for the current stable version, or specify a dated version (e.g., '2026-03-27') for reproducible results

Accepts one of the following:
UnionMember0 = "2025-12-11" or "2025-12-18" or "2025-12-31" or 31 more

Tier version. Use 'latest' for the current stable version, or specify a dated version (e.g., '2026-03-27') for reproducible results

Accepts one of the following:
"2025-12-11"
"2025-12-18"
"2025-12-31"
"2026-01-08"
"2026-01-09"
"2026-01-16"
"2026-01-21"
"2026-01-22"
"2026-01-24"
"2026-01-29"
"2026-01-30"
"2026-02-03"
"2026-02-18"
"2026-02-20"
"2026-02-24"
"2026-02-26"
"2026-03-02"
"2026-03-03"
"2026-03-04"
"2026-03-05"
"2026-03-09"
"2026-03-10"
"2026-03-11"
"2026-03-12"
"2026-03-17"
"2026-03-19"
"2026-03-20"
"2026-03-22"
"2026-03-23"
"2026-03-24"
"2026-03-25"
"2026-03-26"
"2026-03-27"
"latest"
UnionMember1 = string
agentic_options: optional object { custom_prompt }

Options for AI-powered parsing tiers (cost_effective, agentic, agentic_plus).

These options customize how the AI processes and interprets document content. Only applicable when using non-fast tiers.

custom_prompt: optional string

Custom instructions for the AI parser. Use to guide extraction behavior, specify output formatting, or provide domain-specific context. Example: 'Extract financial tables with currency symbols. Format dates as YYYY-MM-DD.'

client_name: optional string

Identifier for the client/application making the request. Used for analytics and debugging. Example: 'my-app-v2'

crop_box: optional object { bottom, left, right, top }

Crop boundaries to process only a portion of each page. Values are ratios 0-1 from page edges

bottom: optional number

Bottom boundary as ratio (0-1). 0=top edge, 1=bottom edge. Content below this line is excluded

maximum1
minimum0
left: optional number

Left boundary as ratio (0-1). 0=left edge, 1=right edge. Content left of this line is excluded

maximum1
minimum0
right: optional number

Right boundary as ratio (0-1). 0=left edge, 1=right edge. Content right of this line is excluded

maximum1
minimum0
top: optional number

Top boundary as ratio (0-1). 0=top edge, 1=bottom edge. Content above this line is excluded

maximum1
minimum0
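One way to read the crop_box fields is as a rectangle in page coordinates. The sketch below follows the per-field descriptions (all four values measured from the top-left corner of the page) and assumes that an omitted boundary means no cropping on that side. The function name and the pixel-based page size are illustrative only.

```python
def crop_box_to_pixels(width, height, crop_box):
    """Convert crop_box ratios into a (left, top, right, bottom) pixel rectangle."""
    for key, value in crop_box.items():
        if not 0 <= value <= 1:
            raise ValueError(f"{key} must be between 0 and 1")
    # Assumed defaults: an omitted boundary keeps the full page on that side.
    left = crop_box.get("left", 0.0)
    top = crop_box.get("top", 0.0)
    right = crop_box.get("right", 1.0)
    bottom = crop_box.get("bottom", 1.0)
    if left > right or top > bottom:
        raise ValueError("crop_box boundaries describe an empty region")
    return (
        round(left * width),
        round(top * height),
        round(right * width),
        round(bottom * height),
    )


# Drop the top and bottom 10% of a 1000 x 2000 page (for example, header and footer).
print(crop_box_to_pixels(1000, 2000, {"top": 0.1, "bottom": 0.9}))  # (0, 200, 1000, 1800)
```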
disable_cache: optional boolean

Bypass result caching and force re-parsing. Use when document content may have changed or you need fresh results

fast_options: optional unknown

Options for fast tier parsing (rule-based, no AI).

Fast tier uses deterministic algorithms for text extraction without AI enhancement. It's the fastest and most cost-effective option, best suited for simple documents with standard layouts. Currently has no configurable options but reserved for future expansion.

input_options: optional object { html, pdf, presentation, spreadsheet }

Format-specific options (HTML, PDF, spreadsheet, presentation). Applied based on detected input file type

html: optional object { make_all_elements_visible, remove_fixed_elements, remove_navigation_elements }

HTML/web page parsing options (applies to .html, .htm files)

make_all_elements_visible: optional boolean

Force all HTML elements to be visible by overriding CSS display/visibility properties. Useful for parsing pages with hidden content or collapsed sections

remove_fixed_elements: optional boolean

Remove fixed-position elements (headers, footers, floating buttons) that appear on every page render

remove_navigation_elements: optional boolean

Remove navigation elements (nav bars, sidebars, menus) to focus on main content

pdf: optional unknown

PDF-specific parsing options (applies to .pdf files)

presentation: optional object { out_of_bounds_content, skip_embedded_data }

Presentation parsing options (applies to .pptx, .ppt, .odp, .key files)

out_of_bounds_content: optional boolean

Extract content positioned outside the visible slide area. Some presentations have hidden notes or content that extends beyond slide boundaries

skip_embedded_data: optional boolean

Skip extraction of embedded chart data tables. When true, only the visual representation of charts is captured, not the underlying data

spreadsheet: optional object { detect_sub_tables_in_sheets, force_formula_computation_in_sheets, include_hidden_sheets }

Spreadsheet parsing options (applies to .xlsx, .xls, .csv, .ods files)

detect_sub_tables_in_sheets: optional boolean

Detect and extract multiple tables within a single sheet. Useful when spreadsheets contain several data regions separated by blank rows/columns

force_formula_computation_in_sheets: optional boolean

Compute formula results instead of extracting formula text. Use when you need calculated values rather than formula definitions

include_hidden_sheets: optional boolean

Parse hidden sheets in addition to visible ones. By default, hidden sheets are skipped

output_options: optional object { extract_printed_page_number, images_to_save, markdown, 2 more }

Output formatting options for markdown, text, and extracted images

extract_printed_page_number: optional boolean

Extract the printed page number as it appears in the document (e.g., 'Page 5 of 10', 'v', 'A-3'). Useful for referencing original page numbers

images_to_save: optional array of "screenshot" or "embedded" or "layout"

Image categories to extract and save. Options: 'screenshot' (full page renders useful for visual QA), 'embedded' (images found within the document), 'layout' (cropped regions from layout detection like figures and diagrams). Empty list saves no images

Accepts one of the following:
"screenshot"
"embedded"
"layout"
markdown: optional object { annotate_links, inline_images, tables }

Markdown formatting options including table styles and link annotations

annotate_links: optional boolean

Add link annotations to markdown output in the format [text](url). When false, only the link text is included

inline_images: optional boolean

Embed images directly in markdown as base64 data URIs instead of extracting them as separate files. Useful for self-contained markdown output

tables: optional object { compact_markdown_tables, markdown_table_multiline_separator, merge_continued_tables, output_tables_as_markdown }

Table formatting options including markdown vs HTML format and merging behavior

compact_markdown_tables: optional boolean

Remove extra whitespace padding in markdown table cells for more compact output

markdown_table_multiline_separator: optional string

Separator string for multiline cell content in markdown tables. Example: '<br>' to preserve line breaks, ' ' to join with spaces

merge_continued_tables: optional boolean

Automatically merge tables that span multiple pages into a single table. The merged table appears on the first page with merged_from_pages metadata

output_tables_as_markdown: optional boolean

Output tables as markdown pipe tables instead of HTML <table> tags. Markdown tables are simpler but cannot represent complex structures like merged cells

spatial_text: optional object { do_not_unroll_columns, preserve_layout_alignment_across_pages, preserve_very_small_text }

Spatial text output options for preserving document layout structure

do_not_unroll_columns: optional boolean

Keep multi-column layouts intact instead of linearizing columns into sequential text. Automatically enabled for non-fast tiers

preserve_layout_alignment_across_pages: optional boolean

Maintain consistent text column alignment across page boundaries. Automatically enabled for document-level parsing modes

preserve_very_small_text: optional boolean

Include text below the normal size threshold. Useful for footnotes, watermarks, or fine print that might otherwise be filtered out

tables_as_spreadsheet: optional object { enable, guess_sheet_name }

Options for exporting tables as XLSX spreadsheets

enable: optional boolean

Whether this option is enabled

guess_sheet_name: optional boolean

Automatically generate descriptive sheet names from table context (headers, surrounding text) instead of using generic names like 'Table_1'

page_ranges: optional object { max_pages, target_pages }

Page selection: limit total pages or specify exact pages to process

max_pages: optional number

Maximum number of pages to process. Pages are processed in order starting from page 1. If both max_pages and target_pages are set, target_pages takes precedence

minimum1
target_pages: optional string

Comma-separated list of specific pages to process using 1-based indexing. Supports individual pages and ranges. Examples: '1,3,5' (pages 1, 3, 5), '1-5' (pages 1 through 5 inclusive), '1,3,5-8,10' (pages 1, 3, 5-8, and 10). Pages are sorted and deduplicated automatically. Duplicate pages cause an error
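The target_pages syntax can be expanded client-side before submitting a job, for instance to check how many pages a specification covers. The sketch below is a hypothetical helper, not the service's parser. It silently sorts and deduplicates, which is only one of the two behaviors mentioned in the description above, so the service itself may reject duplicates.

```python
def parse_target_pages(spec):
    """Expand a target_pages string such as '1,3,5-8,10' into sorted page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # A range such as '5-8' is inclusive on both ends.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        # Pages use 1-based indexing, and a range must not run backwards.
        if start < 1 or end < start:
            raise ValueError(f"invalid page or range: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)


print(parse_target_pages("1,3,5-8,10"))  # [1, 3, 5, 6, 7, 8, 10]
```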

processing_control: optional object { job_failure_conditions, timeouts }

Job execution controls including timeouts and failure thresholds

job_failure_conditions: optional object { allowed_page_failure_ratio, fail_on_buggy_font, fail_on_image_extraction_error, 2 more }

Quality thresholds that determine when a job should fail vs complete with partial results

allowed_page_failure_ratio: optional number

Maximum ratio of pages allowed to fail before the job fails (0-1). Example: 0.1 means job fails if more than 10% of pages fail. Default is 0.05 (5%)

maximum1
exclusiveMinimum0
fail_on_buggy_font: optional boolean

Fail the job if a problematic font is detected that may cause incorrect text extraction. Buggy fonts can produce garbled or missing characters

fail_on_image_extraction_error: optional boolean

Fail the entire job if any embedded image cannot be extracted. By default, image extraction errors are logged but don't fail the job

fail_on_image_ocr_error: optional boolean

Fail the entire job if OCR fails on any image. By default, OCR errors result in empty text for that image

fail_on_markdown_reconstruction_error: optional boolean

Fail the entire job if markdown cannot be reconstructed for any page. By default, failed pages use fallback text extraction
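The allowed_page_failure_ratio threshold described above can be expressed as a simple comparison. This is a minimal sketch with a hypothetical function name; it takes the documented default of 0.05 and the "more than" wording literally, so a job whose failure ratio equals the threshold exactly still completes.

```python
def job_should_fail(failed_pages, total_pages, allowed_page_failure_ratio=0.05):
    """Return True when the share of failed pages exceeds the allowed ratio."""
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    # The reference lists exclusiveMinimum 0 and maximum 1 for the ratio.
    if not 0 < allowed_page_failure_ratio <= 1:
        raise ValueError("allowed_page_failure_ratio must be in (0, 1]")
    return failed_pages / total_pages > allowed_page_failure_ratio


print(job_should_fail(1, 10, 0.1))  # False: exactly 10% failed, not more
print(job_should_fail(2, 10, 0.1))  # True: 20% failed
```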

timeouts: optional object { base_in_seconds, extra_time_per_page_in_seconds }

Timeout settings for job execution. Increase for large or complex documents

base_in_seconds: optional number

Base timeout for the job in seconds (max 1800 = 30 minutes). This is the minimum time allowed regardless of document size

maximum1800
exclusiveMinimum0
extra_time_per_page_in_seconds: optional number

Additional timeout per page in seconds (max 300 = 5 minutes). Total timeout = base + (this value × page count)

maximum300
exclusiveMinimum0
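The timeout formula above (total = base + per-page value × page count) is easy to check numerically. The helper below is a hypothetical sketch that also enforces the documented bounds on both settings.

```python
def total_timeout_seconds(base_in_seconds, extra_time_per_page_in_seconds, page_count):
    """Compute the total job timeout from the two timeout settings."""
    # Bounds taken from the field constraints: (0, 1800] and (0, 300].
    if not 0 < base_in_seconds <= 1800:
        raise ValueError("base_in_seconds must be in (0, 1800]")
    if not 0 < extra_time_per_page_in_seconds <= 300:
        raise ValueError("extra_time_per_page_in_seconds must be in (0, 300]")
    if page_count < 0:
        raise ValueError("page_count must not be negative")
    return base_in_seconds + extra_time_per_page_in_seconds * page_count


# A 40-page document with a 10-minute base and 30 extra seconds per page.
print(total_timeout_seconds(600, 30, 40))  # 1800
```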
processing_options: optional object { aggressive_table_extraction, auto_mode_configuration, cost_optimizer, 4 more }

Document processing options including OCR, table extraction, and chart parsing

aggressive_table_extraction: optional boolean

Use aggressive heuristics to detect table boundaries, even without visible borders. Useful for documents with borderless or complex tables

auto_mode_configuration: optional array of object { parsing_conf, filename_match_glob, filename_match_glob_list, 33 more }

Conditional processing rules that apply different parsing options based on page content, document structure, or filename patterns. Each entry defines trigger conditions and the parsing configuration to apply when triggered

parsing_conf: object { adaptive_long_table, aggressive_table_extraction, crop_box, 11 more }

Parsing configuration to apply when trigger conditions are met

adaptive_long_table: optional boolean

Whether to use adaptive long table handling

aggressive_table_extraction: optional boolean

Whether to use aggressive table extraction

crop_box: optional object { bottom, left, right, top }

Crop box options for auto mode parsing configuration.

bottom: optional number

Bottom boundary of crop box as ratio (0-1)

maximum1
minimum0
left: optional number

Left boundary of crop box as ratio (0-1)

maximum1
minimum0
right: optional number

Right boundary of crop box as ratio (0-1)

maximum1
minimum0
top: optional number

Top boundary of crop box as ratio (0-1)

maximum1
minimum0
custom_prompt: optional string

Custom AI instructions for matched pages. Overrides the base custom_prompt

extract_layout: optional boolean

Whether to extract layout information

high_res_ocr: optional boolean

Whether to use high resolution OCR

ignore: optional object { ignore_diagonal_text, ignore_hidden_text }

Ignore options for auto mode parsing configuration.

ignore_diagonal_text: optional boolean

Whether to ignore diagonal text in the document

ignore_hidden_text: optional boolean

Whether to ignore hidden text in the document

language: optional string

Primary language of the document

outlined_table_extraction: optional boolean

Whether to use outlined table extraction

presentation: optional object { out_of_bounds_content, skip_embedded_data }

Presentation-specific options for auto mode parsing configuration.

out_of_bounds_content: optional boolean

Extract out of bounds content in presentation slides

skip_embedded_data: optional boolean

Skip extraction of embedded data for charts in presentation slides

spatial_text: optional object { do_not_unroll_columns, preserve_layout_alignment_across_pages, preserve_very_small_text }

Spatial text options for auto mode parsing configuration.

do_not_unroll_columns: optional boolean

Keep column structure intact without unrolling

preserve_layout_alignment_across_pages: optional boolean

Preserve text alignment across page boundaries

preserve_very_small_text: optional boolean

Include very small text in spatial output

specialized_chart_parsing: optional "agentic_plus" or "agentic" or "efficient"

Enable specialized chart parsing with the specified mode

Accepts one of the following:
"agentic_plus"
"agentic"
"efficient"
tier: optional "fast" or "cost_effective" or "agentic" or "agentic_plus"

Override the parsing tier for matched pages. Must be paired with version

Accepts one of the following:
"fast"
"cost_effective"
"agentic"
"agentic_plus"
version: optional "2025-12-11" or "2025-12-18" or "2025-12-31" or 31 more or string

Tier version when overriding tier. Required when tier is specified

Accepts one of the following:
UnionMember0 = "2025-12-11" or "2025-12-18" or "2025-12-31" or 31 more

Tier version when overriding tier. Required when tier is specified

Accepts one of the following:
"2025-12-11"
"2025-12-18"
"2025-12-31"
"2026-01-08"
"2026-01-09"
"2026-01-16"
"2026-01-21"
"2026-01-22"
"2026-01-24"
"2026-01-29"
"2026-01-30"
"2026-02-03"
"2026-02-18"
"2026-02-20"
"2026-02-24"
"2026-02-26"
"2026-03-02"
"2026-03-03"
"2026-03-04"
"2026-03-05"
"2026-03-09"
"2026-03-10"
"2026-03-11"
"2026-03-12"
"2026-03-17"
"2026-03-19"
"2026-03-20"
"2026-03-22"
"2026-03-23"
"2026-03-24"
"2026-03-25"
"2026-03-26"
"2026-03-27"
"latest"
UnionMember1 = string
filename_match_glob: optional string

Single glob pattern to match against filename

filename_match_glob_list: optional array of string

List of glob patterns to match against filename

filename_regexp: optional string

Regex pattern to match against filename

filename_regexp_mode: optional string

Regex mode flags (e.g., 'i' for case-insensitive)

full_page_image_in_page: optional boolean

Trigger if page contains a full-page image (scanned page detection)

full_page_image_in_page_threshold: optional number or string

Threshold for full page image detection (0.0-1.0, default 0.8)

Accepts one of the following:
UnionMember0 = number
UnionMember1 = string
image_in_page: optional boolean

Trigger if page contains non-screenshot images

layout_element_in_page: optional string

Trigger if page contains this layout element type

layout_element_in_page_confidence_threshold: optional number or string

Confidence threshold for layout element detection

Accepts one of the following:
UnionMember0 = number
UnionMember1 = string
page_contains_at_least_n_charts: optional number or string

Trigger if page has more than N charts

Accepts one of the following:
UnionMember0 = number
UnionMember1 = string
page_contains_at_least_n_images: optional number or string

Trigger if page has more than N images

Accepts one of the following:
UnionMember0 = number
UnionMember1 = string
page_contains_at_least_n_layout_elements: optional number or string

Trigger if page has more than N layout elements

Accepts one of the following:
UnionMember0 = number
UnionMember1 = string
page_contains_at_least_n_lines: optional number or string

Trigger if page has more than N lines

Accepts one of the following:
UnionMember0 = number
UnionMember1 = string
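page_contains_at_least_n_links: optional number or string

Trigger if page has more than N links

Accepts one of the following:
UnionMember0 = number
UnionMember1 = string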
page_contains_at_least_n_numbers: optional number or string

Trigger if page has more than N numeric words

Accepts one of the following:
UnionMember0 = number
UnionMember1 = string
page_contains_at_least_n_percent_numbers: optional number or string

Trigger if page has more than N% numeric words

Accepts one of the following:
UnionMember0 = number
UnionMember1 = string
page_contains_at_least_n_tables: optional number or string

Trigger if page has more than N tables

Accepts one of the following:
UnionMember0 = number
UnionMember1 = string
page_contains_at_least_n_words: optional number or string

Trigger if page has more than N words

Accepts one of the following:
UnionMember0 = number
UnionMember1 = string
page_contains_at_most_n_charts: optional number or string

Trigger if page has fewer than N charts

Accepts one of the following:
UnionMember0 = number
UnionMember1 = string
page_contains_at_most_n_images: optional number or string

Trigger if page has fewer than N images

Accepts one of the following:
UnionMember0 = number
UnionMember1 = string
page_contains_at_most_n_layout_elements: optional number or string

Trigger if page has fewer than N layout elements

Accepts one of the following:
UnionMember0 = number
UnionMember1 = string
page_contains_at_most_n_lines: optional number or string

Trigger if page has fewer than N lines

Accepts one of the following:
UnionMember0 = number
UnionMember1 = string
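page_contains_at_most_n_links: optional number or string

Trigger if page has fewer than N links

Accepts one of the following:
UnionMember0 = number
UnionMember1 = string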
page_contains_at_most_n_numbers: optional number or string

Trigger if page has fewer than N numeric words

Accepts one of the following:
UnionMember0 = number
UnionMember1 = string
page_contains_at_most_n_percent_numbers: optional number or string

Trigger if page has fewer than N% numeric words

Accepts one of the following:
UnionMember0 = number
UnionMember1 = string
page_contains_at_most_n_tables: optional number or string

Trigger if page has fewer than N tables

Accepts one of the following:
UnionMember0 = number
UnionMember1 = string
page_contains_at_most_n_words: optional number or string

Trigger if page has fewer than N words

Accepts one of the following:
UnionMember0 = number
UnionMember1 = string
page_longer_than_n_chars: optional number or string

Trigger if page has more than N characters

Accepts one of the following:
UnionMember0 = number
UnionMember1 = string
page_md_error: optional boolean

Trigger on pages with markdown extraction errors

page_shorter_than_n_chars: optional number or string

Trigger if page has fewer than N characters

Accepts one of the following:
UnionMember0 = number
UnionMember1 = string
regexp_in_page: optional string

Regex pattern to match in page content

regexp_in_page_mode: optional string

Regex mode flags for regexp_in_page

table_in_page: optional boolean

Trigger if page contains a table

text_in_page: optional string

Trigger if page text/markdown contains this string

trigger_mode: optional string

How to combine multiple trigger conditions: 'and' (all conditions must match, this is the default) or 'or' (any single condition can trigger)
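The 'and' / 'or' semantics of trigger_mode can be sketched with a few of the trigger conditions listed above. This is an illustration only: the `page` dictionary shape and the function name are hypothetical, only three condition types are handled, and treating a rule with no conditions as never triggering is an assumption.

```python
def rule_triggers(rule, page):
    """Evaluate a few auto-mode trigger conditions against one page."""
    checks = []
    if "text_in_page" in rule:
        checks.append(rule["text_in_page"] in page["text"])
    if rule.get("table_in_page"):
        checks.append(page["has_table"])
    if "page_longer_than_n_chars" in rule:
        checks.append(len(page["text"]) > int(rule["page_longer_than_n_chars"]))
    if not checks:
        # Assumption: a rule without any recognized condition does not trigger.
        return False
    mode = rule.get("trigger_mode", "and")  # 'and' is the documented default
    if mode == "and":
        return all(checks)
    if mode == "or":
        return any(checks)
    raise ValueError(f"unknown trigger_mode: {mode!r}")


page = {"text": "Quarterly revenue summary", "has_table": False}
rule = {"text_in_page": "revenue", "table_in_page": True}
print(rule_triggers(rule, page))  # False: 'and' needs both conditions
print(rule_triggers({**rule, "trigger_mode": "or"}, page))  # True: text matches
```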

cost_optimizer: optional object { enable }

Cost optimizer configuration for reducing parsing costs on simpler pages.

When enabled, the parser analyzes each page and routes simpler pages to faster, cheaper processing while preserving quality for complex pages. Only works with 'agentic' or 'agentic_plus' tiers.

enable: optional boolean

Enable cost-optimized parsing. Routes simpler pages to faster processing while complex pages use full AI analysis. May reduce speed on some documents. IMPORTANT: Only available with 'agentic' or 'agentic_plus' tiers

disable_heuristics: optional boolean

Disable automatic heuristics including outlined table extraction and adaptive long table handling. Use when heuristics produce incorrect results

ignore: optional object { ignore_diagonal_text, ignore_hidden_text, ignore_text_in_image }

Options for ignoring specific text types (diagonal, hidden, text in images)

ignore_diagonal_text: optional boolean

Skip text rotated at an angle (not horizontal/vertical). Useful for ignoring watermarks or decorative angled text

ignore_hidden_text: optional boolean

Skip text marked as hidden in the document structure. Some PDFs contain invisible text layers used for accessibility or search indexing

ignore_text_in_image: optional boolean

Skip OCR text extraction from embedded images. Use when images contain irrelevant text (watermarks, logos) that shouldn't be in the output

ocr_parameters: optional object { languages }

OCR configuration including language detection settings

languages: optional array of ParsingLanguages

Languages to use for OCR text recognition. Specify multiple languages if document contains mixed-language content. Order matters - put primary language first. Example: ['en', 'es'] for English with Spanish

Accepts one of the following:
"af"
"az"
"bs"
"cs"
"cy"
"da"
"de"
"en"
"es"
"et"
"fr"
"ga"
"hr"
"hu"
"id"
"is"
"it"
"ku"
"la"
"lt"
"lv"
"mi"
"ms"
"mt"
"nl"
"no"
"oc"
"pi"
"pl"
"pt"
"ro"
"rs_latin"
"sk"
"sl"
"sq"
"sv"
"sw"
"tl"
"tr"
"uz"
"vi"
"ar"
"fa"
"ug"
"ur"
"bn"
"as"
"mni"
"ru"
"rs_cyrillic"
"be"
"bg"
"uk"
"mn"
"abq"
"ady"
"kbd"
"ava"
"dar"
"inh"
"che"
"lbe"
"lez"
"tab"
"tjk"
"hi"
"mr"
"ne"
"bh"
"mai"
"ang"
"bho"
"mah"
"sck"
"new"
"gom"
"sa"
"bgc"
"th"
"ch_sim"
"ch_tra"
"ja"
"ko"
"ta"
"te"
"kn"
specialized_chart_parsing: optional "agentic_plus" or "agentic" or "efficient"

Enable AI-powered chart analysis. Modes: 'efficient' (fast, lower cost), 'agentic' (balanced), 'agentic_plus' (highest accuracy). Automatically enables extract_layout and precise_bounding_box when set

Accepts one of the following:
"agentic_plus"
"agentic"
"efficient"
webhook_configurations: optional array of object { webhook_events, webhook_headers, webhook_url }

Webhook endpoints for job status notifications. Multiple webhooks can be configured for different events or services

webhook_events: optional array of string

Events that trigger this webhook. Options: 'parse.success' (job completed), 'parse.failure' (job failed), 'parse.partial' (some pages failed). If not specified, webhook fires for all events

webhook_headers: optional map[unknown]

Custom HTTP headers to include in webhook requests. Use for authentication tokens or custom routing. Example: {'Authorization': 'Bearer xyz'}

webhook_url: optional string

HTTPS URL to receive webhook POST requests. Must be publicly accessible
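The event filtering described for webhook_events can be sketched as follows. The function name is hypothetical, and the code treats only a missing (or null) webhook_events as "all events"; how the service handles an empty list is not stated here.

```python
def webhook_urls_for_event(webhook_configurations, event):
    """Return the webhook URLs that should be notified for the given event."""
    urls = []
    for config in webhook_configurations:
        url = config.get("webhook_url")
        if url is None:
            # webhook_url is optional; an entry without one has nothing to notify.
            continue
        events = config.get("webhook_events")
        # If webhook_events is not specified, the webhook fires for all events.
        if events is None or event in events:
            urls.append(url)
    return urls


configs = [
    {"webhook_url": "https://example.com/all"},
    {"webhook_url": "https://example.com/failures", "webhook_events": ["parse.failure"]},
]
print(webhook_urls_for_event(configs, "parse.success"))  # ['https://example.com/all']
```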

Unknown = object { product_type }

Catch-all for configurations without a dedicated typed schema.

Accepts arbitrary JSON fields alongside product_type.

product_type: "unknown"

Product type.

Generate Extraction Schema

curl https://api.cloud.llamaindex.ai/api/v2/extract/schema/generate \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
    -d '{}'
Returns Examples
{
  "name": "x",
  "parameters": {
    "categories": [
      {
        "name": "x",
        "description": "x"
      }
    ],
    "product_type": "split_v1",
    "splitting_strategy": {
      "allow_uncategorized": "include"
    }
  }
}