Get Extract Job

GET /api/v2/extract/{job_id}

Get a single extraction job by ID.

Returns the job status and, once the job is complete, its results. Pass expand=configuration to include the full configuration used for the job, and expand=extract_metadata to include per-field metadata.

Path Parameters
job_id: string
Query Parameters
expand: optional array of string

Additional fields to include: configuration, extract_metadata

organization_id: optional string
project_id: optional string
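
As a rough illustration of how the path and query parameters fit together, the Python sketch below builds (but does not send) the request. The host and the Authorization header follow the curl example on this page. The helper name `build_get_job_request` is hypothetical, and encoding the `expand` array as repeated `expand=` keys is an assumption that this page does not confirm.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

# Host taken from the curl example on this page.
BASE_URL = "https://api.cloud.llamaindex.ai"


def build_get_job_request(job_id, api_key, expand=(), organization_id=None, project_id=None):
    """Build (without sending) a GET request for a single extract job."""
    # Assumption: array query parameters are sent as repeated keys.
    params = [("expand", value) for value in expand]
    if organization_id:
        params.append(("organization_id", organization_id))
    if project_id:
        params.append(("project_id", project_id))

    url = f"{BASE_URL}/api/v2/extract/{quote(job_id, safe='')}"
    if params:
        url += "?" + urlencode(params)
    return Request(url, headers={"Authorization": f"Bearer {api_key}"}, method="GET")


req = build_get_job_request("ext-123", "my-key", expand=["configuration", "extract_metadata"])
print(req.full_url)
# https://api.cloud.llamaindex.ai/api/v2/extract/ext-123?expand=configuration&expand=extract_metadata
```

The returned `Request` can be passed to `urllib.request.urlopen` to perform the call.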
Cookie Parameters
session: optional string
Returns
ExtractV2Job = object { id, created_at, document_input_value, 9 more }

An extraction job.

id: string

Unique job identifier (job_id)

created_at: string

Creation timestamp

format: date-time
document_input_value: string

File ID or parse job ID that was extracted

project_id: string

Project this job belongs to

status: string

Current job status.

  • PENDING — queued, not yet started
  • RUNNING — actively processing
  • COMPLETED — finished successfully
  • FAILED — terminated with an error
  • CANCELLED — cancelled by user
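
Because a job moves from PENDING or RUNNING to one of three final states, a client typically polls this endpoint until the status is terminal. The Python sketch below shows one way to do that. `wait_for_job` and its `fetch_job` argument are hypothetical names: `fetch_job` stands for any callable that takes a job ID and returns the parsed JSON response as a dict. The polling interval and timeout are arbitrary defaults, not values recommended by the API.

```python
import time

# Statuses after which the job will not change again, per the list above.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED"}


def wait_for_job(fetch_job, job_id, interval_s=2.0, timeout_s=300.0, sleep=time.sleep):
    """Poll until the job reaches a terminal status, then return the job dict.

    fetch_job is a hypothetical callable: job_id -> parsed JSON response (dict).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_job(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['status']} after {timeout_s}s")
        sleep(interval_s)


# Usage with canned responses instead of real HTTP calls.
responses = iter([{"status": "PENDING"}, {"status": "RUNNING"}, {"status": "COMPLETED"}])
job = wait_for_job(lambda _id: next(responses), "ext-123", sleep=lambda s: None)
print(job["status"])  # COMPLETED
```

Injecting `fetch_job` and `sleep` keeps the loop independent of any particular HTTP client and makes it easy to exercise without network access.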
updated_at: string

Last update timestamp

format: date-time
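
The two date-time fields can be compared to see how long a job has existed between creation and its last update. The Python sketch below assumes the timestamps look like the ones in the example response on this page (ISO 8601 with a trailing `Z`). `parse_timestamp` and `seconds_between_updates` are hypothetical helper names.

```python
from datetime import datetime


def parse_timestamp(value):
    """Parse a date-time string such as '2019-12-27T18:11:19.117Z'."""
    # Older Python versions of fromisoformat reject a trailing 'Z',
    # so map it to an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def seconds_between_updates(job):
    """Seconds elapsed between created_at and updated_at."""
    delta = parse_timestamp(job["updated_at"]) - parse_timestamp(job["created_at"])
    return delta.total_seconds()


job = {
    "created_at": "2019-12-27T18:11:19.117Z",
    "updated_at": "2019-12-27T18:11:49.617Z",
}
print(seconds_between_updates(job))  # 30.5
```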
configuration: optional ExtractConfiguration { data_schema, cite_sources, confidence_scores, 9 more }

Extract configuration combining parse and extract settings.

data_schema: map[map[unknown] or array of unknown or string or 2 more]

JSON Schema defining the fields to extract. Validate with the /schema/validate endpoint first.

Accepts one of the following:
UnionMember0 = map[unknown]
UnionMember1 = array of unknown
UnionMember2 = string
UnionMember3 = number
UnionMember4 = boolean
cite_sources: optional boolean

Include citations in results

confidence_scores: optional boolean

Include confidence scores in results

extract_version: optional string

Extract algorithm version. Use 'latest' or a date string.

extraction_target: optional "per_doc" or "per_page" or "per_table_row"

Granularity of extraction: per_doc returns one object per document; per_page returns one object per page; per_table_row returns one object per table row.

Accepts one of the following:
"per_doc"
"per_page"
"per_table_row"
lang: optional string

ISO 639-1 language code for the document

max_pages: optional number

Maximum number of pages to process. Omit for no limit.

minimum: 1
parse_config_id: optional string

Saved parse configuration ID to control how the document is parsed before extraction

parse_tier: optional string

Parse tier to use before extraction (fast, cost_effective, or agentic)

system_prompt: optional string

Custom system prompt to guide extraction behavior

target_pages: optional string

Comma-separated page numbers or ranges to process (1-based). Omit to process all pages.
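
To make the target_pages format concrete, the Python sketch below expands a value such as "1,3,5-7" into individual page numbers on the client side. `parse_target_pages` is a hypothetical helper. The server's exact grammar (whitespace handling, open-ended ranges, duplicates) is not specified here, so the validation rules are assumptions.

```python
def parse_target_pages(spec):
    """Expand '1,3,5-7' into a sorted list of 1-based page numbers."""
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas (an assumption, not API behavior)
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)


print(parse_target_pages("1,3,5-7"))  # [1, 3, 5, 6, 7]
```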

tier: optional "cost_effective" or "agentic"

Extract tier: cost_effective (5 credits/page) or agentic (15 credits/page)

Accepts one of the following:
"cost_effective"
"agentic"
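
The per-page rates above lend themselves to a quick cost estimate. The Python sketch below multiplies the stated rate by a page count. `estimate_extract_credits` is a hypothetical helper. It covers only the extract tier rates quoted here and assumes billing is strictly per page, so any separate parse cost or other pricing rule is not reflected.

```python
# Rates quoted in the tier description above.
CREDITS_PER_PAGE = {"cost_effective": 5, "agentic": 15}


def estimate_extract_credits(tier, num_pages):
    """Estimate extract credits as rate-per-page times number of pages."""
    if tier not in CREDITS_PER_PAGE:
        raise ValueError(f"unknown tier: {tier!r}")
    if num_pages < 0:
        raise ValueError("num_pages must be non-negative")
    return CREDITS_PER_PAGE[tier] * num_pages


print(estimate_extract_credits("cost_effective", 10))  # 50
print(estimate_extract_credits("agentic", 10))         # 150
```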
configuration_id: optional string

Saved extract configuration ID used for this job, if any

error_message: optional string

Error details when status is FAILED
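
Since error_message is only meaningful for failed jobs, a caller can turn a FAILED response into an exception before reading any results. The Python sketch below shows one possible pattern. `ExtractJobFailed` and `raise_if_failed` are hypothetical names, not part of any SDK.

```python
class ExtractJobFailed(RuntimeError):
    """Raised when an extract job response reports status FAILED."""


def raise_if_failed(job):
    """Return the job unchanged, or raise with error_message if it failed."""
    if job.get("status") == "FAILED":
        message = job.get("error_message") or "extract job failed without an error_message"
        raise ExtractJobFailed(message)
    return job


try:
    raise_if_failed({"status": "FAILED", "error_message": "document could not be parsed"})
except ExtractJobFailed as exc:
    print(f"extraction failed: {exc}")
# extraction failed: document could not be parsed
```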

extract_metadata: optional ExtractJobMetadata { field_metadata, parse_job_id, parse_tier }

Extraction metadata.

field_metadata: optional ExtractedFieldMetadata { document_metadata, page_metadata, row_metadata }

Metadata for extracted fields including document, page, and row level info.

document_metadata: optional map[map[unknown] or array of unknown or string or 2 more]

Document-level metadata (citations, confidence) keyed by field name

Accepts one of the following:
UnionMember0 = map[unknown]
UnionMember1 = array of unknown
UnionMember2 = string
UnionMember3 = number
UnionMember4 = boolean
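
Because document_metadata is keyed by field name and every level above it is optional, a lookup has to tolerate missing keys. The Python sketch below does this. `field_metadata_for` is a hypothetical helper, and the inner `confidence` key in the sample data is purely illustrative: the actual shape of each metadata entry is not specified on this page.

```python
def field_metadata_for(job, field_name):
    """Return the document-level metadata entry for one field, or None."""
    extract_metadata = job.get("extract_metadata") or {}
    field_metadata = extract_metadata.get("field_metadata") or {}
    document_metadata = field_metadata.get("document_metadata") or {}
    return document_metadata.get(field_name)


# Sample data; the "confidence" key is a made-up illustration.
job = {
    "extract_metadata": {
        "field_metadata": {"document_metadata": {"total": {"confidence": 0.9}}}
    }
}
print(field_metadata_for(job, "total"))  # {'confidence': 0.9}
print(field_metadata_for({}, "total"))   # None
```

Note that extract_metadata is only returned when the request includes expand=extract_metadata, so the empty-response case is common.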
page_metadata: optional array of map[map[unknown] or array of unknown or string or 2 more]

Per-page metadata when extraction_target is per_page

Accepts one of the following:
UnionMember0 = map[unknown]
UnionMember1 = array of unknown
UnionMember2 = string
UnionMember3 = number
UnionMember4 = boolean
row_metadata: optional array of map[map[unknown] or array of unknown or string or 2 more]

Per-row metadata when extraction_target is per_table_row

Accepts one of the following:
UnionMember0 = map[unknown]
UnionMember1 = array of unknown
UnionMember2 = string
UnionMember3 = number
UnionMember4 = boolean
parse_job_id: optional string

Reference to the ParseJob ID used for parsing

parse_tier: optional string

Parse tier used for parsing the document

extract_result: optional map[map[unknown] or array of unknown or string or 2 more] or array of map[map[unknown] or array of unknown or string or 2 more]

Extracted data conforming to the data_schema. Returns a single object for per_doc, or an array for per_page / per_table_row.

Accepts one of the following:
UnionMember0 = map[map[unknown] or array of unknown or string or 2 more]
Accepts one of the following:
UnionMember0 = map[unknown]
UnionMember1 = array of unknown
UnionMember2 = string
UnionMember3 = number
UnionMember4 = boolean
UnionMember1 = array of map[map[unknown] or array of unknown or string or 2 more]
Accepts one of the following:
UnionMember0 = map[unknown]
UnionMember1 = array of unknown
UnionMember2 = string
UnionMember3 = number
UnionMember4 = boolean
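
Since extract_result is a single object for per_doc and an array for per_page or per_table_row, client code often normalizes it to a list before iterating. The Python sketch below shows one way to do so. `extracted_records` is a hypothetical helper, and the `invoice_no` field in the sample data is illustrative only.

```python
def extracted_records(job):
    """Return extract_result as a list, whatever the extraction_target was."""
    result = job.get("extract_result")
    if result is None:
        return []          # job not complete, or no result present
    if isinstance(result, list):
        return result      # per_page / per_table_row
    return [result]        # per_doc: wrap the single object


per_doc_job = {"extract_result": {"invoice_no": "A-1"}}
per_page_job = {"extract_result": [{"invoice_no": "A-1"}, {"invoice_no": "A-2"}]}

print(len(extracted_records(per_doc_job)))   # 1
print(len(extracted_records(per_page_job)))  # 2
print(extracted_records({"status": "RUNNING"}))  # []
```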
metadata: optional object { usage }

Job-level metadata.

usage: optional ExtractJobUsage { num_document_tokens, num_output_tokens, num_pages_extracted }

Extraction usage metrics.

num_document_tokens: optional number

Number of document tokens

num_output_tokens: optional number

Number of output tokens

num_pages_extracted: optional number

Number of pages extracted

Get Extract Job

curl https://api.cloud.llamaindex.ai/api/v2/extract/$JOB_ID \
    -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY"
Returns Examples
{
  "id": "ext-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  "created_at": "2019-12-27T18:11:19.117Z",
  "document_input_value": "dfl-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  "project_id": "prj-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
  "status": "COMPLETED",
  "updated_at": "2019-12-27T18:11:19.117Z",
  "configuration": {
    "data_schema": {
      "foo": {
        "foo": "bar"
      }
    },
    "cite_sources": true,
    "confidence_scores": true,
    "extract_version": "latest",
    "extraction_target": "per_doc",
    "lang": "en",
    "max_pages": 10,
    "parse_config_id": "cfg-11111111-2222-3333-4444-555555555555",
    "parse_tier": "fast",
    "system_prompt": "Extract all monetary values in USD. If a currency is not specified, assume USD.",
    "target_pages": "1,3,5-7",
    "tier": "cost_effective"
  },
  "configuration_id": "cfg-11111111-2222-3333-4444-555555555555",
  "error_message": "error_message",
  "extract_metadata": {
    "field_metadata": {
      "document_metadata": {
        "foo": {
          "foo": "bar"
        }
      },
      "page_metadata": [
        {
          "foo": {
            "foo": "bar"
          }
        }
      ],
      "row_metadata": [
        {
          "foo": {
            "foo": "bar"
          }
        }
      ]
    },
    "parse_job_id": "parse_job_id",
    "parse_tier": "parse_tier"
  },
  "extract_result": {
    "foo": {
      "foo": "bar"
    }
  },
  "metadata": {
    "usage": {
      "num_document_tokens": 0,
      "num_output_tokens": 0,
      "num_pages_extracted": 0
    }
  }
}