Create Batch Job
Create a batch processing job.
Processes files from a directory or a specific list of item IDs. Supports batch parsing and classification operations.
Provide either directory_id to process all files in a directory,
or item_ids for specific items. The job runs asynchronously —
poll GET /batch/{job_id} for progress.
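A minimal sketch of how a client might assemble the request body, assuming the either/or rule between directory_id and item_ids described above (the helper name and the placeholder IDs are illustrative, not part of the API):

```python
# Sketch: build the JSON body for POST /api/v1/beta/batch-processing.
# Assumption from the description above: exactly one of directory_id
# or item_ids should be supplied; job_config carries the parse or
# classify configuration.

def build_batch_job_body(job_config, directory_id=None, item_ids=None):
    if (directory_id is None) == (item_ids is None):
        raise ValueError("provide exactly one of directory_id or item_ids")
    body = {"job_config": job_config}
    if directory_id is not None:
        body["directory_id"] = directory_id   # directory mode: all files in the directory
    else:
        body["item_ids"] = item_ids           # item mode: an explicit list of items
    return body

# Directory mode request body.
dir_body = build_batch_job_body({"job_name": "PARSE_RAW_FILE"},
                                directory_id="dir-1234")
# Item mode request body.
item_body = build_batch_job_body({"job_name": "PARSE_RAW_FILE"},
                                 item_ids=["item-1", "item-2"])
```

The same body would then be sent with any HTTP client, with an `Authorization: Bearer` header as shown in the curl example below the field reference.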
Body Parameters (JSON)
job_config: object { correlation_id, job_name, parameters, 6 more } or ClassifyJob { id, project_id, rules, 9 more }
Job configuration — either a parse or classify config
BatchParseJobRecordCreate = object { correlation_id, job_name, parameters, 6 more }
Batch-specific parse job record for batch processing.
This model contains the metadata and configuration for a batch parse job, but excludes file-specific information. It's used as input to the batch parent workflow and combined with DirectoryFile data to create full ParseJobRecordCreate instances for each file.
Attributes:
- job_name: Must be PARSE_RAW_FILE
- partitions: Partitions for job output location
- parameters: Generic parse configuration (BatchParseJobConfig)
- session_id: Upstream request ID for tracking
- correlation_id: Correlation ID for cross-service tracking
- parent_job_execution_id: Parent job execution ID if nested
- user_id: User who created the job
- project_id: Project this job belongs to
- webhook_url: Optional webhook URL for job completion notifications
correlation_id: optional string
The correlation ID for this job. Used for tracking the job across services.
parameters: optional object { adaptive_long_table, aggressive_table_extraction, annotate_links, 122 more }
Generic parse job configuration for batch processing.
This model contains the parsing configuration that applies to all files in a batch, but excludes file-specific fields like file_name, file_id, etc. Those file-specific fields are populated from DirectoryFile data when creating individual ParseJobRecordCreate instances for each file.
The fields in this model should be generic settings that apply uniformly to all files being processed in the batch.
custom_metadata: optional map[unknown]
The custom metadata to attach to the documents.
images_to_save: optional array of "screenshot" or "embedded" or "layout"
input_s3_region: optional string
The region for the input S3 bucket.
lang: optional string
The language.
output_s3_path_prefix: optional string
If specified, LlamaParse will save the output to the specified path. All output files will use this prefix, which should be a valid s3:// URL.
output_s3_region: optional string
The region for the output S3 bucket.
outputBucket: optional string
The output bucket.
pipeline_id: optional string
The pipeline ID.
priority: optional "low" or "medium" or "high" or "critical"
The priority for the request. This field may be ignored or overwritten depending on the organization tier.
resource_info: optional map[unknown]
The resource info about the file
webhook_configurations: optional array of object { webhook_events, webhook_headers, webhook_output_format, webhook_url }
Outbound webhook endpoints to notify on job status changes
webhook_events: optional array of "extract.pending" or "extract.success" or "extract.error" or 14 more
Events to subscribe to (e.g. 'parse.success', 'extract.error'). If null, all events are delivered.
webhook_headers: optional map[string]
Custom HTTP headers sent with each webhook request (e.g. auth tokens)
webhook_output_format: optional string
Response format sent to the webhook: 'string' (default) or 'json'
webhook_url: optional string
URL to receive webhook POST notifications
parent_job_execution_id: optional string
The ID of the parent job execution.
partitions: optional map[string]
The partitions for this execution. Used for determining where to save job output.
project_id: optional string
The ID of the project this job belongs to.
session_id: optional string
The upstream request ID that created this job. Used for tracking the job across services.
user_id: optional string
The ID of the user that created this job
webhook_url: optional string
The URL that needs to be called at the end of the parsing job.
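One entry of webhook_configurations might look like the sketch below. The field names come from the reference above; the URL, token, and event choices are placeholder values:

```python
# Sketch of a single webhook_configurations entry (all values illustrative).
webhook_config = {
    "webhook_url": "https://example.com/hooks/batch",        # POST target for notifications
    "webhook_events": ["parse.success", "extract.error"],    # null/omitted => all events
    "webhook_headers": {"Authorization": "Bearer <token>"},  # custom headers per request
    "webhook_output_format": "json",                         # 'string' (default) or 'json'
}
```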
ClassifyJob = object { id, project_id, rules, 9 more }
A classify job.
id: string
Unique identifier
project_id: string
The ID of the project
rules: array of object { description, type }
The rules to classify the files
description: string
Natural language description of what to classify. Be specific about the content characteristics that identify this document type.
type: string
The document type to assign when this rule matches (e.g., 'invoice', 'receipt', 'contract')
The status of the classify job
user_id: string
The ID of the user
created_at: optional string
Creation datetime
error_message: optional string
Error message for the latest job attempt, if any.
job_record_id: optional string
The job record ID associated with this status, if any.
mode: optional "FAST" or "MULTIMODAL"
The classification mode to use
The configuration for the parsing job
The language to parse the files in
max_pages: optional number
The maximum number of pages to parse
target_pages: optional array of number
The pages to target for parsing (0-indexed, so first page is at 0)
updated_at: optional string
Update datetime
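The rules array for a classify config pairs each target document type with a natural-language description, per the description/type fields above. A sketch with illustrative values:

```python
# Sketch: classification rules for a classify job config.
# Field names (type, description) come from the reference above;
# the labels and descriptions themselves are examples.
rules = [
    {"type": "invoice",
     "description": "Billing documents with line items, totals, and a due date"},
    {"type": "receipt",
     "description": "Proof-of-purchase documents listing items and amounts paid"},
]
```

Being specific in each description, as the field docs advise, gives the classifier clearer content characteristics to match against.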
continue_as_new_threshold: optional number
Maximum files to process per execution cycle in directory mode. Defaults to page_size.
directory_id: optional string
ID of the directory containing files to process
item_ids: optional array of string
List of specific item IDs to process. Either this or directory_id must be provided.
page_size: optional number
Number of files to process per batch when using directory mode
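Putting the top-level fields together, a directory-mode request body with paging might look like this sketch (IDs and settings are placeholders; continue_as_new_threshold is omitted since it defaults to page_size per the field above):

```python
# Sketch: directory-mode batch parse request body with paging.
body = {
    "job_config": {
        "job_name": "PARSE_RAW_FILE",       # batch parse jobs use this job name
        "parameters": {"priority": "low"},  # generic settings applied to every file
    },
    "directory_id": "dir-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
    "page_size": 100,                       # files processed per batch in directory mode
}
```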
Returns
id: string
Unique identifier for the batch job
job_type: "parse" or "extract" or "classify"
Type of processing operation (parse, extract, or classify)
project_id: string
Project this job belongs to
status: "pending" or "running" or "dispatched" or 3 more
Current job status
total_items: number
Total number of items in the job
completed_at: optional string
Timestamp when job completed
created_at: optional string
Creation datetime
directory_id: optional string
Directory being processed
error_message: optional string
Error message for the latest job attempt, if any.
failed_items: optional number
Number of items that failed processing
job_record_id: optional string
The job record ID associated with this status, if any.
processed_items: optional number
Number of items processed so far
skipped_items: optional number
Number of items skipped (already processed or size limit)
started_at: optional string
Timestamp when job processing started
updated_at: optional string
Update datetime
workflow_id: optional string
Async job tracking ID
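Since the job runs asynchronously, a client polling GET /batch/{job_id} can derive progress from the counters above. A small sketch that interprets one status payload (the helper and sample values are illustrative; the field names follow the Returns section):

```python
# Sketch: summarize a batch job status payload from GET /batch/{job_id}.
def summarize(job):
    done = job.get("processed_items") or 0
    total = job["total_items"]
    percent = 100.0 * done / total if total else 0.0
    return {
        "status": job["status"],
        "percent": percent,
        "failed": job.get("failed_items") or 0,
        "skipped": job.get("skipped_items") or 0,
    }

sample = {"id": "bjb-...", "status": "running", "total_items": 40,
          "processed_items": 10, "failed_items": 1, "skipped_items": 2}
print(summarize(sample))
# {'status': 'running', 'percent': 25.0, 'failed': 1, 'skipped': 2}
```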
Create Batch Job
curl https://api.cloud.llamaindex.ai/api/v1/beta/batch-processing \
-H 'Content-Type: application/json' \
-H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
-d '{
"job_config": {}
}'
Returns Examples
{
"id": "bjb-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
"job_type": "parse",
"project_id": "proj-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
"status": "pending",
"total_items": 0,
"completed_at": "2019-12-27T18:11:19.117Z",
"created_at": "2019-12-27T18:11:19.117Z",
"directory_id": "dir-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee",
"effective_at": "2019-12-27T18:11:19.117Z",
"error_message": "error_message",
"failed_items": 0,
"job_record_id": "job_record_id",
"processed_items": 0,
"skipped_items": 0,
"started_at": "2019-12-27T18:11:19.117Z",
"updated_at": "2019-12-27T18:11:19.117Z",
"workflow_id": "workflow_id"
}