Metadata
In JSON mode, LlamaParse will return a data structure representing the parsed object. This is useful for further processing or analysis.
To use this mode, set the result type to “json”.
curl -X 'POST' \
  'https://api.cloud.llamaindex.ai/api/v1/parsing/job/<job_id>/result/json' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY"

Result format

{
  "pages": [
    ..page objects..
  ],
  "job_metadata": {
    "job_pages": int,
    "job_is_cache_hit": boolean
  }
}

Page objects
Within page objects, the following keys may be present depending on your document.
page: The page number of the document.
text: The text extracted from the page.
md: The markdown version of the extracted text.
images: Any images extracted from the page.
items: An array of heading, text and table objects in the order they appear on the page.
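As a rough illustration of the structure described above, the following Python sketch walks a JSON-mode result and summarizes each page. It assumes the result has already been loaded into a dict, and that each entry in items carries a "type" field naming it as a heading, text or table; that field name is an assumption and is not stated on this page. The helper name summarize_pages and the sample data are hypothetical.

```python
from collections import Counter


def summarize_pages(result):
    """Return one summary dict per page of a JSON-mode result."""
    summaries = []
    for page in result.get("pages", []):
        # Keys may be absent depending on the document, so read them with .get().
        items = page.get("items", [])
        # ASSUMPTION: each item has a "type" key ("heading", "text" or "table").
        kinds = Counter(item.get("type", "unknown") for item in items)
        summaries.append({
            "page": page.get("page"),
            "text_chars": len(page.get("text", "")),
            "image_count": len(page.get("images", [])),
            "item_kinds": dict(kinds),
        })
    return summaries


# Hypothetical sample shaped like the result format described above.
sample_result = {
    "pages": [
        {
            "page": 1,
            "text": "Title\nBody",
            "md": "# Title\n\nBody",
            "images": [{"name": "img_p0_1.png", "height": 10, "width": 20}],
            "items": [{"type": "heading"}, {"type": "text"}],
        },
        {"page": 2, "text": "More"},
    ],
    "job_metadata": {"job_pages": 2, "job_is_cache_hit": False},
}

summaries = summarize_pages(sample_result)
for summary in summaries:
    print(summary)
```

Reading every key with .get() keeps the sketch tolerant of pages where images or items are missing, which the page object description allows for.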
Retrieving images
Images are returned as an array of image objects, of the form:

{
  "name": "img_p2_5.png",
  "height": 718,
  "width": 251
}

You can retrieve the image extracted directly using the value of the name, like this:

curl -X 'POST' \
  'https://api.cloud.llamaindex.ai/api/v1/parsing/job/<job_id>/result/image/<name>' \
  -H 'accept: application/json' \
  -H 'Content-Type: multipart/form-data' \
  -H "Authorization: Bearer $LLAMA_CLOUD_API_KEY" \
  --output "file.png"

Note the additional --output argument to curl to get the binary saved to a file.
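When a result contains many images, it can help to derive every retrieval URL up front. The sketch below is one possible way to do that in Python: it only builds URL strings from the endpoint pattern shown above and makes no network calls. The helper name image_urls, the job ID "my-job-id" and the sample data are hypothetical.

```python
from urllib.parse import quote

# Endpoint prefix taken from the curl example above.
API_BASE = "https://api.cloud.llamaindex.ai/api/v1/parsing/job"


def image_urls(result, job_id):
    """Map each extracted image name to the URL it can be fetched from."""
    urls = {}
    for page in result.get("pages", []):
        # Pages without images simply have no "images" key.
        for image in page.get("images", []):
            name = image["name"]
            # Percent-encode both path segments in case they contain unusual characters.
            urls[name] = (
                f"{API_BASE}/{quote(job_id, safe='')}"
                f"/result/image/{quote(name, safe='')}"
            )
    return urls


# Hypothetical result containing the image object shown above.
sample_result = {
    "pages": [
        {"page": 1, "text": "No images on this page"},
        {
            "page": 2,
            "images": [{"name": "img_p2_5.png", "height": 718, "width": 251}],
        },
    ]
}

urls = image_urls(sample_result, "my-job-id")
for name, url in urls.items():
    print(name, "->", url)
```

Each URL can then be requested with the same headers as the curl command above, writing the response body to a file named after the image.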
Slide speaker notes
For certain presentation formats, if a slide contains speaker notes, the speaker notes will be extracted and returned in the slideSpeakerNotes entry for the page:

[
  {
    "page": 1,
    "text": "Hello\nSlide with notes",
    "slideSpeakerNotes": "This is a speaker note",
    ...
  },
  ...

Currently supported formats for slide speaker notes extraction:
.pptx (PowerPoint 2007+)
For these formats, slide speaker notes extraction is supported in all parse modes except parse_page_with_lvm.
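To show how the slideSpeakerNotes entry might be consumed, here is a minimal Python sketch that collects notes by page number. It assumes the list of page objects has already been loaded, and it treats a missing or empty slideSpeakerNotes entry as "no notes". The helper name speaker_notes_by_page and the sample pages are hypothetical.

```python
def speaker_notes_by_page(pages):
    """Return {page number: speaker notes} for pages that carry notes."""
    notes = {}
    for page in pages:
        # The entry is only present when the slide actually has speaker notes.
        note = page.get("slideSpeakerNotes")
        if note:
            notes[page["page"]] = note
    return notes


# Hypothetical pages shaped like the example above.
sample_pages = [
    {
        "page": 1,
        "text": "Hello\nSlide with notes",
        "slideSpeakerNotes": "This is a speaker note",
    },
    {"page": 2, "text": "Slide without notes"},
]

notes = speaker_notes_by_page(sample_pages)
print(notes)
```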