Layout Extraction
LlamaParse supports layout extraction. This is useful if you want to reconstitute the original look of a document by putting each element back in its original place.
If you set extract_layout=True on the API and request JSON output, the result includes bounding boxes for the following element types:
- tables
- figures
- titles
- text
- lists
The layout data is returned in the JSON output as a layout property attached to each page.
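As a minimal sketch of reading that per-page property, the helper below groups every layout entry by its label and records which page it came from. It assumes the JSON result is a dict with a top-level "pages" list whose items may carry a "layout" list; that outer shape, the 1-based page numbering via enumerate, and the name layout_by_label are hypothetical and not taken from this page.

```python
from collections import defaultdict
from typing import Any


def layout_by_label(result: dict[str, Any]) -> dict[str, list[tuple[int, dict[str, Any]]]]:
    """Group layout entries by label, keeping a 1-based page number.

    Assumption: result["pages"] is a list of page objects, and each page
    has an optional "layout" list of entries with a "label" field.
    """
    grouped: dict[str, list[tuple[int, dict[str, Any]]]] = defaultdict(list)
    for page_number, page in enumerate(result.get("pages", []), start=1):
        # Pages without layout data are skipped rather than treated as errors.
        for entry in page.get("layout", []):
            grouped[entry["label"]].append((page_number, entry))
    return dict(grouped)
```

Grouping by label makes it easy to, for example, pull every table on every page before re-rendering them.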
Each layout entry contains:
- A bbox, expressed as fractions of the page width and height (numbers between 0 and 1)
- An image name corresponding to an image of the element. This image can be retrieved with the image API, just like other images.
- A confidence score (from 0 to 1, where 1 means high confidence)
- A label indicating the type of element
- isLikelyNoise, set to true if our NMS detects that the element is likely to be noise
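Because each entry carries both a confidence score and the isLikelyNoise flag, a common first step is to drop unreliable entries before using them. The sketch below does that; the function name keep_reliable and the 0.5 default threshold are illustrative assumptions, not documented values.

```python
from typing import Any, Iterable


def keep_reliable(
    entries: Iterable[dict[str, Any]], min_confidence: float = 0.5
) -> list[dict[str, Any]]:
    """Drop entries flagged as likely noise or scored below min_confidence.

    The 0.5 default is an arbitrary illustrative threshold (hypothetical).
    Missing fields are treated conservatively: no flag means "not noise",
    no confidence means 0.0.
    """
    return [
        e
        for e in entries
        if not e.get("isLikelyNoise", False)
        and e.get("confidence", 0.0) >= min_confidence
    ]
```

Tune the threshold against your own documents; the noise flag and the confidence score are independent signals, so this sketch checks both.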
Ignore document elements for layout detection
By default, layout extraction is aligned with the underlying bounding boxes of the elements we extract from the document. If this alignment causes issues, you can deactivate it by setting ignore_document_elements_for_layout_detection=true.
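The two flags combine as follows: extract_layout turns layout extraction on, and ignore_document_elements_for_layout_detection is only added when you want to opt out of the default alignment. The sketch below builds such a set of options. Sending them as lowercase "true" string form fields is an assumption about the transport, and the helper name layout_options is hypothetical; the parameter for requesting JSON output is not shown because it is not named on this page.

```python
def layout_options(align_to_document_elements: bool = True) -> dict[str, str]:
    """Build layout-related parse options (hypothetical form-field encoding).

    Alignment with extracted document elements is the default, so the
    ignore flag is only included when the caller opts out of it.
    """
    options = {"extract_layout": "true"}
    if not align_to_document_elements:
        options["ignore_document_elements_for_layout_detection"] = "true"
    return options
```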
Example
    {
      "bbox": { "x": 0.176, "y": 0.497, "w": 0.651, "h": 0.112 },
      "image": "page_1_text_1.jpg",
      "confidence": 0.996,
      "label": "text",
      "isLikelyNoise": false
    }

Layout extraction costs 1 extra credit per page.
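To put an element back in its original place on a rendered page, the fractional bbox has to be scaled by the page size in pixels. The sketch below does this for the example entry. It assumes x and y give the top-left corner of the box measured from the top-left of the page; that origin convention, the PixelBox type, and the to_pixels name are assumptions for illustration.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class PixelBox:
    left: int
    top: int
    right: int
    bottom: int


def to_pixels(bbox: dict[str, float], page_width: int, page_height: int) -> PixelBox:
    """Scale a fractional bbox (x, y, w, h in 0..1) to pixel coordinates.

    Assumption: (x, y) is the top-left corner, with the origin at the
    top-left of the page.
    """
    left = round(bbox["x"] * page_width)
    top = round(bbox["y"] * page_height)
    right = round((bbox["x"] + bbox["w"]) * page_width)
    bottom = round((bbox["y"] + bbox["h"]) * page_height)
    return PixelBox(left, top, right, bottom)


# The example entry from above, parsed as JSON.
example = json.loads(
    '{"bbox": {"x": 0.176, "y": 0.497, "w": 0.651, "h": 0.112},'
    ' "image": "page_1_text_1.jpg", "confidence": 0.996,'
    ' "label": "text", "isLikelyNoise": false}'
)
box = to_pixels(example["bbox"], page_width=1000, page_height=1000)
```

The page dimensions must match the resolution at which you render or rasterize the page; the 1000 x 1000 size here is only an illustration.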