Send Document to Workflow
After uploading a document, you can use this API endpoint to send the document to a Workflow. This guide explains how.
Note that you can send the same document to multiple Workflows, or to the same Workflow multiple times. Each time a document is sent to a Workflow, a unique request ID is generated.
API Endpoint
Note that "parser" and "workflow" are used interchangeably here.
GET https://api.documentpro.ai/v1/documents/{document_id}/run_parser
Path Parameters
- document_id (required): The unique identifier of the document you want to parse.
Query Parameters
- template_id (required): The unique identifier of the Workflow you want to use.
- use_ocr (conditional): Must be set to true if query_model is "gpt-3.5-turbo" or if you use any OCR-related parameter.
- query_model (optional): Specifies the AI model for parsing (e.g., "gpt-4o-mini", "gpt-4o").
- detect_layout (optional): Set to true to detect document layout. Requires use_ocr=true.
- detect_tables (optional): Set to true to detect tables in the document. Requires use_ocr=true.
- page_ranges (optional): Specifies which pages to parse (e.g., "1-3,5,7-9").
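The page_ranges value is a comma-separated list of single pages and inclusive ranges. As a minimal sketch, the helper below builds such a string from a list of page numbers. The function name format_page_ranges is hypothetical, and the sketch assumes page numbers start at 1, which the example value suggests but this guide does not state.

```python
def format_page_ranges(pages):
    """Collapse page numbers into a string such as "1-3,5,7-9".

    Assumption: pages are 1-based, as the documented example suggests.
    """
    ordered = sorted(set(pages))
    if any(page < 1 for page in ordered):
        raise ValueError("page numbers must be 1 or greater")

    runs = []  # each entry is [first_page, last_page] of a consecutive run
    for page in ordered:
        if runs and page == runs[-1][1] + 1:
            runs[-1][1] = page          # extend the current run
        else:
            runs.append([page, page])   # start a new run

    return ",".join(
        str(first) if first == last else f"{first}-{last}"
        for first, last in runs
    )


print(format_page_ranges([1, 2, 3, 5, 7, 8, 9]))  # prints 1-3,5,7-9
```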
Document Segmentation
- chunk_by_pages (optional): An integer specifying how many pages to use in each segment for method 1 segmentation.
- rolling_window (optional): An integer specifying the window size for method 2 segmentation.
- start_regex (optional): A regex pattern to define where parsing should begin for method 3 segmentation. Requires use_ocr=true.
- end_regex (optional): A regex pattern to define where parsing should end for method 3 segmentation. Requires use_ocr=true.
- split_regex (optional): A regex pattern to split the document into sections for method 4 segmentation. Requires use_ocr=true.
- use_all_matches (optional): Set to true to use all regex matches instead of just the first for methods 3 and 4. Requires use_ocr=true.
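Several of these parameters only work when use_ocr is true, so it can help to assemble the query parameters in one place. The sketch below is one possible approach, not part of the API: build_run_parser_params is a hypothetical helper that turns Python values into query strings, drops empty values (an assumption about what you want to send), and sets use_ocr automatically when the rule described in this guide requires it.

```python
# Parameters this guide lists as requiring use_ocr=true.
OCR_DEPENDENT_PARAMS = {
    "detect_layout", "detect_tables",
    "start_regex", "end_regex", "split_regex", "use_all_matches",
}


def build_run_parser_params(template_id, **options):
    """Build the query parameters for run_parser (hypothetical helper)."""
    params = {"template_id": template_id}
    for name, value in options.items():
        if value is None or value == "":
            continue  # assumption: unset options are simply not sent
        if isinstance(value, bool):
            value = "true" if value else "false"
        params[name] = str(value)

    # use_ocr is needed for gpt-3.5-turbo or any enabled OCR-dependent option.
    needs_ocr = params.get("query_model") == "gpt-3.5-turbo" or any(
        name in params and params[name] != "false"
        for name in OCR_DEPENDENT_PARAMS
    )
    if needs_ocr:
        if params.get("use_ocr") == "false":
            raise ValueError("the selected options require use_ocr=true")
        params["use_ocr"] = "true"
    return params


print(build_run_parser_params("YOUR_TEMPLATE_ID", split_regex=r"^Invoice", use_all_matches=True))
```

The returned dictionary can be passed as the params argument of requests.get.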
Headers
- x-api-key (required): Your API key for authentication.
- Accept (optional): Specify the desired response format (e.g., "application/json", "application/xml").
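To show how the path parameter, query parameters, and headers fit together, here is a minimal sketch that builds the request with only the Python standard library. The helper name build_run_parser_request is hypothetical; the URL, header names, and GET method are taken from this guide. The request is only constructed here, not sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "https://api.documentpro.ai/v1"


def build_run_parser_request(document_id, params, api_key, accept="application/json"):
    """Assemble the GET request for run_parser without sending it."""
    url = (
        f"{BASE_URL}/documents/{quote(document_id, safe='')}/run_parser"
        f"?{urlencode(params)}"
    )
    headers = {"x-api-key": api_key, "Accept": accept}
    return Request(url, headers=headers, method="GET")


request = build_run_parser_request(
    "15dadb3e-b177-4069-be85-9711bb8e3ed1",
    {"template_id": "8e9beda9-5cba-42eb-a70a-b3e5eec9120a", "page_ranges": "1"},
    "YOUR_API_KEY",
)
print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen sends the request.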
Example Implementation
Using cURL
curl --location 'https://api.documentpro.ai/v1/documents/15dadb3e-b177-4069-be85-9711bb8e3ed1/run_parser?template_id=8e9beda9-5cba-42eb-a70a-b3e5eec9120a&use_ocr=true&query_model=gpt-4o&detect_layout=true&detect_tables=true&page_ranges=1&chunk_by_pages=5&start_regex=&end_regex=&split_regex=&use_all_matches=true' \
--header 'x-api-key: YOUR_API_KEY' \
--header 'Accept: application/json'
Using Python
import requests

document_id = "15dadb3e-b177-4069-be85-9711bb8e3ed1"
url = f"https://api.documentpro.ai/v1/documents/{document_id}/run_parser"

headers = {
    'x-api-key': 'YOUR_API_KEY',
    'Accept': 'application/json'
}

# Only template_id is required; every other parameter is optional.
params = {
    'template_id': '8e9beda9-5cba-42eb-a70a-b3e5eec9120a',
    'use_ocr': 'true',
    'query_model': 'gpt-4o',
    'detect_layout': 'true',
    'detect_tables': 'true',
    'page_ranges': '1',
    'chunk_by_pages': '5',
    'start_regex': '',
    'end_regex': '',
    'split_regex': '',
    'use_all_matches': 'true'
}

# This call only starts the parsing job; results are retrieved separately.
response = requests.get(url, headers=headers, params=params)

if response.status_code == 200:
    print('Parser run successfully initiated')
    print(response.json())
else:
    print('Failed to run parser')
    print(response.text)
Response
The response will contain information about the parsing job, including a request ID that you can use to check the status of the parsing process.
Successful Response (Status Code: 200)
{
  "request_id": "a7813466-6f9a-4c33-8128-427e7a4df755",
  "request_status": "pending",
  "response_body": {
    "file_name": "abcd.pdf",
    "file_presigned_url": null,
    "user_error_msg": null,
    "template_id": "8e9beda9-5cba-42eb-a70a-b3e5eec9120a",
    "template_type": "asbestos report",
    "template_title": "Acorn",
    "num_pages": 1,
    "human_verification_status": "pending",
    "has_missing_required_fields": false,
    "result_json_data": null
  },
  "created_at": "2024-07-25T16:29:40.102372",
  "updated_at": "2024-07-25T16:29:40.191219"
}
Error Response (Status Codes: 400, 401, 403, 404, 500)
{
  "success": false,
  "error": "error_code",
  "message": "descriptive error message"
}
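Because error responses share one shape, they can be turned into a single exception type. The following is a minimal sketch under that assumption: WorkflowRequestError and raise_for_api_error are hypothetical names, and the fallback strings used when a field is missing are placeholders, not values the API is known to return.

```python
class WorkflowRequestError(Exception):
    """Raised when run_parser answers with a non-200 status (hypothetical type)."""

    def __init__(self, status_code, error, message):
        super().__init__(f"{status_code} {error}: {message}")
        self.status_code = status_code
        self.error = error
        self.message = message


def raise_for_api_error(status_code, body):
    """Return the body on success, otherwise raise WorkflowRequestError."""
    if status_code == 200:
        return body
    if not isinstance(body, dict):
        body = {}  # the error body was not the documented JSON object
    raise WorkflowRequestError(
        status_code,
        body.get("error", "unknown_error"),
        body.get("message", "no message provided"),
    )


try:
    raise_for_api_error(404, {"success": False, "error": "error_code",
                              "message": "descriptive error message"})
except WorkflowRequestError as exc:
    print(exc)  # prints 404 error_code: descriptive error message
```

With the requests library, this could be called as raise_for_api_error(response.status_code, response.json()).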
Response Fields Explained
- request_id: Unique identifier for the parsing job. Use this to retrieve results later.
- request_status: Can be "pending", "processing", "complete", "exception", or "failure".
  - "exception" indicates an AI-related error.
  - If the status is "failure" or "exception", user_error_msg will contain a human-readable error message.
- file_presigned_url: URL for downloading the parsed file (only the selected pages). Available when processing is complete.
- human_verification_status: Can be "pending", "approved", or "rejected".
- result_json_data: Populated with the extracted data once processing is complete.
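Since the job starts as "pending" and only later reaches a final status, a client typically polls until request_status changes. The sketch below shows one way to do that. This guide does not cover the endpoint for retrieving a request, so fetch_request is a hypothetical callable you supply that returns the job as a dictionary in the format shown above. The interval and timeout defaults are arbitrary, and "failed" is accepted alongside "failure" as a defensive assumption.

```python
import time

# Statuses after which the job no longer changes.
TERMINAL_STATUSES = {"complete", "exception", "failure", "failed"}


def wait_for_result(fetch_request, request_id, interval=2.0, timeout=300.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until the job finishes and return result_json_data.

    fetch_request(request_id) is a caller-supplied function (hypothetical)
    that returns the job dictionary for the given request ID.
    """
    deadline = clock() + timeout
    while True:
        job = fetch_request(request_id)
        status = job.get("request_status")
        if status == "complete":
            return job["response_body"]["result_json_data"]
        if status in TERMINAL_STATUSES:
            message = (job.get("response_body") or {}).get("user_error_msg")
            raise RuntimeError(
                f"request {request_id} ended with status {status!r}: {message}"
            )
        if clock() >= deadline:
            raise TimeoutError(f"request {request_id} is still {status!r}")
        sleep(interval)
```

The sleep and clock arguments exist only so the loop can be exercised in tests without waiting.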
Important Notes
- Only the template_id query parameter is required; all others are optional.
- use_ocr must be set to true if:
  - The query_model is "gpt-3.5-turbo"
  - You're using any OCR-related parameters (detect_layout, detect_tables, start_regex, end_regex, split_regex, use_all_matches)
- The page_ranges parameter does not apply to image files.
- Regex parameters are powerful tools for customizing the parsing process. Use them carefully.
- The parsing process is asynchronous. This API call initiates the process but does not return the parsed results directly.
- You can apply multiple parsers to the same document or the same parser multiple times. Each application will generate a unique request_id.
- For long documents (more than 10 pages), consider using one of the segmentation methods to improve parsing performance.
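The advice on long documents can be applied mechanically when the page count is known in advance. As a small sketch, the hypothetical helper below adds chunk_by_pages only when a document exceeds the 10-page threshold mentioned above. The default of 5 pages per segment is borrowed from the example request and is not a documented recommendation.

```python
LONG_DOCUMENT_PAGES = 10  # threshold taken from the note on long documents


def segmentation_params(num_pages, pages_per_chunk=5):
    """Suggest method 1 segmentation for long documents (hypothetical helper)."""
    if num_pages > LONG_DOCUMENT_PAGES:
        return {"chunk_by_pages": str(pages_per_chunk)}
    return {}  # short documents are sent without segmentation


params = {"template_id": "YOUR_TEMPLATE_ID"}
params.update(segmentation_params(num_pages=24))
print(params)
```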
Next Steps
After initiating a parsing job:
- Retrieve Workflow results once the Workflow is complete.
- If needed, update the Workflow configuration to adjust default settings for future Workflow jobs.
- Consider applying additional Workflows to the same document if you need to extract different types of information.