Send Document to Workflow

After uploading a document, you can use this API endpoint to send the document to a Workflow. This guide explains how.

Note that you can send the same document to multiple Workflows, or to the same Workflow multiple times. Each time a document is sent to a Workflow, a unique request ID is generated.

API Endpoint

Note that "parser" and "Workflow" are used interchangeably here.

GET https://api.documentpro.ai/v1/documents/{document_id}/run_parser

Path Parameters

  • document_id (required): The unique identifier of the document you want to parse.

Query Parameters

  • template_id (required): The unique identifier of the Workflow you want to use.
  • use_ocr (conditional): Must be set to true if query_model is "gpt-3.5-turbo" or if using any OCR-related parameters.
  • query_model (optional): Specifies the AI model for parsing (e.g., "gpt-4o-mini", "gpt-4o").
  • detect_layout (optional): Set to true to detect document layout. Requires use_ocr=true.
  • detect_tables (optional): Set to true to detect tables in the document. Requires use_ocr=true.
  • page_ranges (optional): Specifies which pages to parse (e.g., "1-3,5,7-9").
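
The page_ranges format can be checked on the client before sending a request. The following is a minimal sketch, assuming pages are 1-indexed and ranges are inclusive (the guide does not state this explicitly); expand_page_ranges is a hypothetical helper, not part of the API.

```python
def expand_page_ranges(spec: str) -> list[int]:
    """Expand a string such as "1-3,5,7-9" into a sorted list of page numbers.

    Assumption: pages are 1-indexed and ranges include both endpoints.
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"descending range: {part!r}")
            pages.update(range(start, end + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


print(expand_page_ranges("1-3,5,7-9"))  # [1, 2, 3, 5, 7, 8, 9]
```

This only validates the string locally; the service's own parsing rules may differ.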

Document Segmentation

  • chunk_by_pages (optional): An integer specifying how many pages to use in each segment for method 1 segmentation.
  • rolling_window (optional): An integer specifying the window size for method 2 segmentation.
  • start_regex (optional): A regex pattern to define where parsing should begin for method 3 segmentation. Requires use_ocr=true.
  • end_regex (optional): A regex pattern to define where parsing should end for method 3 segmentation. Requires use_ocr=true.
  • split_regex (optional): A regex pattern to split the document into sections for method 4 segmentation. Requires use_ocr=true.
  • use_all_matches (optional): Set to true to use all regex matches instead of just the first for methods 3 and 4. Requires use_ocr=true.
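
Because regex patterns often contain characters such as "\" and "+", they need to be percent-encoded when placed in a query string. The sketch below builds a query for the start/end regex method using only the Python standard library. The regex patterns are illustrative placeholders, and build_segmentation_query is a hypothetical helper.

```python
from urllib.parse import urlencode, parse_qs


def build_segmentation_query(template_id, start_regex, end_regex, use_all_matches=False):
    """Return an encoded query string for start/end regex segmentation."""
    params = {
        "template_id": template_id,
        "use_ocr": "true",  # the regex parameters require use_ocr=true
        "start_regex": start_regex,
        "end_regex": end_regex,
    }
    if use_all_matches:
        params["use_all_matches"] = "true"
    # urlencode percent-encodes reserved characters in each value.
    return urlencode(params)


query = build_segmentation_query(
    "8e9beda9-5cba-42eb-a70a-b3e5eec9120a",
    r"Invoice\s+No\.",  # illustrative pattern only
    r"Total\s+Due",     # illustrative pattern only
    use_all_matches=True,
)
print(query)

# Decoding the query string recovers the original pattern unchanged.
print(parse_qs(query)["start_regex"][0])
```

When using the requests library, passing the same dictionary through the params argument performs this encoding automatically.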

Headers

  • x-api-key (required): Your API key for authentication.
  • Accept (optional): Specify the desired response format (e.g., "application/json", "application/xml").

Example Implementation

Using cURL

curl --location 'https://api.documentpro.ai/v1/documents/15dadb3e-b177-4069-be85-9711bb8e3ed1/run_parser?template_id=8e9beda9-5cba-42eb-a70a-b3e5eec9120a&use_ocr=true&query_model=gpt-4o&detect_layout=true&detect_tables=true&page_ranges=1&chunk_by_pages=5&start_regex=&end_regex=&split_regex=&use_all_matches=true' \
--header 'x-api-key: YOUR_API_KEY' \
--header 'Accept: application/json'

Using Python

import requests

document_id = "15dadb3e-b177-4069-be85-9711bb8e3ed1"
url = f"https://api.documentpro.ai/v1/documents/{document_id}/run_parser"

headers = {
    'x-api-key': 'YOUR_API_KEY',
    'Accept': 'application/json'
}

# Only template_id is required; every other query parameter is optional.
params = {
    'template_id': '8e9beda9-5cba-42eb-a70a-b3e5eec9120a',
    'use_ocr': 'true',
    'query_model': 'gpt-4o',
    'detect_layout': 'true',
    'detect_tables': 'true',
    'page_ranges': '1',
    'chunk_by_pages': '5',
    'start_regex': '',
    'end_regex': '',
    'split_regex': '',
    'use_all_matches': 'true'
}

response = requests.get(url, headers=headers, params=params)

# A 200 response means the job was queued, not that parsing has finished.
if response.status_code == 200:
    print('Parser run successfully initiated')
    print(response.json())
else:
    print('Failed to run parser')
    print(response.text)

Response

The response will contain information about the parsing job, including a request ID that you can use to check the status of the parsing process.

Successful Response (Status Code: 200)

{
  "request_id": "a7813466-6f9a-4c33-8128-427e7a4df755",
  "request_status": "pending",
  "response_body": {
    "file_name": "abcd.pdf",
    "file_presigned_url": null,
    "user_error_msg": null,
    "template_id": "8e9beda9-5cba-42eb-a70a-b3e5eec9120a",
    "template_type": "asbestos report",
    "template_title": "Acorn",
    "num_pages": 1,
    "human_verification_status": "pending",
    "has_missing_required_fields": false,
    "result_json_data": null
  },
  "created_at": "2024-07-25T16:29:40.102372",
  "updated_at": "2024-07-25T16:29:40.191219"
}

Error Response (Status Codes: 400, 401, 403, 404, 500)

{
  "success": false,
  "error": "error_code",
  "message": "descriptive error message"
}
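
The two response shapes above can be handled with one small function. This is a sketch only: describe_response is a hypothetical helper, and the fallback strings are assumptions for cases where a field is missing.

```python
def describe_response(status_code: int, body: dict) -> str:
    """Summarise a run_parser response as a single line of text."""
    if status_code == 200:
        # Success body: the job has been queued and has a request_id.
        return (
            f"queued: request_id={body['request_id']} "
            f"status={body['request_status']}"
        )
    # Error body: fall back to placeholders if a field is absent.
    error = body.get("error", "unknown_error")
    message = body.get("message", "no message provided")
    return f"HTTP {status_code}: {error}: {message}"


ok_body = {
    "request_id": "a7813466-6f9a-4c33-8128-427e7a4df755",
    "request_status": "pending",
}
err_body = {"success": False, "error": "error_code", "message": "descriptive error message"}

print(describe_response(200, ok_body))
print(describe_response(404, err_body))
```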

Response Fields Explained

  • request_id: Unique identifier for the parsing job. Use this to retrieve results later.
  • request_status: Can be "pending", "processing", "complete", "exception", or "failure".
    • "exception" indicates an AI-related error.
    • If status is "failure" or "exception", user_error_msg will contain a human-readable error message.
  • file_presigned_url: URL for downloading the parsed file (only the selected pages). Available when processing is complete.
  • human_verification_status: Can be "pending", "approved", or "rejected".
  • result_json_data: Will be populated with the extracted data when processing is completed.
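
Since request_status moves from "pending" towards a final value, a client typically polls until it reaches one. The sketch below shows one way to do that. The endpoint for retrieving a job is not covered in this guide, so the job lookup is passed in as a function (fetch_job, hypothetical); the interval and attempt limit are arbitrary assumptions.

```python
import time
from typing import Callable

# Statuses after which the job will not change any further.
# "failed" is also accepted defensively in case the service reports that spelling.
TERMINAL_STATUSES = {"complete", "exception", "failure", "failed"}


def wait_for_result(
    fetch_job: Callable[[], dict],
    interval: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_job until request_status is terminal, then return the job."""
    for attempt in range(max_attempts):
        job = fetch_job()
        if job["request_status"] in TERMINAL_STATUSES:
            return job
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(f"job still not finished after {max_attempts} attempts")


# Offline demonstration with a fake job lookup instead of a real HTTP call.
fake_statuses = iter(["pending", "processing", "complete"])
job = wait_for_result(lambda: {"request_status": next(fake_statuses)}, interval=0)
print(job["request_status"])  # complete
```

In real use, fetch_job would wrap whatever call returns the job for a given request_id.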

Important Notes

  1. Only the template_id query parameter is required; all others are optional.
  2. use_ocr must be set to true if:
    • The query_model is "gpt-3.5-turbo"
    • You're using any OCR-related parameters (detect_layout, detect_tables, start_regex, end_regex, split_regex, use_all_matches)
  3. The page_ranges parameter does not apply to image files.
  4. Regex parameters are powerful tools for customizing the parsing process. Use them carefully.
  5. The parsing process is asynchronous. This API call initiates the process but does not return the parsed results directly.
  6. You can apply multiple parsers to the same document or the same parser multiple times. Each application will generate a unique request_id.
  7. For long documents (more than 10 pages), consider using one of the segmentation methods to improve parsing performance.
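
Notes 1 and 2 can be checked on the client before a request is sent. The following is a minimal sketch: check_params is a hypothetical helper, and treating an empty string or "false" as "parameter not in use" is an assumption about how the service interprets those values.

```python
# Parameters that, per note 2, require use_ocr=true.
OCR_DEPENDENT_PARAMS = (
    "detect_layout",
    "detect_tables",
    "start_regex",
    "end_regex",
    "split_regex",
    "use_all_matches",
)


def check_params(params: dict) -> list[str]:
    """Return a list of problems found in the query parameters (empty if none)."""
    problems = []
    if not params.get("template_id"):
        problems.append("template_id is required")

    ocr_on = str(params.get("use_ocr") or "").lower() == "true"
    if not ocr_on:
        if params.get("query_model") == "gpt-3.5-turbo":
            problems.append("query_model=gpt-3.5-turbo requires use_ocr=true")
        for name in OCR_DEPENDENT_PARAMS:
            value = str(params.get(name) or "").lower()
            # Assumption: empty or "false" means the parameter is not in use.
            if value not in ("", "false"):
                problems.append(f"{name} requires use_ocr=true")
    return problems


print(check_params({"template_id": "8e9beda9-5cba-42eb-a70a-b3e5eec9120a",
                    "detect_tables": "true"}))
# ['detect_tables requires use_ocr=true']
```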

Next Steps

After initiating a parsing job:

  1. Retrieve Workflow results once the Workflow is complete.
  2. If needed, update the Workflow configuration to adjust default settings for future Workflow jobs.
  3. Consider applying additional Workflows to the same document if you need to extract different types of information.