Get Document Data from Workflow

Once a document has been processed by the Workflow, you can retrieve the results using a GET request with the request_id of the document. This guide explains how to use the API to fetch parsing results.

API Endpoint

GET https://api.documentpro.ai/files

Query Parameters

  • request_id (required): The unique identifier of the workflow job you want to retrieve results for.

Headers

  • x-api-key (required): Your API key for authentication.

Example Implementation

Using Python

import requests

api_key = 'YOUR_API_KEY'
request_id = 'YOUR_REQUEST_ID'

url = "https://api.documentpro.ai/files"

# The API key is sent in the x-api-key header
headers = {
    'x-api-key': api_key
}

# request_id is sent as a query parameter
params = {
    'request_id': request_id
}

response = requests.get(url, headers=headers, params=params)

if response.status_code == 200:
    print('Results retrieved successfully')
    print(response.json())
else:
    print('Failed to retrieve results')
    print(response.text)

Response

The response contains the job's request_id and request_status at the top level, with the file details and extracted data nested under response_body.

Successful Response (Status Code: 200)

{
  "request_id": "a7813466-6f9a-4c33-8128-427e7a4df755",
  "request_status": "completed",
  "response_body": {
    "file_name": "Q2_Financial_Report_2024.pdf",
    "file_presigned_url": "https://documentpro-parsed-files.s3.amazonaws.com/Q2_Financial_Report_2024_parsed.pdf?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=...",
    "user_error_msg": null,
    "template_id": "8e9beda9-5cba-42eb-a70a-b3e5eec9120a",
    "template_type": "financial_report",
    "template_title": "Quarterly Financial Report Parser",
    "num_pages": 15,
    "human_verification_status": "approved",
    "has_missing_required_fields": false,
    "result_json_data": {
      "company_name": "TechCorp Innovations Inc.",
      "report_period": "Q2 2024",
      "financial_highlights": {
        "total_revenue": 1250000,
        "net_income": 450000,
        "earnings_per_share": 2.25,
        "operating_cash_flow": 550000
      },
      "balance_sheet_summary": {
        "total_assets": 10000000,
        "total_liabilities": 4000000,
        "total_equity": 6000000
      },
      "key_ratios": {
        "gross_margin": 0.45,
        "operating_margin": 0.22,
        "return_on_equity": 0.075,
        "debt_to_equity": 0.67
      },
      "segment_performance": [
        {
          "segment_name": "Software Solutions",
          "revenue": 750000,
          "operating_income": 225000
        },
        {
          "segment_name": "Hardware Products",
          "revenue": 500000,
          "operating_income": 150000
        }
      ],
      "risk_factors": [
        "Intense market competition",
        "Rapid technological changes",
        "Global economic uncertainties"
      ]
    }
  },
  "created_at": "2024-07-25T14:30:10.696893",
  "updated_at": "2024-07-25T14:30:29.565249"
}

Error Response (Status Codes: 400, 401, 403, 404, 500)

{
  "success": false,
  "error": "error_code",
  "message": "descriptive error message"
}
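
The error body above can be turned into a single log-friendly line. The following is a minimal sketch, not part of the API: describe_error is a hypothetical helper name, and the fallback for non-JSON bodies is an assumption (for example, a proxy or gateway may return plain text instead of the documented shape).

```python
import json


def describe_error(status_code, body_text):
    """Return a one-line description of a failed API call.

    Hypothetical helper: expects the documented error shape
    {"success": false, "error": ..., "message": ...} and falls back
    to the raw text when the body is not a JSON object.
    """
    try:
        body = json.loads(body_text)
    except ValueError:
        body = None

    if not isinstance(body, dict):
        # Not the documented JSON shape; report the raw body instead
        raw = body_text.strip() or "no response body"
        return f"HTTP {status_code}: {raw}"

    code = body.get("error", "unknown_error")
    message = body.get("message", "no message provided")
    return f"HTTP {status_code} [{code}]: {message}"
```

With the requests example shown earlier, this could be called in the else branch as describe_error(response.status_code, response.text).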

Response Fields Explained

  • request_id: Unique identifier for the parsing job.
  • request_status: Current status of the parsing job. Possible values are:
    • "pending": The document has not started the parsing process
    • "processing": The document is being parsed
    • "completed": The document has been parsed successfully
    • "failed": The Workflow failed due to an application or document error
    • "exception": The Workflow failed with an error; these requests can be retried
  • file_name: Name of the original document file. This field and the ones listed below it are nested under response_body.
  • file_presigned_url: Temporary URL to download the parsed document (if available).
  • user_error_msg: Contains a human-readable error message if request_status is "failed" or "exception".
  • template_id: Unique identifier of the Workflow used.
  • template_type and template_title: Type and title of the Workflow used.
  • num_pages: Number of pages in the document.
  • human_verification_status: Can be "pending", "approved", or "rejected".
  • has_missing_required_fields: Indicates whether any required fields were not extracted.
  • result_json_data: Contains the extracted data when parsing is completed.

Important Notes

  1. The file_presigned_url is temporary and will expire after a certain period.
  2. If request_status is "pending" or "processing", result_json_data will be null.
  3. The structure of result_json_data depends on the Workflow used and the document type.
  4. Always check the request_status before attempting to use the parsed data.
  5. If request_status is "failed" or "exception", check the user_error_msg for more information.
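
The notes above can be sketched as a small guard that checks request_status before touching the parsed data. The names extract_result and ParsingFailed are hypothetical, and the code assumes the response shape shown in the successful response example, with response_body possibly absent or null while the job is still running.

```python
class ParsingFailed(Exception):
    """Raised when request_status is "failed" or "exception" (hypothetical name)."""


def extract_result(job):
    """Return result_json_data for a completed job, or None if it is not ready.

    `job` is the decoded JSON body of a 200 response.
    """
    status = job.get("request_status")
    # Assumption: response_body may be missing or null before completion
    body = job.get("response_body") or {}

    if status == "completed":
        return body.get("result_json_data")
    if status in ("pending", "processing"):
        # No parsed data yet; the caller should retry later
        return None
    if status in ("failed", "exception"):
        reason = body.get("user_error_msg") or f"job ended with status {status!r}"
        raise ParsingFailed(reason)
    raise ValueError(f"unexpected request_status: {status!r}")
```

Returning None for unfinished jobs and raising for failed ones keeps the two cases distinct, so a caller cannot mistake a failure for "not ready yet".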

Next Steps

After retrieving the parsing results:

  1. If the status is "completed", you can use the extracted data in result_json_data for your application.
  2. If the status is "pending" or "processing", wait and retry the request after a short delay.
  3. If the status is "failed" or "exception", check the user_error_msg. Requests with status "exception" are retryable, so consider resubmitting the document for parsing.
  4. If file_presigned_url is present, download the parsed document before the URL expires.
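
The wait-and-retry step can be sketched as a simple polling loop. This is an illustration under assumptions, not a documented client: wait_for_result is a hypothetical name, fetch_job is a caller-supplied function that performs the GET request and returns the decoded JSON, and the 5-second interval and 24-attempt limit are arbitrary defaults rather than values recommended by the API.

```python
import time


def wait_for_result(fetch_job, request_id, interval=5.0, max_attempts=24,
                    sleep=time.sleep):
    """Poll until the job leaves the "pending"/"processing" states.

    fetch_job(request_id) must return the decoded JSON response.
    Returns the final job dict, whatever its terminal status is.
    Raises TimeoutError if the job is still running after max_attempts.
    """
    for attempt in range(max_attempts):
        job = fetch_job(request_id)
        status = job.get("request_status")

        if status not in ("pending", "processing"):
            # "completed", "failed", or "exception": stop polling
            return job

        # Wait before the next attempt, but not after the last one
        if attempt < max_attempts - 1:
            sleep(interval)

    raise TimeoutError(
        f"request {request_id} still running after {max_attempts} attempts"
    )
```

Here fetch_job can wrap the requests.get call from the Python example and return response.json(). Passing the fetch and sleep functions in as arguments keeps the loop independent of any HTTP library and easy to exercise without network access.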