Create Workflow
Read the introduction to parsers for background on how Parsers work.
You can use the REST API to create a Workflow in DocumentPro. Once you've created a Workflow, you can upload documents to it for parsing, post-processing and export.
Guide to creating a Workflow
Note that the terms Workflow and Template are used interchangeably in DocumentPro.
API Endpoint
POST https://api.documentpro.ai/v1/templates
Required Fields
When creating a Workflow, the following fields are required:
template_title (string): A unique title for your Workflow within your account.
template_type (string): Describes the type of document the Workflow handles (e.g., "Invoice", "Purchase Order", "W-8 BEN").
template_schema (object): Defines the structure of the data to be extracted. See below for details.
All other fields are optional.
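Before sending a request, you can check that a payload contains only what the API strictly needs. The sketch below builds a payload with just the three required fields and adds a small client-side check for missing ones. The helper `missing_required_fields` is hypothetical and not part of the API. The server still performs its own validation.

```python
from typing import Any

# The three top-level fields the API requires when creating a Workflow.
REQUIRED_FIELDS = ("template_title", "template_type", "template_schema")


def missing_required_fields(payload: dict[str, Any]) -> list[str]:
    """Return required fields that are absent or empty (hypothetical client-side helper)."""
    return [key for key in REQUIRED_FIELDS if not payload.get(key)]


# A minimal payload: only the required fields, with a one-field schema.
minimal_payload = {
    "template_title": "Minimal Invoice Workflow",
    "template_type": "invoice",
    "template_schema": {
        "fields": [
            {"name": "total_amount", "type": "number"},
        ]
    },
}

print(missing_required_fields(minimal_payload))  # []
print(missing_required_fields({"template_title": "No type or schema"}))  # ['template_type', 'template_schema']
```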
Example Implementation using Python
```python
import json

import requests

url = "https://api.documentpro.ai/v1/templates"

payload = {
    "template_title": "Invoice Processing Workflow",
    "template_type": "invoice",
    "template_schema": {
        "fields": [
            {
                "name": "buyer_name",
                "type": "text",
                "description": "name of the buyer"
            },
            {
                "name": "total_amount",
                "type": "number"
            },
            {
                "name": "line_items",
                "type": "table",
                "description": "invoice line items",
                "subFields": [
                    {
                        "name": "description",
                        "type": "text",
                        "description": "description of item"
                    },
                    {
                        "name": "subtotal",
                        "type": "number"
                    }
                ]
            }
        ]
    },
    "webhook_url": "https://your-application.com/documentpro-webhook",
    "parser_config": {
        "date_format": "%Y-%m-%d",
        "ocr_config": {
            # Python booleans; requests serializes them to JSON true/false
            "use_ocr": True,
            "detect_layout": True,
            "detect_tables": True
        },
        "query_config": {
            "query_model": "gpt-4o-mini",
            "page_ranges": "1-3"
        }
    }
}

headers = {
    'x-api-key': 'YOUR_API_KEY',
    'Content-Type': 'application/json'
}

# json=payload serializes the dict into the JSON request body
response = requests.post(url, headers=headers, json=payload)

# If the request was successful, status_code will be 200
if response.status_code == 200:
    print('Workflow created successfully')
    print(json.dumps(response.json(), indent=2))
else:
    print('Failed to create Workflow')
    print(response.text)
```
Workflow response
A 200 status code returns a response body with the following structure:
```json
{
  "template_id": "710a20fc-e280-43eb-9a9f-5436e600c710",
  "template_title": "Invoice Processing Workflow",
  "template_type": "invoice",
  "template_schema": {
    "fields": [
      {
        "name": "buyer_name",
        "type": "text",
        "description": "name of the buyer"
      },
      {
        "name": "total_amount",
        "type": "number"
      },
      {
        "name": "line_items",
        "type": "table",
        "description": "invoice line items",
        "subFields": [
          {
            "name": "description",
            "type": "text",
            "description": "description of item"
          },
          {
            "name": "subtotal",
            "type": "number"
          }
        ]
      }
    ]
  },
  "email_id": "invoice_processing_workflow_fbb499@inbox.documentpro-ai.com",
  "webhook_url": "https://your-application.com/documentpro-webhook",
  "parser_config": {
    "parse_email_attachments": true,
    "parse_email_body": false,
    "date_format": "%Y-%m-%d",
    "ocr_config": {
      "use_ocr": true,
      "detect_layout": true,
      "detect_tables": true,
      "remove_headers": false,
      "remove_footers": false,
      "remove_tables": false
    },
    "query_config": {
      "query_model": "gpt-4o-mini",
      "page_ranges": "1-3"
    }
  },
  "created_at": "2023-11-15T12:13:12.056281"
}
```
The template_id is the unique ID of the Workflow. Use it when uploading documents to the Workflow. The terms template_id and workflow_id are used interchangeably in DocumentPro.
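After a successful call you will usually want to store the template_id for later uploads. The sketch below pulls the identifiers out of a 200 response body shaped like the example above. `WorkflowInfo` and `parse_workflow_response` are hypothetical names. In real code you would pass `response.json()` instead of the trimmed example body used here.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class WorkflowInfo:
    """Identifiers worth keeping after a Workflow is created (hypothetical container)."""

    template_id: str  # also referred to as workflow_id
    template_title: str
    email_id: Optional[str]  # email address shown in the example response


def parse_workflow_response(body: dict[str, Any]) -> WorkflowInfo:
    """Extract identifiers from a 200 response body shaped like the documented example."""
    return WorkflowInfo(
        template_id=body["template_id"],
        template_title=body["template_title"],
        email_id=body.get("email_id"),
    )


# A trimmed copy of the example response body; normally this comes from response.json().
example_body = json.loads("""
{
  "template_id": "710a20fc-e280-43eb-9a9f-5436e600c710",
  "template_title": "Invoice Processing Workflow",
  "email_id": "invoice_processing_workflow_fbb499@inbox.documentpro-ai.com"
}
""")
info = parse_workflow_response(example_body)
print(info.template_id)  # 710a20fc-e280-43eb-9a9f-5436e600c710
```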
For status codes 400, 404 and 500, the response body has the following structure:
```json
{
  "success": false,
  "error": "error code",
  "message": "descriptive error message"
}
```
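Because every documented error status shares one body shape, a single check can turn errors into exceptions. The sketch below does this. `DocumentProAPIError` and `check_response` are hypothetical names. It also assumes the error body is JSON. A proxy or gateway in front of the API could return a non-JSON 500 page, in which case `response.json()` itself would raise.

```python
from typing import Any


class DocumentProAPIError(Exception):
    """Carries the documented error fields (hypothetical exception class)."""

    def __init__(self, status_code: int, error: str, message: str) -> None:
        super().__init__(f"{status_code} {error}: {message}")
        self.status_code = status_code
        self.error = error
        self.message = message


def check_response(status_code: int, body: dict[str, Any]) -> dict[str, Any]:
    """Return the body for a 200 response; otherwise raise DocumentProAPIError."""
    if status_code == 200:
        return body
    # Error bodies (400, 404, 500) contain "success": false plus "error" and "message".
    raise DocumentProAPIError(
        status_code,
        str(body.get("error", "unknown_error")),
        str(body.get("message", "")),
    )


# With requests this would be check_response(response.status_code, response.json()).
try:
    check_response(400, {"success": False, "error": "error code", "message": "descriptive error message"})
except DocumentProAPIError as exc:
    print(exc)  # 400 error code: descriptive error message
```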
Request attributes
template_title (required)
The title of the Workflow must be unique within your account.
template_type (required)
The type of the Workflow describes the kind of document it processes, e.g. an invoice, a purchase order or a W-8 BEN.
template_schema (required)
The template_schema is the query language DocumentPro uses to extract information from your documents. Use it to describe the fields, tables and lists you want DocumentPro to capture.
A field can have the following attributes:
Attribute | Type | Required | Constraints | Description |
---|---|---|---|---|
name | string | Yes | Max 50 characters; lowercase letters, numbers or underscores only | Name of the field. It should describe the information you want the AI to retrieve, e.g. invoice_number |
description | string | No | Max 150 characters | A description that helps the AI interpret the field name. You can also include data formats, e.g. '9 alphanumeric characters unique identifier' |
type | enum (string) | Yes | text, number, date, table | The field's data type; one of the values listed under Constraints. |
required | boolean | No | true, false | If true, DocumentPro flags the field as missing when it is not extracted. |
subFields | array | Conditional | Required if type is table | A list of field definitions for the table's columns, with the attributes described below. |
At least one subField is required if the parent field is a table. A subField can have the following attributes:
Attribute | Type | Required | Constraints | Description |
---|---|---|---|---|
name | string | Yes | Max 50 characters; lowercase letters, numbers or underscores only | Name of the field. It should describe the information you want the AI to retrieve, e.g. invoice_number |
description | string | No | Max 150 characters | A description that helps the AI interpret the field name. You can also include data formats, e.g. '9 alphanumeric characters unique identifier' |
type | enum (string) | Yes | text, number, date | The field's data type; one of the values listed under Constraints. |
required | boolean | No | true, false | If true, DocumentPro flags the field as missing when it is not extracted. |
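The two tables above can be turned into a client-side check that catches schema mistakes before a request is sent. The sketch below assumes that the 50- and 150-character limits are maximum lengths and that names must be non-empty. `validate_schema` and `validate_field` are hypothetical helpers. The API's own validation remains authoritative.

```python
import re
from typing import Any

# Constraints taken from the field and subField tables (max lengths assumed).
NAME_PATTERN = re.compile(r"[a-z0-9_]{1,50}")
FIELD_TYPES = {"text", "number", "date", "table"}
SUBFIELD_TYPES = {"text", "number", "date"}  # subFields cannot be tables


def validate_field(field: dict[str, Any], allowed_types: set[str], path: str) -> list[str]:
    """Check one field (or subField) against the documented constraints."""
    errors = []
    name = field.get("name", "")
    if not isinstance(name, str) or not NAME_PATTERN.fullmatch(name):
        errors.append(f"{path}: name must be 1-50 lowercase letters, digits or underscores")
    description = field.get("description")
    if description is not None and len(description) > 150:
        errors.append(f"{path}: description exceeds 150 characters")
    if field.get("type") not in allowed_types:
        errors.append(f"{path}: type must be one of {sorted(allowed_types)}")
    if "required" in field and not isinstance(field["required"], bool):
        errors.append(f"{path}: required must be true or false")
    # Only top-level table fields carry subFields; recurse with the narrower type set.
    if field.get("type") == "table" and "table" in allowed_types:
        sub_fields = field.get("subFields") or []
        if not sub_fields:
            errors.append(f"{path}: a table needs at least one subField")
        for i, sub in enumerate(sub_fields):
            errors.extend(validate_field(sub, SUBFIELD_TYPES, f"{path}.subFields[{i}]"))
    return errors


def validate_schema(schema: dict[str, Any]) -> list[str]:
    """Return the problems found in a template_schema; an empty list means none were found."""
    errors = []
    for i, field in enumerate(schema.get("fields", [])):
        errors.extend(validate_field(field, FIELD_TYPES, f"fields[{i}]"))
    return errors


bad_schema = {"fields": [{"name": "Invoice Number", "type": "text"}, {"name": "line_items", "type": "table"}]}
for problem in validate_schema(bad_schema):
    print(problem)
# fields[0]: name must be 1-50 lowercase letters, digits or underscores
# fields[1]: a table needs at least one subField
```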
webhook_url (optional)
Set a webhook URL pointing to your server or to a third-party platform such as Zapier to receive parsed data from DocumentPro in real time.
parser_config (optional)
parser_config holds settings that change how the document parser processes your documents. It includes:
parse_email_attachments: Boolean that determines whether email attachments are parsed (default: true).
parse_email_body: Boolean that determines whether the email body is parsed (default: false).
date_format: Specifies the format for parsing date fields (e.g., "%Y-%m-%d" for YYYY-MM-DD).
ocr_config: Controls the OCR process.
query_config: Configures the AI query process.
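The example date_format value, "%Y-%m-%d", is written with the same %-directives as Python's `strftime`/`strptime`. Assuming DocumentPro interprets them the same way, the snippet below shows what such format strings mean. It illustrates the notation only. It does not call DocumentPro.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"  # the value used in the example parser_config

# %Y = four-digit year, %m = zero-padded month, %d = zero-padded day.
print(datetime(2023, 11, 15).strftime(DATE_FORMAT))  # 2023-11-15

# A day/month/year layout would be written like this instead.
print(datetime(2023, 11, 15).strftime("%d/%m/%Y"))  # 15/11/2023

# Parsing with the same string recovers the date.
print(datetime.strptime("2023-11-15", DATE_FORMAT).date())  # 2023-11-15
```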
OCR Config Options
use_ocr: Boolean that determines whether OCR is used (default: false).
detect_layout: Boolean that determines whether the document layout is detected (default: true).
detect_tables: Boolean that determines whether tables are detected (default: false).
remove_headers: Boolean to remove headers before parsing (default: false).
remove_footers: Boolean to remove footers before parsing (default: false).
remove_tables: Boolean to remove tables before parsing (default: false).
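In the example, the response's ocr_config matches the request's three options, with the remaining three filled in at their defaults. The sketch below reproduces that merge on the client, so you can see which settings will presumably apply before you send a request. The default values come from the list above. `effective_ocr_config` is a hypothetical helper, not part of the API.

```python
from typing import Any, Optional

# Documented defaults for the ocr_config options.
OCR_DEFAULTS: dict[str, bool] = {
    "use_ocr": False,
    "detect_layout": True,
    "detect_tables": False,
    "remove_headers": False,
    "remove_footers": False,
    "remove_tables": False,
}


def effective_ocr_config(requested: Optional[dict[str, Any]] = None) -> dict[str, bool]:
    """Overlay the options you set on the documented defaults (client-side sketch)."""
    requested = requested or {}
    unknown = set(requested) - set(OCR_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown ocr_config options: {sorted(unknown)}")
    return {**OCR_DEFAULTS, **requested}


# The ocr_config from the example request.
merged = effective_ocr_config({"use_ocr": True, "detect_layout": True, "detect_tables": True})
print(merged)
# {'use_ocr': True, 'detect_layout': True, 'detect_tables': True,
#  'remove_headers': False, 'remove_footers': False, 'remove_tables': False}
```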
Query Config Options
query_model: Specifies the AI model to use for parsing (e.g., "gpt-4o-mini", "gpt-4o").
page_ranges: Optional string specifying which pages to process (e.g., "1-3,5,7-9").
start_regex: Optional regex pattern that defines where parsing should begin.
end_regex: Optional regex pattern that defines where parsing should end.
split_regex: Optional regex pattern that splits the document into sections.
use_all_matches: Boolean that determines whether all matches should be used (default: false).
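The page_ranges syntax in the example, "1-3,5,7-9", reads as comma-separated single pages and ranges. The sketch below expands such a string into the page numbers it selects. This can help you confirm a value before sending it. It assumes 1-based page numbers and inclusive ranges, as the example suggests. The API's exact parsing rules are not specified here, and `expand_page_ranges` is a hypothetical helper.

```python
def expand_page_ranges(spec: str) -> list[int]:
    """Expand a page_ranges string such as "1-3,5,7-9" into sorted page numbers.

    Assumes 1-based pages and inclusive "start-end" ranges separated by commas.
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or start > end:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(start, end + 1))  # inclusive of the end page
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page: {part!r}")
            pages.add(page)
    return sorted(pages)


print(expand_page_ranges("1-3,5,7-9"))  # [1, 2, 3, 5, 7, 8, 9]
print(expand_page_ranges("1-3"))  # [1, 2, 3]
```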
All of these configuration options are optional. You can create a working Workflow with only the required fields: template_title, template_type and template_schema.