Create Workflow

For background on Parsers, read the introduction to parsers first.

You can use the REST API to create a Workflow in DocumentPro. Once you've created a Workflow, you can upload documents to it for parsing, post-processing and export.

Guide to creating a Workflow

Note that "Workflow" and "Template" are used interchangeably in DocumentPro, so the endpoint and field names below refer to templates.

API Endpoint

POST https://api.documentpro.ai/v1/templates

Required Fields

When creating a Workflow, the following fields are required:

  1. template_title (string): A unique title for your Workflow within your account.
  2. template_type (string): Describes the type of document (e.g., "Invoice", "Purchase Order", "W-8 BEN") the workflow handles.
  3. template_schema (object): Defines the structure of data to be extracted. See below for details.

All other fields are optional.
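As a rough illustration of the smallest valid request body, the sketch below builds a payload from the three required fields only. The helper name build_minimal_workflow and the sample "Receipt Workflow" values are hypothetical, not part of the DocumentPro API.

```python
import json


def build_minimal_workflow(title: str, doc_type: str, fields: list) -> dict:
    """Return a request body containing only the three required fields."""
    return {
        "template_title": title,
        "template_type": doc_type,
        "template_schema": {"fields": fields},
    }


# Hypothetical example values, chosen only for illustration
payload = build_minimal_workflow(
    "Receipt Workflow",
    "receipt",
    [{"name": "total_amount", "type": "number"}],
)
print(json.dumps(payload, indent=2))
```

The resulting dictionary can be passed as the JSON body of the POST request to the endpoint above.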

Example Implementation using Python

import requests
import json

url = "https://api.documentpro.ai/v1/templates"

# Request body: the three required fields plus optional webhook and parser settings
payload = {
    "template_title": "Invoice Processing Workflow",
    "template_type": "invoice",
    "template_schema": {
        "fields": [
            {
                "name": "buyer_name",
                "type": "text",
                "description": "name of the buyer"
            },
            {
                "name": "total_amount",
                "type": "number"
            },
            {
                "name": "line_items",
                "type": "table",
                "description": "invoice line items",
                "subFields": [
                    {
                        "name": "description",
                        "type": "text",
                        "description": "description of item"
                    },
                    {
                        "name": "subtotal",
                        "type": "number"
                    }
                ]
            }
        ]
    },
    "webhook_url": "https://your-application.com/documentpro-webhook",
    "parser_config": {
        "date_format": "%Y-%m-%d",
        "ocr_config": {
            "use_ocr": True,
            "detect_layout": True,
            "detect_tables": True
        },
        "query_config": {
            "query_model": "gpt-4o-mini",
            "page_ranges": "1-3"
        }
    }
}

headers = {
    'x-api-key': 'YOUR_API_KEY',
    'Content-Type': 'application/json'
}

# json=payload serializes the dictionary, so Python booleans become JSON true/false
response = requests.post(url, headers=headers, json=payload)

# If the request was successful, status_code will be 200
if response.status_code == 200:
    print('Workflow created successfully')
    print(json.dumps(response.json(), indent=2))
else:
    print('Failed to create Workflow')
    print(response.text)

Workflow response

A response with status code 200 has the following body structure.

{
  "template_id": "710a20fc-e280-43eb-9a9f-5436e600c710",
  "template_title": "Invoice Processing Workflow",
  "template_type": "invoice",
  "template_schema": {
    "fields": [
      {
        "name": "buyer_name",
        "type": "text",
        "description": "name of the buyer"
      },
      {
        "name": "total_amount",
        "type": "number"
      },
      {
        "name": "line_items",
        "type": "table",
        "description": "invoice line items",
        "subFields": [
          {
            "name": "description",
            "type": "text",
            "description": "description of item"
          },
          {
            "name": "subtotal",
            "type": "number"
          }
        ]
      }
    ]
  },
  "email_id": "invoice_processing_workflow_fbb499@inbox.documentpro-ai.com",
  "webhook_url": "https://your-application.com/documentpro-webhook",
  "parser_config": {
    "parse_email_attachments": true,
    "parse_email_body": false,
    "date_format": "%Y-%m-%d",
    "ocr_config": {
      "use_ocr": true,
      "detect_layout": true,
      "detect_tables": true,
      "remove_headers": false,
      "remove_footers": false,
      "remove_tables": false
    },
    "query_config": {
      "query_model": "gpt-4o-mini",
      "page_ranges": "1-3"
    }
  },
  "created_at": "2023-11-15T12:13:12.056281"
}

The template_id is the unique ID of the Workflow. Use it when uploading documents to the Workflow. template_id and workflow_id are used interchangeably in DocumentPro.

For status codes 400, 404 and 500 you will get the following response body.

{
  "success": false,
  "error": "error code",
  "message": "descriptive error message"
}
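The two response shapes above can be handled in one place. The following is a minimal sketch, assuming the body has already been decoded from JSON; the names WorkflowError and extract_template_id are hypothetical helpers, not part of any DocumentPro client library.

```python
class WorkflowError(Exception):
    """Raised when the API answers with the documented error body."""


def extract_template_id(status_code: int, body: dict) -> str:
    """Return the template_id from a 200 response, or raise WorkflowError."""
    if status_code == 200:
        return body["template_id"]
    # Error bodies carry "success", "error" and "message" keys
    code = body.get("error", "unknown")
    message = body.get("message", "")
    raise WorkflowError(f"{status_code} {code}: {message}")


# Sample bodies modelled on the response structures shown above
ok_body = {"template_id": "710a20fc-e280-43eb-9a9f-5436e600c710"}
template_id = extract_template_id(200, ok_body)

error_text = ""
try:
    extract_template_id(400, {"success": False, "error": "error code",
                              "message": "descriptive error message"})
except WorkflowError as exc:
    error_text = str(exc)
```

With the requests example, this would be called as extract_template_id(response.status_code, response.json()).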

Request attributes

template_title (required)

The title of the Workflow must be unique to your account.

template_type (required)

The Workflow type should describe the kind of document the Workflow handles, for example an invoice, a purchase order or a W-8 BEN.

template_schema (required)

The template_schema is the query language DocumentPro uses to extract information from your document. Use it to describe the fields, tables and lists that you want DocumentPro to capture.

A field can have the following attributes:

| Attribute | Type | Required | Constraints | Description |
|---|---|---|---|---|
| name | string | Yes | 50 characters, only lower case letters, numbers or underscore | Name of the field. It should describe the information you want the AI to retrieve, e.g. invoice_number |
| description | string | No | 150 characters | A description of the field to help the AI interpret the field name better. You can also add data formats, e.g. '9 alphanumeric characters unique identifier' |
| type | enum, string | Yes | text, number, date, table | The type of the field as mentioned in the constraint. |
| required | boolean | No | true, false | DocumentPro will flag this field as missing if it is required but not extracted |
| subFields | array | Conditional | Required if type is table | subFields is a list of fields as shown below |

At least one subField is required if the parent field is a table. The attributes are:

| Attribute | Type | Required | Constraints | Description |
|---|---|---|---|---|
| name | string | Yes | 50 characters, only lower case letters, numbers or underscore | Name of the field. It should describe the information you want the AI to retrieve, e.g. invoice_number |
| description | string | No | 150 characters | A description of the field to help the AI interpret the field name better. You can also add data formats, e.g. '9 alphanumeric characters unique identifier' |
| type | enum, string | Yes | text, number, date | The type of the field as mentioned in the constraint. |
| required | boolean | No | true, false | DocumentPro will flag this field as missing if it is required but not extracted |
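The constraints above can be checked on the client before sending a request. The sketch below is one possible pre-flight check, not DocumentPro's own validation: it reads "50 characters" and "150 characters" as upper bounds, which is an assumption, and validate_field is a hypothetical helper name.

```python
import re

# Assumption: the character counts in the tables are maximum lengths
NAME_RE = re.compile(r"[a-z0-9_]{1,50}")
FIELD_TYPES = {"text", "number", "date", "table"}
SUBFIELD_TYPES = {"text", "number", "date"}  # subFields cannot be tables


def validate_field(field: dict, allowed_types=FIELD_TYPES) -> list:
    """Return a list of human-readable problems; empty means the field passes."""
    errors = []
    name = field.get("name", "")
    if not isinstance(name, str) or not NAME_RE.fullmatch(name):
        errors.append(f"invalid name: {name!r}")
    if len(field.get("description", "")) > 150:
        errors.append(f"{name}: description exceeds 150 characters")
    ftype = field.get("type")
    if ftype not in allowed_types:
        errors.append(f"{name}: invalid type {ftype!r}")
    if "required" in field and not isinstance(field["required"], bool):
        errors.append(f"{name}: required must be true or false")
    if ftype == "table" and "table" in allowed_types:
        sub_fields = field.get("subFields") or []
        if not sub_fields:
            errors.append(f"{name}: table needs at least one subField")
        for sub in sub_fields:
            errors.extend(validate_field(sub, SUBFIELD_TYPES))
    return errors


print(validate_field({"name": "Buyer Name", "type": "text"}))
print(validate_field({"name": "line_items", "type": "table"}))
```

Running the check over every entry in template_schema["fields"] before the POST catches naming mistakes without a round trip to the API.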

webhook_url (optional)

You can set a webhook URL that points to your server or to a third-party platform like Zapier to receive parsed data from DocumentPro in real time.
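A receiving endpoint might look like the sketch below, which uses only the Python standard library. It assumes DocumentPro delivers the parsed data as a JSON body in a POST request; the payload structure is not described in this guide, so the handler only decodes and prints it. WebhookHandler, decode_webhook_body, serve and port 8000 are hypothetical choices.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def decode_webhook_body(raw: bytes):
    """Decode a UTF-8 JSON request body into Python objects."""
    return json.loads(raw.decode("utf-8"))


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            data = decode_webhook_body(self.rfile.read(length))
        except ValueError:
            # Covers both invalid UTF-8 and invalid JSON
            self.send_response(400)
            self.end_headers()
            return
        print("Received webhook payload:", data)
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Start the receiver; blocks until interrupted."""
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```

Calling serve() starts the listener; the public URL that routes to it is the value to put in webhook_url.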

parser_config (optional)

Parser config holds settings that allow you to change how the document parser processes your document. It includes:

  • parse_email_attachments: Boolean to determine if email attachments should be parsed (default: true).
  • parse_email_body: Boolean to determine if the email body should be parsed (default: false).
  • date_format: Specifies the format for parsing date fields (e.g., "%Y-%m-%d" for YYYY-MM-DD).
  • ocr_config: Controls the OCR process.
  • query_config: Configures the AI query process.
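The "%Y-%m-%d" example suggests that date_format uses strftime-style directives. Assuming that is the case, the following sketch shows what such a format string means using Python's own datetime module. It illustrates the directive syntax only, not how DocumentPro applies it internally.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"  # four-digit year, two-digit month, two-digit day

# Parse a date string written in that format
parsed = datetime.strptime("2023-11-15", DATE_FORMAT)
print(parsed.year, parsed.month, parsed.day)

# The same date rendered with a different format string
reformatted = parsed.strftime("%d/%m/%Y")
print(reformatted)
```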

OCR Config Options

  • use_ocr: Boolean to determine if OCR should be used (default: false).
  • detect_layout: Boolean to determine if document layout should be detected (default: true).
  • detect_tables: Boolean to determine if tables should be detected (default: false).
  • remove_headers: Boolean to remove headers before parsing (default: false).
  • remove_footers: Boolean to remove footers before parsing (default: false).
  • remove_tables: Boolean to remove tables before parsing (default: false).

Query Config Options

  • query_model: Specifies the AI model to use for parsing (e.g., "gpt-4o-mini", "gpt-4o").
  • page_ranges: Optional string to specify which pages to process (e.g., "1-3,5,7-9").
  • start_regex: Optional regex pattern to define where parsing should begin.
  • end_regex: Optional regex pattern to define where parsing should end.
  • split_regex: Optional regex pattern to split the document into sections.
  • use_all_matches: Boolean to determine if all matches should be used (default: false).
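To make the page_ranges syntax concrete, the sketch below expands a string such as "1-3,5,7-9" into individual page numbers. It assumes pages are 1-based and ranges are inclusive, which this guide does not state explicitly; expand_page_ranges is a hypothetical client-side helper for checking a value before sending it.

```python
def expand_page_ranges(spec: str) -> list:
    """Expand a comma-separated page range string into a list of page numbers."""
    pages = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(p) for p in part.split("-", 1))
            if start > end:
                raise ValueError(f"range start exceeds end: {part!r}")
            # Assumption: both ends of the range are included
            pages.extend(range(start, end + 1))
        else:
            pages.append(int(part))
    return pages


print(expand_page_ranges("1-3,5,7-9"))  # prints [1, 2, 3, 5, 7, 8, 9]
```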

Remember, while these configurations provide powerful customization options, they are all optional. You can create a functional Workflow using only the required fields: template_title, template_type, and template_schema.