
Create a parser

If you are new to parsers, read the introduction to parsers first.

You can use the REST API to create a parser in DocumentPro. Once you've created a parser, you can upload documents against the parser and extract data from them.

Guide to creating a parser

API Endpoint

POST https://2sy6t3lcbd.execute-api.us-east-1.amazonaws.com/prod/v1/templates

Example Implementation using Python

import json

import requests

url = "https://2sy6t3lcbd.execute-api.us-east-1.amazonaws.com/prod/v1/templates"

payload = json.dumps({
    "template_title": "Custom Invoice Parser",
    "template_type": "Invoice",
    "template_schema": {
        "fields": [
            {
                "name": "buyer_name",
                "type": "text",
                "description": "name of the buyer"
            },
            {
                "name": "total_amount",
                "type": "number"
            },
            {
                "name": "line_items",
                "type": "table",
                "description": "invoice line items",
                "subFields": [
                    {
                        "name": "description",
                        "type": "text",
                        "description": "description of item"
                    },
                    {
                        "name": "subtotal",
                        "type": "number"
                    }
                ]
            }
        ]
    }
})

headers = {
    'x-api-key': 'YOUR_API_KEY',
    'Content-Type': 'application/json'
}

response = requests.request("POST", url, headers=headers, data=payload)

# If the request was successful, status_code will be 200
if response.status_code == 200:
    print('Parser created successfully')
else:
    print('Failed to create parser')

Parser response

A response with status code 200 has the following body structure.

{
    "template_id": "710a20fc-e280-43eb-9a9f-5436e600c710",
    "template_title": "Custom Invoice Parser",
    "template_type": "Invoice",
    "template_category": "other",
    "template_schema": {
        "fields": [
            {
                "name": "buyer_name",
                "type": "text",
                "description": "name of the buyer"
            },
            {
                "name": "total_amount",
                "type": "number"
            },
            {
                "name": "line_items",
                "type": "table",
                "description": "invoice line items",
                "subFields": [
                    {
                        "name": "description",
                        "type": "text",
                        "description": "description of item"
                    },
                    {
                        "name": "subtotal",
                        "type": "number"
                    }
                ]
            }
        ]
    },
    "email_id": "test_parser_22_fbb499@inbox.documentpro-ai.com",
    "webhook_url": null,
    "parser_config": {
        "date_format": null,
        "outbound_integration": null,
        "ocr_config": {
            "engine": "aws_textract",
            "precision": "low",
            "auto_select_precision": true,
            "formatting_level": "low",
            "show_page_number": false,
            "split_by_page": false,
            "trim_spaces": true,
            "show_type_label": true
        },
        "query_config": {
            "query_model": "gpt-3.5-turbo-1106",
            "set_max_output_tokens": false,
            "include_example": false,
            "minimize_tokens": false,
            "selected_language": "english"
        }
    },
    "created_at": "2023-11-15T12:13:12.056281"
}

The template_id is the unique id for the parser that can be used when uploading documents against the parser. template_id and parser_id are used interchangeably in DocumentPro.

For status codes 400, 404 and 500 you will get the following response body.

{
    "success": false,
    "error": "error code",
    "message": "descriptive error message"
}
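The two response shapes above can be handled in one place. The following is a minimal sketch, not part of the DocumentPro API: `parse_create_response` and `ParserApiError` are hypothetical names introduced for illustration, and the code assumes that every non-200 response carries the `error` and `message` keys shown above.

```python
class ParserApiError(Exception):
    """Hypothetical exception wrapping the documented error body."""

    def __init__(self, status_code, error, message):
        super().__init__(f"{status_code} {error}: {message}")
        self.status_code = status_code
        self.error = error
        self.message = message


def parse_create_response(status_code, body):
    """Return the template_id on success, raise ParserApiError otherwise.

    `body` is the decoded JSON response as a dict, e.g. response.json().
    """
    if status_code == 200:
        return body["template_id"]
    # Assumption: non-200 bodies follow the documented error structure;
    # .get() keeps this from failing if a key happens to be missing.
    raise ParserApiError(
        status_code,
        body.get("error", "unknown"),
        body.get("message", ""),
    )
```

With the `requests` example earlier on this page, it could be called as `template_id = parse_create_response(response.status_code, response.json())`.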

Request attributes

template_title (required)

The title of the parser must be unique to your account.

template_type (required)

The parser type should describe the kind of document the parser handles, e.g. an invoice, a purchase order or a W-8BEN form.

template_category (optional)

Options
finance
identity
logistics
human resources
medical
other (default)
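Because template_category is limited to the options listed above, it can be checked locally before the request is sent. This is a rough sketch only: `build_parser_payload` and `ALLOWED_CATEGORIES` are hypothetical names, and the code assumes the option strings are sent exactly as listed (lowercase, with a space in "human resources"). Omitting the attribute leaves the choice of the default to the API.

```python
# Assumption: the API expects these strings verbatim, as listed in the docs.
ALLOWED_CATEGORIES = {
    "finance",
    "identity",
    "logistics",
    "human resources",
    "medical",
    "other",
}


def build_parser_payload(title, doc_type, schema, category=None):
    """Assemble the request body, checking template_category locally."""
    payload = {
        "template_title": title,
        "template_type": doc_type,
        "template_schema": schema,
    }
    if category is not None:
        if category not in ALLOWED_CATEGORIES:
            raise ValueError(f"unknown template_category: {category!r}")
        payload["template_category"] = category
    return payload
```

The returned dictionary can be passed through `json.dumps` and sent as the request body.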

template_schema (required)

The template_schema is the query language used by DocumentPro to extract information from your document. You use the query language to describe the fields, tables and lists that you want DocumentPro to capture.

A field can have the following attributes:

| Attribute | Type | Required | Constraints | Description |
| --- | --- | --- | --- | --- |
| name | string | Yes | 30 characters, only lowercase letters, numbers or underscore | Name of the field. It should describe the information you want the AI to retrieve, e.g. invoice_number |
| description | string | No | 50 characters | A description of the field to help the AI interpret the field name better. You can also add data formats, e.g. '9 alphanumeric characters unique identifier' |
| type | enum, string | Yes | text, number, date, table | The type of the field as mentioned in the constraint. |
| required | boolean | No | true, false | DocumentPro will flag this field as missing if it is required but not extracted |
| subFields | array | Conditional | Required if type is table | subFields is a list of fields as shown below |

At least one subField is required if the parent field is a table. The attributes are:

| Attribute | Type | Required | Constraints | Description |
| --- | --- | --- | --- | --- |
| name | string | Yes | 30 characters, only lowercase letters, numbers or underscore | Name of the field. It should describe the information you want the AI to retrieve, e.g. invoice_number |
| description | string | No | 50 characters | A description of the field to help the AI interpret the field name better. You can also add data formats, e.g. '9 alphanumeric characters unique identifier' |
| type | enum, string | Yes | text, number, date, table | The type of the field as mentioned in the constraint. |
| required | boolean | No | true, false | DocumentPro will flag this field as missing if it is required but not extracted |
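The constraints in the two tables above can be checked on the client before a schema is submitted. The sketch below is one possible reading of them, not DocumentPro's own validation: `validate_field` is a hypothetical helper, and it assumes that "30 characters" and "50 characters" are maximum lengths and that description values, when present, are strings.

```python
import re

# Assumption: names are 1 to 30 characters of lowercase letters, digits or "_".
NAME_PATTERN = re.compile(r"[a-z0-9_]{1,30}")
FIELD_TYPES = {"text", "number", "date", "table"}


def validate_field(field):
    """Return a list of problems found in one field definition (empty if none)."""
    problems = []
    name = field.get("name", "")
    if not NAME_PATTERN.fullmatch(name):
        problems.append(f"invalid name: {name!r}")
    if len(field.get("description", "")) > 50:
        problems.append(f"{name}: description longer than 50 characters")
    if field.get("type") not in FIELD_TYPES:
        problems.append(f"{name}: unknown type {field.get('type')!r}")
    if "required" in field and not isinstance(field["required"], bool):
        problems.append(f"{name}: required must be true or false")
    sub_fields = field.get("subFields", [])
    if field.get("type") == "table" and not sub_fields:
        problems.append(f"{name}: a table needs at least one subField")
    # subFields share the same attribute rules, so check them recursively.
    for sub in sub_fields:
        problems.extend(validate_field(sub))
    return problems
```

A whole schema can then be checked by running `validate_field` over every entry in `template_schema["fields"]` and collecting the returned messages.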