Extracting data from a document

In this tutorial, we will show you step-by-step how to extract data from a document using DocumentPro.

Prerequisite#

If you haven't done so already, you will need to sign up for a DocumentPro account.

Example for this tutorial: extract data from an invoice#

We'll take the simple example of an accounting department that receives many PDF invoices from different vendors and needs to enter the data into its accounting system.

From each invoice, they want to capture the invoice number, date, total amount and the line items.
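
To make the target concrete, here is a minimal sketch of the record the accounting team wants for each invoice. The class names, field names, and sample values are illustrative assumptions, not DocumentPro's own schema or output format.

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal


@dataclass
class LineItem:
    # Hypothetical columns for one row of the invoice's line-item table.
    description: str
    quantity: Decimal
    unit_price: Decimal

    @property
    def amount(self) -> Decimal:
        """Quantity multiplied by unit price for this row."""
        return self.quantity * self.unit_price


@dataclass
class InvoiceRecord:
    # The four pieces of data named in this tutorial.
    invoice_number: str
    invoice_date: date
    total_amount: Decimal
    line_items: list[LineItem] = field(default_factory=list)


# Made-up sample values, only to show the shape of one record.
example = InvoiceRecord(
    invoice_number="INV-0001",
    invoice_date=date(2024, 1, 15),
    total_amount=Decimal("150.00"),
    line_items=[
        LineItem("Widget", Decimal("2"), Decimal("50.00")),
        LineItem("Shipping", Decimal("1"), Decimal("50.00")),
    ],
)
```

Using `Decimal` rather than `float` for money avoids rounding surprises when amounts are later summed or compared.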

Understand the "Your Parsers" page#

On the "Your Parsers" page, you can:

  • See all the parsers in your account and the number of documents each has processed.
  • Find the prebuilt parsers, such as Invoice and Receipt, that are added automatically when you create an account.
  • Create your own parsers by clicking the "Create New Parser" button.
  • Explore other parsers by clicking the "Parser Library" link.

Your parsers page

Let's go ahead and parse our first document.

Step 1: Select the parser#

You need to tell DocumentPro which fields and tables to extract from your document. For an invoice, that means a parser that extracts fields such as the invoice number, date, and total amount, plus the line items.

DocumentPro adds a prebuilt invoice parser to your account when you sign up. We'll use this for the tutorial.

Invoice parser selected

Step 2: Understanding the parser page#

Clicking "View Parser" on the Invoice parser opens the parser page. This page has four tabs:

1. Parser#

On this tab you can:

  • View the fields and table columns that the parser extracts.
  • Add, remove or modify fields, their data types and more.
  • Select a document to test the parser and ensure your parser has all the fields you need.
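
The check "does the parser have all the fields I need" can be sketched as a simple set comparison. The field names below are hypothetical; substitute the names shown on your own Parser tab.

```python
# Hypothetical names for the fields this tutorial wants from each invoice.
REQUIRED_FIELDS = {"invoice_number", "invoice_date", "total_amount", "line_items"}


def missing_fields(parser_fields, required=REQUIRED_FIELDS):
    """Return the required field names the parser does not define, sorted."""
    return sorted(set(required) - set(parser_fields))


# Example: a parser that defines a vendor name but no total amount.
parser_fields = ["invoice_number", "invoice_date", "vendor_name", "line_items"]
print(missing_fields(parser_fields))  # prints ['total_amount']
```

An empty list means every required field is covered; anything else is a field to add before parsing documents in bulk.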

2. Documents#

On this tab you can:

  • See all the documents that have been processed by this parser.
  • See the status of each document (processing, failed, or completed).
  • Click on the document to see the extracted data and the original document.

3. Preview#

On this tab you can:

  • See a tabular preview of all the data extracted from your documents.
  • Export data from different date ranges to a CSV or XLSX file.

4. Settings#

On this tab you can:

  • Set parser engine configurations for OCR and GPT.
  • Set data formatting options, such as date and number formats.
  • Set up integrations, such as webhooks and email forwarding.
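
Date and number formats matter because the same invoice value can be written in several ways. The sketch below shows the kind of normalization such settings imply, using only the Python standard library; it is not DocumentPro's implementation, and the function names and default formats are assumptions.

```python
from datetime import datetime
from decimal import Decimal


def normalize_date(text, input_format="%d/%m/%Y"):
    """Parse a date written in input_format and return it as ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(text, input_format).date().isoformat()


def normalize_amount(text, decimal_sep=".", thousands_sep=","):
    """Drop thousands separators and convert the decimal separator to '.'."""
    cleaned = text.strip().replace(thousands_sep, "").replace(decimal_sep, ".")
    return Decimal(cleaned)


print(normalize_date("15/01/2024"))  # prints 2024-01-15
print(normalize_amount("1.234,56", decimal_sep=",", thousands_sep="."))  # prints 1234.56
```

Choosing one output format for every vendor's invoices keeps the exported data consistent for the accounting system that consumes it.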

Step 3: Parse your document#

Now that we understand the parser page, let's go ahead and parse our first document.

Click the "Parser" tab, then click the upload option to upload your document from your computer.

This is the document we will be using for this tutorial:

Invoice parser selected

Click "Parse Document".

Step 4: Wait for the document to be processed#

After you click "Parse Document", you are taken to the "Documents" tab, where you can see the status of your document.
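
If you automate this waiting step yourself, the usual pattern is a polling loop with a timeout. The sketch below is generic: `get_status` is a hypothetical callable that you supply, and the status strings are assumed to match the labels shown on the Documents tab. No DocumentPro API call is shown or implied.

```python
import time


def wait_until_done(get_status, poll_seconds=2.0, timeout_seconds=120.0, sleep=time.sleep):
    """Call get_status() repeatedly until it returns 'completed' or 'failed'.

    Raises TimeoutError if the document is still unfinished once roughly
    timeout_seconds of waiting have accumulated.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status in ("completed", "failed"):
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"document still '{status}' after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

Passing `sleep` as a parameter makes the loop easy to test without real delays, and the timeout keeps a stuck document from blocking a script forever.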

Step 5: Review the extracted data#

Click on the "View" button to see the results of your parsed document.

Congratulations! You have successfully extracted data from a document using DocumentPro.

Now you can review the extracted data, make and save changes, and approve or reject the extraction.
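
One check worth making during review is whether the line items add up to the invoice total. A minimal sketch follows, assuming the amounts are already available as `Decimal` values; the function name and the tolerance are illustrative choices.

```python
from decimal import Decimal


def line_items_match_total(line_amounts, total, tolerance=Decimal("0.01")):
    """Return True if the line amounts sum to the total within the tolerance."""
    difference = abs(sum(line_amounts, Decimal("0")) - total)
    return difference <= tolerance


print(line_items_match_total([Decimal("100.00"), Decimal("50.00")], Decimal("150.00")))  # prints True
print(line_items_match_total([Decimal("100.00")], Decimal("150.00")))  # prints False
```

A mismatch does not always mean an extraction error, since taxes, discounts, or shipping may be listed outside the line-item table, but it is a useful signal for which invoices to inspect by hand.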

Step 6: Export the extracted data#

Click on the "Download" tab. On this tab you can select the fields you want to include in your export. Then click "Export to file" to download the data as a CSV or XLSX file.
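
Once you have a CSV export, it can be loaded with Python's standard `csv` module. The column names `invoice_number` and `total_amount` below are assumptions; use the header names that appear in your own export.

```python
import csv
import io
from decimal import Decimal


def load_totals(csv_file, id_column="invoice_number", amount_column="total_amount"):
    """Map each invoice number to its total amount from an open CSV file."""
    totals = {}
    for row in csv.DictReader(csv_file):
        totals[row[id_column]] = Decimal(row[amount_column])
    return totals


# In-memory sample with made-up values, standing in for a downloaded file.
sample = io.StringIO("invoice_number,total_amount\nINV-0001,150.00\nINV-0002,99.50\n")
totals = load_totals(sample)
```

For a real file, open it with `open("export.csv", newline="")`, where `export.csv` is a placeholder file name, and pass the file object to `load_totals`.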

What's Next?#

See how to extract data from a document using the API.