
Classify Document

Use this endpoint to classify a document into one of your defined categories in a single API call. No saved classifier is required — you provide the labels inline with each request. This is ideal for one-off classifications or when your label set changes per request.

For recurring automation (e.g. auto-routing every incoming document by type), consider creating a saved Classifier instead and calling Run Classification.

Prerequisite

The document must already be uploaded and OCR-processed before it can be classified. Upload the document and run an extract first to prepare it.

API Endpoint

POST https://api.documentpro.ai/v1/classify

Headers

  • x-api-key (required): Your API key for authentication.
  • Content-Type: application/json

Request Body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| document_id | string (UUID) | Yes | The ID of the document to classify. |
| classification_schema | array | Yes | List of category objects, each with a label and description. Minimum 2 categories. |
| page_range | string | No | Pages to use for classification (e.g. "1-3", "1,3,5"). Defaults to all pages. |
| query_model | string | No | AI model to use. Options: "gpt-4o-mini" (default), "gpt-4o". |
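To see which pages a page_range string would cover before sending it, a small client-side parser can help. The sketch below is not part of the API; parse_page_range is a hypothetical helper, and it assumes pages are 1-based and that ranges and single pages can be mixed (e.g. "1-3,5"), which the table above does not confirm.

```python
def parse_page_range(page_range: str) -> list[int]:
    """Expand a string such as "1-3" or "1,3,5" into sorted page numbers.

    Assumption: pages are 1-based and segments may mix ranges and singles.
    """
    pages = set()
    for part in page_range.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty segment in page_range: {page_range!r}")
        if "-" in part:
            # A segment like "1-3" covers both endpoints.
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"invalid range segment: {part!r}")
            pages.update(range(start, end + 1))
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page number: {part!r}")
            pages.add(page)
    return sorted(pages)


print(parse_page_range("1-3"))    # [1, 2, 3]
print(parse_page_range("1,3,5"))  # [1, 3, 5]
```

The length of the returned list gives the number of pages a request would ask the API to read.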

Each item in classification_schema must have:

  • label (string): The category name that will be returned if the document matches (e.g. "invoice").
  • description (string): A plain-English description that helps the AI understand what this category means.
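The schema rules above can be checked locally before a request is sent. This is a minimal sketch, not the API's own validation: validate_schema is a hypothetical helper, and the checks for non-empty strings and duplicate labels are assumptions that go beyond what is stated here.

```python
def validate_schema(schema) -> list[str]:
    """Return a list of problems found in a classification_schema value."""
    if not isinstance(schema, list) or len(schema) < 2:
        return ["classification_schema must be a list of at least 2 categories"]

    errors = []
    seen_labels = set()
    for index, item in enumerate(schema):
        if not isinstance(item, dict):
            errors.append(f"item {index}: must be an object")
            continue
        # Both keys are required and must be strings.
        for key in ("label", "description"):
            value = item.get(key)
            if not isinstance(value, str) or not value.strip():
                errors.append(f"item {index}: {key} must be a non-empty string")
        label = item.get("label")
        if isinstance(label, str):
            # Assumption: duplicate labels would make the result ambiguous.
            if label in seen_labels:
                errors.append(f"item {index}: duplicate label {label!r}")
            seen_labels.add(label)
    return errors


schema = [
    {"label": "invoice", "description": "A document requesting payment"},
    {"label": "other", "description": "Anything else"},
]
print(validate_schema(schema))  # []
```

An empty list means the schema passed every local check; the API may still reject it for reasons not covered here.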

Example Implementation

Using cURL

curl --location 'https://api.documentpro.ai/v1/classify' \
  --header 'x-api-key: YOUR_API_KEY' \
  --header 'Content-Type: application/json' \
  --data '{
    "document_id": "0b13c9f2-5148-4ffb-bb7b-de03bb071ca8",
    "classification_schema": [
      { "label": "invoice", "description": "A document requesting payment for goods or services rendered" },
      { "label": "purchase_order", "description": "A buyer-issued document authorizing a purchase from a supplier" },
      { "label": "contract", "description": "A legally binding agreement between two or more parties" },
      { "label": "other", "description": "Any document that does not fit the above categories" }
    ],
    "page_range": "1-2",
    "query_model": "gpt-4o-mini"
  }'

Using Python

import json

import requests

url = "https://api.documentpro.ai/v1/classify"

headers = {
    "x-api-key": "YOUR_API_KEY",
    "Content-Type": "application/json",
}

# Labels are supplied inline, so no saved classifier is needed.
payload = {
    "document_id": "0b13c9f2-5148-4ffb-bb7b-de03bb071ca8",
    "classification_schema": [
        {"label": "invoice", "description": "A document requesting payment for goods or services rendered"},
        {"label": "purchase_order", "description": "A buyer-issued document authorizing a purchase from a supplier"},
        {"label": "contract", "description": "A legally binding agreement between two or more parties"},
        {"label": "other", "description": "Any document that does not fit the above categories"},
    ],
    "page_range": "1-2",
    "query_model": "gpt-4o-mini",
}

response = requests.post(url, headers=headers, data=json.dumps(payload))

if response.status_code == 200:
    result = response.json()
    print(f"Classification: {result['classification']}")
    print(f"Confidence scores: {result['confidence_scores']}")
else:
    # Error bodies carry an error code and a descriptive message.
    print("Classification failed")
    print(response.text)

Response

Successful Response (Status Code: 200)

{
  "request_id": "a7813466-6f9a-4c33-8128-427e7a4df755",
  "document_id": "0b13c9f2-5148-4ffb-bb7b-de03bb071ca8",
  "classifier_id": null,
  "classification": "invoice",
  "confidence_scores": {
    "invoice": 0.9312,
    "purchase_order": 0.0421,
    "contract": 0.0198,
    "other": 0.0069
  },
  "request_status": "completed",
  "credits_used": 2,
  "num_pages": 2,
  "duration": 1.4
}

Error Response (Status Codes: 400, 403, 404, 500)

{
  "success": false,
  "error": "error_code",
  "message": "descriptive error message"
}
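When a call fails, the error and message fields are usually the most useful things to log. The sketch below turns a status code and raw response body into a single readable line. explain_error is a hypothetical helper, and the fallback for non-JSON bodies is an assumption in case a proxy or gateway answers instead of the API.

```python
import json


def explain_error(status_code: int, body_text: str) -> str:
    """Summarise an error response using the documented error/message fields."""
    try:
        body = json.loads(body_text)
    except json.JSONDecodeError:
        body = None  # Assumption: some failures may not return JSON at all.

    if isinstance(body, dict):
        code = body.get("error", "unknown_error")
        message = body.get("message", "")
        summary = f"HTTP {status_code} [{code}]"
        return f"{summary}: {message}" if message else summary
    return f"HTTP {status_code}: {body_text.strip()}"


sample = '{"success": false, "error": "error_code", "message": "descriptive error message"}'
print(explain_error(400, sample))
```

With the requests example above, this could be called as explain_error(response.status_code, response.text) in the failure branch.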

Response Fields Explained

  • request_id: Unique identifier for this classification run. Store it if you want to audit classification history.
  • document_id: The document that was classified.
  • classifier_id: Always null for inline classification. Populated when using a saved classifier.
  • classification: The single best-matching label from your classification_schema.
  • confidence_scores: A score between 0.0 and 1.0 for every label you provided. All scores sum to approximately 1.0.
  • request_status: "completed" on success, "failed" or "exception" on error.
  • credits_used: Credits deducted from your plan for this classification.
  • num_pages: Number of pages in the document (or in the page_range if specified).
  • duration: Time in seconds the classification took.
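Because confidence_scores covers every label, a caller can decline to act on a weak match rather than trusting classification unconditionally. The following is a sketch of one way to do that; accept_classification and the 0.8 threshold are hypothetical choices, not values recommended by the API.

```python
def accept_classification(result: dict, min_confidence: float = 0.8):
    """Return the winning label, or None if its score is below the threshold."""
    if result.get("request_status") != "completed":
        return None
    label = result["classification"]
    # Look up the score reported for the winning label.
    score = result["confidence_scores"].get(label, 0.0)
    return label if score >= min_confidence else None


result = {
    "classification": "invoice",
    "confidence_scores": {
        "invoice": 0.9312,
        "purchase_order": 0.0421,
        "contract": 0.0198,
        "other": 0.0069,
    },
    "request_status": "completed",
}
print(accept_classification(result))        # invoice
print(accept_classification(result, 0.95))  # None
```

Documents that come back as None could be sent to manual review; the right threshold depends on how costly a misrouted document is.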

Important Notes

  1. The document must have been processed through the Extract pipeline before classifying. If the document has not been OCR'd, you will receive a 400 error.
  2. You must provide at least 2 labels in classification_schema.
  3. Write clear, distinct descriptions for each label. The more specific the description, the more accurate the classification.
  4. Use page_range to limit classification to the most informative pages of long documents — this reduces credit usage and improves speed.
  5. Each call to /v1/classify consumes credits based on the number of pages processed.

Next Steps