Run Extractor

Run a custom extractor on a document to extract specific information. Each extractor is designed to identify and extract particular types of data from documents.

Request

Headers

Authorization
string
required

Bearer token authentication. Example: Bearer YOUR_API_KEY

Path Parameters

slug
string
required

The unique identifier (slug) of the extractor to use

Body Parameters

file
file
required

The PDF file to process

callback_url
string

Optional webhook URL to receive processing status updates

Example Request

curl -X POST https://api.doctly.ai/api/v1/e/invoice-extractor \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@invoice.pdf"
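
To also receive webhook updates, the callback_url body parameter can be sent alongside the file. The sketch below wraps the request in a small shell function. It assumes callback_url is sent as a multipart form field in the same way as file (the parameter list above does not state the encoding). The run_extractor name, the DOCTLY_API_KEY environment variable, and the example.com URL are illustrative and not part of the API.

```shell
# Hypothetical helper: run_extractor SLUG FILE [CALLBACK_URL]
run_extractor() {
  slug=$1
  file=$2
  callback_url=${3:-}

  # Build the curl arguments; the endpoint path comes from the example request
  set -- -sS -X POST "https://api.doctly.ai/api/v1/e/$slug" \
    -H "Authorization: Bearer ${DOCTLY_API_KEY:-YOUR_API_KEY}" \
    -F "file=@$file"

  # Assumption: callback_url is accepted as a form field next to file
  if [ -n "$callback_url" ]; then
    set -- "$@" -F "callback_url=$callback_url"
  fi

  curl "$@"
}
```

For example, run_extractor invoice-extractor invoice.pdf https://example.com/doctly-hook would submit invoice.pdf and request status updates at that URL. Omitting the third argument sends the same request as the example above.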

Response

Example Responses

{
  "id": "123e4567-e89b-12d3-a456-426614174000",
  "file_name": "invoice.pdf",
  "file_size": 1048576,
  "created_at": "2024-03-21T13:45:00Z",
  "status": "PENDING",
  "page_count": 12,
  "extractor_id": "987fcdeb-a654-3210-9876-543210987654",
  "extractor": {
    "id": "987fcdeb-a654-3210-9876-543210987654",
    "name": "Invoice Extractor",
    "slug": "invoice-extractor"
  }
}
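
The id in this response is what later calls need. A minimal sketch of pulling it out with jq, using a shortened copy of the example response as a stand-in for the live output of curl:

```shell
# RESPONSE stands in for the body returned by the Run Extractor call
RESPONSE='{
  "id": "123e4567-e89b-12d3-a456-426614174000",
  "file_name": "invoice.pdf",
  "status": "PENDING"
}'

# -r prints the raw string without the surrounding JSON quotes
DOC_ID=$(printf '%s' "$RESPONSE" | jq -r '.id')
STATUS=$(printf '%s' "$RESPONSE" | jq -r '.status')
echo "Document $DOC_ID is $STATUS"
```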

Webhook Notifications

If a callback_url is provided, you will receive POST requests with status updates:

{
  "document_id": "123e4567-e89b-12d3-a456-426614174000",
  "status": "COMPLETED",
  "page_count": 12,
  "output_file_url": "https://...",
  "created_at": "2024-03-21T13:45:00Z",
  "extractor": {
    "id": "987fcdeb-a654-3210-9876-543210987654",
    "name": "Invoice Extractor",
    "slug": "invoice-extractor"
  }
}
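
On the receiving side, a webhook handler mostly needs to branch on status. The following is a sketch only: handle_webhook_payload is a hypothetical helper that reads one payload from stdin, and it assumes that FAILED and intermediate statuses can also arrive in webhook payloads, which this page does not confirm.

```shell
# Reads a webhook payload on stdin and acts on its status (requires jq)
handle_webhook_payload() {
  payload=$(cat)
  doc_id=$(printf '%s' "$payload" | jq -r '.document_id')
  status=$(printf '%s' "$payload" | jq -r '.status')

  case "$status" in
    COMPLETED)
      # Print the download URL so the caller can fetch the result
      printf '%s' "$payload" | jq -r '.output_file_url'
      ;;
    FAILED)
      echo "Document $doc_id failed" >&2
      return 1
      ;;
    *)
      # Assumption: any other status means processing is still in progress
      echo "Document $doc_id is $status" >&2
      ;;
  esac
}
```

Piping a saved payload into the function, for example handle_webhook_payload < payload.json, prints the output URL for a completed document and returns a non-zero exit code for a failed one.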

Each extractor is designed for specific types of documents. Using the wrong extractor may result in incomplete or incorrect data extraction.

Next Step: Poll for Completion

After you run an extractor, the document's status is PENDING. Call Get Document periodically with the returned id until status changes to COMPLETED or FAILED. Once the status is COMPLETED, the output_file_url field is available for download.

# Poll every 5 seconds until status is COMPLETED or FAILED (requires jq)
DOC_ID="123e4567-e89b-12d3-a456-426614174000"
while true; do
  # Fetch the document and extract its status field
  STATUS=$(curl -s "https://api.doctly.ai/api/v1/documents/$DOC_ID" \
    -H "Authorization: Bearer YOUR_API_KEY" | jq -r '.status')
  echo "Status: $STATUS"
  if [ "$STATUS" = "COMPLETED" ] || [ "$STATUS" = "FAILED" ]; then
    break
  fi
  sleep 5
done
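
Once polling reports COMPLETED, the result can be fetched from output_file_url. The sketch below assumes that Get Document is the /api/v1/documents/{id} endpoint used in the loop above and that output_file_url can be downloaded without the Authorization header, which this page does not confirm. The download_output name and the DOCTLY_API_KEY variable are illustrative.

```shell
# Hypothetical helper: download_output DOC_ID DEST_FILE (requires jq)
download_output() {
  doc_id=$1
  dest=$2

  # Fetch the current document record
  doc_json=$(curl -sS "https://api.doctly.ai/api/v1/documents/$doc_id" \
    -H "Authorization: Bearer ${DOCTLY_API_KEY:-YOUR_API_KEY}") || return 1

  # Only a COMPLETED document is expected to carry a usable output URL
  status=$(printf '%s' "$doc_json" | jq -r '.status')
  if [ "$status" != "COMPLETED" ]; then
    echo "Document $doc_id is $status; nothing to download" >&2
    return 1
  fi

  # Assumption: the URL is directly downloadable; -L follows redirects
  url=$(printf '%s' "$doc_json" | jq -r '.output_file_url')
  curl -sS -L -o "$dest" "$url"
}
```

A call such as download_output "$DOC_ID" result.out after the polling loop exits would save the output locally, or return a non-zero exit code if the document ended in FAILED.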