POST /api/v1/e/{slug}
Run a custom extractor on a document file. Extractors are pre-configured pipelines that extract specific data from documents and return structured output (JSON, CSV, XML, or Markdown).
Request
Authorization (header): Bearer token authentication. Example: Bearer YOUR_API_KEY
Path Parameters
slug: The unique slug identifier of the extractor (e.g., invoice-extractor).
Body Parameters
file: The document file to process. Supported formats: PDF, DOCX, PNG, JPG, JPEG, WEBP, GIF. Provide either file or url, not both.
url: URL to download the document from. The file will be fetched and processed. Provide either file or url, not both.
callback_url: Webhook URL to receive a POST request when extraction completes.
Example Request
curl -X POST https://api.doctly.ai/api/v1/e/invoice-extractor \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "file=@invoice.pdf"
From URL
curl -X POST https://api.doctly.ai/api/v1/e/invoice-extractor \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "url=https://example.com/invoice.pdf"
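The "either file or url, not both" rule and the optional callback_url field can be wrapped in a small helper. The sketch below is POSIX sh. run_extractor, DOCTLY_API_KEY, and DOCTLY_BASE are hypothetical names, and sending callback_url as a multipart form field (the same way as file and url) is an assumption rather than something the examples above show.

```shell
#!/bin/sh
# Hypothetical settings: export DOCTLY_API_KEY with your key before use.
DOCTLY_BASE="https://api.doctly.ai/api/v1"

# Usage: run_extractor SLUG FILE URL [CALLBACK_URL]
# Pass exactly one of FILE or URL and use "" for the other.
# The body runs in a subshell, so exit leaves only the function.
run_extractor() (
  slug=$1 file=$2 url=$3 callback=$4

  # The endpoint accepts a file or a url, not both.
  if [ -n "$file" ] && [ -n "$url" ]; then
    echo "run_extractor: provide either a file or a url, not both" >&2
    exit 1
  fi
  if [ -z "$file" ] && [ -z "$url" ]; then
    echo "run_extractor: a file or a url is required" >&2
    exit 1
  fi

  # Build curl's arguments in the positional parameters (POSIX sh has no arrays).
  set -- -s -X POST "$DOCTLY_BASE/e/$slug" \
    -H "Authorization: Bearer $DOCTLY_API_KEY"
  if [ -n "$file" ]; then
    set -- "$@" -F "file=@$file"
  else
    set -- "$@" -F "url=$url"
  fi
  # Assumption: callback_url is sent as a form field like file and url.
  if [ -n "$callback" ]; then
    set -- "$@" -F "callback_url=$callback"
  fi

  curl "$@"
)
```

For example, run_extractor invoice-extractor "" https://example.com/invoice.pdf https://example.com/hook would submit the URL and ask for a webhook. Checking the input locally gives a clear message before any request is sent.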
Response
id: Unique identifier (UUID) for the document/extraction job.
file_size: Size of the file in bytes.
page_count: Number of pages in the document.
status: Processing status: PENDING, PROCESSING, COMPLETED, or FAILED.
extractor_id: UUID of the extractor used.
extractor: Details of the extractor used.
extractor.name: Display name of the extractor.
extractor.cost_type: Pricing model: PER_PAGE or PER_DOCUMENT.
extractor.cost_credits: Credits charged per page or per document.
Example Responses
Status codes: 200 OK, 400 Bad Request (including missing input), 404 Not Found, 422 Unprocessable Entity.
200 OK
{
  "id": "123e4567-e89b-12d3-a456-426614174000",
  "file_name": "invoice.pdf",
  "file_size": 524288,
  "page_count": 2,
  "status": "PENDING",
  "extractor_id": "987fcdeb-a654-3210-9876-543210987654",
  "extractor": {
    "id": "987fcdeb-a654-3210-9876-543210987654",
    "name": "Invoice Extractor",
    "slug": "invoice-extractor",
    "cost_type": "PER_PAGE",
    "cost_credits": 5
  },
  "created_at": "2024-03-21T13:45:00Z"
}
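The cost_type and cost_credits fields are enough for a rough estimate of what a job will cost. This sketch assumes integer credit values and that PER_PAGE multiplies cost_credits by page_count while PER_DOCUMENT charges cost_credits once; estimate_credits is a hypothetical helper, not Doctly's billing logic.

```shell
#!/bin/sh
# Usage: estimate_credits COST_TYPE COST_CREDITS PAGE_COUNT
# Assumption: PER_PAGE is charged once per page, PER_DOCUMENT once per job.
estimate_credits() {
  case $1 in
    PER_PAGE) echo $(($2 * $3)) ;;
    PER_DOCUMENT) echo "$2" ;;
    *) echo "estimate_credits: unknown cost_type '$1'" >&2; return 1 ;;
  esac
}

# Values from the example response: PER_PAGE, 5 credits, 2 pages.
estimate_credits PER_PAGE 5 2   # prints 10
```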
Webhooks
If callback_url is provided, you’ll receive a POST request when extraction completes:
{
  "document_id": "123e4567-e89b-12d3-a456-426614174000",
  "status": "COMPLETED",
  "file_name": "invoice.pdf",
  "extractor": {
    "id": "987fcdeb-a654-3210-9876-543210987654",
    "name": "Invoice Extractor",
    "slug": "invoice-extractor"
  }
}
Polling for Results
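After running an extractor, poll the Get Document endpoint (GET https://api.doctly.ai/api/v1/documents/{id}, using the id returned above) until status is COMPLETED or FAILED. This script requires jq: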
DOC_ID="123e4567-e89b-12d3-a456-426614174000"
while true; do
  RESP=$(curl -s "https://api.doctly.ai/api/v1/documents/$DOC_ID" \
    -H "Authorization: Bearer YOUR_API_KEY")
  STATUS=$(echo "$RESP" | jq -r '.status')
  echo "Status: $STATUS"
  if [ "$STATUS" = "COMPLETED" ]; then
    # Print the URL of the extracted output
    echo "$RESP" | jq -r '.output_file_url'
    break
  elif [ "$STATUS" = "FAILED" ]; then
    echo "Extraction failed" >&2
    break
  fi
  # Still PENDING or PROCESSING: wait before polling again
  sleep 5
done
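The loop above keeps polling forever if a job never leaves PENDING or PROCESSING, or if every request fails. A bounded variant is sketched below; get_status, wait_for_document, DOCTLY_API_KEY, and DOCTLY_BASE are hypothetical names, and the defaults (60 attempts, 5 seconds apart) are arbitrary choices, not API limits.

```shell
#!/bin/sh
# Hypothetical settings: export DOCTLY_API_KEY with your key before use.
DOCTLY_BASE="https://api.doctly.ai/api/v1"

# Print the current status of one document via Get Document (requires jq).
get_status() {
  curl -s "$DOCTLY_BASE/documents/$1" \
    -H "Authorization: Bearer $DOCTLY_API_KEY" | jq -r '.status'
}

# Usage: wait_for_document DOC_ID [MAX_ATTEMPTS] [INTERVAL_SECONDS]
# Prints COMPLETED, FAILED or TIMEOUT and exits with 0, 1 or 2.
# The body runs in a subshell, so exit leaves only the function.
wait_for_document() (
  doc_id=$1 max_attempts=${2:-60} interval=${3:-5}
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    status=$(get_status "$doc_id")
    case $status in
      COMPLETED) echo COMPLETED; exit 0 ;;
      FAILED) echo FAILED; exit 1 ;;
    esac
    # PENDING, PROCESSING, or an unreadable response: try again later
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$interval"
    fi
    attempt=$((attempt + 1))
  done
  echo TIMEOUT
  exit 2
)
```

A caller can then branch on the exit status, e.g. if wait_for_document "$DOC_ID"; then fetch the output; fi. Keeping the status lookup in its own function also makes the loop easy to test without network access.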
Extractors can output data in the following formats:
Format    Content-Type       Description
JSON      application/json   Structured data as a JSON object
CSV       text/csv           Tabular data as CSV
XML       application/xml    Structured data as XML
Markdown  text/markdown      Formatted text as Markdown
The output format is determined by the extractor configuration.
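Because the format is fixed by the extractor rather than chosen per request, a client that saves the output may want to pick a file extension from the response's Content-Type. This is a sketch assuming the output download returns one of the Content-Types in the table, possibly with parameters such as "; charset=utf-8"; extension_for_content_type and OUTPUT_URL are hypothetical names.

```shell
#!/bin/sh
# Usage: extension_for_content_type CONTENT_TYPE
# Maps the Content-Types from the format table to file extensions.
extension_for_content_type() {
  # ${1%%;*} drops any parameters after the media type.
  case ${1%%;*} in
    application/json) echo json ;;
    text/csv) echo csv ;;
    application/xml) echo xml ;;
    text/markdown) echo md ;;
    *) echo "unrecognized Content-Type: $1" >&2; return 1 ;;
  esac
}

# Example (OUTPUT_URL is assumed to hold output_file_url; add the
# Authorization header if the download requires it):
#   ct=$(curl -s -o output.tmp -w '%{content_type}' "$OUTPUT_URL")
#   mv output.tmp "output.$(extension_for_content_type "$ct")"
```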
Each extractor is designed for specific document types. Using an invoice extractor on a resume may produce incomplete or incorrect results.