PDF Table Extraction API enables developers to reliably extract structured tabular data from PDF documents and convert it into machine-readable formats such as JSON, Excel, or CSV.
This API focuses exclusively on true table extraction, not general PDF text parsing. It automatically detects grid-based tabular structures within PDFs and ignores non-tabular content such as titles, headers, footers, and paragraphs. This makes it ideal for automation, ETL pipelines, data ingestion workflows, and backend systems that require clean, predictable output.
Key features:
- Detects and extracts one or multiple tables from a single PDF
- Supports tables spanning multiple pages
- Returns results in JSON, Excel (.xlsx), or CSV
- Multiple tables are returned as:
  - An array in JSON
  - Separate worksheets in Excel
  - Separate CSV files packaged in a ZIP archive
- Deterministic output: the same input always produces the same result
- Optional confidence scores per table
- Designed for automation and backend use cases
How it works:
- Identifies tabular data based on layout and structure
- Preserves row and column alignment
- Handles irregular tables, empty cells, and uneven rows
- Returns structured output suitable for programmatic processing
What this API does not do:
- Does not extract free-form text outside tables
- Does not perform OCR on scanned PDFs
- Does not attempt semantic interpretation of table contents
- Does not modify or enrich data values
Common use cases:
- Extract invoice line items from PDF documents
- Convert financial reports into structured datasets
- Ingest tabular data from customer-uploaded PDFs
- Automate data pipelines from PDF sources
- Replace manual copy-paste workflows
Output formats:
- JSON
  - Tables returned as an array
  - Each table includes rows, page range, and confidence score
- Excel (.xlsx)
  - One workbook per request
  - Each table placed in a separate worksheet
- CSV
  - Each table exported as a separate CSV file
  - All CSV files returned in a ZIP archive
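When CSV output is requested, the tables arrive as separate CSV files inside a single ZIP archive. The following Python sketch unpacks such an archive with the standard library. It assumes the response body has been saved as raw ZIP bytes and that the CSV files are UTF-8 encoded; because the file-naming scheme inside the archive is not documented here, it simply sorts entries by name and returns each file's rows.

```python
import csv
import io
import zipfile


def read_csv_zip(zip_bytes: bytes) -> dict[str, list[list[str]]]:
    """Return {file name: rows} for every .csv entry in a ZIP archive."""
    tables: dict[str, list[list[str]]] = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Entry names are not documented, so sort them for a stable order.
        for name in sorted(archive.namelist()):
            if not name.lower().endswith(".csv"):
                continue  # ignore anything that is not a CSV file
            # Assumption: UTF-8 text; "utf-8-sig" also tolerates a leading BOM.
            text = archive.read(name).decode("utf-8-sig")
            # newline="" lets the csv module handle \r\n line endings itself.
            tables[name] = list(csv.reader(io.StringIO(text, newline="")))
    return tables
```

Empty cells come back as empty strings, which keeps each row's column positions intact.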
Security and privacy:
- Stateless and privacy-friendly
- No data is stored after processing
- Secure HTTPS-only communication
- Suitable for production workloads
Limitations:
- Maximum PDF size limits apply
- Text-based PDFs only (no OCR support)
- Tables must be visually structured (grid or aligned rows)
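Because only text-based PDFs are supported and size limits apply, a client may want to reject clearly unsuitable files before spending a request. This sketch checks only what it can verify cheaply: that the bytes start with the `%PDF-` signature and that the file is under a size cap. `MAX_PDF_BYTES` is a hypothetical placeholder, since the actual limit is not stated on this page, and the check cannot tell a scanned, image-only PDF from a text-based one.

```python
from pathlib import Path

# Hypothetical placeholder: the real maximum size is not stated on this page.
MAX_PDF_BYTES = 10 * 1024 * 1024


def preflight_pdf(data: bytes, max_bytes: int = MAX_PDF_BYTES) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    if not data.startswith(b"%PDF-"):
        problems.append("missing %PDF- signature; not a PDF file")
    if len(data) > max_bytes:
        problems.append(f"file is {len(data)} bytes, above the {max_bytes}-byte cap")
    # This cannot detect scanned (image-only) PDFs, which the API does not OCR.
    return problems


def preflight_path(path: str, max_bytes: int = MAX_PDF_BYTES) -> list[str]:
    """Convenience wrapper that reads the file from disk first."""
    return preflight_pdf(Path(path).read_bytes(), max_bytes)
```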
This API is designed for developers who need reliable table extraction, predictable output, and clean integration into automated systems — without the complexity or cost of large enterprise document platforms.
If you need structured data from PDF tables — not text blobs, not images, and not manual cleanup — this API provides a fast, deterministic, and developer-friendly solution.
Example response:
```json
{"tables":[
{"tableIndex":0,"pageRange":[1,1],"rows":[["Lorem ipsum","","","","","","","",""],["condimentum.","Vivamus","dapibus","sodales","ex,","vitae","malesuada","ipsum","cursus"],["convallis. Maecenas sed egestas nulla, ac condimentum orci.","Mauris diam felis,","","","","","","",""],["ac accumsan nunc vehicula vitae.","Nulla eget justo in felis tristique fringilla. Morbi sit amet","","","","","","",""],["","Maecenas non lorem quis tellus placerat varius.","","","","","","",""],["","Aenean congue fringilla justo ut aliquam.","","","","","","",""],["","Mauris id ex erat.","Nunc vulputate neque vitae justo facilisis, non condimentum ante","","","","","",""],["sagittis.","","","","","","","",""],["","Morbi viverra semper lorem nec molestie.","","","","","","",""],["","Maecenas tincidunt est efficitur ligula euismod, sit amet ornare est vulputate.","","","","","","",""],["12","","","","","","","",""],["10","","","","","","","",""],["8","","","","","","","",""],["Column 1","","","","","","","",""],["6","","","","","","","",""],["Column 2","","","","","","","",""],["4 Column 3","","","","","","","",""],["2","","","","","","","",""],["0","","","","","","","",""],["Row 1","Row 2","Row 3","Row 4","","","","",""]],"rowCount":20,"columnCount":9,"strategyUsed":"stream","warnings":[],"confidence":0.85},
{"tableIndex":1,"pageRange":[2,2],"rows":[["velit.","Pellentesque","fermentum","nisl","vitae","fringilla","venenatis.","Etiam","id","mauris","vitae","orci"],["a.","","","","","","","","","","",""],["Lorem ipsum","Lorem ipsum","Lorem ipsum","","","","","","","","",""],["1","In eleifend velit vitae libero sollicitudin euismod.","Lorem","","","","","","","","",""],["2","Cras fringilla ipsum magna, in fringilla dui commodo Ipsum","","","","","","","","","",""],["a.","","","","","","","","","","",""],["3","Aliquam erat volutpat.","Lorem","","","","","","","","",""],["4","Fusce vitae vestibulum velit.","Lorem","","","","","","","","",""],["5","Etiam vehicula luctus fermentum.","Ipsum","","","","","","","","",""],["et","pulvinar","nunc.","Pellentesque","fringilla","mollis","efficitur.","Nullam","venenatis","commodo","",""]],"rowCount":10,"columnCount":12,"strategyUsed":"stream","warnings":[],"confidence":0.85},
{"tableIndex":2,"pageRange":[3,3],"rows":[["elit.","","","","","","","","","","",""],["dictum tellus.","","","","","","","","","","",""],["Aliquam","erat","volutpat.","Vestibulum","in","egestas","velit.","Pellentesque","fermentum","nisl","vitae",""],["fringilla","venenatis.","Etiam","id","mauris","vitae","orci","maximus","ultricies.","Cras","fringilla","ipsum"],["et","pulvinar","nunc.","Pellentesque","fringilla","mollis","efficitur.","Nullam","venenatis","commodo","",""]],"rowCount":5,"columnCount":12,"strategyUsed":"stream","warnings":[],"confidence":0.85}
],"summary":{"tableCount":3,"pageCount":4}}
```
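```shell
# curl sets the multipart/form-data Content-Type (including the boundary) automatically for --form uploads.
curl --location 'https://zylalabs.com/api/11754/pdf+table+extraction+api/22299/extract+data' \
--header 'Authorization: Bearer ACCESS_KEY' \
--form 'image=@"FILE_PATH"'
```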
| Header | Description |
|---|---|
| Authorization | [Required] Must be `Bearer access_key`. Your access key is available in your account once you are subscribed to this API. |
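For callers not using cURL, the same upload can be built with Python's standard library. This is a sketch under stated assumptions: the endpoint URL and the `image` form field come from the cURL example, the `Authorization: Bearer` header follows the header table, and the helper name `build_extract_request` is hypothetical. The request body is assembled as `multipart/form-data` by hand, with the boundary carried in the `Content-Type` header the way `curl --form` does it.

```python
import urllib.request
import uuid

# Copied from the cURL example; FIELD_NAME mirrors its `image=@...` form field.
ENDPOINT = "https://zylalabs.com/api/11754/pdf+table+extraction+api/22299/extract+data"
FIELD_NAME = "image"


def build_extract_request(pdf_bytes: bytes, filename: str, api_key: str) -> urllib.request.Request:
    """Build, but do not send, a multipart/form-data POST carrying one PDF file."""
    # Random delimiter; a collision with the file's bytes is vanishingly unlikely.
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{FIELD_NAME}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=head + pdf_bytes + tail,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

The function only builds the request. Sending it could look like `urllib.request.urlopen(req, timeout=60)`, followed by `json.load(...)` on the response when JSON output is returned.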
No long-term commitment. Upgrade, downgrade, or cancel anytime. Free Trial includes up to 50 requests.
The API returns structured tabular data extracted from PDF documents. In JSON, the response contains an array of tables, one entry per extracted table; the same results can instead be returned as an Excel (.xlsx) workbook or as CSV files.
Each table object in the JSON response includes `tableIndex`, `pageRange`, `rows`, `rowCount`, `columnCount`, `strategyUsed`, `warnings`, and `confidence`. The `rows` field is an array of rows, each an array of cell strings, so the data can be processed programmatically without further parsing.
The JSON response has a `tables` array, with one object per extracted table, and a `summary` object that reports the total number of tables (`tableCount`) and pages (`pageCount`). Each table carries its own rows, page range, and confidence score.
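To work with the JSON response in typed form, a client can map each table object onto a small data class. The sketch below uses the field names shown in the example response; interpreting `pageRange` as `[first page, last page]` and treating a missing `confidence` as `None` (scores are described as optional) are assumptions, and `ExtractedTable` and `parse_response` are illustrative names.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtractedTable:
    table_index: int
    page_range: tuple[int, int]   # assumed: (first page, last page)
    rows: list[list[str]]         # each row is a list of cell strings
    row_count: int
    column_count: int
    strategy_used: str
    warnings: list
    confidence: Optional[float]   # None when no score was returned


def parse_response(payload: str) -> tuple[list[ExtractedTable], dict]:
    """Split an extraction response into typed tables and the summary dict."""
    data = json.loads(payload)
    tables = [
        ExtractedTable(
            table_index=t["tableIndex"],
            page_range=tuple(t["pageRange"]),
            rows=t["rows"],
            row_count=t["rowCount"],
            column_count=t["columnCount"],
            strategy_used=t.get("strategyUsed", ""),
            warnings=t.get("warnings", []),
            confidence=t.get("confidence"),
        )
        for t in data["tables"]
    ]
    return tables, data.get("summary", {})
```

Comparing `len(tables)` with `summary["tableCount"]` is a cheap sanity check after parsing.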
The main input is the PDF file itself, uploaded directly as multipart form data (the cURL example sends it in the `image` form field). Additional parameters may include options for the output format (JSON, Excel, or CSV) and for confidence scoring.
Output is deterministic: the same input always produces the same result, which makes runs reproducible. Determinism does not by itself guarantee accuracy, so each table can also carry an optional confidence score indicating how reliable its extraction is likely to be.
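One practical consequence of deterministic output is that results can be cached by file content: if identical PDF bytes have already been processed with the same options, the stored result is what a new request would return. The minimal in-memory sketch below keys the cache on a SHA-256 digest plus the output format. The `extract` callable it wraps is hypothetical; in practice it would send the upload request and return the raw response body.

```python
import hashlib
from typing import Callable


class ExtractionCache:
    """In-memory cache of extraction results keyed by PDF content and output format."""

    def __init__(self, extract: Callable[[bytes, str], bytes]) -> None:
        # `extract` is hypothetical: it takes (pdf_bytes, output_format) and
        # returns the raw response body from the API.
        self._extract = extract
        self._store: dict[str, bytes] = {}
        self.misses = 0  # how many times the API call was actually made

    def get(self, pdf_bytes: bytes, output_format: str = "json") -> bytes:
        key = f"{hashlib.sha256(pdf_bytes).hexdigest()}:{output_format}"
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._extract(pdf_bytes, output_format)
        return self._store[key]
```

The store grows without bound, so a long-running service would need an eviction policy on top of this.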
Typical use cases include extracting invoice line items, converting financial reports into structured datasets, automating data pipelines, and ingesting tabular data from customer-uploaded PDFs, streamlining data processing workflows.
Users can leverage the structured output for integration into data pipelines, ETL processes, or backend systems. The organized format allows for easy manipulation and analysis of the extracted tables in various applications.
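For loading results into a database or ETL pipeline, it can help to flatten every table into one record per cell, so tables with different column counts share a single schema. This long-format layout is a client-side design choice, not something the API returns. The sketch assumes the parsed JSON structure from the example response and skips empty cells; `iter_cells` is an illustrative name.

```python
from typing import Iterator


def iter_cells(response: dict) -> Iterator[dict]:
    """Yield one flat record per non-empty cell across all extracted tables."""
    for table in response["tables"]:
        first_page = table["pageRange"][0]
        for r, row in enumerate(table["rows"]):
            for c, value in enumerate(row):
                if value == "":
                    continue  # skip empty cells and padding
                yield {
                    "table": table["tableIndex"],
                    "first_page": first_page,
                    "row": r,
                    "column": c,
                    "value": value,
                }
```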
The output reflects the original table structure, including row and column alignment. Irregular tables and empty cells are handled so that the output stays structured and usable for further processing.
The API can extract various types of structured tables, including those with irregular layouts, empty cells, and uneven rows. It automatically detects single or multiple tables within a PDF, ensuring that only grid-based tabular structures are processed.
The API supports tables that span multiple pages, accurately capturing the entire table structure and returning it in a single output. Each table's page range is included in the response for easy reference.
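Since each table reports its `pageRange`, multi-page tables and per-page lookups can be derived directly from the response. This sketch assumes `pageRange` is `[first page, last page]`, which matches the single-page `[n, n]` values in the example response; the helper names are illustrative.

```python
def multi_page_tables(response: dict) -> list[tuple[int, int, int]]:
    """Return (tableIndex, first page, last page) for tables spanning several pages."""
    spans = []
    for table in response["tables"]:
        first, last = table["pageRange"]  # assumed: [first page, last page]
        if last > first:
            spans.append((table["tableIndex"], first, last))
    return spans


def tables_on_page(response: dict, page: int) -> list[int]:
    """Return the tableIndex of every table whose page range covers `page`."""
    return [
        t["tableIndex"]
        for t in response["tables"]
        if t["pageRange"][0] <= page <= t["pageRange"][1]
    ]
```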
Yes, users can customize their data requests by specifying the desired output format: JSON, Excel (.xlsx), or CSV. This flexibility allows integration into various applications and workflows.
The API offers optional confidence scores for each extracted table, indicating the reliability of the extraction. This feature helps users assess the quality of the data returned.
The API is designed to be stateless and privacy-friendly, ensuring that no data is stored after processing. It uses secure HTTPS-only communication to protect user data during transmission.
Empty cells are preserved rather than dropped, so the overall table structure is kept. In the example response, every row has `columnCount` entries and empty cells appear as empty strings (`""`), which keeps column positions aligned even where values are missing.
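Rows in the example response are often mostly empty strings, padded out to `columnCount`. If downstream code does not need empty positions, a client-side cleanup step can drop rows that are entirely empty and columns that are empty in every row. This is an optional post-processing sketch, not API behavior; it changes column positions relative to `columnCount`, so skip it when column indexes matter.

```python
def trim_empty(rows: list[list[str]]) -> list[list[str]]:
    """Drop rows whose cells are all blank, then drop columns blank in every row."""
    kept_rows = [row for row in rows if any(cell.strip() for cell in row)]
    if not kept_rows:
        return []
    width = max(len(row) for row in kept_rows)
    # Pad any ragged rows with "" so column indexes line up before comparing.
    padded = [row + [""] * (width - len(row)) for row in kept_rows]
    keep_cols = [c for c in range(width) if any(row[c].strip() for row in padded)]
    return [[row[c] for c in keep_cols] for row in padded]
```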
Confidence scores range from 0 to 1, indicating the likelihood that the extracted table is accurate. A higher score suggests greater reliability, helping users determine which tables to trust for further processing.
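Scores between 0 and 1 lend themselves to a simple accept-or-review rule. The sketch below partitions tables using a threshold; the default of 0.8 is an arbitrary illustrative value, not a recommendation from the API. It also routes any table with a non-empty `warnings` list to review, which is an assumption since the content of warnings is not documented here.

```python
def split_by_confidence(
    tables: list[dict], threshold: float = 0.8
) -> tuple[list[dict], list[dict]]:
    """Partition tables into (accepted, needs_review).

    A table needs review when its confidence is missing or below the
    threshold, or when the API attached any warnings to it.
    """
    accepted, review = [], []
    for table in tables:
        score = table.get("confidence")
        if score is None or score < threshold or table.get("warnings"):
            review.append(table)
        else:
            accepted.append(table)
    return accepted, review
```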
The `strategyUsed` field names the method the API used to extract the table (`stream` in the example response). It can help users understand how a table was extracted and judge whether the output suits their needs.
To obtain your API key, you first need to sign in to your account and subscribe to the API you want to use. Once subscribed, go to your Profile, open the Subscription section, and select the specific API. Your API key will be available there and can be used to authenticate your requests.
You can’t switch APIs during the free trial. If you subscribe to a different API, your trial will end and the new subscription will start as a paid plan.
If you don’t cancel before the 7th day, your free trial ends automatically and your subscription converts to the paid plan you originally subscribed to. You will then be charged and gain access to the API calls included in that plan.
The free trial ends when you reach 50 API requests or after 7 days, whichever comes first.
No, the free trial is available only once, so we recommend using it on the API that interests you the most. Most of our APIs offer a free trial, but some may not include this option.
Yes, we offer a 7-day free trial that allows you to make up to 50 API calls at no cost, so you can test our APIs without any commitment.
Zyla API Hub is like a big store for APIs, where you can find thousands of them all in one place. We also offer dedicated support and real-time monitoring of all APIs. Once you sign up, you can pick and choose which APIs you want to use. Just remember, each API needs its own subscription. But if you subscribe to multiple ones, you'll use the same key for all of them, making things easier for you.
Please have a look at our Refund Policy: https://zylalabs.com/terms#refund
Service Level: 91%
Response Time: 2,513ms