The PDF Table Extraction API lets developers reliably extract structured tabular data from PDF documents and convert it into machine-readable formats such as JSON, Excel, or CSV.
This API focuses exclusively on true table extraction, not general PDF text parsing. It automatically detects grid-based tabular structures within PDFs and ignores non-tabular content such as titles, headers, footers, and paragraphs. This makes it ideal for automation, ETL pipelines, data ingestion workflows, and backend systems that require clean, predictable output.
Detects and extracts one or multiple tables from a single PDF
Supports tables spanning multiple pages
Returns results in JSON, Excel (.xlsx), or CSV
Multiple tables are returned as an array in JSON, as separate worksheets in Excel, or as separate CSV files packaged in a ZIP archive
Deterministic output: same input always produces the same result
Optional confidence scores per table
Designed for automation and backend use cases
Identifies tabular data based on layout and structure
Preserves row and column alignment
Handles irregular tables, empty cells, and uneven rows
Returns structured output suitable for programmatic processing
Does not extract free-form text outside tables
Does not perform OCR on scanned PDFs
Does not attempt semantic interpretation of table contents
Does not modify or enrich data values
Extract invoice line items from PDF documents
Convert financial reports into structured datasets
Ingest tabular data from customer-uploaded PDFs
Automate data pipelines from PDF sources
Replace manual copy-paste workflows
JSON: tables are returned as an array; each table includes its rows, page range, and confidence score
Excel (.xlsx): one workbook per request, with each table placed in a separate worksheet
CSV: each table is exported as a separate CSV file, and all CSV files are returned in a ZIP archive
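As a rough illustration of consuming the CSV output, the sketch below unpacks a ZIP archive of CSV files into in-memory rows using only the Python standard library. The file names inside the archive and the UTF-8 encoding are assumptions; the page does not document either, and `read_csv_zip` is a hypothetical helper name.

```python
import csv
import io
import zipfile
from typing import Dict, List


def read_csv_zip(zip_bytes: bytes) -> Dict[str, List[List[str]]]:
    """Map each CSV file name in the archive to its parsed rows."""
    tables: Dict[str, List[List[str]]] = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in sorted(archive.namelist()):
            if not name.lower().endswith(".csv"):
                continue  # skip anything that is not a CSV file
            # Assumption: files are UTF-8, possibly with a byte-order mark.
            text = archive.read(name).decode("utf-8-sig")
            tables[name] = list(csv.reader(io.StringIO(text, newline="")))
    return tables


if __name__ == "__main__":
    # Build a tiny archive locally so the sketch runs without the API.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("table_0.csv", "item,qty\r\nwidget,2\r\n")
    print(read_csv_zip(buffer.getvalue()))
```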
Stateless and privacy-friendly
No data is stored after processing
Secure HTTPS-only communication
Suitable for production workloads
Maximum PDF size limits apply
Text-based PDFs only (no OCR support)
Tables must be visually structured (grid or aligned rows)
This API is designed for developers who need reliable table extraction, predictable output, and clean integration into automated systems, without the complexity or cost of large enterprise document platforms.
If you need structured data from PDF tables (not text blobs, not images, and not manual cleanup), this API provides a fast, deterministic, and developer-friendly solution.
{"tables":[{"tableIndex":0,"pageRange":[1,1],"rows":[["Lorem ipsum","","","","","","","",""],["condimentum.","Vivamus","dapibus","sodales","ex,","vitae","malesuada","ipsum","cursus"],["convallis. Maecenas sed egestas nulla, ac condimentum orci.","Mauris diam felis,","","","","","","",""],["ac accumsan nunc vehicula vitae.","Nulla eget justo in felis tristique fringilla. Morbi sit amet","","","","","","",""],["","Maecenas non lorem quis tellus placerat varius.","","","","","","",""],["","Aenean congue fringilla justo ut aliquam.","","","","","","",""],["","Mauris id ex erat.","Nunc vulputate neque vitae justo facilisis, non condimentum ante","","","","","",""],["sagittis.","","","","","","","",""],["","Morbi viverra semper lorem nec molestie.","","","","","","",""],["","Maecenas tincidunt est efficitur ligula euismod, sit amet ornare est vulputate.","","","","","","",""],["12","","","","","","","",""],["10","","","","","","","",""],["8","","","","","","","",""],["Column 1","","","","","","","",""],["6","","","","","","","",""],["Column 2","","","","","","","",""],["4 Column 3","","","","","","","",""],["2","","","","","","","",""],["0","","","","","","","",""],["Row 1","Row 2","Row 3","Row 4","","","","",""]],"rowCount":20,"columnCount":9,"strategyUsed":"stream","warnings":[],"confidence":0.85},{"tableIndex":1,"pageRange":[2,2],"rows":[["velit.","Pellentesque","fermentum","nisl","vitae","fringilla","venenatis.","Etiam","id","mauris","vitae","orci"],["a.","","","","","","","","","","",""],["Lorem ipsum","Lorem ipsum","Lorem ipsum","","","","","","","","",""],["1","In eleifend velit vitae libero sollicitudin euismod.","Lorem","","","","","","","","",""],["2","Cras fringilla ipsum magna, in fringilla dui commodo Ipsum","","","","","","","","","",""],["a.","","","","","","","","","","",""],["3","Aliquam erat volutpat.","Lorem","","","","","","","","",""],["4","Fusce vitae vestibulum velit.","Lorem","","","","","","","","",""],["5","Etiam vehicula luctus \nfermentum.","Ipsum","","","","","","","","",""],["et","pulvinar","nunc.","Pellentesque","fringilla","mollis","efficitur.","Nullam","venenatis","commodo","",""]],"rowCount":10,"columnCount":12,"strategyUsed":"stream","warnings":[],"confidence":0.85},{"tableIndex":2,"pageRange":[3,3],"rows":[["elit.","","","","","","","","","","",""],["dictum tellus.","","","","","","","","","","",""],["Aliquam","erat","volutpat.","Vestibulum","in","egestas","velit.","Pellentesque","fermentum","nisl","vitae",""],["fringilla","venenatis.","Etiam","id","mauris","vitae","orci","maximus","ultricies.","Cras","fringilla","ipsum"],["et","pulvinar","nunc.","Pellentesque","fringilla","mollis","efficitur.","Nullam","venenatis","commodo","",""]],"rowCount":5,"columnCount":12,"strategyUsed":"stream","warnings":[],"confidence":0.85}],"summary":{"tableCount":3,"pageCount":4}}
curl --location 'https://zylalabs.com/api/11754/pdf+table+extraction+api/22299/extract+data' \
--header 'Authorization: Bearer YOUR_ACCESS_KEY' \
--form 'image=@"FILE_PATH"'
| Header | Description |
|---|---|
| Authorization | [Required] Should be Bearer access_key. After subscribing, see "Your API Access Key" above. |
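For readers calling the endpoint from code rather than curl, here is a minimal Python sketch that builds the same request with the standard library. It assumes the upload is a multipart form with a field named `image` (as in the curl example) and a Bearer token in the `Authorization` header; `build_request` and the default file name are hypothetical, and the request is only constructed here, not sent.

```python
import urllib.request
import uuid

# Endpoint URL taken from the curl example on this page.
API_URL = "https://zylalabs.com/api/11754/pdf+table+extraction+api/22299/extract+data"


def build_request(pdf_bytes: bytes, access_key: str,
                  filename: str = "document.pdf") -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST carrying the PDF."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        # The form field is named "image" in the curl example, even for PDFs.
        f'Content-Disposition: form-data; name="image"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        pdf_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {
        "Authorization": f"Bearer {access_key}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


if __name__ == "__main__":
    req = build_request(b"%PDF-1.4 placeholder", "YOUR_ACCESS_KEY")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req).read()
```

Setting the multipart `Content-Type` with its boundary explicitly mirrors what curl's `--form` option does on its own.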
No long-term commitment. Upgrade, downgrade, or cancel at any time. The free trial includes up to 50 requests.
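The API returns structured tabular data extracted from PDF documents. A response can contain multiple tables, each represented as an element of a JSON array; users can alternatively receive the data in Excel (.xlsx) or CSV format.
The response includes key fields such as `tableIndex`, `pageRange`, `rows`, `rowCount`, `columnCount`, `strategyUsed`, and `confidence`. Each table's data is organized for easy programmatic processing.
The response contains a `tables` array and a `summary` object with the total number of tables and pages. Each table includes its rows, page range, and confidence score, which makes the data easy to navigate and use.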
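The field names in the sample response suggest a simple typed wrapper. The following is a sketch only: the `Table` dataclass and `parse_response` function are hypothetical helpers, and the field names are taken from the sample JSON on this page rather than from a formal schema.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Table:
    index: int
    page_range: List[int]
    rows: List[List[str]]
    strategy: str
    confidence: Optional[float]  # documented as optional, so it may be absent


def parse_response(payload: str) -> List[Table]:
    """Turn the JSON response text into a list of Table objects."""
    data = json.loads(payload)
    return [
        Table(
            index=t["tableIndex"],
            page_range=t["pageRange"],
            rows=t["rows"],
            strategy=t.get("strategyUsed", ""),
            confidence=t.get("confidence"),
        )
        for t in data.get("tables", [])
    ]


if __name__ == "__main__":
    sample = ('{"tables":[{"tableIndex":0,"pageRange":[1,1],"rows":[["a","b"]],'
              '"rowCount":1,"columnCount":2,"strategyUsed":"stream","warnings":[],'
              '"confidence":0.85}],"summary":{"tableCount":1,"pageCount":1}}')
    for table in parse_response(sample):
        print(table.index, table.page_range, len(table.rows), table.confidence)
```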
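The endpoint's main parameter is the PDF file itself, which is uploaded directly. Additional parameters may include options for the output format (JSON, Excel, CSV) and confidence-score settings.
Accuracy is supported by deterministic output: the same input always produces the same result. The API also provides an optional confidence score for each table, indicating how reliable the extraction is.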
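One way a client could check the determinism claim is to fingerprint each parsed response and compare fingerprints across repeated submissions of the same PDF. The sketch below is an assumption about how a caller might do this, not part of the API; `fingerprint` is a hypothetical helper that hashes a canonical JSON serialization.

```python
import hashlib
import json


def fingerprint(response: dict) -> str:
    """Return a SHA-256 hex digest of the response in canonical JSON form."""
    # Sorting keys and fixing separators makes the digest independent of
    # key order and whitespace in the serialized response.
    canonical = json.dumps(response, sort_keys=True,
                           separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    first = {"tables": [{"tableIndex": 0, "rows": [["x"]]}],
             "summary": {"tableCount": 1}}
    second = {"summary": {"tableCount": 1},
              "tables": [{"rows": [["x"]], "tableIndex": 0}]}
    print(fingerprint(first) == fingerprint(second))
```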
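Typical use cases include extracting invoice line items, converting financial reports into structured datasets, automating data pipelines, and ingesting tabular data from customer-uploaded PDFs to streamline data-processing workflows.
Users can feed the structured output into data pipelines, ETL processes, or backend systems. The well-organized format makes it easy to manipulate and analyze the extracted tables in a variety of applications.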
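As one possible ETL step, the sketch below loads a table's `rows` into an in-memory SQLite database. Everything here is illustrative: `load_table`, the generic column names (`c0`, `c1`, ...), and the table name are assumptions, since the response carries no column names of its own.

```python
import sqlite3
from typing import List


def load_table(conn: sqlite3.Connection, name: str, rows: List[List[str]]) -> None:
    """Create a TEXT-only table called `name` and insert the extracted rows."""
    width = max((len(r) for r in rows), default=0)
    if width == 0:
        raise ValueError("table has no columns")
    columns = ", ".join(f"c{i} TEXT" for i in range(width))
    placeholders = ", ".join("?" * width)
    # Pad short rows so every insert supplies exactly `width` values.
    padded = [list(r) + [""] * (width - len(r)) for r in rows]
    # `name` is interpolated into SQL, so it must come from trusted code,
    # never from the PDF contents or from end users.
    conn.execute(f'CREATE TABLE "{name}" ({columns})')
    conn.executemany(f'INSERT INTO "{name}" VALUES ({placeholders})', padded)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_table(conn, "table_0", [["item", "qty"], ["widget", "2"]])
    print(conn.execute('SELECT * FROM "table_0"').fetchall())
```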
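Users can expect data that mirrors the original table structure, including row and column alignment. The API handles irregular tables and empty cells so that the output stays structured and suitable for further processing.
The API can extract various kinds of structured tables, including those with irregular layouts, empty cells, and uneven rows. It automatically detects single or multiple tables in a PDF and processes only grid-based tabular structures.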
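Because empty cells come back as empty strings (as in the sample response, where many trailing columns are blank), a client may want to tidy a table before use. The following sketch pads uneven rows to a common width and drops columns that are empty in every row; `normalize` is a hypothetical client-side helper, not something the API performs.

```python
from typing import List


def normalize(rows: List[List[str]]) -> List[List[str]]:
    """Pad rows to equal width, then drop columns that are blank everywhere."""
    width = max((len(r) for r in rows), default=0)
    padded = [list(r) + [""] * (width - len(r)) for r in rows]
    # Keep a column if at least one row has non-whitespace text in it.
    keep = [i for i in range(width) if any(r[i].strip() for r in padded)]
    return [[r[i] for i in keep] for r in padded]


if __name__ == "__main__":
    raw = [["Row 1", "Row 2", "", ""], ["12", "", "", ""], ["8"]]
    for row in normalize(raw):
        print(row)
```

Cells that are empty in only some rows are kept, so the row and column alignment of the remaining data is preserved.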
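The API supports tables that span multiple pages, capturing the whole table structure and returning it as a single output. Each table's page range is included in the response for reference.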
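A short sketch of working with `pageRange` follows. It assumes the field is a two-element `[first, last]` list of 1-based, inclusive page numbers, which is inferred from the sample response (for example `[1,1]` for a table on page 1) and is not stated explicitly on this page. Both helper names are hypothetical.

```python
from typing import Dict, List


def pages_spanned(table: Dict) -> int:
    """Number of pages a table covers, assuming an inclusive [first, last] range."""
    first, last = table["pageRange"]
    return last - first + 1


def tables_on_page(tables: List[Dict], page: int) -> List[int]:
    """Return the tableIndex of every table whose page range includes `page`."""
    return [t["tableIndex"] for t in tables
            if t["pageRange"][0] <= page <= t["pageRange"][1]]


if __name__ == "__main__":
    tables = [
        {"tableIndex": 0, "pageRange": [1, 1]},
        {"tableIndex": 1, "pageRange": [2, 4]},
    ]
    print([pages_spanned(t) for t in tables])
    print(tables_on_page(tables, 3))
```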
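Yes. Users can customize a request by specifying the desired output format: JSON, Excel (.xlsx), or CSV. This flexibility allows integration into a variety of applications and workflows.
The API provides an optional confidence score for each extracted table to indicate the reliability of the extraction. This helps users assess the quality of the returned data.
The API is designed to be stateless and privacy-friendly: no data is stored after processing. It uses secure, HTTPS-only communication to protect user data in transit.
Users can expect the API to handle empty cells gracefully, preserving the table's overall structure. The output reflects the original layout, so the data remains straightforward to work with despite missing values.
Confidence scores range from 0 to 1 and indicate how likely the extracted table is to be accurate. Higher scores indicate greater reliability, helping users decide which tables to trust for further processing.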
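A pipeline might use the confidence score to route tables either to automatic processing or to manual review. The sketch below shows one such split; the 0.8 threshold is an arbitrary illustrative value, not a recommendation from the API, and `split_by_confidence` is a hypothetical helper. Tables without a score are sent to review, since the score is described as optional.

```python
from typing import Dict, List, Tuple


def split_by_confidence(tables: List[Dict],
                        threshold: float = 0.8) -> Tuple[List[Dict], List[Dict]]:
    """Partition tables into (trusted, needs_review) by confidence score."""
    trusted: List[Dict] = []
    review: List[Dict] = []
    for table in tables:
        score = table.get("confidence")
        if score is not None and score >= threshold:
            trusted.append(table)
        else:
            # Low or missing scores both go to manual review.
            review.append(table)
    return trusted, review


if __name__ == "__main__":
    tables = [
        {"tableIndex": 0, "confidence": 0.85},
        {"tableIndex": 1, "confidence": 0.40},
        {"tableIndex": 2},
    ]
    trusted, review = split_by_confidence(tables)
    print([t["tableIndex"] for t in trusted], [t["tableIndex"] for t in review])
```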
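The `strategyUsed` field indicates the method the API used to extract the table data. This information can help users understand the extraction process and judge whether the output suits their needs.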