Marker
Convert PDFs and documents to clean Markdown at speed
Marker is profiled here as a Document Processing tool for engineering teams. Read about features, pricing, and how it compares to related options in the tools directory.
Description
Marker is an open-source document conversion tool from Datalab, the company started by Vik Paruchuri. It turns PDFs, Office files, and images into Markdown, JSON, or HTML while preserving headings, tables, equations, and reading order through a pipeline of specialized models. Marker runs locally and processes documents quickly on a GPU, which makes it practical for preparing large corpora for retrieval pipelines. An optional pass through a language model raises accuracy on dense tables and complex layouts. It supports forced OCR for scanned pages and batched conversion across a whole directory of files.
Key Capabilities:
- PDF, Office, and image conversion to Markdown, JSON, and HTML
- Layout-aware extraction of tables, headings, and reading order
- Equation conversion to LaTeX
- Optional LLM pass to raise accuracy on complex pages
- Batch processing tuned for GPU throughput
- Self-hostable with a commercial-use license tier for larger organizations
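As a rough illustration of the batch workflow described above, the sketch below assembles a command line for converting a whole directory. It assumes Marker is installed and exposes a `marker` command that accepts `--output_dir`, `--output_format`, `--use_llm`, and `--force_ocr`. Those names reflect the project's README as best understood here and may differ between releases, so check `marker --help` before relying on them. The helper `build_marker_command` and the directory names are hypothetical and exist only for this example. The script prints the command instead of running it.

```python
import shlex
from pathlib import Path

# Output formats named in the description above.
ALLOWED_FORMATS = {"markdown", "json", "html"}


def build_marker_command(input_dir, output_dir, output_format="markdown",
                         use_llm=False, force_ocr=False):
    """Build an argument list for a batch conversion run.

    Hypothetical helper: the executable name and flag spellings are
    assumptions about the Marker CLI and should be verified against
    the installed version.
    """
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported output format: {output_format!r}")

    cmd = [
        "marker",                     # assumed name of the batch entry point
        str(Path(input_dir)),         # directory of PDFs, Office files, images
        "--output_dir", str(Path(output_dir)),
        "--output_format", output_format,
    ]
    if use_llm:
        cmd.append("--use_llm")       # optional language-model pass
    if force_ocr:
        cmd.append("--force_ocr")     # re-run OCR, e.g. for scanned pages
    return cmd


if __name__ == "__main__":
    # Placeholder directory names, not paths from any real project.
    cmd = build_marker_command("papers", "converted", force_ocr=True)
    print(shlex.join(cmd))
    # To execute for real, pass the list to subprocess.run(cmd, check=True).
```

Building the command as a list rather than a single string avoids shell-quoting problems when directory names contain spaces. The LLM pass will likely also need credentials for whichever model service is configured, which this sketch does not cover.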
Alternative tools
- Mathpix
OCR for math, science, and technical documents
