PyMuPDF

PyMuPDF is a high-performance Python library for data extraction, analysis, conversion, and manipulation of PDF (and other) documents.

PyMuPDF4LLM aims to make it easier to extract PDF content in the format you need for LLM and RAG environments. It supports Markdown extraction as well as LlamaIndex document output. It also supports multi-column pages, image and vector graphics extraction (with references included in the Markdown text), and page chunking.
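
The Markdown extraction and page chunking described above can be sketched roughly as follows. This is a minimal illustration, not an official recipe: the helper names `extract_markdown` and `chunks_to_records` are hypothetical, and the code assumes that each page chunk is a dict with a "text" entry and a "metadata" dict carrying a 1-based "page" number.

```python
from typing import Any


def extract_markdown(pdf_path: str, per_page: bool = False) -> Any:
    """Return Markdown for a PDF: one string, or a list of per-page dicts
    when per_page is True (page chunking)."""
    # Imported lazily so the helper below works without the package installed.
    import pymupdf4llm  # pip install pymupdf4llm

    return pymupdf4llm.to_markdown(pdf_path, page_chunks=per_page)


def chunks_to_records(chunks: list[dict]) -> list[dict]:
    """Reduce page chunks to simple page/text records, skipping empty pages.

    Hypothetical helper for feeding a RAG index; not part of PyMuPDF4LLM.
    """
    records = []
    for chunk in chunks:
        text = chunk.get("text", "").strip()
        if not text:
            continue
        # Assumption: the chunk's "metadata" dict holds a 1-based "page" entry.
        page = chunk.get("metadata", {}).get("page")
        records.append({"page": page, "text": text})
    return records


# Example usage (the file name is a placeholder):
#   chunks = extract_markdown("input.pdf", per_page=True)
#   records = chunks_to_records(chunks)
```

For the LlamaIndex output mentioned above, the package also provides a `LlamaMarkdownReader` class whose `load_data` method takes a file path; it requires LlamaIndex to be installed separately.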

Related Blueprints

Blueprints

PyMuPDF

Llama.cpp

Streamlit

Query structured documents using a lightweight LLM workflow

This Blueprint demonstrates how to use open-source models and a simple LLM workflow to answer questions based on structured documents.
