Motivations and Technical Decisions
Motivations
Turning documents into podcasts has gained attention recently, driven by tools like Google’s NotebookLM. However, many options are closed-source, rely on proprietary models, and cater more to consumers than developers looking to build their own apps. Our Blueprint focuses on:
- Fully local setup: Standalone functionality with no third-party APIs, ensuring privacy and data control.
- Low compute requirements: Models optimized for local setups, even in GitHub Codespaces, without needing high-end GPUs or API keys.
- Customization: Flexible and extensible, with guidance on building basic apps using its components.
Technical Decisions
- Document Pre-processing: Chose Python’s re library for text cleaning, optimizing token usage for smaller models with limited context windows. However, we are exploring alternatives such as MarkItDown.
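As a minimal sketch of this kind of regex-based cleaning, the snippet below strips markup remnants and collapses whitespace before text is sent to the model; the specific patterns here are illustrative assumptions, not the Blueprint’s exact rules.

```python
import re

def clean_text(text: str) -> str:
    """Normalize raw document text before prompting a small model.

    The patterns below are illustrative, not the Blueprint's exact rules.
    """
    # Strip HTML-style tags left over from web or PDF extraction.
    text = re.sub(r"<[^>]+>", " ", text)
    # Remove bracketed reference markers such as [1] or [12].
    text = re.sub(r"\[\d+\]", "", text)
    # Collapse runs of whitespace so tokens aren't wasted on filler.
    text = re.sub(r"\s+", " ", text)
    return text.strip()

print(clean_text("<p>Transformers  [1] are neural networks.</p>"))
# → Transformers are neural networks.
```

Cleaning like this matters most for small context windows: every tag or duplicated space that survives pre-processing is a token the model cannot spend on actual content.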
- Podcast Script Generation:
  - Used llama_cpp for local, CPU-friendly inference; it supports the GGUF binary format, which enables quantized models that run efficiently on modest hardware.
  - Selected Qwen2.5-3B-Instruct-GGUF as the default model for its balance of output quality and resource usage.
  - Improved script consistency by constraining outputs to JSON via the response_format argument in llama_cpp.
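A sketch of how the JSON-constrained generation step might look with llama_cpp is below. The model path, the system prompt, and the `{"script": [{"speaker": ..., "text": ...}]}` schema are assumptions for illustration, not the Blueprint’s exact prompt or schema; `response_format={"type": "json_object"}` is llama_cpp’s OpenAI-compatible way of forcing valid JSON output.

```python
import json

def parse_script(raw: str) -> list:
    """Turn the model's JSON output into (speaker, line) pairs.

    Assumes a shape like {"script": [{"speaker": ..., "text": ...}]};
    this schema is a hypothetical example, not the Blueprint's own.
    """
    data = json.loads(raw)
    return [(turn["speaker"], turn["text"]) for turn in data["script"]]

def generate_script(document: str, model_path: str) -> list:
    """Run local CPU inference with llama_cpp, constraining output to JSON."""
    from llama_cpp import Llama  # pip install llama-cpp-python

    llm = Llama(model_path=model_path, n_ctx=4096)  # path to a GGUF file
    result = llm.create_chat_completion(
        messages=[
            # Hypothetical prompt; the real Blueprint prompt differs.
            {"role": "system", "content": (
                'Rewrite the document as a two-host podcast script. '
                'Respond only with JSON of the form '
                '{"script": [{"speaker": "...", "text": "..."}]}'
            )},
            {"role": "user", "content": document},
        ],
        # Constrain the model to emit syntactically valid JSON.
        response_format={"type": "json_object"},
    )
    return parse_script(result["choices"][0]["message"]["content"])
```

Parsing into structured (speaker, text) pairs is what makes the downstream audio step reliable: each turn can be routed to the right voice without fragile free-text parsing.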
- Audio Generation: Adopted OuteTTS for its consistent pronunciation of domain-specific words.
- Deployment options:
  - GitHub Codespaces for quick, remote usage with no local setup.
  - Local CLI and demo app for on-device usage.
  - GPU-enabled Colab notebooks for easy experimentation without setup.
  - HF Spaces for the hosted demo in the Blueprints Hub.