How I built a hybrid, local-first transcript scraping & synthesis tool to bypass YouTube's cloud IP blocking, automate structured market intelligence, and feed my personal wiki.
Building a growth-marketing tool takes more than wiring APIs together. It requires understanding low-level constraints and network challenges, and architecting systems that run reliably.
YouTube aggressively blocks requests from cloud hosting providers (AWS, GCP, DigitalOcean, Cloudflare) when they retrieve transcripts, returning immediate HTTP 403 errors or CAPTCHA challenges, so cloud-native scrapers fail quickly. To solve this, I designed a hybrid execution model: ingestion sits in the cloud, but transcript scraping executes from my local residential IP, which is not subject to those blocks.
A public Cloudflare Worker receives YouTube PubSubHubbub webhooks whenever a tracked channel publishes. It logs metadata in Cloudflare KV. A lightweight local node (running in Docker on a residential IP) uses **Trigger.dev v3** to subscribe to the task queue, pull down the transcript locally, call Google Gemini, and push the synthesis downstream.
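To make the ingestion step concrete, here is a minimal sketch of how a webhook payload could be turned into a metadata record before it is written to KV. YouTube's notifications arrive as Atom XML containing `yt:videoId` and `yt:channelId` elements; the `VideoNotification` shape and the helper names `firstTag`, `parseNotification`, and `hubChallenge` are assumptions for illustration, not the project's actual code.

```typescript
// Hypothetical record shape for the metadata the Worker would store in KV.
interface VideoNotification {
  videoId: string;
  channelId: string;
  title: string;
  published: string;
}

// Return the text of the first <tag>...</tag> in an XML string, or null.
// A regex is enough for this small payload; a real XML parser is safer in production.
function firstTag(xml: string, tag: string): string | null {
  const match = xml.match(new RegExp(`<${tag}(?:\\s[^>]*)?>([^<]*)</${tag}>`));
  return match?.[1]?.trim() ?? null;
}

// Convert an Atom notification into a record, or null when required fields are missing.
function parseNotification(xml: string): VideoNotification | null {
  const entryStart = xml.indexOf("<entry");
  if (entryStart === -1) return null; // no <entry> means no new-video data to record
  const entry = xml.slice(entryStart); // skip the feed-level <title>
  const videoId = firstTag(entry, "yt:videoId");
  const channelId = firstTag(entry, "yt:channelId");
  const title = firstTag(entry, "title");
  const published = firstTag(entry, "published");
  if (!videoId || !channelId || !title || !published) return null;
  return { videoId, channelId, title, published };
}

// The hub verifies a subscription with a GET request carrying hub.challenge;
// the callback has to echo that value back in the response body.
function hubChallenge(requestUrl: string): string | null {
  return new URL(requestUrl).searchParams.get("hub.challenge");
}
```

In a Worker, the GET handler would respond with the `hubChallenge` value and the POST handler would pass the request body to `parseNotification` before writing the result to KV under a key of your choosing.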
Not all content is equal. A technical tutorial needs step-by-step SOP extraction, whereas a business strategy video requires macroeconomic frameworks. I built a dynamic classification system in Hono that automatically categorizes incoming channels (Tactical, Ideation, Strategy, PKM) and feeds the transcript to Gemini with custom-tailored prompt templates.
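The classification idea can be sketched as a category lookup plus a per-category prompt template. Everything below is illustrative: the keyword rules, the prompt wording, and the function names `classifyChannel` and `buildPrompt` are assumptions, and the Hono route wiring and the Gemini call are left out.

```typescript
type Category = "Tactical" | "Ideation" | "Strategy" | "PKM";

// Hypothetical prompt templates, one per category.
const PROMPTS: Record<Category, string> = {
  Tactical: "Extract a step-by-step SOP from this tutorial transcript.",
  Ideation: "List the product or content ideas raised in this transcript.",
  Strategy: "Summarize the strategic and macroeconomic frameworks discussed.",
  PKM: "Extract note-taking and knowledge-management practices worth adopting.",
};

// Hypothetical keyword rules used to score a channel description.
const RULES: Array<{ category: Category; keywords: string[] }> = [
  { category: "Tactical", keywords: ["tutorial", "how to", "step-by-step", "walkthrough"] },
  { category: "Ideation", keywords: ["idea", "startup", "side project"] },
  { category: "Strategy", keywords: ["strategy", "market", "economy", "business model"] },
  { category: "PKM", keywords: ["note-taking", "obsidian", "second brain", "knowledge management"] },
];

// Pick the category whose keywords match the description most often.
function classifyChannel(description: string, fallback: Category = "Ideation"): Category {
  const text = description.toLowerCase();
  let best = fallback;
  let bestScore = 0;
  for (const rule of RULES) {
    const score = rule.keywords.filter((k) => text.indexOf(k) !== -1).length;
    if (score > bestScore) {
      best = rule.category;
      bestScore = score;
    }
  }
  return best;
}

// Combine the category's template with the transcript to form the LLM prompt.
function buildPrompt(category: Category, transcript: string): string {
  return `${PROMPTS[category]}\n\nTranscript:\n${transcript}`;
}
```

Keeping the templates in a single `Record` keyed by category means adding a new content type is one new entry in each table, and the compiler flags any category that lacks a prompt.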
I specialize in designing and engineering autonomous systems, growth tooling, and developer integrations. I build custom pipelines that solve real-world problems.
Click through the steps below to trigger a simulated pipeline run. Watch how data routes from YouTube's pub-sub system through Cloudflare, executes in Docker locally, calls Gemini, and delivers findings.
Follow the data journey from ingestion to delivery.
Building a robust automation requires thinking about every aspect of the project, from cost control to data integration.
A standalone Cloudflare Worker subscribes to YouTube's PubSubHubbub hub. New-video alerts arrive asynchronously via webhooks, and their state is persisted in Cloudflare KV.
Built to Manifest V3 guidelines, it serves as a local dashboard panel where users can filter categories, review the queue, configure prompts, and inspect processing logs.
Classifies videos into distinct categories (Ideation, Strategy, PKM, News, etc.) and routes transcript data through tailored LLM instructions so the extracted actions match the content type.
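As a rough sketch of the subscription step: a subscriber sends a form-encoded POST to the hub naming the channel's feed as the topic and the Worker's URL as the callback. The hub and topic URLs below follow YouTube's push-notification documentation; the function names and the five-day default lease are assumptions.

```typescript
// Google's public hub used for YouTube push notifications.
const HUB_URL = "https://pubsubhubbub.appspot.com/subscribe";

// Build the form body for a subscribe request.
// leaseSeconds is an illustrative default; the hub decides the lease it actually grants.
function buildSubscription(
  channelId: string,
  callbackUrl: string,
  leaseSeconds: number = 432000
): URLSearchParams {
  return new URLSearchParams({
    "hub.mode": "subscribe",
    "hub.topic": `https://www.youtube.com/xml/feeds/videos.xml?channel_id=${encodeURIComponent(channelId)}`,
    "hub.callback": callbackUrl,
    "hub.lease_seconds": String(leaseSeconds),
  });
}

// Send the request. Passing URLSearchParams as the body makes fetch use
// application/x-www-form-urlencoded. The hub then calls the callback with a
// hub.challenge value to confirm the subscription.
function subscribe(channelId: string, callbackUrl: string) {
  return fetch(HUB_URL, {
    method: "POST",
    body: buildSubscription(channelId, callbackUrl),
  });
}
```

Because subscriptions expire when their lease runs out, a scheduled job that re-sends the same request for each tracked channel keeps the alerts flowing.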
Orchestration runs on local Docker, synthesis uses the Gemini free-tier quota, and ingestion lives on Cloudflare's free tier. Run a fully automated knowledge center for $0/month.
All synthesized files are delivered as clean Markdown, ready to drop directly into Obsidian or Logseq, where they compile into an interlinked personal knowledge graph.
Instant updates on the go. Summaries push automatically as rich documents to Telegram, and daily email digests summarize key topics so you never miss anything.
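A minimal sketch of what such a Markdown note could look like, assuming YAML front matter for metadata and `[[wikilinks]]` for the connections between notes. The `Synthesis` fields and the `renderNote` name are hypothetical; the real output format may differ.

```typescript
// Hypothetical shape of one synthesized result.
interface Synthesis {
  title: string;
  videoId: string;
  category: string;
  topics: string[];
  summary: string;
}

// Render a note with YAML front matter and one wikilink per topic.
// Obsidian and Logseq both resolve [[Topic]] links into pages, which is
// what builds the interlinked graph.
function renderNote(s: Synthesis): string {
  const links = s.topics.map((t) => `- [[${t}]]`).join("\n");
  return [
    "---",
    `title: ${JSON.stringify(s.title)}`, // double-quoted, so colons in titles stay valid YAML
    `source: https://www.youtube.com/watch?v=${s.videoId}`,
    `category: ${s.category}`,
    "---",
    "",
    `# ${s.title}`,
    "",
    s.summary,
    "",
    "## Related",
    links,
    "",
  ].join("\n");
}
```

Writing the returned string to a `.md` file inside the vault folder is enough for either tool to pick the note up.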