AI & GTM Engineering Showcase

Local YouTube Knowledge Pipeline

How I built a hybrid, local-first transcript scraping & synthesis tool to bypass YouTube's cloud IP blocking, automate structured market intelligence, and feed my personal wiki.

The Product Builder's Perspective

Building a growth-marketing tool is more than wiring APIs together. It requires understanding low-level constraints and network challenges, and architecting systems that run reliably.

The Challenge: Bypassing YouTube's Aggressive IP Blocks

YouTube aggressively blocks transcript requests that originate from cloud hosting providers (AWS, GCP, DigitalOcean, Cloudflare), answering with HTTP 403 errors or CAPTCHA challenges almost immediately, so cloud-native scrapers fail quickly. To solve this, I designed a hybrid execution model: ingestion sits in the cloud, but transcript scraping executes from my local residential IP, which is not subject to those blocks.

The Architecture: Hybrid Cloud & Local Docker Sync

A public Cloudflare Worker receives YouTube PubSubHubbub webhooks whenever a tracked channel publishes and logs the video metadata in Cloudflare KV. A lightweight local node (running in Docker on a residential IP) uses Trigger.dev v3 to subscribe to the task queue, pull down the transcript locally, call Google Gemini, and push the synthesis downstream.
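
The ingestion half of this flow could be sketched as two pure functions: one that answers the hub's subscription verification request, and one that pulls the video and channel IDs out of the Atom notification body. The function names and return shapes below are illustrative assumptions rather than the project's actual code, and a regular expression stands in for a proper XML parser to keep the sketch dependency-free.

```typescript
// Hypothetical helpers for a PubSubHubbub (WebSub) subscriber.
// All names and shapes here are assumptions made for illustration.

interface VerificationReply {
  status: number;
  body: string;
}

// The hub verifies a subscription with a GET request carrying hub.mode,
// hub.topic and hub.challenge. The subscriber confirms by echoing the
// challenge with a 2xx status, or rejects with 404.
function answerVerification(
  query: Record<string, string | undefined>,
  expectedTopic: string,
): VerificationReply {
  const challenge = query["hub.challenge"];
  const mode = query["hub.mode"];
  const topicOk = query["hub.topic"] === expectedTopic;
  const modeOk = mode === "subscribe" || mode === "unsubscribe";
  if (!challenge || !topicOk || !modeOk) {
    return { status: 404, body: "" };
  }
  return { status: 200, body: challenge };
}

interface VideoNotification {
  videoId: string;
  channelId: string;
  title: string;
}

// Returns the text of the first <tag>...</tag> pair, or null if absent.
function firstTag(xml: string, tag: string): string | null {
  const match = xml.match(new RegExp(`<${tag}>([^<]*)</${tag}>`));
  return match?.[1]?.trim() ?? null;
}

// Reads the first <entry> of the Atom feed that YouTube posts to the callback.
// The feed-level <title> is skipped by searching inside the entry only.
function parseNotification(xml: string): VideoNotification | null {
  const entry = xml.match(/<entry>([\s\S]*?)<\/entry>/)?.[1];
  if (!entry) return null;
  const videoId = firstTag(entry, "yt:videoId");
  const channelId = firstTag(entry, "yt:channelId");
  const title = firstTag(entry, "title");
  if (!videoId || !channelId) return null;
  return { videoId, channelId, title: title ?? "" };
}
```

In a Worker, the GET handler would return the reply from `answerVerification`, and the POST handler would pass the request body to `parseNotification` before writing anything to KV.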

GTM Automation: Category-Specific Prompt Synthesis

Not all content is equal. A technical tutorial needs step-by-step SOP extraction, whereas a business strategy video requires macroeconomic frameworks. I built a dynamic classification system in Hono that automatically categorizes incoming channels (Tactical, Ideation, Strategy, PKM) and feeds the transcript to Gemini with custom-tailored prompt templates.
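
The routing idea can be illustrated with a small lookup: a channel maps to one of the four categories, and the category selects the prompt template that is prepended to the transcript. The template wording, the fallback category, and the function names are assumptions for illustration; only the category names and the DevMastery/Tactical pairing come from this page.

```typescript
type Category = "Tactical" | "Ideation" | "Strategy" | "PKM";

// Hypothetical channel-to-category table; real assignments would live in config.
const CHANNEL_CATEGORIES = new Map<string, Category>([
  ["DevMastery", "Tactical"],
]);

// Hypothetical prompt templates, one per category.
const PROMPT_TEMPLATES: Record<Category, string> = {
  Tactical: "Extract a step-by-step SOP from this tutorial.",
  Ideation: "List the product or content ideas raised in this video.",
  Strategy: "Summarize the strategic frameworks discussed in this video.",
  PKM: "Extract the note-taking and knowledge-management practices described.",
};

// Unknown channels fall back to a default category (an assumption).
function categoryFor(channel: string, fallback: Category = "Ideation"): Category {
  return CHANNEL_CATEGORIES.get(channel) ?? fallback;
}

// Builds the full prompt text sent to the model for one transcript.
function buildPrompt(channel: string, transcript: string): string {
  const category = categoryFor(channel);
  return `${PROMPT_TEMPLATES[category]}\n\nTranscript:\n${transcript}`;
}
```

Keeping the templates in a `Record<Category, string>` means the compiler flags any category that is added without a matching prompt.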

Nico

AI & GTM Engineer

I specialize in designing and engineering autonomous systems, growth tooling, and developer integrations. I build custom pipelines that solve real-world problems.

Core Focus: AI/GTM Automation
Favorite Stack: TS / Hono / Workers
Orchestration: Trigger.dev / Docker
AI Integration: Google Gemini API

The Local-First Orchestration Flow

The simulated pipeline run below shows how data routes from YouTube's pub-sub system through Cloudflare, executes locally in Docker, calls Gemini, and delivers findings.

Pipeline Steps

Follow the data journey from ingestion to delivery.

Pipeline Environment: YouTube PubSub → CF Worker → Cloudflare KV
YouTube Pipeline: 1 video pending ("Building AI Agents...", Channel: DevMastery)
Local Residential Node: Container: Active, Template: Tactical, Model: Gemini 3.5
Result: Task dQw4w9WgXcQ completed. Structured Markdown summary dispatched to chat.

Fully Integrated Solutions

Building a robust automation means thinking through every aspect of the project, from cost control to data integration.

Autopilot Ingestion

A standalone Cloudflare Worker subscribes to YouTube's PubSubHubbub hub. New video alerts arrive asynchronously as webhooks, and their state is persisted in Cloudflare KV.
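
One way to keep that state tidy is to derive a single KV key per video, so a repeated webhook for the same upload overwrites its entry instead of creating a duplicate. The key scheme, record fields, and status names below are assumptions for illustration. In a Worker, the resulting pair would be passed to the KV namespace's `put(key, value)`.

```typescript
type QueueStatus = "pending" | "processing" | "done";

// Hypothetical shape of one queued video.
interface QueueRecord {
  videoId: string;
  channelId: string;
  status: QueueStatus;
  receivedAt: string; // ISO 8601 timestamp
}

// One key per video keeps duplicate notifications idempotent.
function queueKey(videoId: string): string {
  return `video:${videoId}`;
}

// Builds the key/value pair to store when a notification arrives.
function toQueueEntry(
  videoId: string,
  channelId: string,
  now: Date,
): { key: string; value: string } {
  const record: QueueRecord = {
    videoId,
    channelId,
    status: "pending",
    receivedAt: now.toISOString(),
  };
  return { key: queueKey(videoId), value: JSON.stringify(record) };
}

// Moves a record one step forward; "done" is terminal.
function advance(record: QueueRecord): QueueRecord {
  const next: QueueStatus = record.status === "pending" ? "processing" : "done";
  return { ...record, status: next };
}
```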

Chrome Extension UI

Built on Manifest V3, the extension serves as a local dashboard panel where users can filter categories, review the queue, configure prompts, and inspect processing logs instantly.

Category-Specific Prompts

Classifies videos into distinct categories (Ideation, Strategy, PKM, News, etc.) and routes transcript data through tailored LLM instructions for hyper-relevant action extraction.

Zero-Cost Infrastructure

Orchestration runs on local Docker, synthesis uses the Gemini free-tier quota, and ingestion lives on Cloudflare's free tier. Run a fully automated knowledge center for $0/month.
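
Staying inside a free-tier quota generally means counting requests before making them. The sketch below is a minimal daily budget guard, assuming the local node calls it before each Gemini request. The class name is hypothetical, and the limit is a caller-supplied value, not a published Gemini quota.

```typescript
// Hypothetical guard that caps model calls per UTC day.
class DailyBudget {
  private used = 0;
  private day: string;
  private readonly limit: number;

  constructor(limit: number, now: Date = new Date()) {
    this.limit = limit;
    this.day = DailyBudget.dayOf(now);
  }

  // UTC calendar day in YYYY-MM-DD form.
  private static dayOf(date: Date): string {
    return date.toISOString().slice(0, 10);
  }

  // Returns true and counts the request if budget remains today.
  // The counter resets when the UTC day changes.
  tryConsume(now: Date = new Date()): boolean {
    const today = DailyBudget.dayOf(now);
    if (today !== this.day) {
      this.day = today;
      this.used = 0;
    }
    if (this.used >= this.limit) return false;
    this.used += 1;
    return true;
  }
}
```

When `tryConsume` returns false, the task can stay queued and be retried the next day rather than spilling into paid usage.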

Markdown / Obsidian Sync

All synthesized files are dispatched as clean Markdown, ready to drop directly into Obsidian or Logseq to automatically compile an interlinked personal knowledge graph.
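
A note that links itself into a knowledge graph needs little more than front matter and wikilinks. The sketch below assumes a hypothetical `Synthesis` shape and field names. The `[[...]]` link syntax is what Obsidian and Logseq use to connect pages.

```typescript
// Hypothetical shape of one synthesized result.
interface Synthesis {
  title: string;
  videoId: string;
  channel: string;
  category: string;
  summary: string;
  topics: string[];
}

// Renders a Markdown note with YAML front matter and wikilinked topics.
// JSON.stringify is used to quote strings safely for YAML.
function renderNote(s: Synthesis): string {
  const lines = [
    "---",
    `title: ${JSON.stringify(s.title)}`,
    `source: https://www.youtube.com/watch?v=${s.videoId}`,
    `channel: ${JSON.stringify(s.channel)}`,
    `category: ${JSON.stringify(s.category)}`,
    "---",
    "",
    `# ${s.title}`,
    "",
    s.summary,
    "",
    "## Related",
    ...s.topics.map((topic) => `- [[${topic}]]`),
  ];
  return lines.join("\n") + "\n";
}
```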

Telegram & Digest Emails

Instant updates on the go. Summaries push automatically as rich documents to Telegram, and daily email digests recap the key knowledge topics so you never miss anything.
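
A daily digest is essentially the day's summaries grouped by category. The sketch below builds a plain-text digest body. The `DigestItem` shape and the layout are assumptions for illustration, and sending the result by email is left out.

```typescript
// Hypothetical shape of one summarized video in the digest.
interface DigestItem {
  title: string;
  category: string;
  summary: string;
}

// Groups items by category (in first-seen order) and renders a text digest.
function buildDigest(items: DigestItem[], date: string): string {
  const byCategory = new Map<string, DigestItem[]>();
  for (const item of items) {
    const group = byCategory.get(item.category) ?? [];
    group.push(item);
    byCategory.set(item.category, group);
  }

  const lines: string[] = [`Daily digest for ${date}`];
  byCategory.forEach((group, category) => {
    lines.push("", `${category} (${group.length})`);
    for (const item of group) {
      lines.push(`- ${item.title}: ${item.summary}`);
    }
  });
  return lines.join("\n");
}
```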

Pipeline Technical Performance

Execution Speed: ~5.2s
Server Cost: $0.00
IP Blocks Bypassed: 100%
Time Saved / Wk: 10+ hrs