Free live masterclass · 90 minutes

Shipping RAG That Doesn't Break in Production

A free live session for Python engineers who've built a RAG demo that works on their laptop — and now need it to work in front of users.

Sunday, October 11, 2026 · 4:00 PM IST

Live on Zoom · 90 minutes · Recorded

Doors open 15 min early — come check your audio

Save my seat

Replay sent to everyone who registers — even if you can't make it live.

What you'll walk away with

Why pure vector search fails in production (and the hybrid pattern that fixes it)
A chunking strategy that doesn't lose meaning and doesn't balloon your token bill
How to evaluate a RAG pipeline without kidding yourself (retrieval vs generation)
The real latency + cost math — measured, not guessed
A debugging playbook for 'why did my RAG return garbage for this specific query'

Who this is for

You'll get the most out of this if:

You've built a RAG demo and are now being asked to put it in front of paying users
You're a Python engineer who can call the OpenAI / Anthropic API and want to build beyond it
You're tired of 'RAG in 10 lines of code' threads and want the actual production shape

Not for you if:

You're brand new to LLMs
You want a tutorial on calling an API

Agenda

0:00 A RAG pipeline returning wrong answers, diagnosed live

0:10 Embeddings, chunking, retrieval, reranking: the production shape

0:50 Evals that actually tell you if things got better

1:05 Q&A

1:20 What the 9-weekend cohort covers (optional to stay)

Your host

Arnab Pal

Engineer and founder. Has shipped production RAG and agent features on Anthropic + OpenAI with pgvector — for content platforms, customer support, and internal tooling. Runs Stackedtensor and arnabpal.me.

Save your seat

90 minutes live · Recording sent to everyone who registers

Already know you want to go deeper? See the full 9-weekend cohort →

FAQ

Do I need to know vector databases already?

Which model / API will you use?

Will the session be recorded?

Is this an LLM theory talk?
