Posts

Day 13 - AI Engineering - Building a RAG Pipeline

I Built a RAG Pipeline From Scratch. The LLM Was the Least Important Part.
AI Engineering Journey - Day 13
After 10+ years of building software, I thought I had a decent mental model of how systems work. Databases store data. APIs serve it. Frontends render it. Clean separation of concerns.
Then I started learning AI engineering, and for the first few days, my mental model was embarrassingly simple:
User types question -> LLM thinks hard -> Answer appears
That's it. That was my entire understanding of how ChatGPT-like systems work.
Today I built an actual RAG pipeline from scratch - embeddings, vector search, retrieval, prompt construction, LLM generation - everything wired together.
And the thing that hit me hardest? The LLM is the dumbest part of the system. It just generates text from whatever you hand it. All the intelligence is in what you choose...

Day 12 - AI Engineering - Why RAG Systems Need Vector Databases

Day 12 - Why RAG Systems Need Vector Databases
AI Engineering - Day by Day
My journey to becoming an AI Engineer
After learning about embeddings and chunking, I reached an interesting point in my AI engineering journey.
I understood:
- How text becomes vectors
- How retrieval works semantically
- Why chunking affects answer quality
But then a much bigger question appeared: What happens when the system has thousands or millions of chunks?
This is where I discovered: Vector Databases
The Problem with Naive Retrieval
Initially, my retrieval pipeline looked something like this:
Query
↓
Generate embedding
↓
Compare against all embeddings
↓
Return closest match
This works perfectly for:
- 5 chunks
- 20 chunks
- Small experiments
But it quickly breaks at scale. Imagine: 100,000+ chunks
Now for every query:
- The system compares against every vector
- Latency increases
- Memory usage grows
- Performance drops signific...
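The naive retrieval flow in this excerpt (embed the query, compare against every stored embedding, return the closest match) can be sketched roughly as below. This is only an illustration under assumptions: the function names `cosine_similarity` and `naive_search` are hypothetical, the three-dimensional vectors are hand-written stand-ins for real embeddings, and cosine similarity is assumed as the comparison metric since the excerpt does not name one.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def naive_search(query_vec, chunk_vecs):
    """Brute force: score the query against every chunk vector.

    Work grows linearly with the number of chunks, which is the
    scaling problem the excerpt describes.
    """
    best_index, best_score = -1, float("-inf")
    for i, vec in enumerate(chunk_vecs):
        score = cosine_similarity(query_vec, vec)
        if score > best_score:
            best_index, best_score = i, score
    return best_index, best_score


# Toy stand-ins for chunk embeddings (assumed values, not model output).
chunk_vecs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
query_vec = [0.9, 0.1, 0.0]

best_index, best_score = naive_search(query_vec, chunk_vecs)
print(best_index, round(best_score, 4))
```

Every query touches every vector here, so the cost per query is proportional to the number of chunks times the embedding dimension. That is fine for a handful of chunks and becomes the bottleneck as the collection grows.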

Day 11 - AI Engineering - Testing Chunking Strategies in RAG

Day 11 - Testing Chunking Strategies in RAG
AI Engineering - Day by Day
My journey to becoming an AI Engineer
In my previous post, I explored chunking conceptually and realized something important: Chunking is not just preprocessing; it directly affects retrieval quality.
But this time, I wanted to go beyond theory. I wanted to actually test:
- How different chunking strategies behave
- How retrieval scores change with each strategy
- Why some approaches fail and what that looks like in real numbers
- How embeddings work under the hood
- How to set up HuggingFace models locally
So I built a small experiment pipeline locally, and what I saw completely changed how I think about RAG systems.
The Goal of the Experiment
The idea was simple: build a mini retrieval pipeline from scratch:
Document
↓
Chunking Strategy (split text into pieces)
↓
Embeddings (convert each chunk into numbers)
↓
Similarity Search (compare query numbers to chunk n...

Day 10 - AI Engineering - Chunking in RAG

Day 10 - Chunking in RAG (The Most Underrated Part of AI Systems)
AI Engineering - Day by Day
My journey to becoming an AI Engineer
After learning about embeddings and semantic search, I started feeling like I finally understood how retrieval works.
But then I realized something important: Even perfect embeddings cannot save a badly chunked system.
And honestly, this completely changed how I think about RAG pipelines.
The Question That Started Everything
Once I understood embeddings, the next question became: What exactly are we embedding and retrieving?
The answer: Chunks of text
And this process of splitting documents into smaller pieces is called: Chunking
Why Chunking Exists
Documents are usually:
- Large
- Unstructured
- Too big for direct retrieval
For example:
- 100-page PDF
- Large knowledge base
- Long policy documents
We cannot simply embed an entire document as one giant block. So, we split it into smaller meaningful ...
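As a rough illustration of splitting a document into smaller pieces, here is a minimal sketch of one simple strategy: fixed-size word windows with overlap. The excerpt does not say which strategy the post uses, so the function name `chunk_words`, the parameters `chunk_size` and `overlap`, and the choice of word-based windows are all assumptions made for this example.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into chunks of up to `chunk_size` words.

    Consecutive chunks share `overlap` words so that a sentence cut
    at a boundary still appears whole in at least one chunk.
    (Hypothetical helper; sizes are illustrative, not recommendations.)
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


# A 12-word toy document, chunked into 5-word windows sharing 2 words.
document = " ".join(f"word{i}" for i in range(12))
chunks = chunk_words(document, chunk_size=5, overlap=2)
for chunk in chunks:
    print(chunk)
```

Fixed-size windows ignore sentence and section boundaries, so they are a baseline rather than a way to get "meaningful" chunks; splitting on paragraphs or headings is a common alternative.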

Day 9 - AI Engineering journey - RAG Learning - Embeddings

Day 9 - Embeddings (The Backbone of RAG Systems)
AI Engineering - Day by Day
My journey to becoming an AI Engineer
After understanding why prompting alone is not enough and how RAG changes the system design, I reached a point where one question became unavoidable: How does a system actually find the “right” information?
This is where I came across one of the most important concepts in modern AI systems: Embeddings
What I Initially Thought
At first, I assumed search would work like:
- Match keywords
- Find exact words
But that approach quickly breaks:
- "refund" ≠ "money back"
- "car" ≠ "vehicle"
And that’s when I realized: Machines don’t need to match words; they need to match meaning.
What is an Embedding?
An embedding is a way to convert text into numbers such that: Similar meaning → similar numerical representation
For example:
- "Apple is a fruit"
- "Banana is a fruit"
...
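The keyword-matching failure mentioned in this excerpt is easy to reproduce. The sketch below uses a hypothetical helper, `keyword_overlap`, that counts shared lowercase words between a query and a document; it is an assumed stand-in for "match keywords", not code from the post.

```python
def keyword_overlap(query, document):
    """Count the distinct lowercase words the two texts share."""
    query_words = set(query.lower().split())
    document_words = set(document.lower().split())
    return len(query_words & document_words)


# Same meaning, no shared words: exact matching finds nothing.
refund_score = keyword_overlap("refund", "money back")
car_score = keyword_overlap("car", "vehicle")

# Shared words are found, but only because the wording happens to match.
policy_score = keyword_overlap("refund policy", "our refund policy")

print(refund_score, car_score, policy_score)
```

Both synonym pairs score zero even though a person would call them equivalent, which is the gap embeddings are meant to close.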

Day 8 - AI Engineering - Why Prompting Alone Is Not Enough (Understanding RAG)

Day 8 - Why Prompting Alone Is Not Enough (Understanding RAG)
AI Engineering - Day by Day
My journey to becoming an AI Engineer
After building my LLM playground and experimenting with prompts, I started noticing something: No matter how good the prompt is… the model still fails in certain situations.
This made me question something fundamental: If LLMs are so powerful, why do they still struggle with real-world tasks?
That’s when I started exploring something called: Retrieval Augmented Generation (RAG)
The Problem I Ran Into
Before learning RAG, my approach was simple: Write a better prompt → Get a better answer
But this approach has clear limits:
- The model doesn’t have real-time knowledge
- It hallucinates even when it “knows” something
- It struggles with large or specific documents
Even after improving prompts, these issues didn’t go away.
Why Prompting Alone Fails
Prompting works well when:
- The question is general
- The m...

Day 7 - AI Engineering - Introduction to RAG - Why Giving a PDF to an LLM Doesn’t Work

Day 7 - Why Giving a PDF to an LLM Doesn’t Work
AI Engineering - Day by Day
My journey to becoming an AI Engineer
After building my first LLM playground, I had a simple thought: “If LLMs are so powerful, why not just give them the entire document and ask questions?”
At first, this feels like it should work. But when I started thinking deeper, I realized something important: This approach breaks in multiple ways, and understanding that is what leads to RAG.
🧠 My Initial Understanding
My first assumption was:
- Give full PDF to LLM
- It keeps it in context
- Ask questions → get answers
And technically… this can work for small inputs. But only under very limited conditions.
⚠️ Where This Approach Breaks
1. Context Window Limitation
LLMs can only process a fixed number of tokens.
- Large documents don’t fit
- Important information gets truncated
2. Attention Dilution
Even if the document fits: Too much information → weaker...