
Building My Own Nakama with RAG


Last year at React Miami, I was talking with a friend about the anime they were watching, and a popular one came up: One Piece. Many friends had recommended it to me, but the sheer number of episodes made me hesitate to start. My friend convinced me, and when I got home I started watching (at that point I didn't know about One Pace).

In December 2025 I finished watching One Piece, and it was an amazing experience. I loved the story, the references, the characters, and the world Eiichiro Oda created.

So maybe you're wondering: why build a RAG system for One Piece?

When I started the anime, I had a lot of questions. What did I do? Research, ask friends, fall into wiki rabbit holes. And I thought: what if I could build my own nakama - a virtual crewmate who's seen every episode, read the wiki, and can answer anything I ask?

Could I just Google everything? Sure. But I wanted something more - a companion for my One Piece journey. Someone who knows the lore, understands the terminology, and is always there when I need answers. That's why I decided to build this RAG system: my own nakama.

What we will cover in this blog post:

  • How I gave my nakama a brain (data pipeline and chunking)
  • RAG best practices that actually matter
  • The Buccaneer problem
  • What I learned
  • Going to production
  • When a nakama is (and isn't) the answer
  • How you can build your own

Let's start with the basics: what is RAG? RAG stands for Retrieval-Augmented Generation. Breaking down the acronym:

  • R: Retrieval: We have a knowledge base (in this case, One Piece data) that we can search for relevant information.
  • A: Augmented: We use the retrieved information to augment the input to a language model.
  • G: Generation: The language model generates a response based on the augmented input.

Giving My Nakama a Brain

Before my nakama could answer anything, I needed to give them knowledge. Where do you get One Piece data? The internet has plenty of sources. I started with three (you can always add more, but be careful - data quality matters more than quantity):

The Data Sources

1. Subtitles from One Pace. One Pace is a fan project that removes filler from the anime. I grabbed subtitles from 400+ episodes - every line of dialogue, every scene. This became the core of my nakama's memory.

2. Wiki Pages. The One Piece Wiki has detailed information about characters, races, Devil Fruits, and lore. I scraped key pages and translated them (the Spanish wiki often has more recent info).

3. Manga Transcripts. For the latest arcs not yet in the anime, manga transcripts fill the gaps.

The Chunking Challenge

You can't just dump entire episodes into a database. The text needs to be split into chunks small enough to search effectively, but large enough to preserve context.

# My chunking config
chunk_size: int = 500      # Max characters per chunk
chunk_overlap: int = 50    # Overlap between chunks

Why overlap? Imagine a sentence gets cut in half at a chunk boundary. With overlap, the important context carries into the next chunk.
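Here's roughly how that splitting works - a minimal sketch of a character-based chunker (the real pipeline also respects sentence boundaries, but the overlap mechanics are the same):

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks; the last `chunk_overlap`
    characters of each chunk are repeated at the start of the next."""
    if chunk_overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk size")
    step = chunk_size - chunk_overlap  # how far each chunk's start advances
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

Each chunk starts 450 characters after the previous one, so any sentence that straddles a boundary appears intact in at least one chunk.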

Cleaning the Data

Subtitles are messy. Music markers, formatting codes, speaker tags - all noise that confuses the search:

# Before cleaning
"♪ We Are! ♪ [LUFFY] I'm gonna be King of the Pirates!"

# After cleaning
"I'm gonna be King of the Pirates!"
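The cleanup itself is mostly regular expressions. A simplified sketch - the patterns below are illustrative guesses at typical subtitle noise (music markers, bracketed speaker tags, formatting codes), not the exact production rules:

```python
import re

def clean_subtitle_line(line: str) -> str:
    """Strip common subtitle noise from a line of dialogue."""
    line = re.sub(r"♪[^♪]*♪", "", line)      # music markers
    line = re.sub(r"\[[A-Z ]+\]", "", line)  # speaker tags like [LUFFY]
    line = re.sub(r"\{[^}]*\}", "", line)    # .ass formatting codes like {\an8}
    line = re.sub(r"<[^>]+>", "", line)      # markup like <i>...</i>
    return re.sub(r"\s+", " ", line).strip() # collapse leftover whitespace
```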

The result? 136,000+ clean chunks of One Piece knowledge, ready to be searched.

The Architecture

RAG Architecture Diagram

User asks a question → the query gets embedded → we search the vector database for similar chunks → rerank the results → feed the best chunks + question to the LLM → get an answer with context.

RAG Best Practices (What I Actually Implemented)

Building this nakama taught me that generic RAG tutorials skip the important stuff. This is what worked for me:

1. Hybrid Search: Don't Choose Between Semantic and Keyword

Most tutorials tell you to use vector search OR keyword search. I use both:

# Combine vector similarity (70%) with keyword matching (30%)
combined_score = vector_score * 0.7 + keyword_score * 0.3

Why? Vector search understands meaning ("Who is the rubber pirate?" finds Luffy), but misses exact terms. Keyword search catches specific names and techniques. Together, they cover more ground.
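In code, the blend is just a weighted sum - assuming both scores are already normalized to [0, 1] (cosine similarity on one side, a normalized keyword score on the other):

```python
def hybrid_score(vector_score: float, keyword_score: float,
                 vector_weight: float = 0.7) -> float:
    """Blend semantic and keyword relevance; both inputs in [0, 1]."""
    return vector_score * vector_weight + keyword_score * (1 - vector_weight)

def rank_hybrid(candidates: list[dict], top_k: int = 8) -> list[dict]:
    """candidates: dicts with 'vector_score' and 'keyword_score' keys."""
    return sorted(
        candidates,
        key=lambda c: hybrid_score(c["vector_score"], c["keyword_score"]),
        reverse=True,
    )[:top_k]
```

The 70/30 split worked for me, but it's worth tuning: lean harder on keywords if your users ask about exact names and attack techniques.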

2. Two-Stage Retrieval: Fast First, Accurate Second

Searching 136K chunks needs to be fast. But accuracy matters too. So I split it into two stages.

# Stage 1: Fast bi-encoder retrieves 3x more candidates
retrieval_top_k = final_top_k * 3  # Get 24 candidates if we need 8

# Stage 2: Slower cross-encoder re-ranks for precision
reranked = reranker.rerank(candidates, top_k=final_top_k)

The bi-encoder (embedding search) is fast - milliseconds to scan thousands of chunks. The cross-encoder is slower but more accurate - it compares the query directly against each candidate. You get speed AND accuracy.
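Here's the shape of the two-stage flow, with the scorers left pluggable. In the real system the fast scorer is embedding similarity and the slow one is a cross-encoder (e.g. sentence-transformers' CrossEncoder), but any cheap/expensive pair illustrates the pattern:

```python
from typing import Callable

def two_stage_retrieve(
    query: str,
    chunks: list[str],
    fast_score: Callable[[str, str], float],  # bi-encoder stand-in
    slow_score: Callable[[str, str], float],  # cross-encoder stand-in
    final_top_k: int = 8,
) -> list[str]:
    """Stage 1: cheap scorer over everything, keep 3x candidates.
    Stage 2: expensive scorer only over those candidates."""
    retrieval_top_k = final_top_k * 3
    candidates = sorted(
        chunks, key=lambda c: fast_score(query, c), reverse=True
    )[:retrieval_top_k]
    return sorted(
        candidates, key=lambda c: slow_score(query, c), reverse=True
    )[:final_top_k]
```

The expensive scorer only ever sees 24 chunks instead of 136K - that's the whole trick.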

3. Overlapping Chunks: Don't Lose Context at Boundaries

When building a RAG system, overlap is easy to ignore - until you lose answers you know are in your data. Here's what happens without it:

# Chunk 1 ends here:
"...Luffy ate the Gomu Gomu no Mi which is actually"

# Chunk 2 starts here:
"the Hito Hito no Mi Model: Nika, a mythical zoan fruit..."

Someone asks: "What is Luffy's real Devil Fruit?" Neither chunk answers it. The connection between them got split at the boundary.

With 50 characters of overlap:

# Chunk 1:
"...Luffy ate the Gomu Gomu no Mi which is actually the Hito Hito no Mi"

# Chunk 2:
"which is actually the Hito Hito no Mi Model: Nika, a mythical zoan fruit..."

Now both chunks contain the connection. The search can find it.

chunk_size: int = 500
chunk_overlap: int = 50  # 10% overlap - don't skip this

4. Lazy-Load Heavy Models

Embedding models are big. Loading them blocks your application startup. Instead, load on first use:

@property
def model(self) -> SentenceTransformer:
    """Lazy-load the embedding model on first access."""
    if self._model is None:
        self._model = SentenceTransformer(self._model_name)
    return self._model

Your API starts instantly. The model loads when someone actually asks a question.

5. Cache, But Be Smart About It

Repeated questions? Cache the response. But conversational follow-ups need fresh context:

# Cache standalone questions
if not conversation_history:
    cached = cache.get(question)
    if cached:
        return cached

# Skip cache for conversations (context-dependent)
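Putting that together, a simplified version of the caching logic (the cache here is a plain dict for illustration; in production I'd use something like Redis with a TTL):

```python
from typing import Callable

def answer(question: str, history: list[str], cache: dict[str, str],
           generate: Callable[[str, list[str]], str]) -> str:
    """Cache standalone questions only; follow-ups depend on history,
    so a cached answer could be wrong in the new context."""
    standalone = not history
    key = question.strip().lower()  # normalize so "Who is Luffy?" == "who is luffy?"
    if standalone and key in cache:
        return cache[key]
    response = generate(question, history)
    if standalone:
        cache[key] = response
    return response
```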

The Buccaneer Problem

Someone asked: "Is Whitebeard a buccaneer?"

My nakama said yes.

"Yes, Whitebeard is considered a buccaneer in the One Piece universe..."

Wrong. In One Piece, "Buccaneer" isn't just a pirate term - it's a race. Kuma is the last confirmed Buccaneer. Whitebeard is human.

(There's a fan theory that Whitebeard might have Buccaneer blood - similar build to Kuma, and Oda dropped some hints in Chapter 1100. Not confirmed though.)

So what happened? The retrieval found chunks about Whitebeard, not about the Buccaneer race. And without that context, the LLM fell back to what it knows: "buccaneer" = pirate. Technically correct in the real world, completely wrong in One Piece.

This is the thing about domain-specific terminology. One Piece has its own dictionary - Devil Fruits, Haki, specific races. If your data doesn't cover these terms explicitly, your RAG will hallucinate real-world meanings.

I fixed it two ways.

More data. I scraped wiki pages specifically about One Piece races:

WIKI_PAGES = [
    "Buccaneer",
    "Lunaria",
    "Gyojin",  # Fishmen
    "Bartholomew_Kuma",
    # ...
]

Better prompts. Your system prompt isn't just instructions - it's context the LLM always sees, regardless of what gets retrieved. I added a terminology section:

SYSTEM_PROMPT = """You are an expert on One Piece...

CRITICAL ONE PIECE TERMINOLOGY:
- "Buccaneer" is a RACE of people, NOT just a pirate term.
  Bartholomew Kuma is the last known Buccaneer.
- "Lunarian" is a race from the Red Line (King is a Lunarian)
- "Fishmen" and "Merfolk" are aquatic races
...
"""

Now when someone asks about Whitebeard and buccaneers:

"No, Whitebeard (Edward Newgate) is not a Buccaneer. In the One Piece universe, Buccaneers are a specific race of people, with Bartholomew Kuma being the last known representative..."

Much better.

What I Learned Building This

Domain vocabulary will break your RAG. One Piece has its own dictionary - Devil Fruits, Haki, races like Buccaneers. If your data doesn't define these terms, your RAG will hallucinate real-world meanings. Before building, list the weird terms your users might ask about.

Your system prompt is part of your knowledge base. I used to think prompts were just instructions. They're not - they're context the LLM always sees. The terminology section in my prompt catches stuff that retrieval misses.

Easy questions won't expose problems. "Who is Luffy?" works on day one. "Is Whitebeard a buccaneer?" breaks on day 30. Build a test set of weird questions early.

You're never done. I thought I'd build it, deploy it, move on. Nope. Data → Test → Fail → Fix → Repeat. The Buccaneer problem showed up weeks after I thought I was finished.

Going to Production

Building a RAG that works on your laptop is one thing. Putting it in front of real users? Different game.

Monitor what fails. Track which questions get weird answers. The Buccaneer problem? I only found it because someone told me. In production, you need logs. What queries returned low-relevance chunks? What made users ask again? That's your debugging goldmine.

Teach it to say "I don't know". A confident wrong answer is worse than admitting ignorance. I added a relevance threshold - if the best chunks score below 0.3, my nakama says it doesn't have enough info instead of guessing.
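A simplified version of that guard (the threshold value is from my setup; the function names are illustrative):

```python
from typing import Callable

FALLBACK = "I don't have enough information to answer that confidently."

def answer_or_abstain(
    question: str,
    ranked_chunks: list[tuple[str, float]],  # (chunk_text, relevance), best first
    generate: Callable[[str, str], str],
    min_relevance: float = 0.3,
) -> str:
    """Refuse to answer when even the best chunk is only weakly relevant."""
    if not ranked_chunks or ranked_chunks[0][1] < min_relevance:
        return FALLBACK
    context = "\n\n".join(chunk for chunk, _ in ranked_chunks)
    return generate(question, context)
```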

Plan for data updates. One Piece isn't finished. New episodes, new manga chapters, new lore. How do you add data without re-embedding everything? I batch new content separately and merge it. Not perfect, but it works.

Watch your costs. Every question = embedding call + LLM call. At scale, this adds up fast. Caching helps (I mentioned this earlier), but also consider: do you really need GPT-4 for every query? Sometimes a smaller model is enough.

Rate limit everything. Someone will try to break your system. Rate limiting isn't just about costs - it's about not letting one curious user crash your database with 1000 queries.

136K chunks sounds like a lot, but you can always add more. Just be careful - bad data creates bad answers. Quality > quantity, always.

When a Nakama Isn't the Answer

Could I just Google my One Piece questions? Yes. Is this over-engineering? Maybe.

When Google is enough:

  • Simple factual lookups ("When did Ace die?")
  • Checking episode numbers
  • Finding character images

When a nakama shines:

  • Questions that span multiple sources ("What connections exist between Joy Boy and the Buccaneers?")
  • Conversational follow-ups ("What about his Devil Fruit?")
  • Domain-specific terminology (the Buccaneer problem!)
  • Having a companion who understands context

I didn't build this just to answer questions. Using One Piece made it fun to show how RAG actually works - and how it fails. The Buccaneer problem says it all.

And honestly? It's fun having a nakama who knows One Piece as well as I do.

Build Your Own Nakama

Want to try it yourself? The code is open source.

Tech Stack:

  • FastAPI + Python (Backend)
  • PostgreSQL + pgvector (Vector DB)
  • sentence-transformers (Embeddings)
  • Next.js (Frontend)
git clone https://github.com/codeanding/onepiece-rag
cd onepiece-rag
docker-compose up -d
# Visit http://localhost:3000

Check the README for detailed setup instructions.

Final Thoughts

Building this showed me the best way to explain RAG - use a domain you love. The Buccaneer problem wasn't a bug; it was the perfect example of how RAG fails.

Every question it can't answer is a chance to teach it something new. That's the fun part.

If you're thinking about building a RAG system, my advice: pick a domain you love. You'll be debugging it at 2am, and you should at least enjoy the data.

Now if you'll excuse me, I need to go ask my nakama about the Void Century.

In the next entry, we'll explore a more real-world use case - RAG for the healthcare industry. Stay tuned!

Keep coding and learning!