I recently explored how to combine LangChain and OpenAI’s GPT models to create an intelligent Retrieval-Augmented Generation (RAG) assistant.
The project reads your local documents, stores them in a searchable vector database, and answers questions grounded in your own text.
What is RAG?
RAG (Retrieval-Augmented Generation) is a powerful technique that combines information retrieval with language generation.
Instead of relying only on the model’s built-in knowledge, RAG allows your assistant to:
- Retrieve relevant information from your own documents.
- Use that context to generate accurate and grounded answers.
This makes it perfect for building document-based assistants, knowledge bots, and enterprise Q&A tools.
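To make this concrete, here is a toy retrieve-then-generate sketch in plain Python. It is illustrative only: word overlap stands in for real embeddings, and the "generation" step just prints the prompt a real LLM would receive.
def embed(text: str) -> set[str]:
    # Stand-in "embedding": the bag of lowercase words in the text
    return set(text.lower().split())
def similarity(a: set[str], b: set[str]) -> float:
    # Jaccard overlap as a stand-in for cosine similarity
    return len(a & b) / len(a | b) if a | b else 0.0
documents = [
    "LangChain is a framework for building LLM applications.",
    "ChromaDB stores vector embeddings for similarity search.",
]
question = "What is LangChain?"
# Retrieve: pick the stored text most similar to the question
context = max(documents, key=lambda d: similarity(embed(d), embed(question)))
# Generate: a real system would now send this prompt to an LLM
print(f"Context: {context}\nQuestion: {question}")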
Tech Stack
Here’s what I used:
- LangChain — for chaining the retrieval and LLM steps.
- OpenAI GPT-4o-mini — as the language model (cheap and fast!).
- text-embedding-3-small — for embedding text into vector space.
- ChromaDB — as the vector database for storing document chunks.
- Python — to tie everything together.
Step-by-Step Implementation
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain_community.document_loaders import TextLoader
# 1. Load the document into Document objects
loader = TextLoader("knowledge.txt", encoding="utf-8")
docs = loader.load()
# 2. Split the document into 500-character chunks with 100-character overlap
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=100)
# chunks is a list[Document]
chunks = splitter.split_documents(docs)
# 3. Embed the chunks and create the vector store
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
# vectorstore is the knowledge base, where each chunk is stored in vector form
vectorstore = Chroma.from_documents(chunks, embeddings)
# 4. Create the retriever: it embeds your query, compares it with all
# stored vectors, and returns the top 3 most similar chunks
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
# 5. LLM
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.2)
# 6. Prompt template
prompt_template = """
You are an intelligent assistant. Use the following context to answer the question.
If the answer is not in the context, say "I don't know based on the given document."
Context:
{context}
Question:
{question}
"""
prompt = PromptTemplate(input_variables=["context", "question"], template=prompt_template)
# 7. Create the RAG Q&A chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    chain_type_kwargs={"prompt": prompt},
)
# 8. Interactive loop
print("RAG Documentation Q&A Assistant is launched! (Type 'exit' to quit)")
while True:
    query = input("\nYour Question:")
    if query.lower() == "exit":
        print("Bye!")
        break
    answer = qa_chain.invoke({"query": query})["result"]
    print("Answer:", answer)
Building the knowledge base (knowledge.txt)
LangChain is a framework for building applications based on Large Language Models (LLMs).
It allows developers to easily manage prompts, call models, store memory, and interact with external data sources.
LangChain also supports integration with models from OpenAI, Anthropic, Google Gemini, and others.
How It Works
1. Load your document (for example, knowledge.txt).
2. Split it into chunks so the model can process long texts efficiently.
3. Embed and store each chunk in a vector database (Chroma).
4. When you ask a question, the system:
- Embeds your query
- Finds the most similar chunks
- Sends both your question and the context to GPT
GPT then generates a contextual and accurate answer.
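To see roughly what GPT receives, you can assemble the prompt by hand with the retriever and prompt objects from the script above. This is just a debugging aid, assuming the default "stuff" chain, which joins the retrieved chunks into {context}:
# Reconstruct (approximately) the final prompt sent to the model
question = "What is LangChain used for?"
context = "\n\n".join(doc.page_content for doc in retriever.invoke(question))
print(prompt.format(context=context, question=question))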
Example Output
> python3 .\rag_local_qa.py
Your Question:What is LangChain used for?
Answer: LangChain is used for building applications based on Large Language Models (LLMs). It allows developers to manage prompts, call models, store memory, and interact with external data sources.
About ChromaDB
ChromaDB is a lightweight vector database designed for storing and searching vector embeddings, which are numerical representations of data such as text and images. It is easy to use and developer-friendly.
To install ChromaDB along with the other dependencies:
pip install langchain langchain-openai langchain-community chromadb tiktoken
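Outside of LangChain, ChromaDB can also be used directly through its native client. A minimal sketch (note: this uses Chroma's bundled default embedding function, not the OpenAI embeddings from the main script):
import chromadb
client = chromadb.Client()  # in-memory; use chromadb.PersistentClient(path="...") for disk
collection = client.create_collection("demo")
collection.add(
    ids=["1", "2"],
    documents=["LangChain chains LLM calls.", "Chroma stores embeddings."],
)
results = collection.query(query_texts=["What stores embeddings?"], n_results=1)
print(results["documents"])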
Final Thoughts
From a business standpoint, combining RAG, an LLM, LangChain, and a vector database makes it practical to build an internal knowledge management system. Organizations can quickly turn their existing documents, manuals, and FAQs into a smart, searchable knowledge assistant. It requires minimal development effort yet delivers huge productivity gains.