Building Your Own Basic RAG Pipeline with LangChain and Llama3

Tarannum S

LLM, RAG

According to Wikipedia, Retrieval Augmented Generation (RAG) can be defined as a technique that grants generative artificial intelligence models information retrieval capabilities. It modifies interactions with a Large Language Model (LLM) so that the model responds to user queries with reference to a specified set of documents, using this information to augment information drawn from its own vast, static training data.

So, what is a RAG pipeline? In simple words, a RAG pipeline retrieves relevant text with a retriever and asks a text generation model to use that text to augment its response. In more detail, it retrieves relevant documents or data chunks from a large corpus using a retriever, then feeds this information into a language model so that it can generate a contextually accurate response.

We can split the whole RAG pipeline into two flows: data ingestion and querying.

[Figure: the two flows of a RAG pipeline, data ingestion and querying]

How do I tell my Large Language Model (LLM) that it needs to search for answers in the data I have provided? You ingest the data into a vector store, and then get answers by invoking the LLM with the query and the retrieved context.

While building a RAG pipeline using OpenAI’s GPT may offer a simple solution, it involves sending your data to OpenAI’s servers, which raises security and privacy concerns. If confidentiality of information is the topmost priority, hosting your own LLM is the best way to ensure data privacy and security.

If we break down the RAG flow, ingestion consists of these parts:

  1. Load any document and extract text from it

  2. Chunk the document text into smaller parts

  3. Create embeddings for the text

  4. Store it in a vector store of your choice

Similarly, the querying part of the RAG flow breaks down into:

  1. Pass in a query

  2. Use the query to search for context in our vector store

  3. Take the query and the context and ask the LLM to provide an answer

Voila, you’ve built yourself a RAG chain.

Basically, you convert your data into vectors, as shown in the image below, so that machines can work with it. The text is converted into vectors that capture its meaning: similar ideas end up with similar vectors, which lets the system measure and compare the semantic similarity of different pieces of text. This mathematical representation enables efficient retrieval and reasoning over text data based on the meaning it encodes.

[Figure: text being converted into embedding vectors]
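To make the idea concrete, here is a small sketch (the cosine_similarity helper and the example sentences are hypothetical, and it assumes the same HuggingFaceBgeEmbeddings model used in the setup further below):

from langchain_community.embeddings import HuggingFaceBgeEmbeddings

embeddings = HuggingFaceBgeEmbeddings()

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

v1 = embeddings.embed_query("How do I reset my password?")
v2 = embeddings.embed_query("Steps to recover a forgotten password")
v3 = embeddings.embed_query("Quarterly revenue grew last year")

print(cosine_similarity(v1, v2))  # higher: both sentences are about passwords
print(cosine_similarity(v1, v3))  # lower: unrelated topics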

Chunking your document helps in accuracy. When large documents are split into smaller chunks, each chunk can be individually scored for relevance. This granular approach makes it easier for the retrieval model to focus on the specific parts of a document that are most relevant to the query. Make sure to not chunk it into very small chunks because the chunk might lose its meaning.
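To see how chunk size and overlap play out, here is a tiny, purely illustrative sketch with LangChain’s RecursiveCharacterTextSplitter (the same splitter the ingestion code below uses); the sample text and the deliberately small sizes are made up:

from langchain_text_splitters import RecursiveCharacterTextSplitter

sample_text = (
    "RAG pipelines retrieve relevant chunks from a vector store. "
    "Each chunk is scored for relevance to the query. "
    "Chunks that are too small lose their surrounding context and meaning."
)

# Deliberately tiny sizes so the splitting is visible; the ingestion code below uses 2000/200.
splitter = RecursiveCharacterTextSplitter(chunk_size=80, chunk_overlap=20)
for i, chunk in enumerate(splitter.split_text(sample_text)):
    print(i, repr(chunk))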

LangChain is a framework that makes these steps easier; it simplifies the whole process.

Enough talking now… should we look at some code?

If you want to host your LLM locally, you can use Ollama to get the LLM of your choice up and running.

  1. Install Ollama

  2. Run the command
    ollama run llama3

Let’s do some initial setup.

  1. We will be using ChromaDB as our vector store. Let’s set it up.

  2. Set up Ollama on your machine and pull the model you need (Llama3 in our case)

  3. Declare your ChromaDB client and embedding model, and give a collection name.

llm = Ollama(model="llama3")

client = chromadb.HttpClient(host="localhost", port=8083)
embedding_function = HuggingFaceBgeEmbeddings()
collection_name = "test-collection"

langchain_chroma = Chroma(
    client=client,
    collection_name=collection_name,
    embedding_function=embedding_function,
)
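The chromadb.HttpClient above expects a Chroma server already listening on port 8083; one way to start one (this command is an assumption about your setup, so check the Chroma docs for your version) is chroma run --path ./chroma-data --port 8083. A quick, purely illustrative sanity check that both servers are reachable:

print(client.heartbeat())  # raises if the Chroma server is not reachable
print(llm.invoke("Reply with one word: ready"))  # round-trips through the local Ollama server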

The populate_data method will store the data for a list of file paths passed as an argument.

  1. TextLoader, a document loader provided by LangChain, will load only .txt files

  2. RecursiveCharacterTextSplitter is a text splitter which splits text based on the chunk size and overlap size you provide.

  3. It will create a list of Documents with the filename as metadata and store them in ChromaDB. LangChain takes care of creating the collection, indexing, and storing.

def populate_data(filenames):
    all_docs = []
    for file_name in filenames:
        loader = TextLoader(file_name)
        documents = loader.load()
        text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
        docs: list[Document] = text_splitter.split_documents(documents)
        for doc in docs:
            doc.metadata = {"file_name": file_name}
        all_docs += docs
    langchain_chroma.add_documents(documents=all_docs)
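A hypothetical call (these file paths are made up for illustration):

populate_data(["./data/employee-handbook.txt", "./data/faq.txt"])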

The retrieve_answers method will take in a question and return a response. We pull the basic RAG prompt from the LangChain hub, but you can supply a custom prompt instead.

  1. context | format_docs passes the question through the retriever (Chroma), which returns Document objects, and then to format_docs, which joins them into a single string

  2. RunnablePassthrough() passes through the input question unchanged.

  3. The input to prompt is expected to be a dict with keys "context" and "question"

  4. The last step is where StrOutputParser() extracts the string content from the LLM’s output response.

  5. You invoke the rag_chain to retrieve answers.

def retrieve_answers(question):
    context = langchain_chroma.as_retriever()

    def format_docs(docs: list[Document]) -> str:
        """Join document page contents into a single string."""
        return "\n\n".join(doc.page_content for doc in docs)

    rag_prompt = hub.pull("rlm/rag-prompt")
    rag_chain = (
        {"context": context | format_docs, "question": RunnablePassthrough()}
        | rag_prompt
        | llm
        | StrOutputParser()
    )
    result = rag_chain.invoke(question)
    return result
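And a hypothetical query against whatever you ingested (the question is made up for illustration):

answer = retrieve_answers("What does the employee handbook say about remote work?")
print(answer)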
Putting it all together:

import chromadb
from langchain import hub
from langchain_chroma import Chroma
from langchain_community.document_loaders import TextLoader
from langchain_community.embeddings import HuggingFaceBgeEmbeddings
from langchain_community.llms.ollama import Ollama
from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_core.vectorstores import VectorStoreRetriever
from langchain_text_splitters import RecursiveCharacterTextSplitter

llm = Ollama(model="llama3")

client = chromadb.HttpClient(host="localhost", port=8083)
embedding_function = HuggingFaceBgeEmbeddings()
collection_name = "test-collection"

langchain_chroma = Chroma(
    client=client,
    collection_name=collection_name,
    embedding_function=embedding_function,
)


def populate_data(filenames):
    all_docs = []
    for file_name in filenames:
        loader = TextLoader(file_name)
        documents = loader.load()
        text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=200)
        docs: list[Document] = text_splitter.split_documents(documents)
        for doc in docs:
            doc.metadata = {"file_name": file_name}
        all_docs += docs
    langchain_chroma.add_documents(documents=all_docs)


def retrieve_answers(question):
    retriever: VectorStoreRetriever = langchain_chroma.as_retriever()

    def format_docs(docs: list[Document]) -> str:
        """Join document page contents into a single string."""
        return "\n\n".join(doc.page_content for doc in docs)

    rag_prompt = hub.pull("rlm/rag-prompt")
    rag_chain = (
        {"context": retriever | format_docs, "question": RunnablePassthrough()}
        | rag_prompt
        | llm
        | StrOutputParser()
    )
    result = rag_chain.invoke(question)
    return result
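As a rough guide, the imports above come from these packages (the exact pip names are an assumption based on current LangChain packaging, so adjust for your environment; hub.pull may additionally need the langchainhub package on older LangChain versions, and HuggingFaceBgeEmbeddings needs sentence-transformers):

pip install langchain langchain-community langchain-core langchain-chroma langchain-text-splitters chromadb sentence-transformers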