The world is evolving toward Large Language Models (LLMs), and every company wants to embrace the power of generative AI. As a result, LLM-powered applications built for custom tasks are surging.
This has given rise to AI frameworks like LangChain that simplify building custom AI applications on top of LLMs. Today, we're walking you through how to build AI applications using LangChain, complete with a full tutorial.
What is LangChain?
LangChain is an open-source framework that helps AI/ML/data engineers develop sophisticated applications powered by LLMs. Developers reach for it because it reduces coding complexity by providing ready-made building blocks and components. LangChain bridges the gap between LLMs and real-world applications: it integrates language models with the other components an application needs, including external databases (like a vector database), business logic and APIs, extending what LLMs can do on their own.
LangChain is composed of six modules/components.
- Large Language Models. LangChain serves as a standard interface for interacting with a wide range of LLM providers, from OpenAI to Hugging Face.
- Prompt construction. LangChain offers a variety of classes and functions designed to simplify the process of creating and handling prompts.
- Conversational memory. LangChain incorporates memory modules that enable the management and alteration of past chat conversations, a key feature for chatbots that need to recall previous interactions.
- Intelligent agents. LangChain equips agents with a comprehensive toolkit, allowing agents to choose which tools to utilize based on user input.
- Indexes. Indexes in LangChain are methods for organizing documents in a manner that facilitates effective interaction with LLMs.
- Chains. While using a single LLM may be sufficient for simpler tasks, LangChain provides a standard interface and some commonly used implementations for chaining LLMs together for more complex applications — either among themselves or with other specialized modules.
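To make the "chains" idea concrete, here is a plain-Python sketch of the concept — this is not LangChain's actual API, just an illustration of how a prompt step and a model step compose into one pipeline:

```python
# A plain-Python sketch of the "chain" idea: each step is a callable,
# and a chain pipes one step's output into the next. LangChain's real
# classes are richer, but the data flow is the same.

def make_chain(*steps):
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run

# Step 1: a "prompt template" that fills in the user's input
prompt_step = lambda topic: f"Explain {topic} in one sentence."

# Step 2: a stand-in for an LLM call (a real chain would call the model here)
fake_llm_step = lambda prompt: f"LLM response to: {prompt!r}"

chain = make_chain(prompt_step, fake_llm_step)
print(chain("vector databases"))
```

The point is that each module stays small and testable on its own, and the chain handles passing outputs along.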
All these modules help AI/ML engineers build robust AI applications. LangChain's adoption is skyrocketing because it abstracts away much of the complexity of wiring models, prompts, memory and data sources together by hand.
AI applications with LangChain: Tutorial
We will install and import the LangChain framework. Then, we will load a publicly available PDF, split the content of the PDF and store the chunks in a vector store like SingleStore. Finally, we’ll ask a query and retrieve the most accurate response for the query.
We will use the SingleStore Notebook feature to run our commands. SingleStore Notebooks are just like Jupyter Notebooks and Google Colab where you can do data analytics, machine learning, data exploration and build real-time AI applications.
Activate your free SingleStore trial to get started with Notebooks. You will receive $600 worth of free credits, and you can also try our Free Shared Tier. The sign-up flow will also guide you to create your workspace by default. You can skip this step if you want, but we recommend you don't.
Click the Continue button and you will see your workspace being deployed and ready for your project. By default, the name of your workspace will be ‘my-workspace.’
In the main dashboard, click on the Develop tab and create a blank Notebook.
Next, click on New Notebook > New Notebook.
Create your Notebook with any name you’d like. You can also choose between personal or shared. Personal is good when you want to keep the admin access of the Notebook to yourself, and shared is ideal when you are working in a group and would like to share your Notebook with others.
Start working on your project. Once the Notebook playground/dashboard is ready, start running the following commands.
Install the required libraries from LangChain
```shell
!pip install langchain-community langchain-core --quiet
!pip install -U langchain-text-splitters --quiet
```
Load the PDF
```python
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("http://leavcom.com/pdf/DBpdf.pdf")
pages = loader.load_and_split()
data = loader.load()  # one Document per page of the PDF
```
Split and read the content of the PDF
```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

print(f"You have {len(data)} document(s) in your data")
print(f"There are {len(data[0].page_content)} characters in your document")

text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=0)
texts = text_splitter.split_documents(data)
print(f"You have {len(texts)} pages")
```
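RecursiveCharacterTextSplitter tries to break on natural boundaries (paragraphs, then sentences, then characters) until each piece fits the chunk size. A stripped-down, stdlib-only illustration of the core idea — fixed-size windows with optional overlap — looks like this (illustrative only, not LangChain's implementation):

```python
def chunk_text(text, chunk_size=2000, chunk_overlap=0):
    """Split text into fixed-size chunks, each overlapping the previous
    one by `chunk_overlap` characters (a simplified stand-in for what
    RecursiveCharacterTextSplitter does)."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

sample = "a" * 5000
chunks = chunk_text(sample, chunk_size=2000, chunk_overlap=0)
print(len(chunks))  # 3 chunks: 2000 + 2000 + 1000 characters
```

Overlap (nonzero `chunk_overlap`) helps when an answer spans a chunk boundary, at the cost of storing some text twice.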
Set up the database to store the contents of our PDF
```sql
%%sql
DROP DATABASE IF EXISTS lang_db;
CREATE DATABASE IF NOT EXISTS lang_db;
```

```sql
%%sql
DROP TABLE IF EXISTS pdf_docs1;
CREATE TABLE IF NOT EXISTS pdf_docs1 (
    id INT PRIMARY KEY,
    content TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
    vector BLOB
);
```
Mention the OpenAI API Key
```python
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
```
Create embeddings and insert them into the database
```python
import json
import sqlalchemy as sa
from langchain.embeddings import OpenAIEmbeddings
from singlestoredb import create_engine

conn = create_engine().connect()
embedder = OpenAIEmbeddings()

# Fetch all embeddings in one call
embeddings = embedder.embed_documents([doc.page_content for doc in texts])

# Build query parameters (note: insert the chunk's text, not the Document object)
params = []
for i, (doc, embedding) in enumerate(zip(texts, embeddings)):
    params.append(dict(id=i + 1, content=doc.page_content, vector=json.dumps(embedding)))

stmt = sa.text("""
    INSERT INTO pdf_docs1 (id, content, vector)
    VALUES (:id, :content, JSON_ARRAY_PACK_F32(:vector))
""")
conn.execute(stmt, params)
```
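`JSON_ARRAY_PACK_F32` converts the JSON array of floats into a packed binary blob of 32-bit floats, which is what the `BLOB` column stores. As a quick stdlib sketch of that packing (assuming little-endian float32, which is what the `_F32` suffix denotes):

```python
import json
import struct

def pack_f32(json_array_text):
    # Rough Python-side mirror of what JSON_ARRAY_PACK_F32 does inside
    # the database (assumption: little-endian 32-bit floats).
    values = json.loads(json_array_text)
    return struct.pack(f"<{len(values)}f", *values)

blob = pack_f32("[0.1, 0.2, 0.3]")
print(len(blob))  # 12 bytes: three 4-byte float32 values
```

Packing to float32 keeps the stored vectors compact: a 1,536-dimension embedding takes 6,144 bytes instead of the much larger JSON text.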
Check the embedding data
```sql
%%sql
SELECT JSON_ARRAY_UNPACK_F32(vector) AS vector
FROM pdf_docs1
LIMIT 1;
```
Ask the query
```python
query_text = "Will object-oriented databases be commercially successful?"
query_embedding = embedder.embed_documents([query_text])[0]

stmt = sa.text("""
    SELECT
        content,
        DOT_PRODUCT_F32(JSON_ARRAY_PACK_F32(:embedding), vector) AS score
    FROM pdf_docs1
    ORDER BY score DESC
    LIMIT 1
""")
results = conn.execute(stmt, dict(embedding=json.dumps(query_embedding)))
for row in results:
    print(row[0])
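`DOT_PRODUCT_F32` scores each stored vector by its dot product with the query embedding. In plain Python, the same score is just the element-wise products summed:

```python
def dot_product(a, b):
    # The same score DOT_PRODUCT_F32 computes inside SingleStore:
    # multiply the vectors element-wise and sum the results.
    return sum(x * y for x, y in zip(a, b))

# Higher score = more similar. OpenAI embeddings are unit-normalized,
# so this dot product equals cosine similarity and ranges from -1 to 1.
print(dot_product([1.0, 0.0], [1.0, 0.0]))  # 1.0 — same direction
print(dot_product([1.0, 0.0], [0.0, 1.0]))  # 0.0 — orthogonal
```

Ordering by this score descending therefore surfaces the chunk whose embedding points in the direction closest to the query's.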
Generate the final response from the retrieved text
```python
import openai

client = openai.OpenAI()
prompt = f"The user asked: {query_text}. The most similar text from the document is: {row[0]}"

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt},
    ],
)
print(response.choices[0].message.content)
```
You can also refer to the complete notebook code here.
LangChain has quickly become one of the most robust frameworks for building AI applications, providing the toolkit your application needs to run efficiently. Pairing it with a data platform like SingleStore gives you efficient storage for vector data, while hybrid search enables fast, accurate retrieval across both vector embeddings and full-text search.
It's time to build exciting AI applications. Get started with SingleStore.