How to Create Open-Source AI Apps with LangChain

Pavan Belagatti

Developer Evangelist

The world is evolving toward Large Language Models (LLMs), and every company wants to embrace the power of generative AI. Hence, there is a big surge in LLM-powered applications for various custom tasks.

This has given rise to AI frameworks like LangChain, which simplify building custom AI applications on top of LLMs, even ones tailored to specific tasks. Today, we're walking you through how to build AI applications using LangChain, complete with a full tutorial.

What is LangChain?

LangChain is an open-source framework that helps AI, ML and data engineers develop sophisticated applications powered by LLMs. Developers use LangChain because it reduces coding complexity and speeds up development by providing ready-made building blocks and components. LangChain bridges the gap between LLMs and real-world applications: it integrates language models with the components they need, including external databases (like a vector database), business logic and APIs, extending what LLMs can solve on their own.

LangChain is composed of six modules/components.

Image credits: ByteByteGo

  • Large Language Models. LangChain serves as a standard interface for interacting with a wide range of LLM providers, from OpenAI to Hugging Face.
  • Prompt construction. LangChain offers a variety of classes and functions designed to simplify the process of creating and handling prompts.
  • Conversational memory. LangChain incorporates memory modules that enable the management and alteration of past chat conversations, a key feature for chatbots that need to recall previous interactions.
  • Intelligent agents. LangChain equips agents with a comprehensive toolkit, allowing agents to choose which tools to utilize based on user input.
  • Indexes. Indexes in LangChain are methods for organizing documents in a manner that facilitates effective interaction with LLMs.
  • Chains. While using a single LLM may be sufficient for simpler tasks, LangChain provides a standard interface and some commonly used implementations for chaining LLMs together for more complex applications — either among themselves or with other specialized modules.
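To make the "chains" idea concrete, here is a plain-Python sketch (deliberately not the LangChain API) of how prompt construction, a model call and output parsing compose into a pipeline. The function names and the stand-in model are hypothetical:

```python
# Conceptual sketch: a "chain" is a pipeline of steps,
# each feeding its output to the next.

def build_prompt(question: str) -> str:
    # Step 1: prompt construction
    return f"Answer concisely: {question}"

def fake_llm(prompt: str) -> str:
    # Step 2: stand-in for an LLM call (a real chain would call a model here)
    return f"[model response to: {prompt}]"

def parse_output(raw: str) -> str:
    # Step 3: output parsing
    return raw.strip("[]")

def run_chain(question: str) -> str:
    # Compose the steps, exactly what a chain abstracts away
    return parse_output(fake_llm(build_prompt(question)))

print(run_chain("What is LangChain?"))
```

In real LangChain code, each of these steps is a reusable component (a prompt template, a model wrapper, an output parser) that you compose instead of hand-writing the glue.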

All these modules help AI/ML engineers build robust AI applications. LangChain’s adoption is skyrocketing because of the advantages it provides over traditional frameworks that are more complex.

AI applications with LangChain: Tutorial

We will install and import the LangChain framework. Then, we will load a publicly available PDF, split the content of the PDF and store the chunks in a vector store like SingleStore. Finally, we’ll ask a query and retrieve the most accurate response for the query.

We will use the SingleStore Notebook feature to run our commands. SingleStore Notebooks work just like Jupyter Notebooks and Google Colab: you can do data analytics, machine learning and data exploration, and build real-time AI applications.

Activate your free SingleStore trial to get started with Notebooks. You will receive $600 worth of free credits — you can also try our Free Shared Tier. The sign-up flow will also guide you to create your workspace by default. You can skip this step if you want, but we recommend you don’t.

Click the Continue button and you will see your workspace being deployed and ready for your project. By default, the name of your workspace will be ‘my-workspace.’

In the main dashboard, click on the Develop tab and create a blank Notebook.

Next, click on New Notebook > New Notebook.

Create your Notebook with any name you’d like. You can also choose between personal or shared. Personal is good when you want to keep the admin access of the Notebook to yourself, and shared is ideal when you are working in a group and would like to share your Notebook with others.

Start working on your project. Once the Notebook playground/dashboard is ready, start running the following commands.

Install the required libraries

!pip install langchain-community langchain-core --quiet
!pip install -U langchain-text-splitters --quiet
!pip install pypdf openai --quiet

Load the PDF

from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("http://leavcom.com/pdf/DBpdf.pdf")
data = loader.load()

Split and read the content of the PDF

from langchain.text_splitter import RecursiveCharacterTextSplitter

print(f"You have {len(data)} document(s) in your data")
print(f"There are {len(data[0].page_content)} characters in your document")

text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=0)
texts = text_splitter.split_documents(data)
print(f"You have {len(texts)} chunks")
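To see what chunk_size and chunk_overlap control, here is a simplified plain-Python sketch of fixed-window splitting. The real RecursiveCharacterTextSplitter is smarter (it recursively splits on separators like paragraphs and sentences before falling back to character counts), and split_text here is a hypothetical helper:

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    # Slide a window of chunk_size characters across the text;
    # each step advances by (chunk_size - chunk_overlap), so consecutive
    # chunks share chunk_overlap characters of context.
    chunks = []
    step = chunk_size - chunk_overlap  # assumes chunk_overlap < chunk_size
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

# A 5,000-character text with chunk_size=2000 and no overlap yields 3 chunks
print(len(split_text("a" * 5000, chunk_size=2000, chunk_overlap=0)))  # 3

# With overlap, adjacent chunks share characters
print(split_text("abcdefghij", chunk_size=4, chunk_overlap=2))
```

A non-zero overlap is often useful in retrieval pipelines so that a sentence cut at a chunk boundary still appears whole in at least one chunk.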

Set up the database to store the contents of our PDF

%%sql
DROP DATABASE IF EXISTS lang_db;
CREATE DATABASE IF NOT EXISTS lang_db;

%%sql
USE lang_db;
DROP TABLE IF EXISTS pdf_docs1;
CREATE TABLE IF NOT EXISTS pdf_docs1 (
    id INT PRIMARY KEY,
    content TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
    vector BLOB
);
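A note on the BLOB column: JSON_ARRAY_PACK_F32 stores each embedding as packed 32-bit floats, four bytes per dimension. Python's struct module shows the equivalent encoding, assuming little-endian float32 and using a toy three-dimensional vector:

```python
import struct

embedding = [0.1, -0.5, 0.25]  # toy 3-dimensional vector

# Pack as little-endian 32-bit floats, the same layout the F32
# pack/unpack SQL functions imply: 4 bytes per dimension
packed = struct.pack(f"<{len(embedding)}f", *embedding)
print(len(packed))  # 12 bytes: 3 floats x 4 bytes each

# Round-trip back to floats (0.1 picks up float32 rounding)
unpacked = struct.unpack(f"<{len(packed) // 4}f", packed)
print([round(x, 4) for x in unpacked])
```

A 1,536-dimensional OpenAI embedding therefore occupies 6,144 bytes per row in this layout.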

Set the OpenAI API key

import os
import getpass
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

Create embeddings and insert them into the database

import json
import sqlalchemy as sa
from langchain_community.embeddings import OpenAIEmbeddings
from singlestoredb import create_engine

conn = create_engine().connect()
embedder = OpenAIEmbeddings()

# Fetch all embeddings in one call
embeddings = embedder.embed_documents([doc.page_content for doc in texts])

# Build query parameters
params = []
for i, (doc, embedding) in enumerate(zip(texts, embeddings)):
    params.append(dict(id=i + 1, content=doc.page_content, vector=json.dumps(embedding)))

stmt = sa.text("""
    INSERT INTO pdf_docs1 (id, content, vector)
    VALUES (:id, :content, JSON_ARRAY_PACK_F32(:vector))
""")
conn.execute(stmt, params)

Check the embedding data

%%sql
SELECT JSON_ARRAY_UNPACK_F32(vector) as vector
FROM pdf_docs1
LIMIT 1;

Run the similarity query

query_text = "Will object-oriented databases be commercially successful?"
query_embedding = embedder.embed_documents([query_text])[0]

stmt = sa.text("""
    SELECT
        content,
        DOT_PRODUCT_F32(JSON_ARRAY_PACK_F32(:embedding), vector) AS score
    FROM pdf_docs1
    ORDER BY score DESC
    LIMIT 1
""")
results = conn.execute(stmt, dict(embedding=json.dumps(query_embedding)))
for row in results:
    print(row[0])
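Why does DOT_PRODUCT_F32 work as a similarity score here? OpenAI embeddings are returned normalized to unit length, so the dot product of two embeddings equals their cosine similarity. A quick plain-Python sketch with toy vectors:

```python
import math

def dot(a, b):
    # Dot product of two equal-length vectors
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    # Scale a vector to unit length (as embedding APIs typically do)
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

query = normalize([1.0, 2.0, 3.0])
doc_close = normalize([1.1, 2.1, 2.9])   # nearly the same direction
doc_far = normalize([-3.0, 0.5, -1.0])   # very different direction

# For unit vectors, dot product == cosine similarity
print(round(dot(query, doc_close), 3))   # close to 1.0
print(round(dot(query, doc_far), 3))     # negative: pointing away
```

Ordering rows by this score descending, as the SQL above does, therefore returns the chunk whose embedding points in the direction closest to the query's.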

Pass the most similar text to the LLM

import openai

client = openai.OpenAI()
prompt = f"The user asked: {query_text}. The most similar text from the document is: {row[0]}"

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt},
    ],
)
print(response.choices[0].message.content)

You can also refer to the complete notebook code here.

LangChain has quickly become one of the most robust frameworks for building AI applications, providing the toolkit your application needs to run efficiently. Pairing it with a data platform like SingleStore gives you efficient storage for vector data, while hybrid search enables fast, accurate retrieval across both vector embeddings and full-text search.

It's time to build exciting AI applications. Get started with SingleStore.

