How to Create Open-Source AI Apps with LangChain

Pavan Belagatti

Developer Evangelist

The world is evolving toward Large Language Models (LLMs), and every company wants to embrace the power of generative AI. Hence, there is a big surge in LLM-powered applications for various custom tasks.

This has given rise to AI frameworks like LangChain, which simplify building custom AI applications on top of LLMs, even ones tailored to specific tasks. Today, we're walking you through how to build AI applications using LangChain, complete with a full tutorial.

What is LangChain?

LangChain is an open-source framework that helps AI, ML and data engineers develop sophisticated applications powered by LLMs. Developers use LangChain because it reduces coding complexity and speeds up development by providing ready-made building blocks and components. LangChain bridges the gap between LLMs and real-world applications: it integrates language models with the components they need, including external databases (like a vector database), business logic and APIs, extending what LLMs can solve on their own.

LangChain is composed of six modules/components.

Image credits: ByteByteGo

  • Large Language Models. LangChain serves as a standard interface for interacting with a wide range of LLM providers, from OpenAI to Hugging Face.
  • Prompt construction. LangChain offers a variety of classes and functions designed to simplify the process of creating and handling prompts.
  • Conversational memory. LangChain incorporates memory modules that enable the management and alteration of past chat conversations, a key feature for chatbots that need to recall previous interactions.
  • Intelligent agents. LangChain equips agents with a comprehensive toolkit, allowing agents to choose which tools to utilize based on user input.
  • Indexes. Indexes in LangChain are methods for organizing documents in a manner that facilitates effective interaction with LLMs.
  • Chains. While using a single LLM may be sufficient for simpler tasks, LangChain provides a standard interface and some commonly used implementations for chaining LLMs together for more complex applications — either among themselves or with other specialized modules.
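To make the "chains" idea concrete, here is a plain-Python sketch (deliberately not the LangChain API) of how prompt construction, a model call and output parsing compose into a pipeline. The function names and the stand-in model are hypothetical:

```python
# Conceptual sketch: a "chain" is a pipeline of steps,
# each feeding its output to the next.

def build_prompt(question: str) -> str:
    # Step 1: prompt construction
    return f"Answer concisely: {question}"

def fake_llm(prompt: str) -> str:
    # Step 2: stand-in for an LLM call (a real chain would call a model here)
    return f"[model response to: {prompt}]"

def parse_output(raw: str) -> str:
    # Step 3: output parsing
    return raw.strip("[]")

def run_chain(question: str) -> str:
    # Compose the steps, exactly what a chain abstracts away
    return parse_output(fake_llm(build_prompt(question)))

print(run_chain("What is LangChain?"))
```

In real LangChain code, each of these steps is a reusable component (a prompt template, a model wrapper, an output parser) that you compose instead of hand-writing the glue.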

All these modules help AI/ML engineers build robust AI applications. LangChain’s adoption is skyrocketing because of the advantages it provides over traditional frameworks that are more complex.

AI applications with LangChain: Tutorial

We will install and import the LangChain framework. Then, we will load a publicly available PDF, split the content of the PDF and store the chunks in a vector store like SingleStore. Finally, we’ll ask a query and retrieve the most accurate response for the query.

We will use the SingleStore Notebook feature to run our commands. SingleStore Notebooks work just like Jupyter Notebooks and Google Colab: you can do data analytics, machine learning and data exploration, and build real-time AI applications.

Activate your free SingleStore trial to get started with Notebooks. You will receive $600 worth of free credits — you can also try our Free Shared Tier. The sign-up flow will also guide you to create your workspace by default. You can skip this step if you want, but we recommend you don’t.

Click the Continue button and you will see your workspace being deployed and ready for your project. By default, the name of your workspace will be ‘my-workspace.’

In the main dashboard, click on the Develop tab and create a blank Notebook.

Next, click on New Notebook > New Notebook.

Create your Notebook with any name you’d like. You can also choose between personal or shared. Personal is good when you want to keep the admin access of the Notebook to yourself, and shared is ideal when you are working in a group and would like to share your Notebook with others.

Start working on your project. Once the Notebook playground/dashboard is ready, start running the following commands.

Install the required libraries

!pip install langchain-community langchain-core --quiet
!pip install -U langchain-text-splitters --quiet
!pip install pypdf openai --quiet

Load the PDF

from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("http://leavcom.com/pdf/DBpdf.pdf")
data = loader.load()

Split and read the content of the PDF

from langchain.text_splitter import RecursiveCharacterTextSplitter

print(f"You have {len(data)} document(s) in your data")
print(f"There are {len(data[0].page_content)} characters in your document")

text_splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=0)
texts = text_splitter.split_documents(data)
print(f"You have {len(texts)} chunks")
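To see what chunk_size and chunk_overlap control, here is a simplified plain-Python sketch of fixed-window splitting. The real RecursiveCharacterTextSplitter is smarter (it recursively splits on separators like paragraphs and sentences before falling back to character counts), and split_text here is a hypothetical helper:

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    # Slide a window of chunk_size characters across the text;
    # each step advances by (chunk_size - chunk_overlap), so consecutive
    # chunks share chunk_overlap characters of context.
    chunks = []
    step = chunk_size - chunk_overlap  # assumes chunk_overlap < chunk_size
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

# A 5,000-character text with chunk_size=2000 and no overlap yields 3 chunks
print(len(split_text("a" * 5000, chunk_size=2000, chunk_overlap=0)))  # 3

# With overlap, adjacent chunks share characters
print(split_text("abcdefghij", chunk_size=4, chunk_overlap=2))
```

A non-zero overlap is often useful in retrieval pipelines so that a sentence cut at a chunk boundary still appears whole in at least one chunk.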

Set up the database to store the contents of our PDF

%%sql
DROP DATABASE IF EXISTS lang_db;
CREATE DATABASE IF NOT EXISTS lang_db;

%%sql
USE lang_db;
DROP TABLE IF EXISTS pdf_docs1;
CREATE TABLE IF NOT EXISTS pdf_docs1 (
    id INT PRIMARY KEY,
    content TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
    vector BLOB
);
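A note on the BLOB column: JSON_ARRAY_PACK_F32 stores each embedding as packed 32-bit floats, four bytes per dimension. Python's struct module shows the equivalent encoding, assuming little-endian float32 and using a toy three-dimensional vector:

```python
import struct

embedding = [0.1, -0.5, 0.25]  # toy 3-dimensional vector

# Pack as little-endian 32-bit floats, the same layout the F32
# pack/unpack SQL functions imply: 4 bytes per dimension
packed = struct.pack(f"<{len(embedding)}f", *embedding)
print(len(packed))  # 12 bytes: 3 floats x 4 bytes each

# Round-trip back to floats (0.1 picks up float32 rounding)
unpacked = struct.unpack(f"<{len(packed) // 4}f", packed)
print([round(x, 4) for x in unpacked])
```

A 1,536-dimensional OpenAI embedding therefore occupies 6,144 bytes per row in this layout.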

Set the OpenAI API key

import os
import getpass
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

Create embeddings and insert them into the database

import json
import sqlalchemy as sa
from langchain_community.embeddings import OpenAIEmbeddings
from singlestoredb import create_engine

conn = create_engine().connect()
embedder = OpenAIEmbeddings()

# Fetch all embeddings in one call
embeddings = embedder.embed_documents([doc.page_content for doc in texts])

# Build query parameters
params = []
for i, (doc, embedding) in enumerate(zip(texts, embeddings)):
    params.append(dict(id=i + 1, content=doc.page_content, vector=json.dumps(embedding)))

stmt = sa.text("""
    INSERT INTO pdf_docs1 (id, content, vector)
    VALUES (:id, :content, JSON_ARRAY_PACK_F32(:vector))
""")
conn.execute(stmt, params)

Check the embedding data

%%sql
SELECT JSON_ARRAY_UNPACK_F32(vector) as vector
FROM pdf_docs1
LIMIT 1;

Run the similarity query

query_text = "Will object-oriented databases be commercially successful?"
query_embedding = embedder.embed_documents([query_text])[0]

stmt = sa.text("""
    SELECT
        content,
        DOT_PRODUCT_F32(JSON_ARRAY_PACK_F32(:embedding), vector) AS score
    FROM pdf_docs1
    ORDER BY score DESC
    LIMIT 1
""")
results = conn.execute(stmt, dict(embedding=json.dumps(query_embedding)))
for row in results:
    print(row[0])
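Why does DOT_PRODUCT_F32 work as a similarity score here? OpenAI embeddings are returned normalized to unit length, so the dot product of two embeddings equals their cosine similarity. A quick plain-Python sketch with toy vectors:

```python
import math

def dot(a, b):
    # Dot product of two equal-length vectors
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    # Scale a vector to unit length (as embedding APIs typically do)
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

query = normalize([1.0, 2.0, 3.0])
doc_close = normalize([1.1, 2.1, 2.9])   # nearly the same direction
doc_far = normalize([-3.0, 0.5, -1.0])   # very different direction

# For unit vectors, dot product == cosine similarity
print(round(dot(query, doc_close), 3))   # close to 1.0
print(round(dot(query, doc_far), 3))     # negative: pointing away
```

Ordering rows by this score descending, as the SQL above does, therefore returns the chunk whose embedding points in the direction closest to the query's.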

Pass the most similar text to the LLM

import openai

client = openai.OpenAI()
prompt = f"The user asked: {query_text}. The most similar text from the document is: {row[0]}"

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt},
    ],
)
print(response.choices[0].message.content)

You can also refer to the complete notebook code here.

LangChain has quickly become one of the most robust frameworks for building AI applications, providing the toolkit your application needs to run efficiently. Pairing it with a data platform like SingleStore gives you efficient storage for vector data, while hybrid search enables fast, accurate retrieval across both vector embeddings and full-text search.

It's time to build exciting AI applications. Get started with SingleStore.

