Chat With PDF Using LangChain ChatGPT API And Python Streamlit

Hello Learners…

Welcome to the blog…

Table Of Contents

  • Introduction
  • Chat With PDF Using LangChain ChatGPT API And Python Streamlit
  • Summary
  • References

Introduction

This post shows how to chat with a PDF file using LangChain, the OpenAI ChatGPT API, and Streamlit in Python.

Chat With PDF Using LangChain ChatGPT API And Python Streamlit

Here we build a simple web application that lets you chat with PDF files using LangChain and Streamlit in Python.

Here we use the OpenAI ChatGPT API to generate the answers.

NOTE: This example uses a paid OpenAI API key. You don't need to buy one just to follow along for personal learning; consider buying one only if you plan to take this further.

You can get an OpenAI API key from the OpenAI website.

Now we are going to create a web interface using the Python Streamlit app framework.

First, install the required libraries. Open a terminal, optionally create and activate a Python virtual environment (if you are comfortable with venv), then run the command below.

pip install streamlit langchain openai tiktoken pymupdf chromadb

Next, create an app.py file and paste the code below into it.

import streamlit as st
from langchain.llms import OpenAI
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
import re
import fitz

def preprocess(text):
    text = text.replace('\n', ' ')
    # Raw string avoids an invalid-escape warning for \s
    text = re.sub(r'\s+', ' ', text)
    return text

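def pdf_to_text(uploaded_file, start_page=1, end_page=None):
    # Streamlit provides an in-memory file object rather than a path,
    # so open the PDF from its bytes
    doc = fitz.open(stream=uploaded_file.getvalue(), filetype="pdf")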
    total_pages = doc.page_count

    if end_page is None:
        end_page = total_pages

    text_list = []

    for i in range(start_page - 1, end_page):
        text = doc.load_page(i).get_text("text")
        text = preprocess(text)
        text_list.append(text)

    doc.close()
    return text_list

def generate_response(uploaded_file, openai_api_key, query_text):
    # Load document if file is uploaded
    if uploaded_file is not None:
        documents=pdf_to_text(uploaded_file)
        
        # Split documents into chunks
        text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
        texts = text_splitter.create_documents(documents)
        # Select embeddings
        embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
        # Create a vectorstore from documents
        db = Chroma.from_documents(texts, embeddings)
        # Create retriever interface
        retriever = db.as_retriever()
        # Create QA chain
        qa = RetrievalQA.from_chain_type(llm=OpenAI(openai_api_key=openai_api_key), chain_type='stuff', retriever=retriever)
        return qa.run(query_text)

# Page title
st.set_page_config(page_title='🦜🔗 Ask the Doc App')
st.title('🦜🔗 Ask the Doc App')

# File upload
uploaded_file = st.file_uploader('Upload an article', type='pdf')
# Query text
query_text = st.text_input('Enter your question:', placeholder = 'Please provide a short summary.', disabled=not uploaded_file)

# Form input and query
result = []
with st.form('myform', clear_on_submit=True):
    openai_api_key = st.text_input('OpenAI API Key', type='password', disabled=not (uploaded_file and query_text))
    submitted = st.form_submit_button('Submit', disabled=not(uploaded_file and query_text))
    if submitted and openai_api_key.startswith('sk-'):
        with st.spinner('Calculating...'):
            response = generate_response(uploaded_file, openai_api_key, query_text)
            result.append(response)
            del openai_api_key

if len(result):
    st.info(response)
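
The text-splitting step in the code above breaks the extracted page text into chunks before they are embedded. As a rough illustration of the idea only (this is not LangChain's implementation, and `split_into_chunks` is a hypothetical helper), a fixed-size splitter with optional overlap could look like this:

```python
def split_into_chunks(text, chunk_size=1000, chunk_overlap=0):
    """Split text into pieces of at most chunk_size characters.

    Hypothetical helper for illustration; consecutive chunks share
    chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new chunk advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reached the end of the text
    return chunks


page_text = "word " * 500  # 2500 characters of sample text
chunks = split_into_chunks(page_text, chunk_size=1000, chunk_overlap=0)
print(len(chunks), [len(c) for c in chunks])
```

Because `preprocess` removes newlines, a separator-based splitter may leave each page as a single chunk; a fixed-size split like this is one way to keep chunks close to the chosen size.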

After that, run the app using the command below:

streamlit run app.py

Chat With PDF Using LangChain and Streamlit Python Demo

Streamlit prints two URLs (a local URL and a network URL) and normally opens the app in your default browser. If it doesn't, open the local URL in any browser to see the app.

Here we upload a PDF file, enter a question, provide our paid OpenAI API key, and click the Submit button to get a response.

For this demo, we uploaded a 10-page PDF from a book on NLP and asked the question ‘What is NLP’.

The app then displays the response below the form.

Now we can upload any PDF file, ask questions, and get answers based on its content.
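
Under the hood, the app answers from the PDF by embedding each chunk, finding the chunks most similar to the question, and passing them to the model as context. Below is a toy sketch of that retrieval step. The 3-dimensional vectors stand in for real OpenAI embeddings, and the vectors, chunk texts, and `top_k_chunks` helper are all illustrative assumptions, not what Chroma does internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vector, chunk_vectors, chunks, k=2):
    """Return the k chunks whose vectors are most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vec), chunk)
        for vec, chunk in zip(chunk_vectors, chunks)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


# Made-up chunks and vectors, for illustration only
chunks = [
    "NLP stands for natural language processing.",
    "Chapter 2 covers tokenization.",
    "The index lists all figures.",
]
chunk_vectors = [[0.9, 0.1, 0.0], [0.5, 0.5, 0.1], [0.0, 0.1, 0.9]]
query_vector = [1.0, 0.0, 0.0]  # pretend embedding of "What is NLP"

print(top_k_chunks(query_vector, chunk_vectors, chunks, k=2))
```

With `chain_type='stuff'`, the retrieved chunks are placed together into a single prompt along with the question, which is why only the most relevant chunks are selected first.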

Asking a single question through app.py costs approximately $0.04 in API charges.

Larger PDF files cost more. The app also uses OpenAI Embeddings, which are charged as well.
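
To see why larger PDFs cost more, a back-of-the-envelope estimate can help. The sketch below assumes roughly 4 characters per token (a common rule of thumb for English text, not an exact count) and takes the per-1,000-token prices as parameters. The prices in the example call are placeholders, not actual OpenAI rates, and `estimate_tokens` and `estimate_cost` are hypothetical helpers. The generated answer's tokens are ignored for simplicity.

```python
def estimate_tokens(text, chars_per_token=4):
    """Rough token count: about 4 characters per token (an approximation)."""
    return -(-len(text) // chars_per_token)  # ceiling division


def estimate_cost(pdf_text, prompt_text, embedding_price_per_1k, llm_price_per_1k):
    """Approximate cost of one question.

    pdf_text    : all text extracted from the PDF (embedded in full)
    prompt_text : the question plus the retrieved chunks sent to the model
    Prices are per 1,000 tokens and must be supplied by the caller.
    """
    embedding_tokens = estimate_tokens(pdf_text)
    prompt_tokens = estimate_tokens(prompt_text)
    embedding_cost = embedding_tokens / 1000 * embedding_price_per_1k
    llm_cost = prompt_tokens / 1000 * llm_price_per_1k
    return embedding_cost + llm_cost


# Placeholder inputs and prices, for illustration only
pdf_text = "x" * 40000      # about 10,000 tokens of PDF text
prompt_text = "y" * 4000    # about 1,000 tokens of question plus context
cost = estimate_cost(pdf_text, prompt_text,
                     embedding_price_per_1k=0.001,  # placeholder price
                     llm_price_per_1k=0.02)         # placeholder price
print(f"Estimated cost: ${cost:.4f}")
```

Since app.py rebuilds the vector store on every submit, the embedding part of the cost is paid again for each question, and it grows with the size of the PDF.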

For more details, refer to the OpenAI pricing page.

Now you can experiment with different PDF files and build a similar system for your own documents.


Summary

In summary, combining LangChain, the ChatGPT API, and Streamlit in a Python application lets users interact with PDF documents through a simple web interface: they can extract information, ask questions, and hold a conversation with the PDF content.


References
