Building an LLM-powered Chatbot That Queries Documents -or- Celebrating the Work of System2's Summer Interns

Welcome to another edition of “In the Minds of Our Analysts.”

At System2, we foster a culture of encouraging our team to express their thoughts, investigate, pen down, and share their perspectives on various topics. This series provides a space for our analysts to showcase their insights.

All opinions expressed by System2 employees and their guests are solely their own and do not reflect the opinions of System2. This post is for informational purposes only and should not be relied upon as a basis for investment decisions. Clients of System2 may maintain positions in the securities discussed in this post.

Today’s post was written by our summer 2023 interns: Shubham Bhalala, Jackson Hejtmanel, Eklavya Jain, and Wesley Shi.


Background

The idea of Artificial Intelligence (AI) dates back to the 1950s and 60s, when researchers like Christopher Strachey developed programs to play checkers. At the time, AI was merely a collection of rule-based algorithms, and for the following decades most resources went into optimizing those algorithms.

By the start of this century, computing was still built on that same idea of AI, and only a few researchers were working on intelligent systems. With the advent of big data tools and powerful chips, however, the research began to flourish. This led to the booming era of generative AI, in which machines can understand human language and answer questions. The AI that enables this is the Large Language Model (LLM).

Does it sound like a Google search? Or maybe a database query?

Well, it is similar to a Google search, but with a component of intelligence. That intelligence is nothing but a machine (an LLM) that was fed a lot of data (text, tables, etc.) and asked to represent words as lists of numbers (called vectors or embeddings). It was shown how humans hold ordinary conversations and asked to remember how sentences are formed. So whenever a person asks a question, it finds the relevant data and forms a human-like answer.
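To make the vector idea concrete, here is a minimal sketch of comparing two pieces of text as embeddings. It assumes the 2023-era openai Python package and its text-embedding-ada-002 model; the helper function and example strings are our own illustration.

```python
# Minimal sketch of "words as vectors": embed two texts with OpenAI's
# embedding endpoint and measure how similar they are.
import numpy as np
import openai

openai.api_key = "sk-..."  # your API key here

def embed(text: str) -> np.ndarray:
    """Return the embedding vector for a piece of text (illustrative helper)."""
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

a = embed("How did Apple's sales do last quarter?")
b = embed("Apple's quarterly filing breaks down sales by product category.")

# Cosine similarity: values near 1 mean the two texts are about the same thing.
similarity = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(similarity)
```

This nearness-in-vector-space test is exactly what lets the model “find the relevant data” when a question comes in.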

What’s Generative AI?

https://makeameme.org/meme/i-dont-know-f6a7498754

To formally introduce the concept, LLMs are models that can perform a variety of language processing tasks, such as information retrieval, text generation, sentiment analysis, and chatbots. If you’re still wondering why “large” is in the name, it’s simply because of the amount of data they are trained on, which is large (ChatGPT 3.5 was reportedly trained on 570 GB of text).

One application we are interested in is the ability to retrieve information from a pool of documents. Every day, fundamental analysts have to go through piles of documents in different file formats to stay up to date with the latest developments. Using LLMs, we can summarize those documents in a matter of seconds. By uploading a few news articles and a handful of Apple’s recent SEC filings, analysts can get an overall idea of how Apple is performing, and can go as deep as asking how sales broke down by category and get a precise answer.

OpenAI, whose ChatGPT is the most popular LLM on the market right now (1.5 billion visits last month), has made its models available to consumers through an API at a very minimal rate ($0.0035 for 1k input and 1k output tokens combined). A few months back, Meta also made its Llama 2 model open for developers to use. We are currently leveraging both models to provide document querying and summarization features, with support for 9 different file formats.

How do we use Generative AI?

The recent buzz around Generative AI has been driven by the simplicity of new user interfaces that create highly coherent text (ChatGPT, Bard) and visually appealing graphics (DALL-E). ChatGPT, in particular, has found great popularity amongst students and the young working population, mostly because of its ability to generate seemingly accurate responses.

https://i.chzbgr.com/full/9767056384/h9ECD4DC9/person-realize-chatgpt-can-do-my-job-realize-chatgpt-can-do-my-job

https://global-uploads.webflow.com/5ef788f07804fb7d78a4127a/64255f3d8f7ba494d4a4d1df_a5XWQgL_460s.jpg

The world of investing has caught up with generative AI and has started incorporating it into day-to-day activities. GPT’s ability to synthesize breaking news headlines, understand financial data, and gauge online sentiment has transformed the market, allowing anyone to become a data analyst.

Well, why doesn’t everyone train a GPT-like large language model for their own problems? For a couple of reasons:

  1. Curating data and managing compute resources is expensive.

  2. It is unnecessary labor for, at best, marginal improvement.

Existing LLMs, such as the model underlying ChatGPT, already do a pretty good job at almost all text-based tasks, from summarization to writing code. So there is no need to spend time and money building a model that, in the best case, would only do as well as ChatGPT. The good news is that while OpenAI is not so “open” anymore, it does offer an API platform with its latest models.

Our interns used the OpenAI API to build an LLM-powered chatbot that lets users upload documents and query them. Since these answers are used by buy-side analysts to make investment decisions, maintaining the quality of responses is key. Accordingly, the prompts are engineered so that the model produces precise, non-hallucinated answers along with a traceback to the information source. Being able to trace a response back to its sources builds trust and lets users verify the response for themselves, in case they really want to. It also makes the model much more transparent and explainable.
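To give a flavor of that prompt engineering, here is an illustrative template (not our actual production prompt) that pushes the model toward grounded, traceable answers:

```python
# Illustrative prompt template: keep the model grounded in the retrieved
# context and make it show where each answer came from.
PROMPT = """You are an assistant for buy-side analysts.
Answer the question using ONLY the context below.
If the answer is not in the context, say "I don't know" instead of guessing.
After your answer, quote the exact passage(s) you relied on so the answer
can be traced back to its source.

Context:
{context}

Question: {question}
Answer:"""
```

The {context} and {question} placeholders get filled in at query time, as described in the Query workflow below.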

Things are about to get technical ⚠️.

While building this tool, we came across a brilliant Python framework for large language models called LangChain. It provides all the building blocks necessary for creating a chatbot, with integrations to the OpenAI API and much more.
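As a small example of those building blocks, LangChain ships with document loaders for many common file formats. Here is a minimal sketch of dispatching uploads to them; the loader classes come from the 2023-era langchain package, while the dispatch helper itself is our own illustration:

```python
# Sketch: pick the right LangChain document loader for an uploaded file
# based on its extension (only a few of the supported formats shown).
import os

from langchain.document_loaders import (
    CSVLoader,
    PyPDFLoader,
    TextLoader,
    UnstructuredWordDocumentLoader,
)

LOADERS = {
    ".csv": CSVLoader,
    ".docx": UnstructuredWordDocumentLoader,
    ".pdf": PyPDFLoader,
    ".txt": TextLoader,
}

def load_documents(path: str):
    """Return a list of LangChain Documents for a supported file."""
    ext = os.path.splitext(path)[1].lower()
    return LOADERS[ext](path).load()
```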

We have three basic workflows that run behind our app.

Workspaces & Shared Space:

We created the notion of workspaces in order to keep things organized. One way to organize your files is to put files for different names (companies or tickers) under different workspaces. However, some files, like news articles or stock index performance, can be shared across these workspaces.
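Conceptually, a workspace is just a named collection of files plus visibility into the shared space. The class and function names below are invented for illustration, not our actual code:

```python
# Hypothetical sketch of the workspace model: each workspace owns its own
# files, and queries can also draw on files in a common shared space.
from dataclasses import dataclass, field

@dataclass
class Workspace:
    name: str                                       # e.g. a company or ticker
    files: list[str] = field(default_factory=list)  # files private to this workspace

@dataclass
class SharedSpace:
    files: list[str] = field(default_factory=list)  # news articles, indices, etc.

def visible_files(workspace: Workspace, shared: SharedSpace) -> list[str]:
    """All files a query inside this workspace can draw on."""
    return workspace.files + shared.files
```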

Upload Files:

Another key workflow runs when a user uploads files. Assuming you know how embeddings work (go here if you don't), we split pages-long files into chunks and embed each chunk using OpenAI's API. These embeddings (vectors of numbers) are stored in something called a vector database, which allows fast retrieval based on similarity or maximal marginal relevance.
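Stitched together with LangChain, that upload pipeline is only a few lines. A minimal sketch using FAISS as the vector database (the file name is hypothetical, and our production store may differ):

```python
# Sketch of the upload workflow: split documents into overlapping chunks,
# embed each chunk with OpenAI, and index the vectors for fast retrieval.
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

docs = PyPDFLoader("apple-10q.pdf").load()  # or any loader from the sketch above

# Overlapping chunks so sentences aren't cut off mid-thought.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

# Each chunk becomes an embedding vector; FAISS indexes them for fast
# similarity (or maximal marginal relevance) lookup.
vectordb = FAISS.from_documents(chunks, OpenAIEmbeddings())
vectordb.save_local("vectorstore/apple")
```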

Query:

Once the data is embedded, the user is prompted to enter a query. The query is embedded (again, using OpenAI's API), and a retriever fetches the relevant chunks from our vector database. Finally, a prompt is assembled from an engineered instruction, the input query, and the retrieved context. The final prompt is fed into ChatGPT, and the response is rendered in the app.
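Here is roughly what that query workflow looks like when wired up with LangChain's RetrievalQA chain; this is a sketch, one of several ways to assemble it, reusing the illustrative PROMPT template and the FAISS index from earlier:

```python
# Sketch of the query workflow: embed the query, retrieve relevant chunks
# (here via maximal marginal relevance), build the prompt, and call ChatGPT.
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.prompts import PromptTemplate
from langchain.vectorstores import FAISS

# Reload the index saved in the upload step; wrap the illustrative PROMPT.
vectordb = FAISS.load_local("vectorstore/apple", OpenAIEmbeddings())
prompt = PromptTemplate(input_variables=["context", "question"], template=PROMPT)

qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0),
    retriever=vectordb.as_retriever(search_type="mmr", search_kwargs={"k": 4}),
    chain_type_kwargs={"prompt": prompt},
    return_source_documents=True,  # keep the traceback to the sources
)

result = qa({"query": "How did Apple's sales break down by category?"})
print(result["result"])            # the answer rendered in the app
print(result["source_documents"])  # the chunks the answer was grounded in
```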

Simple enough, right?

This application provides a huge benefit to our clients for two reasons:

  1. It saves significant brain power and time.

  2. It costs next to nothing (fun fact: it took $8 to build and test the entire app).

While this is just one use case, LLMs can be used to solve or facilitate plenty of other problems. Data science teams across all fields are developing their own chatbots, either fine-tuned or vanilla (something like ours). A really interesting example is how law firms are using these large language models to parse collections of documents (Reuters, Techopedia).

What next?

At heart, LLMs are language processors that get better with more data. There will always be room for improvement in building “large” models, because we generate terabytes of data every day. However, these LLMs are fruitless without simple user interfaces. Recently, companies have been trying to revolutionize the search engine by making results much more concise.

Additionally, a lot of chatbots have opened the door to unlimited data by incorporating live web searches, which seriously hampers the accuracy of these models. The generated responses show a high degree of hallucination, rendering them useless for real-world use.

Will this kill a lot of jobs?

If you’re reading this on a system with a multi-qubit quantum computing chip integrated, the answer is yes, and we are close enough to the era of Universal Basic Income.

Until then, it will affect a few jobs but not at scale because these models are not complete. They still require a human touch to guide them to perfection.

Matei Zatreanu