Building Large Language Model-powered AI Applications

A Look at Emerging Technical Stacks and Enabling Technologies

Xin Cheng
8 min read · May 8, 2023

Challenges of building LLM-powered apps

In this article I survey how to build AI applications powered by large language models (LLMs) and the emerging technologies that enable them. I have written several articles (1, 2, 3) on large language models and generative AI. However, there are two main challenges in building applications powered by LLMs:

  1. An LLM has no memory or state. How can we provide the LLM with the proper context from our own data?
  2. An LLM has a token limit (usually a few thousand tokens). We cannot feed it all of our data at once (the limit is inherent to the design, and doing so would be expensive even if there were no such limit).

Emerging LLM Tech stack

The following article describes the main components of the emerging LLM tech stack. It puts an embedding layer and an LLM programming framework in front of the LLM endpoint.

Machine learning tech stack with Large language model

A similar architecture is mentioned in this article.

However, how does this architecture solve the above challenges? Let's look at the application flow described in the following article.

Separate the knowledge from the language model. This allows us to leverage the semantic understanding of our language model while also providing our users with the most relevant information.

The approach for this would be as follows:

  1. User asks a question
  2. Application finds the most relevant text that (most likely) contains the answer
  3. A concise prompt with relevant document text is sent to the LLM
  4. User will receive an answer or ‘No answer found’ response
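Here is a minimal sketch of that four-step flow in Python. It assumes the openai package with the pre-1.0 API that was current when this article was written, an OPENAI_API_KEY set in the environment, and a tiny in-memory document list with brute-force cosine search standing in for a real vector database; the documents and question are made up for illustration.

```python
import numpy as np
import openai  # pre-1.0 openai-python API, current as of this article

documents = ["Our refund policy allows returns within 30 days.",
             "Support is available Monday to Friday, 9am-5pm."]

def embed(texts):
    # text-embedding-ada-002 returns 1536-dimensional embeddings
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
    return [np.array(d["embedding"]) for d in resp["data"]]

doc_vectors = embed(documents)

def answer(question, top_k=1):
    # Step 2: find the most relevant text via cosine similarity
    q = embed([question])[0]
    scores = [q @ v / (np.linalg.norm(q) * np.linalg.norm(v)) for v in doc_vectors]
    context = "\n".join(documents[i] for i in np.argsort(scores)[-top_k:])
    # Step 3: send a concise prompt with the relevant text to the LLM
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Answer only from the context. If the answer is not "
                        "in the context, reply 'No answer found'."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    # Step 4: return the answer (or 'No answer found')
    return resp["choices"][0]["message"]["content"]

print(answer("How long do I have to return a product?"))
```

The vector databases discussed later in this article (Chroma, Pinecone, Weaviate) replace the brute-force search once the document set grows.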

From the above article, we know that context is key. To ensure the language model has the right information to work with, we need to build a knowledge base that can be used to find the most relevant documents through semantic search. We must provide context within the limit the LLM can accept (we cannot just throw all of our data at the LLM and hope it magically returns what we want). To do this, we need:

  1. Embeddings, which encapsulate the semantic relationships between text strings
  2. Vector search technology that can search based on semantic similarity (see the sketch after this list)
  3. A knowledge layer, which can combine embeddings/vector search with knowledge-graph technology to find the correct context
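To illustrate points 1 and 2, the following toy example uses the open-source sentence-transformers library (one of the embedding options mentioned later in this article) to show that semantically related strings land close together in vector space; the sentences are made up.

```python
from sentence_transformers import SentenceTransformer, util

# A small open-source embedding model; any embedding API could be used instead.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "How do I reset my password?",
    "Steps to recover a forgotten login credential.",
    "The quarterly revenue grew by 12%.",
]
embeddings = model.encode(sentences, convert_to_tensor=True)

# Cosine similarity: higher score means more semantically related.
print(util.cos_sim(embeddings[0], embeddings[1]))  # related pair -> high score
print(util.cos_sim(embeddings[0], embeddings[2]))  # unrelated pair -> low score
```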

With the right context provided, the LLM can now return what you want. However, users may want not just a text response but actions. This is where the concept of an autonomous agent comes in: an agent takes actions on the user's behalf, connecting the digital world to the physical world. Since a workflow is now involved, you need a framework to orchestrate the different steps in that workflow.

Other articles cover common problems in building LLM apps:

  1. Intra-conversation/short-term memory: the LLM does not keep state, and there is a context limit (e.g. GPT-3.5-turbo has a context window of roughly 4K tokens). Common strategies (a buffer-window sketch follows this list):
     • Buffer-window memory, similar to ChatGPT: simply discard any messages that fall outside the context window, measured either by number of messages or by tokens.
     • Summarization: summarize earlier messages and attach the summary as context for the conversation.
     • Knowledge graph: build a graph of the entities, their attributes, and their relationships.
     • Vector store/database: save the entire conversation and query the top_k most relevant messages as context (this loses the sequential order of the conversation turns).
  2. Long-term memory with vector databases: chunking (fixed-size by tokens, split by sentence, overlapping chunking, recursive chunking, chunking by document format), embedding (fastText, SentenceTransformers, commercial APIs), storing in a vector database, retrieving relevant chunks, and sending them to the LLM in prompts.
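Here is a minimal sketch of the buffer-window strategy, trimming the conversation history to a token budget with tiktoken (the tokenizer library used by OpenAI chat models); the budget and example messages are arbitrary.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by gpt-3.5/gpt-4

def buffer_window(messages, max_tokens=3000):
    """Keep only the most recent messages that fit within the token budget."""
    kept, used = [], 0
    for msg in reversed(messages):          # walk backwards from the newest message
        n = len(enc.encode(msg["content"]))
        if used + n > max_tokens:
            break                           # older messages are simply discarded
        kept.insert(0, msg)
        used += n
    return kept

history = [{"role": "user", "content": "Hi"},
           {"role": "assistant", "content": "Hello! How can I help?"}]
context = buffer_window(history, max_tokens=3000)
```

Summarization and vector-store memory trade this simplicity for longer reach back into the conversation.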

Components of LLM-powered apps

Core components involved: LLMs, vector databases, agents
Use case 1: Using large language models with your own data to build a “corporate brain” for your organisation
Embeddings are numeric representations, held in multi-dimensional vectors, that measure the relatedness of text strings.

Embeddings are typically used in the following use cases (as per OpenAI):

  • Search (where results are ranked by relevance to a query string)
  • Clustering (where text strings are grouped by similarity)
  • Recommendations (where items with related text strings are recommended)
  • Anomaly detection (where outliers with little relatedness are identified)
  • Diversity measurement (where similarity distributions are analysed)
  • Classification (where text strings are classified by their most similar label)
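As an example of the clustering use case above, k-means can be run directly on sentence embeddings; sentence-transformers and scikit-learn are used here purely as illustrative choices, and the texts are made up.

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

texts = [
    "The stock market rallied today.",
    "Interest rates were raised by the central bank.",
    "The team won the championship game.",
    "The striker scored twice in the final.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(texts)

# Group the texts into two clusters by embedding similarity.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
for text, label in zip(texts, labels):
    print(label, text)
```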

Use case 2: Integrating “tools” into LLMs

Agents use an LLM to determine which actions to take and in what order to take them when analysing a user query, whereas tools are functions that agents can use to interact with the world, for example:

  • python_repl (Python shell to execute Python commands)
  • serpapi (search engine)
  • wolfram-alpha (search engine for answering questions about math, science, technology, culture, society and everyday life)
  • requests (get content from a URL)
  • terminal (execute a command)
  • llm-math (answer questions about math)
  • open-meteo-api (get weather information from the OpenMeteo API)
  • news-api (get information about the top headlines of current news stories)
  • google-search (wrapper around Google Search)
  • wikipedia (wrapper around Wikipedia)

The author gives an example of using serpapi to search for the date of an event and llm-math to calculate how many days ago that was from today; a minimal agent sketch follows.
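This sketch uses LangChain's agent API as it existed when this article was written; it assumes OPENAI_API_KEY and SERPAPI_API_KEY are set, and the question is a placeholder.

```python
from langchain.agents import AgentType, initialize_agent, load_tools
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)

# serpapi searches the web; llm-math lets the LLM do arithmetic reliably.
tools = load_tools(["serpapi", "llm-math"], llm=llm)

agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)

# The agent decides which tool to call and in what order.
agent.run("On what date did the last FIFA World Cup final take place, "
          "and how many days ago was that from today?")
```

The agent follows the ReAct pattern: the LLM reasons about which tool to call, observes the result, and repeats until it can produce a final answer.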

Techniques to overcome the token limit of LLMs

Chunking

During data preprocessing, you need to break very large documents into chunks, because later, when you create embeddings, there is also a limit on how long the input can be.

Since large language models have token limits, you need to apply chunking strategies to break large documents into pieces that fit within the limit. Chunking strategies:

  1. By paragraph with no overlap, e.g. 3 paragraphs per chunk (spaCy's doc.sents can split a document into sentences, which can then be grouped into paragraph-sized chunks)
  2. By paragraph with overlap, e.g. paragraphs 1–3, then 2–4, etc. (see the sketch below)
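A small sketch of overlapping chunking, using spaCy's sentence segmentation to build sliding windows of sentences; the window size, overlap, and model name are arbitrary choices, and the small English model is assumed to be installed.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes the small English model is installed

def chunk_by_sentences(text, sentences_per_chunk=3, overlap=1):
    """Split text into overlapping chunks of N sentences each."""
    sents = [s.text.strip() for s in nlp(text).sents]
    step = sentences_per_chunk - overlap   # how far the window slides each time
    chunks = []
    for start in range(0, len(sents), step):
        chunk = " ".join(sents[start:start + sentences_per_chunk])
        if chunk:
            chunks.append(chunk)
        if start + sentences_per_chunk >= len(sents):
            break
    return chunks
```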

Similar approaches are mentioned in the article below.

The following articles share a similar pattern of using text embeddings, a vector database, similarity search, and a GPT/large language model (knowledge embedding) to create a more intelligent chatbot:

Customize OpenAI’s GPT-3 to give an accurate answer based on your knowledge base and stay on a specific topic.

  1. Create a knowledge base database using embedding (stored in Chroma, the AI-native open-source embedding database).
  2. The semantic search of the knowledge base using the user question.
  3. Include the semantic search result(s) in the prompt with the same user question.
  4. Ask OpenAI GPT-3 to find the answer within the semantic search result(s).
  5. If GPT-3 finds an answer, it returns the answer.
  6. If GPT-3 does not find an answer, it returns “I’m sorry, but the given context does not provide information on …..”
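A minimal sketch of steps 1 through 4 above using Chroma (which embeds documents with its default embedding function) and the GPT-3 completions API that was current when this article was written; the documents, collection name, and question are placeholders.

```python
import chromadb
import openai

client = chromadb.Client()
collection = client.create_collection(name="knowledge_base")

# Step 1: build the knowledge base (Chroma embeds the documents automatically).
collection.add(
    documents=["Document about topic A ...", "Document about topic B ..."],
    ids=["doc-a", "doc-b"],
)

question = "What does the knowledge base say about topic A?"

# Step 2: semantic search of the knowledge base using the user question.
results = collection.query(query_texts=[question], n_results=2)
context = "\n".join(results["documents"][0])

# Steps 3-4: include the search results in the prompt and ask GPT-3 to answer.
prompt = (
    f"Answer the question using only the context below. If the answer is not "
    f"there, say \"I'm sorry, but the given context does not provide information "
    f"on that topic.\"\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"
)
completion = openai.Completion.create(
    model="text-davinci-003", prompt=prompt, max_tokens=200
)
print(completion["choices"][0]["text"].strip())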

There are currently two main ways to extend GPT models with your own knowledge base:

  1. Finetuning, covered in this post: a straightforward approach, but you have no control over the model response apart from the initial prompt engineering.
  2. Embeddings: a better approach for extending the model's domain-specific knowledge, allowing more flexibility and control over the generated output.

Process: use the OpenAI embeddings API to get embeddings for the documents, store them in a vector database (e.g. Pinecone, Weaviate) where you can search for similar text based on your question, use your existing knowledge base as the ground source of truth/context, and then pass this context in the prompt to OpenAI ChatGPT. A brief sketch with Pinecone follows.
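This sketch assumes the pinecone-client v2 API that was current at the time of writing, an existing 1536-dimension index, and placeholder keys and text.

```python
import openai
import pinecone

pinecone.init(api_key="YOUR_PINECONE_KEY", environment="us-west1-gcp")
index = pinecone.Index("knowledge-base")  # assumes a 1536-dimension index exists

def embed(text):
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=[text])
    return resp["data"][0]["embedding"]

# Index a chunk of the knowledge base, storing its text as metadata.
chunk = "Placeholder chunk of company documentation ..."
index.upsert(vectors=[("chunk-1", embed(chunk), {"text": chunk})])

# At question time, retrieve the most similar chunks and pass them to ChatGPT.
question = "What does the documentation say about refunds?"
matches = index.query(vector=embed(question), top_k=3, include_metadata=True)
context = "\n".join(m["metadata"]["text"] for m in matches["matches"])
```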

To create TaxGPT, the author took the following steps:

  • An embedding database of the Internal Revenue Codes was created, which was scraped from Bloomberg Tax.
  • An embedding database of the Internal Revenue Regulations was created, which was scraped from the Internal Revenue Service.
  • The embedding database of Internal Revenue Codes was queried using the tax question, which will yield a list of applicable Internal Revenue Codes (I.R.C.). This was done to assist in querying Internal Revenue Regulations since that database is big.
  • This list of I.R.C. sections was then appended to the tax question, and the embedding database of Internal Revenue Regulations was queried.
  • Finally, using GPT on search results, an answer that includes relevant citations can be generated.
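Here is a rough sketch of that two-stage retrieval, assuming two pre-populated Chroma collections with hypothetical names; the actual TaxGPT implementation is not shown in the source article.

```python
import chromadb

client = chromadb.Client()
# Hypothetical collections, assumed to be pre-populated with the scraped documents.
irc = client.get_or_create_collection("internal_revenue_codes")
regs = client.get_or_create_collection("internal_revenue_regulations")

question = "How is income from a short-term rental taxed?"

# Stage 1: query the (smaller) I.R.C. database to find applicable code sections.
code_hits = irc.query(query_texts=[question], n_results=5)["documents"][0]

# Stage 2: append the code sections to the question to narrow the search in the
# much larger regulations database, then retrieve the relevant regulations.
expanded_query = question + "\n" + "\n".join(code_hits)
reg_hits = regs.query(query_texts=[expanded_query], n_results=5)["documents"][0]

# The combined results are then sent to GPT to generate an answer with citations.
context = "\n\n".join(code_hits + reg_hits)
```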

In the next few articles, I will look into the details of these enabling technologies.
