
Building GenAI Apps Using AWS Bedrock: Architecture Patterns (4/5)

  • Writer: Archishman Bandyopadhyay
  • Jul 18, 2024
  • 11 min read

Various architecture patterns that can be implemented with Amazon Bedrock for building useful generative AI applications include: 


  • Text generation

  • Text summarization

  • Question answering

  • Chatbots

  • Code generation

  • LangChain agents

  • Agents for Amazon Bedrock


1. Text Generation and Text Summarization


1. Text generation

Text generation is a term used for any use case where the output of the model is newly generated text. You can use it to write articles, poems, blogs, books, emails, and so forth. In Amazon Bedrock, you can use various foundation models (FMs) for text generation tasks. 

The architecture pattern for text generation using Amazon Bedrock is illustrated in the following image. You can pass the Amazon Bedrock foundation model a prompt using an Amazon Bedrock playground or an Amazon Bedrock API call. The model will generate text based on the input prompt you provide.
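
The following is a minimal sketch of this call flow using the AWS SDK for Python (boto3). The model ID and request body follow the Amazon Titan Text format; the prompt, Region, and generation parameters are illustrative, so substitute whichever FM and settings you have access to.

import json
import boto3

# Create a Bedrock runtime client and send a text generation prompt.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

prompt = "Write a short product announcement email for a new hiking backpack."

body = json.dumps({
    "inputText": prompt,
    "textGenerationConfig": {"maxTokenCount": 512, "temperature": 0.7},
})

response = bedrock_runtime.invoke_model(
    modelId="amazon.titan-text-express-v1",
    body=body,
)

# The response body is a stream; parse it and print the generated text.
result = json.loads(response["body"].read())
print(result["results"][0]["outputText"])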


Text generation with LangChain

For text generation, you can also use a LangChain layer to add a conversation chain to specific text generation use cases. LangChain is a powerful open source library. It pairs well with some of the strongest text generation FMs on Amazon Bedrock to efficiently create conversations, text generation, and more.
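
As a sketch of this idea, the snippet below wraps a Bedrock FM in LangChain's conversation chain so that each new prompt carries the previous turns as context. The package paths and class names reflect the langchain_community integration and may differ across LangChain versions; the model ID and prompts are illustrative.

from langchain_community.llms import Bedrock
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

# Wrap a Bedrock FM as a LangChain LLM.
llm = Bedrock(
    model_id="amazon.titan-text-express-v1",
    model_kwargs={"maxTokenCount": 512, "temperature": 0.7},
)

# The conversation chain keeps prior turns in memory and prepends them to each prompt.
conversation = ConversationChain(llm=llm, memory=ConversationBufferMemory())

print(conversation.predict(input="Draft a tagline for a travel blog."))
print(conversation.predict(input="Now make it shorter and more playful."))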


2. Text summarization

Text summarization is a natural language processing (NLP) task that condenses the text from a given input while preserving the key information and meaning of the text. The following are the two ways to do summarization:


  • Select a subset of text from input that represents key ideas.

  • Create new sentences that capture the key concepts and elements of the source text.


With advances in generative AI and large language models (LLMs), the field of text summarization has witnessed significant improvements. LLMs are well suited for the task of generating summaries because of their effectiveness in understanding and synthesizing text. In this section, you will explore concepts, applications, and architecture patterns related to text summarization with large language models.


Text summarization techniques

There are two main types of text summarization techniques: extractive and abstractive. Each is described below.


A. Extractive Summarization: This technique involves identifying and selecting the most important words, phrases, or sentences from a source document and then concatenating them to form a summary. The selected elements are usually the most informative and representative parts of the text.


B. Abstractive Summarization: This technique involves generating new text rather than extracting existing sentences from the source document. It consists of creating new text summaries that capture the key ideas and elements of the source text. Abstractive methods should produce coherent text that is similar to human-generated summaries.


Architecture patterns for text summarization applications

When working with document summaries, the choice of application architecture pattern depends on the size of the document relative to the model context size. Other challenges with large documents include out-of-memory errors and hallucinations. You can use the following two architecture approaches to create a summarization application based on the document size.


A. Text summarization for small documents: You can apply this architecture pattern when the document text fits within the LLM's context window. A small piece of text (or a small file) is passed directly to the LLM, which responds with a summary of the text. The architecture pattern for this scenario is illustrated in the following figure.

Note: This approach will not work when the document or text size is large and can’t fit into the LLM context window.
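
A minimal sketch of the small-document case follows, assuming the same boto3 Bedrock runtime client and Titan request format as earlier; the file name is illustrative, and the whole document is sent in a single prompt.

import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Small file that fits entirely within the model's context window (illustrative name).
with open("meeting_notes.txt") as f:
    document = f.read()

prompt = f"Summarize the following text in three sentences:\n\n{document}"

response = bedrock_runtime.invoke_model(
    modelId="amazon.titan-text-express-v1",
    body=json.dumps({"inputText": prompt}),
)
print(json.loads(response["body"].read())["results"][0]["outputText"])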

B. Text summarization for large documents: For large documents or text, it is not possible to pass the entire text to the LLM because of the limited context size. To solve this problem, you can use a map-reduce architecture and apply the concepts of chunking and chaining prompts. 


The text summarization architecture for large documents includes the following steps (a sketch using LangChain follows the list):

  1. Split a large document into a number (n) of smaller chunks using tools such as LangChain.

  2. Send each chunk to the LLM to generate a corresponding summary.

  3. Append the next chunk to the first summary generated and summarize again.

  4. Iterate on each chunk to create a final summarized output.
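
A sketch of this pattern using LangChain's built-in summarize chain is shown below. The chain_type="map_reduce" setting summarizes each chunk and then combines the chunk summaries; the iterative append-and-resummarize flow in steps 3 and 4 corresponds more closely to LangChain's "refine" chain type, so choose whichever fits your use case. The model ID, chunk sizes, and file name are illustrative.

from langchain_community.llms import Bedrock
from langchain.chains.summarize import load_summarize_chain
from langchain.text_splitter import RecursiveCharacterTextSplitter

llm = Bedrock(model_id="amazon.titan-text-express-v1")

# Large document that exceeds the context window (illustrative name).
with open("annual_report.txt") as f:
    long_text = f.read()

# 1. Split the document into n smaller chunks.
splitter = RecursiveCharacterTextSplitter(chunk_size=4000, chunk_overlap=200)
docs = splitter.create_documents([long_text])

# 2-4. Summarize each chunk, then combine the chunk summaries into a final summary.
chain = load_summarize_chain(llm, chain_type="map_reduce")
print(chain.invoke({"input_documents": docs})["output_text"])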


2. Question Answering


1. Question answering architecture

Question answering is an important task that involves extracting answers to factual queries posed in natural language. Typically, a question answering system processes a query against a knowledge base containing structured or unstructured data and generates a response with accurate information. Ensuring high accuracy is key to developing a useful, reliable, and trustworthy question answering system, especially for enterprise use cases. Next, you will review two architecture patterns for question answering applications.


A. Generic base use case: With the generic base use case, the user can prompt the FM in Amazon Bedrock and get responses based on generic information the public will understand. The generic base use case lacks the element of personalization. 


In this pattern, the user asks questions about a small document. The document is processed by the FM on Amazon Bedrock to generate a response. This response might come directly from the base model without any specific retrievals, and it might lack personalization for use cases that require specific information and attention to detail. Examples include medical use cases, legal and financial documents, and so forth. This architecture pattern is illustrated in the following image.


B. Personalized and specific use cases: The following architectural pattern is used for personalized use cases. These use cases require specific responses and attention to detail in different domains, so using a base model alone is not sufficient. You can use the Retrieval Augmented Generation (RAG) technique to locate the relevant chunks of text that are best suited to answer the user's question. RAG then passes the best set of retrieved chunks to the FM as context so it can generate the response to the user.


RAG involves the following steps (a minimal sketch follows the list):

  1. The user asks a question.

  2. Domain-specific documents are converted into embeddings using the Amazon Titan Embeddings model. The embeddings are stored in a knowledge base (vector database) for subsequent retrieval.

  3. The user's question is used to retrieve the relevant chunks of data, which act as the context, from the knowledge base.

  4. The user's question and the retrieved context are then passed to the FM, which generates an accurate response for the user.
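
The snippet below is a minimal RAG sketch under this pattern, assuming LangChain with the Amazon Titan Embeddings model and an in-memory FAISS index standing in for the vector database; the document, question, and chunking parameters are illustrative.

from langchain_community.embeddings import BedrockEmbeddings
from langchain_community.llms import Bedrock
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter

embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1")
llm = Bedrock(model_id="amazon.titan-text-express-v1")

# Convert domain-specific documents into embeddings and store them (illustrative file).
with open("policy_handbook.txt") as f:
    chunks = RecursiveCharacterTextSplitter(chunk_size=1000).split_text(f.read())
vector_store = FAISS.from_texts(chunks, embeddings)

# Retrieve the chunks most relevant to the user's question.
question = "How many vacation days do new employees receive?"
context = "\n\n".join(
    doc.page_content for doc in vector_store.similarity_search(question, k=3)
)

# Pass the question plus the retrieved context to the FM.
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}"
)
print(llm.invoke(prompt))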


3. Chatbots


1. Conversational interfaces

You can use conversational interfaces, such as chatbots and virtual assistants, to enhance the user experience for customers. Chatbots use NLP and machine learning (ML) algorithms to understand and respond to user queries. You can use chatbots in a variety of applications, such as customer service, sales, and ecommerce, to provide quick and efficient responses to users. You can access chatbots through various channels, such as websites, social media platforms, and messaging apps. 


A basic architectural pattern of a chatbot use case with Amazon Bedrock is illustrated in the following diagram. 

This architecture includes the following steps:

  1. The user queries the chatbot.

  2. The chat history (if there is any) is passed on to the Amazon Bedrock model along with the user’s current query.

  3. The model then generates a response.

  4. The model passes the response back to the user.


In this basic use case, the user might enter a specific prompt, such as a question, to Amazon Bedrock. The questions asked and the responses generated are stored in a chat history. Based on the chat history and the current prompt from the user, Amazon Bedrock provides an accurate and helpful response.
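
The loop below is one way to sketch this pattern with the Bedrock Converse API in boto3: the accumulated chat history is sent along with every new user query. The model ID and example questions are illustrative.

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")
chat_history = []  # alternating user/assistant turns

def ask(user_query: str) -> str:
    # Add the current query, then send the whole history to the model.
    chat_history.append({"role": "user", "content": [{"text": user_query}]})
    response = bedrock_runtime.converse(
        modelId="amazon.titan-text-express-v1",
        messages=chat_history,
    )
    answer = response["output"]["message"]["content"][0]["text"]
    # Store the model's reply so later turns can refer back to it.
    chat_history.append({"role": "assistant", "content": [{"text": answer}]})
    return answer

print(ask("What is Amazon Bedrock?"))
print(ask("How is it different from Amazon SageMaker?"))  # answered with history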


Chatbot use cases

You can categorize the chatbot use cases as follows:

  • Chatbot (Basic): This is a zero-shot chatbot with an FM.

  • Chatbot using a prompt template (LangChain): This is a chatbot with some context provided in the prompt template.

  • Chatbot with a persona: This is a chatbot with defined roles, such as a career coach with human interactions.

  • Context-aware chatbot: This is a chatbot that passes context through an external file by generating embeddings.



2. Architecture for a context-aware chatbot


A simple architecture for a context-aware chatbot is shown in the following diagram. This architecture includes the following steps:

  1. The user asks a question (user query) to the LLM on Amazon Bedrock.

  2. The LLM combines the user query with the chat history to form a modified (standalone) question and sends it to the embeddings model.

  3. The modified question is converted to a vector embedding using the Amazon Titan Embeddings model.

  4. A similarity search is performed against the vector database. The result of the search is a set of relevant text chunks.

  5. The user query, the chat history, and the relevant text chunks are passed to the final FM, which generates an answer to the prompt (user query).

  6. The user query and the response from the FM are added to the chat history.

  7. The response (answer) is given to the user at the same time.

In this pattern, the user poses a question to Amazon Bedrock. The user-submitted query is converted to a vector embedding using the Amazon Titan Embeddings model. A similarity search compares the embedding of the user query with the vectors in the vector database. The result of the search is a set of relevant text chunks. The user query, chat history, and the relevant text chunks are submitted to the FM, and the FM returns an answer. 


The user query and the response from the FM are added to the chat history to support future conversations. The knowledge library is created ahead of time by creating embeddings of domain-specific documents and storing them in the vector database.


Based on the information stored as embeddings in the vector database and the current user prompt, a relevant answer to the prompt is generated by the final FM. The answer is added to the chat history so that it can be referenced in later turns, and it is returned to the user at the same time.


This is a brief description of the architectural pattern. Put more simply: the user's question is converted into an embedding, the embedding and the chat history are used to search the vector database for relevant data, and the FM then generates an accurate response, which is stored in the chat history and returned to the user at low latency.
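
As a sketch of this pattern, the snippet below uses LangChain's ConversationalRetrievalChain, which embeds the user query, retrieves relevant chunks from the vector store, and passes them together with the chat history to the FM. The sample documents, model IDs, and questions are illustrative, and FAISS stands in for a managed vector database.

from langchain_community.embeddings import BedrockEmbeddings
from langchain_community.llms import Bedrock
from langchain_community.vectorstores import FAISS
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1")
llm = Bedrock(model_id="amazon.titan-text-express-v1")

# Knowledge library built ahead of time from domain-specific documents (sample text).
vector_store = FAISS.from_texts(
    [
        "Our support desk is open 9am-5pm on weekdays.",
        "Premium customers get 24/7 phone support.",
    ],
    embeddings,
)

# The chain embeds the query, retrieves relevant chunks, and keeps the chat history.
chatbot = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=vector_store.as_retriever(),
    memory=ConversationBufferMemory(memory_key="chat_history", return_messages=True),
)

print(chatbot.invoke({"question": "When can I reach support?"})["answer"])
print(chatbot.invoke({"question": "What if I am a premium customer?"})["answer"])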


4. Code Generation


1. Coding and programming tasks

You can also use the foundation models in Amazon Bedrock for various coding- and programming-related tasks. Examples include code and SQL query generation, code explanation and translation, bug fixing, code optimization, and so forth. Using foundation models for coding-related tasks helps developers and data scientists rapidly prototype their ideas and use cases.


The following architecture pattern illustrates how to use the FMs in Amazon Bedrock for coding and programming tasks.

The steps are as follows: 

  1. The user enters a code prompt.

  2. A foundation model processes the input data.

  3. The model returns the generated code.


In this pattern, you provide a prompt in plain text to the foundation model. The prompt includes an instruction telling the model what to generate and, in some cases, a few code examples. You can use this architecture for several use cases, such as SQL query generation, completing certain tasks requiring specific code instructions, and so forth. 
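
A minimal sketch of this pattern follows: a plain-text prompt containing an instruction, a toy table schema, and one example query is sent to a Bedrock FM, which returns the generated SQL. The schema, example, and model ID are illustrative.

import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Instruction plus one worked example, followed by the request to complete.
prompt = """You generate SQL for a table orders(id, customer_id, total, created_at).

Request: total revenue per customer
SQL: SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id;

Request: number of orders placed in 2023
SQL:"""

response = bedrock_runtime.invoke_model(
    modelId="amazon.titan-text-express-v1",
    body=json.dumps({"inputText": prompt}),
)
print(json.loads(response["body"].read())["results"][0]["outputText"])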


5. LangChain and Agents for Amazon Bedrock


1. LangChain agents

Foundation models undergo extensive training on vast amounts of data. Despite their substantial natural language understanding capabilities, they cannot independently perform tasks like processing insurance claims or making flight reservations. This limitation arises from the necessity for access to the latest company or industry-specific data, which foundation models cannot obtain from up-to-date knowledge sources by default. Additionally, FMs cannot take specific actions to fulfill requests without a great deal of manual programming. 


Certain applications demand an adaptable sequence of calls to language models and various utilities depending on user input. The agent interface provides flexibility for these applications. An agent has access to a range of resources and selects which ones to use based on the user input. Agents can use multiple tools, and they can use the output of one tool as the input for the next.


The architecture is illustrated in the following image.

In this pattern, the agent has access to multiple tools, and it selects the appropriate tool at the time of processing based on the tool's description. Depending on your application’s requirements, tools can be straightforward, such as carrying out a web search, or they can be complex, such as carrying out mathematical calculations.
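
The snippet below sketches this pattern with LangChain's agent interface: a Bedrock FM is given two tools with natural-language descriptions and picks one based on the user input. The tool implementations are stand-ins, and the model ID is illustrative.

from langchain_community.llms import Bedrock
from langchain.agents import AgentType, Tool, initialize_agent

llm = Bedrock(model_id="anthropic.claude-v2")

def search_web(query: str) -> str:
    return "stub search result for: " + query  # stand-in for a real web search

def calculate(expression: str) -> str:
    return str(eval(expression))  # stand-in calculator, demo use only

# Each tool's description tells the agent when that tool is appropriate.
tools = [
    Tool(name="WebSearch", func=search_web,
         description="Look up current facts on the web."),
    Tool(name="Calculator", func=calculate,
         description="Evaluate arithmetic expressions."),
]

agent = initialize_agent(
    tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=True
)
print(agent.invoke({"input": "What is 23 * 7?"})["output"])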


2. Using ReAct to run agents on LangChain

You can run agents on LangChain by using one of two techniques: plan-and-execute or ReAct, which stands for reasoning and acting. The ReAct technique evaluates the prompt and determines the next step in solving the problem, runs that step, and then repeats the process until the LLM can answer the question. Plan-and-execute works a little differently: it determines the steps needed ahead of time and performs them sequentially. The ReAct steps are as follows (a schematic sketch follows the list):

  • Reason: The LLM evaluates the question to determine how to solve the request.

  • Select tool: Based on the information needed to solve the request, a list of tools is provided. The LLM determines which tool is best to get the data needed.

  • Action: Call the tool to get the data. The tool can run a search, query a database, call an API, or call a specialized LLM to provide a response.

  • Answer: After incorporating the tool's data into the prompt, the LLM checks whether there is enough information to respond to the prompt or whether additional data is needed. If there is enough, it continues to create the response. If not, it goes back to the select tool step to find a tool that can provide the missing data.

  • Respond: With all the data collected, the LLM creates a response to the prompt and returns the result.
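
The following is a schematic, runnable sketch of that loop in plain Python. The llm_decide and llm_respond functions are hypothetical stand-ins for calls to a Bedrock FM, and the single search tool is a stub; a real application would replace each with model invocations and working tools.

def llm_decide(question: str, notes: str):
    # Stand-in for the "reason" and "select tool" steps: decide whether a tool is needed.
    if "capital" in question and "search:" not in notes:
        return ("search", question)  # (tool to call, tool input)
    return (None, None)  # enough information to answer

def llm_respond(question: str, notes: str) -> str:
    # Stand-in for the "respond" step.
    return f"Answer based on: {notes.strip()}"

def search(query: str) -> str:
    return "search: Paris is the capital of France."  # stand-in tool

TOOLS = {"search": search}

def react(question: str, max_steps: int = 3) -> str:
    notes = ""
    for _ in range(max_steps):
        tool, tool_input = llm_decide(question, notes)  # reason + select tool
        if tool is None:
            break  # answer: the collected data is sufficient
        notes += "\n" + TOOLS[tool](tool_input)  # action: call the tool
    return llm_respond(question, notes)  # respond

print(react("What is the capital of France?"))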


3. Agents for Amazon Bedrock 

Agents for Amazon Bedrock is a fully managed capability that makes it more efficient for developers to automate tasks. With agents for Amazon Bedrock, FMs can understand user requests, break down complex tasks into multiple steps, and take necessary actions to fulfill requests. Developers can use agents for Amazon Bedrock to create an orchestration plan without any manual coding. For example, an agent-powered generative AI restaurant application can provide a basic response to the question, “Do we have sufficient amounts of dough to sustain this week?” However, it can also help you with the task of updating your inventory.

Agents need access to an external data source, and they need to connect to your existing APIs. Developers can use the Amazon Bedrock console or the AWS SDK to upload the API schema. The agents will orchestrate tasks with the help of FMs and perform API calls using AWS Lambda functions. Therefore, agents remove the need to manage system integration and infrastructure provisioning. 


Example: Connecting foundation models to your company data sources with agents for Amazon Bedrock 


Sometimes, you want your foundation model to have access to additional data to help it generate more relevant, context-specific responses without regularly retraining the model.


The following four steps provide agents access to a knowledge base in Amazon Bedrock.

Step 1: Use RAG

Agents for Amazon Bedrock use RAG to provide agents access to a knowledge base in Amazon Bedrock. The knowledge base is created in Amazon Bedrock and points to a data source on Amazon Simple Storage Service (Amazon S3) that contains your data. You have the option to sync your knowledge base in Amazon Bedrock to your data on Amazon S3.


Step 2: Select an embeddings model and vector database

Next, you will select an embeddings model and provide details for a vector database. For the vector database, you can choose between Amazon OpenSearch Serverless, Pinecone, and Redis Enterprise Cloud.


Step 3: Add the knowledge base

You can add the knowledge base when creating or updating agents for Amazon Bedrock. 


Step 4: Create and add action groups

When adding the knowledge base to the agent, you can also create and add action groups to it. Action groups are tasks that an agent can perform autonomously. You can provide Lambda functions that represent your business logic and the related API schema to run those functions. Although action groups are not required to create an agent, they can augment model performance to yield better outputs. 

When a user query reaches the agent, it will identify the appropriate knowledge base from the user input and retrieve the relevant information. The agent will add the retrieved information to the input prompt. This helps the FM gain access to more contextual information so it can generate a more accurate response. Following this pattern, you can use agents for Amazon Bedrock to perform complex business tasks by dynamically invoking APIs without worrying about provisioning and maintaining any infrastructure.
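
A minimal sketch of calling such an agent from application code follows, using the bedrock-agent-runtime client in boto3. The agent ID, alias ID, and session ID are placeholders, and the agent, its knowledge base, and its action groups are assumed to have been created beforehand.

import boto3

agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = agent_runtime.invoke_agent(
    agentId="AGENT_ID",             # placeholder
    agentAliasId="AGENT_ALIAS_ID",  # placeholder
    sessionId="session-001",        # keeps multi-turn context together
    inputText="Do we have sufficient amounts of dough to sustain this week?",
)

# The agent streams its answer back as chunks of completion text.
answer = "".join(
    event["chunk"]["bytes"].decode("utf-8")
    for event in response["completion"]
    if "chunk" in event
)
print(answer)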


This architecture is implemented with the following components (a query sketch follows the list):

  • The knowledge base has Amazon S3 as a data source. 

  • There is a foundation model, such as Amazon Titan Embeddings, to convert data into vector embeddings. 

  • There is a vector database to store vector data from the previous step. Amazon Bedrock will take care of creating, storing, managing, and updating your embeddings in the vector database.
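
As a sketch of querying these components directly, the snippet below uses the retrieve_and_generate operation of the bedrock-agent-runtime client, which retrieves from the knowledge base and generates an answer in one call. The knowledge base ID, model ARN, and question are placeholders.

import boto3

agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = agent_runtime.retrieve_and_generate(
    input={"text": "What is our refund policy for damaged goods?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB_ID",  # placeholder
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2",  # placeholder
        },
    },
)
print(response["output"]["text"])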

