
Building GenAI Apps Using AWS Bedrock: Foundation Models (2/5)

  • Writer: Archishman Bandyopadhyay
  • Jul 8, 2024
  • 9 min read

Updated: Jul 12, 2024


1. Introduction to Amazon Bedrock Foundation Models


Amazon Bedrock FMs

Amazon Bedrock offers a wide choice of foundation models (FMs) from leading artificial intelligence (AI) startups and Amazon. Each of these FMs caters to different generative artificial intelligence (generative AI) use cases, such as summarization, language translation, coding, and image generation.


Inference parameters

When interacting with an FM, you can configure the inference parameters to customize the FM’s response. Generally, you should only adjust one parameter at a time, and the results can vary depending on the FM.

The following parameters can be used to modify the output from the LLMs. Not all parameters are available with all LLMs.


A. Randomness and diversity

Foundation models typically support the following parameters to control randomness and diversity in the response.


1. Temperature controls randomness in word choice. Lower values lead to more predictable responses. The minimum, maximum, and default values for this parameter vary by model.


2. Top K limits word choices to the K most probable options. Lower values reduce unusual responses.


3. Top P cuts off low-probability word choices based on cumulative probability, tightening the overall response distribution. The minimum, maximum, and default values for this parameter vary by model.
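
To make Top K and Top P concrete, here is a small, illustrative Python sketch (not part of the Amazon Bedrock API) that prunes a toy next-token distribution with both filters before sampling. The token probabilities are made up for demonstration.

import random

# Toy next-token probabilities (illustrative values only, not from any model)
probs = {"the": 0.45, "a": 0.25, "this": 0.15, "zebra": 0.10, "qux": 0.05}

def top_k_filter(probs, k):
     # Keep only the k most probable tokens
     ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
     return dict(ranked[:k])

def top_p_filter(probs, p):
     # Keep the most probable tokens until their cumulative probability reaches p
     kept, cumulative = {}, 0.0
     for token, prob in sorted(probs.items(), key=lambda item: item[1], reverse=True):
          kept[token] = prob
          cumulative += prob
          if cumulative >= p:
               break
     return kept

candidates = top_p_filter(top_k_filter(probs, k=3), p=0.7)   # {'the': 0.45, 'a': 0.25}
weights = list(candidates.values())
print(random.choices(list(candidates.keys()), weights=weights, k=1)[0])

With k=3 and p=0.7, only "the" and "a" survive the filters, which is why lowering either parameter makes responses more predictable.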


B. Length

Foundation models typically support the following parameters to control the length of the generated response.


1. Response length sets minimum and maximum token counts. It sets a hard limit on response size. The minimum, maximum, and default values for this parameter vary by model.


2. Length penalty encourages more concise responses by penalizing longer ones. It sets a soft limit on size.


3. Stop sequences are specific character combinations that signal the model to stop generating tokens when encountered. They are used for early termination of responses.


2. Using Amazon Bedrock FMs for Inference


Some inference parameters, such as temperature, Top P, Top K, and response length, are common across most models. This section dives deeper into the unique, model-specific parameters and I/O configurations you can tune to achieve the desired output for your use case.


1. Amazon Titan


Amazon Titan models are Amazon foundation models. Amazon offers the Amazon Titan Text model and the Amazon Titan Embeddings model through Amazon Bedrock. Amazon Titan models support the following unique inference parameter in addition to temperature, Top P, and response length, which are common parameters across multiple models.


Stop sequences: With stop sequences (stopSequences), you can specify character sequences to indicate where the model should stop. Use the pipe symbol (|) to separate different sequences (maximum 20 characters).


Amazon Titan Text

Amazon Titan Text is a generative LLM for tasks such as summarization, text generation, classification, open-ended question answering, and information extraction. The text generation model is trained on many different programming languages and rich text formats, such as tables, JSON, comma-separated values (CSV), and others.

The following example shows an input configuration used to invoke a response from Amazon Titan Text using Amazon Bedrock. You can pass input configuration parameters along with an input prompt to the model.




Input

{
     "inputText": "<prompt>",
     "textGenerationConfig" : {
          "maxTokenCount": 512,
          "stopSequences": [],
          "temperature": 0.1,
          "topP": 0.9
     }
}

The following example shows the output from Amazon Titan Text for the input supplied in the previous code block. The model returns the output along with parameters, such as the number of input and output tokens generated.


Output

{
     "inputTextTokenCount": 613,
     "results": [{
          "tokenCount": 219,
          "outputText": "<output>"
     }]
}

Amazon Titan Embeddings

The Amazon Titan Embeddings model translates text inputs (words and phrases) into numerical representations (embeddings). Applications of this model include personalization and search. Comparing embeddings produces more relevant and contextual responses than word matching.

The following example demonstrates how you can create embeddings vectors from prompts using the Amazon Titan Embeddings model.


Input

import json
import boto3

bedrock_runtime = boto3.client(service_name='bedrock-runtime')

body = json.dumps({"inputText": "<prompt>"})
model_id = 'amazon.titan-embed-text-v1'
accept = 'application/json'
content_type = 'application/json'

response = bedrock_runtime.invoke_model(
     body=body,
     modelId=model_id,
     accept=accept,
     contentType=content_type
)
response_body = json.loads(response['body'].read())
embedding = response_body.get('embedding')

This will generate an embeddings vector consisting of numbers that look like the following output.


Output

[0.82421875, -0.6953125, -0.115722656, 0.87890625, 0.05883789, -0.020385742,
0.32421875, -0.00078201294, -0.40234375, 0.44140625, ...]
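
Because comparison happens in vector space, a common way to score relevance is cosine similarity between embeddings. The following is a minimal sketch under that assumption; embed_text is a helper defined here purely for illustration and simply wraps the invoke_model call shown above.

import json
import math
import boto3

bedrock_runtime = boto3.client(service_name='bedrock-runtime')

def embed_text(text):
     # Wraps the Amazon Titan Embeddings call shown above
     response = bedrock_runtime.invoke_model(
          body=json.dumps({"inputText": text}),
          modelId='amazon.titan-embed-text-v1',
          accept='application/json',
          contentType='application/json'
     )
     return json.loads(response['body'].read())['embedding']

def cosine_similarity(a, b):
     # Higher values mean the two texts are semantically closer
     dot = sum(x * y for x, y in zip(a, b))
     return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

query_vec = embed_text("How do I reset my password?")
doc_vec = embed_text("Steps to recover account access")
print(cosine_similarity(query_vec, doc_vec))

A score closer to 1.0 indicates the two texts are semantically closer, which is what enables search and personalization beyond exact word matching.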

2. AI21 Jurassic-2 (Mid and Ultra)


Common parameters for Jurassic-2 models include temperature, Top P, and stop sequences. Jurassic-2 models support the following unique parameters to control randomness, diversity, length, or repetition in the response:

  • Max completion length (maxTokens): Specify the maximum number of tokens to use in the generated response.

  • Presence penalty (presencePenalty): Use a higher value to lower the probability of generating new tokens that already appear at least once in the prompt or in the completion.

  • Count penalty (countPenalty): Use a higher value to lower the probability of generating new tokens that already appear at least once in the prompt or in the completion. The value is proportional to the number of appearances.

  • Frequency penalty (frequencyPenalty): Use a higher value to lower the probability of generating new tokens that already appear at least once in the prompt or in the completion. The value is proportional to the frequency of the token appearances (normalized to text length).

  • Penalize special tokens: Reduce the probability of repetition of special characters. The following flags each default to true (see the sketch after this list):

  • Whitespaces (applyToWhitespaces): A true value applies the penalty to white spaces and new lines.

  • Punctuations (applyToPunctuation): A true value applies the penalty to punctuation.

  • Numbers (applyToNumbers): A true value applies the penalty to numbers.

  • Stop words (applyToStopwords): A true value applies the penalty to stop words.

  • Emojis (applyToEmojis): A true value excludes emojis from the penalty.
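
As a sketch of how these flags might be combined, the following fragment attaches them to a count penalty object. The field names are taken from the list above, and the exact schema and defaults should be confirmed against the AI21 documentation, so treat this as illustrative rather than definitive.

"countPenalty": {
     "scale": 1,
     "applyToWhitespaces": true,
     "applyToPunctuation": true,
     "applyToNumbers": true,
     "applyToStopwords": true,
     "applyToEmojis": true
}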


Jurassic-2 Mid

This is a mid-sized model that is optimized to follow natural language instructions and context, so there is no need to provide it with any examples. It is ideal for composing human-like text and solving complex language tasks, such as question answering and summarization.


Jurassic-2 Ultra

Ultra is a large-sized model that you can apply to language comprehension or generation tasks. Use cases include generating marketing copy, powering chatbots, assisting with creative writing, performing summarization, and extracting information.


Input

{
     "prompt": "<prompt>",
     "maxTokens": 200,
     "temperature": 0.5,
     "topP": 0.5,
     "stopSequences": [],
     "countPenalty": {"scale": 0},
     "presencePenalty": {"scale": 0},
     "frequencyPenalty": {"scale": 0}
}

Output

{
     "id": 1234,
     "prompt": {
          "text": "<prompt>",
          "tokens": [
               {
                    "generatedToken": {
                         "token": "\u2581who\u2581is",
                         "logprob": -12.980147361755371,
                         "raw_logprob": -12.980147361755371
                    },
                    "topTokens": null,
                    "textRange": {"start": 0, "end": 6}
               },
               //...
          ]
     },
     "completions": [
          {
               "data": {
                    "text": "<output>",
                    "tokens": [
                         {
                              "generatedToken": {
                                   "token": "<|newline|>",
                                   "logprob": 0.0,
                                   "raw_logprob": -0.01293118204921484
                              },
                              "topTokens": null,
                              "textRange": {"start": 0, "end": 1}
                         },
                         //...
                    ]
               },
               "finishReason": {"reason": "endoftext"}
          }
     ]
}

3. Anthropic Claude 2


Anthropic Claude 2 is another model available for text generation on Amazon Bedrock. Claude is a generative AI model by Anthropic. It is purpose built for conversations, summarization, question answering, workflow automation, coding, and more. It supports everything from sophisticated dialogue and creative content generation to detailed instruction following.

Claude uses common parameters, such as temperature, Top P, Top K, and stop sequences. In addition, Claude models use the following unique parameter to further tune the response output. 

  • Maximum length (max_tokens_to_sample): Specify the maximum number of tokens to use in the generated response.

The following example shows an input configuration used to invoke a response from Anthropic Claude 2 using Amazon Bedrock.


Input

{
     "prompt": "\n\nHuman:<prompt>\n\nAnswer:",
     "max_tokens_to_sample": 300,
     "temperature": 0.5,
     "top_k": 250,
     "top_p": 1,
     "stop_sequences": ["\n\nHuman:"]
}

Output

{
     "completion": "<output>",
     "stop_reason": "stop_sequence"
}

4. Stability AI (SDXL)


This is a text-to-image model used to generate detailed images. SDXL includes support for the following types of image creation:

  • Image-to-image prompting: This involves inputting one image to get variations of that image.

  • Inpainting: This involves reconstructing the missing parts of an image.

  • Outpainting: This involves constructing a seamless extension of an existing image.

Stability AI Diffusion models support the following controls:

  • Prompt strength (cfg_scale): This control determines how much the final image portrays the prompt. Use a lower number to increase randomness in the generation.

  • Generation step (steps): This control determines how many times the image is sampled. More steps can produce a more accurate result.

  • Seed (seed): This control determines the initial noise setting. Use the same seed and the same settings as a previous run so inference can create a similar image. If you don't set this value, it is set as a random number.

The following example shows an input configuration used to invoke a response from SDXL using Amazon Bedrock.


Input

{
     "text_prompts": [
          {"text": "this is where you place your input text"}
     ],
     "cfg_scale": 10,
     "seed": 0,
     "steps": 50
}

Output

{
     "result": "success",
     "artifacts": [
          {
               "seed": 123,
               "base64": "<image in base64>",
               "finishReason": "SUCCESS"
          },
          //...
     ]
}
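
The base64 field in each artifact holds the generated image. A minimal sketch for saving the first artifact to disk, assuming the output above has been parsed into a Python dictionary named response_body, could look like this:

import base64

# response_body is assumed to be the parsed JSON output shown above
image_bytes = base64.b64decode(response_body['artifacts'][0]['base64'])
with open('generated_image.png', 'wb') as f:
     f.write(image_bytes)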

5. Cohere Command

Command is the flagship text generation model by Cohere. It is trained to follow user commands and be immediately useful in practical business applications, such as summarization, copywriting, dialogue, extraction, and question answering. Optimized for business priorities, Cohere is System and Organization Controls (SOC) 2 compliant and emphasizes security, privacy, and responsible AI.

 

In addition to temperature, Top P, Top K, maximum length, and stop sequences, the Cohere Command model supports the following unique controls: 

  • Return likelihoods (return_likelihoods): Specify how and if the token likelihoods are returned with the response. You can specify the following options:

  1. GENERATION: This option only returns likelihoods for generated tokens.

  2. ALL: This option returns likelihoods for all tokens.

  3. NONE: This option doesn’t return any likelihoods. This is the default option.

  • Stream (stream): Specify true to return the response piece by piece in real time and false to return the complete response after the process finishes.


The following example shows an input configuration used to invoke a response from Cohere Command using Amazon Bedrock. 
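
A minimal sketch of such a request body, built from the parameters described above, might look like the following. The exact field names and defaults should be confirmed against the Cohere Command documentation for Amazon Bedrock, so the values here are illustrative.


Input

{
     "prompt": "<prompt>",
     "max_tokens": 200,
     "temperature": 0.75,
     "p": 0.9,
     "k": 0,
     "stop_sequences": [],
     "return_likelihoods": "NONE",
     "stream": false
}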



3. Amazon Bedrock Methods


Amazon Bedrock provides a set of APIs that you can call from your notebooks and AWS Lambda functions. There are Amazon Bedrock configuration-related APIs and runtime-related APIs that you will explore in this section.


1. Amazon Bedrock setup and configuration-related APIs

ListFoundationModels

This method provides a list of the Amazon Bedrock foundation models that you can use. The following example demonstrates how to list the base models using Python.


Input

%pip install --upgrade boto3
import boto3
import json

bedrock = boto3.client(service_name='bedrock')
model_list = bedrock.list_foundation_models()
for summary in model_list.get('modelSummaries'):
     print(summary['modelId'])

You get a list of all the foundation models available on Amazon Bedrock and their respective metadata information. Following is an example of the output filtered on the model ID. 


Output

amazon.titan-tg1-large
amazon.titan-e1t-medium
amazon.titan-embed-g1-text-02
amazon.titan-text-express-v1
amazon.titan-embed-text-v1
stability.stable-diffusion-xl
stability.stable-diffusion-xl-v0
ai21.j2-grande-instruct
ai21.j2-jumbo-instruct
ai21.j2-mid
ai21.j2-mid-v1
ai21.j2-ultra
ai21.j2-ultra-v1
anthropic.claude-instant-v1
anthropic.claude-v1
anthropic.claude-v2
cohere.command-text-v14
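
If you only need a subset of models, list_foundation_models also accepts optional filter arguments. The following short sketch uses the byProvider filter to list only the Amazon models:

# Filter the listing to a single provider (here, Amazon's own models)
amazon_models = bedrock.list_foundation_models(byProvider='amazon')
for summary in amazon_models.get('modelSummaries'):
     print(summary['modelId'])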

2. Amazon Bedrock runtime-related APIs


InvokeModel

This API invokes the specified Amazon Bedrock model to run inference using the input provided in the request body. You use InvokeModel to run inference for text models, image models, and embedding models.

 

Run inference on a text model

You can send an invoke request to run inference on a model. Set the accept parameter to accept any content type in the response.

 

The following example details how to generate text with Python using the prompt "What is Amazon Bedrock?".


Input

bedrock_rt = boto3.client(service_name='bedrock-runtime')
prompt = "What is Amazon Bedrock?"
configs = {
     "inputText": prompt,
     "textGenerationConfig": {
          "maxTokenCount": 4096,
          "stopSequences": [],
          "temperature": 0,
          "topP": 1
     }
}
body = json.dumps(configs)
modelId = 'amazon.titan-tg1-large'
accept = 'application/json'
contentType = 'application/json'
response = bedrock_rt.invoke_model(
     body=body,
     modelId=modelId,
     accept=accept,
     contentType=contentType
)
response_body = json.loads(response.get('body').read())
print(response_body.get('results')[0].get('outputText'))

Sample output

Amazon Bedrock is a managed service that makes foundation models from leading AI startups and Amazon Titan models available through APIs. For up-to-date information on Amazon Bedrock and how to use it, see the provided documentation and relevant FAQs.

InvokeModelWithResponseStream

This API invokes the specified Amazon Bedrock model to run inference using the input provided. It returns the response in a stream.

 

For streaming, you can set x-amzn-bedrock-accept-type in the header to contain the required content type of the response. In the following example, the header is set to accept any content type. The default value is application/json.

 

The example details how to generate streaming text with Python. It uses the amazon.titan-tg1-large model and the prompt "Write an essay for living on Mars using 10 sentences."


Input

prompt = "Write an essay for living on Mars using 10 sentences."

configs= {
     "inputText": prompt,
     "textGenerationConfig": {
          "temperature":0
     }
}

body=json.dumps(configs)

accept = 'application/json'
contentType = 'application/json'
modelId = 'amazon.titan-tg1-large'

response = bedrock_rt.invoke_model_with_response_stream(
     modelId=modelId,
     body=body,
     accept=accept,
     contentType=contentType
)

stream = response.get('body')
if stream:
     for event in stream:
          chunk = event.get('chunk')
          if chunk:
               print((json.loads(chunk.get('bytes').decode())))

The following response includes the streaming output in chunks of data when invoked through the Amazon Bedrock streaming API. 


Sample output

{'outputText': "\nIt is difficult to imagine life on Mars because the planet is so different from Earth. The environment on Mars is extremely cold, dry, and dusty, making it difficult for organisms to survive without specialized adaptations. The planet's low gravity also affec", 'index': 0, 'totalOutputTextTokenCount': None, 'completionReason': None, 'inputTextTokenCount': 12}
 {'outputText': 'ts human physical and mental health, requiring special accommodations. However, with proper planning and preparation, it is possible for humans to live on Mars. To establish a sustainable colony on Mars, astronauts would need to live in specially designed habitats, wear protective gear, and rely on artificial systems for food, water, and air. Communication with Earth would be limited, and astronauts would need to be self-sufficient', 'index': 0, 'totalOutputTextTokenCount': 128, 'completionReason': 'LENGTH', 'inputTextTokenCount': None}
As other systems are integrated into your generative artificial intelligence (generative AI) applications, use Amazon CloudWatch to track usage and AWS CloudTrail to monitor API activity and troubleshoot issues.
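
As a sketch of what that tracking can look like in code, the following example pulls hourly invocation counts for one model from CloudWatch with boto3. The AWS/Bedrock namespace, Invocations metric, and ModelId dimension are assumptions to verify against the Bedrock monitoring documentation.

from datetime import datetime, timedelta
import boto3

cloudwatch = boto3.client('cloudwatch')

# Namespace, metric, and dimension names are assumptions; confirm them in the Bedrock monitoring docs
stats = cloudwatch.get_metric_statistics(
     Namespace='AWS/Bedrock',
     MetricName='Invocations',
     Dimensions=[{'Name': 'ModelId', 'Value': 'amazon.titan-tg1-large'}],
     StartTime=datetime.utcnow() - timedelta(days=1),
     EndTime=datetime.utcnow(),
     Period=3600,
     Statistics=['Sum']
)
for point in sorted(stats['Datapoints'], key=lambda p: p['Timestamp']):
     print(point['Timestamp'], point['Sum'])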



 
 
 
