Encoder-Decoder Architecture in Generative AI
- Archishman Bandyopadhyay
- Jun 16, 2023
- 2 min read
Updated: Jun 19, 2023
Presenter: Benoit Dherin, Machine Learning Engineer, Google Advanced Solutions Lab

1. Overview
This lecture focuses on the encoder-decoder architecture, which is at the core of large language models. The main topics covered are:
Brief overview of the encoder-decoder architecture
Training process for these models
Generating text from a trained model at serving time
2. Introduction to Encoder-Decoder Architecture
Sequence-to-sequence architecture
Takes a sequence of words as input (e.g., a sentence in English) and outputs a sequence of words (e.g., the translation in French).
Examples include translation, text summarization, and dialogue generation.

Encoder
Processes the input sequence and produces a vector representation of the input sentence.
Can be implemented with various internal architectures, such as a Recurrent Neural Network (RNN) or a more complex Transformer block.
In an RNN encoder, each token in the input sequence is processed one at a time, producing a state representing the token and previously ingested tokens. The state is then used as input for the next encoding step, along with the next token.
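A minimal sketch of this RNN-style encoding loop, using toy NumPy weights (the names, dimensions, and weight matrices here are illustrative assumptions, not code from the lecture):

```python
import numpy as np

def rnn_encode(token_embeddings, W_x, W_h, b):
    """Encode a sequence one token at a time; the final state summarizes the whole sentence."""
    state = np.zeros(W_h.shape[0])                 # initial hidden state
    for x in token_embeddings:                     # process tokens left to right
        # the new state depends on the current token and on the previously ingested tokens
        state = np.tanh(W_x @ x + W_h @ state + b)
    return state                                   # vector representation of the input sentence

# toy dimensions: 4-dim token embeddings, 8-dim hidden state, 5-token sentence
rng = np.random.default_rng(0)
W_x, W_h, b = rng.normal(size=(8, 4)), rng.normal(size=(8, 8)), np.zeros(8)
sentence = rng.normal(size=(5, 4))
print(rnn_encode(sentence, W_x, W_h, b).shape)     # (8,)
```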
Decoder
Takes the vector representation of the input sentence from the encoder and generates an output sequence.
Can also be implemented with different internal architectures, like RNN or Transformer block.
In an RNN decoder, the output is decoded one token at a time using the current state and what has been decoded so far.
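In the same spirit, a toy sketch of one RNN decoding step, which updates the decoder state from the previously decoded token and scores every vocabulary token (again an illustrative sketch, not the lecture's code):

```python
import numpy as np

def rnn_decode_step(prev_token_emb, state, W_x, W_h, b, W_out, b_out):
    """One decoding step: update the state, then produce a probability for each vocabulary token."""
    state = np.tanh(W_x @ prev_token_emb + W_h @ state + b)   # new decoder state
    logits = W_out @ state + b_out                            # one score per vocabulary token
    probs = np.exp(logits - logits.max())
    return state, probs / probs.sum()                         # softmax over the vocabulary

# toy setup: 8-dim state (initialized from the encoder output), 4-dim embeddings, 10-token vocabulary
rng = np.random.default_rng(1)
W_x, W_h, b = rng.normal(size=(8, 4)), rng.normal(size=(8, 8)), np.zeros(8)
W_out, b_out = rng.normal(size=(10, 8)), np.zeros(10)
encoder_state = rng.normal(size=8)                            # vector produced by the encoder
go_embedding = rng.normal(size=4)                             # embedding of a special "GO" token
state, probs = rnn_decode_step(go_embedding, encoder_state, W_x, W_h, b, W_out, b_out)
print(probs.argmax())                                         # most likely first output token
```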

3. Training Encoder-Decoder Models
Training dataset
A collection of input/output pairs that the model should imitate.
For translation, this would include sentence pairs in the source and target languages.
Training process
Feed the dataset to the model, which adjusts its weights during training based on the error it produces for a given input.
The error is the difference between the model's generated output and the true output sequence in the dataset.
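A compact sketch of such a training loop in PyTorch, assuming a hypothetical Seq2Seq model that returns one score per vocabulary token at each output position (the model and its call signature are assumptions for illustration):

```python
import torch
import torch.nn as nn

def train(model, dataset, vocab_size, epochs=3, lr=1e-3):
    """dataset yields (source_tokens, target_tokens) pairs of integer token IDs."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()            # measures the error against the true tokens
    for _ in range(epochs):
        for source, target in dataset:
            # hypothetical signature: the decoder is also given the correct target tokens
            # (teacher forcing, described below)
            logits = model(source, target)
            # error = difference between the predicted distribution and the true output sequence
            loss = loss_fn(logits.view(-1, vocab_size), target.view(-1))
            optimizer.zero_grad()
            loss.backward()                    # backpropagate the error
            optimizer.step()                   # adjust the weights
```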
Teacher forcing
During training, the decoder needs the correct previous translated token as input to generate the next token, rather than what the decoder has generated so far.
Requires preparing two input sequences: the source sentence, fed to the encoder, and the correct target sentence shifted by one position (typically by prepending a start token), fed to the decoder so that it always sees the correct previous token.
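A sketch of how the decoder-side pair can be built from one target sentence, with illustrative GO/EOS token IDs (these specific IDs and the helper name are assumptions):

```python
GO, EOS = 1, 2                                     # illustrative special token IDs

def teacher_forcing_pair(target_tokens):
    """Build the decoder input and the expected output from one correct target sentence."""
    decoder_input = [GO] + target_tokens           # decoder always sees the correct previous token
    expected_output = target_tokens + [EOS]        # the decoder input shifted left by one position
    return decoder_input, expected_output

# target sentence as hypothetical token IDs
print(teacher_forcing_pair([17, 42, 99]))
# ([1, 17, 42, 99], [17, 42, 99, 2])
```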

4. Generating Text at Serving Time
Text generation process
Feed the encoder representation of the prompt to the decoder along with a special token (e.g., "GO").
The decoder generates the first word by taking the highest probability token with greedy search or the highest probability chunk with beam search.
Repeat this process for subsequent words until the output sequence is complete.
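A greedy version of this generation loop, sketched with a stand-in step function instead of a trained decoder (the function names and token IDs are assumptions for illustration):

```python
import numpy as np

def greedy_decode(step_fn, encoder_state, go_id, eos_id, max_len=20):
    """Generate one token at a time, always keeping the highest-probability token."""
    state, token, output = encoder_state, go_id, []
    for _ in range(max_len):
        state, probs = step_fn(token, state)       # one decoder step
        token = int(np.argmax(probs))              # greedy search: pick the most likely token
        if token == eos_id:                        # stop when the output sequence is complete
            break
        output.append(token)
    return output

# stand-in step function producing fake probabilities over a 10-token vocabulary
rng = np.random.default_rng(2)
def toy_step(prev_token, state):
    return state, rng.dirichlet(np.ones(10))

print(greedy_decode(toy_step, encoder_state=None, go_id=1, eos_id=2))
```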
Greedy search vs. beam search
Greedy search: Select the token with the highest probability.
Beam search: Use the probabilities generated by the decoder to score whole sentence chunks rather than individual words, keeping only the most likely chunks at each step and ultimately returning the most likely one.
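A small beam-search sketch along those lines, again with a stand-in step function rather than a trained decoder (beam width, vocabulary size, and token IDs are illustrative assumptions):

```python
import math
import numpy as np

def beam_search(step_fn, go_id, beam_width=3, max_len=5, vocab_size=10):
    """Track the most likely partial chunks (by log-probability) instead of a single token."""
    beams = [([go_id], 0.0)]                       # (tokens so far, log-probability)
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            probs = step_fn(tokens)                # distribution over the next token
            for tok in range(vocab_size):
                candidates.append((tokens + [tok], score + math.log(probs[tok])))
        # keep only the beam_width most likely chunks at this step
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]                             # most likely chunk overall

# stand-in step function producing fake probabilities over a 10-token vocabulary
rng = np.random.default_rng(3)
def toy_step(tokens):
    return rng.dirichlet(np.ones(10))

print(beam_search(toy_step, go_id=1))
```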
Check out the complete video lecture here: