Transformers
Let’s break down how transformers and attention mechanisms work, using a simple example related to changing the tone of text.
What Are Transformers?
Transformers are a type of model used in machine learning, particularly for tasks that involve language, like translating between languages, summarizing text, or changing the tone of text. They are called “transformers” because they can transform one form of data into another.
How Do Transformers Work?
Imagine you want to rewrite a sentence from a formal tone to a casual tone. Here’s how a transformer helps in doing that:
Input Processing: The transformer takes in the sentence and breaks it down into words or small parts of words, called tokens (for example, “The president” becomes “The” and “president”).
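To make that concrete, here is a minimal Python sketch. The vocabulary and IDs are made up for illustration; real models use learned subword vocabularies (such as BPE or WordPiece) with tens of thousands of entries.

```python
# Toy sketch of tokenization: split a sentence into tokens and map each
# token to an integer ID. The vocabulary here is invented; real models
# learn much larger subword vocabularies from data.
sentence = "The president announced a new policy"
toy_vocab = {"The": 0, "president": 1, "announced": 2, "a": 3, "new": 4, "policy": 5}

tokens = sentence.split()                   # ['The', 'president', 'announced', 'a', 'new', 'policy']
token_ids = [toy_vocab[t] for t in tokens]  # [0, 1, 2, 3, 4, 5]

print(tokens)
print(token_ids)
```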
Attention Mechanism: This is where it gets interesting. The transformer looks at each word in the sentence and decides which other words are important for understanding its meaning. For example, in the sentence “The president announced a new policy,” the word “president” might pay more attention to “announced” and “policy” because those words are key to understanding what the president did.
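The core of this step is scaled dot-product attention, which fits in a few lines of NumPy. This is only a sketch with random placeholder matrices: in a real transformer, the queries, keys, and values are learned projections of the token embeddings, and there are many attention heads and layers.

```python
import numpy as np

# Minimal sketch of scaled dot-product attention for one short sentence.
rng = np.random.default_rng(0)
seq_len, d = 6, 8                       # 6 tokens, 8-dimensional vectors (toy sizes)

Q = rng.normal(size=(seq_len, d))       # queries: what each token is looking for
K = rng.normal(size=(seq_len, d))       # keys: what each token offers
V = rng.normal(size=(seq_len, d))       # values: the information to mix together

scores = Q @ K.T / np.sqrt(d)                                          # token-to-token relevance
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax per row
contextual = weights @ V                                               # context-aware representations

print(weights[1].round(2))  # how much "president" attends to each of the 6 tokens
```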
Understanding Context: The attention mechanism allows the transformer to understand the context of each word, not just its individual meaning. This helps it capture nuances of meaning that depend on the surrounding words.
Generating Output: Based on this understanding, the transformer can then rewrite the sentence in a new tone while maintaining its original meaning. It does this by predicting words that fit the new tone but still convey the same information.
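Generation happens one token at a time: the model scores every word in its vocabulary, turns the scores into probabilities, and picks the next word. Here is a hedged sketch of that final step; the five-word vocabulary and the logit values are invented purely for illustration.

```python
import numpy as np

# Sketch of the output step while writing a casual rewrite.
vocab = ["Hey", "Dear", "policy", "announced", "!"]
logits = np.array([2.1, 0.3, -1.0, -0.5, 0.8])    # made-up scores from the model

probs = np.exp(logits) / np.exp(logits).sum()     # softmax turns scores into probabilities
next_word = vocab[int(np.argmax(probs))]          # greedy choice; real systems often sample

print(dict(zip(vocab, probs.round(2))))
print("next word:", next_word)                    # -> Hey
```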
Encoding “Tone” in Transformers
Understanding how something like “tone” is encoded and managed in language models involves delving a bit deeper into the underlying mechanisms of transformers and their training data.
Data and Annotation: First, to handle “tone”, a transformer model needs to be trained on a dataset where the text is annotated with tone information. This could include labels such as “formal”, “casual”, “sarcastic”, etc. Each piece of text in the training data must be tagged appropriately so the model learns which patterns correspond to which tones.
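Here is what a tiny slice of such a dataset could look like. The sentences and labels are invented for illustration; real corpora contain many thousands of examples.

```python
# Toy illustration of tone-annotated training data.
training_examples = [
    {"text": "Please advise on the status of the current initiative.", "tone": "formal"},
    {"text": "What's up with the project?",                            "tone": "casual"},
    {"text": "I request you submit your report by Monday.",            "tone": "formal"},
    {"text": "Hey, can you get your report in by Monday?",             "tone": "casual"},
]

for example in training_examples:
    print(f"[{example['tone']:>6}] {example['text']}")
```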
Embeddings: At the heart of how transformers process text are “embeddings”. Embeddings are vector representations (lists of numbers) of words that capture their meanings, relationships, and properties like tone. These embeddings are learned from the training data. When a transformer model is trained, it adjusts these embeddings to encode not just the literal meaning of the words but also nuances such as tone based on the context they appear in.
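Conceptually, an embedding table is just a matrix with one row per vocabulary entry. In this minimal sketch the numbers are random; in a trained model they end up encoding meaning, relationships, and properties such as tone.

```python
import numpy as np

# Sketch of an embedding table: one vector per vocabulary entry.
rng = np.random.default_rng(0)
vocab = {"please": 0, "advise": 1, "hey": 2, "report": 3}
embedding_table = rng.normal(size=(len(vocab), 4))   # 4 dimensions just for illustration

def embed(word):
    """Look up the vector the model 'sees' for a word."""
    return embedding_table[vocab[word]]

print(embed("please").round(2))
print(embed("hey").round(2))
```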
Attention and Contextual Understanding: During training, the attention mechanism of the transformer learns to pay more or less attention to different words in a sentence based on their contribution to the meaning and tone. For example, in a formal tone, the model learns to focus on and prioritize words and structures that convey formality.
Storing Tone Information: Technically, tone is not “stored” in a specific location but is rather encoded throughout the network in the weights and biases of the model—these are the parameters that determine how input signals are transformed through the network layers. The way words and phrases relate to each other in terms of tone becomes part of the overall parameters of the model.
Associating Tone with Words and Sentences
When a transformer model processes a sentence, it uses both the learned embeddings and the contextual clues picked up by the attention mechanisms to infer the tone and adjust the output accordingly. Here’s how it works step-by-step:
Input Analysis: The model reads the input sentence and maps each word to its corresponding embedding.
Attention Processing: As the model processes the sentence, its attention mechanisms determine how the meanings and tones of words influence each other. For example, the presence of words like “please” and “advise” in a formal context might enhance the formal tone through the attention they receive relative to other words.
Output Generation: When generating text, the model uses its understanding of both the literal content and the tone to select words that match the desired output tone. If the desired tone is “casual”, the model looks for words and structures in its training data that have been associated with a casual tone and constructs the output accordingly.
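One common way to make the desired tone explicit is to prepend a tone instruction to the input when fine-tuning a sequence-to-sequence transformer, so the same tag can steer generation at inference time. The tag format and example pairs below are assumptions made for illustration, not any specific model’s training recipe.

```python
# Sketch: tone rewriting framed as input -> target pairs for fine-tuning a
# sequence-to-sequence transformer. A tone tag in the input tells the model
# which style to produce. The tag format and the examples are invented.
pairs = [
    ("rewrite casual: I request you submit your report by Monday.",
     "Hey, can you get your report in by Monday?"),
    ("rewrite formal: What's up with our current project?",
     "Please advise on the status of the current initiative as soon as feasible."),
]

for source, target in pairs:
    print(f"INPUT : {source}")
    print(f"TARGET: {target}\n")
```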
Example of Tone Adjustment
If a transformer is tasked with changing the sentence “I request you submit your report by Monday.” to a more casual tone, it might generate “Hey, can you get your report in by Monday?” Here, it uses its training on how formal and casual tones are structured differently, choosing “Hey” and “get your report in” to replace the more formal “request” and “submit.”
One more example:
Original Text: “Please advise on the status of the current initiative as soon as feasible.”
Original Tone: Formal and direct.
Desired Tone: Casual and friendly.
Transformed Text: “Could you let me know what’s up with our current project whenever you get a chance?”
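If you want to experiment yourself, one low-effort option is to prompt a publicly available instruction-tuned model. The sketch below assumes the Hugging Face transformers library and the google/flan-t5-small checkpoint; such a small model follows instructions imperfectly, so its output will vary and may not match the examples above.

```python
from transformers import pipeline

# Prompting an instruction-tuned seq2seq model to change the tone of a sentence.
# flan-t5-small is very small, so results are rough; larger models rewrite far better.
rewriter = pipeline("text2text-generation", model="google/flan-t5-small")

prompt = (
    "Rewrite the following sentence in a casual, friendly tone: "
    "Please advise on the status of the current initiative as soon as feasible."
)
result = rewriter(prompt, max_new_tokens=40)

print(result[0]["generated_text"])
```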
Summary
This capability of transformers to handle such nuanced tasks as tone adjustment comes from their extensive training on large datasets, sophisticated attention mechanisms, and the ability to learn and manipulate embeddings effectively.
In this process:
Transformers handle the heavy lifting of understanding and generating text.
Attention mechanisms determine which parts of the input are important and how they relate to each other to better preserve the meaning across transformations.
You can go a bit deeper in this post:
Improving text -or not- with Large Language Models
Some time ago I saw someone on LinkedIn posting a text and adding “I improved this text with ChatGPT”. The text was fairly simple, professional but sober, with ordinary words and no complicated structures. It was indeed correct, but my thought was the same as it usually is in these cases…