Introduction
Large Language Models (LLMs) are versatile generative models suited to a wide range of tasks. They can produce consistent, repeatable outputs or generate creative content by stringing unlikely words together. The "temperature" setting lets users tune the model's output, controlling the degree of predictability.
Let's walk through a hypothetical example to understand the effect of temperature on next-token prediction.
We asked an LLM to complete the sentence, "This is a great _____." Let's assume the candidate tokens are:
| token      | logit |
|------------|-------|
| day        | 5.0   |
| space      | 2.2   |
| furniture  | 2.0   |
| experience | 4.5   |
| problem    | 3.0   |
| challenge  | 2.7   |
The logits are passed through a softmax function so that the values sum to one. Essentially, the softmax function converts the logits into probability estimates for each token.
Let's calculate the probability estimates in Python.
```python
import numpy as np
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
from ipywidgets import interactive, FloatSlider

def softmax(logits):
    # Exponentiate each logit and normalize so the values sum to one
    exps = np.exp(logits)
    return exps / np.sum(exps)

data = {
    "tokens": ["day", "space", "furniture", "experience", "problem", "challenge"],
    "logits": [5, 2.2, 2.0, 4.5, 3.0, 2.7]
}
df = pd.DataFrame(data)
df['probabilities'] = softmax(df['logits'].values)
df
```
| No. | tokens     | logits | probabilities |
|-----|------------|--------|---------------|
| 0   | day        | 5.0    | 0.512106      |
| 1   | space      | 2.2    | 0.031141      |
| 2   | furniture  | 2.0    | 0.025496      |
| 3   | experience | 4.5    | 0.310608      |
| 4   | problem    | 3.0    | 0.069306      |
| 5   | challenge  | 2.7    | 0.051343      |
```python
ax = sns.barplot(x="tokens", y="probabilities", data=df)
ax.set_title('Softmax Probability Estimates')
ax.set_ylabel('Probability')
ax.set_xlabel('Tokens')
plt.xticks(rotation=45)
# Annotate each bar with its probability
for bar in ax.patches:
    ax.text(bar.get_x() + bar.get_width() / 2, bar.get_height(), f'{bar.get_height():.2f}',
            ha='center', va='bottom', fontsize=10, rotation=0)
plt.show()
```
The softmax function with temperature is defined as follows:

$$\text{softmax}(x_i) = \frac{e^{x_i / T}}{\sum_{j=1}^{n} e^{x_j / T}}$$

where \(T\) is the temperature, \(x_i\) is the \(i\)-th component of the input vector (the logits), and \(n\) is the number of components in the vector.
```python
def softmax_with_temperature(logits, temperature):
    if temperature <= 0:
        temperature = 1e-10  # Guard against division by zero or negative temperatures
    scaled_logits = logits / temperature
    exps = np.exp(scaled_logits - np.max(scaled_logits))  # Subtract the max for numerical stability
    return exps / np.sum(exps)

def plot_interactive_softmax(temperature):
    probabilities = softmax_with_temperature(df['logits'], temperature)
    plt.figure(figsize=(10, 5))
    bars = plt.bar(df['tokens'], probabilities, color='blue')
    plt.ylim(0, 1)
    plt.title(f'Softmax Probabilities at Temperature = {temperature:.2f}')
    plt.ylabel('Probability')
    plt.xlabel('Tokens')
    # Annotate each bar with its probability
    for bar, probability in zip(bars, probabilities):
        yval = bar.get_height()
        plt.text(bar.get_x() + bar.get_width() / 2, yval, f"{probability:.2f}", ha='center', va='bottom', fontsize=10)
    plt.show()

interactive_plot = interactive(plot_interactive_softmax, temperature=FloatSlider(value=1, min=0, max=2, step=0.01, description='Temperature'))
interactive_plot
```
At T = 1,
At a temperature of 1, the probability values are the same as those produced by the standard softmax function.
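We can check this equivalence numerically. A quick sketch, reusing the logits from the example above (the max-subtraction trick does not change the result, since it cancels in the normalization):

```python
import numpy as np

# Plain softmax, as defined earlier
def softmax(logits):
    exps = np.exp(logits)
    return exps / np.sum(exps)

# Temperature-scaled softmax with max subtracted for numerical stability
def softmax_with_temperature(logits, temperature):
    scaled = logits / temperature
    exps = np.exp(scaled - np.max(scaled))
    return exps / np.sum(exps)

logits = np.array([5.0, 2.2, 2.0, 4.5, 3.0, 2.7])

# Dividing by T = 1 leaves the logits unchanged, so both functions agree
print(np.allclose(softmax(logits), softmax_with_temperature(logits, 1.0)))  # True
```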
At T > 1,
Raising the temperature inflates the probabilities of the less likely tokens, broadening the range of viable candidates (and thus the diversity) of the model's next-token prediction.
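A small sketch makes this flattening concrete, again using the example logits: at T = 2 the top token loses probability mass while the least likely token gains it.

```python
import numpy as np

# Same temperature-scaled softmax as above
def softmax_with_temperature(logits, temperature):
    scaled = logits / temperature
    exps = np.exp(scaled - np.max(scaled))
    return exps / np.sum(exps)

logits = np.array([5.0, 2.2, 2.0, 4.5, 3.0, 2.7])  # logits from the example above

p1 = softmax_with_temperature(logits, 1.0)
p2 = softmax_with_temperature(logits, 2.0)

# Compare the most and least likely tokens at T = 1 vs. T = 2
print(f"top token:    T=1 -> {p1.max():.2f}, T=2 -> {p2.max():.2f}")
print(f"rarest token: T=1 -> {p1.min():.3f}, T=2 -> {p2.min():.3f}")
```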
At T < 1,
Lowering the temperature, on the other hand, pushes the probability of the most likely token toward 1.0, boosting the model's confidence. Reducing the temperature effectively reduces the uncertainty in the model's output; in the limit, sampling becomes essentially greedy, almost always picking the single most probable token.
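A quick sketch of this sharpening effect, reusing the example logits: "day" has the highest logit, and as T drops toward 0 nearly all the probability mass moves onto it.

```python
import numpy as np

# Same temperature-scaled softmax as above
def softmax_with_temperature(logits, temperature):
    scaled = logits / temperature
    exps = np.exp(scaled - np.max(scaled))
    return exps / np.sum(exps)

logits = np.array([5.0, 2.2, 2.0, 4.5, 3.0, 2.7])  # "day" is index 0

# Lowering T concentrates the distribution on the highest-logit token
for t in (1.0, 0.5, 0.1):
    p = softmax_with_temperature(logits, t)
    print(f"T={t}: P(day) = {p[0]:.4f}")
```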
Conclusion
LLMs leverage the temperature parameter to add flexibility to their predictions. The model behaves predictably at a temperature of 1, closely following the original softmax distribution. Increasing the temperature introduces greater diversity by amplifying less likely tokens, while decreasing it makes the predictions more focused, increasing the model's confidence in the most probable token by reducing uncertainty. This adaptability allows users to tailor LLM outputs to a wide range of tasks, striking a balance between creative exploration and deterministic output.