T5Gemma

A collection of encoder-decoder models that offer a strong tradeoff between quality and inference efficiency.

T5Gemma adapts pretrained decoder-only Gemma 2 models into an encoder-decoder architecture. These models are trained with either PrefixLM for strong generative performance or UL2 for high-quality contextual representations.
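To make the PrefixLM objective concrete, here is a minimal, self-contained sketch (not from the T5Gemma codebase) of the attention pattern it implies: tokens in the prefix attend to each other bidirectionally, while the remaining tokens attend causally. The function name and the NumPy-based formulation are illustrative assumptions.

```python
import numpy as np

def prefix_lm_mask(prefix_len: int, total_len: int) -> np.ndarray:
    """Illustrative PrefixLM attention mask.

    mask[i, j] == 1 means position i may attend to position j.
    Prefix positions attend bidirectionally within the prefix;
    all later positions attend causally (to j <= i).
    """
    mask = np.tril(np.ones((total_len, total_len), dtype=int))  # causal base
    mask[:prefix_len, :prefix_len] = 1  # bidirectional within the prefix
    return mask

# Example: 3-token prefix in a 5-token sequence.
print(prefix_lm_mask(3, 5))
```

UL2, by contrast, mixes several denoising objectives during training rather than using a single prefix/causal split.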

Capabilities

Enhanced reasoning

A dedicated encoder significantly boosts performance on tasks requiring deep context comprehension, such as math reasoning (GSM8K).

Flexible architecture

Model adaptation techniques allow for flexible configurations, including "unbalanced" models in which the encoder and decoder have different sizes.

High efficiency

A superior quality-to-efficiency ratio without extensive compute requirements.


Models

Gemma 2 sizes

Checkpoints based on the official Gemma 2 2B and 9B models, as well as the "unbalanced" 9B-2B checkpoint.

T5 sizes

Small, Base, Large, and XL sizes following the T5 configuration, plus an additional model sized between T5 Large and T5 XL.