Multi-Head Attention

Entity Type: Glossary

ID: multi-head-attention

Definition: An extension of self-attention that runs multiple attention mechanisms (heads) in parallel, each attending to information from a different representation subspace of the input. The per-head outputs are concatenated and linearly projected to produce richer representations in transformer models.
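The sketch below illustrates the definition: the input is projected into queries, keys, and values, split across heads, each head computes scaled dot-product attention independently, and the results are concatenated and projected. It is a minimal NumPy illustration; the function name, weight shapes, and the use of one fused projection matrix per Q/K/V are assumptions for clarity, not a specific library's API.

```python
# Minimal multi-head attention sketch (illustrative assumptions: fused
# Q/K/V projections of shape (d_model, d_model), no masking, no bias).
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """x: (seq_len, d_model); w_q, w_k, w_v, w_o: (d_model, d_model)."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    # Project the input and split the feature dimension into independent heads.
    def split_heads(m):
        return m.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(x @ w_q)  # (num_heads, seq_len, d_head)
    k = split_heads(x @ w_k)
    v = split_heads(x @ w_v)

    # Scaled dot-product attention, computed in parallel for every head.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    weights = softmax(scores, axis=-1)
    heads = weights @ v                                   # (heads, seq, d_head)

    # Concatenate the heads and apply the output projection.
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o

# Example usage with random weights (shapes chosen arbitrarily).
rng = np.random.default_rng(0)
d_model, seq_len, num_heads = 64, 10, 8
x = rng.standard_normal((seq_len, d_model))
w_q, w_k, w_v, w_o = (rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(4))
out = multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads)
print(out.shape)  # (10, 64)
```

Because each head operates on a smaller d_head = d_model / num_heads slice, the total cost is comparable to a single full-width attention while letting different heads specialize in different relationships.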

Related Terms:
- self-attention
- attention-mechanism
- transformer
- parallel-processing

Source URLs:
- https://en.wikipedia.org/wiki/Attention_(machine_learning)#Multi-head_attention

Tags:
- attention
- transformers
- parallel-processing

Status: active

Version: 1.0.0

Created At: 2025-08-31

Last Updated: 2025-08-31