Mixture of Experts (MoE)

Entity Type: Glossary

ID: mixture-of-experts

Definition: A neural network architecture that combines multiple specialized sub-networks (experts) with a gating mechanism that activates only a subset of the experts for each input. This allows model capacity to scale while keeping computational costs manageable, since only a fraction of the model's parameters is used for any given input. MoE models can outperform dense models with similar computational budgets and have been applied successfully to language modeling, computer vision, and other domains.
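To make the routing idea concrete, below is a minimal sketch of a top-k gated MoE layer in PyTorch. The class name SimpleMoELayer, the feed-forward expert design, and hyperparameters such as num_experts=8 and top_k=2 are illustrative assumptions, not a reference implementation from the sources listed below.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleMoELayer(nn.Module):
    """Minimal top-k gated mixture-of-experts layer (illustrative sketch)."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Each expert is an independent feed-forward sub-network.
        self.experts = nn.ModuleList(
            [
                nn.Sequential(
                    nn.Linear(d_model, d_hidden),
                    nn.ReLU(),
                    nn.Linear(d_hidden, d_model),
                )
                for _ in range(num_experts)
            ]
        )
        # The gate scores every expert for every token.
        self.gate = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        scores = self.gate(x)                                # (num_tokens, num_experts)
        top_vals, top_idx = scores.topk(self.top_k, dim=-1)  # keep only k experts per token
        weights = F.softmax(top_vals, dim=-1)                # renormalize over the selected experts

        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e                 # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out


if __name__ == "__main__":
    layer = SimpleMoELayer(d_model=16, d_hidden=32)
    tokens = torch.randn(4, 16)
    print(layer(tokens).shape)  # torch.Size([4, 16])
```

Production MoE layers typically add load-balancing losses and per-expert capacity limits so that tokens are spread roughly evenly across experts; those details are omitted here for brevity.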

Related Terms:
- sparse-models
- gating-mechanisms
- expert-networks
- conditional-computation
- parameter-efficiency

Source Urls:
- https://arxiv.org/abs/1701.06538
- https://arxiv.org/abs/2101.03961
- https://huggingface.co/blog/moe

Tags:
- model-architecture
- efficiency
- sparse-computation
- scaling

Status: active

Version: 1.0.0

Created At: 2025-09-10

Last Updated: 2025-09-10