Mixture of Experts (MoE)
Entity Type: Glossary
ID: mixture-of-experts
Definition: A neural network architecture that combines multiple specialized sub-networks (experts) with a gating mechanism that selectively activates only a subset of experts for each input. This allows model capacity to scale while keeping computational cost manageable, since only a fraction of the model's parameters is used for any given input (see the illustrative sketch at the end of this entry). MoE models can outperform dense models of similar computational budget and have been successfully applied to language models, computer vision, and other domains.
Related Terms: - sparse-models - gating-mechanisms - expert-networks - conditional-computation - parameter-efficiency
Source Urls: - https://arxiv.org/abs/1701.06538 - https://arxiv.org/abs/2101.03961 - https://huggingface.co/blog/moe
Tags: - model-architecture - efficiency - sparse-computation - scaling
Status: active
Version: 1.0.0
Created At: 2025-09-10
Last Updated: 2025-09-10
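
As a rough illustration of the gating idea described in the definition, the sketch below routes each token to its top-2 experts out of 8 and combines their outputs with the renormalized gate weights. This is a minimal sketch assuming PyTorch; the class and parameter names (`MoELayer`, `num_experts`, `top_k`) are illustrative assumptions, not any particular library's API.

```python
# Minimal top-k gated MoE layer (illustrative sketch, assuming PyTorch).
# Names such as MoELayer, num_experts, and top_k are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoELayer(nn.Module):
    """Sparse MoE layer: a gating network picks top_k of num_experts per token."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Each expert is a small feed-forward sub-network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )
        # The gating network scores every expert for every token.
        self.gate = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> flatten to a list of tokens.
        tokens = x.reshape(-1, x.shape[-1])
        logits = self.gate(tokens)                          # (n_tokens, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)  # keep only the top_k experts per token
        weights = F.softmax(weights, dim=-1)                # renormalize over the selected experts

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            # Tokens (and the slot in their top_k list) routed to expert e.
            token_idx, slot = (indices == e).nonzero(as_tuple=True)
            if token_idx.numel() == 0:
                continue  # this expert received no tokens for this batch
            out[token_idx] += weights[token_idx, slot].unsqueeze(-1) * expert(tokens[token_idx])
        return out.reshape(x.shape)


if __name__ == "__main__":
    layer = MoELayer(d_model=64, d_hidden=256, num_experts=8, top_k=2)
    y = layer(torch.randn(4, 16, 64))
    print(y.shape)  # torch.Size([4, 16, 64])
```

Only the selected experts run for each token, so per-token compute scales with `top_k` rather than with the total number of experts, which is the conditional-computation property the definition refers to.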