Constitutional AI

Entity Type: Glossary

ID: constitutional-ai

Definition: A training approach developed by Anthropic in which an AI system is guided by a written set of principles, or "constitution", that describes helpful, harmless, and honest behavior. The method has two phases. In the supervised phase, the model critiques and revises its own responses against constitutional principles and is then fine-tuned on the revised responses. In the reinforcement learning phase, a preference model trained on AI-generated comparisons supplies the reward signal (reinforcement learning from AI feedback, RLAIF), so that harmlessness training does not rely solely on human feedback.
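
The self-critique and revision step and the AI feedback labeling mentioned in the definition can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Anthropic's implementation: `toy_model`, `critique_and_revise`, `ai_preference_label`, the prompt wording, and the two entries in `PRINCIPLES` are all hypothetical names and text invented for this example, and the toy model returns canned strings in place of a real language model.

```python
import random
from typing import Callable, List, Optional

# Hypothetical stand-in type for a language model: prompt in, text out.
Model = Callable[[str], str]

# Illustrative wording only; these are NOT the published constitution.
PRINCIPLES: List[str] = [
    "Identify ways the response is harmful, unethical, or dishonest.",
    "Identify ways the response could be more helpful without causing harm.",
]


def toy_model(prompt: str) -> str:
    """Toy stand-in so the sketch runs without a real model (assumption)."""
    if "Answer with A or B" in prompt:
        return "B"
    if "Revision request" in prompt:
        return "[revised draft]"
    if "Critique request" in prompt:
        return "[critique]"
    return "[initial draft]"


def critique_and_revise(
    model: Model,
    prompt: str,
    principles: List[str],
    rounds: int = 2,
    rng: Optional[random.Random] = None,
) -> str:
    """Supervised-phase sketch: draft, then critique and revise `rounds` times."""
    rng = rng or random.Random(0)
    response = model(prompt)
    for _ in range(rounds):
        principle = rng.choice(principles)  # sample one principle per round
        critique = model(
            f"Prompt: {prompt}\nResponse: {response}\n"
            f"Critique request: {principle}"
        )
        response = model(
            f"Prompt: {prompt}\nResponse: {response}\nCritique: {critique}\n"
            "Revision request: rewrite the response to address the critique."
        )
    return response  # revised responses would become fine-tuning targets


def ai_preference_label(
    model: Model, prompt: str, response_a: str, response_b: str, principle: str
) -> int:
    """RL-phase sketch: a feedback model picks the better response (0=A, 1=B)."""
    verdict = model(
        f"Principle: {principle}\nPrompt: {prompt}\n"
        f"(A) {response_a}\n(B) {response_b}\n"
        "Which response better follows the principle? Answer with A or B."
    )
    return 0 if verdict.strip().upper().startswith("A") else 1


if __name__ == "__main__":
    print(critique_and_revise(toy_model, "Example prompt", PRINCIPLES))
    print(ai_preference_label(toy_model, "Example prompt", "a", "b", PRINCIPLES[0]))
```

In a full pipeline, the labels from a function like `ai_preference_label` would be used to train a preference model whose score serves as the reward during reinforcement learning; that training step is omitted here.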

Related Terms:
- ai-alignment
- ai-safety
- reinforcement-learning-from-human-feedback
- harmlessness
- ai-principles

Source URLs:
- https://arxiv.org/abs/2212.08073
- https://www.anthropic.com/news/constitutional-ai-harmlessness-from-ai-feedback
- https://www.anthropic.com/news/claudes-constitution

Tags:
- ai-safety
- alignment
- training-methods
- anthropic

Status: active

Version: 1.0.0

Created At: 2025-09-10

Last Updated: 2025-09-10