Constitutional AI
Entity Type: Glossary
ID: constitutional-ai
Definition: A training approach developed by Anthropic in which an AI system is guided by a written set of principles, or 'constitution', that describes helpful, harmless, and honest behavior. In the supervised phase, the model critiques and revises its own responses against constitutional principles and is then fine-tuned on the revised responses. In the reinforcement learning phase, a preference model is trained on AI-generated comparisons of responses and used as the reward signal, a technique called reinforcement learning from AI feedback (RLAIF) that reduces reliance on human feedback labels.
Related Terms:
- ai-alignment
- ai-safety
- reinforcement-learning-from-human-feedback
- harmlessness
- ai-principles
Source Urls:
- https://arxiv.org/abs/2212.08073
- https://www.anthropic.com/news/constitutional-ai-harmlessness-from-ai-feedback
- https://www.anthropic.com/news/claudes-constitution
Tags:
- ai-safety
- alignment
- training-methods
- anthropic
Status: active
Version: 1.0.0
Created At: 2025-09-10
Last Updated: 2025-09-10
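
Example: The two phases named in the definition can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Model` type, the function names, the prompt wording, and the choice to run one critique-and-revision pass per principle are hypothetical and are not taken from Anthropic's implementation. A model is treated as any function from a prompt string to a completion string.

```python
from typing import Callable, List, Tuple

# Hypothetical type: any function mapping a prompt string to a completion string.
Model = Callable[[str], str]


def critique_and_revise(model: Model, prompt: str, principles: List[str]) -> str:
    """Draft a response, then critique and revise it once per principle.

    One pass per principle is a simplification chosen for this sketch.
    """
    response = model(prompt)
    for principle in principles:
        critique = model(
            f"Critique this response against the principle: {principle}\n"
            f"Prompt: {prompt}\nResponse: {response}"
        )
        response = model(
            "Revise the response to address the critique.\n"
            f"Prompt: {prompt}\nResponse: {response}\nCritique: {critique}"
        )
    return response


def build_sl_dataset(
    model: Model, prompts: List[str], principles: List[str]
) -> List[Tuple[str, str]]:
    """Pair each prompt with its revised response (supervised fine-tuning data)."""
    return [(p, critique_and_revise(model, p, principles)) for p in prompts]


def ai_preference(
    judge: Model, prompt: str, response_a: str, response_b: str, principle: str
) -> str:
    """Ask a judge model which response better follows a principle.

    Returns the preferred response; such AI-labeled comparisons are the kind
    of data a preference model would be trained on in the RLAIF phase.
    """
    verdict = judge(
        f"Which response better follows the principle: {principle}?\n"
        f"Prompt: {prompt}\n(A) {response_a}\n(B) {response_b}\n"
        "Answer with A or B."
    )
    return response_b if verdict.strip().upper().startswith("B") else response_a


def stub_model(text: str) -> str:
    """Stand-in for a real language model so the sketch runs without one."""
    if text.startswith("Critique"):
        return "The draft ignores the principle."
    if text.startswith("Revise"):
        return "revised answer"
    return "draft answer"
```

In use, `build_sl_dataset(stub_model, ["How do I stay safe online?"], ["Be harmless."])` yields prompt and revised-response pairs for fine-tuning, while `ai_preference` stands in for the comparison step that replaces human preference labels. Training the preference model and running the reinforcement learning loop are out of scope for this sketch.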