AI Safety
Alignment, robustness, evals, and red-teaming: keeping models well-behaved as capabilities scale.
No articles published in this cluster yet.
In the news
Anthropic and OpenAI Publish Joint Cross-Lab Alignment Evaluation
The two frontier labs ran each other's safety evaluations against their own production models — a rare cross-lab exercise that exposed sharply different failure modes and methodological priorities.
AI Safety · May 15, 2026