AI Safety

Alignment, robustness, evals, and red-teaming: keeping models well-behaved as capabilities scale.

Anthropic and OpenAI Publish Joint Cross-Lab Alignment Evaluation
The two frontier labs ran each other's safety evaluations against their own production models — a rare cross-lab exercise that exposed sharply different failure modes and methodological priorities.

AI Safety · May 15, 2026