AI SAFETY IS
THE GREATEST PRIORITY
NOW...
What we do
Model Evaluations
We specialize in developing and conducting evaluations of advanced AI systems. Our focus includes assessing language model agents for strategic deception and creating model organisms to study scheming behaviors.
Interpretability Research
Our applied interpretability research focuses on improving our model evaluation processes, while our foundational work explores innovative approaches to understanding the inner workings of neural networks.
Governance & Policy
We help governments and international organizations develop AI governance frameworks, focusing on third-party evaluations, regulating advanced AI systems, and setting standards.
AI Safety Evaluation
Independent AI Assessments
Thorough evaluations that verify AI systems meet safety and ethical guidelines.
Ethical AI Standards
Promoting responsible AI use through comprehensive safety and ethical evaluations across a range of technologies.
Collaborative Safety Initiatives
Working with organizations to strengthen AI safety practices and develop shared standards.
Our Projects
Do AI Companies Really Care About AI Safety?
We systematically identify and document vulnerabilities in emerging AI models through jailbreaking techniques, report these vulnerabilities to the respective companies, and measure their response times and the effectiveness of their mitigations.