AI SAFETY IS THE GREATEST PRIORITY NOW

What we do

Model Evaluations

We specialize in developing and conducting evaluations of advanced AI systems. Our focus includes assessing language model agents for strategic deception and creating model organisms to study scheming behaviors.

Interpretability Research

Our applied interpretability research focuses on improving our model evaluation processes, while our foundational work explores innovative approaches to understanding the inner workings of neural networks.

Governance & Policy

We help governments and international organizations develop AI governance frameworks, focusing on third-party evaluations, regulating advanced AI systems, and setting standards.

AI Safety Evaluation

Independent AI Assessments

Thorough, independent evaluations to verify that AI systems meet safety and ethical guidelines.

Ethical AI Standards

Promoting responsible AI use through comprehensive safety and ethics evaluations across a range of AI technologies.

Collaborative Safety Initiatives

Partnering with organizations to strengthen AI safety practices and jointly develop shared standards.

Our Projects

Do AI Companies Really Care About AI Safety?

We systematically identify and document vulnerabilities in emerging AI models through jailbreaking techniques, report these vulnerabilities to the respective companies, and measure their response times and the effectiveness of their mitigations.