Model Rating Report
Claude 4 Family
Claude Opus 4 is a state-of-the-art text and code generation model with sustained performance on complex, long-running tasks and agent workflows. Claude 4 models are hybrid reasoning models offering two modes: near-instant responses and extended thinking for deeper reasoning.
Developer
Anthropic
Country of Origin
USA
Systemic Risk
Open Data
Open Weight
API Access Only
Ratings
Overall Transparency
58%
Data Transparency
37%
Model Transparency
30%
Evaluation Transparency
74%
EU AI Act Readiness
50%
CAIT-D Readiness
36%
Transparency Assessment
The transparency assessment evaluates how clear and detailed the model creators are about their practices. Our assessment is based on the official documentation listed in the Sources section. While external analysis may contain additional details about this system, our goal is to evaluate the transparency of the providers themselves.
Sources
System Card: https://www-cdn.anthropic.com/6be99a52cb68eb70eb9572b4cafad13df32ed995.pdf
Announcements:
https://www.anthropic.com/claude/opus
Basic Details
Date of Release
Claude Opus 4 and Claude Sonnet 4 were announced and released on May 22, 2025, becoming available to users the same day.
Methods of Distribution
Claude Opus 4 and Claude Sonnet 4 are both available through multiple distribution channels, including a web browser interface, mobile apps (iOS and Android), and APIs: Anthropic's own API, Amazon Bedrock, and Google Cloud's Vertex AI. Opus 4 is available to Pro, Max, Team, and Enterprise users, while Sonnet 4 is also available to free users.
Modality
Inputs can include text and images; outputs are text only. Both models feature hybrid reasoning with standard and extended thinking modes.
Input and Output Format
Both models support a 200K-token context window for input. Sonnet 4 generates up to 64K output tokens, and Opus 4 up to 32K.
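As a rough illustration of how these limits surface in practice, the sketch below caps output at the documented Opus 4 ceiling when calling Anthropic's Messages API. The model identifier is an assumption based on Anthropic's naming convention; consult the API documentation for current values.

```python
# Hedged sketch: requesting the documented Opus 4 output cap.
# The model ID below is an assumption; check Anthropic's model list.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-20250514",  # assumed identifier for Claude Opus 4
    max_tokens=32_000,               # Opus 4's documented output-token ceiling
    messages=[
        {"role": "user", "content": "Summarize this report in three sentences."}
    ],
)
print(response.content[0].text)
```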
License
Proprietary.
Instructions for Use
The formal documentation provides general guidance on use cases for each model, and the [API documentation](https://docs.anthropic.com/en/docs/build-with-claude/overview) provides both high-level and low-level instructions for use.
Documentation Support
Medium Transparency
The documentation extensively covers model capabilities, applications, and some technical specifications, and explains how to understand and properly use the models, but lacks detail about data and model design.
Changelog
The developers provide a changelog for the apps, the API, and versioned system prompts [here](https://docs.anthropic.com/en/release-notes/overview).
Policy
Acceptable Use Policy
https://www.anthropic.com/legal/aup
User Data
The materials mention that Anthropic trains on data from Claude users who have opted in, indicating that user data may be collected and used for model training with user consent.
Data Takedown
Anthropic has a detailed data takedown process and [privacy policy](https://www.anthropic.com/legal/privacy); see also the article on reporting [copyright infringement](https://privacy.anthropic.com/en/articles/7996901-i-think-a-user-is-infringing-my-copyright-or-other-intellectual-property-how-do-i-report-it).
AI Ethics Statement
Anthropic uses a [Responsible Scaling Policy](https://www.anthropic.com/news/anthropics-responsible-scaling-policy) and has a [constitution](https://www.anthropic.com/news/claudes-constitution) used during model training.
Incident Reporting
Incidents can be reported by emailing usersafety@anthropic.com. In addition, a Responsible Disclosure Policy is documented [here](https://www.anthropic.com/responsible-disclosure-policy).
Model and Training
Task Description
High Transparency
The documents detail numerous tasks that Claude 4 models excel at, including coding (both models leading on SWE-bench), agentic search, AI agent applications, content creation, customer-facing AI assistants, visual data extraction, robotic process automation, and knowledge Q&A. Opus 4 is particularly noted for its ability to handle complex, long-running tasks.
In terms of limitations, the models can hallucinate, reinforce disparate treatment (e.g., produce responses that favor certain populations), and be susceptible to prompt injections and jailbreaks (at rates lower than Claude Sonnet 3.7).
Number of Parameters
No information is provided in any of the available documentation.
Model Design
Low Transparency
Claude 4 models are described as "hybrid reasoning models" that offer two modes: near-instant responses and extended thinking for deeper reasoning. They feature extended thinking with tool use, allowing them to alternate between reasoning and tool use to improve responses. In addition, the system shows summaries of long thought processes, generated by an additional smaller model, instead of the whole trace (developers can opt out of this behavior).
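As a hedged sketch of what the two modes look like at the API level, the snippet below enables extended thinking through the Messages API's `thinking` parameter; omitting it yields the near-instant mode. The model identifier and token budgets are illustrative assumptions.

```python
# Hedged sketch: enabling extended thinking on a Claude 4 model.
# Parameter names follow Anthropic's extended-thinking documentation;
# the model ID and budgets here are illustrative assumptions.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumed identifier for Claude Sonnet 4
    max_tokens=16_000,                 # must exceed the thinking budget
    thinking={"type": "enabled", "budget_tokens": 8_000},
    messages=[{"role": "user", "content": "Outline a test plan for a payments API."}],
)

# Responses interleave (summarized) thinking blocks with ordinary text blocks.
for block in response.content:
    if block.type == "thinking":
        print("[thinking]", block.thinking)
    elif block.type == "text":
        print(block.text)
```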
Training Methodology
Low Transparency
Claude Opus 4 and Claude Sonnet 4 were trained with a focus on being helpful, honest, and harmless. They were pre-trained on large, diverse datasets to acquire language capabilities and used human feedback, Constitutional AI (based on principles such as the UN's Universal Declaration of Human Rights), and training of selected character traits.
Computational Resources
The materials provided do not disclose the computational resources used to train Claude 4 models.
Energy Consumption
No information is provided on the carbon footprint or specific mitigations for energy consumption beyond general claims of "model efficiency".
System Architecture
Claude 4 models are hybrid reasoning models with an "extended thinking mode", but no specific architecture details are provided.
Training Hardware
The materials provided do not specify the training hardware used for Claude 4 models.
Data
Dataset Size
The materials provided do not disclose the size of the datasets used to train Claude 4 models.
Dataset Description
Low Transparency
Claude Opus 4 and Claude Sonnet 4 were trained on a proprietary mix of publicly available information on the Internet, non-public data from third parties, data provided by data-labeling services and paid contractors, data from Claude users who opted in to have their data used for training, and internally generated data at Anthropic. No quantitative analysis of the dataset's characteristics is available.
Data Sources
Medium Transparency
Training data sources include publicly available information on the Internet, non-public data from third parties, data from data-labeling services and paid contractors, data from Claude users who have opted in to have their data used for training, and data generated internally at Anthropic. For web data, Anthropic's crawler followed industry-standard practices with respect to "robots.txt" instructions and did not access password-protected pages or those requiring sign-in or CAPTCHA verification.
Data Collection - Human Labor
Medium Transparency
Anthropic partners with data work platforms to engage workers who help improve their models through preference selection, safety evaluation, and adversarial testing. They state they only work with platforms that align with their belief in providing fair and ethical compensation to workers and are committed to safe workplace practices regardless of location. Anthropic has published the [Inbound Services Agreement](https://www.anthropic.com/legal/inbound-services-agreement) that crowd workers agree to.
Data Preprocessing
Low Transparency
Anthropic employed several data cleaning and filtering methods during the training process, including deduplication and classification.
Data Bias Detection
Unknown
The materials provided do not specifically address how Anthropic detected or addressed biases in their training data, beyond broad data collection and processing for alignment.
Data Deduplication
The materials note that deduplication was among the data cleaning methods employed (see Data Preprocessing above) but give no further detail.
Data Toxic and Hateful Language Handling
The materials provided do not specifically address how toxic and hateful language was handled in the training data.
IP Handling in Data
No information is provided about the handling of intellectual property in the training data.
Data PII Handling
The materials provided do not specifically address how personally identifiable information (PII) was handled in the training data. They do note that usage data from Claude users who opted in has been incorporated into model training in some way, but do not discuss how or whether these data are fully de-identified.
Data Collection Period
The materials mention that the models were trained on publicly available information on the Internet as of March 2025, indicating that this was the cutoff date for the training data, though no start date is provided.
Evaluation
Performance Evaluation
Medium Transparency
The materials provide extensive benchmark results showing Claude 4 models' performance on coding (SWE-bench, Terminal-bench), reasoning (GPQA Diamond), multilingual Q&A (MMMLU), visual reasoning (MMMU), and high school competition math (AIME 2020). Both models show significant improvements over previous versions and competitive performance against other leading models.
Evaluation of Limitations
High Transparency
The System Card includes an extensive section on bias evaluations that assess the models' treatment of political topics and potential discriminatory bias, among other topics. Claude Opus 4 and Claude Sonnet 4 demonstrated bias levels similar to or less than Claude Sonnet 3.7.
Evaluation with Public Tools
The reported evaluations rely in part on publicly available benchmarks, including SWE-bench, Terminal-bench, GPQA Diamond, MMMLU, MMMU, AIME, and StrongREJECT (see Performance Evaluation and Adversarial Testing Procedure).
Adversarial Testing Procedure
High Transparency
The System Card details single-turn violative request evaluations, ambiguous context evaluations, multi-turn testing, and jailbreak resistance testing using the StrongREJECT benchmark. The alignment assessment section also describes various adversarial testing procedures, including alignment faking assessment and reward hacking evaluations.
Model Mitigations
Medium Transparency
The System Card describes iterative model evaluations throughout training to understand how catastrophic risk-related capabilities evolved over time. Multiple model snapshots were tested and appropriate safeguards implemented, with Claude Opus 4 deployed under ASL-3 safeguards and Claude Sonnet 4 under ASL-2. The actual mitigation techniques are described in broad terms; they included post-training using Constitutional AI and "training for specific characteristics".