Model Rating Report
Claude 3.7 Sonnet
Claude 3.7 Sonnet is a hybrid reasoning model with a standard text generation mode and a separate "extended thinking" mode.
Developer
Anthropic
Country of Origin
USA
Systemic Risk
Open Data
Open Weight
API Access Only
Ratings
Overall Transparency
54%
Data Transparency
29%
Model Transparency
24%
Evaluation Transparency
74%
EU AI Act Readiness
44%
CAIT-D Readiness
30%
Transparency Assessment
The transparency assessment evaluates how clear and detailed the model creators are about their practices. Our assessment is based on the official documentation listed under Sources below. While external analysis may contain additional details about this system, our goal is to evaluate the transparency of the providers themselves.
Sources
Release Announcement: https://www.anthropic.com/news/claude-3-7-sonnet
System Card: https://assets.anthropic.com/m/785e231869ea8b3b/original/claude-3-7-sonnet-system-card.pdf
Developer Documentation: https://docs.anthropic.com/en/docs/welcome
Basic Details
Date of Release
February 24, 2025
Methods of Distribution
Claude is available through the Anthropic API, Amazon Bedrock, and Google Cloud’s Vertex AI.
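As an illustration of the first-party channel, here is a minimal sketch of calling the model through the Anthropic API with the official `anthropic` Python SDK (Bedrock and Vertex AI expose the same model through their own client libraries). The model ID matches Anthropic's published snapshot name; the prompt is illustrative only.

```python
# Minimal text-only request to Claude 3.7 Sonnet via the Anthropic API.
# Assumes `pip install anthropic` and ANTHROPIC_API_KEY set in the environment.
import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-3-7-sonnet-20250219",  # published snapshot ID for this model
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize your data retention policy."}],
)
print(message.content[0].text)
```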
Modality
Claude models can take text and images as input and output text.
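A hedged sketch of the multimodal input format: per the developer documentation, images are sent as base64-encoded content blocks alongside text, and the output is text only. The local file name here is hypothetical.

```python
# Image + text input to Claude 3.7 Sonnet; the response is text.
import base64

import anthropic

client = anthropic.Anthropic()

with open("chart.png", "rb") as f:  # hypothetical local image
    image_b64 = base64.standard_b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64", "media_type": "image/png", "data": image_b64}},
            {"type": "text", "text": "What trend does this chart show?"},
        ],
    }],
)
print(message.content[0].text)
```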
Input and Output Format
The context window is 200K tokens. The maximum output is 8,192 tokens in standard mode and 65K tokens in "extended thinking" mode.
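Extended thinking is enabled per request with a thinking-token budget, which is how the larger output limit is reached. A minimal sketch following the developer documentation; the budget values are illustrative.

```python
import anthropic

client = anthropic.Anthropic()

# `max_tokens` bounds the total output (thinking + answer) and can be raised
# well above the 8,192-token standard-mode limit; `budget_tokens` must be
# smaller than `max_tokens`.
message = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 8000},
    messages=[{"role": "user", "content": "Prove that the square root of 2 is irrational."}],
)
for block in message.content:
    if block.type == "thinking":
        print("THINKING:", block.thinking)
    elif block.type == "text":
        print("ANSWER:", block.text)
```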
License
Proprietary
Instructions for Use
The [User Guide](https://docs.anthropic.com/en/docs/welcome) provides detailed instructions.
Documentation Support
Medium Transparency
The Developer Documentation is extensive and provides substantial hands-on guidance for use. While the System Card contains considerable information on system testing, it provides few details on the model and its data, and could be better organized.
Changelog
https://docs.anthropic.com/en/release-notes/overview
Policy
Acceptable Use Policy
https://www.anthropic.com/legal/aup
User Data
Model inputs and outputs are not used to train models. Exceptions apply when the data is flagged for review by the Trust & Safety team or reported by the user.
Data Takedown
Anthropic has a detailed data takedown and [privacy policy](https://www.anthropic.com/legal/privacy). See also the article on [reporting copyright infringement](https://privacy.anthropic.com/en/articles/7996901-i-think-a-user-is-infringing-my-copyright-or-other-intellectual-property-how-do-i-report-it).
AI Ethics Statement
Anthropic uses a [Responsible Scaling Policy](https://www.anthropic.com/news/anthropics-responsible-scaling-policy) and has a [constitution](https://www.anthropic.com/news/claudes-constitution) used during model training.
Incident Reporting
Incidents can be reported by emailing usersafety@anthropic.com. In addition, a Responsible Disclosure Policy is documented [here](https://www.anthropic.com/responsible-disclosure-policy).
Model and Training
Task Description
Medium Transparency
Claude can be used for reasoning, coding, multilingual tasks and image understanding. The "extended thinking" mode can be used for improved performance on math, physics, instruction-following and coding tasks. The developer documentation provides multiple extended examples.
Number of Parameters
No information provided.
Model Design
Low Transparency
The model has a "hybrid" design that allows it to provide instant responses or engage into a more complex reasoning mode. Details of the implementation are not provided.
Training Methodology
Low Transparency
Claude was pre-trained on a large dataset for next word prediction. It was post-trained using human feedback techniques to produce responses that are helpful and harmless. Part of post-training involved Constitutional AI, a reinforcement learning technique that aligns the model with a set of rules and principles.
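Anthropic has not released its training code. As a rough illustration only, here is a toy sketch of the critique-and-revise step described in the published Constitutional AI research (arXiv:2212.08073), in which a model's draft is critiqued against an explicit principle and then revised; the `generate` function is a hypothetical stand-in for any LLM call.

```python
# Toy sketch of Constitutional AI's critique-and-revise step. This is NOT
# Anthropic's training code; it only illustrates revising a draft response
# against an explicit principle.
PRINCIPLE = "Choose the response that is most helpful, honest, and harmless."

def generate(prompt: str) -> str:
    raise NotImplementedError("hypothetical stand-in for an LLM call")

def critique_and_revise(user_prompt: str) -> str:
    draft = generate(user_prompt)
    critique = generate(
        "Critique the response below against this principle: "
        f"{PRINCIPLE}\n\nPrompt: {user_prompt}\nResponse: {draft}"
    )
    revision = generate(
        "Rewrite the response to address the critique.\n\n"
        f"Prompt: {user_prompt}\nResponse: {draft}\nCritique: {critique}"
    )
    return revision  # revised responses become supervised fine-tuning targets
```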
Computational Resources
No information provided.
Energy Consumption
No information provided.
System Architecture
No information provided.
Training Hardware
No information provided.
Data
Dataset Size
No information provided.
Dataset Description
Low Transparency
Claude 3.7 Sonnet was trained on a proprietary mix of publicly available information from the Internet, non-public data from third parties, data provided by data labeling services and paid contractors, and data generated internally by Anthropic. The model was not trained on any user prompt or output data submitted by users or customers, including free users, Claude Pro users, and API customers.
Data Sources
Low Transparency
The training dataset consists of a proprietary mix of publicly available information on the Internet, non-public data from third parties, data provided by data labeling services and paid contractors, and data created internally. The web data is collected using a general-purpose web crawler that respects robots.txt files and does not attempt to bypass CAPTCHA controls.
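The crawler itself is not public. As a generic sketch of what respecting robots.txt means in practice, using only the Python standard library (the user-agent string is hypothetical):

```python
# Generic robots.txt-respecting fetch; not Anthropic's crawler.
from urllib import request, robotparser
from urllib.parse import urlsplit

USER_AGENT = "ExampleBot/1.0"  # hypothetical crawler identity

def polite_fetch(url: str) -> bytes | None:
    parts = urlsplit(url)
    rp = robotparser.RobotFileParser(f"{parts.scheme}://{parts.netloc}/robots.txt")
    rp.read()  # download and parse the site's robots.txt
    if not rp.can_fetch(USER_AGENT, url):
        return None  # path disallowed for this agent: skip it
    req = request.Request(url, headers={"User-Agent": USER_AGENT})
    with request.urlopen(req) as resp:
        return resp.read()
```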
Data Collection - Human Labor
Low Transparency
The documentation explicitly references data labeling services and paid contractors, but does not provide any additional details.
Data Preprocessing
Low Transparency
The system card states that data cleaning and filtering methods were used, like deduplication and classification. No additional information is provided.
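No details are given, but exact deduplication is commonly implemented by hashing normalized documents and keeping first occurrences. A generic sketch of that approach, not Anthropic's pipeline:

```python
# Exact deduplication by content hash after whitespace normalization.
import hashlib

def deduplicate(docs: list[str]) -> list[str]:
    seen: set[str] = set()
    unique = []
    for doc in docs:
        digest = hashlib.sha256(" ".join(doc.split()).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

assert deduplicate(["a  b", "a b", "c"]) == ["a  b", "c"]  # normalized duplicate dropped
```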
Data Bias Detection
Unknown
No information provided.
Data Deduplication
Data cleaning included deduplication.
Data Toxic and Hateful Language Handling
No information provided.
IP Handling in Data
No information provided.
Data PII Handling
No information provided.
Data Collection Period
Claude's knowledge cutoff is the end of October 2024 (web data was collected through November 2024).
Evaluation
Performance Evaluation
Medium Transparency
Claude was evaluated on reasoning, math, coding, and agentic tool-use benchmarks. The model excelled in coding and agentic tool use, outperforming existing models by 13% on SWE-bench Verified, which measures a model's ability to solve real-world software issues, and setting a new state of the art on TAU-bench, a framework for testing AI agent performance on complex tasks that involve user and tool interactions.
Evaluation of Limitations
High Transparency
Claude was evaluated for Appropriate Harmlessness, Bias, Computer Use Safety and Chain-of-Thought Faithfulness.
Appropriate Harmlessness is a newly developed evaluation that considers both whether a model refused to reply and whether it generated unsafe content. This scheme was introduced to account for ambiguous input prompts and for safe responses to prompts labeled as unsafe. On an internal dataset, Claude produced an unnecessary refusal 12.5% of the time and a policy violation 0.6% of the time.
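The grading pipeline is internal; the following hedged sketch only shows how the two reported rates could be computed from labeled examples, with hypothetical field names.

```python
# Illustrative computation of unnecessary-refusal and violation rates.
from dataclasses import dataclass

@dataclass
class Example:
    prompt_is_harmful: bool       # ground-truth label for the prompt
    model_refused: bool           # did the model decline to answer?
    output_violates_policy: bool  # did the output contain unsafe content?

def harmlessness_rates(examples: list[Example]) -> tuple[float, float]:
    n = len(examples)
    unnecessary = sum(e.model_refused and not e.prompt_is_harmful for e in examples)
    violations = sum(e.output_violates_policy for e in examples)
    return unnecessary / n, violations / n  # e.g. (0.125, 0.006) as reported above
```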
Bias evaluation was conducted using the Bias Benchmark for Question Answering, which measures whether a model relies on stereotypes during question answering. Claude was shown to maintain neutrality without compromising accuracy.
Computer Use Safety evaluated whether Claude was susceptible to indirect prompt injection when used for Computer Use. This evaluation was conducted using a hand-crafted dataset of "computer screens" containing unsafe content. The final model prevented prompt injections 88% of the time.
Chain-of-Thought (CoT) Faithfulness was a new evaluation designed to measure whether the CoT generated by the "extended thinking" mode aligns with the final response. Anthropic found that the CoT does not reliably reveal the model's full reasoning process.
Evaluation with Public Tools
No information provided.
Adversarial Testing Procedure
High Transparency
Claude underwent extensive testing under Anthropic's Responsible Scaling Policy and received an ASL-2 rating (the same as Claude 3.5). The evaluation covered CBRN, autonomy, and cyber risks, measured using multiple techniques including:
- Uplift trials - compared how individuals perform on a sensitive task with access to Claude versus the internet alone (see the sketch below)
- Knowledge benchmarks - evaluated the model's knowledge of sensitive subjects
- Capability benchmarks - evaluated the model's ability to solve custom cybersecurity challenges
- External red-teaming
Detailed discussion of each evaluation is included in the System Card.
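The System Card describes the trial designs but not the analysis code. As a generic sketch of how an uplift trial's headline number can be analyzed, here is a two-proportion z-test comparing success rates between a model-assisted group and an internet-only control; the numbers are hypothetical.

```python
# Generic uplift analysis: difference in success rates and its z-score.
import math

def uplift(success_model: int, n_model: int,
           success_control: int, n_control: int) -> tuple[float, float]:
    p1, p2 = success_model / n_model, success_control / n_control
    pooled = (success_model + success_control) / (n_model + n_control)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_model + 1 / n_control))
    return p1 - p2, (p1 - p2) / se

# Hypothetical numbers: 30/50 succeed with Claude vs 18/50 with internet only.
print(uplift(30, 50, 18, 50))  # ~0.24 absolute uplift, z ~ 2.4
```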
Model Mitigations
Medium Transparency
Model mitigations were implemented to cover a broad range of risks, including Child Safety, Cyber Attacks, Dangerous Weapons and Technology, Hate & Discrimination, and CBRN harms. The mitigations include post-training techniques such as Constitutional AI and RLHF.