Model Rating Report Logo

GPT-o3 and o4

GPT-o3 and -GPT-o4 mini are designed to reason for longer before responding. The models are able to agentically interact with all tools currently available to ChatGPT. This includes searching the internet, analysing uploaded files and visual inputs, and generating images.

Developer

OpenAI

Country of Origin

USA

Systemic Risk

Open Data

Open Weight

API Access Only

Ratings

Overall Transparency

47%

Data Transparency

22%

Model Transparency

18%

Evaluation Transparency

59%

EU AI Act Readiness

46%

CAIT-D Readiness

40%

Transparency Assessment

The transparency assessment evaluates how clear and detailed the model creators are about their practices. Our assessment is based on the official documentation lists in Sources above. While external analysis may contain additional details about this system, our goal is to evaluate transparency of the providers themselves.

Basic Details

Date of Release

16 April 2025


Methods of Distribution

The models can be accessed through the ChatGPT API


Modality

The models can take text, files, code, and images as input. They are able to access tools within the ChatGPT catalogue (web browsing, Python, image analysis and generation etc) and produce text.


Input and Output Format

The context window for this model is 200,000 tokens and it can output a maximum of 100,000 tokens.


License

Proprietary


Instructions for Use

Instructions for using this model can be found both on the OpenAI website and on the ChatGPT page.


Documentation Support
Low Transparency

The documentation is clear and easy to read. Most key information is available but too general. Some information is missing entirely, such as details on the system architecture or training processes.


Changelog

You can find the changelog [here](https://platform.openai.com/docs/changelog); however, it may not contain all the details related to minor changes.


Policy

Acceptable Use Policy

Usage Policies are available on Open AI's website.


User Data

User data is used to train ChatGPT models. This includes log data (e.g. your IP address), usage data, device information, location information, and cookies.


Data Takedown

You can find how to opt out of model training and remove your data [here](https://help.openai.com/en/articles/7730893-data-controls-faq)


AI Ethics Statement

OpenAI describe their principles in their [OpenAI Charter](https://openai.com/charter/).


Incident Reporting

ChatGPT has a reporting feature that you can use to give feedback and report incidents. You can find more information [here](https://chatgpt.com/g/g-Jjm1uZYHz-incident-reporting).


Model and Training

Task Description
Medium Transparency

The models are capable of responding to text, image, code, and file input. They have access to the full range of ChatGPT tools, including searching the internet, image analysis and generation, and Python.


Number of Parameters


Model Design
Unknown

Not explicitly stated in the provided documents.


Training Methodology
Low Transparency

OpenAI reasoning models are trained using reinforcement learning on chains of thought to encourage reasoning.


Computational Resources


Energy Consumption


System Architecture


Training Hardware


Data

Dataset Size


Dataset Description
Low Transparency

The two models were trained on diverse datasets. This included information that is available publicly online, information from third parties, and information from users, human trainers and researchers. Data is pre-processed to maintain quality and mitigate potential risks.


Data Sources
Low Transparency

The data used to train these models was sourced from third parties, publicly available information on the internet, and from ChatGPT users, researchers, and human trainers.


Data Collection - Human Labor
Low Transparency

Human labour is used in the production of this data. This includes data produced and generated by researchers and human trainers.


Data Preprocessing
Low Transparency

Data was filtered to maintain quality and mitigate a series of identified risks. Personal information was reduced from the training data. Moderation API and safety classifiers were also used to help prevent the use of harmful or sensitive content, including explicit material.


Data Bias Detection
Unknown


Data Deduplication


Data Toxic and Hateful Language Handling


IP Handling in Data


Data PII Handling

Personal information is removed from the dataset through pre-training filtering.


Data Collection Period


Evaluation

Performance Evaluation
Medium Transparency

The models are tested against a variety of safety and performance evaluations. These include in-house evaluations, third-party evaluations by groups including Apollo Research and METR, PersonQA, PaperBench, SWE-Lancer, and MMLU. The results of each evaluation are listed in the system card and compared to other OpenAI models. Many of these benchmarks are reported with clear explanations for how and why the evaluation was conducted, but this is not the case for all of the evaluations.


Evaluation of Limitations
Medium Transparency

The models can hallucinate, be jailbroken (i.e. prompted to produce inappropriate content) and produce incorrect refusals. Hallucinations are a significant concern with the models hallucinating 33% and 48% percent of the time on PersonQA (a benchmark that asks questions about public figures). Detailed results related to these limitations are reported in the System Card.


Evaluation with Public Tools


Adversarial Testing Procedure
Medium Transparency

The model is tested extensively for safety risks. This includes through jailbreak testing to evaluate the robustness of the model using adversarial prompts. These jailbreaks are either human sourced or they are from the StrongReject database. Other safety risks are evaluated, including harmful image generation, production of disallowed content, hallucinations, and bias, among others. The measures taken to prevent each risk, the evaluations used to test them, and the results of each evaluation are included in the system card.


Model Mitigations
Medium Transparency

The model mitigations included post-training to teach the model about refusal behavior for harmful requests and using moderation models for the most egregious content. The final models are tested for a variety of safety risks including fairness and bias, personal identification, and deception by both OpenAI and third parties.