
GPT-4o

GPT-4o is a multilingual, multimodal transformer model for processing text, images, audio, and code.

Developer

OpenAI

Country of Origin

USA

Systemic Risk

Open Data

Open Weight

API Access Only

Ratings

Overall Transparency

55%

Data Transparency

41%

Model Transparency

18%

Evaluation Transparency

70%

EU AI Act Readiness

51%

CAIT-D Readiness

56%

Transparency Assessment

The transparency assessment evaluates how clear and detailed the model creators are about their practices. Our assessment is based on the official documentation listed in Sources above. While external analysis may contain additional details about this system, our goal is to evaluate the transparency of the providers themselves.

Basic Details

Date of Release

May 13th, 2024. A new image generation approach was added on March 25th, 2025.


Methods of Distribution

GPT-4o and GPT-4o mini are available via the Chat Completions API, Assistants API, and Batch API, provided you have an OpenAI API account. You can also access GPT-4o through the OpenAI Playground.
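
As a minimal illustration, a Chat Completions request to GPT-4o via the official `openai` Python SDK looks like the sketch below; it assumes a valid API key in the `OPENAI_API_KEY` environment variable.

```python
# Minimal Chat Completions request to GPT-4o via the official openai SDK.
# Assumes the OPENAI_API_KEY environment variable holds a valid API key.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarise GPT-4o in one sentence."}],
)
print(response.choices[0].message.content)
```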


Modality

GPT-4o is a multimodal model that can take in and output any combination of text, audio, and images. It can also take video as an input, but it is currently unable to produce video output.
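
For example, text and image inputs can be combined in a single Chat Completions request; the image URL in this sketch is a placeholder:

```python
# Sketch of a mixed text-and-image request to GPT-4o.
# The image URL is a placeholder; any publicly reachable image works.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```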


Input and Output Format

The input context window for GPT-4o is 128K tokens, with an output limit of 2,048 tokens. Details of the input and output formats can be found on the [model release page](https://openai.com/index/hello-gpt-4o/).


License

The model is proprietary; you can find the terms of use for your country [here](https://openai.com/policies/terms-of-use/).


Instructions for Use

You can find instructions on how to use any of the ChatGPT models [here](https://openai.com/chatgpt/overview/).


Documentation Support
Low Transparency

The documentation is easy to access and navigate. It covers the model's capabilities, evaluations, and mitigations in considerable detail, and some information about the training data is available. However, there is no information on the architecture or the hardware used to train the model.


Changelog

You can find the changelog for GPT-4o [here](http://platform.openai.com/docs/models#gpt-4o).


Policy

Acceptable Use Policy

The OpenAI Usage policies can be found [here](https://openai.com/policies/usage-policies/).


User Data

User data is used to train ChatGPT models. This includes log data (e.g. your IP address), usage data, device information, location information, and cookies.


Data Takedown

You can find out how to opt out of model training and remove your data [here](https://help.openai.com/en/articles/7730893-data-controls-faq).


AI Ethics Statement

OpenAI describe their principles in their [OpenAI Charter](https://openai.com/charter/).


Incident Reporting

ChatGPT has a reporting feature that you can use to give feedback and report incidents. You can find more information [here](https://chatgpt.com/g/g-Jjm1uZYHz-incident-reporting).


Model and Training

Task Description
Medium Transparency

GPT-4o is capable of producing a combination of text, audio, and images in response to multimodal inputs (text, audio, image, and video). OpenAI provide a series of capability "explorations" [here](https://openai.com/index/hello-gpt-4o/). These are samples of the model's responses to a series of prompt scenarios, including "Poster creation for the movie 'detective'" and "Lecture summarisation". Some limitations are shown in a video highlighting issues with the model switching languages mid-sentence, but there is no detailed explanation beyond this and no exhaustive list of limitations is provided.


Number of Parameters

Not disclosed.


Model Design
Unknown

Not disclosed.


Training Methodology
Low Transparency

Training involved a pre-training phase using partially filtered data and a post-training phase in which the model was aligned to human preferences, with red teaming used to stress test the resulting mitigations.


Computational Resources

Not disclosed.


Energy Consumption

Not disclosed.


System Architecture

Not disclosed.


Training Hardware

Not disclosed.


Data

Dataset Size

Not disclosed.


Dataset Description
Low Transparency

Key dataset components include data from public web pages, code and math data, and multimodal data. There is no publicly available information on exactly what data was used to train the model.


Data Sources
Low Transparency

The pre-training data for GPT-4o included a curated set of publicly available data, mostly collected from machine learning datasets and web crawls, alongside proprietary data from data partnerships. These partnerships give OpenAI access to non-public data, such as paywalled content, archives, and metadata.


Data Collection - Human Labor
Unknown

Not disclosed.


Data Preprocessing
Medium Transparency

The training data was pre-filtered to remove unwanted and harmful information. OpenAI use the Moderation API and safety classifiers to filter out data that may contribute to harmful content or information hazards, including CSAM, hateful content, violence, and CBRN-related material. Image datasets are filtered for explicit content, and the amount of personal information in the training data is reduced using advanced data filtering processes.
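
OpenAI's internal filtering pipeline is not public, but the sketch below shows how a text corpus could in principle be screened with the public Moderation API; the `filter_corpus` helper and the choice of moderation model are illustrative assumptions, not OpenAI's documented setup.

```python
# Illustrative only: NOT OpenAI's internal pipeline, just a sketch of
# screening a text corpus with the public Moderation API.
from openai import OpenAI

client = OpenAI()

def filter_corpus(documents: list[str]) -> list[str]:
    """Keep only documents the Moderation API does not flag."""
    kept = []
    for doc in documents:
        result = client.moderations.create(
            model="omni-moderation-latest",  # current public moderation model
            input=doc,
        )
        if not result.results[0].flagged:
            kept.append(doc)
    return kept

print(filter_corpus(["The weather is lovely today."]))
```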


Data Bias Detection
Unknown

Not disclosed.


Data Deduplication

Not disclosed.


Data Toxic and Hateful Language Handling

Data that could contribute to harmful content or information hazards is filtered out. The documentation lists "hateful content" as a category of data removed from the set before training, but does not give a clear description of what this category covers.


IP Handling in Data

IP data is stored and used to train ChatGPT models; it is not removed from the dataset.


Data PII Handling

The amount of personal information in the training data is reduced using advanced data filtering processes.


Data Collection Period

The data cut-off is October 2023.


Evaluation

Performance Evaluation
Low Transparency

GPT-4o has been evaluated on a number of benchmarks covering text evaluation, audio ASR (automatic speech recognition) performance, audio translation performance, multimodal performance (M3Exam zero-shot), and vision understanding. The results of these capability evaluations can be found [here](https://openai.com/index/hello-gpt-4o/).


Evaluation of Limitations
Medium Transparency

Red teaming was used to evaluate the model to determine any possible dangerous capabilities, assess risks, and stress test mitigations. Red teamers covered categories spanning (amongst others) violative and disallowed content, misinformation, bias, ungrounded inferences, emotional perception and anthropomorphisation risks, and copyright. The data generated by this process informed the creation of a series of quantitative evaluations.

Alongside the new evaluation methods, a range of pre-existing evaluation datasets were converted, via text-to-speech, to work with speech-to-speech models. This introduces a heavy reliance on the text-to-speech model used for the conversion.

One type of evaluation dataset used to assess GPT-4o comprises capture-the-flag (CTF) challenges designed to determine cybersecurity risk. These evals covered web application exploitation, reverse engineering, remote exploitation, and cryptography. The biological risk and persuasiveness of the model were also assessed.

Finally, third-party evaluations were also run by METR and Apollo Research to evaluate the risks posed by GPT-4o.


Evaluation with Public Tools

GPT-4o is evaluated using public tools including MMLU, HumanEval, and M3Exam. You can find the full list of performance evaluations [here](https://openai.com/index/hello-gpt-4o/).
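
As a rough illustration of how such public multiple-choice benchmarks are typically run (OpenAI's exact prompting and scoring setup is not documented, and the question below is a made-up placeholder), a zero-shot MMLU-style item could be scored like this:

```python
# Sketch of zero-shot multiple-choice scoring in the style of MMLU.
# The question is a placeholder; real benchmark items come from the dataset.
from openai import OpenAI

client = OpenAI()

question = "Which gas makes up most of Earth's atmosphere?"
choices = {"A": "Oxygen", "B": "Nitrogen", "C": "Carbon dioxide", "D": "Argon"}
answer = "B"

prompt = question + "\n" + "\n".join(f"{k}. {v}" for k, v in choices.items())
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Answer with a single letter: A, B, C, or D."},
        {"role": "user", "content": prompt},
    ],
    max_tokens=1,
    temperature=0,
)
predicted = response.choices[0].message.content.strip()
print("correct" if predicted == answer else "incorrect")
```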


Adversarial Testing Procedure
Medium Transparency

Red-teaming was used to test for potentially dangerous capabilities in GPT-4o. The red-teamers were external to OpenAI and were given access to snapshots of the model at various stages of training. Red-teaming occurred in four stages: the first three tested the model via an internal tool, and the fourth used the full iOS experience.

Red teamers were asked to assess novel risks and stress test mitigations covering a wide range of categories. This included disallowed content, bias, ungrounded inference, and fraudulent behaviour.

Automated red teaming was also used to test the native image generation capabilities described in the system card addendum.


Model Mitigations
High Transparency

The creation of GPT-4o involved a multi-stage mitigation process including pre-processing of the training data, alignment with human values, multi-stage red teaming, and a series of both internal and external evaluations. Based on these evaluations, specific measures were taken to mitigate the risk posed by the model. There are also clear feedback channels for users and ways to report incidents, allowing the data provided to further improve the model.

For example, the model was seen to make potentially biased inferences about speakers. To mitigate this issue, the model was post-trained to refuse questions asking it to make ungrounded inferences and to hedge on questions requiring sensitive trait attribution. Compared to the initial model, the post-trained model was 24 points more likely to respond correctly to sensitive trait identification requests.

The addition of new native image generation techniques in March 2025 also introduced new safety risks due to increased image generation and modification capabilities. Mitigation strategies against this risk include chat model refusals, prompt blocking, output blocking, and increased safeguards for minors. These are detailed in the addendum to the GPT-4o system card.

The details and results of the remaining mitigations are published in the [GPT-4o System Card](http://openai.com/index/gpt-4o-system-card/#a-violative-and-disallowed-content-full-evaluations).