Model Rating Report

Last Updated: March 9, 2026

Overview

Llama-3 Family

Llama-3 is a family of open-access large language models released by Meta. This page provides an analysis of both the "base" models and the "instruct" models.

Developer

Meta

Country of Origin

USA

Systemic Risk

Open Data

Open Weight

API Access Only

Ratings

Overall Transparency

75%

Data Transparency

60%

Model Transparency

76%

Evaluation Transparency

92%

EU AI Act Readiness

81%

CAIT-D Readiness

76%

Transparency Assessment

The transparency assessment evaluates how clear and detailed the model creators are about their practices. Our assessment is based on the official documentation listed in Sources above. While external analysis may contain additional details about this system, our goal is to evaluate the transparency of the providers themselves.

Basic Details

Llama-3 models were released in April and July 2024.

TRUSTIBLE GUIDANCE

Date of Release

The Date of Release is the date when the model was made available to the public. Publishing these dates is especially important when multiple versions of the model are released over time.

EU AI Act Requirements

Annex XI Section 1.1c: date of release

CAIT-D Requirements
California’s AI Training Data Transparency Act

- (11) The dates the datasets were first used during the development of the artificial intelligence system or service.

The models can be accessed through Hugging Face or downloaded directly from the Meta website.

TRUSTIBLE GUIDANCE

Methods of Distribution

The Methods of Distribution specify all the ways a model can be accessed. Common methods include: a direct model download, access through an API or a hybrid option where a private API endpoint is externally hosted.

EU AI Act Requirements

Annex XI Section 1.1c: methods of distribution

Llama-3 is a text-to-text generative model.

TRUSTIBLE GUIDANCE

Modality

The Modality specifies the types of data that the model can process and output. Common domains include text, images, video, audio and tabular data.

EU AI Act Requirements

Annex XI Section 1.1e: modality (e.g., text, image)

Prompt and output formatting is available in the Hugging Face and GitHub documentation. All models support a sequence length of up to 8,192 tokens.
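
As a rough illustration of the documented chat format for the -instruct models, the helper below assembles a prompt from a list of messages. This is a minimal sketch based on the published Llama-3 special tokens; in practice, the tokenizer's `apply_chat_template` method should be used so the token strings stay in sync with the official implementation.

```python
def build_llama3_prompt(messages):
    """Assemble a Llama-3-Instruct prompt from [{"role": ..., "content": ...}, ...].

    Sketch of the documented chat format; prefer the official tokenizer's
    apply_chat_template() in real use.
    """
    parts = ["<|begin_of_text|>"]
    for m in messages:
        # Each turn is wrapped in header tokens and terminated with <|eot_id|>.
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
        )
    # An open assistant header cues the model to generate the next turn.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = build_llama3_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
])
```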

TRUSTIBLE GUIDANCE

Input and Output Format

The Input and Output Format are specifications for how data should be provided to the model and the exact format output by the model. If applicable, the documentation should include the maximum size of the data (e.g. context window length).

EU AI Act Requirements

Annex XI Section 1.1e: format of the inputs and outputs and their maximum size (e.g. context window length, etc.)

Llama-3 is released under a custom license with some limitations for commercial use. The license requires explicit attribution: derived models, for instance, need to include "Llama 3" at the beginning of their name, and "Built with Meta Llama 3" must be mentioned in derivative works or services. License: https://llama.meta.com/llama3/license/

TRUSTIBLE GUIDANCE

License

The License is the terms under which the model is released. It indicates whether the model can be used for commercial purposes and whether it can be modified and redistributed. Models released exclusively through an API may not have a license, but still be governed through a "Terms of Service".

EU AI Act Requirements

Annex XI Section 1.1f: the license

Instructions for Use are available on GitHub and on Hugging Face.

TRUSTIBLE GUIDANCE

Instructions for Use

Instructions for Use provide guidance for using the model. Ideally, these include specific examples and recommendations. If applicable, the instructions should specify any software/hardware dependencies needed to use the system, and how the model can interact with hardware/software that is not part of the model itself.

EU AI Act Requirements

Annex XI Section 1.2a: the technical means (e.g. instructions of use, infrastructure, tools) required for the general-purpose AI model to be integrated in AI systems.
Annex XII 1.d: how the model interacts, or can be used to interact, with hardware or software that is not part of the model itself, where applicable;

No explanation provided for this rating.

TRUSTIBLE GUIDANCE

Documentation Support

Documentation Support evaluates the accessibility and usefulness of the model's documentation. For a high score in this category, the key details required across the categories need to both exist and be easily accessible.

Rating Guide
Unknown

Documentation is not available or limited to several broad sentences.


Low Transparency

Documentation touches many key topics with some detail, but certain areas (e.g. training data) are missing entirely.


Medium Transparency

Documentation covers all topics that are necessary to use and evaluate the model, but some areas are described vaguely. Excellent documentation may be placed in this category if it is difficult to find and navigate.


High Transparency

Documentation covers almost all or all topics in detail and is easy to navigate.


Llama-3 was a one-time release.

TRUSTIBLE GUIDANCE

Changelog

A Changelog is an artifact that lists out versions of the model with changes that were added in each version. Entries in the changelog should make it clear to a user how the system has changed, and what modifications need to be made for effective use. If a model was released once and no changes will be applied, the documentation should make this clear.

Policy

The Acceptable Use Policy is available here: https://llama.meta.com/llama3/use-policy/

TRUSTIBLE GUIDANCE

Acceptable Use Policy

The Acceptable Use Policy specifies how a model can and cannot be used. When a model is released under a fully open-source license, this policy may not be necessary.

EU AI Act Requirements

Annex XI Section 1.1b: acceptable use policies applicable

Meta user data is not used to train Llama models.

TRUSTIBLE GUIDANCE

User Data

Model developers should clearly state whether user data is used to train models. For models that are not accessed via an API, the documentation should make it clear if user data from other products offered by the developer was used. Listing out an explicit set of external datasets is an acceptable alternative to a “user data” statement, but it should be very clear whether user-related data is in this set.

No explanation provided for this rating.

TRUSTIBLE GUIDANCE

Data Takedown

Model providers should provide a clear mechanism for submitting takedown claims for copyrighted or personal data. The mechanism can include an online form, an email or an in-app button.

Meta details its approach to responsible AI development here.

TRUSTIBLE GUIDANCE

AI Ethics Statement

Model providers should publish an AI Ethics statement that captures the principles used during model development. Alternatively, a company can publish a set of RAI objectives.

Reporting issues with the model: https://github.com/meta-llama/llama-models/issues. Reporting risky content generated by the model: developers.facebook.com/llamaoutputfeedback

TRUSTIBLE GUIDANCE

Incident Reporting

Model Providers should provide a clear mechanism for submitting model feedback and/or for incident reporting.

Model and Training

The base models are intended to be adapted for a variety of natural language and code generation tasks, while the -instruct models are intended to be used for assistant-like chat. Evaluation on several common reasoning and knowledge benchmarks is published.

The documentation notes in general terms that some safety protections were built into the -Instruct models, but that inaccurate, biased or otherwise objectionable responses to user prompts remain possible. In addition, the documentation states that most evaluation was done on English-language content and consistent performance in other languages is not guaranteed.

TRUSTIBLE GUIDANCE

Task Description

The task description should clearly describe the intended uses for the model. Detailed documentation should also cover limitations and out-of-scope uses.
Transparency around model capabilities allows users to properly assess if the model is suitable for their task.

EU AI Act Requirements

Annex XI Section 1.1a: the tasks that the model is intended to perform and the type and nature of AI systems in which it can be integrated

Rating Guide
Unknown

Model capabilities are not documented.


Low Transparency

A general description of model capabilities is provided. For example, the documentation only states that the model can be used for "coding, math and reasoning" tasks.


Medium Transparency

Intended uses of the model are described in detail with examples. Some general limitations are mentioned.


High Transparency

Both model capabilities and limitations are described in detail and with examples.


The model family consists of 8B and 70B parameter variants.

TRUSTIBLE GUIDANCE

Number of Parameters

The Number of Parameters indicates how large the model is.

EU AI Act Requirements

Annex XI Section 1.1d: number of parameters

Llama-3 is based on the Llama-2 architecture, which is discussed in detail in that model's paper (see our Llama-2 rating page for more details). Two key architecture changes are included: the vocabulary is expanded to 128K tokens to encode language more efficiently, and Grouped-Query Attention is used for increased efficiency.

TRUSTIBLE GUIDANCE

Model Design

The Model Design should cover key components of the model and explain how the inputs get transformed into the outputs. Transparency around model architecture can help users understand the suitability of the model for a particular task.

EU AI Act Requirements

Annex XI Section 1.2b: the design specifications of the model...the key design choices including the rationale and assumptions made

Rating Guide
Unknown

Model design is not documented.


Low Transparency

Model architecture is discussed in general terms.


Medium Transparency

Key components of the model are documented.


High Transparency

Model components are described in detail. Rationales and assumptions are documented.


The training process is described in general terms: pre-training for the base model, followed by a combination of supervised fine-tuning (SFT), rejection sampling, proximal policy optimization (PPO), and direct preference optimization (DPO) for the Instruct models.

The release announcement provides a short discussion of the pre-training process, including parallelization strategies.

TRUSTIBLE GUIDANCE

Training Methodology

The Training Methodology should cover the key steps involved in training the model. This should involve both high-level steps and details of the process. For example, Foundation Models are often trained in multiple phases: pretraining, supervised fine-tuning and alignment with human preference/safety. Each step can be implemented via different techniques (e.g. alignment can be done via RLHF or Constitutional AI). Documenting this process can help the users understand the strengths and weaknesses of a particular model.

EU AI Act Requirements

Annex XI Section 1.2b: the design specifications of...training process, including training methodologies and techniques, the key design choices including the rationale and assumptions made; what the model is designed to optimise for

Rating Guide
Unknown

Training methodology is not documented.


Low Transparency

Training procedures and/or target objectives are mentioned in general terms.


Medium Transparency

Main steps of the training process are described in detail, including objectives.


High Transparency

Training process is described in detail, including a rationale for the design and any assumptions.


Information about parameters used during training was not identified.

TRUSTIBLE GUIDANCE

Hyperparameters

Hyperparameters are external configuration variables that control the model shape and training settings. These are important for understanding the overall structure of the model and how it was trained.

EU AI Act Requirements

Annex XI Section 1.2b: the relevance of the different parameters, as applicable

Rating Guide
Unknown

No model parameters are documented.


Low Transparency

Key model parameters are documented with no additional information.


Medium Transparency

Key model parameters are documented, and their roles are explained.


High Transparency

Key model parameters are documented, and their roles are explained. Discussion is included of how the values are chosen and/or how values should be adjusted for fine-tuning.


According to the Model Card, the models were trained on Meta's Research SuperCluster. A total of 7.7M GPU-hours of computation on H100-80GB hardware was used to train the family of models.

TRUSTIBLE GUIDANCE

Computational Resources

Computational Resources can include training times, FLOPs (floating point operations) and other details that can be used to assess the magnitude of resources used to train the model.

EU AI Act Requirements

Annex XI Section 1.2d: the computational resources used to train the model (e.g. number of floating point operations – FLOPs-), training time, and other relevant details related to the training;

According to the Model Card, the estimated total emissions were 2290 tCO2eq.
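
The reported emissions are consistent with a simple back-of-the-envelope estimate from the GPU-hours figure above: energy ≈ GPU-hours × per-GPU power draw, emissions ≈ energy × grid carbon intensity. The sketch below assumes the H100's 700 W TDP and an illustrative carbon intensity of 0.425 kgCO2eq/kWh; these are assumptions chosen to demonstrate the method, not figures published in the documentation.

```python
# Back-of-the-envelope emissions estimate from the model-card GPU-hours figure.
# Assumptions (not from the report): 700 W per-GPU power draw and an
# illustrative grid carbon intensity of 0.425 kgCO2eq per kWh.
gpu_hours = 7.7e6            # total H100-80GB GPU-hours (model card)
power_kw = 0.700             # assumed per-GPU power draw in kW
carbon_kg_per_kwh = 0.425    # assumed grid carbon intensity

energy_kwh = gpu_hours * power_kw
emissions_t = energy_kwh * carbon_kg_per_kwh / 1000  # kg -> tonnes
print(f"~{emissions_t:.0f} tCO2eq")  # same order as the reported 2290 tCO2eq
```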

TRUSTIBLE GUIDANCE

Energy Consumption

Energy Consumption refers to the carbon emissions associated with training the model. It can be approximated based on the GPUs used and training time (Calculator: https://mlco2.github.io/impact/#compute).

EU AI Act Requirements

Annex XI Section 1.2e: known or estimated energy consumption of the model...
With regard to [this] point, where the energy consumption of the model is unknown, the energy consumption may be based on information about computational resources used.

No explanation provided for this rating.

TRUSTIBLE GUIDANCE

System Architecture

The System Architecture description should explain how the model is connected to the end-to-end system. For example, an LLM could be connected to a separate content filtering models for its inputs and/or outputs.

This rating only applies to API-only systems.

EU AI Act Requirements

Annex XI Section 2.3: Where applicable, a detailed description of the system architecture explaining how software components build or feed into each other and integrate into the overall processing.

No explanation provided for this rating.

TRUSTIBLE GUIDANCE

Training Hardware

The Training Hardware documentation can include exact GPU models or country of origin of the technology. This information may be necessary as part of vendor security vetting.

Data

According to the Model Card, Llama 3 was pretrained on over 15 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction datasets, as well as over 10M human-annotated examples.

TRUSTIBLE GUIDANCE

Dataset Size

Dataset Size indicates how much data was used to train the model. This can be specified in terms of the number of documents, tokens or other measures.

EU AI Act Requirements

Annex XI Section 1.2c: the number of data points

CAIT-D Requirements
California’s AI Training Data Transparency Act

- (3) The number of data points included in the datasets, which may be in general ranges, and with estimated figures for dynamic datasets.

The pre-training dataset consists of a mix of about 50% general-knowledge tokens (primarily filtered web data), 25% mathematical and reasoning tokens, 17% code tokens, and 8% multilingual tokens.

The post-training preference dataset was collected internally at Meta and involved annotators ranking the strength of their preference on one of four levels. The preference conversation topics included general English interactions (50%), reasoning and tool use (21%), and coding (15%).

The supervised fine-tuning data consisted of human annotated examples, synthetic data and additional curated datasets discussed in the report.

TRUSTIBLE GUIDANCE

Dataset Description

The Dataset Description provides an overview of the data used for training. It should include a description of individual datapoints, distinct subpopulations and the corpus as a whole. The characteristics described can include low-level properties like number of tokens from each data source and semantic properties like percent of documents in each language. While the Data Sources category focuses on the origins of the data, this category is used to review transparency of the final dataset.

EU AI Act Requirements

Annex XI Section 1.2c: information on the data used for training, testing and validation, including...the number of datapoints, their scope and main characteristics

CAIT-D Requirements
California’s AI Training Data Transparency Act

- (4) A description of the types of data points within the datasets. For purposes of this paragraph, the following definitions apply: (A) As applied to datasets that include labels, “types of data points” means the types of labels used. (B) As applied to datasets without labeling, “types of data points” refers to the general characteristics.

Rating Guide
Unknown

No description or analysis of the dataset is available.


Low Transparency

Dataset is described in general terms.


Medium Transparency

Dataset is described in terms of multiple characteristics with some numeric analysis.


High Transparency

Dataset is analyzed across multiple dimensions, including a separate analysis for different subpopulations of the data (e.g. different sources).


The pre-training data consists of a curated set of web data, collected via an unspecified process. The post-training datasets consisted of human-annotated prompt-output pairs and new synthetic data that targeted coding, multilingual, mathematical, long-context and tool-use capabilities. Details about how the different post-training datasets were collected are available in the report.

TRUSTIBLE GUIDANCE

Data Sources

The Data Source documentation covers the types of data used to train the model and how they were collected. Transparency around sources can help users to assess if a model is suitable for their task (e.g. was it trained on their language) and to gauge the risk profile of the model (e.g. was it trained on unrestricted internet data). The documentation should make it clear if the dataset was purchased or licensed.

When creating new datasets, documentation should cover the steps involved in creation and limitations, like missing data. In addition, documentation should state when the dataset was collected. If synthetic data was generated for the dataset, the documentation should make the process involved clear.

EU AI Act Requirements

Annex XI Section 1.2c: information on the data used for training, testing and validation ... including type and provenance of data ... how the data was obtained and selected. ... [and] all other measures to detect the unsuitability of data sources

CAIT-D Requirements
California’s AI Training Data Transparency Act

- (1) The sources or owners of the datasets
- (2) A description of how the datasets further the intended purpose of the artificial intelligence system or service.
- (6) Whether the datasets were purchased or licensed by the developer
- (12) Whether the generative artificial intelligence system or service used or continuously uses synthetic data generation in its development. A developer may include a description of the functional need or desired purpose of the synthetic data in relation to the intended purpose of the system or service.

Rating Guide
Unknown

Very few or no details are provided about the data sources. When this rating is applied, the user has little ability to determine if the data sources used are appropriate to use for their task.


Low Transparency

The data sources and process for collecting them are described in general terms.


Medium Transparency

Data sources are enumerated, and the collection process is described in some detail. Justifications and limitations surrounding data curation are not addressed.


High Transparency

Data sources are documented in detail, including justifications and limitations for the choices. Data collection process is described in detail, including a discussion of any missing data, limitations and/or assumptions.


The documentation references human-annotated examples used during fine-tuning, but does not provide any details about how these examples were produced.

TRUSTIBLE GUIDANCE

Data Collection - Human Labor

This category assesses the transparency surrounding the human labor involved in the generation of training data. For human labor, we refer to individuals outside of the development team that were employed to create, annotate or review datasets. Datasets include both original pre-training data and human preference data that is used iteratively during post-training. Transparency in this category can help assess potential biases in the data and hold developers accountable to fair labor practices.

If no manual annotation or review was used when constructing the dataset, this category may be marked as Not Applicable. This may occur if the dataset is composed entirely of off-the-shelf data from the Internet.

Rating Guide
Unknown

No information is provided about labor used for dataset construction.


Low Transparency

The documentation acknowledges that human labor was involved in the data collection or annotation process but lacks specific details.


Medium Transparency

The documentation describes the role of human labor in dataset creation, annotation, or review, including methods and scale (e.g., "data annotated by X number of workers from platform Y"). Some information about contributor demographics, compensation and biases/limitations is included but lacks comprehensiveness.


High Transparency

The human labor process is fully documented, including the number of contributors, their geographic or demographic diversity, and the specific tasks performed. The documentation includes the sourcing of data (e.g., platforms, partnerships) and a thorough description of labor practices, including payment rates and working conditions. Potential biases introduced by human labor practices are discussed.


The pre-training web data was processed using a multi-step pipeline that included PII and safety filtering (by excluding websites known to contain harmful and adult content), custom HTML parsing, deduplication (at the URL, document and line level), heuristic filtering to remove low-quality data (e.g. removing lines that consisted of duplicated content such as error messages) and model-based quality filtering (using Llama-2 for quality assessment). Separate filtering and extraction methods were used for English, multilingual and code/math documents. Post-training data, which was largely synthetically generated, was cleaned to remove undesired observed patterns (e.g. using an overly apologetic tone).

TRUSTIBLE GUIDANCE

Data Preprocessing

The Data Preprocessing documentation covers how data sources were preprocessed for training. This should include both filtering steps (i.e. how datapoints were excluded) and transformation steps (i.e. how source datapoints were modified before training). A clear description of this process is important for assessing risks. For example, it can indicate how personally identifying information (PII) was handled: were documents containing PII removed and/or were PII values, like emails, replaced with a placeholder. In addition, this documentation will enable users to set up additional data correctly for fine-tuning.

EU AI Act Requirements

Annex XI Section 1.2c: information on the data used for training, testing and validation, where applicable, including ... curation methodologies (e.g. cleaning, filtering etc)

CAIT-D Requirements
California’s AI Training Data Transparency Act

- (9) Whether there was any cleaning, processing, or other modification to the datasets by the developer, including the intended purpose of those efforts in relation to the artificial intelligence system or service.

Rating Guide
Unknown

No data preprocessing techniques are documented.


Low Transparency

Data filtering and/or cleaning is mentioned in very broad terms.


Medium Transparency

Detailed description of data preprocessing is provided. For filtering datapoints and/or excluding entire sources, procedures are clearly described. If data filtering is not performed, the documentation should provide a clear justification for the choice.


High Transparency

Detailed description of data preprocessing is provided with justifications and a discussion of limitations. Documentation should include a discussion of the data filtering criteria across multiple risk dimensions (e.g. removing duplicate data, handling toxic language and checking for data poisoning). While filtering may not be implemented for each dimension, the developers should show that they considered them and provide an explanation for their choice.


No preprocessing or analyses relating to bias could be identified.

TRUSTIBLE GUIDANCE

Data Bias Detection

The Data Bias Detection category assesses how developers reviewed the data for potential biases. We use Bias to refer to an incorrect or incomplete representation of human subpopulations (i.e. people of a certain race, gender or religion). Bias can appear in data both as text or images containing stereotypes and harmful content and/or as a lack of representation of a particular group. For this evaluation, we focus specifically on the training dataset, not on mitigations implemented during training, like safety alignment.

EU AI Act Requirements

Annex XI Section 1.2c: information on the data used for training, testing and validation, including...methods to detect identifiable biases, where applicable

Rating Guide
Unknown

No bias analysis was conducted, and potential biases in the data are not discussed.


Low Transparency

The potential or realized biases in the training dataset are discussed, but no quantitative analysis is included. Biases towards individuals/groups should be referenced explicitly (e.g. an overall mention to unsafe/low-quality content is not sufficient).


Medium Transparency

In-depth bias analysis is conducted. For example, a demographic analysis was combined with some sentiment/bias analysis.


High Transparency

In-depth analysis for bias is combined with: a documented procedure to reduce bias, explanation for why bias is sufficiently low or justification for not modifying the dataset.


Deduplication is implemented at a URL, Document and Line level.
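
As a toy sketch of what those three levels mean in practice (illustrative only; the report does not publish Meta's implementation), deduplication can proceed from coarse to fine: drop repeated URLs, drop exact-content duplicates via hashing, then drop individual lines that recur across many documents:

```python
from collections import Counter
import hashlib

def dedup(docs, line_threshold=2):
    """Toy three-level deduplication over (url, text) pairs.

    1. URL level: keep one document per URL.
    2. Document level: drop exact-content duplicates via a content hash.
    3. Line level: drop lines recurring across many documents
       (boilerplate such as navigation text or error messages).
    """
    seen_urls, seen_hashes, kept = set(), set(), []
    for url, text in docs:
        if url in seen_urls:
            continue
        seen_urls.add(url)
        h = hashlib.sha256(text.encode()).hexdigest()
        if h in seen_hashes:
            continue
        seen_hashes.add(h)
        kept.append((url, text))

    # Count how many kept documents each distinct line appears in.
    line_counts = Counter(
        line for _, text in kept for line in set(text.splitlines())
    )
    return [
        (url, "\n".join(l for l in text.splitlines()
                        if line_counts[l] < line_threshold))
        for url, text in kept
    ]

docs = [
    ("a.com", "cookie notice\narticle one"),
    ("a.com", "cookie notice\narticle one"),     # duplicate URL, dropped
    ("b.com", "cookie notice\narticle two"),
    ("c.com", "cookie notice\narticle three"),   # shared line, stripped
]
cleaned = dedup(docs)
```

Production pipelines typically use fuzzy matching (e.g. MinHash) rather than exact hashes, but the layered structure is the same.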

TRUSTIBLE GUIDANCE

Data Deduplication

The Data Deduplication category assesses whether the documentation discusses how duplicate entries were treated in the training data.

Filters are implemented to remove data from domains ranked as harmful by Meta safety standards and domains known to contain adult content.

TRUSTIBLE GUIDANCE

Data Toxic and Hateful Language Handling

The Data Toxic and Hateful Language Handling category assesses whether the documentation discusses how toxic and hateful entries were treated in the training data. The developers may choose not to remove such language, but they should provide a clear explanation for their decision (e.g. better performance or allowing customization of the final model).

No explanation provided for this rating.

TRUSTIBLE GUIDANCE

IP Handling in Data

This category assesses whether the documentation discusses how copyrighted entries and other types of IP were treated in the training data.

CAIT-D Requirements
California’s AI Training Data Transparency Act

- (5) Whether the datasets include any data protected by copyright, trademark, or patent, or whether the datasets are entirely in the public domain.

Filters are implemented to remove data from websites known to contain high volumes of PII.

TRUSTIBLE GUIDANCE

Data PII Handling

The Data PII (Personally identifiable information) Handling category assesses whether the documentation discusses how PII entries were treated in the training data. While in general developers should take care to remove such data, a clear justification of why such data was not removed will suffice for this category.

CAIT-D Requirements
California’s AI Training Data Transparency Act

- (7) Whether the datasets include personal information, as defined in subdivision (v) of Section 1798.140.
- (8) Whether the datasets include aggregate consumer information, as defined in subdivision (b) of Section 1798.140.

The knowledge cut-off for the pre-training dataset is at the end of 2023.

TRUSTIBLE GUIDANCE

Data Collection Period

Model documentation should clearly state the last date covered by the model's training data (e.g. April 2023). This information is necessary to assess the accuracy of the model outputs.

Evaluation

The performance of the Llama models was evaluated on general-knowledge, mathematical-reasoning, coding, multilingual, tool-use and long-context benchmarks. Llama-3-8B performs similarly to models of the same size, while Llama-3-405B generally outperforms other available open-source models and lags slightly behind GPT-4 and Gemini. The analysis included additional tests for model robustness (e.g. accounting for variability in multiple-choice benchmarks), adversarial benchmarks and a contamination analysis. Additional quality evaluation was conducted by asking human evaluators to rank outputs from Llama-3-405B and GPT-4o on multiple task types; Llama outperformed GPT-4o on 2 of 3 task types.

TRUSTIBLE GUIDANCE

Performance Evaluation

The Performance Evaluation covers the quantitative and qualitative analysis of model capabilities. While many of the models considered can be used for many different applications, there exist many benchmarks and protocols for reviewing the overall capabilities of the model.

The following key documentation dimensions are reviewed:

  • The choice of metrics/benchmarks used is clearly explained.

  • Metrics on multiple dimensions of model performance are reported in an externally reproducible fashion (Links to evaluation code or externally hosted benchmarks are provided where possible, but are not required).

  • Qualitative examples are included to supplement the user’s understanding of model performance.

  • Gaps in analysis and/or an error analysis are documented to further enhance the user’s understanding of the model’s performance.

EU AI Act Requirements

Annex XI Section 2.1: A detailed description of the evaluation strategies, including evaluation results, on the basis of available public evaluation protocols and tools or otherwise of other evaluation methodologies. Evaluation strategies shall include evaluation criteria, metrics.

Rating Guide
Unknown

No quantitative metrics are reported.


Low Transparency

Some quantitative metrics are reported, but evaluation methods are underspecified.


Medium Transparency

The documentation excels in one of the key documentation dimensions, but has significant gaps in other areas.


High Transparency

Documentation gives the reader a clear and comprehensive sense of the model’s abilities. Almost all of the key documentation dimensions are discussed.


The safety and helpfulness of the Llama models was measured on an internal benchmark that evaluated violation rates (i.e. the model producing a response to an unsafe prompt) and false refusal rates (i.e. the model refusing to produce a response to a safe prompt). This evaluation was separated out by language and included categories for advanced capabilities like tool usage and long-context question-answering. Llama-3 generally performed similarly to competitors and maintained low violation and false refusal rates, showing a balance between helpfulness and safety.

TRUSTIBLE GUIDANCE

Evaluation of Limitations

This category reviews the types of quantitative evaluations reported regarding the limitations of this model. For general-purpose models, limitations are multi-faceted and can include both traditional failure modes (e.g. misclassification) and novel ones (e.g. generating biased content).

This rating considers the breadth of analyses considered. The following key categories should be considered by most LLM developers, but are not a comprehensive list:
- Bias/Fairness (e.g. using DiscrimEval, BBQA, DecodingTrust or a custom benchmark)
- Factuality/Hallucination
- Safety (e.g. likelihood of generating content that violates an acceptable-use policy or evaluation related to a cybersecurity threat)
- Incorrect Refusal Rates (used to quantify the balance of safety and helpfulness)

Note: For this rating, we review whether the developers considered common limitations and published quantitative results for these categories. The broader risk assessment and adversarial testing procedure is evaluated by the ‘Adversarial Testing Procedure’ category.

EU AI Act Requirements

Annex XI Section 2.1: Detailed description of ...the methodology on the identification of limitations.

Rating Guide
Unknown

No quantitative analysis of limitations is performed.


Low Transparency

Evaluations on 1-2 metrics are reported, but details of the analysis or an explanation for not including additional criteria are not documented.


Medium Transparency

Evaluation related to 2-3 types of limitations is reported; details surrounding choice of metrics, implementation process and downstream implications are limited.


High Transparency

Evaluation on at least 3 types of limitations is reported. Details of the implementation process and an explanation of results is included. If a major category of limitations is not assessed, an explanation is given for the reasoning.


https://github.com/meta-llama/llama-models/blob/main/models/llama31/evaldetails.md

TRUSTIBLE GUIDANCE

Evaluation with Public Tools

Evaluations on benchmarks should be conducted using public tools. For many benchmarks, small changes in implementation can influence the metrics and result in figures that are not directly comparable to those published for other models.

EU AI Act Requirements

Annex XI Section 2.1: [Evaluation is conducted] on the basis of available public evaluation protocols and tools

A multi-disciplinary red-team measured risks across a 13-topic Safety taxonomy. A combination of manual and automated prompting techniques were used across single and multi-turn conversations for a comprehensive understanding of vulnerabilities. 

Additional adversarial testing measured chemical/biological weapon and cybersecurity risks. For chemical/biological weapon risks, an uplift study evaluated individuals' ability to generate fictitious plans for a chemical/biological attack in two set-ups: access to the internet alone versus access to both Llama and the internet. This exercise showed that Llama did not cause a significant uplift in the ability to complete malicious tasks. For cybersecurity, the CyberSecEval benchmark was complemented by a new benchmark for spear phishing and autonomous cyberattacks. The developers found that Llama 3 does not have significant susceptibilities in generating malicious code or exploiting vulnerabilities. In addition, "uplift testing" showed that Llama did not significantly aid either expert or novice individuals in a mock cyberattack task, compared to a cohort with access to the internet alone.
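The uplift-study logic described above amounts to comparing task success rates between two cohorts. A minimal sketch using a standard two-proportion z-test is shown below; the participant counts are invented for illustration and do not come from Meta's study:

```python
from math import sqrt

# Invented example counts: task successes out of n participants per cohort.
internet_only = {"success": 6, "n": 20}
internet_plus_model = {"success": 7, "n": 20}

def two_proportion_z(a, b):
    """z-statistic for the difference in success proportions (pooled SE)."""
    p1, p2 = a["success"] / a["n"], b["success"] / b["n"]
    pooled = (a["success"] + b["success"]) / (a["n"] + b["n"])
    se = sqrt(pooled * (1 - pooled) * (1 / a["n"] + 1 / b["n"]))
    return (p2 - p1) / se

z = two_proportion_z(internet_only, internet_plus_model)
# A small |z| (well below ~1.96) is consistent with no significant uplift
# from model access over internet access alone.
print(round(z, 3))
```

A finding of "no significant uplift" in such a design means the model-plus-internet cohort did not outperform the internet-only cohort by more than chance variation would explain.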

TRUSTIBLE GUIDANCE

Adversarial Testing Procedure

Adversarial Testing is the process of intentionally evaluating the risks associated with the model. For general-purpose AI, the focus is usually on the likelihood of models producing harmful outputs. The testing may involve a predetermined set of inputs that are likely to produce bad outputs, manual testing by experts (i.e. red-teaming) or model assisted approaches. For this transparency evaluation, we focus on the depth of documentation. A developer may not be able to assess all risks, but they should clearly document the limitations of the implemented adversarial testing process.

The following aspects of documentation should be considered for this evaluation:

  • Set of risks tested is documented and justified
  • Testing process (e.g. benchmarks used or types of human red-teamers employed)
  • Results from adversarial testing process are presented
  • Discussion on implications of the findings and/or on limitations of the process is included.

Note: There is a small overlap between this rating and “Evaluation of Limitations”. This rating focuses on the transparency of the process, while the other evaluates the transparency of metrics. Quantitative results from a red-teaming exercise can contribute to increased ratings in both categories, but the remaining considerations are different.

EU AI Act Requirements

Annex XI Section 2.2: Where applicable, a detailed description of the measures put in place for the purpose of conducting internal and/or external adversarial testing (e.g., red teaming).

Rating Guide
Unknown

No adversarial testing efforts are disclosed.


Low Transparency

The adversarial testing process is described in broad terms OR the absence of an adversarial testing process is acknowledged, but no justification is provided.


Medium Transparency

The adversarial testing process is described in some detail, including the types of risks that were evaluated and the general approach for testing. However, some documentation details are missing, making it difficult to ascertain the full extent of testing. A model with no adversarial testing can earn this rating if the decision is clearly justified and implications for downstream users are documented.


High Transparency

A detailed description of the adversarial testing process is included and covers all four aspects outlined above. To achieve 'High Transparency' the documentation should allow an external party to assess risk across multiple dimensions.


Several types of mitigations were incorporated into the Llama models:


  1. Pre-training data underwent significant filtering to remove unsafe content.
  2. The developers trained the model in a manner that reduced memorization (i.e. the model outputting an exact piece of text from the training data) to avoid generating PII. An experiment showed that verbatim memorization was under 1%.
  3. During post-training, a dataset of human-created and synthetic examples was used to teach the model which types of inputs should be refused. This dataset was incorporated both into supervised fine-tuning and direct preference optimization.

The report details the effect of these safeguards in terms of violation and false refusal rates (when a model incorrectly refuses a safe request) across languages and scenarios, and shows that the Llama models perform competitively and achieve a solid balance between helpfulness and safety.
In addition, the developers released Llama-Guard, an additional classifier model designed to detect unsafe prompts, and show how it significantly decreases unsafe text generations.
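The memorization experiment mentioned in item 2 above can be sketched roughly as follows: prompt the model with a prefix taken from a training document and check whether the continuation reproduces the original text verbatim. The `generate` stub and sample documents below are stand-ins of our own, not Meta's methodology:

```python
# Toy training corpus for illustration.
training_docs = [
    "The quick brown fox jumps over the lazy dog.",
    "Pack my box with five dozen liquor jugs.",
]

def generate(prompt):
    """Placeholder model call: this toy 'model' memorized the first doc only."""
    if training_docs[0].startswith(prompt):
        return training_docs[0][len(prompt):]
    return " some unrelated continuation"

def memorization_rate(docs, prefix_len=20):
    """Fraction of documents whose continuation is reproduced verbatim."""
    hits = 0
    for doc in docs:
        prefix, rest = doc[:prefix_len], doc[prefix_len:]
        if generate(prefix) == rest:
            hits += 1  # exact reproduction of the held-out suffix
    return hits / len(docs)

print(memorization_rate(training_docs))  # 0.5 for this toy setup
```

A real evaluation would run this over a large sample of training snippets with a tokenizer-aware match criterion; the reported figure for Llama-3 (verbatim memorization under 1%) comes from Meta's own experiment.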

TRUSTIBLE GUIDANCE

Model Mitigations

Model mitigations are steps taken to reduce risks associated with a model. For example, a model can specifically be fine-tuned to recognize inappropriate inputs and refuse to respond. Understanding implemented adaptations is important for recognizing risks associated with the model.

The exact set of risks will depend on the type of model. Since risks evolve over time, we consider whether some set of mitigated and unmitigated risks was addressed; we do not evaluate against a specific set of risks.

Because this is a transparency rating, we evaluate the documentation for clarity around both implemented mitigations and remaining risks. If adaptations were not implemented, developers should clearly disclose that and provide guidance to downstream users.

EU AI Act Requirements

Annex XI Section 2.2: Where applicable, a detailed description of ...model adaptations, including alignment and fine-tuning.

Rating Guide
Unknown

No mitigations are documented, and no justification is given.


Low Transparency

Implemented model mitigations are described in general terms. For example, the use of RLHF is mentioned, but no additional details are provided.


Medium Transparency

Specific model mitigations are documented, but the effect of the adaptations is not measured. For risks that are not addressed by adaptations, some guidance is provided to downstream users. A model with no adaptations can earn this rating if the documentation clearly states that no adaptations were implemented and briefly makes the user aware of the implications.


High Transparency

Model mitigations are documented in detail, and the effect of these adaptations is evaluated through examples and/or quantitative analysis. For risks that are not addressed by mitigations, detailed guidance is provided for downstream users. A model with no adaptations can earn this rating, if the documentation clearly states that no adaptations were implemented AND provides a detailed justification and guidance for downstream users.


Report Issue / Feedback