RedVLA: Physical Red Teaming
for Vision-Language-Action Models

Yuhao Zhang1,*, Borong Zhang1,*, Jiaming Fan1,*, Jiachen Shen1, Yishuai Cai1, Yaodong Yang1,†, Jiaming Ji1,†
1Institute for AI, Peking University
State Key Laboratory of General Artificial Intelligence, Peking University
*Equal contribution †Corresponding author

Overview

Vision-Language-Action (VLA) models are advancing toward generalist robotic policies through unified end-to-end learning from vision and language to action. As their capabilities expand across increasingly important real-world domains, safety concerns have also grown substantially. Yet it remains unclear what physical safety risks these models may exhibit when deployed in the physical world, and how severe their consequences may be. Proactively identifying and mitigating such risks is therefore a crucial prerequisite for real-world deployment. This work asks: how can we proactively uncover the physical safety risks of Vision-Language-Action models?

RedVLA overview figure
RedVLA starts from a benign VLA setting, introduces an injected physical risk factor, and elicits unsafe behaviors across three safety cost types. Across six mainstream VLA models, RedVLA achieves an Attack Success Rate (ASR) of up to 95.5%.

In this work, we propose RedVLA, the first red teaming framework for physical safety in VLA models. We systematically elicit unsafe behaviors through a two-stage process: (I) Risk Scenario Synthesis constructs a valid and task-feasible initial risk scene. Specifically, it identifies critical interaction regions from benign trajectories and places the risk factor within these regions, aiming to entangle it with the VLA's execution flow and elicit a target unsafe behavior. (II) Trajectory-Driven Risk Amplification ensures stable elicitation across diverse models. It iteratively refines the risk factor state through gradient-free optimization guided by trajectory features.

95.5% Peak ASR on π0.5
6 Mainstream VLA models
≤10 Optimization iterations

What RedVLA Does

RedVLA studies physical safety risks in VLA models under benign instructions and scenes. It keeps the language input fixed and perturbs only the physical scene by injecting a risk object into the robot's execution flow.

The method is designed to preserve task feasibility and scene-instruction consistency while still exposing target unsafe behaviors.

Two-Stage Pipeline

Risk Scenario Synthesis constructs a valid and task-feasible initial risk scene by identifying critical interaction regions and instantiating a target safety violation with a corresponding risk object.
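As a rough illustration of how critical interaction regions might be extracted from a benign trajectory, the sketch below uses a simple heuristic: waypoints where the end-effector slows down (e.g., grasp or place points) are treated as candidate regions for injecting the risk object. The heuristic, function names, and waypoint format are illustrative assumptions, not the paper's actual method.

```python
# Hedged sketch: locate candidate interaction regions in a benign
# trajectory. Assumption (not from the paper): "critical" regions are
# approximated by waypoints with the lowest end-effector speed, a
# common proxy for grasp/place events.
import math

def critical_regions(waypoints, k=2):
    """Return the k waypoints with the lowest inter-step speed.

    waypoints: list of (x, y, z) end-effector positions at uniform
    timesteps; speed is approximated by consecutive displacement.
    """
    scored = []
    for i in range(1, len(waypoints)):
        speed = math.dist(waypoints[i - 1], waypoints[i])
        scored.append((speed, waypoints[i]))
    scored.sort(key=lambda t: t[0])  # slowest segments first
    return [wp for _, wp in scored[:k]]
```

A risk object could then be placed within (or near) the returned regions so that it becomes entangled with the policy's execution flow, as the pipeline describes.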

Trajectory-Driven Risk Amplification ensures stable elicitation across diverse models by refining the injected risk factor with trajectory-guided gradient-free optimization.
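A minimal sketch of what trajectory-guided, gradient-free refinement could look like: greedily perturb the risk factor's state (e.g., pose parameters) and keep perturbations that raise a trajectory-derived risk score. The `rollout` and `risk_score` callables and the state parameterization are assumptions for illustration; they stand in for the paper's actual trajectory features and scoring.

```python
# Hedged sketch of trajectory-driven risk amplification via greedy,
# gradient-free search. All names here are illustrative assumptions,
# not the authors' implementation.
import random

def amplify(initial_state, rollout, risk_score, iters=10, sigma=0.02):
    """Iteratively refine the risk factor state.

    initial_state: list of floats parameterizing the injected risk
    factor (e.g., position/orientation offsets).
    rollout: callable mapping a state to a policy trajectory.
    risk_score: callable mapping a trajectory to a scalar risk signal.
    """
    best_state = list(initial_state)
    best_score = risk_score(rollout(best_state))
    for _ in range(iters):
        # Gaussian perturbation of the current best state.
        candidate = [x + random.gauss(0.0, sigma) for x in best_state]
        score = risk_score(rollout(candidate))
        if score > best_score:  # greedy: keep only improvements
            best_state, best_score = candidate, score
    return best_state, best_score
```

Because the search only accepts improving candidates, the risk score is non-decreasing across iterations, consistent with the paper's report that elicitation stabilizes within at most 10 optimization iterations.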

Key Findings

95.5% peak ASR · 6 VLA models · ≤10 optimization iterations · SimpleVLA-Guard

RedVLA reveals severe physical vulnerabilities in existing VLA policies and produces data that can be reused for deployment-time safety monitoring.

Framework

RedVLA framework diagram
The framework diagram summarizes RedVLA as a two-stage pipeline. Risk Scenario Synthesis combines a physical safety taxonomy, interaction identification, and risk instantiation to build a red teaming plan. Trajectory-Driven Risk Amplification then refines the injected risk factor to trigger the target unsafe behavior.

BibTeX

@misc{zhang2026redvlaphysicalredteaming,
    title={RedVLA: Physical Red Teaming for Vision-Language-Action Models}, 
    author={Yuhao Zhang and Borong Zhang and Jiaming Fan and Jiachen Shen and Yishuai Cai and Yaodong Yang and Jiaming Ji},
    year={2026},
    eprint={2604.22591},
    archivePrefix={arXiv},
    primaryClass={cs.RO},
    url={https://arxiv.org/abs/2604.22591}, 
}