Misleading text in the physical world can hijack AI-enabled robots, cybersecurity study shows

As a self-driving car cruises down a street, it uses cameras and sensors to perceive its environment, taking in information on pedestrians, traffic lights, and street signs. Artificial intelligence (AI) then processes that visual information so the car can navigate safely.

But the same systems that allow a car to read and respond to the words on a street sign might expose that car to hijacking attacks from bad actors. Text placed on signs, posters, or other objects can be read by an AI's perception system and treated as instructions, potentially allowing attackers to influence an autonomous system's behavior through the real world.

New research led by UC Santa Cruz Professor of Computer Science and Engineering (CSE) Alvaro Cardenas and Assistant Professor of CSE Cihang Xie presents the first academic exploration of these threats, called environmental indirect prompt injection attacks, against embodied AI systems. The study shows that misleading text in an environment can hijack the decision-making of autonomous systems and outlines pathways for defending against these emerging threats.

The project will be presented at the 2026 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML 2026), and is currently available on the arXiv preprint server.

"Every new technology brings new vulnerabilities," said Cardenas, a cybersecurity expert at the Baskin School of Engineering. "Our role as researchers is to anticipate how these systems can fail or be misused—and to design defenses before those weaknesses are exploited."

Secure LVLMs

"Embodied AI" refers to robots, cars, or other physical technology that is run by AI, and these systems are beginning to take up more space in our world as self-driving cars, robots that can deliver packages, and more. Embodied AI systems are increasingly being powered by large visual-language models (LVLMs), a type of AI algorithm that can process both visual input and text, and that help the robots deal with the unpredictable scenarios that pop up in the real world.

"I expect vision-language models to play a major role in future embodied AI systems," Cardenas said. "Robots designed to interact naturally with people will rely on them, and as these systems move into real-world deployment, security has to be a core consideration."

From an idea first proposed by graduate student Maciej Buszko in Cardenas' advanced security course (CSE 233), the team began to explore the threats to embodied AI from prompt injection attacks. These are well-known vulnerabilities in large language models, the kind of algorithms that run chatbots like ChatGPT.

By carefully crafting text inputs, attackers can override a model's intended instructions—causing chatbots and AI assistants to ignore safety rules, reveal sensitive information, or take unintended actions. Until now, such attacks have been understood as a purely digital phenomenon, limited to text entered directly into an AI system.

To explore how prompt injection attacks could threaten embodied AI, the researchers created a set of these attacks that could manipulate LVLMs for three applications: autonomous driving, drones performing an emergency landing, and drones carrying out a search mission.

They call their sets of attacks CHAI: command hijacking against embodied AI. CHAI was created by the two UC Santa Cruz professors, computer science and engineering Ph.D. students first-author Luis Burbano, Diego Ortiz, Siwei Yang, and Haoqin Tu, and Johns Hopkins professor Yinzhi Cao and his graduate student Qi Sun.

CHAI is built with two main steps for carrying out an attack. First, it uses generative AI to optimize the actual words that will be used in the attack, aiming to maximize the probability that the embodied AI robot will follow those instructions. Second, CHAI manipulates how the text appears, optimizing factors like its location in the environment and the color and size of the text.

The researchers trained CHAI with the ability to deliver attacks in English, Chinese, Spanish, and even Spanglish, a mix of English and Spanish words.

Their experiments showed that while the first stage of optimization is the most important, the second stage can also be the difference between a successful and unsuccessful attack—although it's not yet entirely clear why.

"A lot of things that happen in general with these large models in AI, and neural networks in particular, we don't understand," Cardenas said. "It's a black box that sometimes gives one answer, and sometimes it gives another answer, so we're trying to try to understand why this happens."

Deploying CHAI

The team deployed CHAI in their three application areas. They tested the drone scenarios with high-fidelity simulators, and the driving scenarios with real driving photos, as well as a small embodied AI robotic car driving autonomously in the halls of the Baskin Engineering 2 building. For each of these scenarios, they were able to successfully mislead the AI into making unsafe decisions, like landing in an inappropriate place or crashing into another vehicle.

They found that CHAI achieves up to 95.5% attack success rates for aerial object tracking, 81.8% success on driverless cars, and 68.1% success on drone landing. They tested their attacks against GPT4o, a recent public version of the ChatGPT-maker OpenAI's models, and Intern VL, an open source alternative to GPT that can be run on the device's built-in hardware rather than requiring cloud computing.

For the experiments on the AI robotic car, the researchers printed out images of the attacks created with CHAI, placing them into the environment with the car and successfully overriding its navigation. This proved that their attacks can work beyond simulators.

"We found that we can actually create an attack that works in the physical world, so it could be a real threat to embodied AI," Burbano said. "We need new defenses against these attacks."

These tests also confirmed that their attacks worked in varying lighting conditions, with further experiments planned to explore the success of the attacks under varying weather conditions, such as heavy rains.

The researchers also plan to run more experiments using video simulations and compare the impact of prompt-injection attacks with more traditional adversarial attacks, which use blurring or other visual noise to confuse the AI.

"We are trying to dig in a little deeper to see what are the pros and cons of these attacks, analyzing which ones are more effective in terms of taking control of the embodied AI, or in terms of being undetectable by humans," Cardenas said.

Additional future work from Cardenas' group will explore how to create defenses against these kinds of attacks, exploring opportunities for authentication of text-based instructions that embodied AI systems perceive, and ways to make sure instructions are aligned with the robot's mission and safety.







Comments