GPT-4 Autonomously Exploits Zero-Day Security Flaws with a 53% Success Rate

Researchers have achieved over a 50% success rate in hacking their test websites using autonomous teams of GPT-4 bots. These bots coordinate their actions and can spawn new ones as needed, all while exploiting previously undiscovered real-world ‘zero-day’ vulnerabilities.

A few months ago, a research team published a paper detailing their use of GPT-4 to autonomously exploit one-day (or N-day) vulnerabilities: flaws that are already publicly known but still unpatched. When provided with the Common Vulnerabilities and Exposures (CVE) list, GPT-4 could independently exploit 87% of critical-severity CVEs.

Successful Hacking of Zero-Day Vulnerabilities by Autonomous LLM Agents

Fast forward to this week, and the same researchers have released a follow-up paper. This time they successfully hacked zero-day vulnerabilities, flaws that are not yet publicly known, using a team of autonomous, self-replicating Large Language Model (LLM) agents organized under a Hierarchical Planning with Task-Specific Agents (HPTSA) approach.

Instead of assigning one LLM agent to tackle numerous intricate tasks, HPTSA employs a “planning agent” that oversees the entire process and deploys multiple task-specific “subagents.”

This structure forms a hierarchy: the planning agent coordinates the overall effort through a managing agent, which in turn assigns tasks to each “expert subagent.” This relieves the burden on any single agent and keeps task allocation efficient.
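In code terms, the arrangement might look something like the sketch below. This is only an illustration of the hierarchy described above, not the researchers’ implementation: the class names, the canned run_llm stub, and the expert roster are all hypothetical, and the real system wraps GPT-4 with browsing and tool use that are omitted here.

```python
# Minimal sketch of an HPTSA-style hierarchy. All names here are
# hypothetical illustrations; the actual system equips GPT-4 agents
# with browsing and exploitation tools, which this stub omits.
from dataclasses import dataclass, field


def run_llm(prompt: str) -> str:
    # Stand-in for a real LLM call; returns canned text so the
    # sketch runs end to end without any API access.
    if prompt.startswith("Survey"):
        return "Likely weaknesses: SQL injection in the login form"
    return "probe complete: exploit succeeded"


@dataclass
class ExpertSubagent:
    specialty: str  # e.g. "SQL injection", "XSS", "CSRF"

    def attempt_exploit(self, target: str) -> bool:
        # Each expert works only on tasks matching its specialty.
        report = run_llm(f"As a {self.specialty} expert, probe {target}")
        return "exploit succeeded" in report.lower()


@dataclass
class ManagingAgent:
    experts: list[ExpertSubagent] = field(default_factory=list)

    def dispatch(self, target: str, findings: str) -> bool:
        # The manager routes work to experts relevant to the planner's
        # findings; in the full scheme it could also spawn new experts.
        for expert in self.experts:
            if expert.specialty.lower() in findings.lower():
                if expert.attempt_exploit(target):
                    return True
        return False


@dataclass
class PlanningAgent:
    manager: ManagingAgent

    def run(self, target: str) -> bool:
        # The planner explores the target, summarizes likely attack
        # surfaces, and delegates exploitation through the manager.
        findings = run_llm(f"Survey {target} and list likely weaknesses")
        return self.manager.dispatch(target, findings)


team = ManagingAgent(experts=[ExpertSubagent("SQL injection"),
                              ExpertSubagent("XSS")])
print(PlanningAgent(manager=team).run("http://testsite.local"))  # True here
```

The key design choice is that the planner never attempts exploits itself; it only decides which specialties the job calls for and hands the work down.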

This technique mirrors the methodology used by Cognition Labs for its Devin AI software developer: plan out the project, identify the necessary skill sets, and oversee the project’s execution while generating specialized “employees” as required to handle specific tasks.

When tested against 15 real-world web-focused vulnerabilities, HPTSA proved 550% more efficient than a single LLM at exploiting them. It successfully exploited 8 of the 15 zero-day vulnerabilities, the 53% success rate in the headline, while the single LLM managed only 3 of the 15 (20%).
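For reference, the headline’s 53% follows directly from those counts. The snippet below is nothing more than illustrative arithmetic on the reported numbers, not code from the paper:

```python
# Success rates implied by the reported exploit counts (illustration only).
hptsa_rate = 8 / 15       # ~0.533 -> the 53% in the headline
single_llm_rate = 3 / 15  # = 0.200 -> 20% for a lone GPT-4 agent
print(f"HPTSA: {hptsa_rate:.1%}  single LLM: {single_llm_rate:.1%}")
```

Note that the separate 550% efficiency figure is the researchers’ own measure and is not derivable from these counts alone.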

Ethical Concerns Surrounding the Potential Misuse of Advanced AI Models

These results do, however, raise ethical concerns: there is a legitimate worry that users could harness these capabilities to launch malicious attacks on websites and networks.

Daniel Kang, one of the researchers and an author of the paper, specifically pointed out that GPT-4 in its ordinary chatbot mode neither understands the true capabilities of LLM agents nor is capable of hacking anything on its own.

When questioned if it could exploit zero-day vulnerabilities, ChatGPT responded, “No, I am not capable of exploiting zero-day vulnerabilities. My purpose is to provide information and assistance within ethical and legal boundaries.” It advised consulting a cybersecurity professional for such matters.


Read the original article on: New Atlas
