The answer is No.
The thing is, speaking as the author of one of the Tweets linked above, these examples are cherry-picked. Even OpenAI’s latest model, GPT-4, is unable to reliably detect surface-level critical bugs.
Let’s begin with a simple example contract that has a deadly vulnerability lurking within it.
Can GPT find a simple input validation bug?
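The contract from the original experiment isn’t reproduced here. As an illustration of the kind of small contract and surface-level input validation bug in question, here is a hypothetical vault (the names and the specific bug are illustrative, not the audited code):

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.17;

// Minimal ERC-20 surface needed for this sketch.
interface IERC20 {
    function transfer(address to, uint256 amount) external returns (bool);
    function transferFrom(address from, address to, uint256 amount) external returns (bool);
}

// Hypothetical example: a tiny vault that holds one token for its depositors.
contract Vault {
    IERC20 public immutable token;
    mapping(address => uint256) public balances;

    constructor(IERC20 _token) {
        token = _token;
    }

    function deposit(uint256 amount) external {
        balances[msg.sender] += amount;
        require(token.transferFrom(msg.sender, address(this), amount), "transfer failed");
    }

    function withdraw(uint256 amount) external {
        // BUG: there is no check that balances[msg.sender] >= amount.
        // Inside `unchecked`, the subtraction silently wraps around, so any
        // caller can underflow their balance and drain the vault.
        unchecked {
            balances[msg.sender] -= amount;
        }
        require(token.transfer(msg.sender, amount), "transfer failed");
    }
}
```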
Now, let’s try to use ChatGPT to audit this smart contract. It is very small, so it fits into GPT-4’s enormous context window.
We’ll use this prompt:
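The exact wording isn’t reproduced here; the following is an illustrative reconstruction of a priming prompt that asks ChatGPT to reply “Ready” before receiving any code:

```
You are a smart contract security auditor. I am going to paste in the source
code of a Solidity contract. Carefully review it and report any security
vulnerabilities you find. If you understand, respond with "Ready".
```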
Helpful as always, ChatGPT responds with “Ready”. Now let’s just paste in the contract source code. (Note: when I ran this experiment, the code had some extra comments; in theory, the “audit” results ought to be the same.)
Here’s ChatGPT’s response:
Not only does ChatGPT miss the critical bug, it does so confidently. Ironically, it even points out missing input validation, yet the specific issue it claims to have found doesn’t really matter.
Okay. Maybe we just got unlucky. Let's try running it again.
Helpful as always. Now let's just paste in the code one function at a time.
This goes on for a while, with ChatGPT asking for more code each time. On a different run, it analyzed the functions one at a time, essentially summarizing each one and pointing out a few facts about it.
This was the final output:
So once again, it missed the critical but relatively surface-level bug. This should be sufficient to show that ChatGPT is certainly not up to the task of auditing smart contracts, especially mission-critical financial code.
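To underline just how surface-level this class of bug is: in the hypothetical vault sketched above, the entire fix is a single require statement (again, an illustration, not the contract from the actual experiment):

```solidity
function withdraw(uint256 amount) external {
    // The missing input validation: reject withdrawals beyond the caller's balance.
    require(balances[msg.sender] >= amount, "insufficient balance");
    unchecked {
        balances[msg.sender] -= amount; // can no longer underflow
    }
    require(token.transfer(msg.sender, amount), "transfer failed");
}
```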
With the advent of GPT-4, ChatGPT is an excellent assistant for many tasks. It is proficient at writing code snippets, answering general knowledge questions, and providing helpful advice and commentary. That being said, ChatGPT is prone to generating responses that “feel” right (i.e., pass a vibe check) but aren’t actually correct. Worryingly, despite OpenAI’s best efforts, it is still often confidently wrong.
We conducted a short series of experiments with a relatively surface-level but critical smart contract vulnerability in a small contract. ChatGPT failed to identify the vulnerability in all trials. The experiment was certainly limited by a small sample size (ChatGPT currently only allows 50 generations per 3 hours). However, for mission-critical code like smart contracts, a false negative rate of 10 out of 10 tries is unacceptable.
ChatGPT, and LLM-based AI technology in general, will likely become a useful aid in a security researcher’s arsenal of tools. However, LLMs in their current state are unlikely to make the job redundant. Highly skilled, specialized jobs like smart contract auditing are likely to remain in demand for at least the foreseeable future.
Zellic is a smart contract auditing firm founded by hackers, for hackers. Our security researchers have uncovered vulnerabilities in the most valuable targets, from Fortune 500s to DeFi giants. Whether you’re developing or deploying smart contracts, Zellic’s experienced team can prevent you from being hacked.
Contact us for an audit that’s better than the rest. Real audits, not rubber stamps.
If you think you’d be a good fit to work at Zellic, we’re hiring!