Rainier Wu

May 15, 2025

Enumerating All 69,788,231 Ethereum Contracts

A look into how we were able to retrieve every single contract ever deployed on Ethereum

While building the EVM trackooor↗, we had the idea of scanning for uninitialized contracts on Ethereum to find potentially vulnerable contracts.

Our idea was to call variations of init() on all contracts, but we first had to tackle another challenge — how do we retrieve every single contract ever deployed on Ethereum?

It turns out this problem isn’t trivial at all. As far as we know, there isn’t an existing public resource or RPC endpoint that enumerates all deployed contracts, so we had to solve it ourselves, and it took several approaches to get there.

Motivation — Why Do This?

Apart from being able to attempt initialization of all contracts, it is useful to have a dataset of contract addresses and bytecodes of every single deployed contract on Ethereum.

By having this dataset, we can

easily query for all contracts containing specific bytecodes,
perform static analysis on all deployed contracts to uncover vulnerabilities,
discover interesting trends and statistics through graphing contract data over time,

and more.

At Zellic, we believe in open-source software and data to benefit the Web3 community, which is why we are publicly releasing both the contracts dataset↗ and the source code↗ used to generate it.

Approaching the Problem

How do we get all contracts deployed on Ethereum? And why is this a nontrivial problem?

Let’s go over three methods to approach the problem.

The Naive Method

First of all, let’s consider the most naive approach — go through every single transaction looking for deployment transactions. If a transaction is a deployment transaction, we note down the contract and bytecode. We can do this through eth_getBlockByNumber, starting from block number 1 and looping through transactions in each block. If a transaction doesn’t have a to address, it is a deployment transaction, and we can use another RPC call eth_getTransactionReceipt to get the deployed contract address.
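As a rough illustration of the check described above, the sketch below filters one block's transactions for a null to field. It assumes the block JSON was fetched with eth_getBlockByNumber and full transaction objects enabled; the type names, the function name, and the sample data are made up for this example and are not taken from our code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rpcTx holds the only transaction fields this sketch needs. To is a pointer
// because the JSON value is null for contract-creation transactions.
type rpcTx struct {
	Hash string  `json:"hash"`
	To   *string `json:"to"`
}

// rpcBlock mirrors the part of an eth_getBlockByNumber result that we read.
type rpcBlock struct {
	Transactions []rpcTx `json:"transactions"`
}

// deploymentTxHashes returns the hashes of transactions whose "to" is null.
func deploymentTxHashes(rawBlock []byte) ([]string, error) {
	var b rpcBlock
	if err := json.Unmarshal(rawBlock, &b); err != nil {
		return nil, err
	}
	var hashes []string
	for _, tx := range b.Transactions {
		if tx.To == nil {
			hashes = append(hashes, tx.Hash)
		}
	}
	return hashes, nil
}

// sampleBlock is fabricated, abbreviated data used only to exercise the function.
const sampleBlock = `{"transactions":[
  {"hash":"0xaa","to":null},
  {"hash":"0xbb","to":"0x0000000000000000000000000000000000000001"}
]}`

func main() {
	hashes, err := deploymentTxHashes([]byte(sampleBlock))
	if err != nil {
		panic(err)
	}
	// Each hash would then be passed to eth_getTransactionReceipt
	// to read the address of the deployed contract.
	fmt.Println(hashes)
}
```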

Diagram of the naive approach to get all deployed contracts

However, there is a big problem — this wouldn’t work for contracts deployed as part of internal transactions↗.

Let’s look at an example. The UniswapV3Factory↗ is a contract designed to deploy Uniswap V3 pools.

A transaction that deploys a pool using the factory looks something like this:

Notice that this isn’t a deployment transaction, as the to address is not null; this is a normal contract call, and yet it still deploys a contract! In fact, many contracts can be deployed through internal transactions, using just a single transaction.

The naive approach doesn’t handle these scenarios, which are common across the blockchain. This is what makes getting all deployed contracts not as easy as it seems.

More technically speaking, we need to catch every instance of the CREATE or CREATE2 opcodes executing and deploying a contract.

Previous Method — Smart Contract Fiesta

Two years ago, we approached this problem by performing an Ethereum node full sync from the genesis block using a modified Geth↗ instance.

A tracer was added to catch every instance of the CREATE and CREATE2 opcodes executing, and contract addresses and block numbers of every contract deployed would be recorded as the full node synced.

This project was dubbed Smart Contract Fiesta, and you can read more about it here↗.

New Method — Deployment Scan

Now we’re back to the part where we were building the EVM trackooor↗ and attempting initialization of all Ethereum contracts after seeing the DeltaPrime exploit↗.

Of course, this meant we had to first get a list of all contracts, which prompted us to approach the problem again. We called this project Deployment Scan.

However, this time, we took a different approach. Instead of indexing contracts as a full node synced, we decided to try using RPC calls to an already synced full node. There were several APIs that seemed promising for our purpose, such as Debug API and Trace API.

We first considered the Debug API, as it features debug_traceTransaction↗, which returns information on all internal calls for a given transaction.

The structure it returns looks something like this:

type TraceResult struct {
	From    common.Address `json:"from"`
	To      common.Address `json:"to"`
	Gas     uint64         `json:"gas"`
	GasUsed uint64         `json:"gasUsed"`
	Input   []byte         `json:"input"`
	Output  []byte         `json:"output"`
	Value   *big.Int       `json:"value"`
	Type    string         `json:"type"`
	Calls   []TraceResult  `json:"calls"`
}

Notice how Calls contains a list of the same type.

To get contracts deployed in a given transaction, we would call debug_traceTransaction on it and check the Type of the call. However, if the Type is a CALL or DELEGATECALL, Calls would have information on those calls, so we would have to iterate recursively through those calls as well.

Essentially, the recursive function to extract deployed contracts from a trace looked something like this:

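func recursivelyGetDeployedContracts(txTraceResult shared.TraceResult) {
	switch txTraceResult.Type {
	case "CREATE", "CREATE2":
		// for creation frames, To is the new contract's address
		// and Output is its deployed (runtime) bytecode
		contract := txTraceResult.To
		bytecode := txTraceResult.Output
		recordContractBytecode(contract, bytecode)
	case "CALL", "DELEGATECALL", "CALLCODE":
		// nothing to record for the frame itself
	default:
		// other frame types (e.g. STATICCALL) cannot deploy contracts
		return
	}
	// go through each internal call's trace result (if any) and recurse;
	// constructors can also make calls and deploy further contracts
	for _, tr := range txTraceResult.Calls {
		recursivelyGetDeployedContracts(tr)
	}
}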

However, in practice, this turned out to be considerably slow. Even with a local full node, it seemed like it would take several months to finish. It made sense — this method calls debug_traceTransaction on every single transaction, which is a lot of RPC calls considering there are billions of transactions.

Luckily, there are methods of getting traces for entire blocks, not just individual transactions.

The Debug API and Trace API provide debug_traceBlockByNumber↗ and trace_block↗, respectively. Each returns traces for an entire block, which is perfect for our purpose. Instead of retrieving a block, looping through its transactions, and tracing each transaction, we could perform just one RPC call for the entire block.

We decided to go with Trace API’s trace_block as the data structure it returns also negates the need for recursive functions. However, we would fall back to Debug API’s debug_traceTransaction if Trace API fails after a certain number of attempts.
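To show why the flat structure removes the need for recursion, here is a minimal sketch that pulls deployments out of a trace_block result with a single loop. It assumes the Parity-style trace layout (a type field, an optional error, and a result holding address and code for creations); the struct, function name, and sample traces are illustrative and not our production code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// flatTrace mirrors the fields of one trace_block entry that this sketch reads.
// Result is a pointer because failed creations carry no result.
type flatTrace struct {
	Type   string `json:"type"`
	Error  string `json:"error"`
	Result *struct {
		Address string `json:"address"`
		Code    string `json:"code"`
	} `json:"result"`
}

type deployment struct {
	Address  string
	Bytecode string
}

// deploymentsInBlock scans the flat trace list once and keeps successful creations.
func deploymentsInBlock(rawTraces []byte) ([]deployment, error) {
	var traces []flatTrace
	if err := json.Unmarshal(rawTraces, &traces); err != nil {
		return nil, err
	}
	var out []deployment
	for _, t := range traces {
		if t.Type != "create" || t.Error != "" || t.Result == nil {
			continue // not a creation, or the creation failed
		}
		out = append(out, deployment{Address: t.Result.Address, Bytecode: t.Result.Code})
	}
	return out, nil
}

// sampleTraces is fabricated, abbreviated data: one call, one successful
// internal creation, and one creation that ran out of gas.
const sampleTraces = `[
  {"type":"call","action":{"from":"0xaa","to":"0xbb"},"result":{"gasUsed":"0x1","output":"0x"},"traceAddress":[]},
  {"type":"create","action":{"from":"0xbb","init":"0x6000"},"result":{"address":"0xcc","code":"0x60"},"traceAddress":[0]},
  {"type":"create","action":{"from":"0xbb","init":"0x6000"},"error":"Out of gas","result":null,"traceAddress":[1]}
]`

func main() {
	deployed, err := deploymentsInBlock([]byte(sampleTraces))
	if err != nil {
		panic(err)
	}
	fmt.Println(deployed)
}
```

Internal creations appear in the same list as top-level ones, so nested deployments are picked up without walking a call tree.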

And now, with an Erigon↗ full node running on the same server as our code, and running trace_block for every block starting from genesis, we successfully retrieved all deployed contracts and their bytecodes in around five days.

It wasn’t all smooth sailing, though. It took countless attempts and several optimizations, and it still broke multiple times.

Another issue was how to store tens of millions of contracts and bytecodes. We briefly tried Redis, but after it repeatedly crashed, we switched to PostgreSQL, which fortunately worked.

The data set of all contracts and bytecodes can be found here↗. All data is up to date as of block 21850000 (February 15th, 2025).

Statistics

Alongside the contract address and bytecode, we also recorded the block number and time stamp of deployment for each contract, allowing us to generate graphs that show the growth of Ethereum contracts over the last 10 years.

As of block 21850000 (February 15th, 2025), there have been 69,788,231 contracts deployed on Ethereum.

We also generated a graph of the number of contracts deployed daily, which allowed us to compare our data to Etherscan’s Ethereum Daily Deployed Contracts Chart↗. The two match up almost perfectly, supporting the validity of our data.

Another graph we generated was the number of unique bytecodes deployed over time. You can see that as of 2025, there are almost 70 million contracts deployed, but only around 2.5 million unique bytecodes, showing that many contracts deployed on Ethereum have identical bytecodes.

We also generated graphs of the following:

More Interesting Statistics

After graphing some of the contract deployment data, we noticed some interesting trends, such as a huge spike in unique contract deployments and a discrepancy with our previous data set, which prompted us to investigate.

Remnants of 2016 State-Bloat Attack

Interestingly, there was a huge spike in the number of unique bytecodes deployed daily↗ around October 16th, 2016: 16,000 unique contracts deployed in one day!

As you can see, this heavily skewed the scale of the graph.

For reference, this is almost double the next highest amount, which was just a few months ago in 2024.

We decided to investigate and found this address↗ deploying a huge amount of contracts.

The contracts it deployed would be very similar but with slightly differing bytecodes. Each contract would be sent 1 Wei upon deployment.

The contract would have no functions except a fallback, self-destructing upon receiving any call from this other address↗. When it self-destructed, the 1 Wei would be forcefully sent↗ to a contract address seemingly resembling a random string, such as rMdWeRyXiNkAaAaStLbU or 8t3g3t4q6u1a1a9a9a9a, which were presumably unowned addresses.

Some of these contracts have already been self-destructed. For example, check out this transaction↗, which self-destructed 500 of those contracts.

This was probably a state-bloat↗ attack, a type of denial-of-service attack that aims to crash the Ethereum network or degrade its performance by flooding the blockchain state with data.

In the third quarter of 2016, SELFDESTRUCT was abused to repeatedly send Ether to empty accounts, forcing tons of accounts to be included in the blockchain state data. This was possible at that time due to SELFDESTRUCT having a gas cost of 0 gas units, and the behavior of SELFDESTRUCT allowed a single contract to self-destruct multiple times in a single transaction, force-sending Ether to multiple empty addresses that now need to be included in the state.

The deployer was probably attempting something similar to this — the time frame matches up, as these contracts were deployed on October 16th, 2016, two days before EIP-150↗, a proposal that increased gas costs of many opcodes including SELFDESTRUCT, to prevent this type of attack in the future.

Graphing Self-Destructed Contracts

After successfully retrieving all contracts using our new method, we wanted to validate our data by comparing it to the previous Smart Contract Fiesta database↗.

A notable difference was the number of contracts in Smart Contract Fiesta versus Deployment Scan. Smart Contract Fiesta recorded 30,586,657 contracts as of March 2023, but if you look at the Deployment Scan contract graphs above, you would see that the cumulative number of contracts deployed at that point was around 55 million.

So why weren’t 25 million of these contracts recorded in Smart Contract Fiesta?

We initially thought something might’ve gone wrong, but then we realized a difference in the contracts recorded by Smart Contract Fiesta and Deployment Scan: Smart Contract Fiesta would remove contracts that self-destructed from its dataset, whereas Deployment Scan didn’t do anything about self-destructs.

At that time, it seemed unreasonable that 25 million contracts self-destructed, but we decided to modify Deployment Scan to record self-destructs to verify this anyway.

However, when we reran Deployment Scan, we kept getting out-of-memory (OOM) errors, at around block 2420000, October 2016 — the same time period as the state-bloat attack!

After some debugging, we found and fixed the issue. Previously, for each contract deployment and self-destruct, we would run a function to asynchronously insert the data into the PSQL database. However, amidst the 2016 state-bloat attack, there were tens of thousands of self-destructs per block, essentially creating hundreds of thousands of threads on our server, which made it run out of memory.

It turns out that recording contracts to the PSQL database had been a significant bottleneck, even before we recorded self-destructs. A quick code change to batch-record data for each block fixed the issue and actually made Deployment Scan faster, taking only three days to finish.
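The batching change can be sketched roughly as follows. Instead of starting one asynchronous insert per record, the records of a block are collected and written in a single call. The store interface, the in-memory memStore, and the function names are hypothetical stand-ins; a real implementation would back InsertBatch with one multi-row INSERT. The sketch also assumes records arrive ordered by block number.

```go
package main

import "fmt"

// record is one row to persist (a deployment or a self-destruct).
type record struct {
	Block   uint64
	Address string
}

// store is a hypothetical persistence interface for this sketch.
type store interface {
	InsertBatch(rows []record) error
}

// memStore is an in-memory stand-in that counts how many writes were issued.
type memStore struct {
	calls int
	rows  []record
}

func (m *memStore) InsertBatch(rows []record) error {
	m.calls++
	m.rows = append(m.rows, rows...)
	return nil
}

// flushPerBlock issues one InsertBatch call per block rather than one write
// (or one goroutine) per record. Records must be ordered by block number.
func flushPerBlock(s store, records []record) error {
	var batch []record
	for i, r := range records {
		batch = append(batch, r)
		last := i == len(records)-1
		if last || records[i+1].Block != r.Block {
			if err := s.InsertBatch(batch); err != nil {
				return err
			}
			batch = nil // start a fresh slice so the store may keep the old one
		}
	}
	return nil
}

func main() {
	s := &memStore{}
	records := []record{
		{Block: 100, Address: "0xa1"},
		{Block: 100, Address: "0xa2"},
		{Block: 101, Address: "0xb1"},
	}
	if err := flushPerBlock(s, records); err != nil {
		panic(err)
	}
	fmt.Printf("%d rows written in %d batches\n", len(s.rows), s.calls)
}
```

With this shape, the number of concurrent writes is bounded by the number of blocks in flight, not by the number of self-destructs inside a block.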

And now, we can finally graph self-destructed contracts.

There is indeed a surprising number of contracts that self-destructed, which supports the validity of both the Smart Contract Fiesta and Deployment Scan data.

You may be wondering: why does there seem to be an insignificant number of self-destructs in 2016, when the state-bloat attack occurred?

The reason is that the state-bloat attack self-destructed the same contract multiple times, whereas we only recorded the block number that each contract self-destructed in. So no matter how many times the same contract self-destructed, it would only be recorded as a singular self-destruct on the graph.

Who Deployed the Most Contracts?

Alongside self-destructs, we also recorded the EOA deployer and contract deployer (if applicable) for each contract, to see which addresses deployed the most contracts.

For normal deployment transactions where to is null, only the EOA deployer would be recorded, whereas for contracts deployed by other contracts via internal transactions, we would record both the EOA, which initiated the transaction, and the contract that deployed the contracts.

This way, we can filter between EOAs deploying contracts and contracts deploying other contracts.
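One way to attribute deployers from flat traces is sketched below. It assumes each trace carries its transaction hash, the caller (action.from), and a traceAddress that is empty for the top-level frame, as in trace_block output. The struct and function names are illustrative and the sample addresses are made up.

```go
package main

import "fmt"

// trace holds the trace_block fields relevant to deployer attribution.
type trace struct {
	TxHash       string
	From         string // action.from of this frame
	Created      string // result.address for a successful create, else empty
	TraceAddress []int  // empty for the top-level frame of a transaction
}

type deployerRecord struct {
	Contract         string
	EOA              string // account that initiated the transaction
	ContractDeployer string // empty when the EOA deployed the contract directly
}

// attributeDeployers records the EOA for every deployment and, for internal
// deployments, the contract that executed the creation.
func attributeDeployers(traces []trace) []deployerRecord {
	// First pass: the sender of each top-level frame is the transaction's EOA.
	origin := make(map[string]string)
	for _, t := range traces {
		if len(t.TraceAddress) == 0 {
			origin[t.TxHash] = t.From
		}
	}
	// Second pass: emit one record per created contract.
	var out []deployerRecord
	for _, t := range traces {
		if t.Created == "" {
			continue
		}
		rec := deployerRecord{Contract: t.Created, EOA: origin[t.TxHash]}
		if len(t.TraceAddress) > 0 {
			rec.ContractDeployer = t.From
		}
		out = append(out, rec)
	}
	return out
}

func main() {
	traces := []trace{
		// tx1: an EOA deploys a contract directly.
		{TxHash: "0x01", From: "0xe0a1", Created: "0xc1"},
		// tx2: an EOA calls a factory, which deploys a contract internally.
		{TxHash: "0x02", From: "0xe0a2"},
		{TxHash: "0x02", From: "0xfac", Created: "0xc2", TraceAddress: []int{0}},
	}
	for _, r := range attributeDeployers(traces) {
		fmt.Printf("%+v\n", r)
	}
}
```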

Out of the EOAs, address 0xFfff46…↗ deployed the most contracts — almost 2.9 million!

However, this includes contracts deployed by other contracts through transactions initiated by the EOA.

What about only contracts deployed directly by EOAs through deployment transactions?

We found Poloniex: Deposit Funder↗ directly deploying 401,555 contracts. It has a total of 401,923 transactions, meaning almost all of its transactions were deployment transactions.

But EOAs in general directly deploy only a small number of contracts compared to the number deployed internally by other contracts.

So which contract deployed the most contracts?

It turns out to be the 1inch: CHI token↗, deploying a whopping 10 million contracts.

That’s a huge portion of contracts deployed internally!

The reason that the CHI token deploys this many contracts is to leverage the mechanics of gas refunds↗. Basically, if a transaction causes contracts to self-destruct or nullifies storage, up to 50% of the gas fees for that transaction can be refunded. The idea behind the CHI token is for a user to mint these tokens when gas is cheap, which deploys contracts. Then, when gas is expensive and the user wants to perform a transaction, they would burn the CHI tokens alongside their transaction, self-destructing the deployed contracts and reducing the transaction’s gas fees through the gas-refund mechanism.
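The 50% cap can be made concrete with a little arithmetic. The sketch below assumes the refund rules that applied before the London upgrade: 24,000 gas refunded per SELFDESTRUCT, capped at half of the gas the transaction used. The constant and function names are illustrative, and the model ignores the gas spent earlier on minting.

```go
package main

import "fmt"

// selfDestructRefund is the per-SELFDESTRUCT refund assumed by this sketch
// (the pre-London gas schedule).
const selfDestructRefund uint64 = 24000

// gasAfterRefund applies the refund for n self-destructs, capped at half of
// the gas used by the transaction.
func gasAfterRefund(gasUsed, n uint64) uint64 {
	refund := n * selfDestructRefund
	if limit := gasUsed / 2; refund > limit {
		refund = limit
	}
	return gasUsed - refund
}

func main() {
	// 10 self-destructs: 240,000 gas refunded, below the cap.
	fmt.Println(gasAfterRefund(1_000_000, 10))
	// 30 self-destructs: 720,000 would exceed the cap, so only 500,000 is refunded.
	fmt.Println(gasAfterRefund(1_000_000, 30))
}
```

Under these assumptions, destroying more contracts than the cap allows brings no further savings, so burning just enough tokens to reach the cap is the most efficient choice.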

We can also see that a lot of these contracts begin with leading zeros to save gas↗.

Initializing All Contracts

With an organized contract database, we could finally iterate through our list of contracts and try to initialize them. We called this project Initscan↗.

Our process was as follows:

Iterate through all contracts by retrieving a certain amount of contracts from the PSQL database each time.
For each contract, use the RPC call trace_call↗ to call variations of initialization function signatures, such as init(), init(address), init(address,uint256), and so on.
Each trace_call would return whether the call succeeded, and if it did, state changes that the call would’ve caused. In the state changes, we would look for either our from address or an address we supplied in the calldata such as init(0xabc...). We did this as init functions typically set state variables such as admin or governor, so a call that both succeeds and sets a state variable to one of our addresses is a strong indicator that the initialization worked.
If a call succeeded AND there was a state change that contained one of our addresses, we would perform another trace_call with unrelated/random calldata to further confirm we initialized something. This was to make sure we didn’t just hit a fallback function that also caused a state change containing one of our addresses.
Finally, if all of those checks pass, we would log the contract if it contained any native Ether or ERC-20 funds, to filter for contracts with value.

Findings

Initscan takes even longer than Deployment Scan to finish, as we are doing trace_call multiple times for every single contract, pretty much a billion RPC calls.

So, what did we find?

There were a lot of contracts with USDT, which would’ve been exploitable if USDT adhered to the ERC-20 standards and returned a bool on transfer. It doesn’t, causing the USDT to be locked forever. The contract funds were relatively small though, having around $20 USD.

Other contracts with tens to hundreds of dollars had no method to withdraw the ERC-20 tokens, which were mostly USDT and USDC.

We did find some exploitable contracts with ~$10 in funds. As expected, they had initialization functions that set the owner address, and those functions were never called, allowing anyone to simply become an owner. Owners can then withdraw funds including native Ether and ERC-20 tokens.

We also found an uninitialized contract with ~$5,000 USD of ETH in funds, deployed by Bounce Finance↗. The contract is a proxy that delegates to an implementation contract with two initialization functions, one of which was never called.

Although the funds are not currently at direct risk, if this had been found ~4 years ago when the contract was still in use, it could’ve resulted in denial of service and theft of user funds.

We reported this to Bounce Finance, and this was their response:

This contract was deployed 4 years ago and has already been initialized.

During the simulation of the transaction initialization, it indeed shows as “success,” but the actual on-chain transaction fails.

Conclusion

Starting from the idea of wanting to scan Ethereum for uninitialized contracts, we tackled a separate challenge of enumerating every single contract on Ethereum and discovered some interesting statistics!

Although we didn’t have many critical- or high-value findings from initializing contracts, we generated a huge data set of all contracts deployed on Ethereum, which is extremely useful for other things such as scanning for bytecode patterns and static analysis.

We are curious as to how the wider Web3 community will use this data as we open-source our database, code, and methodology.

Also, this project↗ and blog post were completed as part of my internship at Zellic, where I was able to apply my CTF experience to real-world security challenges.

If you’re a student with CTF experience looking for an opportunity to work on cool blockchain security projects like this one, we encourage you to apply to join our team!

About Us

Zellic specializes in securing emerging technologies. Our security researchers have uncovered vulnerabilities in the most valuable targets, from Fortune 500s to DeFi giants.

Developers, founders, and investors trust our security assessments to ship quickly, confidently, and without critical vulnerabilities. With our background in real-world offensive security research, we find what others miss.
