One of the problems with Ethereum, or any blockchain, is that it grows over time, which increases both its code complexity and its storage requirements.
A blockchain must preserve all data from its entire history, which every client has to store and every new client has to download. The result is a steady increase in client load and synchronization time.
Additionally, code complexity increases over time because it is “easier to add a new feature than to remove an old one,” Vitalik Buterin wrote on his blog.
Buterin therefore believes that developers must actively work to stem these growing trends while preserving the permanence of Ethereum. To that end, he presented The Purge, a three-part plan to simplify the blockchain and reduce its data load.
Part 1: History Expiration
A fully synchronized Ethereum node currently requires approximately 1.1 TB of storage space for the execution client, plus a few hundred more gigabytes for the consensus client. According to Buterin, most of this data is history: data about past blocks, transactions and receipts, much of it several years old. Storing all this history means the required disk space keeps growing by hundreds of gigabytes each year.
Buterin thinks the problem can be fixed by something called history expiration.
Each block in a blockchain points to the previous one via a hash link. This means that consensus on the current block implies consensus on the entire history behind it.
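To illustrate why agreeing on the latest block is enough, here is a minimal Python sketch of a hash-linked chain. It is a toy model, not Ethereum's actual format: the helper names (block_hash, build_chain, history_matches_head) are hypothetical, and SHA-256 over raw bytes stands in for the real header hashing.

```python
import hashlib

GENESIS_PARENT = b"\x00" * 32  # assumed placeholder parent for the first block


def block_hash(parent_hash: bytes, payload: bytes) -> bytes:
    """Toy block hash: SHA-256 over the parent hash followed by the payload."""
    return hashlib.sha256(parent_hash + payload).digest()


def build_chain(payloads):
    """Return (blocks, head_hash); each block is a (parent_hash, payload) pair."""
    parent = GENESIS_PARENT
    blocks = []
    for payload in payloads:
        blocks.append((parent, payload))
        parent = block_hash(parent, payload)
    return blocks, parent


def history_matches_head(blocks, trusted_head: bytes) -> bool:
    """Walk backwards from the trusted head hash and check every hash link."""
    expected = trusted_head
    for parent, payload in reversed(blocks):
        if block_hash(parent, payload) != expected:
            return False
        expected = parent
    return expected == GENESIS_PARENT


blocks, head = build_chain([b"block-1", b"block-2", b"block-3"])
honest_ok = history_matches_head(blocks, head)

# An untrusted peer serving a modified old block is caught by the hash links.
tampered = list(blocks)
tampered[0] = (tampered[0][0], b"forged")
tampered_ok = history_matches_head(tampered, head)
```

In this sketch a node that trusts only the head hash can accept history from any peer, because changing an old block changes every hash after it.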
According to Buterin, as long as the network has consensus on the current block, all associated historical data can be provided by a single actor via a Merkle proof, which allows anyone to verify its integrity. This means that instead of each node storing all the data, each node could store a small percentage of the data, reducing storage requirements.
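The idea of checking a single piece of data against a known commitment can be sketched with a simple binary Merkle tree. This is an illustrative assumption only: Ethereum uses different tree structures and hash functions, and the function names below (merkle_root_and_proof, verify_proof) are made up for the example. The sketch assumes at least one leaf.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256 helper used for both leaves and inner nodes in this toy tree."""
    return hashlib.sha256(data).digest()


def merkle_root_and_proof(leaves, index):
    """Return (root, proof) for leaves[index].

    The proof is a list of (sibling_hash, sibling_is_left) pairs, bottom-up.
    """
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof


def verify_proof(leaf: bytes, proof, root: bytes) -> bool:
    """Recompute the path from the leaf to the root using only the proof."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


receipts = [b"tx-0", b"tx-1", b"tx-2", b"tx-3", b"tx-4"]
root, proof = merkle_root_and_proof(receipts, 4)
genuine = verify_proof(receipts[4], proof, root)
forged = verify_proof(b"forged", proof, root)
```

The verifier needs only the root and a proof whose length grows with the logarithm of the number of leaves, which is why a single untrusted actor can serve historical data to everyone else.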
Buterin essentially suggests adopting the operational model of torrent networks, where each participant stores and distributes only a small portion of the network's total data.
Ethereum has already taken steps to reduce storage requirements: some information now has an expiration date. For example, consensus blocks are stored for only about six months and blobs for about 18 days.
EIP-4444 is another step in this direction: it aims to limit the storage period of historical blocks and receipts to one year. The long-term goal, however, is to have a fixed period, say 18 days, during which each node must store everything, and then the old data is stored in a distributed manner across a peer-to-peer network.
Part 2: State Expiration
According to Buterin, removing the need for clients to store the entire history does not completely solve the problem of excessive storage requirements. A client still has to increase its storage capacity by approximately 50 GB each year due to the “continuous growth of the State: account balances and nonces, contract code and contract storage”.
A new state object can be created in three ways: by creating a new account, by sending ETH to a new account, and by writing to a previously untouched storage slot. Once a state object is created, it remains in the state forever.
Buterin believes that a solution for automatically expiring state objects over time must be efficient, user-friendly, and developer-friendly. This means that the solution should not require large amounts of computation, users should not lose access to their tokens if they leave them untouched for years, and developers should not be overly hindered in the process.
Buterin suggests two types of “least bad known solutions”:
- Partial state expiration
- State expiration based on address period
Partial state expiration
Proposals for partial state expiration all work on the same principle: the state is divided into “chunks”. Everyone permanently stores the “top-level map” recording which chunks are empty and which are non-empty. The data inside a chunk is stored only if it has been accessed recently. If a chunk is no longer stored, a “resurrection” mechanism allows anyone to bring that data back by providing a proof of what the data was.
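The mechanism described above can be sketched roughly as follows. This is a toy model under stated assumptions, not any concrete proposal: the class PartialExpiryState, the expiry_blocks parameter, and the use of a plain hash of the chunk contents as the “proof” are all hypothetical simplifications.

```python
import hashlib


def chunk_commitment(data: dict) -> bytes:
    """Toy commitment: hash of the chunk's sorted key/value pairs."""
    encoded = repr(sorted(data.items())).encode()
    return hashlib.sha256(encoded).digest()


class PartialExpiryState:
    def __init__(self, expiry_blocks: int):
        self.expiry_blocks = expiry_blocks
        self.top_level_map = {}  # chunk_id -> commitment, kept forever
        self.chunks = {}         # chunk_id -> data, only recently accessed chunks
        self.last_access = {}    # chunk_id -> block number of last access

    def write(self, chunk_id, key, value, block):
        if chunk_id in self.top_level_map and chunk_id not in self.chunks:
            raise KeyError("chunk expired; resurrect it first")
        data = self.chunks.setdefault(chunk_id, {})
        data[key] = value
        self.top_level_map[chunk_id] = chunk_commitment(data)
        self.last_access[chunk_id] = block

    def expire(self, current_block):
        """Drop chunk data that has not been accessed recently; keep the map."""
        for chunk_id in list(self.chunks):
            if current_block - self.last_access[chunk_id] >= self.expiry_blocks:
                del self.chunks[chunk_id]

    def resurrect(self, chunk_id, data, block) -> bool:
        """Restore an expired chunk if the supplied data matches the commitment."""
        if chunk_commitment(data) != self.top_level_map.get(chunk_id):
            return False
        self.chunks[chunk_id] = dict(data)
        self.last_access[chunk_id] = block
        return True


state = PartialExpiryState(expiry_blocks=100)
state.write("chunk-a", "balance", 5, block=1)
archived = dict(state.chunks["chunk-a"])  # copy kept by some third party

state.expire(current_block=200)
is_expired = "chunk-a" not in state.chunks
rejected = state.resurrect("chunk-a", {"balance": 999}, block=201)
restored = state.resurrect("chunk-a", archived, block=201)
```

The point of the sketch is that the permanently stored map is small, while the bulky chunk data can be discarded and later restored by anyone who kept a copy.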
State expiration based on address period
State expiration based on address period proposes a growing list of state trees instead of a single tree storing the entire state. Any state that is read or written is saved in the most recent state tree. A new, empty state tree is added once per period, where a period might last a year.
In this scenario, the old state trees are frozen and full nodes need to store only the two most recent trees. A state object that belongs to an expired tree can still be read or written, but the transaction must include a Merkle proof for it. After the transaction, the object is saved in the most recent tree.
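A rough Python sketch of this period-based scheme is shown below. It is an assumption-laden toy: plain dictionaries stand in for state trees, the class PeriodState is hypothetical, old trees are kept in memory only to play the role of the wider network, and the proof_ok flag stands in for verifying a real Merkle proof.

```python
class PeriodState:
    def __init__(self):
        self.trees = [{}]  # one dict per period; the last entry is the current tree

    def start_new_period(self):
        """Freeze the current tree and add a new, empty one."""
        self.trees.append({})

    def is_expired(self, key) -> bool:
        """True if the key exists only in trees older than the last two."""
        if any(key in tree for tree in self.trees[-2:]):
            return False
        return any(key in tree for tree in self.trees[:-2])

    def write(self, key, value, proof_ok=False):
        if self.is_expired(key) and not proof_ok:
            raise PermissionError("expired state: Merkle proof required")
        self.trees[-1][key] = value

    def read(self, key, proof_ok=False):
        if self.is_expired(key) and not proof_ok:
            raise PermissionError("expired state: Merkle proof required")
        for tree in reversed(self.trees):
            if key in tree:
                value = tree[key]
                self.trees[-1][key] = value  # reads also refresh into the newest tree
                return value
        return None


state = PeriodState()
state.write("alice", 10)    # written in period 0
state.start_new_period()    # period 1
state.start_new_period()    # period 2: "alice" now lives only in an expired tree

try:
    state.read("alice")
    needs_proof = False
except PermissionError:
    needs_proof = True

value = state.read("alice", proof_ok=True)  # access with a (simulated) proof
expired_after = state.is_expired("alice")
```

After the proven access, the object lives in the newest tree again, so later transactions can touch it without a proof until it ages out once more.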
Part 3: Feature cleanup
Over time, all protocols become complex, no matter how simple they start out.
Buterin wrote:
“If we don’t want Ethereum to enter a black hole of ever-increasing complexity, we need to do one of two things: (i) stop making changes and ossify the protocol, (ii) be able to actually withdraw features and reduce complexity.”
According to Buterin, cleaning up the complexity of Ethereum requires several small fixes, like removing the SELFDESTRUCT opcode, removing old transaction types and beacon chain committees, reforming LOG, etc. Buterin also suggested simplifying gas mechanics, removing gas observability, and improving static analysis.