State Tree Pruning | Ethereum Foundation Blog

One of the important issues raised during the Olympic stress-net release is the large amount of data that clients must store. Over little more than three months of operation, and particularly during the last month, the amount of data in each Ethereum client's blockchain folder has grown to 10 to 40 gigabytes, depending on which client you use and whether compression is enabled. This is indeed a stress-test scenario in which users are encouraged to dump transactions onto the blockchain while paying only free test ether as transaction fees, so transaction throughput is several times higher than Bitcoin's. Even so, it is a legitimate concern for users, many of whom do not have hundreds of gigabytes to spare for storing other people's transaction histories.

First, let us explore why the current Ethereum client database is so large. Ethereum, unlike Bitcoin, has the property that every block contains something called the "state root": the root hash of a specialized kind of Merkle tree that stores the entire state of the system, including all account balances, contract storage, contract code and account nonces.
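To make the idea of a state root concrete, here is a minimal sketch that commits a toy account state to a single 32-byte hash. It is a plain binary Merkle tree, not Ethereum's actual Merkle Patricia tree, and it uses SHA-256 from the standard library in place of Keccak-256; the account names and the `address:balance:nonce` leaf encoding are hypothetical.

```python
import hashlib

def h(data: bytes) -> bytes:
    # SHA-256 stands in for Ethereum's Keccak-256 (assumption: stdlib only).
    return hashlib.sha256(data).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """Root hash of a plain binary Merkle tree (not Ethereum's Patricia tree)."""
    if not leaves:
        return h(b"")
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # duplicate the last hash on odd-sized levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def state_root(state: dict[str, tuple[int, int]]) -> bytes:
    """Commit a toy state {address: (balance, nonce)} to one 32-byte root hash."""
    leaves = [f"{addr}:{bal}:{nonce}".encode()
              for addr, (bal, nonce) in sorted(state.items())]
    return merkle_root(leaves)

before = state_root({"alice": (100, 0), "bob": (50, 3)})
after = state_root({"alice": (99, 1), "bob": (51, 3)})  # one hypothetical transfer later
```

Any change to any balance or nonce changes the root, which is why a block header only needs to carry this one hash to pin down the entire state.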

The purpose of this is simple: it allows a node given only the last block, together with some assurance that the last block really is the most recent one, to "synchronize" with the blockchain extremely quickly without processing any historical transactions. The node simply downloads the rest of the tree from nodes in the network (the proposed HashLookup wire protocol message will facilitate this), verifies that the tree is correct by checking that all of the hashes match up, and proceeds from there. In a fully decentralized context, this will likely be done through an advanced version of Bitcoin's headers-first verification strategy, which will look roughly as follows:

Download as many block headers as the client can get its hands on.
Determine the header at the end of the longest chain. Starting from that header, go back 100 blocks for safety, and call the block at that position P¹⁰⁰(H) ("the hundredth-generation grandparent of the head").
Download the state tree from the state root of P¹⁰⁰(H), using the HashLookup opcode (note that after the first one or two rounds, this can be parallelized among as many peers as desired). Verify that all parts of the tree match up.
Proceed normally from there.
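The steps above can be sketched in a few functions. This is an illustration under several assumptions: headers are a dict mapping block hash to `(parent_hash, number, state_root)`, the "longest chain" is approximated by the highest block number, `lookup(hash)` is a hypothetical stand-in for a HashLookup request to a peer, and tree nodes use a made-up encoding (`b"I"` plus 32-byte child hashes for inner nodes, `b"L"` plus payload for leaves). SHA-256 again stands in for Keccak-256.

```python
import hashlib
from collections import deque

SAFETY_DEPTH = 100  # "go back 100 blocks for safety"

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()  # stand-in for Keccak-256

def children_of(node: bytes) -> list[bytes]:
    # Hypothetical encoding: b"I" + 32-byte child hashes, or b"L" + leaf payload.
    if node[:1] == b"I":
        body = node[1:]
        return [body[i:i + 32] for i in range(0, len(body), 32)]
    return []

def pick_sync_point(headers: dict) -> bytes:
    """Return the hash of P100(H), the block SAFETY_DEPTH generations below the head."""
    # Simplification: the highest block number marks the end of the longest chain.
    block = max(headers, key=lambda bh: headers[bh][1])
    for _ in range(SAFETY_DEPTH):
        parent = headers[block][0]
        if parent not in headers:
            break  # chain is shorter than the safety depth
        block = parent
    return block

def download_state(state_root: bytes, lookup) -> dict:
    """Fetch every node reachable from state_root, checking that each hash matches."""
    store, queue = {}, deque([state_root])
    while queue:
        want = queue.popleft()
        if want in store:
            continue  # subtree already fetched
        node = lookup(want)
        if h(node) != want:
            raise ValueError("peer returned a node that does not match its hash")
        store[want] = node
        queue.extend(children_of(node))
    return store
```

A client would call `pick_sync_point(headers)`, read the state root from that header and pass it to `download_state`; because every node is checked against the hash its parent committed to, a dishonest peer cannot substitute data without being detected. The breadth-first queue is also where requests could be fanned out to several peers in parallel.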

For light clients, the state root is even more advantageous: they can immediately determine the exact balance and status of any account by simply asking the network for a particular branch of the tree, without needing to follow Bitcoin's multi-step 1-of-N light-client model ("ask for all transaction outputs, then ask for all transactions spending those outputs, and take the remainder").
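As an illustration of "asking the network for a particular branch", here is a minimal sketch of a Merkle branch proof over a binary tree. It is not the Patricia-tree proof format Ethereum uses, the account encodings are hypothetical, and SHA-256 stands in for Keccak-256; what carries over is the principle that a light client holding only the root can check one account using a number of sibling hashes logarithmic in the tree size.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()  # stand-in for Keccak-256

def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Full node side: every level of a binary Merkle tree, leaf hashes first."""
    level = [h(x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # duplicate the last hash on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def get_branch(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Full node side: sibling hashes from leaf to root; the bool says 'sibling is on the left'."""
    branch = []
    for level in levels[:-1]:
        sib = index ^ 1
        sibling = level[sib] if sib < len(level) else level[index]  # matches the duplication rule
        branch.append((sibling, index % 2 == 1))
        index //= 2
    return branch

def verify_branch(root: bytes, leaf: bytes, branch: list[tuple[bytes, bool]]) -> bool:
    """Light client side: recompute the root from one leaf and its branch."""
    acc = h(leaf)
    for sibling, sibling_on_left in branch:
        acc = h(sibling + acc) if sibling_on_left else h(acc + sibling)
    return acc == root
```

The light client trusts only the root from a block header; a forged balance produces a different recomputed root and is rejected.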

However, this state tree mechanism has an important drawback if implemented naively: the intermediate nodes of the tree greatly increase the amount of disk space required to store all the data. To see why, consider how the tree changes from one block to the next.

The change in the tree during each individual block is fairly small, and the magic of the tree as a data structure is that most of the data can simply be referenced twice without being copied. Even so, for every change made to the state, a logarithmically large number of nodes (i.e. ~5 at 1,000 nodes, ~10 at 1,000,000 nodes, ~15 at 1,000,000,000 nodes) must be stored twice: one version for the old tree and one version for the new tree. Consequently, as a node processes every block, we can expect total disk space usage to be, in computer science terms, roughly O(n log n), where n is the transaction load. In practical terms, the Ethereum blockchain itself is only 1.3 gigabytes, but the size of the database including all these extra nodes is 10 to 40 gigabytes.
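The logarithmic overhead can be observed directly with a small path-copying tree. In this sketch (a binary tree with a hypothetical `b"L"`/`b"I"` node encoding and SHA-256, both assumptions), updating one leaf adds the new leaf plus one new node per level on the path to the root, while the old root and all untouched subtrees stay shared in a content-addressed store that never deletes anything.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()  # stand-in for Keccak-256

class Store:
    """Content-addressed node store: node hash -> node bytes. Nothing is ever deleted."""
    def __init__(self):
        self.nodes = {}

    def put(self, node: bytes) -> bytes:
        key = h(node)
        self.nodes[key] = node
        return key

def build(store: Store, leaves: list[bytes]) -> bytes:
    """Build a binary tree over a power-of-two number of leaves; return the root hash."""
    level = [store.put(b"L" + leaf) for leaf in leaves]
    while len(level) > 1:
        level = [store.put(b"I" + level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def update(store: Store, root: bytes, depth: int, index: int, new_leaf: bytes) -> bytes:
    """Path-copying update of leaf `index` in a tree of the given depth; returns the new root."""
    if depth == 0:
        return store.put(b"L" + new_leaf)
    node = store.nodes[root]
    left, right = node[1:33], node[33:65]
    half = 1 << (depth - 1)
    if index < half:
        left = update(store, left, depth - 1, index, new_leaf)
    else:
        right = update(store, right, depth - 1, index - half, new_leaf)
    return store.put(b"I" + left + right)  # only nodes on the path are rewritten
```

With 1,024 leaves (depth 10), one update adds 11 nodes. Ethereum's trie nodes are wider than binary, which lowers the per-update count, but the count still grows with the logarithm of the tree size, so keeping every historical version costs roughly O(n log n) in total.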

So what can we do? One backward-looking fix is to simply go ahead and implement headers-first syncing, essentially resetting new users' hard disk consumption to zero and allowing users to keep their consumption low by re-syncing every one or two months, but that is a somewhat ugly solution. The alternative approach is to implement state tree pruning: use reference counting to track when nodes in the tree (here using "node" in the computer-science sense of "a piece of data somewhere in a graph or tree structure", not "a computer on the network") drop out of the tree, and at that point put them on "death row": unless the node somehow becomes used again within the next X blocks (e.g. X = 5000), it is permanently deleted from the database once that many blocks have passed. Essentially, we store the tree nodes that are part of the current state, and we even store recent history, but we do not store history older than 5000 blocks.
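The reference-counting half of this idea can be sketched on a content-addressed node store. The convention here is an assumption for illustration: a node's count is the number of live parents pointing at it plus any external holders (such as the current state root), and a node holds references to its children only while its own count is above zero. `decref` reports which nodes dropped out of the tree; it does not delete them, since in the scheme described here they would go on death row instead. The node encoding and SHA-256 are again hypothetical stand-ins.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()  # stand-in for Keccak-256

def children_of(node: bytes) -> list[bytes]:
    # Hypothetical encoding: b"I" + 32-byte child hashes, or b"L" + leaf payload.
    if node[:1] == b"I":
        body = node[1:]
        return [body[i:i + 32] for i in range(0, len(body), 32)]
    return []

class RefCountedStore:
    def __init__(self):
        self.nodes = {}  # node hash -> node bytes
        self.refs = {}   # node hash -> reference count

    def put(self, node: bytes) -> bytes:
        key = h(node)
        self.nodes[key] = node
        return key

    def incref(self, key: bytes) -> None:
        stack = [key]
        while stack:
            k = stack.pop()
            self.refs[k] = self.refs.get(k, 0) + 1
            if self.refs[k] == 1:  # node became live, so it now holds its children
                stack.extend(children_of(self.nodes[k]))

    def decref(self, key: bytes) -> list[bytes]:
        """Drop one reference; return the hashes of every node whose count reached zero."""
        dropped, stack = [], [key]
        while stack:
            k = stack.pop()
            self.refs[k] -= 1
            if self.refs[k] == 0:  # node left every tree, so it releases its children
                dropped.append(k)
                stack.extend(children_of(self.nodes[k]))
        return dropped
```

Moving from an old state root to a new one means `incref(new_root)` followed by `decref(old_root)`; only nodes unique to the old tree are reported, because shared subtrees are still held by the new root.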

X should be set as low as possible to conserve space, but setting X too low compromises robustness: once this technique is implemented, a node cannot revert more than X blocks without essentially restarting synchronization from scratch. Now, let us see how this approach can be implemented fully, taking all the corner cases into account:

When processing a block with number N, keep track of all nodes (in the state, transaction and receipt trees) whose reference count drops to zero. Place the hashes of these nodes into a "death row" database, in some kind of data structure that lets the list be recalled later by block number (specifically, block number N + X), and mark the node's database entry itself as deletion-worthy at block N + X.
If a node that is on death row gets reinstated (a practical example: account A acquires some particular balance/nonce/code/storage combination f, then switches to a different value g, and then account B acquires state f while the node for f is on death row), increase its reference count back to one. If that node is deleted again at some future block M (with M > N), put it back on that future block's death row, to be deleted at block M + X.
When you get to processing block N + X, recall the list of hashes that you logged during block N. Check the node associated with each hash; if the node is still marked for deletion during that specific block (i.e. not reinstated, and importantly not reinstated and then re-marked for deletion later), delete it. Delete the list of hashes from the death row database as well.
Sometimes the new head of a chain will not be on top of the previous head, and you will need to revert a block. For these cases, you need to keep in the database a journal of all changes to reference counts ("journal" as in journaling file systems: essentially an ordered list of the changes made). When reverting a block, delete the death row list generated when producing that block and undo the changes recorded in the journal (and delete the journal when you are done).
When processing a block, delete the journal at block N - X; you cannot revert more than X blocks anyway, so that journal is superfluous (and keeping it would in fact defeat the whole point of pruning).
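The five rules above can be sketched as a single class. This is a minimal in-memory illustration, not the pyeth implementation: `StatePruner`, `process_block` and `revert_block` are hypothetical names, plain dicts stand in for the database, and the caller is assumed to supply, for each block, the newly created nodes and an ordered list of `(hash, +1 or -1)` reference-count changes (for example, derived from the roots changed in that block).

```python
from collections import defaultdict

class StatePruner:
    """Sketch of death-row pruning with a reference-count journal (hypothetical API)."""

    def __init__(self, x: int):
        self.x = x                          # blocks of history to keep
        self.db = {}                        # node hash -> node data
        self.refs = {}                      # node hash -> reference count
        self.marked = {}                    # node hash -> block at which it becomes deletable
        self.death_row = defaultdict(list)  # block N -> hashes whose count hit zero in N
        self.journal = defaultdict(list)    # block N -> ordered (hash, delta, previous mark)

    def _apply(self, n: int, key, delta: int) -> None:
        self.journal[n].append((key, delta, self.marked.get(key)))
        self.refs[key] = self.refs.get(key, 0) + delta
        if self.refs[key] == 0:
            self.death_row[n].append(key)   # recalled again at block n + x
            self.marked[key] = n + self.x
        elif delta > 0 and self.refs[key] == 1:
            self.marked.pop(key, None)      # reinstated: no longer deletion-worthy

    def process_block(self, n: int, new_nodes: dict, changes: list) -> None:
        self.db.update(new_nodes)
        for key, delta in changes:
            self._apply(n, key, delta)
        # Delete nodes that died at block n - x and are still marked for exactly block n.
        for key in self.death_row.pop(n - self.x, []):
            if self.marked.get(key) == n:
                del self.marked[key]
                del self.refs[key]
                self.db.pop(key, None)
        # Block n - x can no longer be reverted, so its journal is superfluous.
        self.journal.pop(n - self.x, None)

    def revert_block(self, n: int) -> None:
        """Undo block n, which must be the current head and within the last x blocks."""
        self.death_row.pop(n, None)
        for key, delta, prev_mark in reversed(self.journal.pop(n, [])):
            self.refs[key] -= delta
            if prev_mark is not None:
                self.marked[key] = prev_mark
            else:
                self.marked.pop(key, None)
                if self.refs[key] == 0:     # node was first created in block n
                    del self.refs[key]
                    self.db.pop(key, None)
```

The check `self.marked.get(key) == n` is what handles rule two: a node that was reinstated, or reinstated and then killed again in a later block, carries a different (or no) mark, so its stale entry on the older death row list is skipped.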

Once this is done, the database should only store state nodes associated with the last X blocks, so you will still have all the information you need from those blocks but nothing more. On top of this, there are further optimizations. In particular, after X blocks the transaction and receipt trees should be deleted entirely, and arguably even the blocks themselves could be deleted, although there is an important argument for keeping some subset of "archive nodes" that store absolutely everything, to help the rest of the network acquire the data it needs.

Now, how much can this save us? As it turns out, quite a lot. In particular, if we were to take the ultimate daredevil route and set X = 0 (i.e. lose all ability to handle even single-block forks, storing no history whatsoever), the size of the database would essentially be the size of the state: a value which, even now (this data was taken at block 670000), stands at roughly 40 megabytes, most of it made up of accounts whose storage slots were deliberately filled to spam the network. At X = 100,000, we would get essentially the current size of 10 to 40 gigabytes, since most of the growth happened in the last hundred thousand blocks, and the extra space required for storing journals and death row lists would make up the rest of the difference. For every value in between, we can expect disk space to grow linearly (i.e. X = 10,000, which keeps one tenth of that history, would take us about ninety percent of the way to near-zero).
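The linear estimate can be written out as a short calculation. The endpoints come from the figures above (about 40 MB of state at X = 0, and today's full database at X = 100,000); taking 40 GB, the upper end of the 10 to 40 GB range, as the X = 100,000 value is an assumption, and the linear model is only the rough approximation the text describes.

```python
STATE_ONLY_GB = 0.04     # X = 0: roughly the 40 MB state measured at block 670000
FULL_HISTORY_GB = 40.0   # X = 100,000: upper end of today's 10-40 GB (assumption)
FULL_X = 100_000

def estimated_db_size_gb(x: int) -> float:
    """Linear interpolation between the X = 0 and X = FULL_X endpoints."""
    x = min(max(x, 0), FULL_X)  # the model is only meant for 0 <= X <= FULL_X
    return STATE_ONLY_GB + (FULL_HISTORY_GB - STATE_ONLY_GB) * x / FULL_X

def fraction_of_way_to_near_zero(x: int) -> float:
    """Share of the prunable history that a pruning depth of X removes."""
    return 1 - min(max(x, 0), FULL_X) / FULL_X

for x in (0, 5_000, 10_000, 100_000):
    print(f"X = {x:>7,}: ~{estimated_db_size_gb(x):5.2f} GB, "
          f"{fraction_of_way_to_near_zero(x):.0%} of the way to near-zero")
```

Under this model X = 10,000 keeps one tenth of the prunable history, which is where the "about ninety percent of the way" figure comes from.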

Note that we may want to pursue a hybrid strategy: keep every block but not every state tree node; in this case, we would need to add about 1.4 gigabytes to store the block data. It is important to note that the cause of the blockchain size is not fast block times: currently, the block headers of the last three months make up roughly 300 megabytes, and the rest is transactions from the last month, so at high levels of usage we can expect transactions to continue to dominate. That said, light clients will also need to prune block headers if they are to survive in low-memory circumstances.

The strategy described above has been implemented in a very early alpha form in pyeth; it will be implemented properly in all clients in due time after Frontier launches, since such storage bloat is only a medium-term concern and not a short-term one.
