This is how Ethereum works

Part one: theory before practice

Fra Gadaleta
9 min read · Feb 16, 2018

The Ethereum blockchain is probably the most evolved, and the most complex, blockchain system created so far. If you are not familiar with blockchain technology, let alone with Ethereum, feel free to read "Ethereum how the internet will be", in which I explain the capabilities of Ethereum and its benefits for the decentralised Internet of the future.

Despite the complexity of the protocol and of the security mechanisms designed so far (some of which come with formal proofs), a full Ethereum node is composed of three essential parts:

  • the blockchain component
  • a peer-to-peer network and
  • the virtual machine

The blockchain component

A blockchain is nothing more than a series of blocks chained to each other in a specific order, such that breaking any one block b also breaks all its successors b+1, b+2, …, b+n.
Each block stores a number of transactions together with the hash of the previous block and the proof-of-work of the current block. The proof-of-work is the result of an intensive computation that searches for a number (called the nonce) that, together with the content of the block, produces a hash with a certain property. Such a hash usually has to start with a number of 0s that increases according to a parameter called difficulty. The higher the difficulty, the longer it takes to find a valid hash.
As each block is connected to the previous one (the hash of the previous block is part of the content of the current block), it is very hard to break the chain. Breaking a chain means forging a block at some position in the chain such that all the following blocks stay unchanged. If, hypothetically, such a block existed, it would still be extremely hard to convince the rest of the network that the resulting chain is the legitimate one. Which brings us to the second component: the peer-to-peer network.
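The nonce search and the chaining by hash described above can be sketched in a few lines of Python. This is a toy model, not Ethereum's actual algorithm: it assumes SHA-256 over a JSON encoding of the block, and it counts leading zeros in the hexadecimal digest. The names block_hash, mine and is_valid_chain are made up for this illustration.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the block (illustrative only)."""
    encoded = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


def mine(transactions: list, prev_hash: str, difficulty: int) -> dict:
    """Try nonces 0, 1, 2, ... until the hash starts with `difficulty` zeros."""
    block = {"transactions": transactions, "prev_hash": prev_hash, "nonce": 0}
    while not block_hash(block).startswith("0" * difficulty):
        block["nonce"] += 1
    return block


def is_valid_chain(chain: list, difficulty: int) -> bool:
    """Check every proof-of-work and every link to the previous block."""
    for i, block in enumerate(chain):
        if not block_hash(block).startswith("0" * difficulty):
            return False
        if i > 0 and block["prev_hash"] != block_hash(chain[i - 1]):
            return False
    return True


DIFFICULTY = 3  # number of leading hex zeros; kept small so the example runs quickly
genesis = mine(["genesis"], "0" * 64, DIFFICULTY)
second = mine(["alice pays bob 5"], block_hash(genesis), DIFFICULTY)
third = mine(["bob pays carol 2"], block_hash(second), DIFFICULTY)
chain = [genesis, second, third]
```

Calling is_valid_chain(chain, DIFFICULTY) accepts this chain. Changing a transaction in any block changes that block's hash, so either its own proof-of-work or the prev_hash stored in its successor no longer matches.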

The peer-to-peer network component

Every Ethereum node is connected to other nodes to form a network.
A full node broadcasts transactions and blocks to the network and receives other transactions and blocks from it. The full node is also responsible for synchronising its copy of the chain with the rest of the network. Leaving aside experimental or private networks, the official Ethereum networks that people usually refer to are MainNet and TestNet. The latter is used to experiment with new protocols or features, which usually get adopted by MainNet once they are ready.

The Ethereum blockchain is far more complex than its older cousin Bitcoin because on Ethereum it is possible to execute arbitrary code in the form of smart contracts.
A smart contract is a type of account that contains code, usually written by blockchain developers in a high-level language like Solidity or Serpent and then compiled. As with any other compiled language, the smart contract code is hosted on the blockchain in the form of bytecode (or object code). Wallets, accounts, transactions and bytecode: these are the types of data that can be stored on the Ethereum blockchain. Hosting code would be useless if it were never executed. The best-known mechanism to execute bytecode compiled from a high-level language is a virtual machine, or VM.

The EVM component

Like Java, JavaScript or Python, Ethereum has its own virtual machine: a stack-based machine whose instructions push values onto a stack and pop them off it. The purpose of the virtual machine is to execute smart contract code. This mechanism allows transitions from one state to another, just like a real machine: given a certain block (in which a number of transactions are stored) and a state s, performing the computation brings the machine into a new state s'.
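To make the stack-based execution model more concrete, here is a minimal sketch of a stack machine in Python. It is not the EVM: the three opcodes are only loosely modelled on its PUSH, ADD and MUL instructions, there is no gas, memory or storage, and the run function is a name chosen for this example.

```python
def run(program: list) -> list:
    """Execute a list of (opcode, argument) pairs on an operand stack."""
    stack = []
    for opcode, arg in program:
        if opcode == "PUSH":
            stack.append(arg)
        elif opcode == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif opcode == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return stack


# Computes (2 + 3) * 4 and leaves the result on top of the stack
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4), ("MUL", None)]
result = run(program)
```

Operands are never named: every instruction finds its inputs on top of the stack and leaves its output there, which keeps the bytecode compact.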
The state transition mechanism consists of accessing the accounts involved in a transaction, computing operations, and writing the updated state of the virtual machine. Whatever is executed on the virtual machine alters its state. After all the transactions of a block have been executed, the resulting state is recorded in what will become the next block.
In the process of block verification, one component plays a fundamental role in increasing performance and scalability: Merkle trees and Merkle proofs.
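The transition from a state s to a state s' can be sketched as a pure function. The example below assumes that the state is just a dictionary of account balances and that a transaction is a plain value transfer; contract code, gas and fees are left out. The names apply_transaction and apply_block are illustrative.

```python
def apply_transaction(state: dict, tx: dict) -> dict:
    """Return a new state with `tx` applied; the input state is left untouched."""
    sender, recipient, value = tx["from"], tx["to"], tx["value"]
    if value < 0 or state.get(sender, 0) < value:
        raise ValueError("invalid transaction")
    new_state = dict(state)
    new_state[sender] = new_state.get(sender, 0) - value
    new_state[recipient] = new_state.get(recipient, 0) + value
    return new_state


def apply_block(state: dict, transactions: list) -> dict:
    """Apply all the transactions of a block, one after the other."""
    for tx in transactions:
        state = apply_transaction(state, tx)
    return state


s = {"alice": 10, "bob": 0}
block_transactions = [
    {"from": "alice", "to": "bob", "value": 4},
    {"from": "bob", "to": "carol", "value": 1},
]
s_prime = apply_block(s, block_transactions)
```

Because the function is deterministic, every node that starts from the same state and applies the same block ends up in the same new state.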

Merkle tree structures

To begin with, a Merkle tree is a data structure (in its simplest form a binary tree) in which each leaf contains the hash of a chunk of data and each inner node contains the hash of the concatenation of its direct children. With such a structure it is possible to hash very large files or data sets while still being able to verify one branch of the tree independently of the rest. If a data chunk changes, the hashes on the path from that chunk to the root change, while the vast majority of the tree stays unchanged. Since the root of the tree is altered by modifying any leaf, checking the root is sufficient to detect such changes.
In Ethereum, Merkle trees are used to save space in each block and to allow so-called light clients to detect and verify changes very efficiently. The alternative would be extremely inefficient, forcing every node to store and verify large amounts of data. Indeed, storing only the root of the Merkle tree built over all the transactions of a block is sufficient for verification. It is not necessary to store all the transactions themselves.
In each Ethereum block, three Merkle roots are actually stored, not just one as in a Bitcoin block.
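A minimal sketch of how such a root can be computed is shown below. It assumes SHA-256 and a plain binary tree in which the last hash of a level is duplicated when the level has an odd number of nodes; Ethereum itself uses a more elaborate structure (a Merkle Patricia trie), so this only illustrates the principle.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(chunks: list) -> bytes:
    """Hash the chunks pairwise, level by level, until one hash remains."""
    if not chunks:
        raise ValueError("at least one chunk is required")
    level = [h(chunk) for chunk in chunks]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: duplicate the last hash
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


chunks = [b"chunk-0", b"chunk-1", b"chunk-2", b"chunk-3"]
root = merkle_root(chunks)

# Changing a single chunk changes the root
modified = [b"chunk-0", b"chunk-1", b"CHUNK-2", b"chunk-3"]
modified_root = merkle_root(modified)
```

Comparing root with modified_root is enough to detect that something changed, without comparing the chunks themselves.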

Courtesy of https://blog.ethereum.org/2015/11/15/merkling-in-ethereum/

The three Merkle roots in each block header are

  • a Merkle root of the Merkle tree of all transactions
  • a Merkle root for receipts (pieces of data that show the effect of executing each transaction, a bit like method postconditions in programmers' jargon) and
  • a Merkle root for the state of the virtual machine (EVM state)

This strategy increases performance and scalability because transactions, receipts and states do not need to be stored in each block. Only full nodes would do that. A mobile phone or an IoT device can keep its blockchain in sync just by downloading the block headers.
In order to compute a proof, the server (a full node) creates a fake block locally, sets its state to the state s of the current block, and applies the transaction. While doing so, the same node pretends to be a light client that knows only the Merkle roots in the block: every query needed to execute the transaction is answered by the server, and the data returned for those queries forms the proof. The client then executes the same transaction, checks that its local result matches what the server provided as a proof, and finally accepts it (or rejects it).
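The simplest kind of Merkle proof, a single branch, can be sketched as follows. The server side builds the list of sibling hashes for one chunk, and the light client recomputes the root from the chunk and those siblings alone. The sketch assumes a binary SHA-256 tree with the last hash duplicated on odd levels, which is a simplification of Ethereum's actual tries; build_levels, merkle_proof and verify are illustrative names.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(chunks: list) -> list:
    """All levels of the tree, from the leaf hashes up to the root."""
    levels = [[h(chunk) for chunk in chunks]]
    while len(levels[-1]) > 1:
        level = levels[-1]
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # assumption: duplicate the last hash
        levels.append([h(level[i] + level[i + 1]) for i in range(0, len(level), 2)])
    return levels


def merkle_proof(chunks: list, index: int) -> list:
    """Server side: sibling hashes on the path from leaf `index` to the root."""
    proof = []
    for level in build_levels(chunks)[:-1]:
        if len(level) % 2 == 1:
            level = level + [level[-1]]
        sibling = index ^ 1  # the other child of the same parent
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(chunk: bytes, proof: list, root: bytes) -> bool:
    """Light client side: recompute the root from the chunk and its siblings."""
    node = h(chunk)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


chunks = [b"tx0", b"tx1", b"tx2", b"tx3", b"tx4"]
root = build_levels(chunks)[-1][0]  # the only value the light client stores
proof = merkle_proof(chunks, 2)
accepted = verify(b"tx2", proof, root)
rejected = verify(b"forged", proof, root)
```

The proof grows with the logarithm of the number of chunks, which is why a light client can check a single transaction without downloading the whole block.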

Mining Ethereum blocks

Mining consists of processing new transactions and creating new blocks. Transactions are always packed into blocks. A miner therefore performs four steps:

  • collects transactions from the transaction MemPool
  • executes code on the EVM
  • creates Merkle proofs
  • runs ETHash proof-of-work algorithm

This happens regardless of the rewards that each miner gets from mining a certain block. The MemPool collects all transactions that are pending and ready to be processed. The code appended to each transaction is executed on the local EVM (Ethereum Virtual Machine).
The state of the virtual machine, the transactions and the receipts of their execution are all stored in Merkle trees, the roots of which are stored in the block header. Finally, the proof-of-work searches for the nonce that, together with the block content, generates a hash satisfying the difficulty parameter (also stored in the block).
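The four steps can be strung together in a toy miner. Everything here is a simplifying assumption: transactions are plain value transfers, the three roots come from a simple binary SHA-256 tree, and the proof-of-work is a search for leading zeros rather than ETHash. The function and field names (mine_block, tx_root, receipt_root, state_root) are chosen for this sketch only.

```python
import hashlib
import json


def digest(obj) -> str:
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


def merkle_root(items: list) -> str:
    level = [digest(item) for item in items] or [digest(None)]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [digest(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def mine_block(mempool: list, state: dict, prev_hash: str, difficulty: int, max_txs: int = 2):
    # 1. collect pending transactions from the mempool
    txs = [mempool.pop(0) for _ in range(min(max_txs, len(mempool)))]
    # 2. execute them and produce one receipt per transaction
    state, receipts = dict(state), []
    for tx in txs:
        ok = state.get(tx["from"], 0) >= tx["value"]
        if ok:
            state[tx["from"]] = state.get(tx["from"], 0) - tx["value"]
            state[tx["to"]] = state.get(tx["to"], 0) + tx["value"]
        receipts.append({"tx": digest(tx), "success": ok})
    # 3. compute the three Merkle roots for the header
    header = {
        "prev_hash": prev_hash,
        "tx_root": merkle_root(txs),
        "receipt_root": merkle_root(receipts),
        "state_root": merkle_root(sorted(state.items())),
        "difficulty": difficulty,
        "nonce": 0,
    }
    # 4. proof-of-work: search for a nonce that satisfies the difficulty
    while not digest(header).startswith("0" * difficulty):
        header["nonce"] += 1
    return header, state


mempool = [
    {"from": "alice", "to": "bob", "value": 4},
    {"from": "bob", "to": "carol", "value": 1},
    {"from": "carol", "to": "alice", "value": 100},
]
header, new_state = mine_block(mempool, {"alice": 10, "bob": 0}, "0" * 64, difficulty=3)
```

Only the header needs to be hashed for the proof-of-work, because the three roots already commit to the transactions, the receipts and the state.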

Scalability in Ethereum

One major issue of any blockchain (not just Ethereum) is scalability. Every full node executes each transaction and stores the entire state, for security reasons and to maintain a high degree of decentralisation. Of course, this strategy cannot cope with an exponential increase in the number of transactions generated by the system, which is exactly what happens as more and more DApps (decentralised applications) are created and executed on the same blockchain.
Several forms of parallelisation are currently being investigated, and quite an active community is working on the scalability issue in Ethereum. As a matter of fact, to be comparable to traditional centralised systems, a blockchain should be scalable, secure and decentralised. It turns out that, with current designs, at most two of these three properties can be achieved at the same time.

In any blockchain at most two properties are possible.

There are four different attempts to solve the scalability issue in Ethereum. They are not mutually exclusive; it is wiser to think of them as complementary.
The following strategies are being explored by the community:

  • state channels
  • interactive verification for scalable computation
  • plasma chains
  • sharding

Plasma chains and sharding seem to be the most promising ones, so these are the two I describe in more detail.

How to solve the scalability issue

Before getting into the details of how sharding, Plasma or any other future proposal designed to increase scalability works, let's first understand how scalability is affected by current blockchain architectures.
In current blockchain protocols, each full node stores all states and all transactions. By state we mean account balances, contract code, storage, etc. In addition, every node processes all transactions, by executing them on its local virtual machine and saving the receipts and the next state back to the blockchain.
While this ensures that every node can check pretty much everything at any time, it clearly reduces scalability. Such a node requires an ever larger filesystem to store blocks and ever more computing resources to verify all transactions. Even if such a computer were extremely powerful, it would still be the bottleneck of the entire system. Bitcoin and Ethereum can process roughly 10 and 20 transactions per second respectively, mainly because both are limited by the computational power of a single node. Such a system is definitely very secure, but not scalable at all.
The key to breaking this hard limit is to process many transactions in parallel.

Sharding

The idea of sharding is as simple as splitting the blockchain state into K shards, or partitions. To explain how this would work in practice, let's think of a blockchain that can host Ether wallets, Reputation wallets and Model wallets. The wallets are just the types of assets that can be stored on this sample blockchain. Clearly, the number of transactions in such a blockchain is the sum of the transactions of every single asset. If one shard is dedicated to each asset, shard-specific nodes can process shard-specific transactions in parallel, while still maintaining strong security guarantees.
To make this possible, the sharding proposal introduces special nodes called collators, which accept transactions on shard k and create collations. A collation has a collation header, which is stored in a block on the main chain.
With such information it is possible to verify that all the transactions of a collation are valid, even though they have been processed by nodes other than those on the main chain. This, in turn, increases the throughput of the entire blockchain, that is, the number of transactions per second.
Once the sharding proposal is actually implemented, one possible scenario is the existence of four different types of nodes:
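The asset-per-shard example can be sketched as a partitioning step followed by independent processing of each partition. The asset-to-shard mapping, the transaction fields and the function names are all hypothetical, and a thread pool merely stands in for separate shard-specific nodes (Python threads do not give real CPU parallelism, so this illustrates the structure, not the speed-up).

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical mapping: one shard per asset type, K = 3
SHARDS = {"ether": 0, "reputation": 1, "model": 2}


def split_by_shard(transactions: list) -> dict:
    """Group transactions by the shard responsible for their asset."""
    shards = defaultdict(list)
    for tx in transactions:
        shards[SHARDS[tx["asset"]]].append(tx)
    return shards


def process_shard(transactions: list) -> dict:
    """Net balance change per account, computed within one shard only."""
    balances = defaultdict(int)
    for tx in transactions:
        balances[tx["from"]] -= tx["amount"]
        balances[tx["to"]] += tx["amount"]
    return dict(balances)


transactions = [
    {"asset": "ether", "from": "alice", "to": "bob", "amount": 5},
    {"asset": "reputation", "from": "bob", "to": "carol", "amount": 1},
    {"asset": "ether", "from": "bob", "to": "carol", "amount": 2},
    {"asset": "model", "from": "carol", "to": "alice", "amount": 1},
]

shards = split_by_shard(transactions)
with ThreadPoolExecutor() as pool:
    # each shard is processed independently of the others
    results = dict(zip(shards, pool.map(process_shard, shards.values())))
```

No shard needs to see the transactions of another shard, which is what allows the work to be spread over different nodes.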
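As a rough illustration of the relationship between a collation and its header, the sketch below stores only the header in a main-chain block and checks a collation's transactions against it. The field names (shard_id, parent_hash, tx_root) and the use of a binary SHA-256 Merkle root are assumptions made for this example and are not taken from the sharding specification.

```python
import hashlib
from dataclasses import dataclass


def merkle_root(transactions: list) -> str:
    level = [hashlib.sha256(tx.encode()).hexdigest() for tx in transactions]
    level = level or [hashlib.sha256(b"").hexdigest()]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [
            hashlib.sha256((level[i] + level[i + 1]).encode()).hexdigest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


@dataclass(frozen=True)
class CollationHeader:
    shard_id: int
    parent_hash: str
    tx_root: str  # commits to all the transactions of the collation


def make_header(shard_id: int, parent_hash: str, transactions: list) -> CollationHeader:
    """What a collator would publish for the transactions it accepted."""
    return CollationHeader(shard_id, parent_hash, merkle_root(transactions))


def matches_header(header: CollationHeader, transactions: list) -> bool:
    """Check that a list of transactions is the one the header commits to."""
    return header.tx_root == merkle_root(transactions)


shard_txs = ["alice->bob:5", "bob->carol:2"]
header = make_header(shard_id=0, parent_hash="0" * 64, transactions=shard_txs)

# The main-chain block carries only the small headers, one per collation
main_chain_block = {"number": 1, "collation_headers": [header]}
```

The main chain therefore grows by one small header per collation, while the transactions themselves stay with the nodes of the shard.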

  • Archival client — processing all transactions in all collations and keeping the full state for all shards
  • Stateless Regular Client — processing all top-level blocks (containing the collation headers for each shard) without processing transactions in each collation
  • Single-shard Client — maintaining the state for some specific shard
  • Stateless Light Client — verifying the block headers of the top-level blocks only and querying Merkle branches for shards and collations if requested

Plasma chains

Plasma is another very promising proposal, which introduces the concept of a sidechain, or child blockchain. It uses a series of smart contracts to create hierarchical trees of sidechains. Plasma can be compared to having a blockchain within a blockchain that relays information back to the main chain whenever required.
Creating the sidechain is a task performed by a smart contract on the main chain. Such a sidechain is therefore controlled by the rules and protocols enforced by the main chain. Independent nodes, however, are responsible for maintaining and controlling the sidechain, which frees the nodes on the main chain to process other transactions. Such an approach seems to work very well for micropayments among parties that report their balances to the main chain only after the whole series of payments has been accepted and validated.
From a scalability perspective, Plasma does not impose any constraint on the number of sidechains that can be created, leading to theoretically unlimited scaling. The major benefits of Plasma are faster and far cheaper off-chain transactions.
Additional technical details are provided in the official proposal published on plasma.io
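The micropayment scenario can be illustrated with a toy model in which many payments happen on a sidechain and only the final balances are reported to the main chain. This is not the Plasma protocol: there are no smart contracts, fraud proofs or exit mechanisms here, and the MainChain and SideChain classes are invented for this example.

```python
class MainChain:
    def __init__(self, balances: dict):
        self.balances = dict(balances)
        self.tx_count = 0  # number of transactions recorded on the main chain

    def settle(self, final_balances: dict) -> None:
        """Accept final balances only if no value was created or destroyed."""
        before = sum(self.balances[account] for account in final_balances)
        if sum(final_balances.values()) != before:
            raise ValueError("settlement does not conserve value")
        self.balances.update(final_balances)
        self.tx_count += 1


class SideChain:
    def __init__(self, main: MainChain, participants: list):
        self.main = main
        self.balances = {p: main.balances[p] for p in participants}
        self.payments = 0  # payments processed off the main chain

    def pay(self, sender: str, recipient: str, amount: int) -> None:
        if amount <= 0 or self.balances[sender] < amount:
            raise ValueError("invalid payment")
        self.balances[sender] -= amount
        self.balances[recipient] += amount
        self.payments += 1

    def close(self) -> None:
        """Report the final balances to the main chain in one transaction."""
        self.main.settle(self.balances)


main = MainChain({"alice": 10, "bob": 10, "carol": 10})
side = SideChain(main, ["alice", "bob"])
for _ in range(5):
    side.pay("alice", "bob", 1)
side.pay("bob", "alice", 2)
side.close()
```

Six payments took place, yet the main chain recorded a single settlement, which is where the lower cost and the higher throughput come from.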

In the next post I will dive deep into the three components of an Ethereum full node, namely the blockchain component, the peer-to-peer network component and the EVM. I will do so by referring to a real Python implementation of an Ethereum full node. Stay tuned!
