Trade-offs in the distributed ledger technology

Blockchain

Discussions about the future of technology tend to focus on whatever could alter life as we know it. Blockchain is one such technology: it is the center of attention these days, and many corporations have come forward to embrace it as “disruptive”. Designing a blockchain system, however, brings a new set of challenges, many of which are not well understood.

This article looks at one such challenge, network failures in distributed systems, and at the trade-offs each protocol design makes in response.

The idea that network failures force design choices on distributed systems comes from the computer scientist Eric Brewer. Introduced as the ‘CAP Principle’ in 1999, the CAP theorem states that a distributed data store cannot simultaneously provide more than two of the following three guarantees (any two can be met, but only at the expense of the third):

  1. Consistency: Every read receives the most recent write or an error.
  2. Availability: Every request receives a (non-error) response, without the guarantee that it contains the most recent write.
  3. Partition tolerance: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.

In simple terms, no distributed system is safe from network failures. Blockchain protocols therefore differ in their design goals for how the network should behave during a failure or partition.

In the presence of a partition, one is then left with two options: consistency or availability.

  • When choosing consistency over availability, the system will return an error or a time out if particular information cannot be guaranteed to be up to date due to network partitioning.
  • When choosing availability over consistency, the system will always process the query and try to return the most recent available version of the information, even if it cannot guarantee it is up to date due to network partitioning.

Important note: Under normal conditions (no network failure or partition), no trade-off has to be made. The distributed system satisfies both availability and consistency.
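The two options above can be sketched in a few lines of Python. This is only an illustrative model, not how any particular blockchain node is implemented: the `Replica` class, its `partitioned` flag and the `Unavailable` error are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


class Unavailable(Exception):
    """Raised by a consistency-first replica that cannot confirm freshness."""


@dataclass
class Replica:
    # All fields are hypothetical and exist only for this illustration.
    value: str                       # last value this replica has seen
    partitioned: bool = False        # True when the replica cannot reach its peers
    prefer_consistency: bool = True  # the design choice made for partitions

    def read(self) -> str:
        # No partition: no trade-off, the local value is known to be current.
        if not self.partitioned:
            return self.value
        # Partition, consistency first: refuse rather than risk a stale answer.
        if self.prefer_consistency:
            raise Unavailable("cannot confirm the latest write during a partition")
        # Partition, availability first: answer with the best local knowledge.
        return self.value


cp = Replica(value="balance=100", partitioned=True, prefer_consistency=True)
ap = Replica(value="balance=100", partitioned=True, prefer_consistency=False)
```

Calling `ap.read()` returns the possibly stale local value, while `cp.read()` raises `Unavailable`. With `partitioned=False` both replicas behave identically, which matches the note that no trade-off is needed under normal conditions.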

Let’s compare some blockchain protocols and their design choices.

A blockchain system that uses the heaviest (or longest) chain rule, such as a PoW chain, favors availability over consistency. Because the network is always at risk of forks that can cause blocks to be dropped (or orphaned), the system cannot guarantee the finality of the most recently produced blocks. Thus, even after a block is committed, users have to wait for a specific number of further blocks to be sure the transaction is “finalized” and included in the longest chain. Bitcoin users usually wait for six block confirmations, which takes about 60 minutes (each block takes ~10 minutes to produce).
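The confirmation arithmetic can be written down as a small Python sketch. The function names are hypothetical, the 10-minute interval is the average quoted above, and the code assumes the common convention that the block containing the transaction counts as the first confirmation.

```python
def expected_wait_minutes(confirmations: int,
                          block_interval_minutes: float = 10.0) -> float:
    """Average time until a transaction has the given number of confirmations."""
    if confirmations < 0:
        raise ValueError("confirmations must be non-negative")
    return confirmations * block_interval_minutes


def confirmations_for(tx_block_height: int, tip_height: int) -> int:
    """Confirmations of a transaction, counting its own block as the first."""
    if tip_height < tx_block_height:
        return 0  # the transaction's block is not on this chain (yet)
    return tip_height - tx_block_height + 1


def is_settled(tx_block_height: int, tip_height: int, required: int = 6) -> bool:
    """True once the transaction has at least `required` confirmations."""
    return confirmations_for(tx_block_height, tip_height) >= required
```

For a transaction included at height 100, `is_settled(100, 105)` is true because blocks 100 through 105 give six confirmations, and `expected_wait_minutes(6)` gives the 60-minute figure. Real block times vary around the average, so the wait is an expectation rather than a guarantee.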

On the other hand, blockchain systems that use Byzantine Fault Tolerant (BFT) consensus reach agreement on each block without developing forks in the process. Once a transaction and its block are committed, they cannot be reverted, so users do not need to wait for further confirmations. However, if a third or more of the nodes drop (or go offline), the system will stall (time out) and stop producing new blocks.
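A rough way to see when such a network stalls is to compute the vote quorum. The sketch below assumes equally weighted nodes and a rule that a block commits only with votes from strictly more than two thirds of them, as in Tendermint-style protocols. The function names are hypothetical.

```python
def quorum_size(total_nodes: int) -> int:
    """Smallest number of votes that is strictly more than 2/3 of all nodes."""
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    return (2 * total_nodes) // 3 + 1


def max_tolerated_faults(total_nodes: int) -> int:
    """Largest f such that total_nodes >= 3f + 1 (classic BFT bound)."""
    return (total_nodes - 1) // 3


def can_commit(total_nodes: int, online_nodes: int) -> bool:
    """True if enough nodes are online to reach the commit quorum."""
    return online_nodes >= quorum_size(total_nodes)
```

With 100 nodes the quorum is 67, so the chain keeps producing blocks with 33 nodes offline and stalls once 34 are gone. Under this model a stalled chain never commits conflicting blocks, which is the consistency-over-availability choice described above.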

Another limitation of BFT protocols is communication overhead, which grows as the number of nodes increases. Therefore, in blockchain protocols using BFT (e.g. PBFT consensus), running a node is not necessarily permissionless, and who is accepted as a node runner requires careful evaluation. Facebook’s Libra blockchain is one such case.
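To give a feel for how the overhead grows, the sketch below counts messages in a single voting round, assuming every node sends its vote directly to every other node (an all-to-all round, as in the voting phases of PBFT-style protocols). The function name is hypothetical, and real implementations may reduce this cost with aggregation or gossip.

```python
def messages_per_all_to_all_round(nodes: int) -> int:
    """Messages sent when each node sends one vote to every other node."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return nodes * (nodes - 1)


# Growth is quadratic: ten times the nodes means roughly a hundred times the messages.
for n in (4, 10, 100):
    print(n, messages_per_all_to_all_round(n))
```

Under this assumption, 4 nodes exchange 12 messages per round, 10 nodes exchange 90, and 100 nodes exchange 9,900, which is one reason such networks tend to keep the validator set small.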

Given the above constraint, design goals vary between blockchain protocols depending on the developers’ preference. For example, the currently popular approach of ‘BFT algorithms with proof of stake’ used by Tendermint (Cosmos) favors consistency, while more recent designs inspired by it, such as ETH Casper and NEAR Protocol, favor availability.

Flaw in protocols favoring ‘Consistency’

Protocols that choose consistency assume that participants will act non-maliciously and be online most of the time. Such protocols are therefore better suited to scenarios where the participants have some reputation (or trust), so the chances of an attack are believed to be low, e.g. the Facebook Libra Association or companies building a private blockchain for their own use. Though there are punishment rules for dishonest nodes, the rules are less strict for inactive (or offline) nodes.

ETH 2.0, by contrast, makes little to no assumption about the honesty of validators and has strict punishment rules, including for inactive nodes. NEAR Protocol is another example that favors availability over consistency.

What end users experience under each of these protocols during a network failure:

Imagine a bank with two ATMs in a city that let users access their bank accounts and make withdrawals and deposits at either machine. Normally, when a user performs any operation on one ATM, the other ATM is updated as well. The system is consistent, because both ATMs show the same account balances. It is also available, since both machines are working correctly and the customer can use either of them at any time.

But what happens if the connection between the two ATMs breaks? In that scenario you have two choices: let the customer perform transactions on one ATM and update the other when the connection is restored, or perform nothing at all until the network partition is resolved.
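The two ATM policies can be played out in a toy Python model. Everything here is hypothetical and simplified for illustration: the `Atm` class, its `available_first` switch and the `heal` method are invented names, and only deposits are modeled so that late synchronization can never overdraw the account.

```python
class Atm:
    """Toy ATM that mirrors every deposit to a single peer ATM."""

    def __init__(self, balance: int, available_first: bool):
        self.balance = balance
        self.available_first = available_first  # policy during a partition
        self.link_up = True                     # False while partitioned
        self.peer = None                        # the other ATM, set after creation
        self.pending = []                       # deposits not yet sent to the peer

    def deposit(self, amount: int) -> bool:
        if self.link_up:
            # Normal operation: both machines are updated together.
            self.balance += amount
            self.peer.balance += amount
            return True
        if not self.available_first:
            # Consistency first: refuse service until the link is back.
            return False
        # Availability first: accept locally and synchronize later.
        self.balance += amount
        self.pending.append(amount)
        return True

    def heal(self) -> None:
        """Restore the link and forward every queued deposit to the peer."""
        self.link_up = True
        for amount in self.pending:
            self.peer.balance += amount
        self.pending.clear()


a = Atm(100, available_first=True)
b = Atm(100, available_first=True)
a.peer, b.peer = b, a

a.link_up = False          # the connection between the ATMs breaks
accepted = a.deposit(50)   # still accepted, but b has not heard about it
stale_balance = b.balance  # b still shows the old balance
a.heal()                   # link restored, b catches up
```

While the link is down the two machines disagree (`a.balance` is 150 and `b.balance` is 100), and they agree again only after `heal()`. With `available_first=False` the same deposit would simply be refused, leaving the balances identical but the customer unserved.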

For decentralized applications (dApps) that run on protocols favoring availability (like Ethereum 2.0 and NEAR), users can keep using the application and carry out in-app transactions during a network partition, but they may need to wait a bit longer for finality (or settlement). On protocols that favor consistency, the network stalls and users cannot use the dApps until the majority of the nodes are back online.

  • September 21, 2020