Blockchain in 19 Lines of Python

I walk through a simple Python implementation of blockchain technology, commonly used for public ledgers in cryptocurrencies.

Published

19 September 2021

A blockchain is growing linked list with hash pointers (Narayanan et al., 2016). A hash pointer is a pointer with a cryptographic hash of the data to which it points. If you are unfamiliar with hash functions, just think of a hash of the data as a short code representing the data. We cannot back out the original data from this code, but we can verify if new data is equivalent to the original data by hashing the new data and comparing this new code to the stored code.

Each node in the linked list is called a block, which stores some data and points to the previous block in the blockchain. The first block in the blockchain is called the genesis block, since it does not point to another block. The most recently added block is called the head (Figure $1$ ).

Figure 1. Blockchain diagram. Each block contains data and a hash pointer, which is a pointer along with a cryptographic hash of the data to which it points. The genesis block is the first block and has a null pointer. The head is the most recently added block.

In Python, we can implement a simple blockchain for string data in just 19 lines of code, including whitespace:

from hashlib import sha256

hash_fn    = lambda x: sha256(x.encode('utf-8')).hexdigest()
hash_block = lambda block: hash_fn(block.data + block.hashed_prev_data)

class Block:

    def __init__(self, data, prev):
        self.data = data
        self.prev = prev
        self.hashed_prev_data = '' if prev is None else hash_block(prev)

class Blockchain:

    def __init__(self):
        self.head = Block('genesis', None)

    def add_block(self, data):
        self.head = Block(data, self.head)

Given the head, we can traverse the blockchain by following the hash pointers backward. We can verify that the data stored in each block is unchanged by comparing a hash of its data to the hash stored in the hash pointer that points to the block. Here is a function to walk the chain and print the stored string data, adding an asterisk if the string has been changed:

def verify(chain):
    curr = chain.head
    print(curr.data)
    while curr.prev is not None:
        if curr.hashed_prev_data != hash_node(curr.prev):
            print(f'*{curr.prev.data}')
        else:
            print(curr.prev.data)
        curr = curr.prev

We can use this simple blockchain implementation as follows:

>>> chain = Blockchain()
>>> for i in range(5):
>>>     chain.add_block(str(i))
>>> verify(chain)
4
3
2
1
0
genesis

Now let’s tamper with the data and try to verify the chain:

>>> chain.head.prev.prev.prev.prev.data = "a"
>>> verify(chain)
4
3
2
1
*a
genesis

We tampered with the data, changing “0” to “a”, but this was detected because the hash pointer on block “1” did not match the hash of the tampered block.

But what if we re-hash the block after tampering with the data?

>>> target_node = chain.head.prev.prev.prev.prev
>>> target_node.data = "a"
>>> chain.head.prev.prev.prev.hashed_prev_data = hash_block(target_node)

And this will actually work for the current block, but then the next block will not be verified! This is because block “2” has a hash pointer which will now no longer match the tampered node:

>>> verify(chain)
4
3
2
*1
a
genesis

Thus, we cannot tamper with a block in the blockchain without tampering with all the blocks that point to it. This is the essence of blockchain technology.

Now imagine Alice is adding blocks to her blockchain, while Bob is attempting to tamper with her blockchain by modifying the $i$ th block. Provided they work at the same rate, Bob will never be able to successfully tamper with her blockchain since he cannot work fast enough to tamper with each block between the $i$ th block and the head of the ever-growing chain. Any third-party auditor using Alice’s head block will always detect Bob’s tampering. This is why blockchain technology is often combined with a consensus algorithm for adding blocks. It forces Bob to have more compute than the network.

Blockchains are often used in cryptocurrencies because they can function as public ledgers. Each block stores a batch of financial transactions, and blocks are added to the blockchain through a consensus algorithm. This combination of a blockchain and a consensus algorithm for adding blocks to the chain allows for decentralized digital currencies. Cryptocurrency holders cannot double spend their digital money since transactions are recorded in a tamper-resistant public ledger.

Acknowledgements

Thanks to Emilio Cantu-Cervini for pointing out a bug in an earlier implementation.

Narayanan, A., Bonneau, J., Felten, E., Miller, A., & Goldfeder, S. (2016). Bitcoin and cryptocurrency technologies: a comprehensive introduction. Princeton University Press.