Blockchain in 19 Lines of Python

I walk through a simple Python implementation of blockchain technology, commonly used for public ledgers in cryptocurrencies.

A blockchain is growing linked list with hash pointers (Narayanan et al., 2016). A hash pointer is a pointer with a cryptographic hash of the data to which it points. If you are unfamiliar with hash functions, just think of a hash of the data as a short code representing the data. We cannot back out the original data from this code, but we can verify if new data is equivalent to the original data by hashing the new data and comparing this new code to the stored code.

Each node in the linked list is called a block, which stores some data and points to the previous block in the blockchain. The first block in the blockchain is called the genesis block, since it does not point to another block. The most recently added block is called the head (Figure 11).

Figure 1. Blockchain diagram. Each block contains data and a hash pointer, which is a pointer along with a cryptographic hash of the data to which it points. The genesis block is the first block and has a null pointer. The head is the most recently added block.

In Python, we can implement a simple blockchain for string data in just 19 lines of code:

from hashlib import sha256

hash_func = lambda x: sha256(x.encode('utf-8')).hexdigest()

class Block:

    def __init__(self, data, prev):
        self.data = data
        self.prev = prev
        if prev is not None:
            self.hashed_prev_data = hash_func(prev.data)

class Blockchain:

    def __init__(self):
        self.head = Block('genesis', None)

    def add_block(self, data):
        self.head = Block(data, self.head)

Given the head, we can traverse the blockchain by following the hash pointers backward. We can verify that the data stored in each block is unchanged by comparing a hash of its data to the hash stored in the hash pointer that points to the block. Here is a function to walk the chain and print the stored string data, adding an asterisk if the string has been changed:

def verify_chain(chain):
    curr = chain.head
    print(curr.data)
    while curr.prev is not None:
        if curr.hashed_prev_data != hash_func(curr.prev.data):
            print(f'*{curr.prev.data}')
        else:
            print(curr.prev.data)
        curr = curr.prev

We can use this simple blockchain implementation as follows:

>>> chain = Blockchain()
>>> chain.add_block('bar')
>>> chain.add_block('foo')
>>> chain.add_block('qux')
>>> verify_chain(chain)
qux
foo
bar
genesis

Now let’s tamper with the data and re-verify the chain:

>>> chain.head.prev.data = 'blah'
>>> verify_chain(chain)
qux
*blah
bar
genesis

We tampered with the data, changing “foo” to “blah”, and this was detected because of each block in the blockchain is linked through a cryptographic hash. This is the essence of blockchain technology.

Thus, we cannot tamper with a block in the blockchain without tampering with all the blocks that point to it. In our example, to successfully tamper with the blockchain, we would have to also modify the head block’s hash pointer, i.e.:

>>> chain.head.hashed_prev_data = hash_func('blah')
>>> verify_chain(chain)
qux
blah
bar
genesis

Now imagine Alice is adding blocks to her blockchain, while Bob is attempting to tamper with her blockchain by modifying the iith block. Provided they work at the same rate, Bob will never be able to successfully tamper with her blockchain since he cannot work fast enough to tamper with each block between the iith block and the head of the ever-growing chain. Any third-party auditor using Alice’s head block will always detect Bob’s tampering.

In some sense, the security of a blockchain is independent from cryptography. If space were no issue, then the hash pointer could just duplicate the data, i.e.:

class Block:

    def __init__(self, data, prev):
        self.data = data
        self.prev = prev
        if prev is not None:
            self.hashed_prev_data = prev.data

And we could verify the chain as above, only comparing raw data to raw data. So while a blockchain does provide a tamper-resistant collection of records, this security is from the structure of the linked list, not from cryptography per se. Instead, hash functions allow for efficient storage of fixed-size codes, which can be used later to verify that a block’s contents are unchanged without duplicating its data in the next block.

Blockchains are often used in cryptocurrencies because they can function as public ledgers. Each block stores a batch of financial transactions, and blocks are added to the blockchain through a consensus algorithm. This combination of a blockchain and a consensus algorithm for adding blocks to the chain allows for decentralized digital currencies. Cryptocurrency holders cannot double spend their digital money since transactions are recorded in a tamper-resistant public ledger.

  1. Narayanan, A., Bonneau, J., Felten, E., Miller, A., & Goldfeder, S. (2016). Bitcoin and cryptocurrency technologies: a comprehensive introduction. Princeton University Press.