Blockchain DIY (1/2)
This two-post blog series, also available on our Medium blog, approaches the notion of blockchain from first principles. By following along, you will end up building your very own simple blockchain from the ground up. The contents of this first post are as follows:
Section 1) Motivation
Section 2) Blockchain-related terminology and core concepts
Section 3) Basic blockchain implementation and web app
The second and final post will extend the basic blockchain implementation of this first post with some more advanced functionality such as mining rewards based on our own native cryptocurrency, the “BeCoin”, and transaction processing across a multi-node setup.
To get the most out of these blog posts, you should have:
intermediate Python knowledge, including basic knowledge of object-oriented software engineering concepts
basic familiarity with web application frameworks such as Flask
basic familiarity with HTTP clients such as Postman
Readers who are already familiar with core blockchain concepts and are primarily interested in their implementation might want to skip the next two sections and jump right to the implementation section.
For all others, let’s start with some motivation and background.
Although the ideas at the core of blockchains have been around since the early 1990s [1], blockchain only really started to get the tech community’s attention with the publication of the Bitcoin White Paper [2] in response to the 2007–2008 financial crisis. At its heart, the paper targeted the centralized nature of the traditional financial system in the belief that a strictly decentralized, trustless value-transfer system available almost free of charge would not as easily fall prey to the shady dealings that caused the crisis.
The Ethereum White Paper [3] and Yellow Paper [4] extended the single-application purpose expressed in the Bitcoin White Paper by adding general-purpose program execution capabilities and, with them, the idea of smart contracts to the blockchain.
Since these landmark publications and the subsequent implementations of the Bitcoin and Ethereum blockchains, the blockchain notion has gone through its fair share of winters: the first one around 2018, only for it to start moving up the hype cycle again as of 2020/21, hitting an all-time high with the late 2021/early 2022 NFT boom. At the time of writing in July 2022, it is firmly back in bear market territory as far as the valuations of (the blockchain-based) cryptocurrencies are concerned.
The worldwide developer community at the heart of blockchain development and progress, however, has remained strong and prolific throughout, not being overly concerned about any public misgivings and wildly fluctuating cryptocurrency valuations.
Talking of public perception, blockchain has experienced a good deal of controversy and criticism. There is the large energy consumption of proof-of-work blockchains. There are seemingly endless, often idealistic blockchain community debates on the best way forward, which are inscrutable to the general public and are slowing down badly needed technological progress. There is a steady stream of bad press due to crypto-related scams, Ponzi schemes, ransomware attacks and the like.
A lot of this criticism can be attributed to misconceptions and a simple lack of understanding of the blockchain fundamentals. Also, despite wildly swinging cryptocurrency valuations, hype cycles, bad press and the like, a large number of observers seem to have the nagging feeling that there is something truly innovative, and potentially disruptive, about the underlying collection of blockchain-related technologies, which deserves to be understood more thoroughly and from the ground up.
And what better way is there than to start from first principles and to implement one’s very own blockchain? So, that’s exactly what we are going to do, i.e. blockchain DIY. Let’s go through some blockchain-related terminology and core concepts next.
Blockchain-related terminology and core concepts
A blockchain can be defined as a cryptographically connected chain of data blocks holding transaction data in a tamper-resistant and tamper-evident, distributed, decentralized, trustless fashion. That is, it is designed from the ground up to be a trust network that replaces the need for “trusted” central authorities or intermediaries with technology that has trust built in by design. It is not designed to be, and thus should not be used as, a great database.
A smart contract is code (“functions”) and data (“state”) deployed using cryptographically signed transactions on a smart contract-supporting blockchain. It is executed by nodes within the blockchain network when certain conditions are met and the results are recorded on the blockchain provided all nodes agree on these results. Note that a blockchain does not necessarily support smart contract functionality. A case in point is the native Bitcoin blockchain protocol.
The concepts enabling the decentralized trustlessness at the heart of blockchains are, in particular, cryptographic hash functions (“hash cryptography”), distributed, peer-to-peer networking with (block) mining and consensus protocols.
A hash function maps an input of arbitrary size to an output of fixed length, the so-called hash value, or simply hash. A cryptographic hash function additionally has the following properties:
It is deterministic and non-reversible, i.e. the (fixed-size) hash value follows deterministically from the (arbitrarily sized) input data, but the input cannot feasibly be recovered from the hash value,
it is fast to compute,
it is reasonably collision-resistant, i.e. two different inputs to the algorithm do not easily map to the same hash value, and
it features the so-called avalanche effect, i.e. a small change to the algorithm’s input results in substantial changes to the hash value, thereby “diffusing” any statistically tractable relationship between input and hash value.
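The properties listed above can be observed directly with Python’s standard hashlib module. The following is a minimal sketch; the helper names sha256_hex and differing_positions as well as the sample inputs are made up for illustration.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of a string as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def differing_positions(a: str, b: str) -> int:
    """Count the character positions at which two equally long strings differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


short_hash = sha256_hex("blockchain")
long_hash = sha256_hex("blockchain" * 1000)   # much larger input
changed_hash = sha256_hex("blockchaim")       # one character changed

# Fixed output length, regardless of the input size
print(len(short_hash), len(long_hash))

# Deterministic: the same input always yields the same hash value
print(short_hash == sha256_hex("blockchain"))

# Avalanche effect: a one-character change alters most of the hash value
print(differing_positions(short_hash, changed_hash), "of 64 hex characters differ")
```

Running this should show two hash values of identical length for very different input sizes, and a hash value that changes in most positions after a one-character edit.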
So, each block in a blockchain holds:
a hash value referring to the previous block in the chain, the so-called “previous hash” constituting the cryptographic connection,
a block creation, or “mining”, timestamp,
a cryptographic nonce (“number once”),
the block’s own hash value, obtained by hashing the block’s transaction data, and
the transaction data itself.
There is an exception to this: The so-called “genesis block” represents the first block in the chain, which, by its very nature, does not hold any hash value referring to a previous block. Instead, the genesis block’s previous hash value is typically set to “0”.
For any other block, the previous hash represents the value resulting from hashing all of the previous block’s header data, as illustrated above. By contrast, a block’s own hash value represents the value resulting from hashing the block’s transaction data, i.e. the “block data”, only. Please note that this block data hash value is not to be confused with, and is not equal to, the next block’s “previous hash” value.
So, what’s the point of all of this? Well, using hash values to connect the blocks in a chain makes the resulting blockchain effectively tamper-evident and thus tamper-resistant and immutable to a significant degree. This is because any modification of the block header data, i.e. of the block’s previous hash value, its timestamp, its nonce or the block data hash value, changes the hash that the next block is expected to hold as its previous hash value, since that value is computed on the basis of all of these block header values. Any tampering with the block header data thereby becomes evident and effectively breaks the chain.
Amongst the many cryptographic hash functions to choose from, we will be using the NIST-standardized SHA-256 algorithm [5]. It maps input of any size to a 256-bit hash value, usually written as a string of 64 hexadecimal characters. This can be seen in action using, for example, the open source blockchain demo by Anders Brownworth, also used to good effect here.
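To make the tamper-evidence argument concrete, here is a small sketch of three linked blocks. The field names (data_hash and so on), the JSON-based header serialization and the sample transactions are assumptions chosen for illustration, not the layout of any particular blockchain.

```python
import hashlib
import json

HEADER_FIELDS = ("previous_hash", "timestamp", "nonce", "data_hash")


def hash_header(block: dict) -> str:
    """Hash the block header fields (assumed layout) with SHA-256."""
    header = {key: block[key] for key in HEADER_FIELDS}
    encoded = json.dumps(header, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def make_block(previous_block, timestamp: str, nonce: int, data: str) -> dict:
    """Create a block that points at its predecessor via the header hash."""
    return {
        "previous_hash": hash_header(previous_block) if previous_block else "0",
        "timestamp": timestamp,
        "nonce": nonce,
        "data_hash": hashlib.sha256(data.encode("utf-8")).hexdigest(),
        "data": data,
    }


def links_are_intact(chain: list) -> bool:
    """Check that every block still holds the hash of its predecessor's header."""
    return all(
        chain[i]["previous_hash"] == hash_header(chain[i - 1])
        for i in range(1, len(chain))
    )


genesis = make_block(None, "2022-07-01 12:00:00", 1, "genesis")
second = make_block(genesis, "2022-07-01 12:10:00", 7, "Alice pays Bob 5")
third = make_block(second, "2022-07-01 12:20:00", 3, "Bob pays Carol 2")
chain = [genesis, second, third]

print(links_are_intact(chain))   # the untouched chain is consistent

second["nonce"] = 8              # tamper with a header field of the second block
print(links_are_intact(chain))   # the third block's previous_hash no longer matches
```

Repairing the chain after such a change would require recomputing the previous hash of every subsequent block, which is what makes tampering evident.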
The SHA-256 algorithm has been used by Bitcoin since its inception, and, contrary to popular perception, can, in fact, be computed relatively easily using pen and paper.
⚡️ This is already all that is really needed for building a blockchain in the original sense of Haber and Stornetta [1]. The second use of this hashing algorithm goes beyond the cryptographic connectivity purpose described above. It stems from the need to establish trustlessness amongst parties that do not trust each other, as popularized in the context of the crypto-related blockchain application by Satoshi Nakamoto [2], which makes so-called consensus algorithms necessary. ⚡️
In the case of Bitcoin, this comes in the form of a cryptographic “proof-of-work” puzzle requiring the generated hash value to meet additional conditions such as a certain minimum number of leading zeros, i.e., effectively a “trial and error” search for the “right” hash value. At the time of writing, the number of leading zeros, and thus the so-called “(target) difficulty level”, of the Bitcoin block mined last amounted to 19 thereby limiting the legitimate hash value range drastically and making this a truly challenging proof-of-work:
However, with a block’s timestamp, transaction data and previous hash value predetermined, this leaves the question as to what input to the SHA-256 algorithm to change to obtain different hash values, one of which might ultimately satisfy the proof-of-work difficulty level. That’s where the “nonce” value comes in, the only block value which isn’t fixed from the outset: It can be changed at will until a legitimate hash value output meeting the difficulty level such as 19 leading zeros has been generated.
⚡️ This is also how the blockchain network-wide brute-force, extremely energy-demanding compute competition for the generation of “right” nonces, or so-called “Golden Nonces”, and thus legitimate hash values, comes about. It is not due to the complexity of this second hash computation itself. Note, though, that a legitimate hash value might be hard to find, but that it is easy to verify against the target difficulty level. ⚡
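The “trial and error” nonce search, and the asymmetry between finding and verifying a legitimate hash value, can be sketched as follows. This is a simplified illustration rather than Bitcoin’s actual mining procedure: the way the block content and the nonce are combined, the function names mine and verify, and the difficulty of three leading zeros are all assumptions.

```python
import hashlib


def candidate_hash(block_content: str, nonce: int) -> str:
    """Hash the fixed block content together with a candidate nonce."""
    return hashlib.sha256(f"{block_content}{nonce}".encode("utf-8")).hexdigest()


def mine(block_content: str, difficulty: int):
    """Increment the nonce until the hash starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = candidate_hash(block_content, nonce)
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1


def verify(block_content: str, nonce: int, difficulty: int) -> bool:
    """Verification takes a single hash computation."""
    return candidate_hash(block_content, nonce).startswith("0" * difficulty)


content = "previous_hash|timestamp|transactions"   # stand-in for the fixed block values
golden_nonce, digest = mine(content, difficulty=3)

print(golden_nonce, digest)
print(verify(content, golden_nonce, difficulty=3))
```

Each additional leading zero multiplies the expected number of attempts by 16, whereas the cost of verification stays at one hash computation.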
This proof-of-work is at the heart of the consensus algorithms used by blockchain protocols such as Bitcoin or Ethereum, with Ethereum, of course, being in the transition to another, less energy-consuming and more scalable “consensus layer”, proof-of-stake.
Ethereum roadmap by the Ethereum Foundation, “The great renaming: what happened to Eth2?”, Jan. 2022
But why, specifically, do we need to bother with this second hash computation at all? Trustlessness as practiced by blockchains such as Bitcoin, Ethereum and the like requires consensus algorithms to address the Byzantine Fault Tolerance problem [6]. More specifically, the consensus protocol first has to decide which block is added to the blockchain next, given that distributed miners compete for crypto rewards. Secondly, it has to deal with temporary inconsistencies across the blockchain network, which arise simply due to, for example, network latency and the different hash rates of the miners.
Consensus algorithms such as proof-of-work help address this problem by determining the fastest miner. That is, in answer to the first question, the idea is to add the block generated first and to reward its miner with native tokens and a transaction fee in the process. The second question is answered by the “rule of the longest chain”, i.e. the longest version of the blockchain, which is effectively the one backed by the fastest hash rate, will always be the eventually agreed-upon blockchain state of things. For a brief summary of this and alternative consensus protocols such as proof-of-stake see, for example, this blog post by Amy Castor [7].
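The “rule of the longest chain” can be sketched in a few lines. The function name resolve_conflict and the stand-in validity check below are hypothetical and only serve to illustrate the selection rule; a real node would apply its full chain validation instead.

```python
from typing import Callable, List

Chain = List[dict]


def resolve_conflict(local_chain: Chain, peer_chains: List[Chain],
                     is_valid: Callable[[Chain], bool]) -> Chain:
    """Return the longest valid chain among the local chain and the peers' chains."""
    best = local_chain
    for candidate in peer_chains:
        # A longer chain only wins if it also passes the validity check
        if len(candidate) > len(best) and is_valid(candidate):
            best = candidate
    return best


def has_consecutive_indices(chain: Chain) -> bool:
    """Stand-in validity check for this sketch: indices must run 1, 2, 3, ..."""
    return all(block["index"] == i + 1 for i, block in enumerate(chain))


local = [{"index": 1}, {"index": 2}]
peer_a = [{"index": 1}, {"index": 2}, {"index": 3}]                  # longer and valid
peer_b = [{"index": 1}, {"index": 2}, {"index": 9}, {"index": 4}]    # longest, but invalid

chosen = resolve_conflict(local, [peer_a, peer_b], has_consecutive_indices)
print(len(chosen))
```

Here the node would adopt peer_a’s chain: it is longer than the local chain, and the even longer chain offered by peer_b is rejected because it fails the validity check.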
Since we are concerned with first principles, we will impose a low-level target difficulty for the purposes of our very own blockchain implementation and we will restrict ourselves to a straightforward SHA-256-based hash value computation. Similarly, the standard blockchain validity checks will be kept to a minimum and consensus will be fairly straightforward.
⚡️ The implementation is intended to be educational in nature and it is not suitable for production scenarios in any way whatsoever. Rather, its objective is to inspire you to dig deeper. For production-quality implementations, you may want to have a closer look at, for example, the open source Bitcoin Core project. ⚡️
A basic blockchain implementation and web app
Let’s start with some requirements. You may want to consider installing the Python libraries within a virtual environment dedicated to this blockchain pet project.
Postman HTTP client v9.24.0: Available for download here.
Flask 2.1.3 library: pip install Flask==2.1.3
requests library: pip install requests==2.28.1
The Flask library provides the necessary web server functionality. The requests library makes the handling of HTTP requests to interact with our blockchain relatively straightforward. The additionally required standard Python modules consist of:
hashlib - Includes the SHA-256 algorithm
datetime - To create a block’s mining timestamp
json - HTTP interaction with our blockchain will be in JSON format
We will implement a basic blockchain class first. The web application for HTTP interactions with an instance of this class will then be shown next. The implementations are inspired by the Blockchain material produced by the SuperDataScience team.
Apart from the standard constructor method, __init__, the class Blockchain contains four other methods. The “create_block” method does what its name suggests - it creates a new block in the form of a Python dictionary and appends this new block to the blockchain. The dictionary holds the new block’s (incrementally increasing) index in the chain, its creation, or mining, timestamp, its proof-of-work in the form of the “nonce” element and the hash of the previous block, the “previous_hash”, in the chain, if any.
⚡️ As you might have noticed, the block does not hold any transaction data. This doesn’t make the block overly meaningful, but it doesn’t diminish the educational value of this exercise, which is of our primary concern at this point. We will add transaction data in the second post. ⚡️
This method is also used by the constructor method to initiate the chain with the genesis block and its conventional values of 1 for its nonce value and “0” for its previous_hash value, respectively. The blockchain, “chain”, itself is represented as a list of blocks starting with the genesis block.
The “get_previous_block” method is equally straightforward and simply returns the preceding block in the list.
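A minimal sketch of the constructor and the two methods described so far could look as follows. The method and key names follow the text; the details, such as starting the index at 1 and storing the timestamp as a string, are assumptions.

```python
import datetime


class Blockchain:
    def __init__(self):
        # The chain is a plain list of blocks, starting with the genesis block
        self.chain = []
        self.create_block(nonce=1, previous_hash="0")

    def create_block(self, nonce, previous_hash):
        """Create a new block as a dictionary and append it to the chain."""
        block = {
            "index": len(self.chain) + 1,                  # assumed to start at 1
            "timestamp": str(datetime.datetime.now()),     # creation ("mining") time
            "nonce": nonce,                                # proof-of-work of this block
            "previous_hash": previous_hash,                # link to the preceding block
        }
        self.chain.append(block)
        return block

    def get_previous_block(self):
        """Return the block most recently added to the chain."""
        return self.chain[-1]


blockchain = Blockchain()
print(blockchain.get_previous_block())
```

Right after instantiation, the chain holds exactly one block, the genesis block with a nonce of 1 and a previous_hash of “0”.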
Things get marginally more complicated when it comes to the proof-of-work implementation. The “proof_of_work” method sets a low-level target difficulty of four leading zeros to keep things simple and efficient. The hash value candidates are computed by applying the SHA-256 implementation of the hashlib module to the difference between the square of the previous block’s proof-of-work, “previous_nonce”, and the square of the block’s candidate nonce value, “new_nonce”. This computation is a deliberately simple choice for educational purposes. The candidate nonce is incremented and this computation is repeated until a hash value meeting the target difficulty level is obtained. The candidate nonce associated with this hash value, and the nonce only, is then returned, since it is all that is needed to check the validity of the block.
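The proof-of-work search described above can be sketched as a standalone function. The order of the subtraction (new nonce squared minus previous nonce squared), the starting value of 1 for the candidate nonce and the helper name puzzle_hash are assumptions; within the class, the same logic would live in the “proof_of_work” method.

```python
import hashlib

TARGET_PREFIX = "0000"   # target difficulty: four leading zeros


def puzzle_hash(previous_nonce: int, new_nonce: int) -> str:
    """Hash the difference of the squared nonces with SHA-256."""
    puzzle_input = str(new_nonce ** 2 - previous_nonce ** 2)
    return hashlib.sha256(puzzle_input.encode("utf-8")).hexdigest()


def proof_of_work(previous_nonce: int) -> int:
    """Increment the candidate nonce until the hash meets the target difficulty."""
    new_nonce = 1
    while not puzzle_hash(previous_nonce, new_nonce).startswith(TARGET_PREFIX):
        new_nonce += 1
    return new_nonce   # only the nonce is returned, not the hash value


nonce = proof_of_work(previous_nonce=1)   # 1 is the genesis block's nonce
print(nonce, puzzle_hash(1, nonce))
```

With four leading zeros, roughly one in 65,536 candidates qualifies on average, so the search typically finishes in well under a second on current hardware.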
⚡️ Please note that we do not carry around the legitimate hash value within a newly mined block, but rather the “previous_hash” value only, which, as described above, represents the hashed contents of the previous block (making our chain tamper-evident/-proof) rather than any hash value solving our cryptographic proof-of-work puzzle. This also implies, of course, that the previous_hash value is not required, and is indeed unlikely, to exhibit the target difficulty level property of, in this case, four leading zeros. Remember in this context that the fact that the blocks in our chain meet the required target difficulty is checked with the help of the is_chain_valid method, for which the block’s nonce represents the only proof-of-work element needed.⚡
This process is also the basis for our simple blockchain validity check in the form of the “is_chain_valid” method. Starting with the genesis block, the method first checks whether or not the block’s previous_hash value has the correct hash value. If so, the method proceeds with checking the block’s nonce for meeting the target difficulty level. These steps are repeated for the entire chain. If either of the two checks fails for any block in the chain, the process stops and False is returned; otherwise, the blockchain is considered valid and the boolean value True is returned.
This completes the basic blockchain design, which in and of itself is not overly exciting, since we can’t really interact with the blockchain just yet.
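The two checks can be sketched as a standalone function operating on a list of blocks. The helpers hash_block, puzzle_hash and mine_next_block, the JSON-based block serialization and the fixed timestamps are assumptions made to keep the example self-contained.

```python
import hashlib
import json

TARGET_PREFIX = "0000"


def hash_block(block: dict) -> str:
    """Hash the entire block; this is the value stored as the next previous_hash."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def puzzle_hash(previous_nonce: int, new_nonce: int) -> str:
    return hashlib.sha256(str(new_nonce ** 2 - previous_nonce ** 2).encode("utf-8")).hexdigest()


def is_chain_valid(chain: list) -> bool:
    previous_block = chain[0]
    for block in chain[1:]:
        # Check 1: the block must point at the hash of its predecessor
        if block["previous_hash"] != hash_block(previous_block):
            return False
        # Check 2: the block's nonce must solve the proof-of-work puzzle
        if not puzzle_hash(previous_block["nonce"], block["nonce"]).startswith(TARGET_PREFIX):
            return False
        previous_block = block
    return True


def mine_next_block(chain: list) -> None:
    """Helper for this sketch: find a valid nonce and append a new block."""
    previous_block = chain[-1]
    nonce = 1
    while not puzzle_hash(previous_block["nonce"], nonce).startswith(TARGET_PREFIX):
        nonce += 1
    chain.append({"index": len(chain) + 1, "timestamp": "2022-07-01 12:00:00",
                  "nonce": nonce, "previous_hash": hash_block(previous_block)})


chain = [{"index": 1, "timestamp": "2022-07-01 11:00:00", "nonce": 1, "previous_hash": "0"}]
mine_next_block(chain)
mine_next_block(chain)
print(is_chain_valid(chain))

chain[1]["nonce"] += 1          # tamper with the second block
print(is_chain_valid(chain))
```

The first call should report a valid chain; after the nonce of the second block has been modified, the third block’s previous_hash no longer matches and the check fails.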
To be able to interact with an instance of our Blockchain class, we now move on to implementing a Flask web server responding to a number of HTTP GET requests.
⚡️ Please note that for Flask performance reasons, you might want to set the configuration parameter JSONIFY_PRETTYPRINT_REGULAR to False. ⚡️
The actual web application part starts with instantiating our blockchain class. This is followed by the definition of the “mine_block” endpoint, which makes use of the object’s “proof_of_work” and “create_block” methods to solve our cryptographic puzzle, obtain the corresponding nonce and then create a new block with the relevant parameters. The successful completion of this process is communicated to the HTTP client via a corresponding JSON payload.
The “get_chain” endpoint simply jsonifies the blockchain list and its length and returns this payload to the HTTP client.
The “is_valid” endpoint calls the “is_chain_valid” method, jsonifies its return value and returns this payload to the HTTP client.
To finish things off, the Flask web server is launched on port 5000, so that things become way more interesting now:
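Putting the pieces together, a compact sketch of the blockchain class and the three endpoints could look like this. The endpoint, method and key names follow the text; the helper functions hash_block and puzzle_hash, the JSON payload keys and the message text are assumptions. The call that starts the server is left as a comment so that the module can also be imported without blocking.

```python
import datetime
import hashlib
import json

from flask import Flask, jsonify

TARGET_PREFIX = "0000"


def hash_block(block):
    """Hash a whole block (assumed serialization: sorted JSON)."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode("utf-8")).hexdigest()


def puzzle_hash(previous_nonce, new_nonce):
    return hashlib.sha256(str(new_nonce ** 2 - previous_nonce ** 2).encode("utf-8")).hexdigest()


class Blockchain:
    def __init__(self):
        self.chain = []
        self.create_block(nonce=1, previous_hash="0")   # genesis block

    def create_block(self, nonce, previous_hash):
        block = {"index": len(self.chain) + 1,
                 "timestamp": str(datetime.datetime.now()),
                 "nonce": nonce,
                 "previous_hash": previous_hash}
        self.chain.append(block)
        return block

    def get_previous_block(self):
        return self.chain[-1]

    def proof_of_work(self, previous_nonce):
        new_nonce = 1
        while not puzzle_hash(previous_nonce, new_nonce).startswith(TARGET_PREFIX):
            new_nonce += 1
        return new_nonce

    def is_chain_valid(self, chain):
        for previous_block, block in zip(chain, chain[1:]):
            if block["previous_hash"] != hash_block(previous_block):
                return False
            if not puzzle_hash(previous_block["nonce"], block["nonce"]).startswith(TARGET_PREFIX):
                return False
        return True


app = Flask(__name__)
blockchain = Blockchain()


@app.route("/mine_block", methods=["GET"])
def mine_block():
    previous_block = blockchain.get_previous_block()
    nonce = blockchain.proof_of_work(previous_block["nonce"])
    block = blockchain.create_block(nonce, hash_block(previous_block))
    return jsonify({"message": "Block mined successfully", **block}), 200


@app.route("/get_chain", methods=["GET"])
def get_chain():
    return jsonify({"chain": blockchain.chain, "length": len(blockchain.chain)}), 200


@app.route("/is_valid", methods=["GET"])
def is_valid():
    return jsonify({"is_valid": blockchain.is_chain_valid(blockchain.chain)}), 200


# To serve the app on port 5000, uncomment the following line:
# app.run(host="0.0.0.0", port=5000)
```

Once the server is running, the endpoints can be called with Postman or any other HTTP client, e.g. by sending GET requests to http://localhost:5000/get_chain.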
For example, using the Postman HTTP client, calling the “get_chain” endpoint should produce something similar to the output shown below, i.e., your very first genesis block.
With the help of a few calls of the “mine_block” endpoint, your chain should grow as illustrated below.
You may also want to check whether or not your chain of blocks can still be considered valid, i.e. meets both the target difficulty level and holds the correct previous hash values.
And there you are! Given the various, somewhat theoretical blockchain-related concepts, which we had to work our way through at the start of this blog post, the actual, albeit rather basic, implementation should feel both surprisingly straightforward and rewarding.
However, this is only the beginning. In the next and final blog post, we will extend this single-node to a multi-node blockchain setup holding transaction data and featuring its very own crypto coin, the “BeCoin”.
[1] Stuart Haber and W. Scott Stornetta, “How to Time-Stamp a Digital Document”, in: A. J. Menezes and S. A. Vanstone (eds.), “Advances in Cryptology - CRYPTO ‘90”, LNCS 537, pp. 437–455, Springer-Verlag, Berlin, 1991
[2] Satoshi Nakamoto, “Bitcoin: A Peer-to-Peer Electronic Cash System”, Oct. 2008
[3] Vitalik Buterin, “Ethereum: A Next-Generation Smart Contract and Decentralized Application Platform”, 2014
[4] Gavin Wood, “Ethereum: A Secure Decentralised Generalised Transaction Ledger”, Berlin version, 2022
[5] Wouter Penard and Tim van Werkhoven, “On the Secure Hash Algorithm family”, in: Gerard Tel, “Cryptography in Context”, pp. 1–18, Feb. 2008
[6] Leslie Lamport, Robert Shostak, and Marshall Pease, “The Byzantine Generals Problem”, ACM Trans. on Programming Languages and Systems, Vol. 4, No. 3, pp. 382–401, July 1982
[7] Amy Castor, “A (Short) Guide to Blockchain Consensus Protocols”, March 2017, updated Sept. 2021