Bitcoin From Scratch

Trevor McGuire
34 min readFeb 8, 2022

--

Satoshi’s message in Bitcoin’s Genesis Block: “The Times 03/Jan/2009 Chancellor on brink of second bailout for banks”

A while ago I decided I really wanted to understand how Bitcoin works. So, being a firm believer in the notion that the best way to understand something is by doing it, I decided to code it from scratch.

The goal of this post is to provide a solid understanding of the core functionality of Bitcoin. Using Python, we’ll take a bottom-up approach — starting with some basic math and working our way up to creating (and broadcasting) a Bitcoin transaction. After that, we’ll take a look at blocks and the blockchain.

Here is the link to the Jupyter Notebook version of this post: https://github.com/trevormcguire/blogposts/blob/main/BitcoinFromScratch.ipynb

Also, if you’re interested, here is the github library I wrote associated with all of this: https://github.com/trevormcguire/pybitcoin

Hope you enjoy!

First Things First

Before we get into Bitcoin, it is important to understand a few things:

Finite Fields

A finite field is a set that contains a finite amount of numbers. We can denote this with Fp where “p” (called prime) is the size of the set. For our purposes, what this means is that in a finite field, any number greater than “p” doesn’t exist.

Fp = {0, 1, 2, … p-1}

For Reference: https://en.wikipedia.org/wiki/Finite_field

Modular Arithmetic

In finite field math, any operation must return a number that is also in the set (this property is called closed). If the result of a math operation is larger than the prime (p), the result needs to “wrap around” back to the first element within the set. A good way of visualizing this is to picture a clock. Once the minute-hand passes 11:59, we start back at the beginning (12:00).

As a result, we’ll need to redefine math a bit. To do this we’ll use the modulo (mod) operator, which returns the remainder after division. In python, the mod operator is %.

For the most part, modular arithmetic is the same as normal arithmetic, with an extra mod operator step. For example, take a look at finite field addition:

Let p = 4, x = 3, y = 2(x + y) % p
(3 + 2) % 4
5 % 4 = 1

Similarly, take a look at subtraction:

(3–2) % 4 = 1

Modular Division

As you can see, modular arithmetic is pretty straightforward. However, there is one exception: division.

To handle division, we will need to utilize something called Fermat’s Little Theorem.

In short, this theorem states that for any prime (p), a number raised to the power of p-1 is 1.

Practically, what this states is that in a finite field with p = 10, any number raised to 9 is equal to 1.

Okay, so why do we care about all this? Well, since division is the inverse of multiplication a/b = a*(1/b) = a*(b^-1), we can turn this into a multiplication problem and utilize Fermat’s Little Theorem. By doing this, we get the below:

The beautiful part of Fermat’s Little Theorem (at least for our purposes) is the right side of the equation (the “1”).

Since we know b^-1 = b^-1 * 1, we can replace the 1 with the left side of the theorem. By doing this, we ultimately get a/b = a * b^(p-2)

Let’s write function to handle modular division for us:

Elliptic Curves

Elliptic Curves have the form y² = x³ + ax + b

Because is on the left side of the equation, elliptic curves are symmetric over the x axis. An example can be seen below:

Elliptic Curve

In Bitcoin, we are interested in specific points along an elliptic curve.

The parameters of the elliptic curve used in Bitcoin are specified in secp256k1 (https://en.bitcoin.it/wiki/Secp256k1). These parameters are written in the code below.

By the way, if you aren’t familiar with the hexadecimal (base-16) numbering system, I got you — read this

Because a is 0 and b is 7, we can shorten Bitcoin’s elliptic curve equation to the following: y² = x³ + 7.

Elliptic Curve Cryptography (ECC)

So here’s why all this matters: Bitcoin uses an elliptic curve that exists within a finite field. More specifically, Bitcoin exists in a finite field with prime P, and on a curve y² = x³ + ax + b, where a = 0 and b = 7. Note how P, a, and b are all constants in secp256k1 (that we just defined).

In Bitcoin, elliptic curves are used for key generation and digital signatures. For signatures, something called the ECDSA (Elliptic Curve Digital Signature Algorithm) is used, which we will dig into shortly.

Elliptic Curve Basics

Since Bitcoin utilizes an elliptic curve, we will want to create an object representing this. Moreover, since we are interested in specific points along this curve, we should also create a class to represent the (x, y) coordinates of a point.

In Bitcoin, we’ll need to add and multiply points together, so we will also define those operators as well.

Point addition can be thought of as drawing a line between two points, and then extending that line until it intersects the curve once again. The result is finally flipped over the x-axis to obtain the sum.

The process for Point Addition is the below:

        Case1 -> self.x == p.x
slope = (3*x1^2 + a) / 2y1
x3 = s^2 - 2x1
y3 = s(x1 - x3) - y1
Case2 -> self.x != p.x
slope = (y2 - y1) / (x2 - x1)
x3 = s^2 - x1 - x2
y3 = s(x1 - x3) - y1
Case3 -> self == INF #identity fn
return p
Case4 -> p == INF #identity fn
return self
#additive inverse -a + a = 0
Case5 -> self.x == p.x and self.y != p.y
return INF
Case6 -> self.x == p.x and self.y == 0 #line is tangent
return INF

Point Multiplication is just multiple iterations of Point Addition (3G = G + G + G). We implement the Double and Add algorithm below for efficiency (since we will be working with huge integers).

Lets further dissect the above.

Using the parameters in secp256k1, we define E, which is the elliptic curve on which Bitcoin operates.

We then define something called the Generator Point (G). This is the first point after infinity (INF) on Bitcoin’s curve. Infinity can be thought of as the point in which the finite field wraps around back to the beginning. It basically acts like zero.

If we begin with INF and add G, we get G. If we add G to G, we get 2G. Add another, and we get 3G, and so on.

If you repeat this process N times (N being the order of G, and a also parameter specified in secp256k1), we again reach INF. So N represents how many points there are along Bitcoin’s elliptic curve. As you will see, this concept is important for public key generation (note that in Bitcoin, public keys are just points along its curve).

Hash Functions

Before we get into coding up the Bitcoin protocol, we need to briefly touch on hash functions. If you have heard of bitcoin, I’m sure you’ve also heard of the word “hash” — and rightly so, as hash functions are used everywhere in Bitcoin.

There are many hash functions out there, but in Bitcoin the ones you need to know are hash160 and hash256. These are both “double hash functions”, because they both perform two rounds of hashing.

Both of these functions utilize the sha256 hashing algorithm. In fact, hash256 is just two rounds of sha256. Hash160, however, is sha256 followed by another called ripemd160.

What these functions are actually doing is mixing up bits in a way such that it is next to impossible to unscramble them. This is possible because sha256 outputs values in a uniform distribution — meaning, the values are essentially random. This is an important feature of sha256. In fact, the probability of sha256 producing the same hash twice is 10^-60, according to this source. Add in the fact that Bitcoin uses two rounds of it (again, called hash256), and you can see why Bitcoin is so secure.

On top of this, hash functions output fixed-size bytes, which is advantageous from a standardization standpoint.

Don’t worry — to keep things as straightforward as possible, we won’t implement these from scratch. The Python Standard Library comes with an awesome library that we will utilize called hashlib. However, if you wanted to dig into the details, Andrej Karpathy implemented these hashing algorithms in this blog post.

Private and Public Keys

The first thing we are going to do is create a private key (also called a “secret key”). This is what gives you access to your funds on the blockchain and is synonymous to a “password”.

In Bitcoin, the private key can be a random integer between 1 and N. We can also create one from a message, like below (which obviously isn’t very secure).

Side note — the second argument in int.from_bytes() is the byte order. The byte order can be either big-endian or little-endian. In big-endian, the “big end”, or the most significant bytes, are at the beginning. In little-endian, the opposite is true. So to switch from big to little endian (or vice versa), all we need to do is reverse the order of the bytes. Because little-endian is sometimes favorable for memory purposes, Bitcoin utilizes both.

Anyway, now that we have a private key, we can create our public key. Your public key is created by multiplying your private key by G (and if you’re wondering, this is why G is called the Generator Point!).

Notice how the public key is also just a point along the curve. This is because we are multiplying an integer by a point, G, which we defined using __rmul__ in our Point class.

Now, let’s ensure this point is, in fact, on Bitcoin’s elliptic curve. Remember the equation defined in secp256k1?

Well, here it is again: y² = x³ + 7 (mod P).

Awesome. So we now have our private/public key pair. On the blockchain, this key pair identifies who you are. The public key is also important because it’s used to generate wallet addresses.

Bitcoin Wallets

Okay — so for me, this is where things start getting interesting.

The wallet address is essentially a hashed version of the public key. The hashing algorithm used here is hash160, which again, is the sha256 hashing algorithm, followed by the ripemd160 algorithm.

But before we can get the wallet address, we’ll need to do one more thing. After the public key is hashed, it is encoded using a Bitcoin-specific base-58 encoding. Base-58 is meant to be very human-readable as it uses all symbols in the alphabet, except for 0/O and l/I (because these pairs are easy to confuse). Lets create a few functions to encode and decode in and out of base-58.

How the encoding works is straightforward. First, the bytes are converted to an integer. Next, any lead zeros are removed (these will be added back later). Lastly, while our integer is larger than 0, we tack on the letter in the base-58 alphabet which has an index corresponding to the remainder of our division operation.

The checksum is also important (as we will see in a second). It is just a hash256 of the first 4 bytes of our input. If you don’t know what a checksum is, it’s essentially just a tiny algorithm that checks for errors.

So now that we have our hash functions and base-58 methods defined, we are finally ready to generate a wallet address. To do this, we will define a class representing our Public Key.

As a reminder, the public key is generated from the secret key by multiplying it by the Generator Point (G). And because G is a point (x, y) on Bitcoin’s elliptic curve, we use point multiplication to arrive at the public key — meaning the public key is also just a point on Bitcoin’s elliptic curve.

One thing to note here is that the address needs to specify if we’re using Bitcoin’s testnet or mainnet (we will only be using testnet).

Let’s give this a whirl:

Cool. Now that we have our first Bitcoin “identity”, we can begin constructing transactions.

Creating and Broadcasting Bitcoin Transactions

We can now use Bitcoin testnet to send some funds to our newly created wallet.

To do this we are going to utilize something called a faucet, which is just a service that hooks into Bitcoin’s testnet. There are many of them out there. All you have to do is google “bitcoin testnet faucet” and hit one of the top couple links and enter the required info (amount and address).

We can see the results after doing that:

Free Money!?”, you ask?

Not quite. Unfortunately, the bitcoins on testnet aren’t worth anything.

Now, let’s view our wallet address on testnet to ensure we did, in fact, receive the bitcoin. To do that I looked up “bitcoin testnet explorer”, found blockstream.info/testnet, and entered our wallet address.

Okay there is a lot to unpack here.

So the first thing we notice is that our wallet does, in fact, exist. That’s a great start!

Next, we can see details about our wallet, including the total number of transactions, how much we’ve received, and how much remains unspent. On the bottom you can see our transaction id 353f52fb1332a3c313b175f79a0cf022d44cce85d4ea433a552b510f7bab2c8b, which should match up to one returned to us from the faucet.

As we will soon see, the transaction id is simply a hash256 of the encoded transaction, interpreted as little-endian.

If we click “Details” next to our Transaction ID, we get shown the below — which shows the inputs (left) and outputs (right) that our made up our transaction.

You may be wondering why there is 1 input and 2 outputs. This is because a Bitcoin transaction works differently than how we conventionally think of transactions.

In Bitcoin, each input actually points to the output of a previous transaction. This ensures you can’t spend any bitcoin without receiving some first.

This previous output needs to be unspent, and thus is called an Unspent Transaction Output — commonly referred to as a UTXO. The entire collective set of UTXO’s at any given time is called the UTXO set and is important because it represents all available bitcoins currently in circulation.

What’s key to understand here is that before the value can be split up, a UTXO must be fully spent. This is why in our screenshot above, the input has a much higher amount (0.10557793 BTC) than what was actually sent to us (0.001 BTC).

This is also why there are two outputs. The second output (with our wallet address) shows the amount that was sent (0.001 BTC). The first output (with the address of the sender) shows 0.10457649 BTC. The sender here, with an address of tb1qwngfv3fe8lmxwfyhu2h9us4lkft8fj763lnsz2, is from the testnet faucet we used to send ourselves bitcoin.

So, to reiterate, the input is the UTXO equaling 0.10557793 BTC. Because a UTXO must be fully spent, the entire amount is used to produce two outputs. In this case, the second output is the amount being sent, or 0.001 BTC. The first output is the “change” being sent back to the sender, or 0.10457649 BTC. Together the outputs total 0.10557649 BTC.

What might not be obvious here is that the inputs don’t equal the outputs. This difference is called the “fee” and eventually goes to the Bitcoin miner who processes the block containing our transaction. Note here that a “block” is just a group/batch of transactions.

This whole process is like paying $20 for a $5 beer and receiving $12 back as change because we tipped the bartender $3.

Continuing with this analogy, let’s say you want a second round. The input to this new transaction will be the $12 you received back last time (this is the unspent output of your previous transaction — UTXO). The output will be another $5 to the bartender, and $4 back to you (because you again tipped $3).

Before we code up the transaction, there are two more things we need to touch upon.

Script

Bitcoin has an entire programming language for transactions, called Script. It is stack-based and consists of elements and operations. Script doesn’t have most of the functionalities of modern-day programming languages (no loops, for example); however, the limited nature of it is what inherently makes it more secure.

Every transaction creates a script that tells the next person wanting to spend the bitcoins how exactly to gain access to them. This script is referred to as the ScriptPubKey, or Public Key Script. ScriptPubKey essentially “locks” the bitcoins, while providing instructions of how to unlock them. To unlock them, there is another script called the Signature Script, or ScriptSig. We will be dealing with both of these very soon.

There are several types of transactions in Bitcoin, but the one we’ll be focusing on is called pay-to-pubkey-hash, or P2PKH — which is commonly used when sending bitcoins from one wallet to another.

P2PKH uses the hash of the public key, rather than the public key itself. Aside from additional security, the advantage of hashing the public key is that it reduces it from 65 bytes (or 33 bytes compressed) to 20 bytes.

Anyway, the P2PKH ScriptPubKey (“locking” script) looks like this:

OP_DUP OP_HASH160 <pkHash> OP_EQUALVERIFY OP_CHECKSIG

Each one of these commands beginning with “OP” is a specific operation. In plain english, this script is duplicating the public key, performing a hash160 on it, verifying the result is equal to the public key hash, and then finally checking the signature.

Once again, this is called the “locking” script, and provides instructions of how to unlock the bitcoins. To unlock the bitcoins (meaning to solve the P2PKH script), the owner of <pkhash> needs to provide the unhashed public key and a valid signature for it. These two values are what makes up the ScriptSig.

scriptPubKey: OP_DUP OP_HASH160 <pkHash> OP_EQUALVERIFY OP_CHECKSIGscriptSig: <signature> <pubKey>

As you will see, the ScriptPubKey is part of the transaction output, while the ScriptSig is part of the input.

So, to recap, every input points to a previous output (UTXO). And because every UTXO contains a “locking” script (ScriptPubkey), every input will need to provide an “unlocking” script (ScriptSig) in order to access the bitcoins from the UTXO. If the bitcoins are successfully unlocked, they can then be spent. This is what people mean when they refer to “digital signatures” in Bitcoin.

Let’s come back to our example. The below was the ScriptPubKey from the transaction sent to our Bitcoin wallet:

OP_DUP OP_HASH160 32cf8a0b81f63473be3f88efe281f4b1f13c857e OP_EQUALVERIFY OP_CHECKSIG

All this says is the following: “In order to unlock the bitcoins, ScriptSig needs a public key that hashes to 32cf8a0b81f63473be3f88efe281f4b1f13c857e and a signature that was created by the secret key that generated this public key”(recall how we created our public key earlier).

This brings us to digital signatures.

Bitcoin Digital Signatures

The algorithm used for signatures in Bitcoin is called the Elliptic Curve Digital Signature Algorithm — or ECDSA. The signature is generated from the private (secret) key. The signature itself is really a tuple (r, s). The math behind all this can be seen below.

-----------------
Let e = private key
Let P = public key
Let k = 256-bit random target
Let R = (x, y) coords where r is x
Let r = x coord of R
Let z = signature hash
-----------------
u = z/s
v = r/s
eG = PkG = R = uG + vP
vP = G(k - u)
P = G((k-u)/v) = eG
e = (k-u)/v
k = u + ve = z/s + re/s = (z + re)/s
s = (z + re)/k

Here is all that in english:

We are choosing a random 256-bit integer k which is between 1 and N (recall that N is a constant defined in secp256k1). To solve “r” we need to solve R = kG. Since G is known, and “k” is a random integer, this is cake. Here, R is a point (x, y) along Bitcoin’s elliptic curve, and “r” is the x-coordinate of this point.

Once we have “r”, we can solve for “s” pretty easily to get our tuple (r, s) by using the following formula:

s = (z + re)/k

Note that “z” is the hash256 of an encoded transaction (the “message”), and “e” is our private/secret key, which is also known to us at this point.

So to solve “r” we pick a random integer and multiply it by Point G, and then take the x-coordinate of the result. Then, to solve “s”, we multiply “r” by the secret key, “e”. We then take our encoded transaction, “z”, add it, and finally divide the whole thing by our random “k”.

The randomness of k here is important as this is what protects your identity. This is why k is never revealed. In actuality, there is a deterministic process to generate k, detailed in RFC-6979, which can be read about here: https://datatracker.ietf.org/doc/html/rfc6979.

This specification was designed to prevent any duplications of “k” across the Bitcoin network. There was a famous Playstion 3 hack that was caused by this exact risk. But I digress.

In Bitcoin, the signature will be used to digitally sign a transaction. A signature proves that you know a private/secret key without actually revealing it. It allows you to access the bitcoins in the UTXO.

As you will see, after signing a transaction, we are going to need to encode it using something called Distinguished Encoding Rules (DER) format. Note that s cannot be derived from r, so we cannot compress the signature.

Here is the code.

As you can see, DER format (used in Signature.encode) isn’t anything crazy. Here are the steps:

1. Start with 0x30 and the encode length of entire signature
2. Encode both r and s as 32 bytes big endian integers
3. For both r and s, if the first byte is >= 0x80, prepend with 0x00
4. Add [0x02, len(r), r, 0x02 len(s), s]

This might all be a bit confusing right now, so let’s just throw it all together and see what happens.

Bitcoin Transactions

Transaction inputs will contain the hashed contents of the previous transaction output, the index of the output we’re trying to spend, and the ScriptSig (“unlocking” script). The output will contain the amount and the ScriptPubKey (“locking” script). It will look something like this:

Input:
Previous tx: f5d8ee39a430901c91a5917b9f2dc19d6d1a0e9cea205b009ca73dd04470b9a6
Index: 0
scriptSig: 304502206e21798a42fae0e854281abd38bacd1aeed3ee3738d9e1446618c4571d10
90db022100e2ac980643b0b82c0e88ffdfec6b64e3e6ba35e7ba5fdd7d5d6cc8d25c6b241501
Output:
Value: 5000000000
scriptPubKey: OP_DUP OP_HASH160 404371705fa9bd789a2fcd52d2c580b65d35549d OP_EQUALVERIFY OP_CHECKSIG

To start coding all this up, we will need a few helper functions to help us encode and decode integer values (to and from bytes).

Variable Integers, or varints, are able to handle extremely large integers efficiently by compressing them with a simple algorithm. These are used in Bitcoin all over the place.

Because we’ll be dealing with Script, we’ll also need an object to help us decode Script components.

There a few subtleties here, such as the one-off handling for OP_PUSHDATA1 and OP_PUSHDATA2. As you can see, the first byte is a varint describing the length of the script. We then loop until this length is reached, decoding bytes as we go along. At the start of each iteration, we are grabbing how long the field is (each field is preceded with its length as an integer), and then taking that many bytes.

In Script, Op Codes are mapped to integer values. For example, OP_DUP has a value of 118, while OP_HASH160 has a value of 169.

Op Codes 1–75 are for pushing elements into the stack and correspond to how many bytes need to be pushed. Codes 76 and 77 (OP_PUSHDATA1 and OP_PUSHDATA2) handle special cases where the size of the element is larger than 75 bytes. A full list of Op Codes can be found here: https://wiki.bitcoinsv.io/index.php/Opcodes_used_in_Bitcoin_Script

The “encode” method for Script takes our commands list and serializes it.

As you can see above, we also defined a standard P2PKH ScriptPubkey, as we will be utilizing this soon.

Lets test all this out. We will first generate a random secret key between 1 and N, multiply it by G to calculate a public key. Then, we will hash our public key, and finally create a ScriptPubKey (again, the “locking” script).

Now, with all that out of the way, we are finally ready to build our transaction.

A standard Bitcoin transaction will have 4 components: Version, Inputs, Outputs, and Locktime.

We know what inputs and outputs are, but let’s briefly touch upon version and locktime. Version is exactly what is sounds like — it describes what version a transaction is. For our purposes we will be using a version of 1, as that is the most general case.

Locktime was designed — along with “sequence” in our inputs — by Satoshi to enable high frequency trades. Both locktime and sequence were created to essentially compress a large batch of transactions between two parties into one single transaction. The locktime acts like a “time limit” and correlates to a block number. The sequence is how many transactions have occured prior to this time limit expiring (locktime) and increases incrementally with each successive transaction. However, there is a rule where if sequence is 0xffffffff, locktime is ignored.

The encode method of the transaction class creates the unhashed version of the “message” that we need to create our signature. This hashed version of this message is what we referred to as “z” in our calculations to obtain the (r, s) tuple that a signature consists of.

As stated before, a transaction output consists of the amount and the ScriptPubKey. Both the encode and decode methods here are pretty straightforward. During encode, the amount is encoded in 8 bytes little-endian and then — since the script_pubkey object is of our Script class we just built — we call the equivalent method of Script.

One thing to note here is that the amount is denoted in satoshis (SAT), which is 1/100,000,000th (1e-8) of a bitcoin (BTC).

Transaction inputs, on the other hand, are encoded as follows: First, the bytes of the previous transaction are reversed to make it little-endian. Next, the previous index is encoded as 4 bytes little-endian. The script_sig will be of the Script class we built earlier, so we simply call its encode method. Finally, the sequence is encoded as a 4 byte integer.

The decode method works in reverse. The first 32 bytes are the contents of the previous transaction, little-endian. The next 4 bytes are the index of the output, and is also little-endian. The script_sig is decoded and returned as a Script object. Lastly, the sequence is decoded back into an integer.

You’ll notice that our decode method can also handle Segwit, which stands for Segregated Witness. On this, I’ll only being giving you the TLDR version:

This was a soft-fork upgrade in August 2017 that enabled larger block sizes and “transaction malleability”, which is the ability to alter Transaction IDs without changing the meaning of the transaction itself. I chose to include it in Tx.decode because our testnet faucet transaction (which we’ll want to retrieve) used segwit to send us the bitcoins.

Anyway, a transaction can have many inputs and many outputs. That is why (as you can see in the encode method) the number of inputs and outputs have to be encoded via a varint before the actual inputs and outputs are subsequently encoded.

Lastly, in the encode method, you will see something called a SIGHASH. This is just a byte denoting what part of the transaction we signed. For our purposes we will use SIGHASH_ALL, which is represented by “1”.

Okay — that was a lot of information, I know. Luckily for us it’s finally time to use all of this to actually create a Bitcoin transaction.

Now that we’ve built our transaction object (Tx), we have the ability to create transactions. To do this, we’re going to want to grab some information about our previous faucet transaction. Of course, we could do this manually by typing our Transaction ID into the site we used before…but nah.

To help automate things a bit, we are going to quickly build a wrapper around Blockstream’s testnet API to both extract and post transaction data.

I’m not going to spend time explaining the code here, as it’s a bit of an aside. To avoid excessively spamming the API (because we’re not terrible people), we only call it when no cached data is availble.

Creating and Signing a Bitcoin Transaction

The entire goal of this section is to create and sign a Bitcoin transaction. To do that, we are going to spend our 0.001 BTC we received from the testnet faucet. For this, we are going to create a second address. We will be sending this address 1/4 of our bitcoin, or 0.00025 BTC (25,000 SAT).

Our task here is to to populate the ScriptSig of the input we want to spend. In order to do that, we have to create the transaction, sign it, and then overwrite the input with this signature.

Let’s first cover the creation of a transaction.

As stated already, each input in a transaction points back to the output of a previous transaction (UTXO). Because of this, we need to use the UTXO’s ScriptPubKey for the input’s initial ScriptSig.

And if you think about it, this makes intuitive sense. Recall that a UTXO’s ScriptPubkey is the “locking script”. The entire point of the signing process is to generate a ScriptSig that satisfies the conditions of this ScriptPubkey so that we can consequently “unlock” the bitcoins.

This whole process is basically like declaring “I am the owner of the unspent bitcoins in this previous transaction, and I can prove that with this Signature generated from my secret key” (without giving away your secret key, of course).

Anyway, to do this, we need data about the previous transaction. To do this, we will use our API to get the most recent transaction from the sender’s wallet. We will then grab the transaction id, and do a second API call to get the encoded contents of that transaction. We use our newly-built Tx class to decode this.

Awesome, so far everything looks good.

You can see our 0.001 BTC (100,000 SAT) in the outputs. This is the bitcoin we want to claim with our digital signature.

You can also see our ScriptPubKey, which contains the hash160 of our public key in the middle of it.

Script([118, 169, '32cf8a0b81f63473be3f88efe281f4b1f13c857e', 136, 172])

Now that we have our two wallets and our sender’s previous transaction, we can construct our transaction.

To do this we create two lists: inputs and outputs. Each element in both of these lists are dictionaries with all the data required to properly create a Bitcoin transaction. The code is below.

Notice how we specify our “change” amount to be less than the total UTXO amount. This is becuase we need to leave a “tip” for the miner (via the fee). So here, we are specifying a 5000 SAT fee.

Moreover, take note on how our output’s ScriptPubKey is generated. If you recall, we create a wallet address by taking the hash160 of our public key, and encoding it using Base-58. Because of this, we only need to Base-58 decode our wallet address to retrieve the Public Key hash.

And lastly, remember that our input’s initial ScriptSig (“initial” because we will soon be replacing this with our signature) is the ScriptPubkey of our UTXO that it is pointing at.

Once we define our transaction, we then want to sign it to prove to the entire network that our sender is indeed the owner of the UTXO we are trying to spend. To do this, we want to create a “message”. The message is just the encoded version of the transaction, pointing to the specific input UTXO we want to spend (in our case there is only 1 input, so we just use index 0).

Once our message is created, we want to hash it and create our (r, s) signature tuple.

We then encode the signature using DER Format, and tack on a SIGHASH_ALL marker (denoted by the number 1). This signature is then combined with the encoded (not hashed!) version of the public key to create our ScriptSig.

Finally, once we have our ScriptSig, we want to overwrite the input with it.

And that’s it!

We have just created our very first Bitcoin transaction. Pretty neat, huh?

To broadcast a transaction, we encode the entire thing (which will be 225 bytes in our case), and take the hexadecimal value. This hex will contain the contents of our entire signed transaction. The transaction ID is calculated by taking the little-endian interpretation of the hash256 of the encoded contents.

Using our API wrapper, let’s now broadcast this transaction out to the blockchain. Note that the API will return the transaction ID if all goes smoothly.

Transaction Verification

Now that our transaction has been sucessfully broadcasted, it needs to be validated.

Miners validate transactions in order to obtain bitcoins via fees and block rewards (to be discussed next).

The process of validating a transaction is somewhat simple. There are really only a few things that need to be checked:

1. The inputs are unspent - meaning they're in the UTXO set. This prevents double-spending.
2. The sum of the inputs is ≥ the sum of the outputs. This ensures no new bitcoins are created.
3. The ScriptSig unlocks the ScriptPubKey

To code this up, we will need a couple things. First, we will need to decode our signature (DER Format) and public key (SEC Format). Remember, these two values are what make up the ScriptSig. To do that, we’ll need a helper function to perform a modular square root, as when the public key is compressed, we will need to solve for y in Bitcoin’s elliptic curve equation y² = x³ + 7.

Now that we have the ability to create, broadcast, and verify Bitcoin transactions, we are ready for what I call the “buzzword portion of Bitcoin”. This includes all the things you’ve heard about — blocks, blockchain, mining, nodes, and of course, Proof-of-Work.

Blocks and the Blockchain

As stated before, blocks are just batches of transactions. These transactions settle roughly every 10 minutes, based on Block difficulty (which we will touch on shortly).

Coinbase Transactions

The first transaction in a block is called the Coinbase Transaction (which has absolutely nothing to do with the company of the same name). What’s special about a coinbase transaction is that it’s the only transaction capable of producing new bitcoins. These bitcoins are called the “block reward”, and go to whoever mines the block.

The block reward is halved every 210,000 blocks (about every 4 years) in an event called the “halving”. When Bitcoin was first implemented, the block reward was 50 BTC. As of writing, the block reward is 6.25 bitcoins, which is currently worth over $250,000!

“The halving” was designed by Satoshi to limit inflation. In fact, there is even a cap on the number of bitcoins that can ever be created — which is 21,000,000 BTC. This is what gives Bitcoin its scarce property.

At its current pace, the block reward is scheduled to hit zero around the year 2140. However, over 90% of bitcoins have already been mined. This means that over time, miners will have to rely more and more on fees for profitability, rather than the block reward.

Anyway, back to Coinbase Transactions. These are pretty much the same as normal transactions, with a few key exceptions:

- They must have only one input.
- The input must have a previous transaction of 32 bytes of 00.
- The input must have a previous index of ffffffff.
- The ScriptSig is set by whoever mined the transaction (must be between 2 and 100 bytes)

Because the ScriptSig is set by the miner, there were initially some issues that were fixed in a soft-fork called BIP0034, which implemented “height” to the block. Height is essentially the block number. More specifically, it is a little-endian integer and represents the number of blocks since the Genesis Block.

Genesis Block

The Genesis Block is the very first block to exist in Bitcoin. It was mined by Satoshi himself and even contained a readable message in the ScriptSig, which can be seen in the code below:

Kind of ominous, I know. What I’ve realized — which I’ll get into later — is that it’s the little things like this that make Bitcoin as interesting as it is.

One thing this message proves is that the Genesis Block was created on or after January 3rd, 2009. Futhermore, although no one actually knows who Satoshi Nakamoto is (or was), many point to this message as proof he/she was from the UK.

What’s interesting about the Genesis Block is that it was hard-coded into bitcoin’s source code. This is because there is no previous block that the Genesis Block points to. The block reward was sent to wallet address 1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa — but actually cannot be spent due to a quirk in the source code whereby the transaction isn’t in the global transactional database. No one knows if this “quirk” was intentional or simply a bug in the code. Furthermore, many believe this address to be owned by Satoshi, however like many things in Bitcoin, this is also unknown.

You can read more about the Genesis Block here:

Blocks

Anyway, let’s get back to the code. As you’ll see, a block acts pretty similar code-wise to a transaction.

There are 6 main things we need to construct a block.

1. Version:
- The version is simply which version of Bitcoin was used
2. Previous Block ID
- This is the block id. Like a transaction the id is the hash256 of the encoded block contents, little-endian.
- Every block points to a previous block, hence why they call it a "blockchain".
3. Merkle Root:
- This is a single hash of all the block's transactions hashes (so a hash of hashes).
4. Timestamp:
- Unix timestamp
- Used for calculating new bits/target/difficulty every 2,016 (technically 2015, as we'll see) blocks
5. bits
- Encodes the proof of work necessary in this block
6. nonce
- stands for "number used only once"
- Miners use this (and change it) to solve proof-of-work

Like transactions, every block needs to point to the previous block. As you may have guessed, this is why people call it a “blockchain”. For this reason, we need to specify the previous Block ID as a parameter. One should note here that all blocks ultimately point back to the Genesis Block.

Encoding a block is very similar to encoding a transaction. First the version is encoded as a 4 byte intetger. Then both the previous block and the merkle root are encoded as 32 bytes little endian. Both bits and nonce are 4 bytes total. Adding all these up gives us 80 bytes.

The encoded block is what is referred to as a Block Header, which is extremely important for mining.

For demonstration purposes, let’s test this on Bitcoin’s Genesis Block.

The output can be seen below:

And this brings us to maybe the biggest buzzword in all of crypto: Proof-of-Work.

Proof-of-Work, Mining, and Block Difficulty

At face value, Proof-of-Work (PoW) is actually a simple concept: produce a number below a target number.

To produce this number, miners use the block header (which, again, is just the encoded block). Specifically, they use the bits field — which is actually two numbers (a coefficient and an exponent). To produce the target number, a simple calculation is done: coef * 256**(exp-3). As a result, the target is a 256-bit number.

A valid Proof-of-Work is the hash of the block header that produces a little-endian integer that is below the target number. And producing this valid Proof-of-Work is what people mean when they say “mining”.

This target number is extremely hard to produce. In fact, to do so the entire bitcoin network must calculate 3.8 × 10²¹ hashes! Apparently, the best GPU card in the world would take around 50,000 years to perform that many calculations.

The “hardness” sorrounding solving PoW is referred to as Block Difficulty. It is a standard calculation used to help quantify how hard it is to find that target number.

- (0xffff * 256**(0x1d - 3)) / target

For context, the difficulty of the Genesis Block was 1.

Every 2,016 blocks is what’s referred to as a Difficulty Adjustment Period. At the end of this period, blocks either become harder or easier to mine. However, because of how the targets are calculated, the time to create a block will always average out to be around 10 minutes. This means that if half the computing power suddenly left the Bitcoin network, the block creation time will still regress towards 10 minutes.

time_differential = (2016th block timestamp) – (1st block timestamp)
new_target = previous_target * time_differential / (2 weeks)

What’s pretty funny about this whole “every 2016 blocks” thing is that Satoshi actually had an error in his code, whereby 2015 blocks are used instead. Fixing this bug would require a hard-fork — and we all know how much die-hard bitcoiner’s love the idea of hard-forks. Alas, the bug still remains.

So let’s sum all of this up. Proof-of-Work is a calculated number that is very hard to find. Miners attempt to hit this target number by using the block header. Because the sha256 hashing algorithm produces a uniform distribution, the block header hash is a randomized number. And because the target is a relatively small number, the odds of finding one of these hashes in a random process is miniscule. In fact, on average PoW requires about 10²² bits to be processed.

To reach this target number, miners alter the nonce field in the block header — which when hashed via sha256, produces a new number to compare to the target. To reset the nonce field, miners can change the coinbase transaction, which ultimately changes the Merkle Root (which we’ll discuss in a second) and clears nonce.

In fact, this nonce field has to be reset often. Take, for example, the Bitcoin AntMiner S9 — which can calculate 12 terahashes per second! That computing power would take 0.0003 seconds to clear the nonce field.

So Proof-of-Work is just a gigantic, iterative, brute-force guessing process.

1. Use bits of the block header to calculate our number, and say a prayer it's below the target
2. It's definitely not
3. Change the nonce field to produce a new block header hash, and go back to step 1

If you go back to blockstream.info/testnet and type in our most recent transaction, you will see the ID of the block it was included in. We can grab this ID, use the API to return the block header, and decode it using our newly created Block object. Let’s do this now:

Next, we can utilize the API to retrieve the hashes of all transactions in this block. Note that these are just the Transaction IDs — which, if you recall, are constructed using hash256.

Proof-of-Inclusion and Merkle Roots

So what is this merkle root parameter anyway?

Well, for starters, a Merkle Tree is a way to structure data. The Merkle Root is the result of constructing such a structure, and for our purposes is just a 32 byte hash of transaction hashes (the TX IDs). The hashes are referred to as leaves, and the entire structure is the Tree. Given a list of hashes [A B C D], it is constructed like this:

[A B C D] -> [hash(A,B), hash(C, D)] -> if more than 1 hash in our list, repeat process

So given a list of hashes, a merkle tree is constructed using the following iterative process:

1. Take all transactions, and hash each one
2. If there is an odd number of hashes, we duplicate the last one
3. Pair up the hashes into groups of 2
4. Hash these pairs
5. Loop from step 3 until there is only a single hash left (the merkle root)

The merkle root is the single hash that results from this process. And because the merkle root is a hash of all combined transactions within a block, it is said to provide “Proof-of-Inclusion”. What this means is that it’s proving a specific transaction is in a certain block (~efficiently ~).

With that, let’s briefly touch on Bitcoin nodes. A node is a computer that runs Bitcoin software and stores the entire blockchain. Nodes relay block information to other computers on the Bitcoin network in order to keep the blockchain up-to-date. One way they do this is by using a command called “merkleblock”, which transmits transaction data via a Merkle Tree.

The contents of the merkleblock command is exactly the same as the block header, with an additional 4 fields appended to it. These 4 fields make up “Proof-of-Inclusion”:

1. Number of Transactions -- how many leaves the merkle tree has (4 bytes)
2. Number of Hashes -- (varint)
3. Hashes -- All the hashes (32 bytes * num hashes)
4. Flag bits -- Gives info about where the hashes go within the merkle tree

Reference: https://en.wikipedia.org/wiki/Merkle_tree

Below, we will use our block’s transaction hashes to naively construct the Merkle Tree. To reiterate, the Merkle Root is the final product of this construction process.

Awesome. With that, we have everything we need to define an object to represent a Merkle Tree. For brevity, we won’t be getting into how to traverse this tree; however, the below should at least show you the general concept of how the Merkle Root is found.

Summary

So to recap, Bitcoin operates within a finite field with prime P, and exists along an elliptic curve y² = x³ + 7. The parameters of Bitcoin’s elliptic curve are defined in secp256k1. One of these parameters, G, is the Generator and is the point from which public keys (and ultimately wallets) are created.

To construct a Bitcoin transaction, we need a previous unspent output (UTXO) of our sender’s wallet. The ScriptPubKey of this UTXO locks the bitcoin and provides instructions of how to unlock it. Once we have this, we encode the transaction to create a “message”, which we then sign with our ECDSA signature. We encode this in DER format, and combine it with our SEC encoded public key to create a ScriptSig (our “unlocking” script). After we use this ScriptSig to overwrite the input, we encode the entire transaction, convert it to a hexadecimal, and broadcast it.

These transactions are then included in a block, which is just an ordered batch of transactions. All blocks point back to a previous block, which is why it’s called a blockchain. The hashed transaction IDs are also included in the block, and are used to construct the merkle root (which is needed for Proof-of-Inclusion).

Miners then want to verify these transactions and process the block to obtain the block reward via the coinbase transaction. To do so, they need to provide sufficient Proof-of-Work, which means using the block header (aka the encoded block) to produce a number below a target. And because of the nature of the sha256 hashing algorithm, this is extremely hard to do.

As a result, Bitcoin automatically adjusts the mining difficulty every 2,016 (but actually 2,015 due to Satoshi’s bug) blocks in a way such that the time to settle transactions will always average out to around 10 minutes.

Some Concluding Thoughts

Perhaps the most interesting thing about Bitcoin isn’t the code itself, but rather the “lore” around it. And the fact that Satoshi Nakamoto has managed to remain faceless since Bitcoin’s inception has no doubt fueled this lore.

I believe this to be both good and bad for Bitcoin — the reasons for which stem from the same fact. Bitcoin is obviously the original cryptocurrency, and has developed a massive following. This is good in the sense that it gives Bitcoin a higher chance of widespread adoption, as it is a “fan-favorite” so to speak. However, it might also play out to be detrimental, as strong feelings in the Bitcoin community have caused a lot of strife over the years.

It isn’t a stretch to say that some parts of the Bitcoin community have become fanatical. And while this passion has definitely helped Bitcoin remain the largest cryptocurrency, it has also impeded progress. Take, for example, the SegWit upgrade that happened in 2017. When this occurred, many in the Bitcoin community were outraged and responded by hard-forking Bitcoin to create another crypto called Bitcoin Cash — taking a significant amount of computing power out of Bitcoin.

While there are two sides to every coin, the fact is many Bitcoin proponents have historically fought against key protocol upgrades. Perhaps this is why it is still one the slowest, and in turn, one of the most energy inefficient cryptos out there.

All that being said, I find Bitcoin — and the crypto space in general — to be fascinating. There is no doubt that the software behind Bitcoin (some of which we coded today) will revolutionize many different industries in the coming years.

Just take a look at NFTs, for example. The rapid rise of these “non-fungible tokens” is a testament to the power of the decentralized technology that is core to what Bitcoin is.

And to think that this world-changing technology was created by someone/something that has managed to remain anonymous this whole time…

Anyway, if you’d like to learn more about all of this, I recommend any of the below resources.

1. https://www.amazon.com/Programming-Bitcoin-Learn-Program-Scratch/dp/1492031496
2. http://karpathy.github.io/2021/06/21/blockchain/
3. https://en.bitcoin.it/wiki/Main_Page

I sincerely hope you got something out of this post. Thanks for reading!

--

--

Trevor McGuire
Trevor McGuire

Written by Trevor McGuire

Machine Learning Engineer. linkedin — https://www.linkedin.com/in/trevorwmcguire/ || Twitter/Instagram— @trevorwmcguire

No responses yet