Blockchain talk that skips the technology is just leek-harvesting: the composition and architecture of blockchain technology

  • The three lowest-level technologies
    • Encryption of data relationships
    • Data cannot be tampered with
    • Peer-to-peer network makes data never offline
  • Core technology concept
    • Block
    • Mining and consensus mechanism
    • Merkle Tree
  • What is suitable for blockchain?
  • Blockchain application
    • Transaction model
    • Identity authentication system
    • Smart contract
  • Conclusion

Recently I have been studying blockchain. I had heard of it before, and now I probably understand a little. My main feeling while learning was: "Come on! Can't you explain this more clearly?" Most learning materials are either chaotically structured or too shallow to explain things in simple terms. As for the people who grab a microphone and shout about what blockchain is, and about how missing it means a lifetime of regret, they are all liars.

I believe many investment institutions are watching blockchain now. This year even AI is not as hot as blockchain. So for any blockchain project, everyone should keep their eyes open: is it reliable, is it just an empty promise, or is it really feasible?

The three lowest-level technologies

When talking about blockchain, it is best to put the prices of the various coins aside first. Building projects on blockchain technology, however, is genuinely sustainable, so understanding the technology is more worthwhile than speculating on coins.

The word "blockchain" alone cannot describe the whole technology. If I had to pick a name that expresses it fully, I would call it a "Peer-to-Peer Encrypted Non-Tampered Database", that is, a peer-to-peer, encrypted database that cannot be tampered with.

It is not a database product (such as MySQL or MongoDB), nor a type of database (such as SQL or NoSQL). It is a database architecture, one level above ordinary database technology: it is concerned with how to guarantee that the data is reliable and how to keep the database service from ever going offline. So you cannot compare it with a database in the usual sense. A specific blockchain implementation can even use other databases to help store and retrieve its data.

Encryption of data relationships

In an ordinary database, relational or not, different records may or may not be related to each other. In a blockchain, every piece of data must be connected to another piece of data. Even if there is no connection in the business logic, the data always lives on the chain and cannot exist apart from it, so there is always a path from one piece of data to another. If you don't believe it, read on.

The "block" is the final form in which data relationships are presented in a blockchain. Any record, whatever information it holds, ends up in a block (or the information needed to retrieve it does). Between blocks there is a "linked list" relationship. Anyone who can program knows what a linked list is: each item holds an index key pointing to the previous item. So any two pieces of data on the blockchain can always be connected through these index keys, and no data can escape this logic.

But the word "blockchain" does not explain how this data structure differs from the structure of an ordinary database, because the linked list described above can also be built in an ordinary database if you want to.

The real value is that the blockchain applies the principles of cryptography and existing cryptographic techniques to these index relationships, layer by layer, so that in the stored data the index keys are not plainly visible but have to be obtained through calculation. For example, when a block stores a batch of transactions, it stores them in a merkle tree. Each parent node is the double hash of its two child nodes, and the merkle algorithm ensures that the transaction information cannot be tampered with unnoticed.

We do not need to know the specific algorithms yet. What we need to understand is that cryptography is everywhere in the blockchain, and that is one of its defining features.

Data cannot be tampered with

Everyone says data on the blockchain cannot be tampered with. In fact the data can be changed; it is just that the change can be detected, and every block after the block containing the modified data becomes invalid. The blockchain network also has a synchronization rule: all nodes always keep to the longest chain. After you modify your copy, the modification is overwritten as soon as you synchronize with the network. This is one aspect of being tamper-proof.

What is more interesting is that the blockchain uses cryptographic verification so that writing data must pass strict checks, and these checks are almost impossible to forge, so tampering is very hard. Cryptography alone does not make tampering impossible; it is the combination of cryptography and economic incentives that does. This has a slightly metaphysical flavor: something implemented purely in technology still relies on theory to hold together. But that is how it is, and it is the legendary mining.

Mining is the process by which a miner strives to create a block. Striking ore means the miner has earned the right to create a new block. What counts as striking ore? The miner repeats a hash calculation with a counter that starts at 0 and increases, until it finds a hash value that satisfies the difficulty requirement. The rule that decides who gets the bookkeeping right, and in what way, is called the "consensus mechanism". There are many consensus mechanisms. Which one is best for a blockchain depends entirely on what the blockchain is for, and the choice is made with economic principles in mind.

Mining is not the whole story. Take Bitcoin as an example. The miner must also package the transactions that have been broadcast to the network into the block. Is a transaction legitimate? Did its initiator forge it? To establish legitimacy, the source of the transaction's funds must be found in an earlier block that already exists. How is that verified? The earlier block stores the merkle root hash of its transactions. Once you find the block containing the source transaction, you run the merkle check again to determine whether the transaction is genuine. The merkle root hash is obtained by repeatedly hashing all the transactions in the block, so if the transaction is fake, the merkle root hash cannot be reproduced. Here again cryptography helps make the data reliable.

Beyond these cases, cryptography is everywhere in the blockchain. These rules and algorithms make the whole blockchain follow one discipline and make the cost of tampering extremely high, so participants have no interest in tampering with the data and may even be afraid to. This is the metaphysical part.

Peer-to-peer network makes data never offline

Suppose the blockchain had no p2p network. It would just be the cryptographic system and the chain structure described above, running on a particular server (or group of servers) in our current centralized way, which already looks like fun. But the inventor wanted to play bigger. The cryptographic system makes the data tamper-proof, but if I simply drop an atomic bomb on your computer room, being tamper-proof no longer matters. It is all over.

To keep the computer room from being bombed, the inventor gave the blockchain a peer-to-peer network (clients communicate directly with each other without going through a particular server). Simply put, in this peer-to-peer network everyone's computer keeps exactly the same data structure (in fact, the complete set of "blocks" and the "chain"). The computers connect to each other over the network to synchronize. When a miner creates a new block, everyone else synchronizes that block into the copy they hold. So no matter which node on the network is bombed, the other nodes are still alive, and newcomers can synchronize the data from those nodes to their own computers. If you want to make the blockchain data disappear, blow up the earth.

This design of joining a peer-to-peer network is called "decentralization". As long as one node on the network is alive, the blockchain data will not disappear.

What makes politicians even more afraid is that the stored data can be viewed freely by the users of any node. It is completely public. Once a node user has synchronized the data, it is their data and they can use it however they like. Imagine Taobao announcing one day that it will put its data on a blockchain... I can't bear to look...

Core technology concept

What I explained above is the foundation that makes a blockchain a blockchain. In this section, now that there is a blockchain in front of us, I want to analyze the specific technical points and structures it uses.


Block

Blocks have already been mentioned. So what exactly is a block? A block is the main data storage structure of the blockchain. It has two parts: a block header and a block body. The block header is the highlight.

Schematic diagram of a block structure in the blockchain (Note: version information is omitted)

A block is a special data structure. Its header contains some fixed fields: version (the client version, which changes each time the client software is upgraded); block height (which block this is in the chain, counted from the first); block hash (the hash of this block, obtained by mining); the block hash of the previous block (this field is the key point and the basis of the linked-list structure); timestamp (the block creation time); difficulty and nonce (both related to mining and discussed in detail in the mining section); and merkle root (the hash at the root of the merkle tree over the block body; the merkle tree is introduced briefly later and in detail in other articles). If you build your own blockchain, you can add other fields to the block header as well.
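The header fields listed above can be sketched as a small data structure. This is only an illustration: the field names, the JSON serialization, and the sample values are assumptions made here, and real Bitcoin headers use a fixed binary layout instead.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass
class BlockHeader:
    version: int       # client/protocol version
    height: int        # position of the block in the chain
    prev_hash: str     # hash of the previous block (the "link" of the chain)
    timestamp: int     # block creation time (Unix seconds)
    difficulty: int    # mining difficulty
    nonce: int         # value found by mining
    merkle_root: str   # fingerprint of the block body

    def block_hash(self) -> str:
        # Serialize the fields deterministically, then apply double SHA-256.
        # (JSON is an illustrative choice, not Bitcoin's wire format.)
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(hashlib.sha256(payload).digest()).hexdigest()


# Hypothetical sample values, chosen only to make the example run.
header = BlockHeader(1, 0, "0" * 64, 1231006505, 1, 0, "ab" * 32)
print(header.block_hash())
```

Changing any single field, even the nonce, gives a completely different block hash, which is what the mining and tamper-detection discussions rely on.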

The block body is where the actual content is stored. In the Bitcoin blockchain, the block body holds the transactions from a period of time. In other blockchains it may hold something other than transactions. In short, the block body stores the business data of whatever the blockchain is used for.

In some blockchain implementations a block can also have a block tail, which holds information added after the block header and block body have been created, such as the length and capacity of the block.

That is a block. The previousHash field in a block header stores the hash of the previous block. So from one block you know which block came before it, from that block you know the one before it, and so on back to the first block of the whole chain. This is the blockchain.

Schematic diagram of how blocks constitute a chain

As in the picture above, each block always points to the previous block. Once a block has been generated and another block points to it, it cannot be modified, because a modification would require every later hash to be recalculated. The property of a hash algorithm is that you only get a given hash by running the algorithm on the original content; if the content differs, the hash differs. So if a block in the middle of the chain is modified, its new hash is not the one the following block points to, and the chain is broken. If you bring a broken chain to the network, either it is not recognized and other nodes do not treat you as a legitimate node, or you have to synchronize again and copy the longest chain from the network over your local chain.
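A minimal sketch of why editing a middle block breaks the chain. The toy block layout (a dict with `prev_hash`, `body`, and `hash`) and the function names are assumptions for illustration, not Bitcoin's real format.

```python
import hashlib


def block_hash(prev_hash: str, body: str) -> str:
    """Hash a toy block: the previous hash concatenated with the body."""
    return hashlib.sha256((prev_hash + body).encode()).hexdigest()


def build_chain(bodies):
    """Build a list of blocks, each storing the hash of its predecessor."""
    chain, prev = [], "0" * 64  # placeholder hash for the first block
    for body in bodies:
        block = {"prev_hash": prev, "body": body}
        block["hash"] = block_hash(prev, body)
        chain.append(block)
        prev = block["hash"]
    return chain


def first_broken_index(chain):
    """Return the index of the first inconsistent block, or None if all links hold."""
    prev = "0" * 64
    for i, block in enumerate(chain):
        if block["prev_hash"] != prev or block["hash"] != block_hash(prev, block["body"]):
            return i
        prev = block["hash"]
    return None


chain = build_chain(["a pays b", "b pays c", "c pays d"])
print(first_broken_index(chain))    # None: the chain is intact

chain[1]["body"] = "b pays mallory"  # tamper with the middle block
print(first_broken_index(chain))    # 1: block 1 no longer matches its stored hash
```

To hide the change, the attacker would have to recompute the hash of block 1 and of every block after it, which is exactly the work that mining makes expensive.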

But you may have two questions: 1. The blockHash is not simply a hash of the content, so how do we ensure the information in the block has not been modified? If I leave the blockHash alone and change only the content, can't I hide the change? 2. If two blocks point to the same block at the same time, and their block bodies are different, what then?

The first question is answered by the combination of mining and the merkle tree. The second situation is actually very common. Some miner is certain to succeed eventually; what matters is which miner digs it out first. Normally, when a miner strikes ore it broadcasts the result to the whole network, and the miners who have not yet succeeded stop. But because of network delay and similar factors, several miners may succeed within a short time of each other, each creating a new block and broadcasting it to the network. This situation is called "forking."

When a fork occurs, there are two possible outcomes, and both happen naturally, without manual intervention. Miners who later dig new blocks decide for themselves which branch's last block to use as their previous block. If, within a fairly small number of blocks, one chain becomes clearly longer than the other, the long chain is kept and the short one is discarded. The miners who dug the short chain have worked for nothing. When network nodes synchronize they choose the longest chain, and miners who mine later also build on the longest chain.
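The "keep the longest chain" rule described above can be sketched as follows. The function name and the list-of-block-ids representation are assumptions for illustration. Note also that Bitcoin nodes compare the accumulated proof-of-work of each branch rather than the raw block count; counting blocks is the simplification used in this article.

```python
def choose_chain(local_chain, candidate_chains):
    """Keep the local chain unless some candidate is strictly longer."""
    best = local_chain
    for candidate in candidate_chains:
        if len(candidate) > len(best):
            best = candidate
    return best


# Two branches share blocks "g" and "a", then diverge.
branch_x = ["g", "a", "x1"]
branch_y = ["g", "a", "y1", "y2"]
print(choose_chain(branch_x, [branch_y]))  # ['g', 'a', 'y1', 'y2']
```

A node holding `branch_x` switches to `branch_y` on synchronization, and the work spent on `x1` is wasted, as the text says.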

But there is another situation: the miners on the short chain refuse to give up, or the two chains cannot settle a winner in a short time, and in the end each chain has many blocks following it. (In practice this usually happens when the two groups of miners run software with incompatible rules.) One group of people's transfers cannot have two ledgers; if the two ledgers turn out to differ, there will be trouble. So in this situation the miners decide to split the household, and one blockchain becomes two. One chain copies all the earlier blocks and becomes an independent chain. From then on the two chains have nothing to do with each other, even though their earlier blocks are identical; each goes its own way. This is called a "hard fork," and it is how Bitcoin Cash was born. The new chain inherits the earlier blocks, but its later blocks are decided entirely by the miners who mine that chain. The advantage of a hard fork for an existing user is that, suddenly, one asset has become two.

As for the block body, it really is designed around the business needs of the blockchain application. Bitcoin, for example, is designed around a transaction model. All the transaction records in the block bodies, put together, form one long ledger in which the origin and destination of every coin is written down clearly. If the Red Cross had used a blockchain for donations from the beginning, there would not have been such an uproar. But other blockchain applications do not necessarily use a transaction model, such as a blockchain for recording medical information or one for recording users' locations. So the discussion of blockchain technology itself stops here; what follows is technology specific to Bitcoin.

Mining and consensus mechanism

I have mentioned mining many times already. Simply put, mining is a crowd of miners competing for the right to create a new block. In the world of cryptocurrency, if you win this right you add a transaction paying yourself at the very top of the block. The money in this transaction comes out of thin air, so it is called the "mining reward", and the amount is quite large, which is why miners fight so hard for the accounting right. But in non-currency blockchain applications, where there is no such reward, how do we give miners an incentive to mine? This is one of the murky topics in the blockchain field, and there is no answer yet.

So what is the technical algorithmic process of mining?

The mining program specifies how a hash must be obtained, and this specification is the consensus mechanism. All miners run the algorithm according to the consensus mechanism to see who gets a qualifying result first, and it is easy for anyone to verify that the result meets the conditions (that is, that it satisfies the consensus mechanism). Different blockchains have different consensus mechanisms. The better-known ones are PoW and PoS, plus others derived from these two.

To explain the mining process clearly, we demonstrate only the PoW that Bitcoin follows.

SHA256(SHA256(version + prevHash + merkleRoot + time + currentDifficulty + nonce )) <TARGET

The miner evaluates the formula above; as soon as it is satisfied (the result is true), the ore has been struck. Now let me explain the formula.

  1. The mining machine performs a double sha256 calculation. The inputs are in fact all the fields of the block header, but because the block has not been created yet, the fields are held temporarily. If the miner wins the accounting right, they are written into the block.
  2. version is the client software version of the mining machine. A version upgrade may change some parameters, for example expanding the block size from 1M to 2M, but the mining algorithm itself stays the same.
  3. prevHash is the hash value of the previous block.
  4. merkleRoot is the root hash computed by the merkle algorithm over the transactions the miner currently holds in memory. The merkle tree is discussed below.
  5. time is the current timestamp.
  6. currentDifficulty is the current difficulty. It is given by the formula currentDifficulty = diff_1_target / TARGET, where diff_1_target can be regarded as a constant; in the Bitcoin client it is written 0x1d00ffff (a compact encoding of a 256-bit number). In practice it could change, but however it changes it stays essentially the same value. TARGET is discussed below.
  7. nonce is the number the miner varies. The double sha256 is first computed with nonce set to 0; if the result does not satisfy the formula, nonce is increased by 1 and the calculation is repeated, until a nonce is found that does satisfy it.
  8. TARGET is the threshold the hash must fall below. It is tuned so that a block is produced roughly every 10 minutes: the smaller TARGET is, the larger currentDifficulty becomes. TARGET is recalculated every 2016 blocks (about 2 weeks at 10 minutes per block). If the last 2016 blocks arrived faster or slower than one every 10 minutes, TARGET and therefore currentDifficulty are adjusted so that the next 2016 blocks again take about 10 minutes each.
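The relationship between diff_1_target, TARGET, and currentDifficulty can be worked through in a few lines. This is a sketch under stated assumptions: the function names are invented here, 0x1d00ffff is treated as Bitcoin's compact "bits" encoding of diff_1_target, and the retarget rule is the basic proportional one (Bitcoin additionally limits how far a single adjustment can move, which is omitted).

```python
def bits_to_target(bits: int) -> int:
    """Expand the compact 'bits' encoding into the full 256-bit target."""
    exponent = bits >> 24            # top byte: size of the number in bytes
    mantissa = bits & 0x007FFFFF     # lower three bytes: leading digits
    return mantissa << (8 * (exponent - 3))


DIFF_1_TARGET = bits_to_target(0x1D00FFFF)


def difficulty(target: int) -> float:
    """currentDifficulty = diff_1_target / TARGET."""
    return DIFF_1_TARGET / target


def retarget(old_target: int, actual_seconds: int) -> int:
    """Scale the target by how long the last 2016 blocks actually took."""
    expected_seconds = 2016 * 10 * 60  # 2016 blocks at 10 minutes each
    return old_target * actual_seconds // expected_seconds


print(difficulty(DIFF_1_TARGET))                 # 1.0
one_week = 7 * 24 * 3600                         # blocks arrived twice as fast
print(difficulty(retarget(DIFF_1_TARGET, one_week)))  # 2.0
```

When the 2016 blocks take only one week instead of two, the target halves and the difficulty doubles, pulling the block interval back toward 10 minutes.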


Besides the nonce, the merkle root also feeds into this hash. How the merkle root hash is obtained is described next.
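The nonce search in the PoW formula can be sketched as a simple loop. This is a toy under stated assumptions: the header is a placeholder string rather than Bitcoin's binary header, the hash is read as a big-endian integer for simplicity, and the target is made deliberately easy so the loop finishes instantly.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def mine(header_fields: str, target: int, max_nonce: int = 2**32):
    """Try nonces 0, 1, 2, ... until the double hash falls below the target."""
    for nonce in range(max_nonce):
        digest = double_sha256(f"{header_fields}{nonce}".encode())
        if int.from_bytes(digest, "big") < target:
            return nonce, digest.hex()
    return None  # nonce space exhausted without success


# A deliberately easy target: about 1 in 4096 hashes succeeds (top 12 bits zero).
easy_target = 1 << (256 - 12)
result = mine("version|prevHash|merkleRoot|time|difficulty|", easy_target)
print(result)
```

Finding the nonce takes thousands of attempts here (and vastly more on the real network), while checking a claimed nonce takes a single double hash. That asymmetry is what makes the result "easily verified".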


Merkle Tree

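A Merkle tree is a hash tree: its leaves are the hashes of the individual transactions in the block body, and each parent is computed from its two children.

The Merkle root is obtained as follows. Hash each of the n transactions to get n hashes. Pair adjacent hashes and combine each pair with a double hash, which leaves about n/2 hashes:

parentHash = sha256(sha256( hash1 + hash2 ))

Repeat the pairing on the new hashes, level by level, until a single hash remains. That hash is the merkle root.

Because every level is built from the hashes below it, changing any transaction changes its hash and every hash above it.

The recomputed merkle root then no longer matches the merkle root stored in the block header.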
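The pairing rule parentHash = sha256(sha256(hash1 + hash2)) can be sketched as below. The function names and the sample transactions are assumptions for illustration; duplicating the last hash when a level has an odd count follows Bitcoin's convention, and byte-order details of the real protocol are ignored.

```python
import hashlib


def double_sha256(data: bytes) -> bytes:
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(transactions):
    """Fold a non-empty list of transactions (bytes) into one root hash."""
    if not transactions:
        raise ValueError("need at least one transaction")
    level = [double_sha256(tx) for tx in transactions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # odd count: pair the last hash with itself
        level = [double_sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0].hex()


txs = [b"a pays b 1", b"b pays c 2", b"c pays d 3"]
print(merkle_root(txs))

forged = [b"a pays b 1", b"b pays c 200", b"c pays d 3"]
print(merkle_root(forged) == merkle_root(txs))  # False
```

Only the 32-byte root needs to be stored in the block header, yet any change to any transaction shows up as a different root.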

What is suitable for blockchain?

Blockchain data has two main characteristics: 1. It is open and transparent, and any node has full rights to view the data; 2. It is difficult to forge or tamper with. So blockchain is well suited to two kinds of scenarios: 1. Evidence; 2. Supervision. If information on a blockchain is recognized by law, an infringing party cannot dispute evidence recorded on it. And imagine taxes being moved entirely onto a blockchain: where every citizen's every tax payment finally ends up would be perfectly clear. This may be the point that scares some people.

But blockchain has two major disadvantages: 1. Mining is required and there is a risk of forks, which means a piece of data has to wait a long time on the chain before it becomes credible, tamper-proof data. 2. The data is split into blocks and stored separately, which makes queries very troublesome and hurts efficiency badly. So blockchain is not suitable for scenarios that demand immediacy, whether in information exchange (such as chat) or in queries (such as search engines).

Blockchain is not omnipotent. Some services are clearly more efficient and cheaper with ordinary technology, yet they insist on building a blockchain just to ride the trend; whether that works depends only on whether the leeks keep growing. Another worry is that information on a blockchain is open, transparent, and cannot be deleted: will that do great harm to personal privacy? Just imagine the technician who repaired Guanxi's computer showing off the photos he found over a blockchain network... the harm to those involved... would not fade even after they die...

Blockchain application

With the hype, blockchain applications have appeared one after another. But so far the more mature ones fall into just three models: 1. Bitcoin; 2. Ethereum smart contracts; 3. Bitshares. As for the others, I will say no more; readers can judge for themselves.

What is the architecture of a blockchain application? On top of the blockchain itself, what other technologies are needed to support it?

Blockchain application system architecture diagram (Shao Qifeng et al. "Blockchain Technology: Architecture and Progress")

Blockchain applications such as Bitcoin and Ethereum are fairly tightly coupled to the blockchain itself. The blockchain, as the database, cannot be a relatively independent module of the application, which is somewhat different from the B/S architecture popular today. In a blockchain application, the blockchain is taken apart and integrated with the other layers of the application to deliver the application's overall function.

Let's take Bitcoin as an example and see what a blockchain application has beyond the bare blockchain itself. Bitcoin is a blockchain-based ledger system. In addition to the blockchain, it includes: 1. Transaction model; 2. Identity authentication system (similar to PKI); 3. Smart contract.

Transaction model

The transaction model is the form of the transaction records stored in the blocks. The reason the origin and destination of every coin in Bitcoin is clear is the transaction model. In real life, a bank account only tells you how much money the account holds now, how much was spent in the past, how much was earned, and how much is still owed. It does not tell you "this particular expense was paid out of that particular income." Bitcoin must tell you exactly that. A transaction has two parts: "inputs" and "outputs". For example, if you want to transfer 10 BTC, your account must have one or more "inputs" that add up to 10 BTC or more, and the output says whom the 10 BTC goes to. But what if all the "inputs" add up to 10.5 BTC? It is like paying for a 70-yuan item with a 100-yuan note: you need "change". So sometimes one of the "outputs" goes back to yourself, and that is the "change".
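The 10.5 BTC example above, with its change output, can be sketched like this. The `Output` class and `build_transaction` function are hypothetical names for illustration, amounts are kept as integer satoshis (1 BTC = 100,000,000 satoshis) to avoid floating-point error, and transaction fees and signatures are left out.

```python
from dataclasses import dataclass

SATOSHI_PER_BTC = 100_000_000


@dataclass(frozen=True)
class Output:
    owner: str
    amount: int  # in satoshis


def build_transaction(inputs, sender, recipient, amount):
    """Spend whole inputs; send `amount` to the recipient and the rest back as change."""
    if any(inp.owner != sender for inp in inputs):
        raise ValueError("every input must be an output owned by the sender")
    total = sum(inp.amount for inp in inputs)
    if total < amount:
        raise ValueError("inputs do not cover the amount")
    outputs = [Output(recipient, amount)]
    if total > amount:
        outputs.append(Output(sender, total - amount))  # the "change" output
    return {"inputs": list(inputs), "outputs": outputs}


# Alice holds two earlier outputs worth 6 BTC and 4.5 BTC (10.5 BTC in total).
alice_inputs = [Output("alice", 6 * SATOSHI_PER_BTC), Output("alice", 450_000_000)]
tx = build_transaction(alice_inputs, "alice", "bob", 10 * SATOSHI_PER_BTC)
for out in tx["outputs"]:
    print(out.owner, out.amount / SATOSHI_PER_BTC)
```

This prints one output of 10.0 for bob and one of 0.5 back to alice. Inputs are always consumed whole, which is why the change output exists at all.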

Schematic diagram of Bitcoin transaction input and output

In fact, the input of a new transaction is always an output of some earlier transaction.

The block body faithfully records these transactions together with their inputs and outputs. In addition, the merkle calculation must be performed over them so that the merkle root can be stored in the block header.

Identity authentication system

Since these are transactions, the identities of both parties are involved. At the two ends of a Bitcoin transaction are two accounts. It does not matter who the people behind them are, but cryptography is used to establish which account initiated the transaction, and the initiator must sign the transaction information.

You may have heard of asymmetric encryption and of public and private keys. The most important thing about a Bitcoin account is its private key. Once the private key is lost, you cannot prove that you own the account, you cannot sign transfers out of it, and you can no longer spend the money in it. Lose the key and you lose the coins.

So what is the signing process? How can I prove that a transaction was made by me? How can I prove that the money was transferred to me?

Key, address and wallet

Key: usually the private key that belongs to the owner of a bitcoin asset and protects that asset. In some cases the private key and the public key together are called the key. Here we use the narrow meaning, the private key.

Address: the payment address in Bitcoin. In most cases it is an encoded form of the public key (in some cases it encodes a script rather than a public key).

Wallet: a type of Bitcoin client software that acts as a container for private keys, usually implemented with ordered files or a simple database. Bitcoin wallets hold both private key and public key data, although in theory the public key does not need to be stored.

People often treat a user's public key and Bitcoin address as the same thing, but they are not. The Bitcoin address is a string much shorter than the public key, mainly for convenience of input. The public key is what is used in the asymmetric cryptography operations.

Bitcoin public key to Bitcoin address
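The last step of turning a public key into an address, the Base58Check encoding, can be sketched with the standard library alone. Two things are assumptions here: the function name, and the 20-byte `pubkey_hash`, which is a placeholder. In Bitcoin that value is RIPEMD160(SHA256(public key)); it is replaced by a truncated SHA-256 of a made-up string because RIPEMD-160 is not available in every Python build.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58check_encode(version: bytes, payload: bytes) -> str:
    """Encode version + payload + 4-byte checksum in Base58."""
    data = version + payload
    checksum = hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]
    raw = data + checksum
    number = int.from_bytes(raw, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = ALPHABET[remainder] + encoded
    # Each leading zero byte is written as the character '1'.
    leading_zeros = len(raw) - len(raw.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


# Placeholder for the 20-byte public key hash (see the note above).
pubkey_hash = hashlib.sha256(b"hypothetical public key").digest()[:20]
address = base58check_encode(b"\x00", pubkey_hash)  # version 0x00: pay-to-pubkey-hash
print(address)
```

The alphabet leaves out 0, O, I and l so that addresses are harder to mistype, and the checksum lets a wallet reject most typos before any money is sent.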

The public key and the address are public in the Bitcoin network. Only the private key is kept by the user and must never be given to anyone. When a transaction is initiated, the initiator signs it with their own private key and names the recipient's address in the output. Other people in the network can then use the initiator's public key to verify that the transaction really was initiated by that account. The recipient, in turn, proves that the money is theirs when they later spend it, by signing with their own private key. The Bitcoin client (wallet) performs the signing and verification.

Smart contract

Bitcoin itself already contains the embryonic form of a smart contract, but its scripting language is relatively weak and the contract logic it can express is not complicated. Ethereum extends the smart-contract part on this basis and greatly strengthens what smart contracts can be programmed to do.

As mentioned above, an output is used as the input of a later transaction. A Bitcoin output does not merely tell the system which address receives how much money; the output is actually a Bitcoin script that locks the money. If you want to take the money in an output and use it as an input of your own transaction, you must supply data that makes the script run successfully, typically a signature made with your private key together with your public key. Only after the script runs successfully can the money be used as an input. Combined with what we saw earlier, only the holder of the matching private key can produce a valid signature, so only the user who owns the bitcoin address named in the output can unlock the script and get the money.

In this process the "script" is the key. Beyond the simplest transfer logic above, slightly more complex programs can be written with conditional checks, so that the script succeeds only when certain conditions are met. Based on this design, Bitcoin's script system can implement functions such as multi-signature and escrow contracts, which is the prototype of smart contracts.
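To make the idea of a conditional script concrete, here is a toy stack interpreter with a "hash lock": the money can be spent only by someone who supplies the preimage of a stored hash. The interpreter and its list-based script format are assumptions made for illustration. OP_SHA256 and OP_EQUAL are real Bitcoin opcode names, but real scripts are byte strings, and ordinary payments check a signature rather than a secret.

```python
import hashlib


def run_script(script):
    """Execute a toy stack script; succeed only if the top of the stack ends up True."""
    stack = []
    for op in script:
        if isinstance(op, bytes):          # data push
            stack.append(op)
        elif op == "OP_SHA256":            # replace the top item with its hash
            stack.append(hashlib.sha256(stack.pop()).digest())
        elif op == "OP_EQUAL":             # compare the two top items
            stack.append(stack.pop() == stack.pop())
        else:
            raise ValueError(f"unknown opcode: {op}")
    return bool(stack) and stack[-1] is True


secret = b"open sesame"
# The locking script lives in the output; it states the condition for spending.
locking_script = ["OP_SHA256", hashlib.sha256(secret).digest(), "OP_EQUAL"]

# The spender prepends unlocking data; the combined script must evaluate to True.
print(run_script([secret] + locking_script))          # True
print(run_script([b"wrong guess"] + locking_script))  # False
```

Multi-signature and escrow conditions follow the same pattern: the output states a condition, and whoever can supply data satisfying it may spend the money.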


Conclusion

I have only just started studying blockchain. There must be many places I do not understand thoroughly, and some where I have misunderstood. But for those who want to understand this field, I hope you first understand the technical principles behind blockchain (you do not need to master every technical detail) and read some more mature and credible material instead of simply believing whatever you are told. (I happened to discover that others have already published a similar review article in the Chinese Journal of Computers; readers can read "Blockchain Technology: Architecture and Progress".) Once you have grasped these principles, you will find that blockchain actually has many limitations, that it has beautiful parts and unnecessary parts, and those inexplicable projects will no longer be able to trap you.

The original text is posted on my blog. If errors are found later, they will be corrected there, so please follow my blog.