Starting from practice: a microservice evangelist on choosing between Spring Cloud and Spring Boot

As the company's business grew rapidly, the challenges the platform faced went far beyond raw volume: requirements kept multiplying, the engineering team kept growing, and overall complexity rose sharply. Against this background, the platform's technical architecture evolved from a traditional monolithic application to microservices.

Evolution of the system architecture

Single application architecture (first generation architecture)

This was the platform's starting point. Traffic was small at the time, so to save costs all functionality was packaged into a single application, built on .NET + SQL Server:

Presentation layer
The outermost (topmost) layer, closest to the user. It displays data, receives user input, and provides the interactive interface. The platform used .NET Web Forms here.

Business logic layer
The business logic layer is where the system architecture expresses its core value. It focuses on formulating business rules, implementing business processes, and the other design concerns tied to business requirements; in other words, it holds the domain logic the system deals with. For this reason it is often also called the domain layer.

The business logic layer occupies a pivotal position in the architecture: it sits between the data access layer and the presentation layer and connects the two during data exchange. Because the layering is weakly coupled, dependencies point downward only; a lower layer is "ignorant" of the layers above it, so changing the design of an upper layer has no effect on the lower layers it calls.

If the layered design follows interface-oriented principles, this downward dependency should itself be a weak one. Relative to the data access layer, the business logic layer is the caller; relative to the presentation layer, it is the callee.

Data Layer
Data access layer: sometimes called the persistence layer. Its main responsibility is data access; it can target a database system, binary files, text documents, or XML documents. At this stage the platform used .NET + SQL Server.
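The downward-only dependency between the three layers can be sketched in Java. The class names here are illustrative, not the platform's actual code:

```java
import java.util.List;
import java.util.stream.Collectors;

// Data access layer: knows nothing about the layers above it.
interface UserRepository {
    List<String> findAllNames();
}

// Business logic layer: depends downward on the data layer only.
class UserService {
    private final UserRepository repository;

    UserService(UserRepository repository) {
        this.repository = repository;
    }

    // A business rule: only non-empty, trimmed names are returned.
    List<String> activeUserNames() {
        return repository.findAllNames().stream()
                .map(String::trim)
                .filter(n -> !n.isEmpty())
                .collect(Collectors.toList());
    }
}

// Presentation layer: calls the business layer, never the data layer directly.
class UserPage {
    private final UserService service;

    UserPage(UserService service) {
        this.service = service;
    }

    String render() {
        return "Users: " + String.join(", ", service.activeUserNames());
    }
}
```

Because the repository is an interface, swapping SQL Server for another store only touches the bottom layer, which is exactly the weak coupling described above.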

The first-generation architecture looks very simple, but it supported the platform's early business growth and handled visits from tens of thousands of website users. Once traffic grew at scale, however, its problems were exposed:

  • Ever-increasing maintenance cost: when a failure occurred, more combinations of causes had to be considered, which drove up the cost of analyzing, locating, and repairing it, and the mean time to repair grew long. A failure in any module could affect every other module, and because no single developer had a deep understanding of the whole system, fixing one fault often introduced another, dragging the process into a vicious cycle of "the more you fix, the more breaks".

  • Poor scalability: all of the application's code ran on the same server, making horizontal scaling very difficult; only vertical scaling was available.

  • Longer delivery cycles: any minor modification or code submission triggered compilation of the entire application, unit tests, code inspection, build and packaging, functional verification, and so on. The feedback cycle per release therefore grew longer, and build throughput per unit of time became very low.

  • Longer onboarding for newcomers: as the application accumulated features, the code grew more and more complex, and a new team member had to understand the business background, get familiar with the application, and configure a local development environment; even seemingly simple tasks took longer and longer.

Vertical application architecture (second generation architecture)

In order to solve the problems faced by the first-generation architecture, the team formulated the following strategies and formed the second-generation application architecture (vertical application architecture)

  • The application is disassembled into independent application modules.

  • Each application module is deployed independently, and horizontal scaling of a module is handled through session stickiness in the load balancer.

    Sticky is a cookie-based load-balancing technique: the session between client and back-end server is maintained through a cookie, so under certain conditions the same client keeps reaching the same back-end server. When a request arrives, the server effectively hands back a cookie that says "bring this next time and come straight to me". In this project we used the session_sticky module from Taobao's open-source Tengine.

  • The database is split into different databases and accessed by corresponding applications.

  • Domain splitting.

  • Separation of dynamics and statics.
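The cookie-based stickiness described above can be sketched in a few lines of Java. This is a simplified illustration of the idea, not Tengine's actual implementation:

```java
import java.util.List;

// Simplified cookie-based sticky routing: if the client presents a
// "route" cookie naming a live backend, reuse it; otherwise pick one
// by hashing the client id and hand the choice back as a cookie.
class StickyBalancer {
    private final List<String> backends;

    StickyBalancer(List<String> backends) {
        this.backends = backends;
    }

    String choose(String routeCookie, String clientId) {
        if (routeCookie != null && backends.contains(routeCookie)) {
            return routeCookie; // same client, same backend
        }
        int idx = Math.floorMod(clientId.hashCode(), backends.size());
        return backends.get(idx); // new client: assign a backend and set the cookie
    }
}
```

A dead or unknown backend named in the cookie simply falls through to a fresh assignment, which mirrors how real sticky modules fail over.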

As can be seen, the second-generation architecture solved horizontal scaling at the application level. After this optimization, the architecture supported hundreds of thousands of users. At this stage some applications were also rewritten in Java on an MVC architecture. Of course, problems remained:

  • The degree of coupling between applications is high, and the interdependence is severe.

  • The interaction between application modules is complex, sometimes directly accessing each other's module database.

  • The database involves too many related queries and slow queries, and it is difficult to optimize the database.

  • The database was a serious single point of failure, and a failure there could not be recovered from.

  • The data replication problem is serious, causing a large amount of data inconsistency.

We tried SQL Server AlwaysOn to solve the scaling problem, but experiments showed a replication delay of at least 10s, so that solution was abandoned.

  • System expansion is difficult.

  • Each development team fights on their own, and the development efficiency is low.

  • The test workload is huge and the release is difficult.

Microservice architecture (platform status: third-generation architecture)

In order to solve the problems of the first and second generation architectures, we have sorted out the platform and optimized it. According to the platform business needs and the summary of the first and second generation architectures, we have determined the core requirements of the third generation architecture:

  • The core business is extracted into independent services exposed externally.

  • Service modules are deployed continuously and independently, shortening the version delivery cycle.

  • Databases are split per service, with tables divided accordingly.

  • Caching is used extensively to improve access speed.

  • Systems interact over a lightweight REST protocol rather than RPC.

  • The platform moves off .NET; development is done in Java.
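The "lightweight REST instead of RPC" requirement boils down to plain HTTP with simple payloads. A minimal, self-contained sketch using only the JDK — the endpoint path and JSON body are made up for illustration:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class RestSketch {
    // Serve GET /members/1 with a fixed JSON body, then call it over HTTP.
    public static String demo() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/members/1", exchange -> {
            byte[] body = "{\"id\":1,\"name\":\"demo\"}".getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        try {
            URL url = new URL("http://localhost:" + server.getAddress().getPort() + "/members/1");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } finally {
            server.stop(0);
        }
    }
}
```

The whole interaction is just an HTTP GET and a JSON string; there is no stub generation, IDL, or binary wire format to coordinate between teams, which is the practical appeal of REST over classic RPC.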

And based on this, the third-generation architecture of the platform was reconstructed.

Looking at the composition of the third-generation architecture, it is mainly divided into eight parts:

  • CDN: the CDN system redirects a user's request, in real time, to the service node closest to that user, based on network traffic together with each node's connections, load, distance to the user, and response time. The goal is to serve content from nearby, relieve Internet congestion, and improve the response speed of visits to the website.

    When choosing a CDN vendor, the platform considered operating history, expandable bandwidth resources, flexible traffic and bandwidth options, node stability, and cost-effectiveness; on those grounds it adopted Qiniu's CDN service.

  • LB layer: the platform spans many business domains, each with its own cluster. The LB (load balancer) layer distributes traffic across multiple business servers; by spreading traffic it expands the system's external service capacity and eliminates single points of failure, improving availability.

    Choosing a load balancer requires weighing several factors (support for high concurrency and performance, how session stickiness is handled, which balancing algorithms are available, compression support, cache memory consumption). The main candidates fall into two camps:

    LVS: works at layer 4. A high-performance, highly concurrent, scalable, and reliable load balancer implemented in Linux, supporting multiple forwarding modes (NAT, DR, IP tunneling); DR mode supports load balancing across a WAN. It supports active/standby hot backup (Keepalived or Heartbeat), but is relatively demanding of the network environment.

    Nginx: works at layer 7. An event-driven, asynchronous, non-blocking, multi-process, high-concurrency load balancer / reverse proxy. It can split HTTP traffic by domain name, directory structure, or regular-expression rules.

    It detects internal server faults through the port, for example via status codes or timeouts returned when a server processes a page, and resubmits requests that returned errors to another node. Its drawback is that it does not support URL-based health checks.

    For session stickiness we use the cookie-based extension nginx-sticky-module. This is also the scheme the platform currently adopts.

  • Business layer: represents the services provided by the business in a certain field of the platform. For the platform, there are systems such as commodities, members, live broadcast, orders, finance, and forums. Different systems provide services in different fields.

  • Gateway and registry: provides the unified entry point for the underlying microservice APIs plus registration management. It encapsulates the internal system architecture and exposes a REST API to each client, while also taking on responsibilities such as monitoring, load balancing, caching, service degradation, and rate limiting. The platform currently implements this with nginx + Consul.

  • Service layer: a set of small, autonomous services that work together. The platform draws service boundaries along business boundaries, and each service focuses only on its own boundary. This layer is built on Spring Cloud.

  • Infrastructure layer: This layer provides infrastructure services for upper-layer services, mainly in the following categories:

    Redis cluster: Provide cache service for the upper layer with high response speed and memory operation.

    Mongodb cluster: As mongodb has features such as a flexible document model, highly available replication set, and scalable sharded cluster, the platform provides storage services such as articles, posts, and link logs for the upper layer based on this. The mongodb cluster uses a replication + sharding architecture to solve availability and scalability issues.

    MySQL cluster: Stores members, products, orders, and other transactional data.

    Kafka: Supports all messaging services of the platform.

    ES (elasticsearch): Provides platform search services for commodities, members, orders, logs, etc.

  • Integration layer: this is the biggest highlight of the entire platform. It covers continuous integration (CI), continuous delivery (CD), and a DevOps culture in which everyone participates in delivery, so that a service can be automatically deployed and released under a standardized process, improving the efficiency of the whole version delivery pipeline.

  • Monitoring layer: Splitting the system into smaller, fine-grained microservices brings many benefits to the platform, but it also increases the complexity of the operation and maintenance of the platform system.

    The services delivered to end users are completed by many microservices: one initial call eventually triggers multiple downstream service calls. How can the request flow be reconstructed so such problems can be reproduced and solved?

    To this end, the open-source Open-Falcon platform provides monitoring at the application level and above, ELK provides application log analysis, a self-built service provides link (trace) log tracking, and Spring Config Server provides unified configuration services.

How the microservice team works

Conway's Law: When any organization designs a system, the delivered design plan is structurally consistent with the organization's communication structure.

Way of working

When practicing the third-generation architecture, we made several adjustments to the team organization:

  • Teams are divided along business boundaries, with a full stack inside each team so the team can be autonomous. Organized this way, communication stays within each subsystem, each subsystem becomes more cohesive, mutual dependence and coupling are weakened, and the cost of cross-system communication drops.

  • A dedicated architecture group was established to drive the implementation of the third-generation architecture. A reasonable architecture team usually combines five roles: system architecture, application architecture, operations, DBA, and agile expert. So how do you control the architecture team's output and ensure the architecture work lands smoothly?

    • First of all: creating a self-organizing culture of continuous improvement is the key cornerstone of the implementation of microservices. Only by continuous improvement, continuous learning and feedback, and continuous creation of such a cultural atmosphere and team, can the microservice architecture continue to develop, maintain fresh vitality, and realize our original intention.

    • Second: the architecture team's deliverables must pass through a strict process, because the team promotes general-purpose solutions. To guarantee their quality, we run a tight closed loop from research to review to implementation.

Next, the delivery process and development model of the entire team. Without defining these up front, it is hard for a microservice architecture to deliver real value. Let's first look at the microservice delivery process.

Developing an application on a microservice architecture really means designing, developing, testing, and deploying microservices one by one, since the services do not depend on each other; the approximate delivery process is as shown in the figure above.

Design phase:

The architecture group divides product features into several microservices and designs the API (for example a REST API) of each. API documentation must be produced, covering each API's name, version, request parameters, response results, error codes, and so on.

Development phase:

In this phase the development engineers implement the APIs along with their unit tests. Meanwhile, front-end engineers develop the Web UI in parallel, creating fake data from the API documentation (we call it "mock data"); this way the front end does not have to wait for all back-end APIs to be finished before starting, and front-end and back-end development proceed in parallel.
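The "mock data" idea can be as simple as a stub implementation of the agreed API interface. The names below are invented for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// The interface is the contract from the API document.
interface OrderApi {
    Map<String, Object> getOrder(long id);
}

// Front-end work can proceed against this stub before the real
// back-end implementation exists; swapping in the real OrderApi
// later requires no front-end changes.
class MockOrderApi implements OrderApi {
    @Override
    public Map<String, Object> getOrder(long id) {
        Map<String, Object> order = new LinkedHashMap<>();
        order.put("id", id);
        order.put("status", "PAID");  // canned value
        order.put("amount", "99.00"); // canned value
        return order;
    }
}
```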

Testing phase:

In this stage the process is fully automated: developers push code to the code server, which triggers a continuous-integration build and tests; if the tests pass, the build is automatically pushed to the simulation environment by an Ansible script.

In practice, pushes to the production environment must first go through a review process. This keeps work efficient while containing the online instability that insufficient testing might otherwise cause.

Development model

In the above delivery process, the three stages of development, testing, and deployment may all involve the control of code behavior. We also need to formulate relevant development models to ensure that multiple people can collaborate well.

  • Practice the "strangler mode":

Because the third generation is a large leap and we faced a .NET legacy system that could not be modified, we adopted the strangler pattern: instead of changing the original system directly, we added a new proxy microservice outside the legacy system and controlled its upstream at the LB, gradually replacing the old system.

  • Development specification

Experience shows that you need to use your code version control system well. I once saw a development team whose branches were so unstandardized that shipping a minor release turned into hours of merging; in the end the developers themselves no longer knew which branch to merge.

Take GitLab as an example: it supports multi-branch version control very well, and we use that feature to improve development efficiency. The picture above shows our current branch management convention.

The most stable code is placed on the master branch. We do not submit code directly on the master branch. We can only perform code merging operations on this branch, such as merging the code of other branches to the master branch.

Day-to-day development happens on a develop branch pulled from master. Everyone can access this branch, but under normal circumstances we do not commit directly to it either; code reaches develop by being merged in from other branches.

When we need to develop a feature, we need to pull a feature branch from the develop branch, such as feature-1 and feature-2, and develop specific features in parallel on these branches.

When the feature is developed, we decide that we need to release a certain version. At this time, we need to pull a release branch from the develop branch, such as release-1.0.0, and merge the features that need to be released from the related feature branch to the release branch. Then, the release branch will be pushed to the test environment, the test engineer will do functional testing on this branch, and the development engineer will modify the bug on this branch.

When the test engineer cannot find any bugs, we can deploy the release branch to the pre-release environment. After verifying again, there are no bugs. At this time, the release branch can be deployed to the production environment.

After the launch is complete, merge the code on the release branch into the develop branch and the master branch at the same time, and tag the master branch, such as v1.0.0.

When a bug is found in the production environment, we need to pull a hotfix branch (such as hotfix-1.0.1) from the corresponding tag (such as v1.0.0), and fix the bug on this branch. After the bug is completely fixed, the code on the hotfix branch needs to be merged into the develop branch and the master branch at the same time.

We also constrain the version number format to x.y.z: x is bumped only for major refactors, y when a new feature is released, and z only for bug fixes. Every microservice must strictly follow this development model.
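The x.y.z rule above can be encoded directly. This is an illustrative helper, not part of the platform's actual tooling:

```java
// x.y.z: bump x for major refactors, y for new features, z for bug fixes.
class Version {
    static String bump(String version, String kind) {
        String[] p = version.split("\\.");
        int x = Integer.parseInt(p[0]);
        int y = Integer.parseInt(p[1]);
        int z = Integer.parseInt(p[2]);
        switch (kind) {
            case "major":   return (x + 1) + ".0.0"; // major refactor resets y and z
            case "feature": return x + "." + (y + 1) + ".0"; // new feature resets z
            case "fix":     return x + "." + y + "." + (z + 1);
            default: throw new IllegalArgumentException(kind);
        }
    }
}
```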

Microservice development system

We have described the structure, delivery process, and development model of the microservice team. Let's talk about the microservice development system below.

What is microservice architecture

Martin Fowler's definition:

In short, the microservice architectural style [1] is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API.

These services are built around business capabilities and independently deployable by fully automated deployment machinery.

There is a bare minimum of centralized management of these services, which may be written in different programming languages and use different data storage technologies.

Simply put, microservices are a design style for software architecture. It advocates dividing an originally monolithic system into multiple small services; these small services run in separate processes and communicate and collaborate through lightweight RESTful APIs over HTTP.

Each extracted microservice is built around one or several closely related pieces of business in the system, and each service maintains its own data storage, business development, automated test cases, and independent deployment mechanism. Thanks to the lightweight communication mechanism, these microservices can be written in different languages.

Split granularity of microservices

How fine-grained microservices should be is, more often than not, a balance to strike between granularity and the team. The smaller the services, the more you gain from their independence, but the more complex managing a large number of them becomes. In any case, the split should follow these principles:

  • Single responsibility principle: "gather together the things that change for the same reasons, and separate those that change for different reasons". This principle determines microservice boundaries.

  • Team autonomy principle: the larger the team, the higher the cost of communication and coordination. In practice a team never exceeds 8 people, and each team is full-stack, a fully functional team.

  • Split the database first, then the services: whether the data model can be cleanly separated determines whether microservice boundaries are cleanly drawn. In practice we first discuss the data model boundary, which maps to the business boundary, and then complete the service split bottom-up.

How to build a microservice architecture

In order to build a good microservice architecture, technology selection is a very important stage. Only by choosing the right "actors" can this drama be performed well.

We use Spring Cloud as the microservice development framework. Spring Boot has embedded Tomcat, which can directly run a jar package to publish microservices. In addition, it also provides a series of "out of the box" plug-ins.

For example: configuration center, service registration and discovery, fuses, routing, proxy, control bus, one-time token, global lock, leader election, distributed session, cluster status, etc., which can greatly improve our development efficiency.

Engineering structure specification

The above picture is the project composition structure that each service should have in our practice.

Specifically:

  1. Microservice name + service:

    Provides services for other internal microservices to call. The service-name + api module is the interface contract defined between services, using Swagger + REST interface definitions; the service-name + server module contains the application and configuration needed to start the service directly.

  2. Microservice name + web:

    The entry point for upper-level web application requests; this service generally calls the underlying microservices to complete a request.
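The api/server split means the interface lives in one artifact and the runnable implementation in another. Schematically, with invented package contents and class names:

```java
// Would live in product-service-api: the contract other services compile against.
interface ProductApi {
    String getProductName(long id);
}

// Would live in product-service-server: the implementation plus the bootable app.
class ProductApiImpl implements ProductApi {
    @Override
    public String getProductName(long id) {
        return "product-" + id; // real code would call the data layer
    }
}
```

Consumers depend only on the api artifact, so the server side can be redeployed independently as long as the contract holds.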

API gateway practice

The API gateway serves as the access entry for all microservices and APIs on the backend, and performs auditing, flow control, monitoring, and billing on microservices and APIs. Commonly used API gateway solutions are:

  • Application layer scheme

    The most famous is of course Netflix's Zuul, but that does not mean this solution is the best fit for you. Netflix runs on AWS, for example, with limited control over the infrastructure, so it had to build Zuul at the application layer; viewed as a whole, this kind of solution is not necessarily the most suitable or the most efficient one.

    But if your team has limited control over the overall technical facilities, and the team structure is not perfect, the application layer solution may also be the best solution for you.

  • nginx + lua solution

    This is also the solution we considered and ultimately adopted as the best fit. OpenResty and Kong are the relatively mature options; however, Kong depends on Postgres or Cassandra, which probably few domestic companies run, though Kong's HTTP API design is very good.

  • Our solution

    We use the nginx + lua + consul combination. Although our team is mostly Java developers and ZooKeeper would have been the more natural choice, as newcomers we analyzed the stress-test results and finally chose Consul.

    Consul's good HTTP API support lets upstreams be managed dynamically, which means service registration and discovery can be implemented seamlessly through the release platform or glue scripts, transparently to service consumers.

In the above scheme:

Consul acts as the state store / configuration center (mainly using Consul's KV storage); nginx acts as the API gateway and dynamically distributes traffic to the configured upstream nodes according to the upstreams configuration in Consul;

nginx connects to the consul cluster according to the configuration items;

A started API or microservice instance registers/writes its instance information into Consul manually, via the command line, or through the release and deployment platform;

When nginx obtains the corresponding upstreams information update, it dynamically changes the upstreams distribution configuration inside nginx, thereby routing and distributing traffic to the corresponding API and microservice instance nodes;

After the above registration and discovery logic is solidified through scripts or a unified release and deployment platform, transparent service access and expansion can be realized.
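The registration step above amounts to writing instance information under a KV prefix that nginx watches. Here is a sketch of building that write; the "upstreams/<service>/<host:port>" key layout and the weight payload are assumptions for illustration, and the HTTP PUT itself is left out so the example stays self-contained:

```java
// Build the Consul KV key (relative to the Consul HTTP endpoint) and the
// value used to register one upstream instance. Real deployments pick
// their own key scheme and payload fields.
class ConsulRegistration {
    static String key(String service, String host, int port) {
        return "v1/kv/upstreams/" + service + "/" + host + ":" + port;
    }

    static String value(int weight) {
        return "{\"weight\":" + weight + "}";
    }
}
```

A deploy script would PUT this value to the Consul agent; the lua side in nginx then picks up the change and rebuilds its upstream table, which is the "transparent service access and expansion" described above.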

Link monitoring practice

We found that log monitoring, which used to be very simple under the monolithic application, becomes a big problem under a microservice architecture. If business flow cannot be tracked and problems cannot be located, we spend a great deal of time finding and pinpointing issues, and we are left passive in the face of microservice interactions. Distributed link (trace) monitoring exists for exactly this reason, and its core is the call chain.

Through one global ID, the pieces of the same request scattered across the service nodes are strung together, restoring the original call relationships so that we can track problems, analyze call data, and compute system metrics.
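The global-ID mechanism can be sketched with a thread-local trace context. This is a simplified illustration of the Dapper-style idea, not the platform's WeAPM code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Every log line carries the same trace id, so the scattered lines of
// one request can be stitched back into a single call chain.
class TraceContext {
    private static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();
    static final List<String> LOG = new ArrayList<>();

    static void startRequest() {
        TRACE_ID.set(UUID.randomUUID().toString());
    }

    static void log(String service, String message) {
        LOG.add(TRACE_ID.get() + " [" + service + "] " + message);
    }
}

class CallChainDemo {
    // A calls B, B calls C: three services, one trace id.
    static void handleRequest() {
        TraceContext.startRequest();
        TraceContext.log("A", "received request");
        TraceContext.log("B", "called by A");
        TraceContext.log("C", "called by B");
    }
}
```

In a real system the id is propagated across processes in an HTTP header rather than a ThreadLocal, but the stitching principle is the same.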

Distributed tracing first appeared in "Dapper", a paper Google published in 2010.

So what is a call chain? A call chain restores one distributed request into its call link, letting you explicitly inspect the request on the back end: the time spent on each node, which machine each request was routed to, the request status at each service node, and so on.

It also shows how many services a request traversed and to what depth (for example, if your system A calls B and B calls C, the request depth is 3). If some requests reach a depth greater than 10, the service is probably a candidate for optimization. Common solutions are:

  • Pinpoint

    GitHub: naver/pinpoint. Pinpoint is an open-source APM (Application Performance Management) tool for large-scale distributed systems written in Java.

    Anyone interested in APM should take a look at this open-source project from a Korean team. It uses the JavaAgent mechanism to do bytecode instrumentation (probes) in order to inject trace IDs and capture performance data. Tools such as NewRelic and OneAPM do performance analysis on the Java platform with a similar mechanism.

  • Zipkin

    GitHub: openzipkin/zipkin. Zipkin is a distributed tracing system.

    This was open-sourced by Twitter and is likewise a system modeled on Dapper.

    Zipkin's java application uses a component called Brave to collect data for internal performance analysis of the application.

    Brave's github address:

This component implements a series of Java interceptors to track the invocation of http/servlet requests and database access; adding these interceptors to configuration files such as Spring's completes performance-data collection for a Java application.

  • CAT

    GitHub: dianping/cat. Central Application Tracking.

    This was open-sourced by Dianping, its features are quite rich, and some companies in China use it. However, CAT implements tracking by hard-coding "instrumentation points" in the code, which is intrusive.

    This has advantages and disadvantages. The advantage is that you can add points where you need it, which is more targeted; the disadvantage is that you must modify the existing system, and many development teams are unwilling.

    Among the first three tools, if you don't want to reinvent the wheel, my recommended order is Pinpoint > Zipkin > CAT. The reason is simple: their intrusiveness into program source code and configuration files increases in that order.

  • Our solution

    For microservices, we have extended the microservice architecture based on spring cloud. Based on the concept of Google Dapper, we designed a set of distributed tracking system (WeAPM) based on the microservice architecture.

As shown in the figure above, we can query response logs by service name, time, log type, method name, exception level, interface time consumption, and other parameters. From the obtained TraceID, the entire link log of a request can be queried, which makes reproducing problems and analyzing logs far more convenient.

Circuit breaker practice

In a microservice architecture the system is split into many individual microservices, so calls may fail or be delayed because of the network or because of the dependent service itself, and these problems surface directly as delays in the caller's own external service.

If the caller's requests keep increasing at that point, tasks eventually pile up waiting for the failed dependency to respond, until the caller's own service is paralyzed. The circuit breaker pattern was created to solve this problem.
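Before looking at Hystrix, the core of the pattern fits in a few lines: after N consecutive failures the breaker opens and calls fail fast instead of piling up. This is a teaching sketch of the pattern, not Hystrix's actual algorithm (real breakers also "half-open" again after a timeout):

```java
import java.util.function.Supplier;

// Minimal circuit breaker: OPEN after `threshold` consecutive failures;
// while OPEN, calls are answered with the fallback immediately instead
// of waiting on a failing dependency.
class CircuitBreaker {
    private final int threshold;
    private int consecutiveFailures = 0;

    CircuitBreaker(int threshold) {
        this.threshold = threshold;
    }

    boolean isOpen() {
        return consecutiveFailures >= threshold;
    }

    String call(Supplier<String> action, String fallback) {
        if (isOpen()) {
            return fallback; // fail fast, protect the caller
        }
        try {
            String result = action.get();
            consecutiveFailures = 0; // success keeps the breaker closed
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            return fallback;
        }
    }
}
```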

In practice we use Hystrix to implement the circuit breaker. Hystrix is one of Netflix's open-source microservice framework components; it aims to provide greater tolerance of latency and failure by controlling the nodes that access remote systems, services, and third-party libraries.

Hystrix offers thread and semaphore isolation with fallbacks, the circuit breaker itself, request caching and request collapsing, as well as monitoring and configuration.

The use process of the circuit breaker is as follows:

Enable circuit breaker

    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.cloud.client.circuitbreaker.EnableCircuitBreaker;

    @SpringBootApplication
    @EnableCircuitBreaker
    public class Application {
        public static void main(String[] args) {
            SpringApplication.run(Application.class, args);
        }
    }

Fallback usage

    @Component
    public class StoreIntegration {

        @HystrixCommand(fallbackMethod = "defaultStores")
        public Object getStores(Map<String, Object> parameters) {
            // do stuff that might fail
        }

        public Object defaultStores(Map<String, Object> parameters) {
            return /* something useful */;
        }
    }

Configuration file
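The configuration file itself is not reproduced in the source; a typical Hystrix properties fragment (the exact values here are illustrative) looks like:

```properties
# Fail a command if it runs longer than 1s
hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds=1000
# Consider opening the circuit only after 20 requests in the rolling window...
hystrix.command.default.circuitBreaker.requestVolumeThreshold=20
# ...and open it when more than 50% of them failed
hystrix.command.default.circuitBreaker.errorThresholdPercentage=50
# After 5s, let a single probe request through to test recovery
hystrix.command.default.circuitBreaker.sleepWindowInMilliseconds=5000
```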

Resource control practices

Speaking of resource control, many readers will immediately think of Docker. Docker is indeed a very good solution for resource control, and during our preliminary research we reviewed whether to use it, but we finally decided against it and chose Linux libcgroup scripts instead, for the following reasons:

  • Docker suits resource control and containerization on large-memory machines, but our production servers generally have about 32 GB, and using Docker would waste resources.

  • Using docker will make the operation and maintenance complicated, and the pressure from the business will be great.

Why is there a cgroup?

Linux systems often need to limit the resources allocated to one or more processes; in other words, to realize the concept of a container within which a specific share of CPU time, I/O time, and available memory can be allocated.

Hence the cgroup concept: cgroup stands for control group. It was originally proposed by Google engineers and later merged into the Linux kernel; Docker is also built on it.

The libcgroup workflow is as follows. Install the package:


yum install libcgroup 

Start the service:

service cgconfig start 

Configuration file template (take memory as an example):
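The template itself is not reproduced in the source; a minimal cgconfig.conf along these lines (mount point and limit value are illustrative) would be:

```text
mount {
    memory = /sys/fs/cgroup/memory;
}

group test {
    memory {
        memory.limit_in_bytes = 512M;
    }
}
```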


The memory subsystem is mounted under the directory /sys/fs/cgroup/memory; entering this directory and creating a folder creates a control group.

mkdir test
echo $$ >> test/tasks    # write the current shell's PID into the new group's tasks file

In this way, the current terminal process is added to the memory-limited cgroup.

To sum up, this article started from the background of our microservice practice and introduced how the team works, our technology selection, and the related microservice technologies.

Including: API gateway, registry, circuit breaker, etc. I believe these technologies will bring you some new ideas in practice.

Of course, the entire microservice practice road contains a lot of content, and it is impossible to include all of it in an article. If you are interested, you can put it forward in the message.