Cross-city automatic migration of a large-scale cluster of tens of thousands of nodes in the goose factory (below)

Cross-city automatic migration of a large-scale cluster of tens of thousands of nodes in the goose factory (below)

Continued from the previous article: Cross-city automatic migration of a large-scale cluster of tens of thousands of nodes in the goose factory (part 1)

Migration strategy of cross-city migration platform

Just now we were talking about models. There is a set of models to guide us in the migration. After we have a model, we need a platform to support this model.

Relations start from nothing and aggregate the most basic relations to form a chain of relations. Find the core node of the relationship chain, split the relationship chain, and merge the small relationship chain into a larger relationship chain. The migration platform has a module specifically responsible for this type of operation of the relationship chain.

The other module is the migration module of the relationship chain, which is how to move the relationship chain that has been divided from one city to another. It involves data migration, task switching, ordinary table upgrades, double writing tables, dependent tasks and synchronization tasks. Treatment.

In addition, there is a module for platform guarantee, data verification, task verification and cross-city traffic monitoring.

Relationship chain migration module

The migration model solves one thing, which is to build our relationship chain from scratch, and then split the relationship chain from large to small, to a moderately scaled relationship chain that is suitable for migration.

Relationship chain migration is to solve another problem. A relationship chain contains tasks and data, and their status will change during the migration process. For example, data is still being written, and TDW's data is constantly changing every day. The task may also be running and not over.

In other words, the relationship chain is not a static state, it is dynamically changing. The relationship chain migration is to ensure that the migration of data and tasks is accurate in a dynamically changing environment.

(The state transition of the relational chain migration module)

When the relationship chain is migrated, it will first look at what double-write tables are in the relationship chain. The double-write tables will be processed first, and then the migration of other data will be processed. When data is migrated, first expand by table partition and migrate by partition, which can speed up the migration.

The user will be notified before the start of the migration. The user can leave it alone, but he needs to be notified. If any abnormality is encountered, the user can analyze whether it is caused by this change.

After notifying the user, the partitions of all tables will be expanded, and the expanded partition will be migrated, that is, distcp. When distcp reaches a certain progress, it will freeze the task. The tasks in the relationship chain may have state changes, and freezing operations can transform them into an immutable state.

Freeze the writing task of the data. The writing task can be found through the relational chain. If there is no relational chain, this freezing operation cannot be implemented. The migration process does not allow tasks to perform write operations, because write operations will make it difficult to ensure data consistency.

After freezing the task, it enters a state of waiting for the data to be consistent. In this state, the data differences between the two cities will be continuously compared.

For example, when adding new data, if the data is different, data synchronization will be done again until the data is completely consistent and the data migration is completed when it enters a consistent state. After the task is migrated, the task can be unfrozen after the task is migrated, and the migration of the entire relationship chain is completed.

The most important thing here is that there is a logic for freezing tasks to ensure that when we migrate data, there is a period of time that no tasks are modifying data. The shorter the time for freezing the task, the better, which requires data migration as fast as possible. Tables in the data warehouse occupy large or small space. For large tables, when directly performing table-level distcp, the number of maps is usually very small. At this time, concurrency and speed cannot be increased. Therefore, for large tables, partition-level distcp is required. The opposite is true for small tables, partition-level distcp is often a waste.

In addition, the migration of the relationship chain needs to support concurrency, and the resources consumed in the migration process of the relationship chain are different. Sometimes the network traffic is large and sometimes the network traffic is small. Concurrent migration of relational chains can circumvent this situation and achieve a migration close to full load traffic.

Platform Assurance Module

The platform guarantee module includes two major guarantees: one is basic guarantee, and the other is monitoring guarantee.

1. The basic guarantee has done two things, one is data verification, the data on both sides of the migration needs to be verified, and the other is the sample rerun of the task. We introduce an idea, draw a vertical path in the relationship chain, and redo the task all the way from the root node.

A relationship chain may have dozens of nodes, but only four or five nodes after sampling. Re-run the sampled nodes and compare the data after re-run to see if they are consistent. This ensures that the data is accurate and the task is accurate.

2. Monitoring guarantee

Several things have been done. One is the monitoring of the data volume. After the migration is completed, the fluctuation of the data volume is monitored to see if the data volume fluctuates significantly from the previous one. In addition, all migrated tasks will be monitored to see if they are running properly in the new city.

Finally, there is abnormal monitoring of traffic. The data and tasks are verified successfully, and the migration is also successful. Data has been migrated from one city as a whole to another, and tasks have also been switched over. The last thing to consider is whether there is an abnormal situation that will cause our cross-city traffic to increase abnormally.

We have a flow monitoring mechanism to solve the abnormal flow caused by some exceptions. By strengthening the abnormal monitoring of the flow, and realizing the automatic switching of tasks. Collect all the tasks that are running and the data they access every five minutes.

If a task is found to be running in the computing cluster of city A, but it accesses the data of city B, this abnormal situation will be monitored. When the data traversal traffic is too high, the task will be automatically killed and the task switch will be automatically performed at the same time.

Migration strategy of cross-city migration platform

We have introduced the migration model and migration platform. We have a set of models to solve the difficulties that our operations face in cross-city migration, and then we also have a platform to support the logic of this relationship chain migration. Next, I want to talk about our migration strategy.

Migrate cluster independent deployment

Migrating clusters need to be deployed independently. The biggest workload for migration is data migration. There is a lot of data that needs to be synchronized from one place to another. The way of data migration is to do distcp, which is an MR task, which consumes computing resources. The migration cluster is best to be independent, so that it will not affect the normal scheduling tasks.

The biggest feature of the migration cluster is that the network traffic will run very high, because it only does one thing. It pulls data from the source cluster and writes it to the target cluster. When you observe the network traffic of the migration cluster, you will find that it runs. At that time, the incoming and outgoing traffic is the same, and it is a cluster with high network traffic consumption.

It has a feature, it is a cluster with low CPU consumption. Using it as a migration cluster alone can achieve two characteristic configurations, one is a model with a high network configuration, such as a model with a 10 Gigabit network card. The other is to use low-memory task configuration and high-concurrency configuration for the computing nodes of the migrated cluster.

This can greatly increase the migration speed while minimizing the amount of equipment required to migrate the cluster. We use a migration cluster of 40 machines to support 1P migration traffic.

Migration flow control

Migration traffic control. The interesting part of this picture is that when we do data migration, the source cluster (city A HDFS cluster), the migration computing cluster, the target cluster (city B HDFS cluster), the traffic of these three clusters Have a very interesting relationship.

The source cluster is a large cluster, usually with thousands of machines; the migration computing cluster is much less, perhaps only a few dozen. When the migration computing cluster is full, the network consumption of the source cluster can be calculated, and the network consumption of the target cluster can also be calculated.

For example, if the migration computing cluster has a scale of 40 devices, when the network of these 40 devices is full, if the source cluster has only 400 devices, the network will be full. When the network is full, there will be tasks influences.

Another interesting thing is that the traffic of the target cluster is larger than the traffic of the migration cluster. The reason is that there are multiple copies of Hadoop when writing data, which will double the traffic of the target cluster. When we use the migration cluster, we can manage the resource pool of the migration cluster and limit the size of its resource pool, that is, limit the maximum number of concurrent migrations, so as to control the migration traffic.

Cluster synchronization tasks

Let's talk about synchronization tasks. The effect of synchronization tasks on traffic will be relatively small. Because the direction of synchronization tasks and the direction of migration are opposite, the migration direction is from city A to city B, and the synchronization tasks are reversed, so the traffic is very small.

We try to minimize the impact of synchronization tasks on the business. It is recommended to use an independent synchronization task resource pool. This resource pool can be larger to allow synchronization tasks to be completed quickly without affecting other tasks.

HDFS cluster's shrinking and expanding strategy

Finally, in the shrinking and expansion strategy of HDFS clusters, when the cluster is shrinking, priority should be given to the overall offline of the cluster, and data cleaning and small file merging must be performed before shrinking.

In addition, the equipment is moved in batches during the relocation, for example, 200 machines are relocated in each round. When these 200 units are expanded to the target cluster, the newly expanded nodes will not participate in the calculation for a period of time. Because Hadoop's balance mechanism will cause the network traffic of newly expanded machines to be full, which directly affects the speed of computing tasks.