A small pitfall that can bring a private cloud service crashing down: on CMDB asset auditing

1. Introduction
About the author
The author of this article is Wu Xiumin (contact: autohomeops@autohome.com.cn), who is mainly responsible for developing Autohome’s asset management system and configuration management system. Personal blog: http://pylixm.cc/
About the team
We are the Autohome operations team, one of the core teams in Autohome’s technology department, made up of operations (OP) and development (dev) engineers. Our goal is to build a high-performance, highly scalable, low-cost, stable, and reliable website infrastructure platform for the Autohome group. Our team’s technical blog is at http://autohomeops.corpautohome.com.
Contact information
You can reach us by email or by leaving a message on our official technical blog.
2. Preface
As the private cloud and the company’s other automation systems rely more and more heavily on CMDB data, the CMDB has become the basic data source for maintaining the company’s servers. When CMDB data goes wrong, the consequences can be unpredictable, and in serious cases online business can be brought down. When we started developing the CMDB, we set the direction of “high accuracy, high availability, high automation”, with “process control” at its core.
However, “process control” is a long-term effort, and at times it cannot keep pace with the development of business processes. In those gaps, to keep daily work running smoothly, staff have to handle things manually by following a prescribed standard procedure. Over time, this manual handling can leave machine data inaccurate.
Data accuracy has always been a major problem in CMDB construction. Below we describe how we have worked to ensure data accuracy beyond “process control”. We welcome your feedback.
3. Problems we encountered
As the private cloud and automation systems have been built out, most CMDB data maintenance has become automated. But as mentioned above, “process control” and the development of business processes are constantly playing catch-up with each other, which leaves room for manual intervention. This caused us many problems. The most common ones are described below.
Problem 1: Inaccurate data in asset fields.
When the company’s private cloud platform requests an IP from the CMDB, it takes one from the pre-allocated IP segment for the given data center (machine room) and business line. If the data center or business line data is wrong, it picks up the wrong IP.
Previously, a colleague created a new data center record in the CMDB in order to divide up IP ranges. No naming format was required when the record was created, and the backend performed no validation.
As a result, the private cloud could not obtain an available IP during installation, and the provisioning process could not continue. Inaccurate data like this is very dangerous. This case involved nothing more than a data center name. If the private cloud were allocated a wrong IP, it could take over an address already used by an online server, which would be a serious incident.
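The article does not say what the data center naming format should be, so the following is only a minimal sketch of the kind of backend check that was missing. It assumes a hypothetical `CITY-SITE-NN` convention (for example `BJ-YZ-01`) and a set of already registered names. `validate_datacenter_name` and `ValidationError` are illustrative names. In a Django model, a check like this would normally live in `clean()` and raise `django.core.exceptions.ValidationError`. Here it is plain Python so it runs on its own.

```python
import re

# Hypothetical convention: CITY-SITE-NN, e.g. "BJ-YZ-01" (the real format is not given).
DATACENTER_NAME = re.compile(r"[A-Z]{2,4}-[A-Z]{2,4}-\d{2}")


class ValidationError(ValueError):
    """Raised when a value fails a backend validation rule."""


def validate_datacenter_name(name: str, existing: set[str]) -> str:
    """Normalize a new data center name and reject bad formats or duplicates."""
    normalized = name.strip().upper()
    if not DATACENTER_NAME.fullmatch(normalized):
        raise ValidationError(f"data center name {name!r} does not match CITY-SITE-NN")
    if normalized in existing:
        raise ValidationError(f"data center {normalized} already exists")
    return normalized


existing = {"BJ-YZ-01"}
print(validate_datacenter_name(" bj-yz-02 ", existing))  # BJ-YZ-02
```

Rejecting the record at creation time means a malformed name never reaches the private cloud’s IP allocation step.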
Problem 2: Fields that should be empty in a given asset state still hold values, causing uniqueness errors when other private cloud processes write records into the asset library.
A colleague from one business line applied for a cloud host. The request was approved at every level, but he never received the email with the result. He asked the operations team to look into it. Operations contacted the cloud host administrator, who in turn contacted the cloud host developers. The developers found that the machine had failed while being automatically entered into the CMDB asset library, and came to us. Checking the logs, we found that a server had been taken offline without its IP being cleared, so storing the new record raised a uniqueness error. It took a joint investigation by developers of several systems to track down the data problem. Problems like this are time-consuming to diagnose and hard to foresee.
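Two checks would likely have caught this problem before it reached the private cloud flow. The sketch below assumes a hypothetical asset record with `state` and `ip` fields, and a hypothetical rule that an `offline` asset must not hold an IP. It flags fields that should be empty in the asset’s current state, and it lists IPs held by more than one asset.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Hypothetical rule table: fields that must be empty in a given state.
MUST_BE_EMPTY = {"offline": ("ip",)}


@dataclass
class Asset:
    asset_id: str
    state: str
    ip: Optional[str] = None


def state_field_errors(asset: Asset) -> list[str]:
    """Report fields that should be empty in the asset's current state but are not."""
    return [
        f"{asset.asset_id}: '{name}' must be empty when state is {asset.state}"
        for name in MUST_BE_EMPTY.get(asset.state, ())
        if getattr(asset, name)
    ]


def duplicate_ips(assets: list[Asset]) -> dict[str, list[str]]:
    """Map every IP held by more than one asset to the IDs of those assets."""
    holders: dict[str, list[str]] = defaultdict(list)
    for asset in assets:
        if asset.ip:
            holders[asset.ip].append(asset.asset_id)
    return {ip: ids for ip, ids in holders.items() if len(ids) > 1}


assets = [
    Asset("srv-001", "offline", "10.0.0.5"),  # taken offline, IP never cleared
    Asset("vm-042", "online", "10.0.0.5"),    # new cloud host given the same IP
]
print(state_field_errors(assets[0]))
print(duplicate_ips(assets))  # {'10.0.0.5': ['srv-001', 'vm-042']}
```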
Problem 3: States modified by hand, without going through a private cloud go-live work order, leave event times wrong, so statistics on online assets are inaccurate.
Analysis of CMDB data depends on the times at which various events occurred. However, when people modify data by hand, they may not update the change time. As a result, the statistics we compile from asset data do not match the private cloud work order data and have to be verified further. At this point, things start to crumble.
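One way to catch this class of problem is to audit the change log for state changes that bypassed the work order flow. The sketch assumes each log entry stores the full row before and after the change. It also assumes a hypothetical `work_order_id` linking the entry to a private cloud work order, and a hypothetical `state_changed_at` field holding the event time. Neither name comes from the article.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional


@dataclass
class Change:
    asset_id: str
    changed_at: datetime
    before: dict[str, Any]               # full row before the change (assumed)
    after: dict[str, Any]                # full row after the change (assumed)
    work_order_id: Optional[str] = None  # hypothetical link to a work order


def suspicious_state_changes(log: list[Change]) -> list[str]:
    """Flag state changes that skipped the work order flow or left the event time stale."""
    findings = []
    for c in log:
        if c.before.get("state") == c.after.get("state"):
            continue  # not a state change
        if c.work_order_id is None:
            findings.append(f"{c.asset_id}: state changed without a work order")
        if c.after.get("state_changed_at") == c.before.get("state_changed_at"):
            findings.append(f"{c.asset_id}: state_changed_at was not updated")
    return findings


log = [
    Change("vm-1", datetime(2018, 5, 2),
           {"state": "pending", "state_changed_at": None},
           {"state": "online", "state_changed_at": datetime(2018, 5, 2)},
           work_order_id="WO-1001"),
    Change("srv-7", datetime(2018, 5, 3),  # edited by hand, time left unchanged
           {"state": "online", "state_changed_at": datetime(2018, 1, 1)},
           {"state": "offline", "state_changed_at": datetime(2018, 1, 1)}),
]
for finding in suspicious_state_changes(log):
    print(finding)
```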
4. Our audit scheme
4.1 Overview
With these problems in mind, we read through a lot of material, but found very little on self-auditing assets.
Based on our own problems, we developed a self-audit scheme built around three core ideas, “inventory”, “review”, and “penalty”, to ensure the accuracy of CMDB data.
4.2 Inventory: back-calculation from logs and external reconciliation
4.2.1 Recording the data source
Our CMDB is built on Django. We overrode the Django model layer so that a change log is recorded whenever data changes.
The log saves the data both before and after each change, and it serves as the data source for later calculations.
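The article does not show its model code, so the following is only a plain-Python sketch of the idea, with no Django dependency so that it runs standalone. A base class’s `save()` diffs the record against its last saved values and appends a before/after entry to a change log. In a Django project, the same logic would usually sit in an overridden `Model.save()` or in `pre_save`/`post_save` signal handlers. `AuditedRecord`, `ChangeLog`, `CHANGE_LOG`, and the `Server` fields are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class ChangeLog:
    asset_id: str
    changed_at: datetime
    before: dict[str, Any]  # values of the changed fields before the save
    after: dict[str, Any]   # values of the changed fields after the save


CHANGE_LOG: list[ChangeLog] = []  # stand-in for a change-log table


class AuditedRecord:
    """Base class whose save() appends a before/after diff to CHANGE_LOG."""

    tracked_fields: tuple[str, ...] = ()

    def __init__(self, asset_id: str, **values: Any) -> None:
        self.asset_id = asset_id
        self._saved: dict[str, Any] = {}  # last persisted values
        for name in self.tracked_fields:
            setattr(self, name, values.get(name))

    def save(self, when: Optional[datetime] = None) -> None:
        current = {name: getattr(self, name) for name in self.tracked_fields}
        changed = [n for n in self.tracked_fields if current[n] != self._saved.get(n)]
        if changed:
            CHANGE_LOG.append(ChangeLog(
                asset_id=self.asset_id,
                changed_at=when or datetime.now(timezone.utc),
                before={n: self._saved.get(n) for n in changed},
                after={n: current[n] for n in changed},
            ))
        self._saved = current  # a real model would also write the row to the database here


class Server(AuditedRecord):
    tracked_fields = ("state", "business_line", "ip")


srv = Server("srv-001", state="online", business_line="ads", ip="10.0.0.5")
srv.save()                       # creation: every before-value is None
srv.state, srv.ip = "offline", None
srv.save()                       # logs only the two fields that changed
```

Storing only the changed fields keeps the log compact. Storing the full row instead makes later queries simpler. Either choice fits the “before and after” idea.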
4.2.2 Back-calculation
With a detailed change log for every asset, we can reconstruct the state of any asset at any point in its history simply by replaying its change log.
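Replaying the log amounts to folding over the entries in time order. This minimal sketch assumes each hypothetical `Change` entry stores only the fields that changed, with their new values. Applying every entry up to a given moment rebuilds the asset’s fields as of that moment.

```python
from datetime import datetime
from typing import Any, NamedTuple


class Change(NamedTuple):
    asset_id: str
    changed_at: datetime
    after: dict[str, Any]  # new values of the fields that changed (assumed layout)


def state_at(log: list[Change], asset_id: str, moment: datetime) -> dict[str, Any]:
    """Rebuild an asset's fields as of `moment` by replaying its change log in order."""
    state: dict[str, Any] = {}
    for change in sorted(log, key=lambda c: c.changed_at):
        if change.asset_id == asset_id and change.changed_at <= moment:
            state.update(change.after)
    return state


log = [
    Change("srv-001", datetime(2018, 1, 3), {"state": "online", "business_line": "ads"}),
    Change("srv-001", datetime(2018, 2, 10), {"state": "offline"}),
]
print(state_at(log, "srv-001", datetime(2018, 1, 31)))
# {'state': 'online', 'business_line': 'ads'}
```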
For example, to calculate how many machines a business line applied for last month, we traverse last month’s CMDB change log and count the records whose business line matches and whose state changed accordingly.
That count is the figure we need.
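Here is a sketch of that monthly count. It assumes each log entry stores the full row before and after the change, and it treats “applied for” as a hypothetical state transition into `allocated`. The article does not give the real state names. Counting distinct asset IDs, rather than log records, keeps an asset that changes state twice in one month from being counted twice.

```python
from datetime import date, datetime
from typing import Any, NamedTuple


class Change(NamedTuple):
    asset_id: str
    changed_at: datetime
    before: dict[str, Any]  # full row before the change (assumed)
    after: dict[str, Any]   # full row after the change (assumed)


def previous_month(today: date) -> tuple[datetime, datetime]:
    """Return [start, end) datetimes covering the calendar month before `today`."""
    end = datetime(today.year, today.month, 1)
    if today.month == 1:
        start = datetime(today.year - 1, 12, 1)
    else:
        start = datetime(today.year, today.month - 1, 1)
    return start, end


def count_transitions(log: list[Change], business_line: str, to_state: str,
                      start: datetime, end: datetime) -> int:
    """Count distinct assets of `business_line` that moved into `to_state` in [start, end)."""
    return len({
        c.asset_id
        for c in log
        if start <= c.changed_at < end
        and c.after.get("business_line") == business_line
        and c.before.get("state") != to_state
        and c.after.get("state") == to_state
    })


def applied(asset_id: str, when: datetime, line: str) -> Change:
    """Build a sample pending -> allocated change (hypothetical state names)."""
    return Change(asset_id, when, {"state": "pending", "business_line": line},
                  {"state": "allocated", "business_line": line})


log = [
    applied("vm-1", datetime(2018, 5, 3), "ads"),
    applied("vm-2", datetime(2018, 5, 20), "ads"),
    applied("vm-3", datetime(2018, 5, 21), "search"),
    applied("vm-4", datetime(2018, 6, 1), "ads"),  # falls in June, excluded
]
start, end = previous_month(date(2018, 6, 15))
print(count_transitions(log, "ads", "allocated", start, end))  # 2
```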
4.2.3 External reconciliation
Back-calculation yields aggregated figures. We can also obtain corresponding figures from external systems such as the work order system. Comparing the two sets of figures tells us whether the CMDB records contain mistakes, and whether the CMDB shows more or fewer assets than it should.
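The comparison itself is simple count and set arithmetic. This sketch assumes both sides have already been aggregated by some key such as business line. The keys and numbers are hypothetical. A positive difference means the CMDB shows more assets than the work orders justify. A negative difference means it shows fewer. When counts disagree, comparing asset IDs narrows down which records are wrong.

```python
from collections.abc import Mapping


def count_mismatches(cmdb: Mapping[str, int], orders: Mapping[str, int]) -> dict[str, int]:
    """Return key -> (CMDB count minus work-order count) for mismatched keys only.

    Positive: the CMDB shows more assets than the work orders justify.
    Negative: the CMDB is missing assets that the work orders say exist.
    """
    keys = sorted(set(cmdb) | set(orders))
    diffs = {k: cmdb.get(k, 0) - orders.get(k, 0) for k in keys}
    return {k: d for k, d in diffs.items() if d != 0}


def id_differences(cmdb_ids: set[str], order_ids: set[str]) -> tuple[set[str], set[str]]:
    """Drill down: (IDs only in the CMDB, IDs only in the work order system)."""
    return cmdb_ids - order_ids, order_ids - cmdb_ids


print(count_mismatches({"ads": 12, "search": 5}, {"ads": 10, "search": 5, "video": 3}))
# {'ads': 2, 'video': -3}
```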
The external reconciliation flow is as follows:
4.3 Review: periodic checks and self-correction
Besides inventory, we also run custom self-review background tasks. Every day, they check CMDB assets for accuracy and for fields that are empty or filled when they should not be.
Problem assets are added to a blacklist and emailed to the responsible operations engineer, as a reminder that the asset has a problem that needs to be corrected. The goal is to discover problems ourselves first and take the initiative, rather than waiting for an external system that calls the CMDB to run into them.
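Here is a minimal sketch of such a daily job. It assumes hypothetical field names (`owner`, `state`, `ip`, `datacenter`) and three illustrative rules. The article does not list its actual rules. Failures are grouped by responsible engineer so that each person gets one email. Sending the mail (for example with Python’s `smtplib`) and scheduling the job are left out.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Asset:
    asset_id: str
    owner: str                # responsible ops engineer (hypothetical field)
    state: str
    ip: Optional[str]
    datacenter: Optional[str]


# Each hypothetical rule returns a problem description, or None if the asset passes.
Rule = Callable[[Asset], Optional[str]]
RULES: list[Rule] = [
    lambda a: "online asset has no IP" if a.state == "online" and not a.ip else None,
    lambda a: "offline asset still holds an IP" if a.state == "offline" and a.ip else None,
    lambda a: "data center is empty" if not a.datacenter else None,
]


def build_blacklist(assets: list[Asset]) -> dict[str, list[tuple[str, str]]]:
    """Run every rule on every asset and group failures by owner."""
    blacklist: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for asset in assets:
        for rule in RULES:
            problem = rule(asset)
            if problem:
                blacklist[asset.owner].append((asset.asset_id, problem))
    return dict(blacklist)


def render_mail(owner: str, items: list[tuple[str, str]]) -> str:
    """Plain-text body of the reminder email for one owner."""
    lines = [f"Hi {owner}, the following CMDB assets need correction:"]
    lines += [f"  {asset_id}: {problem}" for asset_id, problem in items]
    return "\n".join(lines)


assets = [
    Asset("srv-001", "alice", "offline", "10.0.0.5", "BJ-YZ-01"),
    Asset("vm-042", "bob", "online", "10.0.0.6", "BJ-YZ-01"),
    Asset("vm-043", "bob", "online", None, None),
]
for owner, items in build_blacklist(assets).items():
    print(render_mail(owner, items))
```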
Sample email:
Besides email, we also built a blacklist confirmation feature to push operations staff to correct data. After fixing an asset, the operations engineer confirms the fix in the blacklist. A second email sent in the afternoon then publishes everyone’s correction progress, so that colleagues can see it and hold each other accountable.
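The confirmation step and the afternoon progress summary could be sketched as shown below. `BlacklistEntry`, `confirm`, and `progress_report` are hypothetical names, and the report format is an assumption. A production version would probably re-run the failing audit rule before accepting a confirmation, rather than taking the engineer’s word for it.

```python
from dataclasses import dataclass


@dataclass
class BlacklistEntry:
    asset_id: str
    owner: str
    problem: str
    confirmed: bool = False


def confirm(entries: list[BlacklistEntry], asset_id: str, owner: str) -> bool:
    """Mark the owner's entry for asset_id as fixed. Return False if no such entry exists."""
    for entry in entries:
        if entry.asset_id == asset_id and entry.owner == owner:
            entry.confirmed = True
            return True
    return False


def progress_report(entries: list[BlacklistEntry]) -> str:
    """One line per owner: how many of their blacklisted assets are confirmed fixed."""
    tally: dict[str, list[int]] = {}
    for entry in entries:
        done_total = tally.setdefault(entry.owner, [0, 0])
        done_total[0] += int(entry.confirmed)
        done_total[1] += 1
    return "\n".join(
        f"{owner}: {done}/{total} fixed" for owner, (done, total) in sorted(tally.items())
    )


entries = [
    BlacklistEntry("srv-001", "alice", "offline asset still holds an IP"),
    BlacklistEntry("srv-002", "alice", "data center is empty"),
    BlacklistEntry("vm-043", "bob", "online asset has no IP"),
]
confirm(entries, "srv-001", "alice")
print(progress_report(entries))
```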
4.4 Penalty: clear ownership, with accountability assigned to individuals
Execution is another key guarantee of data accuracy. Automated inventory finds problem assets, but getting the data corrected promptly and effectively is a major challenge in its own right.
4.4.1 Visualizing rules and regulations
To address this execution problem, we drew up rules and regulations and made them visible as pages in the CMDB, so that every operations engineer can view and learn them.
The page prototype is as follows:
4.4.2 Combining the blacklist with the regulations
Once the blacklist and the regulations are both visible, there are clear rules to follow. When an asset’s data has a problem, the responsible operations engineer is first given time to correct it. If the error still exists after that, minor penalties of varying severity can be applied.
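The rule of giving time to self-correct and then escalating can be expressed as a function of how long an entry has stayed on the blacklist. The one-day grace period and the escalation ladder below are purely hypothetical placeholders. The article does not state its actual thresholds or penalties.

```python
from datetime import datetime, timedelta

GRACE_PERIOD = timedelta(days=1)  # hypothetical self-correction window
# Hypothetical escalation ladder, indexed by the number of grace periods elapsed.
PENALTIES = ["none", "reminder", "warning", "minor penalty"]


def penalty_level(flagged_at: datetime, now: datetime, fixed: bool) -> str:
    """Return the penalty for a blacklist entry, escalating once per lapsed grace period."""
    if fixed:
        return PENALTIES[0]
    lapsed = max(0, (now - flagged_at) // GRACE_PERIOD)  # timedelta // timedelta -> int
    return PENALTIES[min(lapsed, len(PENALTIES) - 1)]


flagged = datetime(2018, 5, 7, 9, 0)
print(penalty_level(flagged, flagged + timedelta(hours=5), fixed=False))          # none
print(penalty_level(flagged, flagged + timedelta(days=2, hours=1), fixed=False))  # warning
```

Capping the index at the last level keeps long-ignored entries at the maximum penalty, and the `max(0, ...)` guard keeps clock skew from producing a negative index.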
The page prototype is as follows:
Flow chart of the reward and penalty process:
5. Lessons learned
Throughout the construction of our CMDB, data accuracy has been a major challenge. Our lessons in this area are summarized as follows:
Assign each role’s responsibilities to specific people to reduce finger-pointing. Whatever the problem, find the person who owns that role.
In private cloud work order flow control, record the time of each asset data change in detail, so that all kinds of statistical needs can be met.
Separate the super administrator role from the developers, so that developers are not tied up chasing down bad data and normal development is not delayed.
6. Future roadmap
Blacklist-based asset locking
Automatic self-correction for some problem assets
