At Capitoline, we have developed our own data centre management and operations methods, based largely on our work with existing related standards and with our data centre audit customers, including the Amsterdam Internet Exchange.
We thought a good place to start establishing good practices would be to analyse why data centres go wrong and then put in place practices that prevent these failures. Information on this has been published before, but usually by manufacturers, who have a specific interest in justifying demand for their own products, or occasionally by users such as Google, who are in no hurry to give much away about their own shortcomings. As a result, the available information is inconsistent and lacks a common reporting terminology.
Over a sixty-month period, we identified 219 major failures. This paper explains what caused them and how you can prevent the same failures from happening at your own facility.