Having gone through all the pain of polishing an OpenStack installation, I can see why it might seem like a better idea to grab a couple of beers in a local bar than to spend additional hours fighting Nagios or Cacti. In the world of bare metal, these tools seem to do the job: they warn about abnormal activity and provide insight through trend graphs.

With the advent of the cloud, however, proper monitoring has started getting out of hand for many IT teams. On top of the relatively predictable hardware, a complicated pile of software called “cloud middleware” has been dropped in our laps. Huge numbers of virtual resources (VMs, networks, storage) are coming and going at a rapid pace, causing unpredictable workload peaks.

Proper methods of dealing with this mess are still in their infancy, and could even be worth launching a startup around. This post is an attempt to shed some light on the various aspects of cloud monitoring, from the hardware to the user’s cloud ecosystem.