I started a new job a few weeks ago, and I'm now at the point where I'm investigating monitoring options. At past jobs I used Nagios, which I know will work, but I would like to look into other, more modern tools. I am aware that #monitoringsucks, and I'm pretty sure people have hashed out these topics before, but here are some of the things I want from a modern monitoring tool:
- Ideally open source, or if not, affordable per-host-per-month pricing (we have already signed up as paying customers of Boundary, for example)
- Installation and configuration should be easily scriptable
  - server installation, as well as adding/modifying clients, should be easy to automate so it can be done with Puppet/Chef
  - an API would be ideal
- Robust notification/alerting rules
  - escalations
  - service dependencies
  - event handler scripts
  - alerts based on subsets of hosts/services
    - for example, alert me only when 2 or more servers of the same type are down
- Out-of-the-box plugins
  - database-specific checks, for example
- Scalability
  - the monitoring server shouldn't become a bottleneck as more clients are added
  - Nagios is OK with 100-200 clients (with passive checks)
  - a hierarchy of servers should be supported
  - agent-based clients
- Reporting/dashboards
  - host/service status dashboards
  - downtime/outage dashboards
  - latency (for HTTP checks)
- Resource graphing would be great
  - but in my experience very few tools do both alerting and resource/metrics graphing well
  - in the past I used Nagios for alerting and Ganglia/Graphite for graphing
- Integration with other tools
  - send events to graphing tools (Graphite), alerting tools (PagerDuty), notification mechanisms (IRC, Campfire), and logging tools (Graylog2)
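The "alert only when 2 or more servers of the same type are down" rule boils down to grouping the latest check results by server type and counting failures. Here is a minimal sketch of that logic in Python, not tied to any of the tools discussed here; the `HostStatus` record, its `role` field, and the `roles_to_alert` function are hypothetical names chosen for illustration, not part of Nagios, Sensu, or any other tool's API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class HostStatus:
    """Latest check result for one host (hypothetical data model)."""
    name: str
    role: str  # the "type" of server, e.g. "web" or "db" (illustrative)
    up: bool


def roles_to_alert(statuses, min_down=2):
    """Return {role: sorted list of down hosts} for roles with >= min_down hosts down."""
    down_by_role = defaultdict(list)
    for status in statuses:
        if not status.up:
            down_by_role[status.role].append(status.name)
    # Only roles that cross the threshold produce an alert; a single
    # failed host of a given type is ignored by this rule.
    return {
        role: sorted(names)
        for role, names in down_by_role.items()
        if len(names) >= min_down
    }


if __name__ == "__main__":
    statuses = [
        HostStatus("web1", "web", up=False),
        HostStatus("web2", "web", up=False),
        HostStatus("web3", "web", up=True),
        HostStatus("db1", "db", up=False),
    ]
    for role, hosts in roles_to_alert(statuses).items():
        # Prints "ALERT: 2 web servers down: web1, web2"; db1 alone stays below the threshold.
        print(f"ALERT: {len(hosts)} {role} servers down: {', '.join(hosts)}")
```

In a real deployment the status list would presumably come from whatever the monitoring server already collects, and the `print` would be swapped for a notification hook; the point is only that the rule needs hosts to be tagged by type so they can be aggregated.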
I also asked on Twitter which monitoring tool people would recommend, and here are some of the tools mentioned in the replies:
- Sensu
- OpenNMS
- Icinga
- Zenoss
- Riemann
- Ganglia
- Datadog
Several people told me to look into Sensu, and a quick look at its home page tells me it would be worth giving it a whirl, so I think I'll do that next. Stay tuned, and please leave a comment if you know of other tools that might fit the profile I am looking for.
Update: more tools mentioned in comments or on Twitter after I posted a link to this blog post:
- New Relic (which I am actually in the process of evaluating, having paid for one host)
- Circonus
- Zabbix
- Server Density
- Librato
- Comostas
- OpsView
- Shinken
- PRTG
- NetXMS
- Tracelytics
via http://agiletesting.blogspot.com/2012/09/what-i-want-in-monitoring-tool.html