Dear lazyweb: Hyperic, Zenoss?


Sysadmins on the lazyweb: I’ve used Nagios for years, accompanied by either a homebrew trending/graphing package or Munin. Recently I’ve had a few people draw my attention to Hyperic, and from there I’ve been looking at Zenoss Core as well.

If any of you have experience with Hyperic or Zenoss, and especially if you’ve left Nagios for either, I’d love to hear what you think, whether it be a sales pitch or a warning.


27 responses to “Dear lazyweb: Hyperic, Zenoss?”

  1. Sales pitch: http://www.opennms.org
    That’s all the sales you get because that’s all the sales we have (grin). We have integration with Hyperic and we have users moving from Nagios to OpenNMS almost every day in the community. And, believe it or not, because of our support for Nagios, we have OpenNMS users that check out Nagios (check scripts, NSClient, and NRPE) for the first time, too. BTW: I don’t think you can go wrong with either Hyperic or Zenoss. Just thought I’d send you the sales pitch for OpenNMS since you asked (grin). Follow the quick start guides for easy installation of 1.5.90 using apt-get or yum.

  2. Sales pitch as well: http://zenoss.com
    I’m the Community Manager for Zenoss, so take my comments with the appropriate grain of salt. Depending on your needs, the major difference between Zenoss and Hyperic is remote monitoring vs. agent-based monitoring. Hyperic requires a central server with agents installed on the machines to monitor. Zenoss uses protocols such as SNMP, SSH, Syslog and WMI to remotely monitor machines without a footprint on the machine. Hyperic is Java-based, Zenoss is Python. Both have facilities for moving some of your Nagios investment to the new platforms. Both are GPLv2 and have sizable communities and install bases. All that said, I’m going to push Zenoss for our flexibility and extensibility; we have a lot of users doing some really interesting extensions and customizations. We also have a 2.2 beta that is firming up with improved installers, documentation and bug fixes. Give it a try (we have VMware images for testing) and let me know if you have any questions.

    Thanks,
    Matt Ray
    Zenoss Community Manager
    mray@zenoss.com

  3. I am not a huge fan of Nagios. It is a good tool sure but the configuration is rough, it lacks some functionality I want, and overall I think there are better ways of doing things. I’ve not yet found a suitable replacement, however. Keep us posted. ;-)

  4. Argh. Well… I guess I’ll join the party. I usually like community to answer community questions – somehow, I think that is just karmically right. But since everyone else is giving the sales pitch, we might as well too!

    Sales pitch, part 3: http://www.hyperic.com
    Most of the points that Matt made are true – especially the difference between agent-based and remote monitoring. I would describe agentless monitoring (like Nagios) as great for service checking (think red light-green light: it’s up or it’s down). Hyperic can do that too (you have to for network devices!), but really we shine at the systems and applications management layer. We have an agent, which means we natively discover all the processes running on the machine, import them into inventory, can collect config changes, log data and all kinds of system events, and show them in context with the litany of performance metrics. We can also run control actions (stop, restart, vacuum, garbage collection, etc) and run remote diagnostics (top, ifconfig, etc). This is where monitoring really turns to management. We also provide an easy way to organize your clusters, groups and applications the way you see fit. The management piece and the organization piece are both reasons for going agent-based, and are why we are usually used for systems and applications management. For agentless network management, Dave is right to send their sales pitch – we see a lot of Hyperic-OpenNMS pairs out there in the community. I’d imagine there are Hyperic-Zenoss pairs too, but I haven’t seen any paid-customer versions of this yet and the community hasn’t fessed up yet ;-)
    Like all the products, we’re really extensible too. From a managed product perspective we support over 70 of the industry’s favorites out of the box, and our community has built somewhere near 2000 more (unfortunately… not nearly that number have been shared – lots of them are mywebapp style plugins… but there are a lot on our HyperFORGE). We also have a plugin-based UI framework now with HQU, which uses Groovy to abstract the complex Java code and makes it easy to create and add new screens to a live deployment (hint: check out the HyperCAST webinar on this where we add a JIRA integration from start to finish in 9 minutes) or drive Hyperic through scripts or web services. This is how we did the OpenNMS integration, in fact.

    Good luck and I hope to see you out on our forums!
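
The “red light-green light” style of agentless check described in the comment above can be sketched in a few lines of Python. This is only a minimal illustration of the idea (try to open a TCP connection from the monitoring host, report up or down), not how Nagios, Hyperic, OpenNMS or Zenoss actually implement their checks. The function name `check_tcp_service` and the example targets are hypothetical.

```python
import socket


def check_tcp_service(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True ("green") if a TCP connection to host:port succeeds,
    False ("red") if it is refused, unreachable, or times out."""
    try:
        # Nothing is installed on the target; we only connect from outside.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical targets; replace with real hosts and ports.
    targets = [("127.0.0.1", 22), ("127.0.0.1", 80)]
    for host, port in targets:
        status = "UP" if check_tcp_service(host, port) else "DOWN"
        print(f"{host}:{port} {status}")
```

A check like this says only whether the port answers. Process inventory, log data and control actions, as the comment notes, need something running on the machine itself.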

  5. Beyond the sales pitch. I have tested several Open Source Monitoring platforms in order to select one for our company. The ones I played with: Nagios (Oreon/Centreon and Groundwork), OpenNMS, Zabbix, Zenoss, Hyperic, Pandora FMS and several others.

    Although I like Hyperic a lot, their Open Source offering is not adequate for a multi-role environment due to its limitations, and neither is GroundWork Open Source. Many consider these products castrated and only a tease. We are using several community editions of apps like Nessus and Vyatta routers, and after using them for a while we decided to pay the Nessus fee and for Vyatta support (we are not freeloaders, but we think that if you have a community version, please make it real and useful). There are several full-featured, top-of-the-line monitoring solutions where you can get support if you want it but are not obligated to: Centreon (Nagios), OpsView (Nagios), Pandora FMS, Zabbix, OpenNMS, etc. Trust me, if you deploy any of them in a real production environment you will happily pay the subscription for support to have help right away and peace of mind. Once you start using any of them, they become central to your operations and management, and above all you will expect them to be up like any of your crucial apps.

    Thankfully Zenoss, in their new version 2.2, released Active Directory/LDAP and role functionality, so their Core product is usable in a production environment with several admins. Zenoss is supposed to be agentless, but unless you get the Enterprise edition you can’t get enough information via SNMP from Windows devices. With the Core version you will need a third-party agent (free), so it really is like installing an agent, at least for Windows clients. It is also a resource hog; it will consume your memory and paging like it is St. Patrick’s Day. It needs a lot of definition, but once it is configured it is very nice.

    Pandora FMS is very similar to Hyperic in their vision for monitoring. They describe themselves as Application and Network monitoring. They use an agent and, like Hyperic, they can go beyond what Nagios or Zenoss can monitor. It is very easy to manage a device, stocks, environment, you name it. The community edition is full-featured and support is optional. They also make Babel Enterprise, a Security Information Management app. Babel, along with OSSIM and Prelude LML, is one of the most prolific Open Source Security Management projects today.

    Zabbix 1.6 is in beta and it looks very promising. I like the way you can configure almost anything using triggers, and their graphics rock. It uses an agent to mine the info. The new version really has a lot of improvements over the 1.4 stable release. ALL features of Zabbix are available in the community version and support is optional.

    At the time of this post the new version of (Nagios) Centreon 2 is in beta, but if you look at the demo on their site you will see what Nagios looks like when it is “all that it can be”. From functionality to a well-thought-out interface, this should be the standard by which other Nagios-based solutions are measured. It is completely open and free, and support is optional. The bad: almost all the documentation, and the users who post about problems and resolutions, are in French. This has been the only obstacle keeping us from considering it for our network.

    All these full-featured solutions are great, and selecting one will depend on your monitoring goals and network situation. You can try all of them as virtual machines to get an idea of their strengths and weaknesses.

  6. Hi,

    Did you evaluate Zabbix/Zenoss/OpenNMS/… ? Or have you started to use one of them?
    If so, I’m curious to know what your favorite NMS is now.

    thanks
    Vitaly

  7. Vitaly: Not yet, no. Still nagios and munin. They’re working, so looking into more NMS-type alternatives is pretty low priority (especially since we’re doing PCI compliance right now, which makes everything else pretty low priority!)

  8. I have been struggling with the various tools for years. Nagios still seems to be the most useful for a small enterprise, even with its horrible configuration. Coupled with Munin (well, “coupled” may be a bit optimistic) it is usable, if not wonderful.

    I have attempted several times to look at the videos on the Hyperic website and they lock up IE, Firefox and any other browsers I have used. Now their website may not reflect the software, but a failure there sure makes one think.

    Zabbix seems to be coming along — I have evaluated it several times since inception. I just wish it would have embraced RRD.

    I was really excited about Open-NMS until I tried to get Java to behave. I really hope Sun’s exposure of Java to the OS community helps turn the “Design once, run nowhere” can of worms into something useful. (No flames please — I really want Java to succeed, but the pain is intense.)

  9. John,

    Feedback on your Java issues is important to see that progression. We’re working with the Fedora community to help resolve some of these issues with OpenJDK. I will say that ever since we reworked the packaging for RPM (yum) and DEB (apt), the install has seemed quite painless on mainstream distros.

    David
    The OpenNMS Group

  10. I’ll pipe in too.

    I’m in the midst of consolidating the event monitoring from Nagios and Big Brother. The Big Brother and Nagios installs were inherited from past admins, and while they can work in my environment (600+ servers) they are a serious pain to administer if you also have other roles to fill. Also, the somewhat cryptic nature of their knowledge bases is problematic for quick turn-around times.

    I have been a big Nagios fan from the past but as I move into larger and more complex implementations, critical elements like object and event management really start to drag down the return on effort from that particular system.

    So far our Zenoss migration is going fairly well and I am phasing hosts out of the two other systems. We will likely go to the enterprise model just so I can get some quicker resolution to ‘odd’ behavior items (like right now, when certain events occur, a select group of zenoss users are paged with no clear rhyme or reason why…).

    The ability to manage the objects and their related properties and methods that Zenoss offers is a great model. OpenNMS does this as well, but it is more Java-based and I am loath to go in that direction, as debugging Java behavior is not one of my favorite activities.

    I’ll post more about this on my own blog as I move forward.

  11. I’m also looking for a new NMS system to replace a mixture of Cacti and What’s Up Gold. I tried Zabbix in the past and will be evaluating it again soon; Zenoss I didn’t like much. I tried the VM of OpenNMS and liked what I saw, so I tried installing it on our Debian etch server (followed all the docs), but it fails with a dependency error (version to be installed is too new). So far I’ve not had a response to my forum post about it, which is at least 3-4 weeks old :-( hence the reason for trying Zabbix again.

    Good Luck, hopefully you will get less adverts and more real usage reports from users.

    Wayne

  12. Wayne,

    We have all been pretty busy at the developer’s conference the past week and the week prior… preparing. Also, I see that you posted to the install list… which is correct, but I typically hang out on the discussion list even though since I’ve been out the past 3 weeks I’m about 500 messages behind.

    I’m glad that the VM worked out and looked interesting to you. My reply basically states to configure your apt sources.list to use the “unstable” release (soon to be 1.6-stable).

    deb http://debian.opennms.org unstable main
    deb-src http://debian.opennms.org unstable main

    After that, thanks to the magic of packaging, you should be able to just apt-get install and be on your way to enjoying OpenNMS.

  13. Update to my post on Jul 11 2008.

    I have been testing Zenoss v2.2 (Installer) a lot on our network, on 64-bit CentOS 5.2, and have had a bunch of crazy behavior like the other users, even with LDAP authentication. It is slow and sometimes hangs when trying to list devices. I’m running a Dell 2950 with 4 Xeon processors and 4 GB of RAM. There is still a lot of paging with only 30 devices. I even got Michael Badger’s book “Zenoss Core Network and System Monitoring”, but I think I’m finally done with Zenoss.

    Tested OpsView v2.12 (http://www.opsview.org) again and like it more than before. Once you have played with Nagios in its basic form and then find an implementation with easy configuration, distributed capabilities and great pre-made reports, it is hard not to like it. The only caveat is that it still runs on Nagios 2. Hopefully their v3 will be as good as this one. You can get a couple of movies of the product in action at their site.

    More info on Centreon2 (incredible Nagios 3 implementation), it has hit RC1 status. Also Zabbix announced their 1.6 release for October.

  14. I’ve spent time over the last couple of months building a comparison document between Nagios, OpenNMS and Zenoss. It is available in draft form at http://www.skills-1st.co.uk/papers/jane/open_source_mgmt_options.pdf .

    It is designed for 2 audiences – those who want to compare product bullets based on requirements criteria, and those who want a fairly in-depth look at each of the 3 main products.

    As I said, this is DRAFT – comments would be welcomed.

    Cheers,
    Jane

  15. Hi there!
    I found the comparison document really interesting, but haven’t you heard about Osmius?

    Osmius is an Open Source monitoring tool prepared to monitor almost anything connected to a network. They have an Agent Development Framework, and the engine is built in C++ over ACE (so it is truly multiplatform and very, very fast).
    The business view is already integrated using SLA and services in an easy way to understand.

    There’s a lot of work to do and it would be nice to receive your impressions and improvements.

  16. I have been using OpenNMS to monitor my clients’ routers and public services – SMTP, FTP, POP, HTTP, HTTPS etc. as well as our internal systems for about 8 months.

    It works great for this up/down type of scenario – simple to set up and get the services monitored using discovery.

    The alerting and escalation works for us as well.

    The downside is that OpenNMS is not multi-tenant (yet), so I can’t use it to monitor their internal networks and have all the data available on one web console.

    I tried Zabbix last year and dropped it. It was difficult to set up and I prefer the agent-less model.

    That was Zabbix 1.2 I think; it looks like they are at 1.6 now. They say they improved installation and their distributed monitoring model, so it may be time for another look.

  17. Jane Cury, good paper and podcast. I agree with your conclusion: Zenoss would be a great tool if only they were more careful with QA and testing for releases. I have been testing v2.2.4 and it is working fine, especially for Windows devices.

    There are still problems with Google Maps, Locations, etc., due to the coding of relations in Zope’s “non-relational” object DB. Hands down the easiest setup of all the tools, with the installer and the modeling of Windows servers, compared to any other open source solution.

  18. Updates 1st:

    – Zabbix 1.6 is out. I had some problems with the install on CentOS 5.3 and will try again this coming week. Saw a demo and it looks great. Will let you know after my own test.

    – OpsView announced on its list that it is releasing an update this coming Monday and announcing their V3 road map (based on Nagios 3). They also announced they were acquired by Opsera.com.

    Joselus, thanks for the info on Osmius. Wow, an offering based on logical thinking and integration of business needs as well as technical needs. Refreshing!!! Installed and tested Osmius, very impressed by it. I would have killed for Zenoss Core to have at least a “basic” online capability of generating “printable” and HTML reports for management and decision makers, like Osmius has.

    Osmius is awesome and the install is a breeze with the universal installer. The only thing I do not like is that you need to do too much work to compile and set up the agents for the devices you will monitor. Peopleware should create an agent with a setup program for Windows and another for Linux, like others have done, for easy deployment.

    If you are looking to monitor a lot of Windows servers, then I would use Zenoss for now instead of Osmius, due to its CMDB and the ease of deployment of SNMP Informant.

    Oh yes, despite what you read in Jane’s comparison paper or anywhere else about agentless monitoring with Zenoss, it is not so for Windows. Zenoss recommends the free basic edition of SNMP Informant, since Zenoss can gather info natively from it like others do with their agents. And it works great at getting all kinds of WMI data from Windows!!!

  19. Jane Cury, thanks for sharing your research – I appreciate how you provided it in the Open Source spirit. I learned quite a bit from it.

    I am curious if anyone else sees multi-tenancy as a big plus or minus for NMS systems.

    I’d like to hear more experiences with how these products compare with regards to distributed network monitoring and their usefulness in an MSP scenario.

    I didn’t consider Zenoss because of their business model of offering a stripped-down product and then paid versions. I’d rather get the complete product and pay for expert support. I’d need to purchase the Zenoss Enterprise edition to get the Distributed Collector Package. I may as well buy a fully commercial product.

    Couple of questions: Is there a reason (other than it’s impossible to review all products) that Zabbix didn’t make the list – or get mentioned?
    In your experience, is multi-tenancy not generally considered an important feature?

    Thank you.

  20. David Hustace,
    From reading the OpenNMS forums I know multi-tenancy is asked about for OpenNMS and that it is a major change to implement it. I thought I read it was on the list to be released this year.

    Is there an update on that schedule?

    I really like the OpenNMS product – thanks for all the hard work!

  21. I’d like to throw one more system into the mix – http://www.logicmonitor.com/services/logicmonitor-hosted-monitoring-service/

    Again, I work there, so
    How are we different?
    – LogicMonitor is a hosted software-as-a-service, so startup time is trivial, always on latest features, no hardware cost, no install time, etc. (There is an agent that runs on a linux or windows host in your datacenter, but it basically acts as a clever proxy, and only makes outgoing SSL connections, so trivial to set up.)
    – we focus on datacenter monitoring, more than internal IT. So we have fabulous monitoring for things like Netscalers, NetApps, sql server, windows, linux, routers, etc. We do not (at the moment at least) cover printers, etc.
    – monitoring for all datacenter devices is ‘out of the box’ (or off the web). Just enter the hostname, and all attributes will be found (all VIPs on a load balancer; all volumes on a NetApp, etc.) And kept up to date with no configuration.
    – Eliminating configuration is a big focus. Any configuration that does need doing (for credentials, escalation policies, threshold, etc) is able to be done globally, on a group, host or instance (e.g. individual drive) level.
    – there is a default monitoring assumption, appropriate for datacenters. e.g. on a server running a database, typically about 100 datapoints will be plotted and/or alerted on, with no configuration. Intelligent thresholds that combine multiple datapoints mean there is no alert overload. Also, intelligent filtering on discovery saves lots of time. (e.g. discover all volumes with ‘QA’ in their name into a different collection, with different thresholds and escalations, automatically, now and in future.)
    – it’s designed to report and monitor the data most meaningful in datacenter environments. e.g. per-volume read and write latency and IO operations for NetApps; per-virtual-IP requests and traffic on load balancers, etc.

    So it’s not free, but LogicMonitor will save you a lot of time over setting up open source systems, especially if you have a lot of changes in your systems and would have to keep modifying the monitoring.
    And one nice thing about being hosted is that there is no expensive commitment: you can elect to just monitor load balancers and storage arrays, say.