3 posts categorized "jgroups"

08/30/2010

Troubleshooting JGroups & Multicast IP

Although the JGroups manual implies that you should start looking for a new job if you cannot figure out what is wrong with JGroups and multicast IP, in the real world we systems administrators support many different types of "clusters": Oracle RAC clusters, GFS clusters, VCS clusters, Microsoft clusters, hardware clusters, load-balancing clusters with mod_jk or mod_proxy, and so on.  A JGroups cluster is simply one more type of cluster we support, and its default configuration uses multicast IP, which is not something we run into every day.

I've already written a couple of short posts on JGroups problems (see JBoss: Clustered Node Startup Failures & JBoss:  Overlooked Solution for JGroups-Related Startup Errors), but I thought a short troubleshooting guide might be helpful.  This post consolidates those two posts and adds some recently discovered information.  The howto leans towards running JGroups services under JBoss on a unix-like platform but, for the most part, the information in it applies to Windows as well.  I've not had much opportunity to work with JGroups over a WAN or across multiple VLANs, so if that is your setup, this post may not be all that helpful.  If, however, you are having problems with a JGroups cluster over a Cisco LAN and all your servers are on the same network segment (a fairly common deployment configuration), this howto is for you!

Let me know if you've come across other causes of multicast IP failures with jboss/jgroups and I'll add them to the HOWTO.

Here is the link to the new doc =>  HOWTO: Troubleshoot JGroups and Multicast IP Issues

09/04/2009

JBoss: Overlooked Solution for JGroups-Related Startup Errors

I came across this a little while ago, and the fix turned out to be so obvious that I feel really bad for not noticing it in the first place.  If your JBoss server participates in a cluster, will not start up, and you see the following message in your JBoss server.log:

UDP.createSockets(): cannot list on any port in range 0-1

Check to see if the IP address bound to your network interface card matches the IP address recorded for your hostname in /etc/hosts.  It probably does not.  Once the two match, JBoss should start up fine.
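The check described above can be sketched in a few lines of Java. This is only a rough illustration, not anything JBoss or JGroups ships: the class name HostsCheck and its two helper methods are made up for this example, and it assumes a unix-like box where the hostname reported by the JVM is spelled the same way as in /etc/hosts.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: compares the /etc/hosts entry for this host
// with the addresses actually bound to the local network interfaces.
class HostsCheck {

    // Returns the first address that hosts-file-style lines map to the
    // given hostname (case-insensitive), or null if there is no entry.
    static String addressFor(List<String> hostsLines, String hostname) {
        for (String raw : hostsLines) {
            String line = raw;
            int hash = line.indexOf('#');          // strip trailing comments
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            line = line.trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] fields = line.split("\\s+");  // address, then names
            for (int i = 1; i < fields.length; i++) {
                if (fields[i].equalsIgnoreCase(hostname)) {
                    return fields[0];
                }
            }
        }
        return null;
    }

    // True if the address is assigned to some interface on this machine.
    static boolean isBoundLocally(String address) throws Exception {
        return NetworkInterface.getByInetAddress(InetAddress.getByName(address)) != null;
    }

    public static void main(String[] args) throws Exception {
        String hostname = InetAddress.getLocalHost().getHostName();
        List<String> lines = Files.readAllLines(Paths.get("/etc/hosts"));
        String mapped = addressFor(lines, hostname);

        if (mapped == null) {
            System.out.println("No /etc/hosts entry found for " + hostname);
        } else if (isBoundLocally(mapped)) {
            System.out.println("OK: " + hostname + " -> " + mapped + " is bound to a local interface");
        } else {
            System.out.println("MISMATCH: " + hostname + " -> " + mapped + " is not bound to any local interface");
        }
    }
}
```

Saved as HostsCheck.java, it can be run directly with `java HostsCheck.java` on Java 11 or later. A MISMATCH line points at the situation described in this post.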

06/18/2009

JBoss: Clustered Node Startup Failures

As far as I can tell, JBoss clustering is based on functionality provided by another JBoss project called JGroups. We recently ran into an issue where half of our six identically configured application servers would simply not start. As the servers were all generated from the same base image, server configuration was not thought to be the culprit. All nodes were on the same subnet, so we were a bit puzzled. In the logs on the servers, we saw messages that looked like the following:

ERROR [org.jgroups.protocols.pbcast.GMS] [some_host:some_port] received view <= current view; discarding it (current vid: [some_host:some_port|4], new vid: [some_host:some_port|4])

and/or

WARN [org.jgroups.protocols.pbcast.NAKACK] [some_host:some_port (additional data: 19 bytes)] discarded message from non-member some_host:some_port (additional data: 19 bytes)

When setting up a cluster of JBoss servers, your network administrators will appreciate it when you place all the servers on the same subnet, even though the docs don't really require it. This is because JGroups uses IP multicast pings to maintain membership in the cluster, and network administrators HATE IT when you multicast across subnets. When the servers have dual NICs configured for failover, it's really nice when the primary NIC on each box is plugged into the primary switch, but it's even nicer when half of your boxes use switch A as their primary and the other half use switch B. In that second layout, however, multicast IP traffic has to be able to pass between the two switches, and that was the problem in our case. So, if you come across this condition, check whether multicast IP is enabled on your switches and whether your primary and secondary switches are passing multicast traffic between them.
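One way to check whether multicast traffic actually passes between two boxes is a tiny send/receive probe. The sketch below is an assumption-laden illustration, not part of JBoss or JGroups: the class name McastProbe is invented, and the group address and port shown in the usage note are placeholders that should be replaced with the values from your own cluster configuration. It uses only the standard java.net.MulticastSocket API and lets the operating system pick the network interface, so on a dual-NIC box the result reflects whichever interface the OS chooses by default.

```java
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.MulticastSocket;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;
import java.nio.charset.StandardCharsets;

// Hypothetical probe: run "receive" on one node and "send" on another.
class McastProbe {

    // Parses a literal address and rejects anything that is not multicast.
    static InetAddress parseGroup(String text) {
        try {
            InetAddress addr = InetAddress.getByName(text);
            if (!addr.isMulticastAddress()) {
                throw new IllegalArgumentException(text + " is not a multicast address");
            }
            return addr;
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("cannot parse address: " + text, e);
        }
    }

    // Sends ten small datagrams to the group, one per second.
    static void send(InetAddress group, int port) throws Exception {
        try (MulticastSocket socket = new MulticastSocket()) {
            socket.setTimeToLive(1); // same subnet, so no router hop is needed
            byte[] payload = "mcast-probe".getBytes(StandardCharsets.UTF_8);
            for (int i = 0; i < 10; i++) {
                socket.send(new DatagramPacket(payload, payload.length, group, port));
                Thread.sleep(1000);
            }
        }
    }

    // Joins the group and prints each datagram; gives up after 30 quiet seconds.
    static void receive(InetAddress group, int port) throws Exception {
        try (MulticastSocket socket = new MulticastSocket(port)) {
            socket.joinGroup(new InetSocketAddress(group, port), null);
            socket.setSoTimeout(30_000);
            byte[] buf = new byte[512];
            while (true) {
                DatagramPacket packet = new DatagramPacket(buf, buf.length);
                try {
                    socket.receive(packet);
                } catch (SocketTimeoutException e) {
                    System.out.println("Nothing received for 30 seconds");
                    return;
                }
                String text = new String(packet.getData(), 0, packet.getLength(), StandardCharsets.UTF_8);
                System.out.println("Received \"" + text + "\" from " + packet.getAddress().getHostAddress());
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 3) {
            System.out.println("usage: McastProbe send|receive <group> <port>");
            return;
        }
        InetAddress group = parseGroup(args[1]);
        int port = Integer.parseInt(args[2]);
        if (args[0].equals("send")) {
            send(group, port);
        } else {
            receive(group, port);
        }
    }
}
```

For example, start `java McastProbe.java receive 228.8.8.8 45566` on a box attached to switch A, then run `java McastProbe.java send 228.8.8.8 45566` on a box attached to switch B (both values are placeholders). If the receiver prints nothing while a sender on the same switch gets through, the link between the switches is a likely suspect.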