« The Great SSL Extended Validation Certificate Mystery | Main | Drupal: One Fix for 403-Forbidden Errors from RSS Aggregators »

06/18/2009

JBoss: Clustered Node Startup Failures

As far as I can tell, JBoss clustering is based on functionality provided by another JBoss project called JGroups. We recently ran into an issue where half of our six identically configured application servers would simply not start. As the servers were all generated from the same base image, server configuration was not thought to be the culprit. All nodes were on the same subnet so we were a bit puzzled. In the logs on the servers, we same messages that looked like the following:

ERROR [org.jgroups.protocols.pbcast.GMS] [some_host:some_port] received view <= current view; discarding it (current vid: [some_host:some_port|4], new vid: [some_host:some_port|4])

and/or

WARN [org.jgroups.protocols.pbcast.NAKACK] [some_host:some_port (additional data: 19 bytes)] discarded message from non-member some_host:some_port (additional data: 19 bytes)

When setting up a cluster of jboss servers, even though the docs don't really require it, your network administrators will appreciate it when you place all these servers on the same subnet. This is because JGroups uses IP Multicast pings to maintain membership in the cluster and network administrators HATE IT when you multicast across subnets. When you have dual NICs in your servers set to fail on fault, it's really nice when the primary NICs on each box is plugged into the primary switch but it's even nicer when half of your boxes are plugged into switch A as their primary and the other half are plugged into Switch B as their primary. However, when half are plugged into one switch and the other half are plugged into another switch, you need to be able to pass multicast ip traffic between these two switches, which was the problem in our case. So, if you should happen to come across this condition, you might want to check to see if multicast IP is enabled on your switches and if your primary and secondary switches are passing multicast traffic between them.

TrackBack

TrackBack URL for this entry:
http://www.typepad.com/services/trackback/6a01156fbc6fe6970c011572288445970b

Listed below are links to weblogs that reference JBoss: Clustered Node Startup Failures:

Comments