Help Wanted: Generating Remote Thread Dumps for Large Numbers of Servers
A recent commenter on this site asked for help generating thread dumps remotely and quickly across a large number (30+) of servers whenever an incident occurs. I thought it would be a good idea to ask anyone reading this site how you might be doing it.
When a failure occurs within his web application, he must remotely generate thread dumps across 34 load-balanced JBoss systems and restore service as quickly as possible. Since I've been experimenting a lot with Jmx4Perl and we are talking specifically about JBoss servers, a Jmx4Perl script that iterates through a list of servers would certainly be worth a try, because Jmx4Perl has built-in methods that would help. Is anyone out there doing something similar today? JMX-Console is too slow for his purposes and, depending upon the failure, both JMX-Console and Jmx4Perl might not be usable at all (for example, if the HTTP connector is dead).
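To get the discussion started, here is a rough Python sketch of the "iterate through a list of servers" idea. It is not Jmx4Perl and not something the commenter is running; it swaps in SSH plus the JDK's jstack tool, which does not depend on the HTTP connector being alive. Everything specific in it is an assumption: the host names are made up, key-based SSH access is assumed, jstack and pgrep are assumed to be installed on each server, the SSH user is assumed to be the one that owns the JBoss process, and the JVM is assumed to have 'org.jboss.Main' on its command line.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical host names; replace with the real list of load-balanced nodes.
HOSTS = [f"jboss{n:02d}.example.com" for n in range(1, 35)]


def ssh_thread_dump(host, timeout=30):
    """Run jstack against the remote JBoss JVM over SSH and return its output.

    Assumptions: key-based SSH login as the user that owns the JVM, jstack
    and pgrep available on the remote host, and a JVM command line that
    contains 'org.jboss.Main'.
    """
    remote_cmd = "jstack $(pgrep -f org.jboss.Main | head -n 1)"
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, remote_cmd],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip() or f"exit code {result.returncode}")
    return result.stdout


def collect_dumps(hosts, dump_fn=ssh_thread_dump, max_workers=34):
    """Call dump_fn once per host, in parallel.

    Returns (dumps, errors): two dicts keyed by host name, so one dead
    server does not stop the dumps from the others being collected.
    """
    dumps, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every host first so all the requests run concurrently.
        futures = {host: pool.submit(dump_fn, host) for host in hosts}
        for host, future in futures.items():
            try:
                dumps[host] = future.result()
            except Exception as exc:  # timeouts, SSH failures, missing JVM
                errors[host] = str(exc)
    return dumps, errors
```

Calling collect_dumps(HOSTS) returns the dump text per host plus a separate dictionary of failures, which you could then write out to one file per server. Running the requests in a thread pool rather than a simple loop matters here because the goal is to capture all 34 servers at roughly the same moment and get back to restoring service.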
I would love to hear what other people are doing to manage large clusters of JBoss systems, especially in the area of remotely generating thread dumps. Please brag about it in the comments! :) For the sake of the solution, let's assume that we do not know why he has to remotely generate thread dumps across this many systems.