10/10/2012

BigIP: Fixing MQ Read and Write 104 X('68') Errors

F5 published a deployment guide, "Deploying the BigIP LTM with IBM WebSphere MQ", which is a very good how-to for load-balancing MQ with a BigIP and covers all the steps needed to get up and running quickly.

I followed all of these steps, and my developers were off and running. The problems started a little later, when the application was unable to read messages off the queue. Errors like the following were showing up in the MQ "client" logs:

AMQ9208: Error on receive from host 10.X.Y.Z (10.X.Y.Z).
 
EXPLANATION:
An error occurred receiving data from 10.X.Y.Z (10.X.Y.Z) over
TCP/IP. This may be due to a communications failure.
ACTION:
The return code from the TCP/IP read() call was 104 (X'68'). Record these
values and tell the systems administrator.

We also saw errors like the following in the MQ "server" logs:

AMQ9206: Error sending data to host 10.a.b.c
(10.a.b.c)(port#).
 
EXPLANATION:
An error occurred sending data over TCP/IP to 10.a.b.c
(10.a.b.c)(port#). This may be due to a communications failure.
ACTION:
The return code from the TCP/IP(write) call was 104 X('68'). Record these
values and tell your systems administrator.

If the BigIP VIP was bypassed, the errors on both ends of the connection went away. No firewalls separated one MQ node from another. Beyond citing a "network failure", IBM support wasn't much help. Initial network traces showed that I had not followed the deployment guide's recommendations verbatim: the MQ nodes were sending keep-alive requests every 300 seconds, which matched the BigIP's 300-second TCP idle timeout. It was easy to assume, then, that the BigIP was resetting the idle TCP connection at about the same moment the MQ heartbeat went out. However, reducing the MQ heartbeat interval below the BigIP's TCP idle timeout did not resolve the issue.

I found the resolution with the help of this F5 Solution Article: "SOL8049: Implementing TCP Keep-Alives for server-client communication using TCP profiles". Add a Keep Alive Interval to the TCP profile (or create a new TCP profile), set it to half of the profile's idle timeout (or an even smaller value), restart the MQ channels, and these errors should go away.
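The half-the-timeout rule of thumb boils down to keeping every silent gap on the connection shorter than the BigIP's TCP idle timeout. The Python sketch below models that under a deliberately simplified assumption: the BigIP drops a flow once it has seen no packets for idle_timeout seconds. The function names are hypothetical, "mq_tcp" is a placeholder profile name, and the tmsh attribute names should be checked against the tmsh reference for your software version before you run anything.

```python
from typing import List, Optional


def recommended_keepalive(idle_timeout: int) -> int:
    """Half of the TCP idle timeout, in seconds (the rule of thumb above)."""
    return max(1, idle_timeout // 2)


def idle_gaps(quiet_period: int, keepalive_interval: Optional[int]) -> List[int]:
    """Lengths (seconds) of the silent gaps on a connection that carries no
    application data for quiet_period seconds.

    Without keep-alives the whole quiet period is one gap; with them, a probe
    is sent every keepalive_interval seconds (simplified model).
    """
    if keepalive_interval is None:
        return [quiet_period]
    if keepalive_interval <= 0:
        raise ValueError("keepalive_interval must be positive")
    gaps = []
    remaining = quiet_period
    while remaining > 0:
        step = min(keepalive_interval, remaining)
        gaps.append(step)
        remaining -= step
    return gaps


def survives(idle_timeout: int, quiet_period: int,
             keepalive_interval: Optional[int] = None) -> bool:
    """True if no silent gap reaches the idle timeout."""
    return all(gap < idle_timeout
               for gap in idle_gaps(quiet_period, keepalive_interval))


def tmsh_profile_command(profile: str, idle_timeout: int) -> str:
    """Build a tmsh command for a TCP profile that inherits from the default
    tcp profile. 'profile' is a placeholder name of your choosing."""
    return (
        f"tmsh create ltm profile tcp {profile} defaults-from tcp "
        f"idle-timeout {idle_timeout} "
        f"keep-alive-interval {recommended_keepalive(idle_timeout)}"
    )


if __name__ == "__main__":
    # 300 s idle timeout, 15 minutes with no application traffic.
    print(survives(300, 900))       # False: a single 900 s gap
    print(survives(300, 900, 300))  # False: each gap equals the timeout
    print(survives(300, 900, 150))  # True: every gap is 150 s
    print(tmsh_profile_command("mq_tcp", 300))
```

The model ignores TMOS internals entirely; treat it as an illustration of why the interval should sit comfortably below the idle timeout rather than exactly at it.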

09/26/2012

BigIP V11 Upgrade Reminder: "$::" in iRules Will Not Work

A reminder to myself and others scheduling upgrades to version 11 of the BigIP software: although the matchclass command (DevCentral login required) is deprecated, it still works as of v11.2.0. Data groups referenced in your iRules with the $::data_group_name syntax, however, will cause TCL errors in your LTM logs and connection resets for your end users.

If you come across error messages like this in the LTM logs:

MMM DD HH:MM:SS tmm err tmm[PID]: 01220001:3: TCL error: /Partition/the_offending_irule - can't read "::your_data_group": no such variable while executing "matchclass [IP::client_addr] equals $::your_data_group"

The quickest way to fix the rule and restore service is to remove the '$::' from in front of the data group name in your iRule and wrap the name in double quotes (for example, $::your_data_group becomes "your_data_group").
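If many iRules need the same fix, the edit is mechanical enough to script. A minimal Python sketch, assuming you have exported each rule's text and already know your data group names; fix_data_group_refs is a hypothetical helper, not an F5 tool. It only rewrites the names you pass in, because $:: is also TCL's syntax for ordinary global variables, which must not be quoted.

```python
import re
from typing import Iterable


def fix_data_group_refs(rule_text: str, data_groups: Iterable[str]) -> str:
    """Replace $::name with "name" for each listed data group.

    Names not in data_groups (for example, global variables that also use
    the $:: prefix) are left untouched.
    """
    # Longest names first so a longer name wins over its own prefix.
    names = sorted(set(data_groups), key=len, reverse=True)
    if not names:
        return rule_text
    # \b keeps $::dg from matching the start of $::dg_extra.
    pattern = re.compile(r"\$::(" + "|".join(map(re.escape, names)) + r")\b")
    return pattern.sub(lambda m: '"' + m.group(1) + '"', rule_text)


if __name__ == "__main__":
    before = 'if { [matchclass [IP::client_addr] equals $::your_data_group] } {'
    print(fix_data_group_refs(before, ["your_data_group"]))
    # if { [matchclass [IP::client_addr] equals "your_data_group"] } {
```

Review the resulting diff before loading the rule back onto the BigIP.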

To avoid future matchclass incompatibilities, it may be wise to migrate rules that use matchclass to the newer class match command. Syntactically, the migration looks just as easy: replace matchclass with class match (although I haven't yet done this in my own rules that use matchclass).
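Before migrating, it can help to inventory where matchclass is still used. The Python sketch below (matchclass_audit is a hypothetical helper) lists each line that calls matchclass alongside a candidate class match rewrite. It assumes, as the paragraph above suggests but does not confirm, that the argument order carries over unchanged, so check each suggestion against the class command documentation for your version rather than applying it blindly.

```python
import re
from typing import List, Tuple

# Whole-word match, so names such as my_matchclass_dg are not flagged.
MATCHCLASS = re.compile(r"\bmatchclass\b")


def matchclass_audit(rule_text: str) -> List[Tuple[int, str]]:
    """Return (line_number, suggested_line) for every line that still calls
    matchclass, with matchclass swapped for class match."""
    findings = []
    for lineno, line in enumerate(rule_text.splitlines(), start=1):
        if MATCHCLASS.search(line):
            findings.append((lineno, MATCHCLASS.sub("class match", line)))
    return findings


if __name__ == "__main__":
    rule = (
        "when CLIENT_ACCEPTED {\n"
        '  if { [matchclass [IP::client_addr] equals "your_data_group"] } {\n'
        "    reject\n"
        "  }\n"
        "}"
    )
    for lineno, suggestion in matchclass_audit(rule):
        print(lineno, suggestion)
```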

08/25/2012

HTTP Monitors Behave Differently in BigIP LTM v10/v11

I needed to create a custom HTTP monitor for an Adobe CQ implementation. We had upgraded our environment from v9 to v10 (and then to v11) only fairly recently, and none of my existing monitors was suitable, so this was the first custom HTTP monitor I had to set up on our BigIPs since the upgrade.

The first thing I noticed after the v10 upgrade was that the Send String in the default monitors now included a carriage return/line feed (CRLF) after the resource (i.e., GET / became GET /\r\n). A typical CQ implementation may use Apache web servers as the customer- or internet-facing systems, while the internal publishing servers continue to run the Java container that identifies itself as Day-Servlet-Engine/4.1.24.

If everything is working as expected, a request for /sample/resource/page.html that succeeds against the customer-facing web servers should also succeed against the publishing servers. So I set up a monitor that periodically issued GET requests for /sample/resource/page.html. The send string in the HTTP monitor looked similar to this: GET /sample/resource/page.html\r\n

When applied to the Apache pool, the monitor worked as expected, but the same monitor applied to the publishing tier marked those servers down. cURL requests for the exact same resource worked as expected against servers in both pools. The only thing that failed was the HTTP monitor against the publishing servers.

As it turned out, the CQ publishing server's HTTP listener needed two CRLFs. Luckily, changing the send string in the HTTP monitor from GET /sample/resource/page.html\r\n to GET /sample/resource/page.html\r\n\r\n worked against both Apache and the Day servlet engine.
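HTTP/1.x ends a request's header section with an empty line, so a request that stops after a single CRLF can leave a server waiting for more input. That is a plausible explanation for the difference, though the troubleshooting above did not confirm the Day servlet engine's internals. The Python sketch below (hypothetical helper names) converts a send string typed with literal \r\n escapes into the bytes that go on the wire and checks for that terminating blank line.

```python
def decode_send_string(send: str) -> bytes:
    r"""Turn a send string typed with literal \r and \n escapes (as entered
    in the monitor's Send String field) into the bytes sent on the wire."""
    return send.replace("\\r", "\r").replace("\\n", "\n").encode("ascii")


def ends_header_section(payload: bytes) -> bool:
    """True if the request ends with an empty line (CRLF CRLF)."""
    return payload.endswith(b"\r\n\r\n")


if __name__ == "__main__":
    one_crlf = decode_send_string(r"GET /sample/resource/page.html\r\n")
    two_crlf = decode_send_string(r"GET /sample/resource/page.html\r\n\r\n")
    print(ends_header_section(one_crlf))  # False
    print(ends_header_section(two_crlf))  # True
```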

Note to self: some app servers need two CRLFs in the HTTP monitor.