Tuesday, March 13, 2012

ClusterXL flapping troubleshooting - short HOWTO

ClusterXL is one of the most interesting, and yet hardest to handle, parts of the Check Point product line. This post summarizes some basic troubleshooting steps for dealing with cluster instability.

Symptoms: a SPLAT-based firewall cluster is "flapping". Members periodically change status from Active/Standby to Down. In SmartView Tracker you can see log entries such as "member 1 is down" and "member 2 changes state to active", as well as messages about connectivity problems on cluster interfaces:


cluster_info: (ClusterXL) member 2 (192.168.0.6) is down (Interface Active Check on member 2 (192.168.0.6) detected a problem (14 interfaces required, only 13 up).)
cluster_info: (ClusterXL) interface Mgmt of member 2 (192.168.0.6) is down (receive up, transmit down) 
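
To see which interface is flapping most often, messages like the ones above can be counted per member and interface. This is only a rough sketch: it assumes the log entries are available as plain text lines in exactly the layout shown above, and the function name and the file name in the usage comment are made up for illustration.

```shell
#!/bin/sh
# Count "interface X of member N ... is down" messages per member/interface.
# Reads log lines on stdin and prints "<count> member <N> <interface>".
# Assumption: the message text matches the sample entries shown above.
summarize_flaps() {
    sed -n 's/.*interface \([^ ]*\) of member \([0-9][0-9]*\) .* is down.*/member \2 \1/p' |
        sort | uniq -c |
        awk '{ $1 = $1; print }'   # normalize the padding added by uniq -c
}

# Usage (hypothetical file name):
#   summarize_flaps < tracker_export.txt
```

An interface that dominates this list is the first place to look at cabling and switch configuration.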


------------------------------------------------------------


There are some basic steps for fixing it quickly.

1. Check if there is any other Check Point cluster connected to the same IP network. If there is, change the so-called "magic MAC" numbers, as described in SK25977. Apply the solution and reboot the cluster. Check whether the issue is now fixed. If not, go to step 2.


2. Check that your cluster is running in multicast mode. To do that, run

# cphaprob -a if
You should get output similar to the following:


------------------------
High Availability interfaces (cphaprob -a if)
------------------------
Required interfaces: 4
Required secured interfaces: 1


eth0       UP                    non sync(non secured), multicast
eth1       UP                    sync(secured), multicast
eth2       UP                    non sync(non secured), multicast
eth3       UP                    non sync(non secured), multicast


If the interfaces show broadcast instead of multicast, there is something wrong with the physical interfaces, cabling or switching. Otherwise go to step 3.
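
The check in step 2 can also be done with a small filter instead of reading the output by eye. This is a minimal sketch that assumes the output layout shown above, where each interface line contains "sync(" and ends with the CCP mode; the function name is hypothetical.

```shell
#!/bin/sh
# Print the names of interfaces whose CCP mode is not multicast.
# Reads `cphaprob -a if` output on stdin.
# Assumption: interface lines look like
#   eth1       UP                    sync(secured), multicast
non_multicast_ifaces() {
    awk '/sync\(/ && $NF != "multicast" { print $1 }'
}

# Usage on a cluster member:
#   cphaprob -a if | non_multicast_ifaces
```

No output means every listed interface reports multicast.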

3. Make sure IGMP snooping is disabled on the adjacent switch. ClusterXL uses CCP in multicast mode by default, so IGMP registration won't work on the switch side. IGMP snooping has to be disabled globally on the switch, or at least on the specific switch ports connected to the cluster.

Once IGMP snooping is disabled, the flapping should stop. The reasons are given on page 31 of the ClusterXL R75.20 Administration Guide.

If you cannot disable IGMP snooping on the switch, the last option is to switch CCP from multicast to broadcast.

To do that, run

cphaconf set_ccp broadcast  

I originally wrote that you have to reboot the cluster again after this change. As mentioned in the comments, a reboot is not required for the change to take effect.
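
A sketch of applying the change and checking the result in one go. It uses only the two commands already shown in this post; the function name is made up for illustration, and running it on each cluster member in turn is my assumption about how you would roll it out.

```shell
#!/bin/sh
# Switch CCP to broadcast on this member, then print the interface table
# so the new mode can be confirmed in the last column.
# Assumption: run on a cluster member where cphaconf and cphaprob exist.
switch_ccp_to_broadcast() {
    cphaconf set_ccp broadcast && cphaprob -a if
}

# Usage:
#   switch_ccp_to_broadcast
```

After the change the interface lines should end in broadcast instead of multicast.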


All, if you have something to add to this, please be my guest and comment at will.

7 comments:

  1. Interesting. Also, historically to set the CCP to broadcast we've also had to add fwha_sync_broadcast_ack=1 to fwkern.conf...

    Are you saying this should no longer be necessary?

  2. it is not necessary from NGX version 6.0

  3. We've just changed multicast to broadcast, but without rebooting the cluster. No more flapping :)

  4. Dear Mr. Valeri,

    My client is using a Check Point 12500 appliance with Gaia OS, version R77.30, in a ClusterXL deployment.

    The problem is that CCP was not heard from member eth. They have already disabled IGMP snooping on the core switch, but the firewall still shows a CCP message. We then asked support, and they suggested changing from multicast to broadcast.

    My question is: what issues could arise if we change from multicast to broadcast, and what is the effect after the change?

    thank you

    1. Hi Denis, broadcast will be heard in the whole network segment after this change.

      My question is, don't you see any CCP at all between the cluster members? This seems to be impossible if configured properly

  5. Hi guys, I have a Check Point 23500 firewall installed with VSX. We are using R80.10 and facing the same interface flapping problem. The firewalls do not actually fail over. I checked and can confirm the cluster is configured with broadcast, not multicast (verified with the cphaprob -a if command). IGMP snooping is enabled on my switch. Not sure how to resolve this flapping problem.
