Thursday, May 15, 2014

Notes about sync redundancy

During the last Advanced Check Point Troubleshooting course I was asked about best practices for building sync redundancy with Gaia.

The question is not as simple as it sounds. The classic textbooks for ClusterXL recommend using two or more independent synchronisation interfaces, marked as First and Second Sync. Although that was true for older versions, R7x changed the game.

sk92804 "Sync Redundancy in ClusterXL" clearly states that using multiple sync interfaces is obsolete. The new best practice is to build a bond interface and define it as the sync interface.

Simple now, you say? Not really. Using bond interfaces with Check Point is tricky. There are at least three SecureKnowledge articles that you should keep in mind, mostly for Check Point appliances:

  • State of Sync interface configured on Bond interface is 'DOWN' for each Virtual System
    Solution ID: sk100450
  • SecurePlatform / Gaia OS crashes on 12000 / 21000 appliance during configuration of Bond interface
    Solution ID: sk69442
  • Incorrect count of Bond slaves in use after physical link down
    Solution ID: sk98160

Each of them requires a fix. Only after all three support fixes are applied should your sync be fine.
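Since one of the articles above concerns the count of bond slaves in use, here is a minimal sketch of how that count could be cross-checked by hand. It assumes the bond status is available in the text format the Linux bonding driver writes to /proc/net/bonding/<bond name>; whether a given Gaia build exposes exactly this layout is an assumption. The sample text, the interface names and the function name count_bond_slaves are made up for illustration and are not taken from any Check Point tool.

```python
# Hypothetical sample in the Linux bonding driver status format.
# On a live system the text would be read from /proc/net/bonding/<bond name>.
SAMPLE = """\
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth1
MII Status: up
MII Polling Interval (ms): 100

Slave Interface: eth1
MII Status: up
Link Failure Count: 0

Slave Interface: eth2
MII Status: down
Link Failure Count: 1
"""


def count_bond_slaves(status_text):
    """Return (configured, link_up) slave counts from bonding status text."""
    configured = 0
    link_up = 0
    in_slave = False  # True once a "Slave Interface" line has been seen
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        key = key.strip()
        value = value.strip()
        if key == "Slave Interface":
            configured += 1
            in_slave = True
        elif key == "MII Status" and in_slave:
            # Only the first MII Status line after a slave header counts;
            # the bond-level MII Status line at the top is ignored.
            if value == "up":
                link_up += 1
            in_slave = False
    return configured, link_up


print(count_bond_slaves(SAMPLE))  # (2, 1)
```

If the number reported by the cluster differs from what such a manual count shows after a physical link goes down, that would be consistent with the symptom described in sk98160.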


  1. Jesus..
    Valeri, I'd really like to know your opinion on the quality of Check Point code nowadays.
    The last 3 or 4 projects involving CP that I've done were (and some of them still are) a nightmare. Bug after bug, support cases opened one after another. OSPF peers dropped, CPU skyrocketing, crashes, VoIP drops, dozens of problems solved (or not) by hotfixes.
    "Using bond interfaces with Check Point is tricky."
    Interface bonding is a mature technology; it has been around for ages. Why would we need 3 (!) hotfixes to make it work?
    It seems that software testing is performed after the release, using customers as beta-testers.
    Do you agree? Or is there any magic way to make it work from the beginning?

    1. Do not overreact, Pavel. Yes, there has been a certain drop in the quality of Check Point products lately, or at least it seems so. I can do nothing about it.

      If you look at the quoted cases, though, you will see there is nothing critical about them. Bonds work fine; there are just some minor LACP cosmetic issues and possible misbehaviour in a limited number of scenarios.

      Yes, it is annoying. But at least it is fixable, and known before you configure.

    2. My dear colleague Pavel, I could not agree with you more! I've been working with CP technologies for around 6 years. Some Cisco coworkers told me that I must leave the dark side and return to the light, because I'm Cisco certified too. But some people just need constant challenges, and that is what CP is: a world of unlimited challenges! I have respect for Valeri, but I believe you are not overreacting, because it is the truth: CP uses its customers as beta testers. I have no idea how many clients have told me that over my entire career.

      Best regards,

      Mario Jr.

  2. Thanks for the note. Why, though? I'm not sure of the cost/benefit of making the switch: what does it buy you? On 1 cluster it's easy; try doing it on 600 clusters in a change-controlled environment.

    The switch to Gaia was bad, bad, bad. I think things will get better in the future, now that we are over the hump and doing QA releases.