Thursday, February 18, 2016

VSX deployment on High-End chassis. Part 3. Changing chassis ID

This is the third and last part of the topic started in the couple of previous posts.

When deploying 61/41K chassis as a cluster, you get a pair of devices marked as Chassis 1 and Chassis 2. The numbers are not just voluntary marks. They are used to generate and sustain quite complex system of internal addressing and communications: provisioning, sync, clustering, Security Groups management, and so on.

There are multiple SKs describing different architectural aspects of the system and its internal addressing.

By default, chassis 1 is active, and chassis 2 is standby. Although you can change that, it is custom to keep chassis #1 in the main Data Center, while putting #2 to your secondary one. But what if the logistics people made a mistake, and the chassis were swapped, leaving you with chassis 1 in the secondary Data Center?

The solution is standard, you just need to change the chassis ID numbering. The appropriate procedure is described in the Admin Guide (page 181 for R76.30SP version of the document). If you catch the error soon enough and if there is no security setup on the chassis yet, the procedure works like a charm.

However if you already have VSX deployed and provisioned, the story is not so simple anymore. The mentioned Admin Guide omits something quite important: unique role of SMO in a Security Group.

I have described this role in my first post for the matter. In combination with the internal addressing as function of the chassis IDs, the process becomes a bit peculiar.

Chassis ID change requires "hacking" into CMM config files and changing ID parameter on both ends. Chassis ID is used to form internal addressing. Each element of the chassis has a unique IP address based on it logical position and chassis ID. So you have to disconnect the chassis to cease intercommunication for a while, change ID numbers, rebooting each and every blade in the system and reconnecting the environment.

The catch is about Security Group addressing. As mentioned, the very first SGM added to the group becomes SMO. Other SGMs use SMO internal IP address to pull the config from it.

The main issue with changing chassis ID while having Security Group with non-default settings is about SMO swapping the place. Indeed, when addressed by IP, SMO is changing logical place from one physical chassis to another one, after ID change. In a particular scenario when you start the new chassis 1 first, its first SMO after the first boot tries to pull SMO config and fails. The reason of the failure is about that SGM trying to access another one by an IP address that belongs to itself after the change.

As the result, all other SGMs are also failing to pull config os Security Group, leading to total collapse of the chassis clustering. If you already have some traffic running through your system, that is inacceptable.

To avoid such situation, there are some actions to be done:


  1. Mind chassis ID from the beginning of deployment. It is quite easy to change IDs if nothing is configured yet.
  2. If you have Security Group and/or VSX setup on the system, plan for some downtime. The solution is to dissolve a Security Group and reconfigure it after the ID change. 

Good luck with your 61/41K system. The solution is actually quite nice, although complex.


-------
To support this blog and Check Point Video Nuggets project send your donations to https://www.paypal.me/cpvideonuggets

4 comments:

  1. Do you still need checkpoint r&d to upgrade one? Given where you'd use these devices I'd be dubious about using them as they still don't feel like a finished product to me

    ReplyDelete
    Replies
    1. It depends what you call upgrade. There is just one major release of VSX for this chassis. Hence no upgrade in principle.

      However there is no problem to deploy HFAs and fixes. You do not need RND for that.

      Delete
    2. Hi,

      No need for R&D for upgrades. I've seen old R76SP (vanilla) having some issues, but upgrades from R76SP.10 and .20 to R76SP.30 works well. One tip i can give you is to make sure you get latest version of the upgrade TCL script (ask, because you will not always get the latest one by default)

      Delete
    3. Managed to click on publish before I was done :)
      Anyway, at least in my experience, R&D has been helpful and they WANT to be engaged in any upgrades or new installations.

      Delete