Tuesday, December 16, 2014

Hardware diagnostics ordeal with 21400 appliances

During a long and painful support case with one of my customers, we had to run the hardware diagnostics tool on 21400 appliances with optical 10GbE NICs.

The tool is supposed to be easy to use. SK97251 describes its usage, and there is even an Administration Guide for it.

But guess what?

First, the built-in R77.10 diagnostics did not work: they crashed on load with a kernel panic message. So we resorted to the USB-based tool, as described in the SK article above.

The main point of the exercise was to check whether any of the optical interfaces had a problem. We spent two hours trying to run those tests.

The main problem is that the tool does not seem to know anything about these 10GbE ports, and not much about the regular Ethernet ones either.

One is supposed to plug a loopback adapter into the port under test. Because the tool uses non-standard interface names, it blinks the port it is about to test before proceeding. The tool seems to name interfaces almost randomly, not in line with the standard Gaia interface names.

Interface eth1-01 became eth0, and eth1-02 became eth2. Port eth1 never blinked, and neither did eth4 or many other interfaces along the way. Finally, eth1-08 became eth20. The funniest part is that if you cannot identify an interface and are not fast enough to skip its verification, the whole test fails.
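
To keep track of which physical port the tool means, it helps to write the mapping down as you go. Here is a minimal sketch in Python holding only the three pairs we actually managed to identify. The names OBSERVED, gaia_port and unidentified are my own and are not part of the diagnostics tool.

```python
# Diagnostics-tool name -> Gaia name.
# Only the three pairs observed on our 21400 are listed; nothing else is known.
OBSERVED = {
    "eth0": "eth1-01",
    "eth2": "eth1-02",
    "eth20": "eth1-08",
}


def gaia_port(tool_name):
    """Return the Gaia port seen for a tool-side name, or None if it was never identified."""
    return OBSERVED.get(tool_name)


def unidentified(tool_names):
    """Return the tool-side names that still have no known Gaia port, in the given order."""
    return [name for name in tool_names if name not in OBSERVED]
```

Anything that comes back from unidentified() is a port you should be ready to skip quickly, since a failed identification fails the whole run.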

Needless to say, none of the optical ports ever blinked during the procedure.

I suspect our problem is actually covered by this limitation listed in the article mentioned above:
  • On 21000 Appliances with SAM card, testing of the SAM card is not supported.
It is not really clear to me whether these 10GbE cards are in fact SAM cards. In the catalog they are marked as "acceleration ready", whatever that means.

I am pretty sure that Check Point support engineers have never run the HW diagnostics tool on a 21400 with optical cards. I suspect QA never did either. I really wish they had. That would have saved me yet another Saturday spent in a datacenter.

Tuesday, December 9, 2014

10GbE bonded interfaces with jumbo frames, some notes

Today, with new Check Point appliances, one can design a decent DC firewall. With multiple 10GbE interfaces and bonds, one can easily match the throughput of a Nexus network core.

There is one caveat, though: there are multiple known issues with 10GbE interfaces on Check Point appliances, especially concerning bonding.

Firstly, there is an issue with LACP when jumbo frames are in use, described in sk86980. I have to say the solution there is incomplete. For example, it mentions only three particular software versions, and mine was not among them.

Secondly, there is a stability issue with jumbo frames, mentioned in sk99113.

Luckily, the driver update from the second SK fixes LACP with jumbo frames too.
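
After patching, it is worth checking that the bond really runs in 802.3ad mode and that every slave carries the jumbo MTU. Below is a rough sketch of such a check. It assumes that Gaia, being Linux-based, exposes the usual /proc/net/bonding/<bond> and /sys/class/net/<iface>/mtu files, and that Python 3 is at hand (otherwise, copy the files off the box and parse them elsewhere). The function names and the sysfs default are placeholders of mine, not something the SK articles prescribe.

```python
from pathlib import Path


def parse_bond(text):
    """Extract the bonding mode and per-slave MII status from /proc/net/bonding/<bond> text."""
    mode, slaves, current = None, {}, None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank lines and anything without a "key: value" shape
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            mode = value
        elif key == "Slave Interface":
            current = value
            slaves[current] = None
        elif key == "MII Status" and current is not None:
            # The first "MII Status" line belongs to the bond itself and is skipped.
            slaves[current] = value
    return mode, slaves


def read_mtu(iface, sysfs="/sys/class/net"):
    """Read the configured MTU of an interface from sysfs (assumed to be present on Gaia)."""
    return int((Path(sysfs) / iface / "mtu").read_text().strip())


def mtu_mismatches(mtus, expected):
    """Return the interfaces whose MTU differs from the expected jumbo value."""
    return {name: mtu for name, mtu in mtus.items() if mtu != expected}
```

On the appliance this could be used by reading /proc/net/bonding/bond0 (assuming the bond is called bond0) into parse_bond(), calling read_mtu() for each slave it returns, and passing the result to mtu_mismatches() together with whatever MTU you configured, for example 9000.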

The number of versions that need patching for this (from R75.40 through R77.10, including the quite recent R75.47) clearly shows that Check Point developers were not too concerned about jumbo frames for a long time. It looks like high-end DC testing scenarios were not part of the regular QA test cycles.

I can only hope this will change in the future.