Tuesday, December 4, 2012

No memory policy installation failure - resolved

I recently faced a nasty issue with one of my VSX customers. After a certain IPS update, the customer lost the ability to push policy to one of their Virtual Systems. The error was: "Load on Module failed - no memory". Strangely, it affected just a single VS among tens of others managed by the same CMA.

They rebooted the VSX cluster hoping to fix the situation, but that only made it much worse. On the standby physical member the problematic VS did not even load, because the pushed policy could no longer be fetched. The cluster was broken, and the member went into the "down" state.

Surprisingly, the first case one finds in SecureKnowledge, sk40768, saved the day. There is a parameter that controls showing the rule's UUID in the logs; it has to be switched off, as the solution describes.

Once we applied the solution, policy could be pushed without a problem. The second cluster member was still down, with weird interface probing errors, and the VS failed to run. We had to reboot it, and after that everything came back to normal.

Lessons learned:

1. Do not trust policy installation errors; they can be extremely misleading.
2. Do not rush into rebooting VSX cluster members; that can backfire.
3. Create a DB Revision Control version before updating IPS; that allows you to roll back quickly if any issue with policy installation appears.

12 comments:

  1. Valeri, you can't do DB Revisions with VSX...it corrupts the objects and DB...
    sk65420

    Replies
    1. Hi Craig, I am replying to you on LinkedIn.

      Although the SK looks scary as hell, my customers are actually doing revision control with VSX, and so far it has caused no issue at all.

      In fact, it is CP who was wondering if it is possible to roll back to a previous DB point during the support processes for this particular case.

      Anyhow, thanks for the heads up.

    2. Doing the revisions themselves doesn't break anything as far as I know; it's the act of "rolling back" that causes mayhem. I've seen it at least twice...it's never pretty :)

    3. VSX RnD have told me exactly this - the restore operation will "break" the VSX configuration on MGMT.

  2. Not really a great solution - disabling logging of rule UUIDs breaks any sort of long-term reporting/analysis performed on the log files, as you must use the UUID because the rule number is likely to change.

    There is a grub.conf hack that has resolved this for me in the past, setting vmalloc=512M

    /etc/grub.conf:

    title Start in normal mode
    root (hd0,0)
    kernel /vmlinuz ro root=LABEL=/ vmalloc=512M panic=15 console=SERIAL 3 quiet
    initrd /initrd

    Replies
    1. Thanks, Matt. It is a different issue. We have tried your solution first, and it did not work.

  3. You're right, we encountered the same issue yesterday and the previous fix I quoted isn't working this time.

    Confirming that disabling UUID logging works as a temporary fix.

    Followed sk40768, thought we were close:

    ;[cpu_2];fw_rules_uid_handle_uid: couldn't allocate dictionary string id for rule no. 730

    Then I got this:
    [cpu_3];fw_rules_uid_handle_uid: couldn't allocate dictionary string id for rule no. 43

    Moved some rules around, then this:
    [cpu_3];fw_rules_uid_handle_uid: couldn't allocate dictionary string id for rule no. 42

    moved some more rules around:
    [cpu_3];fw_rules_uid_handle_uid: couldn't allocate dictionary string id for rule no. 41

    Eventually got to this point:

    ;[cpu_1];fw_rules_uid_handle_uid: couldn't allocate dictionary string id for rule no. 0

    Escalating with Check Point - will keep you posted.
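
A minimal sketch for anyone chasing the same messages: it pulls the rule numbers out of lines shaped like the ones quoted above and counts how often each rule fails. The pattern is inferred only from those samples, and the function name `failed_rules` is made up for illustration; real kernel debug output may carry extra prefixes that need handling.

```python
import re
from collections import Counter

# Pattern inferred from the messages quoted in this thread (an assumption):
# an optional leading ';', a [cpu_N] prefix, then the fixed error text.
LINE_RE = re.compile(
    r"\[cpu_(?P<cpu>\d+)\];fw_rules_uid_handle_uid: "
    r"couldn't allocate dictionary string id for rule no\. (?P<rule>\d+)"
)


def failed_rules(lines):
    """Return a Counter mapping rule number -> number of allocation failures."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[int(match.group("rule"))] += 1
    return counts


# Sample input copied from the messages above, plus one unrelated line.
sample = [
    ";[cpu_2];fw_rules_uid_handle_uid: couldn't allocate dictionary string id for rule no. 730",
    "[cpu_3];fw_rules_uid_handle_uid: couldn't allocate dictionary string id for rule no. 43",
    "[cpu_3];fw_rules_uid_handle_uid: couldn't allocate dictionary string id for rule no. 42",
    "some unrelated kernel message",
]

for rule, hits in sorted(failed_rules(sample).items()):
    print(f"rule {rule}: {hits} failure(s)")
```

Feeding it a saved debug capture instead of `sample` shows whether the failing rule number really keeps moving as rules are shuffled.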



  4. Valera, are you SURE that your problem was this? Did you indeed have non-ASCII chars in one of the rules?
    It doesn't sound related to your IPS update event...

    I'm asking because there's another issue with some VSX versions, which have a rather low limit on the string dictionary table that is used to save (among many other things) the rule UUID.
    If you hit the limit, then disabling UUID logging will work, but only temporarily, until you fill it up again...
    I'm sure there's an SK about this. If I could only remember my SecureKnowledge password.... :-)

    Tends to happen in VSX more often than in non-VSX, because in versions before R75.40VS, the string dic table was shared for all VSs, while the limit was the same as for non-VSX versions...
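
The non-ASCII question above can be checked offline with a rough sketch like the following. It assumes the rule text (names, comments) has already been exported into `(rule_number, text)` pairs by some other means; the export step, the sample rules and both function names are hypothetical.

```python
def non_ascii_positions(text):
    """Return (index, character) pairs for every non-ASCII character in text."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 127]


def rules_with_non_ascii(rules):
    """Map rule number -> list of (index, character) for rules containing non-ASCII text.

    `rules` is an iterable of (rule_number, text) pairs; how they are exported
    from the management database is outside the scope of this sketch.
    """
    hits = {}
    for number, text in rules:
        found = non_ascii_positions(text)
        if found:
            hits[number] = found
    return hits


# Hypothetical sample data, only to show the shape of the input.
rules = [(41, "Allow DNS"), (42, "Allow café Wi-Fi"), (43, "Drop all")]

for number, found in rules_with_non_ascii(rules).items():
    print(f"rule {number}: non-ASCII characters at {found}")
```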

    Replies
    1. Hi DBar! You are right, what I described is not a solution, it is a workaround. We are still in the process of getting to the root cause with support.

      As for SecureKnowledge, there are plenty of cases, but nothing that would immediately help, so far.

    2. Did you see sk66342?
      What's your current limit and current number of entries in the table?

    3. Yes, I have seen that. The table is not full in our case. Still digging into the issue.
