Monday, November 6, 2017

Kernel debug best practices, or why "fw ctl zdebug..." should not be used

Over the last several days I have seen a rapidly growing number of posts at CPUG and CP Community where the "fw ctl zdebug..." command was mentioned, used and advised.

Although some of you already know my position on the matter, I have decided to write a post about the growing habit of using zdebug instead of employing the full fw ctl debug mechanism.

Kernel debug in general


Check Point FW is essentially a Linux-based system with a kernel module inserted between the drivers and the OS IP stack. If you do not know what I am talking about, you may want to look into this post with an explanatory video on the matter.

Extracting information about kernel-based security decisions is rather tricky, so Check Point developed an elaborate tool to read information about the actions of the various FW kernel modules.

In a nutshell, each kernel module has multiple debug flags that force the code to start printing out some information. I have numerous posts in this blog explaining different flags, tips and tricks with kernel debug, and also providing links to CP kernel debug documents.

Debug buffer


It is important to understand that the FW kernel is always printing out some debug messages. For most of the kernel modules, error and warning flags are active, and the output goes to /var/log/messages by default. This is not practical for debugging, so before starting a kernel debug, an engineer needs to set a buffer that will receive the debug output instead of the /var/log/messages file.

To do so, the following command is used: fw ctl debug -buf XXXXX, where XXXXX is the buffer size in KB. The maximum possible buffer today is 32 MB, but I advise my students to use 99999 to make sure they get maximum buffer possible anyway.

The kernel can be very chatty, so a bigger buffer means fewer kernel messages are lost.

Debug modules and flags


The FW kernel is a complex structure. It is built from multiple modules, and each module has its own flags. One can run a single debug session with multiple flags raised for several modules. To raise debug flags, one uses one or several commands of this type:

fw ctl debug -m (module name) (+|-) (list of flags)

Importantly, the + and - options allow you to raise and remove flags on the fly, even during an already running debug session. The list of modules and flags can be found via the first link in this post.
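As a rough illustration of toggling flags on the fly, here is a minimal sketch for a POSIX shell. The helper names (run, raise_flags, remove_flags) and the DRY_RUN switch are hypothetical and not part of any Check Point tooling; only the "fw ctl debug -m (module) (+|-) (flags)" syntax comes from the text above. The module "fw" and the flag "drop" in the usage note are an assumption based on the "fw ctl zdebug drop" example, so check the module and flag list for your version.

```shell
#!/bin/sh
# Hypothetical helpers around "fw ctl debug -m <module> (+|-) <flags>".
# With DRY_RUN=1 the command is printed instead of executed, which lets
# you review it before touching a production gateway.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# raise_flags MODULE FLAG...   ->  fw ctl debug -m MODULE + FLAG...
raise_flags() {
    module=$1
    shift
    run fw ctl debug -m "$module" + "$@"
}

# remove_flags MODULE FLAG...  ->  fw ctl debug -m MODULE - FLAG...
remove_flags() {
    module=$1
    shift
    run fw ctl debug -m "$module" - "$@"
}
```

For example, with DRY_RUN=1 set, "raise_flags fw drop" prints the command that would raise the drop flag, and "remove_flags fw drop" prints the one that would remove it again without ending the session.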

Printing info out of buffer


Raising flags is not enough: to get the information, you need to start reading the buffer out with this command:

fw ctl kdebug -f (with some options)

There will be A LOT of information, so never do this on the console. Use an SSH session or redirect the output to a file.
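The redirect-to-file approach could look roughly like the sketch below, assuming a POSIX shell on the gateway. The function names and the READER_PID variable are hypothetical; only "fw ctl kdebug -f" comes from the text above, and any extra reader options are left out on purpose.

```shell
#!/bin/sh
# Hypothetical helpers: read the debug buffer into a file in the
# background, so nothing is printed on the console.

# start_reader FILE: start "fw ctl kdebug -f" with output sent to FILE.
start_reader() {
    outfile=$1
    fw ctl kdebug -f > "$outfile" 2>&1 &
    READER_PID=$!    # remember the background reader so it can be stopped
}

# stop_reader: stop the background reader and wait until it is gone.
stop_reader() {
    kill "$READER_PID" 2>/dev/null || true
    wait "$READER_PID" 2>/dev/null || true
}
```

A session would then be: call start_reader with an output path of your choice (one with enough free disk space), reproduce the issue, and call stop_reader. Stopping the reader does not reset the debug flags.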

Stopping debug


Once you have collected the relevant info, you need to reset kernel debug to the default settings, otherwise your FW will continue printing out tons of unnecessary info. To do so, run

fw ctl debug 0
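Since a forgotten reset leaves the kernel printing, one option is to let the shell guarantee it. The following is only a sketch under assumptions: debug_session is a hypothetical wrapper name, and it relies on standard POSIX trap behaviour rather than on anything Check Point provides. Only "fw ctl debug 0" comes from the text above.

```shell
#!/bin/sh
# Hypothetical wrapper: run a debug command and always reset kernel
# debug afterwards, even if the command fails or is interrupted.

# debug_session CMD [ARGS...]
# The body runs in a subshell, so the traps do not leak into the caller.
debug_session() (
    trap 'fw ctl debug 0' EXIT      # reset flags whenever the subshell exits
    trap 'exit 130' INT TERM        # turn Ctrl-C / kill into a normal exit
    "$@"
    exit $?                         # keep the exit status of the command
)
```

You would pass it whatever does the actual capture, for example a small script of your own that sets the buffer, raises flags and reads the buffer into a file. When that command finishes or is stopped with Ctrl-C, the reset runs automatically.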

What is fw ctl zdebug then?

fw ctl zdebug is an internal R&D macro for cutting corners when developing and testing new features in a sterile environment. It is equivalent to the following sequence of commands:

fw ctl debug -buf 1024
fw ctl debug (your options)
fw ctl kdebug -f
-------(waiting for Ctrl-C)
fw ctl debug 0

Why is this a problem?


If you are still reading this post and have got to this line, you probably think zdebug is a godsend. It simplifies so many things, it is the only way to run debug in a production environment! Right?

Wrong. To make it plain, here is the list of problematic points with this way of doing things:

1. The buffer is way too small. Lots and lots of messages might simply be lost because the buffer does not have enough room to hold them before they are read.
2. It is not flexible enough. Running debug in production requires lots of consideration and a certain amount of caution. After all, you are asking the FW kernel to do extra things, lots of them. The best practice is to start with a single flag or two and expand the area of research on the fly, trying to catch the issue. This is impossible to do with the fw ctl zdebug macro.
3. It is too simple to use. You could say, what a funny argument. Yet, let's think about it. To master kernel debug as described above, one has to understand kernel structure, dependencies, flags and modules. You don't have to do any of that to run fw ctl zdebug drop, and many people do just that.

And guess what, this is also the simplest way to bring your busy production FW cluster down. So no, do not try this at home or at your place of work, if job security is important to you.




4 comments:

  1. I've run 'fw ctl zdebug drop' on a countless number of Check Point gateways and on some of the largest installs in the world. I did this with confidence and without issue. Did I run this on a gateway running at 99% CPU and memory utilization? No. But I ran it on any gateway that I would run any other debug on. My unscientific approach certainly doesn't mean future use will not cause issues, but I don't plan to stop using it or mentioning it to others unless Check Point R&D officially agrees with you.

    The z in zdebug is named after the creator Tamir Zegman. I encourage you to seek his personal opinion on the matter before you continue.

    1. Thanks, Dan. I do know who's behind the tool, and why he did it in the first place. Hence "internal R&D tool" note in the blog.

  2. I only use this with a | grep "something" to look for specific things so I avoid buffer overflows. I wonder if that still fills up buffers?

  3. For the record, there is no buffer overflow. If the buffer is full, the messages are lost. However, grep does not help you with the buffer size, it only filters the output.
