Monday, November 6, 2017

Kernel debug Best Practices or "Why "fw ctl zdebug..." should not be used"

Over last several days I have seen rapidly growing amount of posts at CPUG and CP Community where "fw ctl zdebug..." command was mentioned, used and advised.

Although some of you already know my position for the matter, I have decided to write a post about the growing custom to use zdebug instead of employing full fw ctl debug mechanism.

Kernel debug in general

Check Point FW is essentially a Linux-based system with a kernel module inserted between drivers and OS IP stack. If you do not know what I am talking about, you may want to look into this post with an explanatory video for the matter.

Extracting information about kernel based security decisions is rather tricky, so Check Point developed an elaborate tool to read some info about various FW kernel modules actions.

In a nutshell, each kernel module has multiple debug flags that force code to start printing out some information. I have numerous posts in this blog explaining different flags, tips and tricks with kernel debug and also providing links to CP kernel debug documents.

Debug buffer

It is important to understand FW kernel is always printing out some debug messages. For most of the kernel modules, error and warning flags are active, and the output goes to /var/log/messages by default. This is not practical for debug, so before starting kernel debug, an engineer needs to set a buffer which would receive debug output instead of /var/log/messages file.

To do so, the following command is used: fw ctl debug -buf XXXXX, where XXXXX is the buffer size in KB. The maximum possible buffer today is 32 MB, but I advise my students to use 99999 to make sure they get maximum buffer possible anyway.

Kernel can be very chatty, so having a bigger buffer would ensure less kernel messages being lost.

Debug modules and flags

FW kernel is a complex structure. It is built with multiple modules. Each of the modules has its own flags. One can run a single debug session with multiple flags raised for several modules. To raise debug flags, one use one or several commands of this type:

fw ctl debug -m (module name) (+|-) (list of flags)

It is essential that + and - options allow you to raise and remove flags on the fly, even during an already running debug session. List of modules and flags can be found by the first link in this post.

Printing info out of buffer

Raising flags is not enough, as to get information, you need to start reading buffer out with this command:

fw ctl kdebug -f (with some options)

There will be A LOT of information, so never do this on the console. Use SSH session or redirect to a file.

Stopping debug

Once you collected the relevant info, you need to reset kernel debug to the default settings, otherwise you FW will continue printing out tons of unnecessary info. To do so, run

fw ctl debug 0

What is fw ctl zdebug then?

fw ctl zdebug is an internal R&D macros to cut corners when developing and testing new features in the sterile environment. It is equivalent to the following sequence of commands:

fw ctl debug -buf 1024
fw ctl debug (your options)
fw ctl kdebug -f
-------(waiting for Ctrl-C)
fw ctl debug 0

Why is this a problem?

If you are still reading this post and get to this line, you probably think zdebug is a god sent miracle. It simplifies so many things, it is the only way to run debug in production environment! Right? 

Wrong. To make it plain, here is the list of problematic point with this way of doing things:

1. The buffer is way too small. Lots and lots of messages might be just lost because buffers does not have enough room to hold them before read.
2. It is not flexible enough. Running debug in production requires lots of consideration and certain amount of caution. After all, you are asking FW kernel to do extra things, lots of them. The best practice is to start with a single flag or two and expand area of research in the fly trying to catch an issue. This is impossible to do with fw ctl zdebug macros.
3. It is too simple to use. You could say, what a funny argument. Yet, let's think about it. To master kernel debug as described above, one has to understand kernel structure, dependencies, flags and modules. You don't have to do any of that to run fw ctl zdebug drop, and many people do just that. 

My personal position on this is that kernel debug is a sensitive and risky operation. It requires understanding of the technology and the tool itself beforehand. Without such understanding one could miss messages, complicate things and in some very limited cases, crash the GW under debug. The latter I have not seen for quite some time, though.

Support CPET project and this blog with your donations to