The Linux kernel black box
bash --version
: GNU bash, version 4.4.20(1)-release (x86_64-pc-linux-gnu)
INT (Arcana)
: DC 10
QOTD
:
“Mayday, mayday”
In the logs of a booting machine I was trying to debug, these warnings caught my attention. They were showing up persistently on every boot, with multiple instances of the following lines:
[ 3.851205] pstore: decompression failed: -22
[ 3.851273] pstore: crypto_comp_decompress failed, ret = -22!
I was looking at a faulty machine, but these were only warnings; maybe not related to the problem at hand. So I checked other machines, and some were also showing these messages early in the boot.
My curiosity was piqued.
The last breath
The Linux kernel implemented, a while ago (since version 2.6.39, according to lwn.net), a feature that works pretty much like an airplane’s flight recorder. When the system crashes, the kernel attempts to save the last lines of its log in this black box.
“Black box” is actually a misnomer for flight recorders: they are painted bright reflective orange to make them easier to spot during searches.
These logs are not stored on disk though, because the disk could also be compromised by the crash. So the idea of a “Persistent Storage” interface was introduced in the kernel: they named it the pstore filesystem.
The pstore filesystem provides an abstraction in the kernel over whatever storage option the platform provides. It supports several backends, depending on what is available in the machine.
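If you are curious which backend your machine ended up with, a quick (and hedged) way to check is the boot log and, on kernels that expose it, the pstore module parameters; the exact paths and messages vary with kernel version and configuration:

# Boot log lines mentioning pstore usually name the registered backend
dmesg | grep -i pstore

# On many kernels the chosen backend is also visible here (the file may be
# empty or missing, depending on version and configuration)
cat /sys/module/pstore/parameters/backend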
In modern servers, the backend for pstore ends up being a dedicated memory that is hardware vendor dependent, and that is exposed through the ERST (Error Record Serialization Table), which is in turn defined in the ACPI specification (section 18.5).
This basically means that the hardware controller exposes a common protocol, and the pstore filesystem implementation knows how to talk to it, if present, as one of its backend options.
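On an ACPI machine you can check whether the firmware actually publishes an ERST table; two quick ways to look (the sysfs path is the standard location for ACPI tables, and listing it may require root on some systems):

# Is there an ERST entry among the ACPI tables exposed by the firmware?
ls /sys/firmware/acpi/tables/ | grep -i erst

# The kernel also mentions ERST during boot if it found and initialized it
dmesg | grep -i erst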
Another alternative, which sounds really nice, is to configure pstore to use part of the RAM as the backend. But for this to work, it has to use a range of RAM that is guaranteed not to be erased during boot. This backend is called ramoops.
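As a rough sketch of how that looks, ramoops can be pointed at a reserved memory range through kernel command line parameters (it can also be configured via device tree or a platform driver). The address and sizes below are purely illustrative placeholders; the real values must describe a region that your firmware and bootloader are guaranteed to leave alone:

# Illustrative only: carve out 1 MiB at a made-up physical address for
# ramoops, split into 64 KiB dump records, with ECC enabled.
ramoops.mem_address=0x8000000 ramoops.mem_size=0x100000 ramoops.record_size=0x10000 ramoops.ecc=1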
So, this pstore filesystem is a Linux kernel feature that helps us recover the last logs of our panicking kernel by storing them in a special hardware-defined persistent storage. Cool.
But, how does it work from the kernel debugger’s point of view? How do we access those crash logs? And… why am I getting those warnings?
Recreating the crash
The kernel gives a very special treatment to the pstore filesystem. Looking at its documentation, it is not regular storage, and it has to be accessed in a similar fashion to proc or tmpfs: we have to ask the kernel to open us a window into the pstore contents. We do that with a mount syscall, specifically telling the kernel we want a mount of type pstore.
mount -t pstore - /sys/fs/pstore
It is likely that pstore is already mounted on your system, in /sys/fs/pstore. And it is also likely that the /sys/fs/pstore directory is empty, even if you have pstore support.
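A quick way to check whether it is already mounted (findmnt comes with util-linux; plain mount works just as well):

# Show the pstore mount, if any
findmnt -t pstore

# Or the old-fashioned way
mount | grep pstore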
But if you have suffered from kernel panics in the past and never heard of this directory, chances are you also have some files lying around.
$ ls -lA /sys/fs/pstore
-r--r--r-- 1 root root 5203 Dec 4 2017 dmesg-erst-6495821019368914945.enc.z
-r--r--r-- 1 root root 5074 Dec 4 2017 dmesg-erst-6495821019368914946.enc.z
-r--r--r-- 1 root root 5473 Dec 4 2017 dmesg-erst-6495821019368914947.enc.z
-r--r--r-- 1 root root 5686 Dec 4 2017 dmesg-erst-6495821019368914948.enc.z
-r--r--r-- 1 root root 17731 Jan 29 2018 dmesg-erst-6516477063283605505
-r--r--r-- 1 root root 17694 Jan 29 2018 dmesg-erst-6516477063283605506
-r--r--r-- 1 root root 17729 Jan 29 2018 dmesg-erst-6516477063283605507
-r--r--r-- 1 root root 17741 Apr 30 2018 dmesg-erst-6550257500757688321
-r--r--r-- 1 root root 17710 Apr 30 2018 dmesg-erst-6550257500757688322
-r--r--r-- 1 root root 17716 Apr 30 2018 dmesg-erst-6550257500757688323
-r--r--r-- 1 root root 17710 Apr 30 2018 dmesg-erst-6550257500757688324
-r--r--r-- 1 root root 17726 Jul 3 2018 dmesg-erst-6574057198740045825
-r--r--r-- 1 root root 17717 Jul 3 2018 dmesg-erst-6574057198740045826
-r--r--r-- 1 root root 17736 Jul 3 2018 dmesg-erst-6574057198740045827
-r--r--r-- 1 root root 17752 Jul 3 2018 dmesg-erst-6574057198740045828
Some of them are compressed, some are not, but all of them hold a couple of KB worth of kernel log lines from around the point of a panic or oops. The real purpose of this persistent storage is to save the crash logs for us; we are then supposed to rescue them, put them in a safe place, and clean up the directory.
These files are not meant to be left there for too long. Capacity in the persistent storage is limited, and chances are it will refuse new entries once it is full. Old entries are compressed to make room for new ones, but the storage is still not sizable.
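A minimal sketch of that housekeeping, assuming you simply want to copy everything into a dated archive directory before clearing the persistent storage (the archive location is my own arbitrary choice, and the whole thing needs root):

#!/bin/sh
# Hypothetical housekeeping sketch: archive pstore crash logs, then clear them.
ARCHIVE=/var/log/pstore-archive/$(date +%Y%m%d-%H%M%S)
mkdir -p "$ARCHIVE"
# Copy everything out of the persistent storage window...
cp -a /sys/fs/pstore/. "$ARCHIVE"/
# ...and remove the originals so the backend has room for the next crash.
rm -f /sys/fs/pstore/*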
The recommended action is to move the logs to another place once read, or erase them entirely. But typically this is not done automatically, and on long-lived systems that have seen many crashes, the entries tend to pile up. Which leads us to…
Back to the errors
After that research, we can come back to the errors from the beginning.
[ 3.851205] pstore: decompression failed: -22
[ 3.851273] pstore: crypto_comp_decompress failed, ret = -22!
They clearly refer to the pstore filesystem, and seem to point specifically to compressed crash logs. The solution suggested everywhere is to clean up the /sys/fs/pstore directory, removing any compressed entries. And that for sure works and gets rid of the annoying warnings.
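In my case, after copying anything worth keeping somewhere safe, that boiled down to something like:

# Remove only the compressed entries (they carry the .enc.z suffix)
rm /sys/fs/pstore/*.enc.z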
The reason behind the errors seems to be a change that was made in the compression algorithm used, which meant the kernel could no longer read entries compressed with the old method.
This, combined with the fact that the machines I was dealing with held really old entries, could be the explanation for the messages. I could not find clear evidence of this, only a similar issue reported in SLE14.
However, the warnings were mitigated, and the knowledge acquired with the research proved useful later on, when debugging a remote system that would not boot up.