The Linux kernel black box
bash --version
: GNU bash, version 4.4.20(1)-release (x86_64-pc-linux-gnu)
INT (Arcana)
: DC 10
QOTD
:
“Mayday, mayday”
In the logs of a booting machine I was trying to debug, these warnings caught my attention. They were showing up persistently on every boot, with multiple instances of the following lines:
[ 3.851205] pstore: decompression failed: -22
[ 3.851273] pstore: crypto_comp_decompress failed, ret = -22!
I was looking at a faulty machine, but these were only warnings; maybe not related to the problem at hand. So I checked other machines, and some were also showing these messages early in the boot.
My curiosity was piqued.
The last breath
The Linux kernel implemented, a while ago (since version 2.6.39, according to lwn.net), a feature that works pretty much like an airplane’s flight recorder. When the system crashes, the kernel attempts to save the last lines of its log in this black box.
“Black box” is actually a misnomer for flight recorders: they are painted bright reflective orange to make them easier to spot during searches.
These logs are not stored on disk though, because the disk could also be compromised by the crash. So the idea of a “Persistent Storage” interface was introduced in the kernel: they named it the pstore filesystem.
The pstore filesystem provides an abstraction in the kernel over whatever storage option the platform provides. It supports several backends, depending on what is available in the machine.
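If you are curious which backend your machine ended up with, a quick (and hedged) way to check is the boot log and, on kernels that expose it, the pstore module parameters; the exact paths and messages vary with kernel version and configuration:

# Boot log lines mentioning pstore usually name the registered backend
dmesg | grep -i pstore

# On many kernels the chosen backend is also visible here (the file may be
# empty or missing, depending on version and configuration)
cat /sys/module/pstore/parameters/backend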
In modern servers, the backend for pstore ends up being a dedicated memory that is hardware vendor dependent, and that is exposed through the ERST (Error Record Serialization Table), which is in turn defined in the ACPI specification (section 18.5).
This basically means that the hardware controller exposes a common protocol, and the pstore filesystem implementation knows how to talk to it, if present, as one of its backend options.
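On an ACPI machine you can check whether the firmware actually publishes an ERST table; two quick ways to look (the sysfs path is the standard location for ACPI tables, and listing it may require root on some systems):

# Is there an ERST entry among the ACPI tables exposed by the firmware?
ls /sys/firmware/acpi/tables/ | grep -i erst

# The kernel also mentions ERST during boot if it found and initialized it
dmesg | grep -i erst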
Another alternative, which sounds really nice, is to configure pstore to use part of the RAM as the backend. But for this to work, it has to use a range of RAM that is guaranteed not to be erased during boot. This backend is called ramoops.
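As a rough sketch of how that looks, ramoops can be pointed at a reserved memory range through kernel command line parameters (it can also be configured via device tree or a platform driver). The address and sizes below are purely illustrative placeholders; the real values must describe a region that your firmware and bootloader are guaranteed to leave alone:

# Illustrative only: carve out 1 MiB at a made-up physical address for
# ramoops, split into 64 KiB dump records, with ECC enabled.
ramoops.mem_address=0x8000000 ramoops.mem_size=0x100000 ramoops.record_size=0x10000 ramoops.ecc=1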
So, this pstore filesystem is a Linux kernel feature that helps us recover the last logs of our panicking kernel by storing them in a special hardware-defined persistent storage. Cool.
But, how does it work from the kernel debugger’s point of view? How do we access those crash logs? And… why am I getting those warnings?
Recreating the crash
The kernel gives a very special treatment to the pstore filesystem. Looking at its documentation, it is not regular storage, and it has to be accessed in a similar fashion to proc or tmpfs: we have to ask the kernel to open us a window into the pstore contents. We do that with a mount syscall, specifically telling the kernel we want a mount of type pstore.
mount -t pstore - /sys/fs/pstore
It is likely that pstore is already mounted on your system, in /sys/fs/pstore. And it is also likely that the /sys/fs/pstore directory is empty, even if you have pstore support.
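A quick way to check whether it is already mounted (findmnt comes with util-linux; plain mount works just as well):

# Show the pstore mount, if any
findmnt -t pstore

# Or the old-fashioned way
mount | grep pstore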
But if you have suffered from kernel panics in the past and never heard of this directory, chances are you also have some files lying around.
$ ls -lA /sys/fs/pstore
-r--r--r-- 1 root root 5203 Dec 4 2017 dmesg-erst-6495821019368914945.enc.z
-r--r--r-- 1 root root 5074 Dec 4 2017 dmesg-erst-6495821019368914946.enc.z
-r--r--r-- 1 root root 5473 Dec 4 2017 dmesg-erst-6495821019368914947.enc.z
-r--r--r-- 1 root root 5686 Dec 4 2017 dmesg-erst-6495821019368914948.enc.z
-r--r--r-- 1 root root 17731 Jan 29 2018 dmesg-erst-6516477063283605505
-r--r--r-- 1 root root 17694 Jan 29 2018 dmesg-erst-6516477063283605506
-r--r--r-- 1 root root 17729 Jan 29 2018 dmesg-erst-6516477063283605507
-r--r--r-- 1 root root 17741 Apr 30 2018 dmesg-erst-6550257500757688321
-r--r--r-- 1 root root 17710 Apr 30 2018 dmesg-erst-6550257500757688322
-r--r--r-- 1 root root 17716 Apr 30 2018 dmesg-erst-6550257500757688323
-r--r--r-- 1 root root 17710 Apr 30 2018 dmesg-erst-6550257500757688324
-r--r--r-- 1 root root 17726 Jul 3 2018 dmesg-erst-6574057198740045825
-r--r--r-- 1 root root 17717 Jul 3 2018 dmesg-erst-6574057198740045826
-r--r--r-- 1 root root 17736 Jul 3 2018 dmesg-erst-6574057198740045827
-r--r--r-- 1 root root 17752 Jul 3 2018 dmesg-erst-6574057198740045828
Some of them are compressed, some are not, but all of them hold a couple of KB worth of kernel log lines from around the point of a panic or oops. The real purpose of this persistent storage is to save the crash logs for us; we are then supposed to rescue them, put them in a safe place, and clean up the directory.
These files are not meant to be left there for too long. Capacity in the persistent storage is limited, and chances are it will refuse new entries once it is full. Old entries are compressed to make room for new ones, but the storage is still not sizable.
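A minimal sketch of that housekeeping, assuming you simply want to copy everything into a dated archive directory before clearing the persistent storage (the archive location is my own arbitrary choice, and the whole thing needs root):

#!/bin/sh
# Hypothetical housekeeping sketch: archive pstore crash logs, then clear them.
ARCHIVE=/var/log/pstore-archive/$(date +%Y%m%d-%H%M%S)
mkdir -p "$ARCHIVE"
# Copy everything out of the persistent storage window...
cp -a /sys/fs/pstore/. "$ARCHIVE"/
# ...and remove the originals so the backend has room for the next crash.
rm -f /sys/fs/pstore/*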
The recommended action is to move the logs to another place once read, or erase them entirely. But typically this is not done automatically, and on long-lived systems that have seen many crashes, the entries tend to pile up. Which leads us to…
Back to the errors
After that research, we can come back to the errors from the beginning.
[ 3.851205] pstore: decompression failed: -22
[ 3.851273] pstore: crypto_comp_decompress failed, ret = -22!
They clearly refer to the pstore filesystem, and seem to point specifically to compressed crash logs. The solution suggested everywhere is to clean up the /sys/fs/pstore directory, removing any compressed entries. And that for sure works and gets rid of the annoying warnings.
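In my case, after copying anything worth keeping somewhere safe, that boiled down to something like:

# Remove only the compressed entries (they carry the .enc.z suffix)
rm /sys/fs/pstore/*.enc.z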
The reason behind the errors seems to be a change that was made in the compression algorithm used, which meant the kernel could no longer read entries compressed with the old method.
This, combined with the fact that the machines I was dealing with held really old entries, could be the explanation for the messages. I could not find clear evidence of this, only a similar issue reported in SLE14.
However, the warnings were mitigated, and the knowledge acquired with the research proved useful later on, when debugging a remote system that would not boot up.