Kernel panic as an unneeded event.

Mon Sep 10 20:47:05 CDT 2007

On 9/10/07, Oren Beck <orenbeck at gmail.com> wrote:
>
> Ok folks, here's a mindset wrench.
>
> CAN we make a set of fallbacks to allow a certain minimal function
> allowing a potential "panic" to seek external help?

Not much.

The defining characteristic of a kernel panic is an error that the kernel
has detected within its own data structures and procedures, from which
recovery is not possible. An application shouldn't be able to cause a panic;
only kernel code itself (including drivers) or a hardware problem like bad RAM can
do so.  AIX and Macs have some NVRAM set aside into which the kernel is able
to write some information about the error before shutting down or
rebooting.  At boot time, these OSes copy the error information to a file on
the hard drive from which further analysis is possible.  That part could
easily be automated.
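
To make the NVRAM idea concrete, here is a rough sketch in Python of what such a record might look like. Everything in it is my own assumption for illustration: the "PNIC" magic, the header layout, the 256-byte region size and the function names are hypothetical, and this is not how AIX or Mac firmware actually lays things out. The point is only that a fixed layout plus a checksum lets the boot-time code decide whether what it finds is a real panic report or leftover garbage.

```python
import struct
import zlib

# Hypothetical record layout: 4-byte magic, payload length, CRC32 of the
# payload, then the payload itself. None of this mirrors a real firmware format.
MAGIC = b"PNIC"
HEADER = struct.Struct("<4sII")
NVRAM_SIZE = 256  # assumed size of the reserved region


def write_panic_record(nvram: bytearray, message: str) -> None:
    """What the panicking kernel would do: store a short, checksummed report."""
    payload = message.encode("utf-8")[: len(nvram) - HEADER.size]
    header = HEADER.pack(MAGIC, len(payload), zlib.crc32(payload))
    nvram[: HEADER.size + len(payload)] = header + payload


def recover_panic_record(nvram: bytearray):
    """What the next boot would do: return the report, or None if there is none."""
    magic, length, crc = HEADER.unpack_from(nvram, 0)
    if magic != MAGIC or length > len(nvram) - HEADER.size:
        return None  # nothing was written, or the header is garbage
    payload = bytes(nvram[HEADER.size : HEADER.size + length])
    if zlib.crc32(payload) != crc:
        return None  # record was corrupted, so don't trust it
    # Clear the header so the same panic is not reported on every boot.
    nvram[: HEADER.size] = bytes(HEADER.size)
    return payload.decode("utf-8", errors="replace")
```

At boot, whatever recover_panic_record() returns could then be appended to an ordinary log file, since by that point the filesystem code is running in a fresh, trustworthy kernel.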

But the only reason that IBM and Apple are able to do this is that they
control the hardware.  They are able to spec the exact way in which this
NVRAM is to be written.  It isn't safe for a kernel that has panicked to
write to any hard drive, because the kernel data structures that keep track
of what files belong where on the drive are suspect.  Writing any data to
the drive risks corrupting the entire filesystem.

So we need either this special reserved NVRAM, ideally written by a BIOS ROM
routine that can't possibly have been corrupted, or a network interface
operating under the same constraints that can send a kernel panic report
somewhere it can be safely saved....

Or we virtualize.

We write a VM system that sets up one or more virtual machines, and we do our
real computing within the VM(s).  Each guest would have panic() configured to
put the panic report into a specific location in (high?) memory before calling
for a warm reset, and the host system would trap the reset instruction.  The
host should be able to detect the signature of a panic, and write the memory
image to a file on the host while rebooting the VM.  Being able to put the
whole mess into a debugger would be so valuable.
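
Here is a toy model of that handshake in Python, with the guest's RAM stood in for by a bytearray. The signature string, the offset near the top of a 1 MiB guest, and the names guest_panic() and host_on_reset() are all hypothetical; a real hypervisor would hook the actual reset and read real guest memory. It is only meant to show the order of operations: check for the signature, dump the image, then let the reboot proceed.

```python
from pathlib import Path

# All of these values are assumptions made up for this sketch.
PANIC_SIGNATURE = b"GUEST-PANIC\x00"
GUEST_MEM_SIZE = 0x100000   # pretend the guest has 1 MiB of RAM
PANIC_OFFSET = 0x0FF000     # agreed-upon location in high memory


def guest_panic(memory: bytearray, report: str) -> None:
    """Guest side: leave a signed, NUL-terminated report, then ask for a reset."""
    data = PANIC_SIGNATURE + report.encode("ascii", errors="replace") + b"\x00"
    data = data[: len(memory) - PANIC_OFFSET]  # never write past the end of RAM
    memory[PANIC_OFFSET : PANIC_OFFSET + len(data)] = data


def host_on_reset(memory: bytearray, dump_path: Path):
    """Host side, called when a guest reset is trapped.

    Returns the panic report, or None if this was an ordinary reboot.
    """
    sig_end = PANIC_OFFSET + len(PANIC_SIGNATURE)
    if memory[PANIC_OFFSET:sig_end] != PANIC_SIGNATURE:
        return None
    # Save the whole memory image first, so a debugger can look at it later.
    dump_path.write_bytes(memory)
    end = memory.find(b"\x00", sig_end)
    if end == -1:
        end = len(memory)
    report = memory[sig_end:end].decode("ascii", errors="replace")
    # Wipe the signature so the next clean reboot is not mistaken for a panic.
    memory[PANIC_OFFSET:sig_end] = bytes(len(PANIC_SIGNATURE))
    return report
```

The host never has to trust the guest's filesystem code, because the file is written by the host's own, presumably healthy, kernel.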

As multi-core CPUs and hardware assistance for virtualization become more
common, this should be easier.