Ok folks, here's a mindset wrench.
CAN we build a set of fallbacks that keeps enough minimal function alive for a potential "panic" to seek external help? This help could come either from defaults designed to establish a session with human agents or from some AI mimic. Starting from the point where I left us, taking it as not arguable that a Knoppix-style set of routines CAN establish a remote session and a functional remote console interface, the external intelligence could then hopefully complete the Linux install.
Constructive comments anyone?
Oren Beck
"So, you mean there was a time when an installer did NOT include remote support?"
The idea is to get whatever is causing the kernel panic worked out before you deploy the box outside a supervised lab. If you're still getting kernel panics, the code's not ready.
On 9/10/07, Oren Beck orenbeck@gmail.com wrote:
Ok folks, here's a mindset wrench.
CAN we build a set of fallbacks that keeps enough minimal function alive for a potential "panic" to seek external help?
Not much.
The defining characteristic of a kernel panic is an error that the kernel has detected within its own data structures and procedures, from which recovery is not possible. An application shouldn't be able to cause a panic; only kernel code itself (including drivers) or a HW problem like bad RAM can do so. AIX and Macs have some NVRAM set aside into which the kernel is able to write some information about the error before shutting down or rebooting. At boot time, these OSes copy the error information to a file on the hard drive from which further analysis is possible. That part could easily be automated.
But the only reason that IBM and Apple are able to do this is that they control the hardware. They are able to spec the exact way in which this NVRAM is to be written. It isn't safe for a kernel that has panicked to write to any hard drive, because the kernel data structures that keep track of what files belong where on the drive are suspect. Writing any data to the drive risks corrupting the entire filesystem.
So we either have to get this special reserved NVRAM, ideally supported by a BIOS ROM routine that can't possibly have been corrupted, or a network interface operating under the same constraints that can send a kernel panic report somewhere that it can be safely saved....
Or we virtualize.
We write a VM system that sets up one or more virtual machines, and we do our real computing within the VM(s). Each guest would have panic() configured to put the panic report into a specific location in (high?) memory before calling for a warm reset, and the host system would trap the reset instruction. The host should be able to detect the signature of a panic and write the memory image to a file on the host while rebooting the VM. Being able to put the whole mess into a debugger would be so valuable.
As multi-core CPUs and hardware assistance to virtualization become more common, this should be easier.
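For what it's worth, here is a rough sketch of the "detect the signature of a panic" step, in Python purely for illustration. Everything specific in it is an assumption made up for the example: the `PANICREC` magic value, the header layout, the fixed offset, and both function names. The guest memory is simulated with a plain `bytearray`; a real host would read it through whatever interface its VM monitor provides.

```python
import struct
from typing import Optional

# Hypothetical record layout (an assumption, not taken from any real VM system):
#   8-byte magic, 4-byte little-endian payload length, then the payload bytes.
PANIC_MAGIC = b"PANICREC"
HEADER = struct.Struct("<8sI")


def write_panic_record(memory: bytearray, offset: int, message: str) -> None:
    """Guest side: what a modified panic() might leave at the agreed address."""
    payload = message.encode("utf-8")
    if offset + HEADER.size + len(payload) > len(memory):
        raise ValueError("panic record does not fit in the reserved area")
    HEADER.pack_into(memory, offset, PANIC_MAGIC, len(payload))
    start = offset + HEADER.size
    memory[start:start + len(payload)] = payload


def find_panic_record(memory: bytes, offset: int) -> Optional[str]:
    """Host side: after trapping a reset, look for a panic record."""
    if offset + HEADER.size > len(memory):
        return None
    magic, length = HEADER.unpack_from(memory, offset)
    if magic != PANIC_MAGIC:
        return None  # ordinary reboot, no panic signature present
    start = offset + HEADER.size
    if start + length > len(memory):
        return None  # truncated or corrupt record, do not trust it
    return bytes(memory[start:start + length]).decode("utf-8", errors="replace")
```

The point of the magic value and the length check is that the host never has to trust the guest's data structures: it only reads a fixed location and gives up quietly if what it finds does not look like a record.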
So what happens when there is a problem with the fallback-to-crawl system?
I believe recent kernels include a system for loading a different kernel, so you might be able to rig something to switch to your known-good kernel instead of rebooting again if you are in a panic/recompile/repeat cycle, but setting that up would certainly be harder than fixing the problem at hand.
Kclug mailing list Kclug@kclug.org http://kclug.org/mailman/listinfo/kclug
On 9/10/07, David Nicol davidnicol@gmail.com wrote:
So what happens when there is a problem with the fallback-to-crawl system?
I believe recent kernels include a system for loading a different kernel, so you might be able to rig something to switch to your known-good kernel instead of rebooting again if you are in a panic/recompile/repeat cycle, but setting that up would certainly be harder than fixing the problem at hand.
So far, some good starting points, and thanks!
The direction I hope we could use as a "target" is one I can best describe as: "Load a stable minimal live assist with communication, THEN begin the install or whatever. Save the (oh, let's call it ID, for InitialDebugger) to a USB device, bootable if possible. Then any later event causing a Something Bad has a way to cry for more skilled help." The shorter statement is: something akin to a netboot, yet only invoked upon a drastic failure. The teaser is a searchable "BreakFix" repository that such an ID could query for "self remedy", with the dividend of FEEDBACK to learn hardware compatibilities etc.
Oh, think of it this way as a closer: "Do you always feed back false driver detects?"
Oren Beck
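To make the "BreakFix" idea above a bit more concrete, here is a toy sketch in Python of a repository that the ID could query and feed results back into. It is only an in-memory stand-in: the class name, the method names, and the idea of keying on a (hardware, symptom) pair are all assumptions for illustration, and a real repository would sit behind a network service.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (hardware id, symptom), both free-form strings here


class BreakFixRepo:
    """Hypothetical searchable store of remedies plus feedback counts."""

    def __init__(self) -> None:
        # For each (hardware, symptom): remedy -> [successes, failures]
        self._stats: Dict[Key, Dict[str, List[int]]] = defaultdict(dict)

    def report(self, hardware: str, symptom: str, remedy: str, worked: bool) -> None:
        """Feedback path: record whether a remedy helped on this hardware."""
        counts = self._stats[(hardware, symptom)].setdefault(remedy, [0, 0])
        counts[0 if worked else 1] += 1

    def suggest(self, hardware: str, symptom: str) -> Optional[str]:
        """Query path: return the remedy with the best net record, if any."""
        remedies = self._stats.get((hardware, symptom))
        if not remedies:
            return None  # nothing known, time to ask a human
        remedy, (ok, bad) = max(
            remedies.items(), key=lambda item: item[1][0] - item[1][1]
        )
        return remedy if ok > bad else None
```

A false driver detect would show up here as a remedy whose failures outnumber its successes, so `suggest` stops offering it. That is the FEEDBACK dividend in miniature.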
and when you are done, do you donate the work to Knoppix for the glory, or what?
On 9/11/07, David Nicol davidnicol@gmail.com wrote:
and when you are done, do you donate the work to Knoppix for the glory, or what?
-- Louie, Louie, we've got to go.
Actually, it's not so much glory or donation or anything more than my hoping we'd be able to improve these systems, even if only by turning what would be a hard crash into a managed request for external assistance. That request would come only after exhausting every viable retry (changing variables each time, within reason) before conceding. And for that to become a process of learning from experience, it needs feedback.
"Think of it as Kaizen in action"
Oren Beck
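The "retry with changed variables, then concede and ask for help" loop described above could look something like this minimal Python sketch. The function name and its three parameters are hypothetical, and `call_for_help` stands in for whatever would actually open the remote session.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple


def attempt_with_fallbacks(
    step: Callable[[Any], Any],
    variations: Iterable[Any],
    call_for_help: Callable[[List[Tuple[Any, str]]], None],
) -> Optional[Any]:
    """Try `step` once per variation; escalate only when all of them fail."""
    failures: List[Tuple[Any, str]] = []
    for variation in variations:
        try:
            return step(variation)  # first success wins, no escalation
        except Exception as exc:
            # Keep a record of what was tried so the helper gets the history.
            failures.append((variation, repr(exc)))
    call_for_help(failures)  # managed request instead of a hard crash
    return None
```

The list handed to `call_for_help` is also the raw material for the feedback: it says which variations were tried and how each one failed.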