Monday, 22 July 2013

Kernel Crash Report/Crash Dump Analysis

In my previous post, we have configured how to capture kernel dump for reference click on the link kernel crash dump

Here in this article,we master the basic usage of crash utility to open the dumped memory core and process the information contained therein and to intercept the output.


find out the dumped kernel location and start analyzing the code.
#crash /usr/lib/debug/lib/modules/2.6.32-279.el6.i686/vmlinux /var/crash/\:40\:28/vmcore

KERNEL: /usr/lib/debug/lib/modules/2.6.32-279.el6.i686/vmlinux
DATE: Thu Jul 18 09:40:21 2013
UPTIME: 00:37:21
LOAD AVERAGE: 934.79, 206.96, 67.74
TASKS: 5494
NODENAME: <hostname>
RELEASE: 2.6.32-279.el6.i686
VERSION: #1 SMP Fri Jun 22 10:59:55 UTC 2012
MACHINE: i686  (2933 Mhz)
MEMORY: 895.6 MB
PANIC: "Oops: 0002 [#1] SMP " (check log for details)
PID: 6847
COMMAND: "bash"
TASK: ea142aa0  [THREAD_INFO: db8ae000]
CPU: 0

Explanation of code is as below :-

KERNEL: specifies the kernel running at the time of the crash.
DUMPFILE: is the name of the dumped memory core.
CPUS: is the number of CPUs on your machine.
DATE: specifies the time of the crash.
TASKS: indicates the number of tasks in the memory at the time of the crash. Task is a set of program instructions loaded into memory.
NODENAME: is the name of the crashed host.
RELEASE: and VERSION: specify the kernel release and version.
MACHINE: specifies the architecture of the CPU.
MEMORY: is the size of the physical memory on the crashed machine.
PANIC: specifies what kind of crash occurred on the machine.

Panic refers to the use of magic keys(SysRq), which we deliberately trigger for a crash.

SysRq (System Request) refers to Magic Keys, which allow you to send instructions directly to the kernel. They can be invoked using a keyboard sequence or by echoing letter commands to /proc/sysrq-trigger, provided the functionality is enabled. We have discussed this in the Kdump part.
I attacked system by Denial of service(DoS) using the system to consume all its resources by forking. forkbomb

if you have looked load average of the crashed kernel its too high(934.79, 206.96, 67.74) and the process responsible was PID:6847

PANIC: "Oops: 0002 [#1] SMP " has the value below.

Bit 0 1
0 No page found Invalid access
1 Read or Execute Write
2 Kernel mode User mode
3 Not instruction fetch Instruction fetch

from our PANIC analysis it is clear that "we have a page not found during a write operation in Kernel mode; the fault was not an Instruction Fetch."

we have used "/proc/sysrq-trigger" from the command line to dump our kernel in previous post, but if your system is unresponsive then you would be unable to trigger. In such cases we enable SysRq feature so that we could use magic keys to collect the dump of the crashed kernel.

#echo "1" > /proc/sys/kernel/sysrq  
or add an entry to /etc/sysctl.conf  

#vim /etc/sysctl.conf  
kernel.sysrq = 1

Once configured, you will be able to use magic keys [ alt + PrintScreenSysRq + <options> ] 

Options as are below.

'b'  - Will immediately reboot the system without syncing or unmounting your disks.
'c'  - Will perform a system crash by a NULL pointer deference  A crash dump will be taken if configured.
'd'  - Shows all locks that are held. 
'e'  - Send a SIGTERM to all processes, except for init.
'f'  - Will call oom_kill to kill a memory hog process.
'g'  - Used by kgdb (kernel debugger)
'h'  - Will display help 
'i'  - Send a SIGKILL to all processes, except for init.
'j'  - Forcibly "Just thaw it" - filesystems frozen by the FIFREEZE ioctl.
'k'  - Secure Access Key (SAK) Kills all programs on the current virtual console. 
'l'   - Shows a stack backtrace for all active CPUs.
'm' - Will dump current memory info to your console.
'n'  - Used to make RT tasks nice-able
'o'  - Will shut your system off (if configured and supported).
'p'  - Will dump the current registers and flags to your console.
'q'  - Will dump per CPU lists of all armed hrtimers (but NOT regular timer_list timers) and detailed information about all clockevent devices.
'r'  - Turns off keyboard raw mode and sets it to XLATE.
's'  - Will attempt to sync all mounted filesystems.
't'  - Will dump a list of current tasks and their information to your console.
'u' - Will attempt to remount all mounted filesystems read-only.
'v' - Forcefully restores framebuffer console
'v' - Causes ETM buffer dump [ARM-specific]
'w' - Dumps tasks that are in uninterruptable (blocked) state.
'x' - Used by xmon interface on ppc/powerpc platforms. Show global PMU Registers on sparc64.
'y' - Show global CPU Registers [SPARC-64 specific]
'z' - Dump the ftrace buffer

[ Alt + SysRq + c ] - Crash collected by rebooting the system.

We have almost analyzed upto certain extent, however there are few of the commands which can help us in understanding more. 
Let us look few more basic commands which can be helpful.

crash> help

*              files          mach       repeat      timer
alias         foreach     mod        runq          tree
ascii         fuser         mount     search       union
bt             gdb           net         set             vm
btop          help          p            sig             vtop
dev           ipcs          ps           struct         waitq
dis            irq            pte         swap           whatis
eval           kmem      ptob        sym            wr
exit           list           ptov        sys              q
extend       log           rd           task 

bt - backtrace - Display a kernel stack backtrace :=
The sequence of numbered lines, starting with the hash sign (#) is the call trace. It's a list of kernel functions executed just prior to the crash. This gives us a good indication 
of what happened before the system went down.

crash> bt

PID: 6847   TASK: ea142aa0  CPU: 0   COMMAND: "bash"
 #0 [db8af618] crash_kexec at c049b75c
 #1 [db8af66c] oops_end at c083fe92
(output omitted ...)

foreach - display command data for multiple tasks in the system :=
This command allows for a an examination of various kernel data associated with any, or all, tasks in the system, without having to set the context 
 to each targeted task.

crash>foreach bt
(output omitted..)

log - dump system message buffer :=

The log command dumps  the kernel log buffer contents  inchronological order . This  is  similar to  what  you would see when you type dmesg on a running machine. This is useful when you want to look at the panic or oops message. An oops is triggered by some exception. It is a dump of the CPU register's state and kernel stack at that instant . From the panic message, we can find hints as to how the panic was triggered (e. g. the function or process  or pid or command or address that triggered the panic), the register's information, kernel module list, whether the kernel is 
tainted with proprietary kernel modules loaded, and so  on..

(output omitted..)
SysRq : Trigger a crash
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<c06a0d8f>] sysrq_handle_crash+0xf/0x20
*pdpt = 00000000116c3001 *pde = 0000000000000000
Oops: 0002 [#1] SMP
Pid: 6847, comm: bash Not tainted 2.6.32-279.el6.i686 #1 innotek GmbH 
EIP: 0060:[<c06a0d8f>] EFLAGS: 00010096 CPU: 0
EIP is at sysrq_handle_crash+0xf/0x20
(output omitted..)

we don't observe any tainted flags on the kernel, each flag has its own meaning.

P — Proprietary module has been loaded.
F  — Module has been forcibly loaded.
S — SMP with a CPU not  designed for SMP .
R — User forced a module unload.
M — System experienced a machine check exception.
B — System has  hit  bad_page.
U — Users pace- defined naughtiness .
A —  ACPI  table over ridden.
W — Taint on warning.

ps - Display process status :=
Display process status information This command displays process status for selected, or all, processes in the system. If no arguments are entered, the process data is displayed for all processes. The active task is marked with ">"


   6846   5711   0  d169d550  UN   0.1    5992   1112  bash
>  6847      1   0  ea142aa0  RU   0.1    5992   1320  bash
   6848      1   0  d169d000  UN   0.1    5992   1368  bash

when you observe from the log, there are number of bash instances. Hence total number of bash instances are as below

crash> ps | fgrep bash | wc -l

 vm - virtual memory :=
This command displays basic virtual memory information of a context, consisting of a pointer to its mm_struct and page dirctory, its RSS and total virtual memory size; and a list of pointers to each vm_area_struct, its starting and ending address, vm_flags value, and file path name.

crash> vm
PID: 6847   TASK: ea142aa0  CPU: 0   COMMAND: "bash"
   MM       PGD      RSS    TOTAL_VM
e247f740  d16c2000  1320k    5992k
  VMA       START      END    FLAGS  FILE
d16b66f8    70d000    70e000 4040075
d16b6694    760000    77e000 8000875  /lib/
(output omitted..)

files - open files :=
This command displays information about open files of a context. It prints the context's current root directory and current working directory, and then for each open file descriptor it prints a pointer to its file struct, a pointer to its dentry struct, a pointer to the inode, the file type, and the pathname. 

crash> files 6427
PID: 6427   TASK: d4150aa0  CPU: 0   COMMAND: "bash"
ROOT: /    CWD: /var/crash
  0  f6698240  f3609df8  f3610e48  FIFO
  1  d40791c0  f377c8d0  f377d1a8  FIFO
(output omitted..)

runq - run queue :=
This command displays the tasks on the run queues
of each cpu.

crash> runq
CPU 0 RUNQUEUE: c6408680
  CURRENT: PID: 6847   TASK: ea142aa0  COMMAND: "bash"
  RT PRIO_ARRAY: c6408778
     [no tasks queued]
  CFS RB_ROOT: c64086dc
     [120] PID: 7990   TASK: c9bb1000  COMMAND: "bash"
     [120] PID: 8616   TASK: c13f8aa0  COMMAND: "bash"
     [120] PID: 7714   TASK: f2485550  COMMAND: "bash"
(output omitted..)

timer - timer queue data :=
Displays the timer queue entries in chronological order, listing the target function names, the current value of jiffies, and the expiration time of each entry.

TVEC_BASES[0]: c0be66a0
 1941920   d4aa1b74   c0466210  <process_timeout>
 1941920   d847fb74   c0466210  <process_timeout>
(output omitted..)

net - network command :=
Display various network related data

crash> net
 f70e6820   lo
 f4f32020   eth0   <ipaddress-2>
 c171e020   eth1   <ipaddress-1>

we have found some of the basics of kernel dump analysis which might be helpful in knowing what went behind the kernel to crash the system. As a best practice we need to analyze the dump and take necessary actions to avoid the re-occurrences. 

"there are lot of administrators who don't care rebooting server, but need server online".