Debugging early kernel crash

I was trying to get kexec running on our new platform, but it was failing. The kexec executable reads new uimage, dtb etc correctly and also jumps to new kernel, but then nothing – system just hangs. There is already some workaround related to system reset in the new platform that we have, so I was suspecting if the “cpu soft reset” is working correctly or not. This suspicion came to my mind when going through the kexec syscall implementation, which is nothing but a special “reboot” case. However, that was not the case as confirmed by our BSP provider. Since, there is nothing on the debug UART console, I was frustrated on how to proceed further. Luckily our contact person for the platform vendor suggested to enable “earlyprintk”. I knew about that kernel feature, but never thought I would ever gonna need that. But (fortunately) I have to work on so many new things here of which I had no prior experience. So, yea, why not, let us enable earlyprintk. And wow, I could see the kernel was booting, but stalling somewhere after initializing RAM.

Hmm, something is better than nothing. Next thing I did was to take a look at the dmesg log of a regular booting process, and tried to identify what comes next after RAM initialization. I just blindly grepp’ed the console message in kernel source tree, and I was landed in some __initcall – wow, another thing in kernel which I have been ignoring since quite a while.  Now it was a time to understand what it does, and how can I debug it. So naturally it was a time to google and I found an extremely insightful article on github books here. The entire book is awesome to understand linux internals btw. So, then I know how the calls are being made, and next idea was to see which call was causing the trouble.

So I added a few prints in init/main.c from where the __initcall loop is iterating (do_initcall_level). But this function just calls the function by pointers, so how do I know the name of that function. And luckily I am not the only one who wants to know the function name from the function pointer: linux kernel already has a way to do that. I can’t recall it right now, and googled it while writing this down, and found this. The printks can handle more than just function name extraction out of pointer, it can print IP addresses and UUIDs and what not. So, yea, with that I actually printed current function, and next function in queue, and I got to know the blocking function.

diff --git a/init/main.c b/init/main.c
index 2a89545..ee81844 100644
--- a/init/main.c
+++ b/init/main.c
@@ -841,7 +841,7 @@ static char *initcall_level_names[] __initdata = {
 
 static void __init do_initcall_level(int level)
 {
-	initcall_t *fn;
+	initcall_t *fn, *fn_next;
 
 	strcpy(initcall_command_line, saved_command_line);
 	parse_args(initcall_level_names[level],
@@ -850,8 +850,14 @@ static void __init do_initcall_level(int level)
 		   level, level,
 		   &repair_env_string);
 
-	for (fn = initcall_levels[level]; fn < initcall_levels[level+1]; fn++)
+	for (fn = initcall_levels[level]; fn < initcall_levels[level+1]; fn++) {
+		printk(" Calling: %ps\t", fn);
+		fn_next = fn + 1;
+		if (fn_next)
+			printk(" Next: %ps", fn_next);
+		printk("\n");
 		do_one_initcall(*fn);
+	}
 } 
 static void __init do_initcalls(void)

 

This debugging session is more than a month old, but today I had to debug a similar problem, the early boot up crash, where kernel just refused to boot, and I had to enable the earlyprintk. I am just writing this down, because I don’t want to forget how did I debug such kind of issues.

Happy debugging!

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s