Debugging again…

I got one more kernel related problem which made me happier, that finally I was debugging something which I love. But the fading of love is directly proportional to the time it takes to solve the problem. And then, a sudden surge of satisfaction, a kind of kick you get, once you figured out the root cause.

This time the problem was showing the symptom at user-space. We recently upgraded the kernel version, and already working code at userspace broke after the upgrade. The basic logic would say that something is broken at kernel space. Since the problem was appearing when running our complex audio application, I wanted to get rid of all the complex stuff and wanted to reproduce it with basic ALSA utilities (aplay, arecord). And as usual, the problem did not appear when trying to reproduce with aplay and arecord. This was enough to force us to think that something is wrong with audio application. The exception reported by the application was about failing to set the ALSA hardware params. I had faces similar problem few years, where hardware param was failing due to passing invalid arguments. But how to identify what parameter was wrong? On the other hand, there was no change in codec driver and application between kernel upgrade. Nevertheless I brushed off the dust from memory and found the actual place in kernel (sound/core/pcm_native.c – snd_pcm_hw_refine()) where the hardware params are refined and could potentially fail if passed an invalid value. The entire calculation and the parameter refinement code is still a puzzle for me. Anyway, I could see that the channel count passed by the audio application was 4 for playback and it turned out be the culprit. More obvious and logical next step was to check whether codec supports 4 channels or not, and found codec actually supports 6 channels for playback. Hmm, strange. Then I tried to run aplay with 4 channels and that worked, without any problem. Looking at the application logs and comments from other engineer who worked on the same problem before me (but could not continue further due to other tasks), I figured the audio application makes use of an ALSA plugin. The ALSA plugin takes 2 channels as input and converts into 4 and passes to actual sound card. I thought something may be broken in the plugin code, but was not fully convinced, since nothing changed at user-space, and the code works just fine with old kernel.

And I entered into the usual printk-debugging method and spent rest of the day without any luck. Spent some time next day to understand the refinement code in pcm_native.c but little or no documentation makes it difficult. Then suddenly thought of taking a look at McASP platform driver (sound/soc/davinci/davinci-mcasp.c) and did a quick diff between two kernel versions. There were significant changes. I glanced through them and found many changes related to imposing channel related constraints. That gave me a direction for debugging further, and found the piece of code which actually overwriting the channel count, and setting the new channel constraint based on overwritten channel count, which I thought was the source of problem. I added some code to bypass that logic and whoa, the user-space application worked like a charm. I also figured why the standalone ALSA apps (aplay and arecord) failed to reproduce the problem. The problem was showing up with the audio application because it was opening playback and capture device at the same time. The order also mattered, because the application was opening capture device prior to playback device. Had I done the same, I could have encountered the problem. Now it is on my task list for Tuesday. Anyway I thought the channel constraint code can be fixed by having per stream channel count and prepared the patch for the same, in anticipation of upstreaming one more kernel patch. While writing the commit message, I thought of taking a look at the code once more to better formulate the message. That’s when I stumbled upon the code commentary which actually describes why the logic was done in that way; and I saved myself from embarrassment. Well, but I learnt something and at the end of the day it was greatly satisfying.

Happy debugging!


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s