Debugging suspend-resume in the Linux kernel is challenging. Suspending a core involves turning off peripherals or putting them in a low-power mode. This includes the UART, which means that adding printks and relying on the console for debugging are ruled out after a certain point in the suspend path. This is akin to early bootloader or kernel debugging.
Linux kernel tools for debugging suspend
The Linux kernel provides a few utilities for debugging PM (power management) modes, including suspend.
- Pass the command-line option no_console_suspend to the kernel
With this, the console is not suspended when the tty layer suspends, so printks and debug logs from other drivers' suspend callbacks remain visible until very late in the suspend code path.
- Test modes with the kernel sysfs entries /sys/power/pm_test and /sys/power/state
Various stages of suspend (like process freezing, device suspend, platform suspend and core) can be tested individually. Write the corresponding test level to /sys/power/pm_test (available when the kernel is built with CONFIG_PM_DEBUG) and then trigger suspend through /sys/power/state. The suspend path is executed only up to that stage before 'resume' happens.
More details can be found at www.kernel.org/doc/Documentation/power/basic-pm-debugging.txt
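As a rough illustration of the test-mode flow documented there, the sketch below writes a test level to the pm_test file and then requests suspend through the state file. The helper names write_sysfs and run_suspend_test are my own, not kernel APIs, and the directory is a parameter only so the code can be tried against a scratch directory. On a real target it assumes a kernel with CONFIG_PM_DEBUG and root privileges.

```c
#include <stdio.h>

/* Write a short string to a sysfs-style file.
 * Returns 0 on success, -1 on failure. */
int write_sysfs(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;
    ok = fputs(value, f) >= 0;
    /* sysfs write errors often surface at flush/close time, so check fclose too. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/*
 * Run one partial suspend cycle (hypothetical helper).
 * dir   : normally "/sys/power"
 * level : a pm_test level such as "freezer", "devices", "platform" or "core"
 * Returns 0 if both writes succeeded, -1 otherwise.
 */
int run_suspend_test(const char *dir, const char *level)
{
    char path[256];

    /* Step 1: select how far the suspend path should go. */
    snprintf(path, sizeof(path), "%s/pm_test", dir);
    if (write_sysfs(path, level) != 0)
        return -1;

    /* Step 2: request suspend-to-RAM; with pm_test set, only a partial cycle runs. */
    snprintf(path, sizeof(path), "%s/state", dir);
    return write_sysfs(path, "mem");
}
```

On a target this would be called as run_suspend_test("/sys/power", "devices"). Writing "none" back to pm_test afterwards restores normal suspend behaviour.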
Debugging using Hardware mechanisms
Most of the interesting suspend issues I have faced were in either the late suspend or the early resume code path, and were caused by hardware and processor-related issues more often than by pure software bugs. These are typically the most challenging because the errors are sporadic, unpredictable and hard to reproduce.
I have found GPIOs and hardware debuggers like JTAG to be very useful for late-stage suspend debugging, especially once all the drivers and the platform have been suspended and we enter code that runs outside of DDR (from static-RAM-based IRAM in the iMX architecture). The IRAM code implements late suspend and early resume, and essentially does the following:
- Put DDR in self refresh so that DDR controller can be put in low power mode/turned off.
- Switch ARM core PLL to run from a base Oscillator clock that stays on during suspend.
- Turn off/switch some SoC PLLs and put their regulators into bypass mode.
- Invalidate, flush and disable caches (from closer to further, i.e., L1 first, then L2).
- Finally, put the ARM core to sleep by executing the ARM WFI (wait for interrupt) instruction.
This is all assembly code (it is hand-written and mapped into the small IRAM address space) with a lot of tight loops, e.g. waiting for a PLL change to take effect by looping until a bit is set. A hang here would cause the device not to suspend fully or, worse, not to wake up.
Good old GPIOs can be used to nail down the particular instruction that hangs, or a JTAG debugger can be used to single-step through the code.
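To make the GPIO idea concrete, here is a minimal sketch of a PLL wait loop bracketed by GPIO checkpoints. The real code is assembly running from IRAM; this is a C rendering for readability only. The bit positions (PLL_LOCK_BIT, DEBUG_GPIO_BIT), the register pointers and the poll limit are all placeholders of mine, not actual i.MX values, and the bounded loop is my own addition so that the function can return instead of hanging.

```c
#include <stdint.h>

/* Hypothetical bit positions: placeholders, not real i.MX register fields. */
#define PLL_LOCK_BIT   (1u << 31)
#define DEBUG_GPIO_BIT (1u << 5)

/* Drive the debug GPIO high or low through its data register. */
static void debug_gpio_set(volatile uint32_t *gpio_dr, int high)
{
    if (high)
        *gpio_dr |= DEBUG_GPIO_BIT;
    else
        *gpio_dr &= ~DEBUG_GPIO_BIT;
}

/*
 * Raise the GPIO, poll for PLL lock, and lower the GPIO once locked.
 * If the wait stalls, the pin stays high, which is easy to spot on a
 * scope or logic analyzer.
 * Returns 0 on lock, -1 if max_polls reads all saw the bit clear.
 */
int wait_pll_lock_marked(volatile uint32_t *pll_reg,
                         volatile uint32_t *gpio_dr,
                         uint32_t max_polls)
{
    debug_gpio_set(gpio_dr, 1);             /* checkpoint: entering the wait */
    while (max_polls--) {
        if (*pll_reg & PLL_LOCK_BIT) {
            debug_gpio_set(gpio_dr, 0);     /* checkpoint: wait completed */
            return 0;
        }
    }
    return -1;                              /* pin left high to mark the stall */
}
```

Moving the pair of checkpoints from one loop to the next narrows the hang down to a single wait, after which single-stepping or finer-grained toggles can isolate the instruction.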
Debugging early resume with JTAG is tricky because JTAG communication with the processor is lost while it is suspended. When a button press or some other mechanism triggers device wake-up/resume, by the time JTAG is reconnected the problematic code has already executed and the device is already hung!
My favorite trick is to add a simple tight loop just after the ARM WFI instruction, so the code resumes and spins there, allowing one to attach JTAG, break out of the loop by modifying the condition register, and then single-step. This also helps root-cause genuine hardware problems that prevent wake interrupts from arriving in the first place, leaving the core unable to come out of WFI.
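A minimal sketch of that hold loop, written in C rather than the assembly the real suspend code uses. Here the loop condition lives in a volatile variable instead of a register, so the debugger releases it by writing 0 to that variable. The names resume_debug_hold, cpu_wait_for_interrupt and sleep_then_hold, and the BUILD_FOR_ARM_TARGET macro, are all made up for illustration.

```c
#include <stdint.h>

/* Hypothetical hold flag: nonzero keeps the CPU spinning after wake-up.
 * From the debugger, write 0 to this variable to release the loop. */
volatile uint32_t resume_debug_hold = 1;

/* Sleep until an interrupt arrives. A real WFI is emitted only when the
 * made-up macro BUILD_FOR_ARM_TARGET is defined; otherwise this is a no-op
 * so the sketch also compiles on a development host. */
static inline void cpu_wait_for_interrupt(void)
{
#ifdef BUILD_FOR_ARM_TARGET
    __asm__ volatile("wfi" ::: "memory");
#endif
}

/* Enter sleep, then spin until the hold flag is cleared.
 * Returns the number of spins, which is 0 if the flag was already clear. */
unsigned long sleep_then_hold(void)
{
    unsigned long spins = 0;

    cpu_wait_for_interrupt();
    while (resume_debug_hold)   /* attach JTAG here, clear the flag, then step */
        spins++;
    return spins;
}
```

If the debugger attaches and finds the core spinning in this loop, the wake-up itself worked and the bug lies in the code that follows. If the core never reaches the loop, the wake interrupt is the first suspect.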
Got the idea? The same trick can be applied to early U-Boot or kernel debugging when the console is not yet set up.