New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 772203 link

Starred by 9 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Sign in to add a comment

Touchpad firmware update causes reboot loop

Issue description

UserAgent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.78 Safari/537.36
Platform: 10004.0.0 (Official Build) dev-channel eve test

Steps to reproduce the problem:
1. Apply diff changes from attached touch_fw_update.diff to /opt/google/touch/scripts/chromeos-google-touch-firmware-update.sh
2. Reboot device
3. Observe reboot loop

What is the expected behavior?
Boots to login screen

What went wrong?
The device enters a reboot loop whenever /sys/class/chromeos/cros_tp/version is not populated - for instance, when using an upstream kernel, which doesn't support the cros_tp firmware expected from the script.

Did this work before? N/A 

Chrome version: 63.0.3230.0  Channel: dev
OS Version: 63.0.3230.0
Flash Version: 27.0.0.159

This issue only affects Eve boards, as they are the only system that is targeted by the /opt/google/touch/scripts/chromeos-google-touch-firmware-update.sh script
 
touch_fw_update.diff
808 bytes Download
Reposting touch_fw_update.diff text here, since it's a small change:


--- chromeos-google-touch-firmware-update.sh	2017-09-27 17:54:02.405365017 -0700
+++ chromeos-google-touch-firmware-update.sh.new	2017-10-05 15:52:31.847109722 -0700
@@ -144,7 +144,8 @@
 force_flash() {
   display_splash
 
-  st_flash --board=eve "${FW_LINK}"
+#  Disabled only to speed up reproduction, makes no difference to the bug
+#  st_flash --board=eve "${FW_LINK}"
 
   if [ "$?" -ne "0" ]; then
     clear_splash
@@ -184,7 +185,7 @@
   local updater_fw_ver="$(get_fw_version_from_disk "${FW_LINK}")"
   log_msg "Current active fw version is: '${active_fw_ver}'"
   log_msg "Current updater fw version is: '${updater_fw_ver}'"
-  if [ -z "${active_fw_ver}" ]; then
+  if [ -n "${active_fw_ver}" ]; then
     log_msg "Unable to detect the active FW version, forcing a reflash."
     force_flash
   fi

Comment 2 by tfiga@chromium.org, Oct 6 2017

Cc: sheckylin@chromium.org wnhuang@chromium.org rongchang@chromium.org adlr@chromium.org
Components: Internals>Input>Touch>Pad
Owner: wnhuang@chromium.org
+Some related people.
Note we found another *unrelated* "reboot hang" issue with mainline kernels where upstart is somehow stuck waiting for display_boot_message() to complete as seen with "initctl log-priority debug". However this reboot hang has a super simple workaround: Ctrl-C! For some reason the Ctrl-C is not needed with chromeos-4.4.

Comment 4 by wnhuang@google.com, Oct 6 2017

The fact that  /sys/class/chromeos/cros_tp/version is not popoulate probably means that you are using an EVT machine with old firmware that does not have the correct device tree binding.

Can you update your firmware and try again?

Comment 5 by wnhuang@google.com, Oct 6 2017

The firmware I refer in #4 is the AP firmware, sorry for the confusion.
#define CROS_EC_DEV_TP_NAME "cros_tp"  is missing from:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/include/linux/mfd/cros_ec.h

Is it possible to have /sys/class/chromeos/cros_tp/version without it?

It seems to be "CHROMIUM" only:
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/485740


What is AP firmware? I haven't heard of that before. Where do we get it and how do we update it?

Comment 8 by wnhuang@google.com, Oct 6 2017

re #7, AP firmware means BIOS. Just run

chromeos-firmwareupdate -m recovery
Running chromeos-firmwareupdate -m recovery bumped my firmware from:

FW: Google_Eve.9584.76.0
EC: eve_v1.1.6566-81013dc32

to

FW: Google_Eve.9584.86.0
EC: eve_v1.1.6572-0e3f4120f

After the update, I attempted to update to an upstream kernel and the device still enters a reboot loop, as the cros_tp firmware version is not populated through the cros_ec driver in upstream kernels, as Marc pointed out in #6.
I guess AP means Application Processor = mobile phone CPU

> chromeos-firmwareupdate -m recovery

This does more than the "AP" (not a bad thing)
if [ -z "${active_fw_ver}" ]; then
     log_msg "Unable to detect the active FW version, forcing a reflash."
     force_flash # which ends with a reboot

I believe there's a relatively simple design issue here: this code creates a reboot loop with a termination condition too strict. So any upstream kernel causes a reboot loop but not just: any other reason or bug that causes active_fw_ver to be empty will cause the same reboot loop.
Wei-Ning could we store something in VPD to track attempts and just have it fail after X attempts.

The for those wishing to bypass they could just set the value above 'X'.
Cc: vpalatin@chromium.org
Rereading the bug and I realize you are using an upstream kernel, so I think the correct fix is to upstream the cros_ec change:

https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/485740

Since the touchpad RO section should always work, user shouldn't encounter reboot loop.

re #12, yeah that is a possible solution, but it's kind of weird to touch RW_VPD so frequently after the devices is shipped. I think upstreaming the CLs is more reasonable.

Vincent, are there any issues blocking us to upstream this CL?
https://chromium-review.googlesource.com/453780
> Vincent, are there any issues blocking us to upstream this CL?
> https://chromium-review.googlesource.com/453780

Nothing, 
as the FP MCU development it was related to was cancelled, this was low priority, that's it.
Thanks. I'll upstream the cros_tp change then.
Thanks, upstreaming cros_tp will definitely solve the long term. However it won't help:
- people testing/bisecting the kernel back previous to the upstreaming point.
- people not .configuring their kernel "correctly"


> Since the touchpad RO section should always work, user shouldn't encounter reboot loop.

"Should"?

Don't get us wrong: we are used to "shoulds" and even "mays" work when trying random versions from the kernel mainline. However the price/time to pay is rarely as high as a reboot loop (combined with a hang, but I digress). This took us an awful lot of time to debug and it will cost all others the same; a bit less for the ones lucky enough to find this bug.


> it's kind of weird to touch RW_VPD so frequently after the devices is shipped

Isn't there some other place on the filesystem where a number of tries could be persisted across reboots?

> the FP MCU development it was related to was cancelled,

Does this mean chromeos-google-touch-firmware-update.sh doesn't make any visible difference right now? Sorry if I'm getting confused across firmwares.
> (combined with a hang, but I digress)

The other, boot hang issue was just filed by Casey as issue 777678. This reboot loop and that boot hang don't seem related besides being apparently located in the same (and definitely not "upstream first") script.

> Wei-Ning could we store something in VPD to track attempts and just have it fail after X attempts.

To prevent another (and unrelated) reboot loop, this depthcharge code nvram_read()/nvram_write() a "magic byte" https://chromium-review.googlesource.com/c/chromiumos/platform/depthcharge/+/754914/2/src/board/eve/board.c#126

Comment 19 by dtor@google.com, Nov 17 2017

Cc: groeck@chromium.org
Generally speaking ChromeOS expects certain functionality form the kernel, we do not support any random kernel version or random configs.

If we really wanted to handle the missing device case (which for ChromeOS proper should never happen) it would be to check if /sys/class/cros_tp exists at all and abort even before trying to parse the version.


How can you except developers to follow this policy when hit by a reboot loop or hang?

 https://www.chromium.org/chromium-os/chromiumos-design-docs/upstream-first

Is this page obsolete?

Many older Chromebooks don't work well with upstream kernels for various reasons. Cyan style Chromebooks don't work due to a combination of BIOS issues and recent kernel interrupt handling changes. Chromebooks running chromeos-3.18 don't work well with newer kernels because the ASoC firmware file format was changed between 3.18 and 4.4, and the upstream kernel does not support the old format (even though it claims that it does - try booting 4.14 on Caroline). Many such issues are not under our control.

Effectively this means that it is not (or no longer) possible to run an upstream kernel with a non-ChromeOS distribution on affected systems. This also means that is would be challenging to uprev those systems to a more recent ChromeOS kernel, since the problems are not ChromeOS specific. In my opinion, such issues are much more severe than the problem discussed here, since the problem discussed here can easily be solved by booting a ChromeOS kernel.

Maybe ChromeOS should refuse to boot if it detects a non-ChromeOS kernel ?

> Cyan style Chromebooks don't work due to a combination of BIOS issues and recent kernel interrupt handling changes.

Can you provide a reference?

> the upstream kernel does not support the old [audio] format (even though it claims that it does)

This only affect audio so - unlike a reboot loop - it's contained and doesn't block other components from providing a better upstream / non-regression experience.

An audio expert who needs to work with a newer kernel can upgrade config files to the newer upstream format when (very unfortunately) needed.


> This also means that is would be challenging to uprev those systems to a more recent ChromeOS kernel,

There's a huge difference between:
1) developers booting upstream kernels in their lab to develop or debug "upstream first" their own, specific component while ignoring other, unrelated issues;
2) qualifying a different kernel version for production.

This bug is about 1) and not 2)

> Effectively this means that it is not (or no longer) possible to run an upstream kernel with a non-ChromeOS distribution on affected systems.

We still boot ChromeOS with upstream kernels on a very regular basis hence this bug. Sorry I don't understand how "non-ChromeOS distributions" relate to this.


> https://www.chromium.org/chromium-os/chromiumos-design-docs/upstream-first
> Is this page obsolete?

Is comment #21 a long "yes"?

Search for bugs tagged "Kernel-4.14". And, no, #21 is not a long "yes". We know that we are struggling in some areas to upstream our code in a timely fashion. cros_ec is not a glaring example of how things should be done. That doesn't justify dropping the guidelines for upstreaming.


> Maybe ChromeOS should refuse to boot if it detects a non-ChromeOS kernel ?
> [...]
> no, #21 is not a long "yes"

I can't reconcile these two lines with each other.
refusing to boot non-chromeos kernels fundamentally breaks some of the first steps in the debugging process(can we reproduce this on Linux master).
re #25, we have changed the firmware updater and remove the force reflash when we can not get the correct firmware version:

https://chromium-review.googlesource.com/c/chromiumos/platform/touch_updater/+/773999

This should fix the issue you are seeing.
Owner: ----
Status: Available (was: Unconfirmed)

Sign in to add a comment