
Issue 907594


Issue metadata

Status: Assigned
Owner:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug

Blocked on:
issue 912801




Chromeboxes going into an infinite "Chrome OS is repairing" loop

Issue description

UserAgent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36
Platform: 10895.78.0 (Official Build) stable-channel ninja

Steps to reproduce the problem:
1. Run a kiosk application normally for anywhere from a few days to a few weeks.

i.e., this is extremely hard to reproduce.

What is the expected behavior?

What went wrong?
The Chromeboxes that we have deployed at a customer location randomly end up on this screen (attached image: MVIMG_20181112_120618 (1).jpg) and never successfully repair themselves, no matter how long we leave them. It has only happened at one customer location, and nothing is different about the application running at this location compared to our other locations. It has happened to more than 10 Chromeboxes at that location so far.

There was one recent instance where we were able to get a bit more information. A few days ago, one of our customer's IT staff noticed that one of the units was missing the icons on the home screen (which are hardcoded in our application, not loaded remotely). He instructed his help desk to reboot the unit remotely. After the restart, the Chromebox was stuck in "repair" mode and made no progress.

As mentioned above, the icons are hardcoded in our application, which leads us to suspect that the filesystem is somehow being corrupted. I have also attached a picture of the corrupted application images (T1SME.jpg).

Please see what we have tried ourselves to rectify this issue below:
- We have instructed our customer to completely open up outbound traffic for these units. They say they have done this, but it is very hard to verify. Our reasoning was that some call to Google's backend services could have been blocked, which could in turn cause some sort of corruption in the device's filesystem or something similar.
- We have updated to Chrome OS 69, and even 70 on some units, and the problem still occurs.
- We have shipped 2 affected units back to the Chromebox manufacturer for analysis. We have not heard back from them yet.

Did this work before? N/A 

Does this work in other browsers? Yes

Chrome version: 70.0.3538.102  Channel: stable
OS Version: 69.0.3497.120
Flash Version:
 
MVIMG_20181112_120618 (1).jpg (2.8 MB)
T1SME.jpg (59.4 KB)
Components: Internals>Installer
The manufacturer has stated that the eMMC boot disk is somehow being erased while running an application in single-app kiosk mode. Does this help at all?
Blockedon: 912801
Components: -Internals>Installer OS>Hardware
Owner: gwendal@chromium.org
Status: Assigned (was: Unconfirmed)
Looking at the enclosed log, the eMMC device [SanDisk iNAND Ultra e.MMC 5.0 32GB, aka DS3031] is worn out:

Looking at /var/log/storage_info.txt:
eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x0b
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x0b
>>> 100% of the device's lifetime has been used
eMMC Pre EOL information [EXT_CSD_PRE_EOL_INFO]: 0x01
>>> Device is End Of Life

In that state, the eMMC device's behavior is not well defined, but write operations will fail. We are lucky if we can read from the eMMC device at all.
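As a rough aid for reading storage_info.txt, the sketch below decodes the two lifetime-estimate fields quoted above, assuming the JEDEC eMMC 5.0 encoding (0x01 through 0x0A in 10% steps, 0x0B meaning the maximum estimated lifetime has been exceeded). The function names, the regular expression, and the assumption that every relevant line has the exact `[FIELD]: 0xNN` shape shown above are all hypothetical; this is not a Chrome OS tool. It covers only the two lifetime-estimate fields.

```python
import re


def decode_life_time_est(value: int) -> str:
    """Map an EXT_CSD DEVICE_LIFE_TIME_EST_TYP_A/B value to a readable range.

    Assumes the JEDEC eMMC 5.0 encoding: 0x01..0x0A are 10% buckets,
    0x0B means the estimated lifetime has been exceeded.
    """
    if value == 0x00:
        return "not defined"
    if 0x01 <= value <= 0x0A:
        low = (value - 1) * 10
        return f"{low}%-{low + 10}% of estimated lifetime used"
    if value == 0x0B:
        return "exceeded maximum estimated lifetime"
    return "reserved"


# Hypothetical pattern for lines shaped like the storage_info.txt excerpt above.
LINE_RE = re.compile(r"\[(EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_[AB])\]:\s*(0x[0-9a-fA-F]+)")


def life_time_report(text: str) -> dict:
    """Return {field name: decoded meaning} for every lifetime line found."""
    report = {}
    for match in LINE_RE.finditer(text):
        field, raw = match.group(1), int(match.group(2), 16)
        report[field] = decode_life_time_est(raw)
    return report


sample = """eMMC Life Time Estimation A [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_A]: 0x0b
eMMC Life Time Estimation B [EXT_CSD_DEVICE_LIFE_TIME_EST_TYP_B]: 0x0b"""
print(life_time_report(sample))
```

Because the estimate is reported only in 10% buckets, a device can sit at one value for a long time and then jump, so these fields show coarse wear levels rather than a precise percentage.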

Life usage is monitored through UMA stats, and a few hundred Ninja devices are affected:
https://uma.googleplex.com/p/chrome/timeline_v2/embedded?sid=627064bdc660dd7ec655e7a7812a28a1



[On a side note, looking at bios_info.txt, the device's coreboot was never updated; it is still at version Google_Ninja.5216.383.7. It seems no newer version has been qualified yet.]

The eMMC should not wear out under normal conditions during the warranty period. We need to identify where the excessive writes are coming from.
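One possible way to start narrowing down the excessive writes on a Linux-based system is to compare the device-level write counter with per-process write counters. The sketch below assumes the eMMC shows up as `mmcblk0` (a hypothetical device name), reads the write-sectors field (the 7th field) from `/sys/block/<dev>/stat`, which the kernel counts in 512-byte units, and reads `write_bytes` from `/proc/<pid>/io`, which requires kernel task I/O accounting and usually root access. The helper names are illustrative and not part of any Chrome OS tooling.

```python
import os
import time

SECTOR_BYTES = 512  # /sys/block/*/stat counts sectors in 512-byte units


def written_bytes(stat_line: str) -> int:
    """Bytes written so far, from one line of /sys/block/<dev>/stat."""
    fields = stat_line.split()
    return int(fields[6]) * SECTOR_BYTES  # 7th field: sectors written


def sample_write_rate(device: str = "mmcblk0", interval_s: float = 60.0) -> float:
    """Average bytes/second written to `device` over `interval_s` seconds."""
    path = f"/sys/block/{device}/stat"
    with open(path) as f:
        before = written_bytes(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = written_bytes(f.read())
    return (after - before) / interval_s


def per_process_write_bytes() -> dict:
    """Map (pid, command name) to cumulative bytes sent to storage by that process."""
    result = {}
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/io") as f:
                io = dict(line.split(": ", 1) for line in f.read().splitlines())
            with open(f"/proc/{pid}/comm") as f:
                name = f.read().strip()
            result[(int(pid), name)] = int(io["write_bytes"])
        except (OSError, ValueError, KeyError):
            continue  # process exited, permission denied, or no I/O accounting
    return result
```

Sorting the result of `per_process_write_bytes()` by value gives a first list of suspects. Processes that have already exited are not included, so short-lived writers would need repeated sampling.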

Looking at a similar issue at crbug.com/912801, there were some excessive writes in the logs, but not enough to wear out the device.
debug-logs_20181130-100740.tgz (9.2 MB)
Screenshot 2019-01-06 at 6.18.33 PM.png (369 KB)
Labels: -Hotlist-Interop
Looking in more detail, it seems that a subset of devices is aging very quickly, going from 70% to EOL in a matter of weeks:
https://uma.googleplex.com/p/chrome/timeline_v2/embedded?sid=382cb08e814852c5a7de22d1c80fc620
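To put "70% to EOL in a matter of weeks" into numbers, a simple linear extrapolation between two lifetime readings gives a projected end-of-life date. This is a hypothetical back-of-the-envelope helper, not how UMA computes anything; it assumes wear continues at a constant rate, and since the eMMC reports lifetime only in 10% buckets, the projection is coarse. The dates and percentages in the usage line are made up for illustration.

```python
from datetime import date, timedelta
from typing import Optional


def project_eol(first_day: date, first_pct: float,
                second_day: date, second_pct: float) -> Optional[date]:
    """Linearly extrapolate when lifetime usage reaches 100%.

    Returns None if the readings show no measurable wear between them.
    """
    days = (second_day - first_day).days
    if days <= 0 or second_pct <= first_pct:
        return None
    pct_per_day = (second_pct - first_pct) / days
    remaining_days = (100.0 - second_pct) / pct_per_day
    # date + timedelta keeps only whole days, so the result is rounded down.
    return second_day + timedelta(days=remaining_days)


# Hypothetical readings: 70% on one day, 80% two weeks later.
print(project_eol(date(2019, 1, 1), 70, date(2019, 1, 15), 80))
```

With 10 percentage points consumed in 14 days, the remaining 20 points would last roughly another 28 days at the same rate.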


Screenshot 2019-01-09 at 10.38.54 AM.png (349 KB)
