New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 757359 link

Starred by 2 users

Issue metadata

Status: Duplicate
Owner:
User never visited
Closed: Sep 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Sign in to add a comment

ChromeOS issue: The digital signage web page is stuck on final image of a video file.

Project Member Reported by ryutas@chromium.org, Aug 21 2017

Issue description

ChromeOS version: 59.0.3071.91  (9460.60.0 (Official Build) stable-channel ninja)
ChromeOS device model: AOpen Chromebox Commercial. Ninja.5216.383.7
Case#: 13431459

Description:The digital signage web page is stuck on final image of a video file.

Steps to reproduce: 
1. Enroll the device.
2. Configure KIOSK and load video file.

(Please refer the details of vide file, link below in a share file)
https://docs.google.com/a/google.com/document/d/11eHBiYe4aUmZmlgbaJPBzcHs5U8ss-DDD6u-z2tO5HQ/edit?usp=sharing
3. The digital signage web page is stuck on final image of a video file 

Current Behavior / Reproduction: 
The digital signage web page is stuck on final image of a video file.

Expected Behavior: 
The digital signage web page should not stuck/freeze.

Drive link to logs: 
debug-logs_20170818-142123: https://drive.google.com/drive/folders/0B01YYztUbOCudE1HSzFsYlVTTU0?usp=sharing
Suspecting time of the device stuck. 20:46 2017-08-18.

Extra info:
- Sometime the devices show as online in CDM and can be rebooted remotely, after which they play fine. But, This morning was showing as offline in CDM and manually rebooted was required.
- The logs are top to bottom full of error lines, but after rebooting and checking back again they’re showing the same thing still (albeit a different Chrome error code, but same format and consistency etc) and the screens are cycling through content.
-There was not a valid crash ID in the provided Logfiles . 


Found similar log messages in below crbug cases.

https://bugs.chromium.org/p/chromium/issues/detail?id=756191
ERR imageloader[456]: Failed to read latest-version file.
ERR imageloader[456]: Failed to verify and mount component.

https://bugs.chromium.org/p/chromium/issues/detail?id=719040
Attribute compr_data_size (and others) will be removed. See zram documentation.


 

Comment 1 by roy...@google.com, Aug 22 2017

Labels: -Pri-2 Pri-1

Comment 2 by roy...@google.com, Aug 23 2017

Adding more context from the customer. 
Logs are here: https://drive.google.com/drive/folders/0Bx12NVyIouI0QWdkbi0yWDgyOVk?usp=sharing
===========================================

The architecture of the application used is that it runs a chrome app in kiosk mode containing a webview.  We hit a backend service which controls what urls are displayed in the webview.  The app calls "chrome.power.requestKeepAwake('display')" to stop the device going to sleep.
Symptoms: these are the scenarios discovered so far:
  1. Chrome stumbles with a “trap invalid opcode” error which crashes the browser and the media player eventually puts itself to sleep (flashing power light on media player)
    - feedback is this indicates that the Chrome process has tried to run an assembler level command that doesn’t exist.  This can happen when there is an error in the process being run (Chrome in this case) where it starts trying to run parts of the code that aren’t actually the program, e.g. it’s trying to execute bits of data.
  2. The digital signage web page shows a white screen (the chrome kiosk app appears to be ‘live’)
    - Chrome logs show consistency in error codes pointing to “cras_client: socketpair: Too many open files”
  3. The digital signage web page is stuck on final image of a video file “Keno Let’s Play” (the chrome kiosk app appears to be ‘live’)
    - The devices show as online in CDM and can be rebooted remotely, after which they play fine
    - The logs are top to bottom full of error lines, but after rebooting and checking back again they’re showing the same thing still (albeit a different Chrome error code, but same format and consistency etc) and the screens are cycling through content
  4. Sudden unexplained disconnection from Control Centre and Chrome Admin (I believe the media player turns off completely and the screen goes black)
    - There are devices where the Chrome logs appear to be running completely normally at the point the device has disconnected from Control Centre. These eventually suddenly shutdown and nothing is captured in the logs about any shutdown processes. I need these devices to be rebooted to pull down the updated log post-crash to see if there’s any further information in them
 
Logs are attached for a sample from this week, and a spreadsheet highlighting where I see the issues occurring. Below is a summary of findings so far:
  - Zagame’s Boronia M2 (10/08) – AOPEN device gone to sleep. Log shows “trap invalid opcode”
  - Zagame’s Boronia Blade – AOPEN device gone to sleep. Log shows “trap invalid opcode”
  - Capalaba Bistro Blade – AOPEN device gone to sleep. Log shows “trap invalid opcode”
  - Zagame’s Boronia M2 (15/08) – AOPEN device appears online, but TV is showing a white screen (CDM screen grab) – log shows “too many open files”?
  - Cabramatta Inn M2 – Went offline in Control Centre 8:15am 16/08, shows as online in CDM. Screen grabs (attached) show screen frozen on “Keno Let’s Play” image. Rebooted via CDM and all fine. Can’t make out anything from log
  - Caboolture M2 – offline in Control Centre 10:39am 16/08, showing Offline/unavailable in CDM so can’t take screen grab, but rebooted remotely and all ok. Can’t make out anything from log
Chrome OS version (found by navigating to "chrome://version"): OS is 59.0.3071.91 across the board. The issues have been evident for a couple months now (but weren’t being reported properly until more recently when we realised it was a much bigger issue), so whilst we were on OSv57 and now on OSv59.
Cc: krishna...@chromium.org mlight@chromium.org

Comment 4 by roy...@google.com, Aug 23 2017

Labels: ReleaseBlock-Stable
[RBS for visibility. Feel free to remove after this is triaged]

Comment 5 by mlight@chromium.org, Aug 23 2017

This looks like  Issue 696734 , where playing high-def video was causing Ninja and Sumo devices to freeze.    A fix was made and verified around July 7, 2017.  The latest published stable image (build 9460.73.0, chrome 59.0.3071.134) should include the fix.

Another example: replicated the customer config with their web content running in the browser. Log analyzer reports multiple errors: out of memory, crash, multiple instances of NOTREACHED()

File available from
https://drive.google.com/a/google.com/file/d/0B_5yutJvxlGoaHFVNXlMWDNxY0k/view?usp=sharing

Comment 7 by roy...@google.com, Aug 23 2017

Regarding OOM: Can you get us more info on how much RAM this device has ?
Crash id: a6b47792d499eec6 looks like a kernel warning which I've been told we can ignore.
NOTREACHED: may not be relavent here.

I think the most important finding in this log is the OOM. That can happen for two reasons (i) Because we have a memory leak in Chrome (ii) Because the app itself has a memory leak.

To rule out (ii) usually people try to run the app in a regular chrome browser and try to profile it. Is this something you can do for the customer ? I haven't heard any other customer reporting a memory leak recently so I doubt its (i), but there can always be edge cases we don't know about. 
4GB memory in the test device - Ninja. 
Running same content in Chrome Browser (Windows, build 60.0.3112.101 official release) to check.
Cc: sduraisamy@chromium.org
mlight@, are you able to repro the issue by running the URL (http://d3huy4irkvet32.cloudfront.net/live/screen/SIGNAGE-L5WBG/index.html) in a chrome browser on a Ninja device using a build with the fix for  Issue 696734 ?
Update on #8 - Have been profiling content in Windows Chrome browser for 20 hours now, no memory leak.

Updated same Ninja device to 60.0.3112.101 - content has also been running 20 hours, no errors seen so far.

Comment 11 by roy...@google.com, Aug 24 2017

Owner: sduraisamy@chromium.org
[Raj: Please help us find an explicit owner for this and assign it.]
Labels: -M-59 M-61
Owner: krishna...@chromium.org
Status: Assigned (was: Untriaged)
Krishna, please try to repro this issue on a Ninja and add the relevant logs. Thank you.
First I tried to reproduce the problem using the customer's setup (M-59 build 9460.60.0, chrome 59.0.3071.91).  I created a Chrome Sign Builder schedule for the URL http://d3huy4irkvet32.cloudfront.net/live/screen/SIGNAGE-L5WBG/index.html so it could be run in kiosk mode.

CSB began running at approximately 4:40 pm on Wed. Aug. 23, 2017.

It ran overnight fine, and most of this morning, until exactly 10:57 am Pacific time when this was reported in /var/log/powerd.20170823-163809 (summarizing):

"Power button was pressed when no display was attached.  Shutting down due to user request."

(Full logs attached).

Couple of things:  I didn't touch the power button this morning; and a display was attached, albeit turned off.


Full system logs attached (/media/mlight/OEM/debug-logs_20170824-155111.tgz)

powerd.20170823-163809
1.3 MB Download
debug-logs_20170824-155111.tgz
551 KB Download

Comment 15 by roy...@google.com, Aug 24 2017

About 7 minutes before this event, someone attached a keyboard to it. Would you know why the keyboard was touched, and if there is anything one can do from the keyboard which may look like a shutdown command ?

2017-08-24T10:47:58.779891-07:00 ERR chrome[2549]: [2549:2549:0824/104758.779692:INFO:status_uploader.cc(226)] Status upload successful
2017-08-24T10:50:45.272630-07:00 INFO kernel: [65663.452337] usb 1-3: new full-speed USB device number 6 using xhci_hcd
2017-08-24T10:50:45.439021-07:00 INFO kernel: [65663.619370] usb 1-3: New USB device found, idVendor=046d, idProduct=c52b
2017-08-24T10:50:45.439091-07:00 INFO kernel: [65663.619407] usb 1-3: New USB device strings: Mfr=1, Product=2, SerialNumber=0
2017-08-24T10:50:45.439110-07:00 INFO kernel: [65663.619430] usb 1-3: Product: USB Receiver
2017-08-24T10:50:45.439120-07:00 INFO kernel: [65663.619449] usb 1-3: Manufacturer: Logitech
2017-08-24T10:50:45.451596-07:00 INFO kernel: [65663.631862] logitech-djreceiver 0003:046D:C52B.0007: hiddev0,hidraw0: USB HID v1.11 Device [Logitech USB Receiver] on usb-0000:00:14.0-3/input2
2017-08-24T10:50:45.562270-07:00 INFO kernel: [65663.742261] input: Logitech TK820 as /devices/pci0000:00/0000:00:14.0/usb1/1-3/1-3:1.2/0003:046D:C52B.0007/0003:046D:4102.0008/input/input8
2017-08-24T10:50:45.563601-07:00 INFO kernel: [65663.744221] logitech-hidpp-device 0003:046D:4102.0008: input,hidraw1: USB HID v1.11 Keyboard [Logitech TK820] on usb-0000:00:14.0-3:1
2017-08-24T10:50:45.613030-07:00 NOTICE root[540]: /opt/google/input/device_added 
2017-08-24T10:50:45.656238-07:00 NOTICE root[548]: /opt/google/input/device_added /dev/input/event7
2017-08-24T10:58:06.635570-07:00 WARNING kernel: [66105.522747] init: debugd main process (2570) killed by TERM signal
2017-08-24T10:58:06.648123-07:00 ERR shill[882]: [ERROR:object_proxy.cc(582)] Failed to call method: org.chromium.dhcpcd.Release: object_path= /org/chromium/dhcpcd: org.freedesktop.DBus.Error.ServiceUnknown: The name :1.13 was not provided by any .service files
2017-08-24T10:58:06.648157-07:00 ERR shill[882]: [ERROR:dbus_method_invoker.h(111)] CallMethodAndBlockWithTimeout(...): Domain=dbus, Code

For a quick experiment I started CSB again, disconnected the monitor and pressed the power button.  Sure enough, the ninja did shut down, completely.

However, for last night's run the device's power button was flashing blue ("sleep mode").  A brief press of the power button should have woken it up, I believe.

I plugged in a usb dongle for a keyboard/mouse combo to do a "vmstat" command.

I'll do another run overnight, and leave the monitor on the whole time.


For comparison, the Ninja I have running 60.0.3112.101 is still running without error.
The customer provided other log file and I save it to the share drive Link below.
File Name:0003573sat26AUG51451debug-logs_20170824-224333
Link:https://drive.google.com/open?id=0B01YYztUbOCudE1HSzFsYlVTTU0

From comment 2, > symptoms 1. “1. Chrome stumbles with a “trap invalid opcode” error which crashes the browser and the media player eventually puts itself to sleep (flashing power light on media player)”

I noticed that “trap invalid opcode” error message in the provided message file.
2017-08-16T20:58:17.378712+10:00 INFO periodic_scheduler[27272]: cros-machine-id-regen: job completed
2017-08-16T21:14:43.493827+10:00 INFO kernel: [998701.876421] traps: Chrome_ProcessL[30198] trap invalid opcode ip:5d4b8cbbcf2b sp:7d6d0c9f0ca0 error:0 in chrome[5d4b8b153000+796c000]
2017-08-16T21:14:43.518477+10:00 WARNING crash_reporter[29562]: [user] Received crash notification for chrome[30016] sig 4, user 1000 (ignoring call by kernel - chrome crash; waiting for chrome to call us directly)
2017-08-16T21:14:43.625166+10:00 INFO session_manager[29987]: [INFO:child_exit_handler.cc(77)] Handling 30016 exit.
2017-08-16T21:14:43.625412+10:00 ERR session_manager[29987]: [ERROR:child_exit_handler.cc(85)]   Exited with signal 4

krishnargv@ who should be the owner for this stable blocker?
Cc: afakhry@chromium.org
Owner: xiy...@chromium.org
@xiyuan, Would you be able to assist us with this. 

The “trap invalid opcode” (aka SIGILL) in #20 happens in ontent::ZygoteCommunication::ForkRequest based on ip:5d4b8cbbcf2b and chrome[5d4b8b153000+796c000] in the log line.

Around that timestamp in ui log:

[30016:20586:0816/211427.227184:ERROR:cpu_data_collector.cc(280)] CPU freq stats not available in sysfs.
ExceptionHandler::GenerateDump sys_pipe failed:Too many open files
ExceptionHandler::SendContinueSignalToChild sys_write failed:Bad file descriptor
ExceptionHandler::WaitForContinueSignal sys_read failed:Bad file descriptor
Failed to generate minidump.[0816/211443.827476:WARNING:diagnostics_writer.cc(209)] [FAIL] 016 JSONLocalState (File too big)

My guess is the problem looks related to "Too many open files". Chrome is running out of fd somehow.
Just noticed "[FAIL] 016 JSONLocalState (File too big)" in the log I pasted. This is usually an indication of FS corruption. It shows up when chrome fails to load "Local State" file.
A number of Ninja hangs when the monitor is turned off for a lengthy period is also being tracked in Issue 719040.  I did an overnight run last night with the casino ads video loop, which lasted for 11 hours with the monitor off, and then hung badly.  Even "alt + VolUp + x x x" wouldn't wake up the kernel.
 
This issue was reported on 9460.60.0.

mlight this also looks like your comment #73 on https://bugs.chromium.org/p/chromium/issues/detail?id=734708, which is 9592.82.0
and https://bugs.chromium.org/p/chromium/issues/detail?id=719040 is testing on the same version. 
Cc: xiy...@chromium.org
Owner: marc...@chromium.org
The “trap invalid opcode” caused by running out of fd problem is issue 733718 and we raised soft fd limit to 8k for chromeos (https://codereview.chromium.org/2969453002). The fix is merged to M59 (59.0.3071.134) and M60 (60.0.3112.52).

However, the freeze problem is most likely issue 719040. It is a combination of hardware and kernel/graphics. For this issue, it is tracking problem happening on AOpen ninja with version 59.0.3071.91 (9460.60.0).

marcheu@ can you help to triage?
Cc: snanda@chromium.org
Owner: rajatja@chromium.org
This is a result of the switch to kernel 4.4, so assigning to kernel folks
Cc: avkodipelli@chromium.org
rajatja@ This is currently a stable blocker. Can you please take a look and share your input here?

Comment 35 by rajatja@google.com, Sep 11 2017

Mergedinto: 719040
Status: Duplicate (was: Assigned)
The Ninja hang issue while playing videos (where the system hangs and does not wake up even using Alt + VolUp + x + x + x) is being tracked in crbug.com/719040. It seems that the "trap invalid opcode" / out of fd(s) issue has been triaged and fixed (per #31).

Comment 36 by rajatja@google.com, Sep 11 2017

Cc: benzh@chromium.org

Sign in to add a comment