New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.
Starred by 5 users
Status: Started
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 0
Type: Bug



Sign in to add a comment
cheets_StartAndroid.stress timeout
Project Member Reported by ddavenp...@chromium.org, Nov 10 Back to list
This failed on kevin-paladin:
http://cautotest-prod/tko/retrieve_logs.cgi?job=/results/155404104-chromeos-test/
https://uberchromegw.corp.google.com/i/chromeos/builders/kevin-paladin/builds/2998

It appears to have timed out when starting the browser backend, and may not necessarily be an issue with arc:

11/10 06:03:00.959 ERROR|           browser:0068| Failed with TimeoutException while starting the browser backend.
 
Specifically this times out waiting for oobe_exists in cros_browser_backend.py:172

py_utils.WaitFor(lambda: self.oobe_exists, 30)
Saw this again on cheets_StartAndroid.stress on auron_yuna-paladin:

https://luci-milo.appspot.com/buildbot/chromeos/auron_yuna-paladin/1653
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/162250346-chromeos-test/chromeos4-row10-rack8-host9/

12/13 04:37:26.154 ERROR|           browser:0050| Failed with TimeoutException while starting the browser backend.
TimeoutException: Timed out while waiting 30s for py_utils.WaitFor(lambda: self.oobe_exists, 30).

Cc: deanliao@chromium.org dtor@chromium.org ecgh@chromium.org michae...@chromium.org risan@chromium.org
Owner: levarum@chromium.org
Status: Assigned
At first blush, his looks like a product bug, presumably in
ARC++, not caused by infrastructure.

levarum@ - could you do some initial triage?  The top-level
question is "is this problem caused by a CL in the CQ run,
or is it a bug in the tree?"

Cc: lhchavez@chromium.org yusukes@chromium.org
Owner: ----
Status: Available
In

cheets_StartAndroid.stress/sysinfo/var/log/chrome/chrome_20171213-043621

this looks suspicious:

[28319:28319:1213/043626.291292:VERBOSE1:arc_session_manager.cc(563)] ARC opt-in. Starting ARC session.
[28319:28319:1213/043626.291363:VERBOSE1:arc_session_runner.cc(185)] Session start requested: FULL_INSTANCE
[28319:28319:1213/043626.291377:VERBOSE1:arc_session_runner.cc(258)] Starting ARC instance
[28319:28319:1213/043626.730362:ERROR:customization_document.cc(623)] Customization manifest is missing on server: https://ssl.gstatic.com/chrome/chromeos-customization/yuna.json
[28319:28319:1213/043626.759986:WARNING:memory_kills_monitor.cc(175)] Monitoring has been started
[28319:28319:1213/043627.303373:INFO:arc_session_impl.cc(459)] ARC ready.
[28319:28319:1213/043627.372469:ERROR:server.cc(1915)] Not implemented reached in void exo::wayland::(anonymous namespace)::remote_surface_set_rectangular_shadow_background_opacity_DEPRECATED(wl_client *, wl_resource *, wl_fixed_t)
[28319:28319:1213/043630.288315:VERBOSE1:arc_net_host_impl.cc(721)] GetDefaultNetwork: default network is /service/2
[28319:28319:1213/043630.299925:ERROR:object_proxy.cc(626)] Failed to call method: org.chromium.Midis.BootstrapMojoConnection: object_path= /org/chromium/Midis: org.freedesktop.DBus.Error.ServiceUnknown: The name org.chromium.Midis was not provided by any .service files
[28319:28319:1213/043630.299986:ERROR:arc_midis_bridge.cc(65)] ArcMidisBridge had a failure in D-Bus with the daemon.
[28319:28319:1213/043631.861828:VERBOSE1:connection_holder.h(384)] Instance arc::mojom::AppInstance not available.
[28319:28319:1213/043634.981454:VERBOSE1:connection_holder.h(384)] Instance arc::mojom::IntentHelperInstance not available.
[28319:28319:1213/043634.981983:VERBOSE1:arc_settings_service.cc(241)] OnPrefChanged: arc.backup_restore.enabled
[28319:28319:1213/043634.982061:VERBOSE1:arc_settings_service.cc(241)] OnPrefChanged: arc.location_service.enabled
[28319:28319:1213/043648.043337:VERBOSE1:arc_net_host_impl.cc(758)] New default network: /service/2 (ethernet)
[28319:28319:1213/043648.043774:WARNING:statistics_provider.cc(370)] Requested statistic not found: Product_S/N
[28319:28319:1213/043648.044319:VERBOSE1:arc_net_host_impl.cc(758)] New default network: /service/2 (ethernet)
[28319:28319:1213/043648.198988:VERBOSE1:arc_session_runner.cc(236)] OnShutdown
[28319:28319:1213/043648.199078:VERBOSE1:arc_bridge_host_impl.cc(281)] Mojo connection lost
[28319:28319:1213/043648.200962:VERBOSE1:arc_net_host_impl.cc(912)] AndroidVpnStateChanged: state=ConnectionStateType::NOT_CONNECTED service=
[28319:28319:1213/043648.201321:INFO:arc_session_runner.cc(287)] ARC stopped: SHUTDOWN
[28319:28319:1213/043648.256631:VERBOSE1:arc_session_runner.cc(236)] OnShutdown
[28319:28319:1213/043648.306582:WARNING:pref_notifier_impl.cc(28)] Init observer found at shutdown.

@yusukes, @lhchavez are any of these messages fatal?
arc_session_runner / arc_bridge_host_impl are consequences (not causes) of another fatal issue in the Android side.
 Issue 794719  has been merged into this issue.
Still happening:

https://uberchromegw.corp.google.com/i/chromeos/builders/kevin-release/builds/1808

2/27 05:23:52.579 DEBUG|          autotest:1310| AUTOTEST_STATUS::		FAIL	cheets_StartAndroid.stress	cheets_StartAndroid.stress	timestamp=1514381032	localtime=Dec 27 05:23:52	Unhandled TimeoutException: Timed out while waiting 30s for py_utils.WaitFor(lambda: self.oobe_exists, 30).
12/27 05:23:52.579 INFO |        server_job:0218| 		FAIL	cheets_StartAndroid.stress	cheets_StartAndroid.stress	timestamp=1514381032	localtime=Dec 27 05:23:52	Unhandled TimeoutException: Timed out while waiting 30s for py_utils.WaitFor(lambda: self.oobe_exists, 30).

Any ARC++ engineer can take this?

If it's helpful, here's the relevant part of cheets_StartAndroid.stress.DEBUG from https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/165673689-chromeos-test/chromeos6-row2-rack24-host18/cheets_StartAndroid.stress/debug/:

12/27 05:23:11.914 INFO |cheets_StartAndroi:0048| cheets_StartAndroid iteration 8
12/27 05:23:11.915 DEBUG|          arc_util:0040| ARC is enabled in mode enabled
12/27 05:23:11.916 DEBUG|          arc_util:0040| ARC is enabled in mode enabled
12/27 05:23:11.928 INFO |    browser_finder:0123| Chose browser: PossibleCrOSBrowser(browser_type=system)
12/27 05:23:11.929 DEBUG|    cros_interface:0058| sh -c mktemp -d /tmp/extension_XXXXX 
12/27 05:23:11.930 DEBUG|      global_hooks:0056| ['sh', '-c', 'mktemp -d /tmp/extension_XXXXX']
12/27 05:23:11.946 DEBUG|    cros_interface:0067|  > stdout=[/tmp/extension_gCtUt
], stderr=[]
12/27 05:23:11.947 DEBUG|      global_hooks:0056| ['cp', '-r', '/usr/local/autotest/common_lib/cros/autotest_private_ext', '/tmp/extension_gCtUt']
12/27 05:23:11.965 DEBUG|    cros_interface:0058| sh -c chown -R chronos:chronos /tmp/extension_gCtUt 
12/27 05:23:11.965 DEBUG|      global_hooks:0056| ['sh', '-c', 'chown -R chronos:chronos /tmp/extension_gCtUt']
12/27 05:23:11.978 DEBUG|    cros_interface:0067|  > stdout=[], stderr=[]
12/27 05:23:11.978 INFO |    cros_interface:0551| (Re)starting the ui (logs the user out)
12/27 05:23:11.979 DEBUG|      global_hooks:0056| ['sh', '-c', 'systemctl']
12/27 05:23:11.988 DEBUG|    cros_interface:0058| sh -c stop ui 
12/27 05:23:11.988 DEBUG|      global_hooks:0056| ['sh', '-c', 'stop ui']
12/27 05:23:12.473 DEBUG|    cros_interface:0067|  > stdout=[ui stop/waiting
], stderr=[]
12/27 05:23:12.475 DEBUG|    cros_interface:0414| rm -rf /var/lib/whitelist/*
12/27 05:23:12.477 DEBUG|      global_hooks:0056| ['sh', '-c', 'rm -rf /var/lib/whitelist/*']
12/27 05:23:12.501 DEBUG|    cros_interface:0414| rm -rf /home/chronos/Local\ State
12/27 05:23:12.503 DEBUG|      global_hooks:0056| ['sh', '-c', 'rm -rf /home/chronos/Local\\ State']
12/27 05:23:12.525 DEBUG|      global_hooks:0056| ['sh', '-c', 'systemctl']
12/27 05:23:12.548 DEBUG|      global_hooks:0056| ['sh', '-c', 'status ui']
12/27 05:23:12.567 DEBUG|    cros_interface:0443| IsServiceRunning(ui)->False
12/27 05:23:12.569 DEBUG|    cros_interface:0058| sh -c start ui 
12/27 05:23:12.570 DEBUG|      global_hooks:0056| ['sh', '-c', 'start ui']
12/27 05:23:12.826 DEBUG|    cros_interface:0067|  > stdout=[ui start/running, process 32413
], stderr=[]
12/27 05:23:12.829 DEBUG|      global_hooks:0056| ['sh', '-c', '/bin/ps --no-headers -A -o pid,ppid,args:4096,state']
12/27 05:23:12.876 DEBUG|    cros_interface:0367| ListProcesses(<predicate>)->[292 processes]
12/27 05:23:12.878 DEBUG|    cros_interface:0058| sh -c cryptohome --action=remove --force --user=test@test.test 
12/27 05:23:12.879 DEBUG|      global_hooks:0056| ['sh', '-c', 'cryptohome --action=remove --force --user=test@test.test']
12/27 05:23:13.131 DEBUG|    cros_interface:0067|  > stdout=[Remove succeeded.
], stderr=[]
12/27 05:23:13.132 DEBUG|    cros_interface:0414| rm -rf /home/chronos/DevToolsActivePort
12/27 05:23:13.132 DEBUG|      global_hooks:0056| ['sh', '-c', 'rm -rf /home/chronos/DevToolsActivePort']
12/27 05:23:13.149 DEBUG|      global_hooks:0056| ['sh', '-c', '/bin/ps --no-headers -A -o pid,ppid,args:4096,state']
12/27 05:23:13.202 DEBUG|    cros_interface:0367| ListProcesses(<predicate>)->[296 processes]
12/27 05:23:13.204 INFO |cros_browser_backe:0159| Restarting Chrome (pid=32428) with remote port
12/27 05:23:13.204 INFO |cros_browser_backe:0166| dbus-send --system --type=method_call --dest=org.chromium.SessionManager /org/chromium/SessionManager org.chromium.SessionManagerInterface.EnableChromeTesting boolean:true array:string:"--disable-arc-opt-in-verification,--enable-net-benchmarking,--metrics-recording-only,--no-default-browser-check,--no-first-run,--enable-gpu-benchmarking,--autoplay-policy=no-user-gesture-required,--load-extension=/tmp/extension_gCtUt/autotest_private_ext,--disable-component-extensions-with-background-pages,--disable-default-apps,--disable-search-geolocation-disclosure,--enable-smooth-scrolling,--enable-threaded-compositing,--remote-debugging-port=0,--start-maximized,--ash-disable-system-sounds,--allow-failed-policy-fetch-for-test,--oobe-skip-postlogin,--disable-logging-redirect,--vmodule=*/chromeos/net/*=2\,*/chromeos/login/*=2\,chrome_browser_main_posix=2,--disable-gaia-services"
12/27 05:23:13.205 DEBUG|    cros_interface:0058| sh -c dbus-send --system --type=method_call --dest=org.chromium.SessionManager /org/chromium/SessionManager org.chromium.SessionManagerInterface.EnableChromeTesting boolean:true array:string:"--disable-arc-opt-in-verification,--enable-net-benchmarking,--metrics-recording-only,--no-default-browser-check,--no-first-run,--enable-gpu-benchmarking,--autoplay-policy=no-user-gesture-required,--load-extension=/tmp/extension_gCtUt/autotest_private_ext,--disable-component-extensions-with-background-pages,--disable-default-apps,--disable-search-geolocation-disclosure,--enable-smooth-scrolling,--enable-threaded-compositing,--remote-debugging-port=0,--start-maximized,--ash-disable-system-sounds,--allow-failed-policy-fetch-for-test,--oobe-skip-postlogin,--disable-logging-redirect,--vmodule=*/chromeos/net/*=2\,*/chromeos/login/*=2\,chrome_browser_main_posix=2,--disable-gaia-services" 
12/27 05:23:13.205 DEBUG|      global_hooks:0056| ['sh', '-c', 'dbus-send --system --type=method_call --dest=org.chromium.SessionManager /org/chromium/SessionManager org.chromium.SessionManagerInterface.EnableChromeTesting boolean:true array:string:"--disable-arc-opt-in-verification,--enable-net-benchmarking,--metrics-recording-only,--no-default-browser-check,--no-first-run,--enable-gpu-benchmarking,--autoplay-policy=no-user-gesture-required,--load-extension=/tmp/extension_gCtUt/autotest_private_ext,--disable-component-extensions-with-background-pages,--disable-default-apps,--disable-search-geolocation-disclosure,--enable-smooth-scrolling,--enable-threaded-compositing,--remote-debugging-port=0,--start-maximized,--ash-disable-system-sounds,--allow-failed-policy-fetch-for-test,--oobe-skip-postlogin,--disable-logging-redirect,--vmodule=*/chromeos/net/*=2\\,*/chromeos/login/*=2\\,chrome_browser_main_posix=2,--disable-gaia-services"']
12/27 05:23:13.218 DEBUG|    cros_interface:0067|  > stdout=[], stderr=[]
12/27 05:23:13.219 DEBUG|      global_hooks:0056| ['sh', '-c', '/bin/ps --no-headers -A -o pid,ppid,args:4096,state']
12/27 05:23:13.253 DEBUG|    cros_interface:0367| ListProcesses(<predicate>)->[291 processes]
12/27 05:23:13.255 DEBUG|    cros_interface:0309| GetFile(/home/chronos/DevToolsActivePort, /tmp/tmpk10gwj)
12/27 05:23:13.256 DEBUG|    cros_interface:0309| GetFile(/home/chronos/DevToolsActivePort, /tmp/tmpWAVKAr)
12/27 05:23:13.357 DEBUG|    cros_interface:0309| GetFile(/home/chronos/DevToolsActivePort, /tmp/tmpbKWMzp)
12/27 05:23:13.458 DEBUG|    cros_interface:0309| GetFile(/home/chronos/DevToolsActivePort, /tmp/tmpZK5VNp)
12/27 05:23:13.560 DEBUG|    cros_interface:0309| GetFile(/home/chronos/DevToolsActivePort, /tmp/tmpMm6EZ_)
12/27 05:23:13.663 DEBUG|    cros_interface:0309| GetFile(/home/chronos/DevToolsActivePort, /tmp/tmpBEV5py)
12/27 05:23:13.765 DEBUG|    cros_interface:0309| GetFile(/home/chronos/DevToolsActivePort, /tmp/tmpmApfZY)
12/27 05:23:13.866 DEBUG|    cros_interface:0309| GetFile(/home/chronos/DevToolsActivePort, /tmp/tmpolw4IL)
12/27 05:23:13.967 DEBUG|    cros_interface:0309| GetFile(/home/chronos/DevToolsActivePort, /tmp/tmpe370dk)
12/27 05:23:13.968 DEBUG|    cros_interface:0339| GetFileContents(/home/chronos/DevToolsActivePort)->36495
/devtools/browser/d9a6aefb-c96b-4463-b4af-ea6ec643e6df
12/27 05:23:13.968 INFO |cros_browser_backe:0076| Discovered ephemeral port 36495
12/27 05:23:13.969 INFO |cros_browser_backe:0077| Browser target: /devtools/browser/d9a6aefb-c96b-4463-b4af-ea6ec643e6df
12/27 05:23:48.902 ERROR|           browser:0050| Failed with TimeoutException while starting the browser backend.

Cc: cmtm@chromium.org
Adding this week's constable.
Cc: cmtm@google.com
I'll start looking into this.
Labels: -Pri-3 Pri-1
Cheets_StartAndroid.stress also just killed a CQ run.  I am not sure if it's the same problem, but maybe you can look into this too?  Thanks.

https://uberchromegw.corp.google.com/i/chromeos/builders/veyron_minnie-paladin/builds/4886


Owner: cmtm@chromium.org
Status: Started
Regarding the minnie failure from #13:

there seems to be a disconnect between 14:48 and 14:51 as shown in autoserv.DEBUG:


12/27 14:48:51.551 DEBUG|          autotest:1310| AUTOTEST_STATUS::START	----	----	timestamp=1514414930	localtime=Dec 27 14:48:50	
12/27 14:48:51.552 INFO |        server_job:0218| START	----	----	timestamp=1514414930	localtime=Dec 27 14:48:50	
12/27 14:48:51.552 INFO |        server_job:1401| Parsing lines in fast mode
12/27 14:48:51.553 ERROR|   logging_manager:0626| tko parser: parsing partial test None CLIENT_JOB.0
12/27 14:48:51.554 ERROR|   logging_manager:0626| tko parser: RUNNING: RUNNING
12/27 14:48:51.554 ERROR|   logging_manager:0626| CLIENT_JOB.0
12/27 14:48:51.554 ERROR|   logging_manager:0626| 
12/27 14:48:51.719 DEBUG|          autotest:1310| AUTOTEST_STATUS::	START	cheets_StartAndroid.stress	cheets_StartAndroid.stress	timestamp=1514414930	localtime=Dec 27 14:48:50	
12/27 14:48:51.720 INFO |        server_job:0218| 	START	cheets_StartAndroid.stress	cheets_StartAndroid.stress	timestamp=1514414930	localtime=Dec 27 14:48:50	
12/27 14:48:51.720 INFO |        server_job:1401| Parsing lines in fast mode
12/27 14:48:51.720 ERROR|   logging_manager:0626| tko parser: parsing partial test cheets_StartAndroid.stress cheets_StartAndroid.stress
12/27 14:48:51.721 ERROR|   logging_manager:0626| tko parser: RUNNING: RUNNING
12/27 14:48:51.721 ERROR|   logging_manager:0626| Subdir: cheets_StartAndroid.stress
12/27 14:48:51.721 ERROR|   logging_manager:0626| Testname: cheets_StartAndroid.stress
12/27 14:48:51.722 ERROR|   logging_manager:0626| 
12/27 14:51:02.146 DEBUG|          autotest:0985| Result exit status is 255.
12/27 14:51:02.147 DEBUG|             utils:0212| Running 'ping chromeos4-row9-rack9-host6 -w1 -c1'
12/27 14:51:02.364 DEBUG|             utils:0280| [stdout] PING chromeos4-row9-rack9-host6.cros.corp.google.com (100.115.200.102) 56(84) bytes of data.
12/27 14:51:02.364 DEBUG|             utils:0280| [stdout] 64 bytes from 100.115.200.102: icmp_seq=1 ttl=59 time=0.721 ms
12/27 14:51:02.365 DEBUG|             utils:0280| [stdout] 
12/27 14:51:02.365 DEBUG|             utils:0280| [stdout] --- chromeos4-row9-rack9-host6.cros.corp.google.com ping statistics ---
12/27 14:51:02.365 DEBUG|             utils:0280| [stdout] 1 packets transmitted, 1 received, 0% packet loss, time 0ms
12/27 14:51:02.365 DEBUG|             utils:0280| [stdout] rtt min/avg/max/mdev = 0.721/0.721/0.721/0.000 ms
12/27 14:51:02.385 DEBUG|          ssh_host:0301| Running (ssh) 'if [ -f '/proc/sys/kernel/random/boot_id' ]; then cat '/proc/sys/kernel/random/boot_id'; else echo 'no boot_id available'; fi' from '_do_run|execute_control|_diagnose_dut|get_boot_id|run|run_very_slowly'
12/27 14:51:02.396 INFO |     ssh_multiplex:0076| Master ssh connection to chromeos4-row9-rack9-host6 is down.
12/27 14:51:02.397 DEBUG|     ssh_multiplex:0118| Nuking ssh master_job
12/27 14:51:02.397 DEBUG|     ssh_multiplex:0123| Cleaning ssh master_tempdir
12/27 14:51:02.398 INFO |     ssh_multiplex:0092| Starting master ssh connection '/usr/bin/ssh -a -x -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_BOwe3Kssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos4-row9-rack9-host6'
12/27 14:51:02.398 DEBUG|             utils:0212| Running '/usr/bin/ssh -a -x -N -o ControlMaster=yes -o ControlPath=/tmp/_autotmp_BOwe3Kssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22 chromeos4-row9-rack9-host6'
12/27 14:51:06.211 DEBUG|             utils:0280| [stdout] 4db3912d-e8da-4d7a-b032-4530c6740ef7
12/27 14:51:06.253 INFO |        server_job:0218| 		ABORT	----	----	timestamp=1514415066	localtime=Dec 27 14:51:06	Autotest client terminated unexpectedly: DUT is pingable, SSHable and did NOT restart un-expectedly. We probably lost connectivity during the test.
12/27 14:51:06.253 INFO |        server_job:1401| Parsing lines in fast mode


at that time the syslog shows this (NOTE: time is GMT)


2017-12-27T22:48:50.131903+00:00 NOTICE ag[5015]: autotest server[stack::execute_section|_execute_daemon|run] -> ssh_run(/usr/local\
/autotest/bin/autotestd_monitor /tmp/autoserv-SgAQev 0 0)
2017-12-27T22:48:51.005036+00:00 NOTICE root[5049]: autotest runtest cheets_StartAndroid
2017-12-27T22:48:52.833212+00:00 NOTICE autotest[5070]: 14:48:52.826 WARNI|  external_modules:0057| Unable to import cv2 due to: In\
correct cv2 version found, expected 2.4.8 <= version < 3.0.0, found version $Rev: 4557 $
2017-12-27T22:48:54.480356+00:00 INFO session_manager[1793]: [INFO:browser_job.cc(164)] Terminating process: exiting cleanly
2017-12-27T22:48:54.480735+00:00 INFO session_manager[1793]: [INFO:system_utils_impl.cc(93)] Sending 15 to 1845 as 1000
2017-12-27T22:48:54.481083+00:00 INFO session_manager[1793]: [INFO:container_manager_impl.cc(193)] Killing off container android
2017-12-27T22:48:54.481335+00:00 INFO session_manager[1793]: [INFO:system_utils_impl.cc(93)] Sending 9 to 2268 as 0
2017-12-27T22:48:55.222584+00:00 INFO session_manager[1793]: [INFO:browser_job.cc(180)] Cleaned up child 1845
2017-12-27T22:48:55.222680+00:00 INFO session_manager[1793]: [INFO:session_manager_impl.cc(419)] emitting D-Bus signal SessionState\
Changed:stopped

I have no idea what cv2 is, but could this be the problem?

Mmm I don't think cv2 is the problem because nothing happens for almost two seconds.  Then the session manager shuts something down---maybe android---but it appears to do that cleanly.  So, no obvious failure from my limited viewpoint.
Labels: -Pri-1 Pri-0
Happened on auron_yuma paladin.  This is breaking the CQ.

On auron_yuma (https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/165946603-chromeos-test/chromeos4-row10-rack8-host3/cheets_StartAndroid.stress/sysinfo/var/log/chrome/), there is a timeout waiting for the browser backend. The last thing it does is starting the Android container (seemingly successfully), then a bunch of SELinux errors, unclear if they are critical (probably not).

2017-12-28T15:51:25.889844+00:00 INFO session_manager[2028]: [INFO:session_manager_impl.cc(1831)] Started Android container with PID 2373
2017-12-28T15:51:25.896632+00:00 WARNING kernel: [ 1241.572229] audit_printk_skb: 9 callbacks suppressed
2017-12-28T15:51:25.896662+00:00 NOTICE kernel: [ 1241.572253] audit: type=1400 audit(1514476285.894:966): avc:  denied  { module_request } for  pid=2373 comm="init" kmod="personality-8" scontext=u:r:init:s0 tcontext=u:r:kernel:s0 tclass=system permissive=0
2017-12-28T15:51:25.896666+00:00 NOTICE kernel: [ 1241.572417] audit: type=1400 audit(1514476285.894:967): avc:  denied  { use } for  pid=2373 comm="init" path="/init" dev="loop1" ino=15 scontext=u:r:init:s0 tcontext=u:r:chromeos:s0 tclass=fd permissive=0
2017-12-28T15:51:25.917951+00:00 DEBUG kernel: [ 1241.593335] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
2017-12-28T15:51:25.917967+00:00 NOTICE kernel: [ 1241.593518] audit: type=1400 audit(1514476285.915:968): avc:  denied  { module_request } for  pid=2373 comm="init" kmod="fs-configfs" scontext=u:r:init:s0 tcontext=u:r:kernel:s0 tclass=system permissive=0
2017-12-28T15:51:25.917601+00:00 INFO session_manager[2028]: [INFO:server_backed_state_key_generator.cc(126)] Stable device secret missing!
2017-12-28T15:51:25.922608+00:00 DEBUG kernel: [ 1241.598295] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
2017-12-28T15:51:25.923592+00:00 NOTICE kernel: [ 1241.599557] audit: type=1400 audit(1514476285.921:969): avc:  denied  { mounton } for  pid=2373 comm="init" path="/sys/kernel/debug" dev="tmpfs" ino=12721 scontext=u:r:init:s0 tcontext=u:object_r:debugfs:s0 tclass=dir permissive=0
2017-12-28T15:51:26.268617+00:00 DEBUG kernel: [ 1241.943677] SELinux: initialized (dev tmpfs, type tmpfs), uses transition SIDs
2017-12-28T15:51:26.272600+00:00 INFO kernel: [ 1241.947785] binder: 2408:2408 transaction failed 29189, size 0-0
2017-12-28T15:51:26.273600+00:00 NOTICE kernel: [ 1241.949111] audit: type=1400 audit(1514476286.271:970): avc:  denied  { module_request } for  pid=2373 comm="init" kmod="fs-configfs" scontext=u:r:init:s0 tcontext=u:r:kernel:s0 tclass=system permissive=0
2017-12-28T15:51:26.853598+00:00 NOTICE kernel: [ 1242.528481] audit: type=1400 audit(1514476286.851:971): avc:  denied  { dac_read_search } for  pid=2412 comm="main" capability=2  scontext=u:r:zygote:s0 tcontext=u:r:zygote:s0 tclass=capability permissive=0
2017-12-28T15:51:26.853612+00:00 NOTICE kernel: [ 1242.528586] audit: type=1400 audit(1514476286.851:972): avc:  denied  { dac_read_search } for  pid=2412 comm="main" capability=2  scontext=u:r:zygote:s0 tcontext=u:r:zygote:s0 tclass=capability permissive=0
2017-12-28T15:51:26.853614+00:00 NOTICE kernel: [ 1242.528670] audit: type=1400 audit(1514476286.851:973): avc:  denied  { dac_read_search } for  pid=2412 comm="main" capability=2  scontext=u:r:zygote:s0 tcontext=u:r:zygote:s0 tclass=capability permissive=0
2017-12-28T15:51:26.853615+00:00 NOTICE kernel: [ 1242.529300] audit: type=1400 audit(1514476286.851:974): avc:  denied  { dac_read_search } for  pid=2412 comm="main" capability=2  scontext=u:r:zygote:s0 tcontext=u:r:zygote:s0 tclass=capability permissive=0
2017-12-28T15:51:26.856586+00:00 NOTICE kernel: [ 1242.531469] audit: type=1400 audit(1514476286.854:975): avc:  denied  { dac_read_search } for  pid=2412 comm="main" capability=2  scontext=u:r:zygote:s0 tcontext=u:r:zygote:s0 tclass=capability permissive=0
2017-12-28T15:51:56.908797+00:00 NOTICE autotest[2474]: 07:51:56.902 ERROR|           browser:0050| Failed with TimeoutException while starting the browser backend.


In the chrome logs, there is a "Not implemented reached" error sandwiched between ARC instance start and shutting down after the autotest timeouts waiting for the browser:

[2207:2207:1228/075124.828986:VERBOSE1:arc_session_runner.cc(258)] Starting ARC instance
[2207:2207:1228/075126.327301:ERROR:server.cc(1915)] Not implemented reached in void exo::wayland::(anonymous namespace)::remote_surface_set_rectangular_shadow_background_opacity_DEPRECATED(wl_client *, wl_resource *, wl_fixed_t)
[2207:2207:1228/075157.097507:VERBOSE1:arc_session_runner.cc(236)] OnShutdown

Now that we have this bug tracking the current failure, let's use it. But I'll note that cheets_StartAndroid.stress fails anytime we get a bad android, so we should not conflate these issues.

Let's mark this fixed once we're over the current problem.

cmtm: Have you tried reproing this problem with the current arc++?
I _think_ issue 798000 is another symptom of this bad android container image. Keeping it separate until proven the same.
It seems that the timeout occurs here:
https://github.com/catapult-project/catapult/blob/master/telemetry/telemetry/internal/backends/chrome/cros_browser_backend.py#L177

The timeout is 30s. It seems to be that this operation usually takes less than 8s and the time taken is very stable, so it's unlikely to be an issue where it sometimes "just takes longer".

Also of note is that this occurs before the call to enable android here:
https://cs.corp.google.com/chromeos_internal/src/third_party/autotest/files/client/common_lib/cros/chrome.py?l=175

Chrome is still started with ARC enabled before this. I'll have to dig into what this actually means.
Owner: cmtm@google.com
Re #20: if the timeout indeed occurs around BindDevToolsClient, could this be the same/similar issue as  issue 797620 ?
Owner: linben@chromium.org
Cc: linben@chromium.org ahass...@chromium.org diand...@chromium.org uekawa@chromium.org hiroh@chromium.org
 Issue 799948  has been merged into this issue.
Another failure.

https://luci-milo.appspot.com/buildbot/chromeos/auron_yuna-paladin/1917

cheets_StartAndroid.stress                  FAIL: Unhandled TimeoutException: Timed out while waiting 30s for py_utils.WaitFor(lambda: self.oobe_exists, 30).

Project Member Comment 28 by bugdroid1@chromium.org, Jan 13
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/9a225c02d1f78511f44bc9abdeea07fb9bfe011b

commit 9a225c02d1f78511f44bc9abdeea07fb9bfe011b
Author: Ben Lin <linben@google.com>
Date: Sat Jan 13 06:37:50 2018

Add check for Chrome PID if exception occurs for startAndroid.stress.

startAndroid.stress test has been hitting exceptions regarding timeout
when trying to enable play store. We don't know whether Chrome is still
alive or not at this point, so logging this.

BUG=chromium:783832
BUG=chromium:799958
TEST=ran cheets_StartAndroid test locally.

Change-Id: Ifbe1bc73fa75a29df01410a9ef4cd77d5b4c7f80
Reviewed-on: https://chromium-review.googlesource.com/865100
Commit-Ready: Ben Lin <linben@chromium.org>
Tested-by: Ben Lin <linben@chromium.org>
Reviewed-by: Yusuke Sato <yusukes@chromium.org>

[modify] https://crrev.com/9a225c02d1f78511f44bc9abdeea07fb9bfe011b/client/common_lib/cros/chrome.py
[modify] https://crrev.com/9a225c02d1f78511f44bc9abdeea07fb9bfe011b/client/site_tests/cheets_StartAndroid/cheets_StartAndroid.py

Comment 29 by lannm@google.com, Jan 16 (6 days ago)
Any update on this? Still seeing many failures.
Comment 30 by linben@chromium.org, Jan 16 (6 days ago)
Cc: phshah@chromium.org
With the CL landed from Jan 13 we should be seeing more debug information now.

Can you link the failures?

Also adding this week's constable.
Comment 31 by jrbarnette@chromium.org, Jan 16 (6 days ago)
Components: -Infra>Client>ChromeOS Platform>Apps>ARC
This isn't a CrOS Infra bug.  Recategorizing as ARC++.

Probably, this also shouldn't be P0, but the ARC++ constable
should make that call.

Comment 32 by kinaba@chromium.org, Yesterday (24 hours ago)
As written in #20,

py_utils.WaitFor(lambda: self.oobe_exists, 30)

is a pretty generic failure that happens even before trying to start the main ARC instance
(((even though the failure comes after starting a mini ARC instance for login-screen)))
which is happening across all the kinds of Chrome OS tests including boards that have never enable ARC:

https://stainless.corp.google.com/search?view=list&first_date=2017-12-24&last_date=2018-01-22&status=GOOD&status=WARN&status=FAIL&status=ERROR&status=ABORT&reason=oobe_exists&exclude_cts=false&exclude_not_run=false&exclude_non_release=true&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=true

I think cheets_StartAndroid.stress is more visible just because, as being the "stress" test, it does 100x more login compared to other tests and thus has 100x more chance to hit the bug.

I'm not sure if categorizing it as ARC++ bug is appropriate. Rather I think we want help for autotest/devtool-connection experts (though I don't know who they are...)
Comment 33 by kinaba@chromium.org, Yesterday (24 hours ago)
Cc: kinaba@chromium.org
Comment 34 by kinaba@chromium.org, Today (19 hours ago)
Looked at 4 or 5 logs of this failure.

As far as I could see, in all of them, the Chrome process was restarting during the login wizard.
It seems that autotest is waiting for the connection between the first (dead) chrome and is timing out.
(I guess this is what linben@ observed as well and that's why the debug code of #28 is added.)


All (and only) for the failing instance of Chrome, the corresponding /var/log/ui/ log recorded the following line

Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init: Assertion `listp->slotinfo[cnt].gen <= GL(dl_tls_generation)' failed!

immediately before the logged PID switches.




https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/169160870-chromeos-test/chromeos6-row4-rack5-host1/

/var/log/messages
2018-01-10T21:03:07.477986+00:00 INFO session_manager[30716]: [INFO:browser_job.cc(147)] Running child /opt/google/chrome/chrome ...
2018-01-10T21:03:08.387393+00:00 INFO session_manager[30716]: [INFO:child_exit_handler.cc(77)] Handling 30827 exit.
2018-01-10T21:03:08.387513+00:00 ERR session_manager[30716]: [ERROR:child_exit_handler.cc(79)]   Exited with exit code 127
2018-01-10T21:03:08.387828+00:00 INFO session_manager[30716]: [INFO:session_manager_service.cc(307)] Exiting process is chrome.
2018-01-10T21:03:08.387894+00:00 INFO session_manager[30716]: [INFO:browser_job.cc(156)] Terminating process group: Ensuring browser processes are gone.
2018-01-10T21:03:08.387941+00:00 INFO session_manager[30716]: [INFO:system_utils_impl.cc(93)] Sending 9 to -30827 as 1000
2018-01-10T21:03:08.423756+00:00 INFO session_manager[30716]: [INFO:browser_job.cc(147)] Running child /opt/google/chrome/chrome ...

autotest DEBUG log
01/10 13:03:07.450 INFO |cros_browser_backe:0141| Restarting Chrome (pid=30735) with remote port
...
01/10 13:03:07.468 DEBUG|      global_hooks:0056| ['sh', '-c', '/bin/ps --no-headers -A -o pid,ppid,args:4096,state']
01/10 13:03:07.519 DEBUG|    cros_interface:0367| ListProcesses(<predicate>)->[302 processes]
01/10 13:03:07.521 DEBUG|    cros_interface:0309| GetFile(/home/chronos/DevToolsActivePort, /tmp/tmpbILu43)
01/10 13:03:07.522 DEBUG|    cros_interface:0309| GetFile(/home/chronos/DevToolsActivePort, /tmp/tmpS4IMRr)
01/10 13:03:07.623 DEBUG|    cros_interface:0309| GetFile(/home/chronos/DevToolsActivePort, /tmp/tmpWWODD4)
01/10 13:03:07.724 DEBUG|    cros_interface:0309| GetFile(/home/chronos/DevToolsActivePort, /tmp/tmptlRQfF)
01/10 13:03:07.826 DEBUG|    cros_interface:0309| GetFile(/home/chronos/DevToolsActivePort, /tmp/tmpqkKm_u)
01/10 13:03:07.927 DEBUG|    cros_interface:0309| GetFile(/home/chronos/DevToolsActivePort, /tmp/tmpg_CsWu)
01/10 13:03:08.029 DEBUG|    cros_interface:0309| GetFile(/home/chronos/DevToolsActivePort, /tmp/tmpNspBk8)
01/10 13:03:08.130 DEBUG|    cros_interface:0309| GetFile(/home/chronos/DevToolsActivePort, /tmp/tmpkLb2Rf)
01/10 13:03:08.231 DEBUG|    cros_interface:0309| GetFile(/home/chronos/DevToolsActivePort, /tmp/tmpuhVMKG)
01/10 13:03:08.231 DEBUG|    cros_interface:0339| GetFileContents(/home/chronos/DevToolsActivePort)->39626
/devtools/browser/f79536d4-319d-44e8-b69c-694ea7659f83
01/10 13:03:08.232 INFO |          __init__:0047| DoNothingForwarder started between 127.0.0.1:39626 and 39626
01/10 13:03:08.233 DEBUG|do_nothing_forward:0063| Connection test succeeded for 127.0.0.1:39626
01/10 13:03:08.233 INFO |chrome_browser_bac:0231| Got devtools config: ws://127.0.0.1:39626/devtools/browser/f79536d4-319d-44e8-b69c-694ea7659f83
01/10 13:03:44.569 ERROR|           browser:0052| Failed with TimeoutException while starting the browser backend.

/var/log/ui.*

[30827:30827:0110/130308.087582:ERROR:input_method_manager_impl.cc(1032)] IMEEngine for "jkghodnilhceideoidjikpgommlajknk" is not registered

DevTools listening on ws://127.0.0.1:39626/devtools/browser/f79536d4-319d-44e8-b69c-694ea7659f83
device-enumerator: scan all dirs
  device-enumerator: scanning /sys/bus
  device-enumerator: scanning /sys/class
device-enumerator: scan all dirs
  device-enumerator: scanning /sys/bus
  device-enumerator: scanning /sys/class
Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init: Assertion `listp->slotinfo[cnt].gen <= GL(dl_tls_generation)' failed!
[30949:30949:0110/130309.190555:ERROR:input_method_manager_impl.cc(1032)] IMEEngine for "jkghodnilhceideoidjikpgommlajknk" is not registered

DevTools listening on ws://127.0.0.1:34615/devtools/browser/ccb23e8f-bef1-442a-9fa7-ac7e33fcd8c0
device-enumerator: scan all dirs
  device-enumerator: scanning /sys/bus
  device-enumerator: scanning /sys/class
device-enumerator: scan all dirs
  device-enumerator: scanning /sys/bus
  device-enumerator: scanning /sys/class
device-enumerator: scan all dirs
  device-enumerator: scanning /sys/bus
  device-enumerator: scanning /sys/class
device-enumerator: scan all dirs
  device-enumerator: scanning /sys/bus
  device-enumerator: scanning /sys/class
[30949:30949:0110/130314.845695:ERROR:server.cc(1915)] Not implemented reached in void exo::wayland::(anonymous namespace)::remote_surface_set_rectangular_shadow_background_opacity_DEPRECATED(wl_client *, wl_resource *, wl_fixed_t)
Comment 35 by kinaba@chromium.org, Today (19 hours ago)
An instance on peach_pi, policy_ProxySettings.directproxy_usenoproxy
https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/170911181-chromeos-test/chromeos6-row2-rack9-host4/policy_ProxySettings/
(ui.20180119-054605 is the restarted chrome run with "Inconsistency detected by ld.so".)
Comment 36 by kinaba@chromium.org, Today (19 hours ago)
According to my Google search, the error message can be due to a race between dlopen and pthread_create of glibc
https://sourceware.org/ml/libc-alpha/2016-01/msg00089.html
but so far I have zero familiarity with the field to access it...
Comment 37 by kinaba@chromium.org, Today (7 hours ago)
Upstream (glibc) bug https://sourceware.org/bugzilla/show_bug.cgi?id=19329 (still open)
Comment 38 by kinaba@chromium.org, Today (3 hours ago)
Cc: hidehiko@chromium.org
Comment 39 by kinaba@chromium.org, Today (65 minutes ago)
Cc: vapier@chromium.org yunlian@chromium.org tor@chromium.org g...@chromium.org
CC:ing people who seem to be familiar with glibc (chosen from sys-libs/glibc git history).

+vapier +gbiv +yunlian, do you folks have any suggestions? (or do you know anyone who's familiar in this area?)
The bug https://sourceware.org/bugzilla/show_bug.cgi?id=19329 looks biting Chrome OS: 0.01 ~ 0.1 % of time Chrome restarts in OOBE screen due to this assertion in libc, and makes autotests failing.
Comment 40 by kinaba@chromium.org, Today (59 minutes ago)
(1) Ideally we want to fix the root cause (race in glibc) and I believe it's worth trying, but even the upstream people looks struggling 

(2) Add a workaround to Chrome browser so that the critical situation can be avoided. (Not sure if it is practically feasible, and it doesn't look great anyway.)

(3) Add a workaround to either of Telemetry/Autotest so that they become robust against Chrome crash at this timing. But I feel really awkward to do that, because it's just hiding the real bug. The sole purpose of the "stress" test should be to detect and fix this kind of situation (= occasional crash/failure in relatively low repro ratio.)
Sign in to add a comment