New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 671425 link

Starred by 1 user

Issue metadata

Status: Untriaged
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Sign in to add a comment

15-second timeout kills login_OwnershipNotRetaken on lulu-release

Project Member Reported by semenzato@chromium.org, Dec 6 2016

Issue description

lulu-release build 634

https://uberchromegw.corp.google.com/i/chromeos/builders/lulu-release/builds/634

First problem: the waterfall shows this:

HWTest [bvt-inline] HWTest [bvt-inline] ( 3 secs )
stdio [stdout]
Link to suite
[Test-Logs]: security_SuidBinaries.suid: retry_count: 1, GOOD: completed successfully
[Flake-Dashboard]: security_SuidBinaries.suid
[Test-Logs]: security_SuidBinaries.sgid: retry_count: 1, GOOD: completed successfully
[Flake-Dashboard]: security_SuidBinaries.sgid
[Test-Logs]: security_SuidBinaries.fscap: retry_count: 1, GOOD: completed successfully
[Flake-Dashboard]: security_SuidBinaries.fscap
[Test-Logs]: security_ptraceRestrictions: retry_count: 1, GOOD: completed successfully
[Flake-Dashboard]: security_ptraceRestrictions

So according to this, the tests are good.  Yet the HWTest entry is red.

Second problem: the stdio shows this failure:

chromeos-server22-188: 32e81af4c1508d10 4
  Autotest instance: cautotest
  12-05-2016 [06:00:44] Created suite job: http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=89082381
  @@@STEP_LINK@Link to suite@http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=89082381@@@
  The suite job has another 2:29:46.469915 till timeout.
  The suite job has another 1:59:42.003137 till timeout.
  Suite job                               [ PASSED ]
  login_OwnershipNotRetaken               [ FAILED ]
  login_OwnershipNotRetaken                 ABORT: Timed out, did not run.
  platform_DMVerityBitCorruption.first    [ PASSED ]
  platform_DMVerityBitCorruption.middle   [ PASSED ]

  (... many PASSED tests ...) 

  login_GuestAndActualSession             [ PASSED ]
  login_RetrieveActiveSessions            [ PASSED ]
  security_SuidBinaries.suid              [ PASSED ]
  security_SuidBinaries.suid                retry_count: 1
  security_SuidBinaries.sgid              [ PASSED ]
  security_SuidBinaries.sgid                retry_count: 1
  security_SuidBinaries.fscap             [ PASSED ]
  security_SuidBinaries.fscap               retry_count: 1
  security_ptraceRestrictions             [ PASSED ]
  security_ptraceRestrictions               retry_count: 1
  
  Suite timings:
  Downloads started at 2016-12-05 06:00:39
  Payload downloads ended at 2016-12-05 06:00:41
  Suite started at 2016-12-05 06:01:07
  Artifact downloads ended (at latest) at 2016-12-05 06:01:10
  Testing started at 2016-12-05 06:25:40
  Testing ended at 2016-12-05 07:14:07

so there is at least one failure: login_OwnershipNotRetaken.  But that was not shown in the waterfall.

Third problem: cautotest shows that the job ID for login_OwnershipNotRetaken is 89082407.

http://cautotest.corp.google.com/afe/#tab_id=view_job&object_id=89082407

but the log file gives a different job ID (in fact the same job ID as the suite):

Links to test logs:
  Suite job http://cautotest/tko/retrieve_logs.cgi?job=/results/89082381-chromeos-test/
  login_OwnershipNotRetaken http://cautotest/tko/retrieve_logs.cgi?job=/results/89082381-chromeos-test/

It turns out that cautotest is right and the stdio log is (clearly) not as accurate.

Finally, autoserv.DEBUG shows the failure: a 15 second timeout probably waiting for the browser to start.  I can't locate this exact code, maybe it has changed recently.  I found this in chromium/src/third_party:

./catapult/telemetry/telemetry/internal/backends/chrome/cros_browser_backend.py:135:          util.WaitFor(lambda: pid != self.pid, 10)

but that file hasn't changed since February.  Thus I suspect the code was forked into some other location.

No idea why chrome was having trouble starting.  (If that's what happened.)



12/05 06:38:27.075 DEBUG|          ssh_host:0177| Running (ssh) 'if [ -f '/proc/sys/kernel/random/boot_id' ]; then cat '/proc/sys/kernel/random/boot_id'; else echo 'no boot_id available'; fi'
12/05 06:38:27.332 DEBUG|        base_utils:0280| [stdout] 531dd9a0-cfb1-4de1-89d1-38424b7dfadc
12/05 06:38:27.372 INFO |          autotest:0810| Executing /usr/local/autotest/bin/autotest /usr/local/autotest/control phase 0
12/05 06:38:27.372 DEBUG|          ssh_host:0177| Running (ssh) 'mkdir -p /tmp'
12/05 06:38:27.628 DEBUG|          ssh_host:0177| Running (ssh) 'mktemp -d /tmp/autoserv-XXXXXX'
12/05 06:38:27.890 DEBUG|        base_utils:0280| [stdout] /tmp/autoserv-1R0mez
12/05 06:38:27.942 DEBUG|      abstract_ssh:0418| send_file. source: /tmp/tmpiJN17N, dest: /usr/local/autotest/global_config.ini, delete_dest: False,preserve_symlinks:False
12/05 06:38:27.943 DEBUG|      abstract_ssh:0435| Using Rsync.
12/05 06:38:27.943 DEBUG|        base_utils:0185| Running 'rsync -L  --timeout=1800 --rsh='/usr/bin/ssh -a -x   -o ControlPath=/tmp/_autotmp_1ojWEzssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22' -az --no-o --no-g "/tmp/tmpiJN17N" "root@chromeos4-row6-rack2-host17:"/usr/local/autotest/global_config.ini""'
12/05 06:38:28.301 DEBUG|          ssh_host:0177| Running (ssh) 'nohup /usr/local/autotest/bin/autotestd /tmp/autoserv-1R0mez -H autoserv --verbose --hostname=chromeos4-row6-rack2-host17 --user=chromeos-test /usr/local/autotest/control.autoserv >/dev/null 2>/dev/null &'
12/05 06:38:28.604 DEBUG|          ssh_host:0177| Running (ssh) '/usr/local/autotest/bin/autotestd_monitor /tmp/autoserv-1R0mez 0 0'
12/05 06:38:28.897 DEBUG|     site_autotest:0194| Entered autotestd_monitor.
12/05 06:38:28.898 INFO |          autotest:1133| Entered autotestd_monitor.
12/05 06:38:28.898 DEBUG|     site_autotest:0194| Finished launching tail subprocesses.
12/05 06:38:28.898 INFO |          autotest:1133| Finished launching tail subprocesses.
12/05 06:38:28.898 DEBUG|     site_autotest:0194| Finished waiting on autotestd to start.
12/05 06:38:28.898 INFO |          autotest:1133| Finished waiting on autotestd to start.
12/05 06:38:29.149 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::START	----	----	timestamp=1480948708	localtime=Dec 05 06:38:28	
12/05 06:38:29.150 INFO |        server_job:0153| START	----	----	timestamp=1480948708	localtime=Dec 05 06:38:28	
12/05 06:38:29.242 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::	START	login_OwnershipNotRetaken	login_OwnershipNotRetaken	timestamp=1480948708	localtime=Dec 05 06:38:28	
12/05 06:38:29.242 INFO |        server_job:0153| 	START	login_OwnershipNotRetaken	login_OwnershipNotRetaken	timestamp=1480948708	localtime=Dec 05 06:38:28	
12/05 06:39:12.515 DEBUG|          monitors:0191| ts_mon: sending 10 metrics to PubSub
12/05 06:39:59.822 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::		FAIL	login_OwnershipNotRetaken	login_OwnershipNotRetaken	timestamp=1480948799	localtime=Dec 05 06:39:59	Unhandled TimeoutException: Timed out while waiting 15s for py_utils.WaitFor(lambda: pid != self.pid, 15).
12/05 06:39:59.823 INFO |        server_job:0153| 		FAIL	login_OwnershipNotRetaken	login_OwnershipNotRetaken	timestamp=1480948799	localtime=Dec 05 06:39:59	Unhandled TimeoutException: Timed out while waiting 15s for py_utils.WaitFor(lambda: pid != self.pid, 15).
12/05 06:39:59.823 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::  Traceback (most recent call last):
12/05 06:39:59.823 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::    File "/usr/local/autotest/common_lib/test.py", line 804, in _call_test_function
12/05 06:39:59.823 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::      return func(*args, **dargs)
12/05 06:39:59.823 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::    File "/usr/local/autotest/common_lib/test.py", line 461, in execute
12/05 06:39:59.824 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::      dargs)
12/05 06:39:59.824 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::    File "/usr/local/autotest/common_lib/test.py", line 347, in _call_run_once_with_retry
12/05 06:39:59.824 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::      postprocess_profiled_run, args, dargs)
12/05 06:39:59.824 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::    File "/usr/local/autotest/common_lib/test.py", line 376, in _call_run_once
12/05 06:39:59.824 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::      self.run_once(*args, **dargs)
12/05 06:39:59.824 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::    File "/usr/local/autotest/tests/login_OwnershipNotRetaken/login_OwnershipNotRetaken.py", line 55, in run_once
12/05 06:39:59.824 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::      gaia_id=self._TEST_GAIAID) as cr:
12/05 06:39:59.825 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::    File "/usr/local/autotest/common_lib/cros/chrome.py", line 178, in __init__
12/05 06:39:59.825 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::      self._browser = browser_to_create.Create(finder_options)
12/05 06:39:59.825 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::    File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome/cros_browser_finder.py", line 50, in Create
12/05 06:39:59.825 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::      self._credentials_path)
12/05 06:39:59.825 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::    File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome/cros_browser_with_oobe.py", line 14, in __init__
12/05 06:39:59.825 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::      backend, platform_backend, credentials_path)
12/05 06:39:59.825 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::    File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/browser/browser.py", line 55, in __init__
12/05 06:39:59.825 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::      self._browser_backend.Start()
12/05 06:39:59.826 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::    File "/usr/local/telemetry/src/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
12/05 06:39:59.826 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::      return func(*args, **kwargs)
12/05 06:39:59.826 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::    File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome/cros_browser_backend.py", line 139, in Start
12/05 06:39:59.826 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::      py_utils.WaitFor(lambda: pid != self.pid, 15)
12/05 06:39:59.826 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::    File "/usr/local/telemetry/src/third_party/catapult/common/py_utils/py_utils/__init__.py", line 132, in WaitFor
12/05 06:39:59.826 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::      (timeout, GetConditionString()))
12/05 06:39:59.826 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::  TimeoutException: Timed out while waiting 15s for py_utils.WaitFor(lambda: pid != self.pid, 15).
12/05 06:39:59.827 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::	END FAIL	login_OwnershipNotRetaken	login_OwnershipNotRetaken	timestamp=1480948799	localtime=Dec 05 06:39:59	
12/05 06:39:59.827 INFO |        server_job:0153| 	END FAIL	login_OwnershipNotRetaken	login_OwnershipNotRetaken	timestamp=1480948799	localtime=Dec 05 06:39:59	
12/05 06:39:59.827 DEBUG|     site_autotest:0194| AUTOTEST_TEST_COMPLETE:/usr/local/autotest/tmp/_autotmp_VNDR6Pharness-fifo/autoserv.fifo
12/05 06:39:59.827 DEBUG|          ssh_host:0177| Running (ssh) 'true'
12/05 06:40:00.160 DEBUG|      abstract_ssh:0555| Host chromeos4-row6-rack2-host17 is now up
12/05 06:40:00.161 DEBUG|      abstract_ssh:0337| get_file. source: /usr/local/autotest/results/default/, dest: /usr/local/autotest/results/89082407-chromeos-test/chromeos4-row6-rack2-host17, delete_dest: False,preserve_perm: True, preserve_symlinks:True
12/05 06:40:00.162 DEBUG|      abstract_ssh:0348| Using Rsync.
12/05 06:40:00.162 DEBUG|        base_utils:0185| Running 'rsync   --timeout=1800 --rsh='/usr/bin/ssh -a -x   -o ControlPath=/tmp/_autotmp_1ojWEzssh-master/socket -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes -o ConnectTimeout=30 -o ServerAliveInterval=900 -o ServerAliveCountMax=3 -o ConnectionAttempts=4 -o Protocol=2 -l root -p 22' -az --no-o --no-g root@chromeos4-row6-rack2-host17:"/usr/local/autotest/results/default/" "/usr/local/autotest/results/89082407-chromeos-test/chromeos4-row6-rack2-host17"'
12/05 06:40:04.664 DEBUG|          ssh_host:0177| Running (ssh) 'echo A > /usr/local/autotest/tmp/_autotmp_VNDR6Pharness-fifo/autoserv.fifo'
12/05 06:40:04.992 DEBUG|     site_autotest:0194| AUTOTEST_STATUS::END GOOD	----	----	timestamp=1480948804	localtime=Dec 05 06:40:04	
12/05 06:40:04.993 INFO |        server_job:0153| END GOOD	----	----	timestamp=1480948804	localtime=Dec 05 06:40:04	
 
Cc: jrbarnette@chromium.org
Adding Richard because there are some infra issues (see first comment).

Sameer, do you know who might help with the telemetry code?

Cc: akes...@chromium.org
+akeshet
Richard is out today
Someone from abodenha's team might.  Maybe start with Achuith?
Components: Test>Telemetry
Components: -Tests>Telemetry

Sign in to add a comment