Build failure chrome-pfq: login_CryptohomeIncognito and security_ProfilePermissions.guest are failing across multiple boards |
||||||||||
Issue descriptionSince master build #5449 login_CryptohomeIncognito and security_ProfilePermissions.guest are failing on multiple boards. https://uberchromegw.corp.google.com/i/chromeos/builders/master-chromium-pfq/builds/5449 Errors from login_CryptohomeIncognito: 12/25 17:54:30.501 ERROR| browser:0049| Failed with LoginException while starting the browser backend. 12/25 17:54:32.731 ERROR| chrome:0182| Timed out logging in, tries=0, error=LoginException("Timed out going through login screen. Browser didn't launch. ",) Errors from security_ProfilePermissions.guest: 12/25 16:11:47.749 ERROR| browser:0049| Failed with LoginException while starting the browser backend. 12/25 16:11:48.691 ERROR| chrome:0182| Timed out logging in, tries=0, error=LoginException("Timed out going through login screen. Browser didn't launch. ",) 12/25 16:27:00.990 ERROR| browser:0049| Failed with LoginException while starting the browser backend. 12/25 16:27:01.942 ERROR| chrome:0182| Timed out logging in, tries=1, error=LoginException("Timed out going through login screen. Browser didn't launch. ",) It appears to be of same cause, probably something to do with guest login on Chrome.
,
Dec 27 2017
I noticed that this breakage happened for R65-10234.0.0 build, that actually shouldn't include any of this changes... Or, autotest was updated for all builds, including old one, that use old Chrome OS solution? So that autotest expect new way to mount cryptohome, while it's mounted with the old way. Is it how changes in autotest work? I asked the question here, because I didn't have much experience: https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/718696/3/client/cros/cryptohome.py#429
,
Dec 27 2017
Unfortunately, just reverting these changes won't help as I submitted several other CLs that depend on them. I'm working on a CL that will update autotest to support both old and new solutions.
,
Dec 27 2017
,
Dec 27 2017
From the error log, the telemetry cros_browser_backend failed to get a valid Devtools connection: https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome/cros_browser_backend.py?q=third_party/catapult/telemetry/telemetry/internal/backends/chrome/cros_browser_backend.py&sq=package:chromium&l=299 So it looks like something else was wrong.
,
Dec 27 2017
Juan, this might be related to your refactoring? Could you try any telemetry test (say testBrowserCreation) with system-guest (from the VM), or cros-chrome-guest from your desktop? This is the chromeos guest browser.
,
Dec 27 2017
I think the problem is still on my side - https://chromium-review.googlesource.com/c/catapult/+/713747 updated IsCryptohomeMounted() in a wrong way. I'll send a CL with a fix soon.
,
Dec 27 2017
+ chrome gradener for the week.
,
Dec 27 2017
https://chromium-review.googlesource.com/c/catapult/+/844834 should fix the issue.
,
Dec 27 2017
I've just tried the test "security_ProfilePermissions.guest" on a peach_pit device @ ToT Chrome on ToT ChromeOS, and it passes. Is there a chance this was fixed by something else?
,
Dec 27 2017
login_CryptohomeIncognito also passes.
,
Dec 28 2017
It's possible that the bug is not 100% reproducible. Andrey do you have any insight in this?
,
Dec 28 2017
I'm looking at https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/164111136-chromeos-test/chromeos6-row2-rack10-host22/login_CryptohomeIncognito/ from the bug description and I see "Unhandled AttributeError: 'CrOSBrowserBackend' object has no attribute 'HasBrowserFinishedLaunching'". Is that not the actual reason? sysinfo/messages there doesn't seem to contain any other errors. For cryptohomed, I don't see any crashes or other issues for it.
,
Dec 28 2017
The most recent failure on peach_pit shows a different error log: https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/165759587-chromeos-test/chromeos6-row2-rack11-host18/login_CryptohomeIncognito/debug/ File "/usr/local/autotest/bin/screenshot.py", line 20, in <module> image = crtcScreenshot(args.crtc) File "/usr/local/autotest/cros/graphics/gbm.py", line 229, in crtcScreenshot "Unable to take screenshot. There may not be anything on the screen.") RuntimeError: Unable to take screenshot. There may not be anything on the screen. ] 12/27 14:00:24.720 ERROR| browser:0049| Failed with LoginException while starting the browser backend. 12/27 14:00:24.733 DEBUG| global_hooks:0056| ['sh', '-c', '/bin/ps --no-headers -A -o pid,ppid,args:4096,state'] 12/27 14:00:24.782 DEBUG| cros_interface:0367| ListProcesses(<predicate>)->[190 processes] 12/27 14:00:24.784 DEBUG| global_hooks:0056| ['sh', '-c', '/bin/ps --no-headers -A -o pid,ppid,args:4096,state'] 12/27 14:00:24.838 DEBUG| cros_interface:0367| ListProcesses(<predicate>)->[190 processes] 12/27 14:00:24.841 INFO | browser:0206| Closing browser (pid=16595) ... 12/27 14:00:24.842 INFO | cros_interface:0551| (Re)starting the ui (logs the user out) 12/27 14:00:24.843 DEBUG| global_hooks:0056| ['sh', '-c', 'systemctl'] 12/27 14:00:24.857 DEBUG| global_hooks:0056| ['sh', '-c', 'systemctl'] 12/27 14:00:24.870 DEBUG| global_hooks:0056| ['sh', '-c', 'status ui'] 12/27 14:00:24.889 DEBUG| cros_interface:0443| IsServiceRunning(ui)->True 12/27 14:00:24.890 DEBUG| cros_interface:0058| sh -c restart ui 12/27 14:00:24.891 DEBUG| global_hooks:0056| ['sh', '-c', 'restart ui'] 12/27 14:00:26.774 DEBUG| cros_interface:0067| > stdout=[ui start/running, process 19397 ], stderr=[] 12/27 14:00:26.776 DEBUG| cros_interface:0058| sh -c cryptohome-path user '$guest' 12/27 14:00:26.777 DEBUG| global_hooks:0056| ['sh', '-c', "cryptohome-path user '$guest'"] 12/27 14:00:26.808 DEBUG| cros_interface:0067| > stdout=[/home/user/20658402636b0ef08626d3707171c30f5182bcb5 ], stderr=[] 12/27 14:00:26.810 DEBUG| cros_interface:0058| sh -c /bin/df --output=source,target /home/user/20658402636b0ef08626d3707171c30f5182bcb5 12/27 14:00:26.811 DEBUG| global_hooks:0056| ['sh', '-c', '/bin/df --output=source,target /home/user/20658402636b0ef08626d3707171c30f5182bcb5'] 12/27 14:00:26.834 DEBUG| cros_interface:0067| > stdout=[Filesystem Mounted on /dev/mmcblk0p1 /home ], stderr=[] 12/27 14:00:26.835 INFO | browser:0220| Browser is closed. 12/27 14:00:26.836 ERROR| chrome:0182| Timed out logging in, tries=1, error=LoginException("Timed out going through login screen. Browser didn't launch. ",) 12/27 14:00:26.852 INFO | cros_interface:0551| (Re)starting the ui (logs the user out) 12/27 14:00:26.854 DEBUG| global_hooks:0056| ['sh', '-c', 'systemctl'] 12/27 14:00:26.872 DEBUG| cros_interface:0058| sh -c stop ui 12/27 14:00:26.873 DEBUG| global_hooks:0056| ['sh', '-c', 'stop ui'] 12/27 14:00:27.365 DEBUG| cros_interface:0067| > stdout=[ui stop/waiting ], stderr=[]
,
Dec 28 2017
Comment #13 was for build 5449. Looking for later builds for veyron-minnie/peach-pit: 5450-5451: multiple (cheets for veyron) tests failures. 5452: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row9-rack9-host12: ... 5453-5458: BuildPackages issue 5459: this is where login_CryptohomeIncognito and security_ProfilePermissions.* issues seem to start
,
Dec 28 2017
For build 5459 I see messages like the following in the log (in https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/165759587-chromeos-test/chromeos6-row2-rack11-host18/sysinfo/) 2017-12-27T13:19:44.686352-08:00 INFO cryptohomed[1252]: Asynced Mount() requested. Tracking request sequence id 1 for later PKCS#11 initialization. I don't remember seeing those in the logs before. Did we switch to using AsyncMount for Chrome recently? Or, alternatively, did we make changes to AsyncMount only recently, which should have been applied to guest users as well (since in comment #14 we seem to have issues logging in a guest user specifically). At least, as of ~10 days ago, guest users were using MountGuest and not AsyncMount.
,
Dec 28 2017
The latest security_ProfilePermissions.guest failure for peach_pit (https://luci-milo.appspot.com/buildbot/chromeos/peach_pit-chrome-pfq/4618 -> https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/165724056-chromeos-test/chromeos6-row2-rack10-host22/) seems to contain more info in syslogs - see below. Can this "getcwd() failed: No such file or directory" / "getcwd: cannot access parent directories: Inappropriate ioctl for device" during restart ui be our error? Or is it just the byproduct of something else? From the logs: 2017-12-27T18:36:19.978784+00:00 NOTICE ag[14375]: autotest server[stack::compute_checksum|_run_command|run] -> ssh_run(md5sum /usr/local/autotest/packages/client-autotest.tar.bz2 ) 2017-12-27T18:36:20.037176+00:00 NOTICE autotest[14379]: 10:36:20.025 ERROR| browser:0049| Failed with OSError while starting the browser backend. 2017-12-27T18:36:20.149345+00:00 NOTICE autotest[14384]: 10:36:20.138 ERROR|exception_formatte:0040| 2017-12-27T18:36:20.162583+00:00 NOTICE autotest[14385]: 10:36:20.151 ERROR|exception_formatte:0041| Exception raised while closing platform backend 2017-12-27T18:36:20.175988+00:00 NOTICE autotest[14386]: 10:36:20.165 ERROR|exception_formatte:0057| 2017-12-27T18:36:20.188965+00:00 NOTICE autotest[14387]: 10:36:20.178 ERROR|exception_formatte:0060| Traceback (most recent call last): 2017-12-27T18:36:20.205271+00:00 NOTICE autotest[14388]: 10:36:20.191 ERROR|exception_formatte:0063| __init__ at /usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/browser/browser.py:51 2017-12-27T18:36:20.222438+00:00 NOTICE autotest[14389]: 10:36:20.208 ERROR|exception_formatte:0064| self.Close() 2017-12-27T18:36:20.239677+00:00 NOTICE autotest[14390]: 10:36:20.225 ERROR|exception_formatte:0063| Close at /usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/browser/browser.py:215 2017-12-27T18:36:20.257697+00:00 NOTICE autotest[14391]: 10:36:20.243 ERROR|exception_formatte:0064| self._browser_backend.Close() 2017-12-27T18:36:20.276124+00:00 NOTICE autotest[14392]: 10:36:20.260 ERROR|exception_formatte:0063| traced_function at /usr/local/telemetry/src/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52 2017-12-27T18:36:20.298607+00:00 NOTICE autotest[14393]: 10:36:20.280 ERROR|exception_formatte:0064| return func(*args, **kwargs) 2017-12-27T18:36:20.321241+00:00 NOTICE autotest[14394]: 10:36:20.302 ERROR|exception_formatte:0063| Close at /usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome/cros_browser_backend.py:216 2017-12-27T18:36:20.346748+00:00 NOTICE autotest[14395]: 10:36:20.326 ERROR|exception_formatte:0064| self._cri.RestartUI(False) # Logs out. 2017-12-27T18:36:20.361276+00:00 NOTICE ag[14397]: autotest server[stack::exists|_run_command|run] -> ssh_run(test -d /usr/local/autotest) 2017-12-27T18:36:20.370768+00:00 NOTICE autotest[14398]: 10:36:20.351 ERROR|exception_formatte:0063| RestartUI at /usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/core/cros_interface.py:564 2017-12-27T18:36:20.393502+00:00 NOTICE autotest[14399]: 10:36:20.374 ERROR|exception_formatte:0064| if self.IsServiceRunning('ui'): 2017-12-27T18:36:20.415834+00:00 NOTICE autotest[14400]: 10:36:20.397 ERROR|exception_formatte:0063| IsServiceRunning at /usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/core/cros_interface.py:442 2017-12-27T18:36:20.436409+00:00 NOTICE autotest[14401]: 10:36:20.419 ERROR|exception_formatte:0064| assert stderr == '', stderr 2017-12-27T18:36:20.457109+00:00 NOTICE autotest[14402]: 10:36:20.440 ERROR|exception_formatte:0068| AssertionError: sh: 0: getcwd() failed: No such file or directory 2017-12-27T18:36:20.479484+00:00 NOTICE autotest[14403]: 10:36:20.460 ERROR|exception_formatte:0073| 2017-12-27T18:36:20.501494+00:00 NOTICE autotest[14404]: 10:36:20.483 ERROR|exception_formatte:0074| Locals: 2017-12-27T18:36:20.525418+00:00 NOTICE autotest[14405]: 10:36:20.505 ERROR|exception_formatte:0085| running : True 2017-12-27T18:36:20.548813+00:00 NOTICE autotest[14406]: 10:36:20.530 ERROR|exception_formatte:0085| service_name : 'ui' 2017-12-27T18:36:20.571086+00:00 NOTICE autotest[14407]: 10:36:20.552 ERROR|exception_formatte:0085| stderr : 'sh: 0: getcwd() failed: No such file or directory\n' 2017-12-27T18:36:20.593462+00:00 NOTICE autotest[14408]: 10:36:20.575 ERROR|exception_formatte:0085| stdout : 'ui start/running, process 13890\n' 2017-12-27T18:36:20.616689+00:00 NOTICE autotest[14409]: 10:36:20.598 ERROR|exception_formatte:0089| 2017-12-27T18:36:20.653738+00:00 NOTICE ag[14413]: autotest server[stack::untar_required|_run_command|run] -> ssh_run(cat /usr/local/autotest/.checksum) 2017-12-27T18:36:20.664101+00:00 NOTICE autotest[14415]: 10:36:20.650 ERROR| utils:0280| [stderr] shell-init: error retrieving current directory: getcwd: cannot access parent directories: Inappropriate ioctl for device 2017-12-27T18:36:20.698982+00:00 NOTICE autotest[14417]: 10:36:20.686 ERROR| utils:0280| [stderr] shell-init: error retrieving current directory: getcwd: cannot access parent directories: Inappropriate ioctl for device 2017-12-27T18:36:20.714443+00:00 NOTICE autotest[14418]: 10:36:20.701 ERROR| utils:0280| [stderr] rsync: getcwd(): No such file or directory (2) 2017-12-27T18:36:20.729261+00:00 NOTICE autotest[14419]: 10:36:20.716 ERROR| utils:0280| [stderr] rsync error: errors selecting input/output files, dirs (code 3) at util.c(1058) [Receiver=3.1.2] 2017-12-27T18:36:20.758788+00:00 NOTICE autotest[14420]: 10:36:20.732 ERROR| base_sysinfo:0510| Failed to collect loggable site_sysinfo.logdir('/var/spool/crash', ('**autoserv*', '*.core')) to /usr/local/autotest/results/default/security_ProfilePermissions.guest/sysinfo/iteration.1. Continuing...#012Traceback (most recent call last):#012 File "/usr/local/autotest/bin/base_sysinfo.py", line 506, in _run_loggables_ignoring_errors#012 File "/usr/local/autotest/bin/site_sysinfo.py", line 281, in run#012 File "/usr/local/autotest/bin/site_sysinfo.py", line 92, in run#012 File "/usr/local/autotest/common_lib/utils.py", line 1009, in system#012 File "/usr/local/autotest/common_lib/utils.py", line 738, in run#012CmdError: Command <rsync --no-perms --chmod=ugo+r -a --safe-links --exclude=/crash/**autoserv* --exclude=/crash/*.core /var/spool/crash /usr/local/autotest/results/default/security_ProfilePermissions.guest/sysinfo/iteration.1/var/spool> failed, rc=3, Command returned non-zero exit status#012* Command: #012 rsync --no-perms --chmod=ugo+r -a --safe-links#012 --exclude=/crash/**aut
,
Dec 28 2017
Here is the list of telemetry tests changes between the lkg (5448) and the one with these two tests failing (5459) [thanks, alemate!]: https://chromium.googlesource.com/catapult.git/+log/035dfdbc3ead..3b3f9e1e789d I also see in other logs a repeating error for the failing login_CryptohomeIncognito: 12/27 13:30:04.419 INFO |devtools_client_ba:0145| Devtools client not yet ready: [Errno 111] Connection refused So, this one seems suspicious: https://chromium.googlesource.com/catapult.git/+/a586b0072a208470e55680d2d6f097e5b7c674ff
,
Dec 28 2017
+achuith, in case catapult is indeed related
,
Dec 28 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/d85dce52942c9823e31d1e49502376000e0fadc9 commit d85dce52942c9823e31d1e49502376000e0fadc9 Author: Sergey Poromov <poromov@google.com> Date: Thu Dec 28 08:28:28 2017 Update ephemeral cryptohome tests to support old solution. The way how ephemeral cryptohome is mounted, was changed in https://chromium-review.googlesource.com/c/chromiumos/platform2/+/613167, however, updating autotest to support only the new way is wrong: https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/718696 This change returns support of the old way of mounting ephemeral cryptohome. Tested on both new and old binaries. BUG= 797620 TEST=test_that ... e:login_Cryptohome.* TEST=test_that ... e:security_ProfilePermissions.* Change-Id: I7555dbda2fd12bd05605e979d11c81c193083646 Reviewed-on: https://chromium-review.googlesource.com/843907 Commit-Ready: Sergey Poromov <poromov@chromium.org> Tested-by: Sergey Poromov <poromov@chromium.org> Reviewed-by: Ricky Liang <jcliang@chromium.org> [modify] https://crrev.com/d85dce52942c9823e31d1e49502376000e0fadc9/client/cros/cryptohome.py [modify] https://crrev.com/d85dce52942c9823e31d1e49502376000e0fadc9/client/cros/constants.py
,
Dec 28 2017
The submitted CL above shouldn't completely resolve the issue, but might help in case old Chrome OS is tested with a new autotest. I have another important fix, that could resolve "Timed out going through login screen. Browser didn't launch." error, as IsCryptohomeMounted() is not working for ephemeral mode right now. The fix is https://chromium-review.googlesource.com/c/catapult/+/844834, but I can't merge it as all owners are OOO + I can't land it as can't set Code-Review +1: "gerrit_util.GerritError: Forbidden: Applying label "Code-Review": 1 is restricted". Is there any workaround for this?
,
Dec 28 2017
The following revision refers to this bug: https://chromium.googlesource.com/catapult/+/d376d8cf98a258a996586e34073ccf0bf1de0f02 commit d376d8cf98a258a996586e34073ccf0bf1de0f02 Author: Sergey Poromov <poromov@chromium.org> Date: Thu Dec 28 19:18:33 2017 telemetry: Fix IsCryptohomeMounted to support new ephemeral mounts. Telemetry was not properly updated to support new way of mounting ephemeral cryptohome on top of a loop device. Now, instead of mount from 'guestfs' to user home directory, the following mount is returned by 'df' command: /dev/loop13 /run/cryptohome/ephemeral_mount/<hash> TBR=achuith@chromium.org BUG= chromium:797620 TEST=./telemetry/bin/run_tests CrOSInterfaceTest --browser=cros-chrome TEST=./telemetry/bin/run_tests CrOSCryptohomeTest --browser=cros-chrome Change-Id: I7d304cc863100d701fe96e3098e43d095e6e0664 Reviewed-on: https://chromium-review.googlesource.com/844834 Commit-Queue: Sergey Poromov <poromov@chromium.org> Reviewed-by: Ned Nguyen <nednguyen@google.com> [modify] https://crrev.com/d376d8cf98a258a996586e34073ccf0bf1de0f02/telemetry/telemetry/core/cros_interface.py [modify] https://crrev.com/d376d8cf98a258a996586e34073ccf0bf1de0f02/telemetry/telemetry/core/cros_interface_unittest.py [modify] https://crrev.com/d376d8cf98a258a996586e34073ccf0bf1de0f02/telemetry/telemetry/internal/backends/chrome/cros_unittest.py
,
Dec 28 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/0ae572670f396f1b0ec72055702d568915732739 commit 0ae572670f396f1b0ec72055702d568915732739 Author: catapult-deps-roller@chromium.org <catapult-deps-roller@chromium.org> Date: Thu Dec 28 20:36:24 2017 Roll src/third_party/catapult/ 08db5c9b2..d376d8cf9 (1 commit) https://chromium.googlesource.com/catapult.git/+log/08db5c9b2822..d376d8cf98a2 $ git log 08db5c9b2..d376d8cf9 --date=short --no-merges --format='%ad %ae %s' 2017-12-28 poromov telemetry: Fix IsCryptohomeMounted to support new ephemeral mounts. Created with: roll-dep src/third_party/catapult BUG= 797620 The AutoRoll server is located here: https://catapult-roll.skia.org Documentation for the AutoRoller is here: https://skia.googlesource.com/buildbot/+/master/autoroll/README.md If the roll is causing failures, please contact the current sheriff, who should be CC'd on the roll, and stop the roller if necessary. CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel TBR=sullivan@chromium.org Change-Id: I498895e72fe89659be765f531cc74450efc8ad61 Reviewed-on: https://chromium-review.googlesource.com/845900 Reviewed-by: <catapult-deps-roller@chromium.org> Commit-Queue: <catapult-deps-roller@chromium.org> Cr-Commit-Position: refs/heads/master@{#526322} [modify] https://crrev.com/0ae572670f396f1b0ec72055702d568915732739/DEPS
,
Dec 29 2017
It doesn't seem that this fixed helped. Here's a link to the most recent peach_pit informational build: https://uberchromegw.corp.google.com/i/chromeos.chrome/builders/peach_pit-tot-chrome-pfq-informational/builds/9145 which contains the catapult roll-dep in comment #23. It is still suffering from the same failures.
,
Dec 29 2017
That's bad. According to the logs, it's indeed using the new code. However, looks like the failure happens at another line: https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome/cros_browser_backend.py?l=300&rcl=1e74c0dd40c10f6360bf21ac9b5c858fe977196b The submitted CL should fix the issue that _IsCryptohomeMounted() could return false. Now, it looks like HasDevToolsConnection() is returning false. Could it be because of https://chromium-review.googlesource.com/c/catapult/+/833925#299 ?
,
Dec 29 2017
Oops, missed that it was actually fixed in https://chromium-review.googlesource.com/c/catapult/+/839140 However, it still looks like HasDevToolsConnection() is constantly returning false, while _IsCryptohomeMounted() is working: 12/28 14:15:19.875 DEBUG| cros_interface:0058| sh -c cryptohome-path user '$guest' 12/28 14:15:19.876 DEBUG| global_hooks:0056| ['sh', '-c', "cryptohome-path user '$guest'"] 12/28 14:15:19.899 DEBUG| cros_interface:0067| > stdout=[/home/user/17ef84a4d559a9335bd7347375a08e1bf2d6f85c ], stderr=[] 12/28 14:15:19.901 DEBUG| cros_interface:0058| sh -c /bin/df --output=source,target /run/cryptohome/ephemeral_mount/17ef84a4d559a9335bd7347375a08e1bf2d6f85c 12/28 14:15:19.902 DEBUG| global_hooks:0056| ['sh', '-c', '/bin/df --output=source,target /run/cryptohome/ephemeral_mount/17ef84a4d559a9335bd7347375a08e1bf2d6f85c'] 12/28 14:15:19.921 DEBUG| cros_interface:0067| > stdout=[Filesystem Mounted on /dev/loop1 /run/cryptohome/ephemeral_mount/17ef84a4d559a9335bd7347375a08e1bf2d6f85c ], stderr=[] 12/28 14:15:19.924 INFO |devtools_client_ba:0145| Devtools client not yet ready: ''
,
Jan 2 2018
The breaking CL is: https://chromium-review.googlesource.com/c/catapult/+/833925 First breaking build: https://uberchromegw.corp.google.com/i/chromiumos.chromium/builders/amd64-generic-tot-chromium-pfq-informational/builds/14630 catapult deps roller only picked one change which is the one above. +Greg as chromeos gardener for the week. Juan - please look at comment 6 for repo instructions.
,
Jan 2 2018
Sorry for the breakage. I think I know what the problem is within the Telemetry cros backend. Will send out a CL for review shortly.
,
Jan 2 2018
Re ephemeral cryptohome changes: I think that all possible issues (that might have affected only login_CryptohomeIncognito and security_ProfilePermissions.guest as they are only using guest sessions), should already be resolved with my 2 CLs submitted above. The last time I looked at the logs not they were the causes for the issues, also not only these two tests were failing.
,
Jan 2 2018
Got CL ready for review: https://chromium-review.googlesource.com/c/catapult/+/847554 Tried running testBrowserCreation on the vm, but the test passes even before applying the fix. Is there a Telemetry test we know is currently failing and I could run to test the change?
,
Jan 2 2018
Just came on duty, and am still trying to get up to speed on this, so apologies if this isn't helpful, but here's the latest error from the logs (https://uberchromegw.corp.google.com/i/chromeos/builders/amd64-generic-chromium-pfq/builds/11225): 07:23:17 INFO | autoserv| AUTOTEST_STATUS:: FAIL login_CryptohomeIncognito login_CryptohomeIncognito timestamp=1514899397 localtime=Jan 02 07:23:17 Unhandled LoginException: Timed out going through login screen. Browser didn't launch. 07:23:17 INFO | autoserv| FAIL login_CryptohomeIncognito login_CryptohomeIncognito timestamp=1514899397 localtime=Jan 02 07:23:17 Unhandled LoginException: Timed out going through login screen. Browser didn't launch. 07:23:17 INFO | autoserv| AUTOTEST_STATUS:: Traceback (most recent call last): 07:23:17 INFO | autoserv| AUTOTEST_STATUS:: File "/usr/local/autotest/common_lib/test.py", line 831, in _call_test_function 07:23:17 INFO | autoserv| AUTOTEST_STATUS:: return func(*args, **dargs) 07:23:17 INFO | autoserv| AUTOTEST_STATUS:: File "/usr/local/autotest/common_lib/test.py", line 495, in execute 07:23:17 INFO | autoserv| AUTOTEST_STATUS:: dargs) 07:23:17 INFO | autoserv| AUTOTEST_STATUS:: File "/usr/local/autotest/common_lib/test.py", line 362, in _call_run_once_with_retry 07:23:17 INFO | autoserv| AUTOTEST_STATUS:: postprocess_profiled_run, args, dargs) 07:23:17 INFO | autoserv| AUTOTEST_STATUS:: File "/usr/local/autotest/common_lib/test.py", line 400, in _call_run_once 07:23:17 INFO | autoserv| AUTOTEST_STATUS:: self.run_once(*args, **dargs) 07:23:17 INFO | autoserv| AUTOTEST_STATUS:: File "/usr/local/autotest/tests/login_CryptohomeIncognito/login_CryptohomeIncognito.py", line 17, in run_once 07:23:17 INFO | autoserv| AUTOTEST_STATUS:: with chrome.Chrome(logged_in=False): 07:23:17 INFO | autoserv| AUTOTEST_STATUS:: File "/usr/local/autotest/common_lib/cros/chrome.py", line 171, in __init__ 07:23:17 INFO | autoserv| AUTOTEST_STATUS:: self._browser = browser_to_create.Create(finder_options) 07:23:17 INFO | autoserv| AUTOTEST_STATUS:: File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome/cros_browser_finder.py", line 51, in Create 07:23:17 INFO | autoserv| AUTOTEST_STATUS:: browser_backend, self._platform_backend) 07:23:17 INFO | autoserv| AUTOTEST_STATUS:: File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome/cros_browser_with_oobe.py", line 14, in __init__ 07:23:17 INFO | autoserv| AUTOTEST_STATUS:: backend, platform_backend) 07:23:17 INFO | autoserv| AUTOTEST_STATUS:: File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/browser/browser.py", line 41, in __init__ 07:23:17 INFO | autoserv| AUTOTEST_STATUS:: self._browser_backend.Start() 07:23:17 INFO | autoserv| AUTOTEST_STATUS:: File "/usr/local/telemetry/src/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function 07:23:17 INFO | autoserv| AUTOTEST_STATUS:: return func(*args, **kwargs) 07:23:17 INFO | autoserv| AUTOTEST_STATUS:: File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome/cros_browser_backend.py", line 205, in Start 07:23:17 INFO | autoserv| AUTOTEST_STATUS:: + self._GetLoginStatus()) 07:23:17 INFO | autoserv| AUTOTEST_STATUS:: File "/usr/local/telemetry/src/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function 07:23:17 INFO | autoserv| AUTOTEST_STATUS:: return func(*args, **kwargs) 07:23:17 INFO | autoserv| AUTOTEST_STATUS:: File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome/cros_browser_backend.py", line 326, in _RaiseOnLoginFailure 07:23:17 INFO | autoserv| AUTOTEST_STATUS:: raise exceptions.LoginException(error) 07:23:17 INFO | autoserv| AUTOTEST_STATUS:: LoginException: Timed out going through login screen. Browser didn't launch. 07:23:17 INFO | autoserv| AUTOTEST_STATUS:: END FAIL login_CryptohomeIncognito login_CryptohomeIncognito timestamp=1514899397 localtime=Jan 02 07:23:17
,
Jan 2 2018
Yes, that error message is consistent with my theory on what is causing the problem. I'm not sure however how to run that test. $ python /usr/local/telemetry/src/third_party/catapult/telemetry/bin/run_tests login_CryptohomeIncognito runs 0 tests.
,
Jan 2 2018
Juan - don't worry about the autotest. I think you ought to see the same problem with: achuith@achuithz620:~/code/catapult/telemetry/bin$ ./run_tests --browser=cros-chrome-guest --remote=localhost --remote-ssh-port=9222 testBrowserCreation (This is after launching a cros VM using go/cros-vm)
,
Jan 2 2018
Fix in the catapult CQ: https://chromium-review.googlesource.com/c/catapult/+/847554 It'll take a few hours for the catapult deps roller to roll this into chrome, at which point the FYI bots should cycle green.
,
Jan 2 2018
,
Jan 2 2018
The following revision refers to this bug: https://chromium.googlesource.com/catapult/+/339bfb350cd01653c945d2d2e8d98e4381bfd7a2 commit 339bfb350cd01653c945d2d2e8d98e4381bfd7a2 Author: Juan Antonio Navarro Perez <perezju@google.com> Date: Tue Jan 02 17:26:13 2018 [Telemetry] Fix binding to devtools agent on CrOS after login An earlier refactor on how browser backends connect to the devtools agent broke the connection on CrOS browser after logging in. Before `HasBrowserFinishedLaunching` (called in L300 within the _GetLoginStatus) used to implicitly update devtools ports and create a port forwarder. This should be now done explicitly by calling `BindDevToolsClient` before trying to wait for the login status. Bug: chromium:797620 Change-Id: I893d2de7867970cb816134bb2b08c8225c06b829 Reviewed-on: https://chromium-review.googlesource.com/847554 Commit-Queue: Juan Antonio Navarro Pérez <perezju@chromium.org> Reviewed-by: Achuith Bhandarkar <achuith@chromium.org> [modify] https://crrev.com/339bfb350cd01653c945d2d2e8d98e4381bfd7a2/telemetry/telemetry/internal/backends/chrome/cros_browser_backend.py
,
Jan 3 2018
Looks like my CL rolled into and was tested at: https://uberchromegw.corp.google.com/i/chromiumos.chromium/builders/amd64-generic-tot-chromium-pfq-informational/builds/14750 However, login_CryptohomeIncognito is still failing (trimmed to only show the most relevant bits): Traceback (most recent call last): File "login_CryptohomeIncognito.py", line 17, in run_once with chrome.Chrome(logged_in=False): File "telemetry/internal/backends/chrome/cros_browser_backend.py", line 202, in Start self._WaitForLogin() File "telemetry/internal/backends/chrome/cros_browser_backend.py", line 316, in _WaitForLogin self.BindDevToolsClient() File "telemetry/internal/backends/chrome/chrome_browser_backend.py", line 184, in BindDevToolsClient raise exceptions.BrowserConnectionGoneException(self.browser, e) BrowserConnectionGoneException: Timed out while waiting 240s for IsAgentReady. ==== Looking through the logs I see that the first call to self.BindDevToolsClient() at line 176 in the Start method succeeds: https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome/cros_browser_backend.py?rcl=7365f02611830723d60963f0619d5bab45060849&l=176 01/02 21:47:52.213 INFO |cros_browser_backe:0076| Discovered ephemeral port 42080 01/02 21:47:52.214 INFO |cros_browser_backe:0077| Browser target: /devtools/browser/ee7317d9-2f6c-4580-b69a-4611e988bf9c 01/02 21:47:52.614 INFO | oobe:0049| Invoking Oobe.guestLoginForTesting However the second call (which now happens within _WaitForLogin) "discovers" the same devtools port and then fails to connect to it: 01/02 21:47:56.922 INFO |cros_browser_backe:0076| Discovered ephemeral port 42080 01/02 21:47:56.922 INFO |cros_browser_backe:0077| Browser target: /devtools/browser/ee7317d9-2f6c-4580-b69a-4611e988bf9c 01/02 21:47:56.923 INFO |devtools_client_ba:0127| Websocket at ws://127.0.0.1:42080/devtools/browser/ee7317d9-2f6c-4580-b69a-4611e988bf9c not yet ready: [Errno 111] Connection refused 01/02 21:47:56.924 INFO |devtools_client_ba:0127| Websocket at ws://127.0.0.1:42080/devtools/browser/ee7317d9-2f6c-4580-b69a-4611e988bf9c not yet ready: [Errno 111] Connection refused 01/02 21:47:57.025 INFO |devtools_client_ba:0127| Websocket at ws://127.0.0.1:42080/devtools/browser/ee7317d9-2f6c-4580-b69a-4611e988bf9c not yet ready: [Errno 111] Connection refused [ .. and so on .. ] Comparing the code before/after my change in https://chromium-review.googlesource.com/c/catapult/+/833925 the problem is that, in the past, when "HasBrowserFinishedLaunching" failed it then retried the entire "read DevToolsActivePort file to discover ports" which, I assume, eventually got updated and produced the new correct ports. In the new code "BindDevToolsClient" only tries to read the DevToolsActivePort file once, and then repeatedly tries to connect to that port. So there is a race between us trying to read the file and it getting updated by Oobe.guestLoginForTesting. I can think of two possible ways forward: 1. Remove the DevToolsActivePort file before calling Oobe.guestLoginForTesting. 2. Make BindDevToolsClient also retry reading the DevToolsActivePort file. I think (1) is a quick cheap fix, but maybe (2) is the correct one. It's tricky because when the cri is not local, the old code also used to "retry" creating a forwarder without properly disposing off the previous one. But when trying to "fix" that bug by moving the forwarder creation out of the retry, it seems that I created now an even larger bug. I'll start working on (2). +nednguyen FYI since I'll probably have to do a similar change for the android platform too. Also, not sure why my local tests on the cros vm didn't catch this? Could that be because of the race? Or is there some way of running testBrowserCreation so it would have caught this?
,
Jan 3 2018
Found that running the test *within* the vm and *with* the guest user, i.e.: (vm) $ python /usr/local/telemetry/src/third_party/catapult/telemetry/bin/run_tests --browser=cros-chrome-guest testBrowserCreation reproduces the latest failure (other combinations did not reproduce, precisely because of the race I pointed out). Also verified locally a fix for that one: https://chromium-review.googlesource.com/c/catapult/+/848793
,
Jan 3 2018
The following revision refers to this bug: https://chromium.googlesource.com/catapult/+/19e11600d3a9410651e6c12e801a03eeca7274cc commit 19e11600d3a9410651e6c12e801a03eeca7274cc Author: Juan Antonio Navarro Perez <perezju@google.com> Date: Wed Jan 03 18:26:34 2018 [Telemetry] Re-read DevTools config if agent is not ready On some platforms, if the agent is not ready it could mean that the DevTools config we got is for an older agent. It's thus important to re-read the config on each retry. Bug: chromium:797620 Change-Id: Ic21492f68bb884204b2d3b9adfcec1959467c871 Reviewed-on: https://chromium-review.googlesource.com/848793 Commit-Queue: Juan Antonio Navarro Pérez <perezju@chromium.org> Reviewed-by: Ned Nguyen <nednguyen@google.com> Reviewed-by: Achuith Bhandarkar <achuith@chromium.org> [modify] https://crrev.com/19e11600d3a9410651e6c12e801a03eeca7274cc/telemetry/telemetry/internal/backends/chrome/chrome_browser_backend.py [modify] https://crrev.com/19e11600d3a9410651e6c12e801a03eeca7274cc/telemetry/telemetry/internal/backends/chrome/android_browser_backend.py [modify] https://crrev.com/19e11600d3a9410651e6c12e801a03eeca7274cc/telemetry/telemetry/internal/backends/chrome/cros_browser_backend.py
,
Jan 3 2018
With the CL above in comment 39 we believe the PFQ should pass? The errors look similar in graphics_Idle failures like https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/167226823-chromeos-test/chromeos4-row6-rack12-host15/graphics_Idle/debug/ (e.g. "Timed out going through login screen. Browser didn't launch.") so I am hoping this fixes that also. Is this failure only related to testing or is this likely to impact end users in some way?
,
Jan 3 2018
Some PFQ were turned into green after the last fix: https://uberchromegw.corp.google.com/i/chromiumos.chromium/builders/amd64-generic-tot-chromium-pfq-informational (#14758)
,
Jan 3 2018
Yup, graphics_Idle uses the guest browser (http://crbug.com/798876), so that should be fixed as well. It looks like Juan's latest fix should get everything green again. This doesn't impact end users.
,
Jan 4 2018
Neat! Bots are looking green now. Again, sorry for the breakage. Closing this as fixed. On issue 787834 I'll follow up adding some tests to prevent a similar problem to arise again in the future, and continue the refactor to make our code more robust in this area.
,
Jan 4 2018
Sorry for the inconvenience, due to a lack of cros presence in both the catapult CQ and chromium CQ, and thanks for the fixes! |
||||||||||
►
Sign in to add a comment |
||||||||||
Comment 1 by poromov@chromium.org
, Dec 27 2017