New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 797620 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Jan 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 0
Type: Bug



Sign in to add a comment

Build failure chrome-pfq: login_CryptohomeIncognito and security_ProfilePermissions.guest are failing across multiple boards

Project Member Reported by jcliang@chromium.org, Dec 26 2017

Issue description

Since master build #5449 login_CryptohomeIncognito and security_ProfilePermissions.guest are failing on multiple boards.

https://uberchromegw.corp.google.com/i/chromeos/builders/master-chromium-pfq/builds/5449

Errors from login_CryptohomeIncognito:

12/25 17:54:30.501 ERROR|           browser:0049| Failed with LoginException while starting the browser backend.
12/25 17:54:32.731 ERROR|            chrome:0182| Timed out logging in, tries=0, error=LoginException("Timed out going through login screen. Browser didn't launch. ",)

Errors from security_ProfilePermissions.guest:

12/25 16:11:47.749 ERROR|           browser:0049| Failed with LoginException while starting the browser backend.
12/25 16:11:48.691 ERROR|            chrome:0182| Timed out logging in, tries=0, error=LoginException("Timed out going through login screen. Browser didn't launch. ",)
12/25 16:27:00.990 ERROR|           browser:0049| Failed with LoginException while starting the browser backend.
12/25 16:27:01.942 ERROR|            chrome:0182| Timed out logging in, tries=1, error=LoginException("Timed out going through login screen. Browser didn't launch. ",)


It appears to be of same cause, probably something to do with guest login on Chrome.
 
Cc: poromov@chromium.org
This is because of a recent change in the way how ephemeral cryptohome is mounted, that is used for guest user.
The main CL in Chrome OS cryptohome codebase is here: https://chromium-review.googlesource.com/c/chromiumos/platform2/+/613167
Simultaneosly with that a CL to support the new way was submitted to autotests repo: https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/718696

I tested a lot that both 'login_CryptohomeIncognito' and 'security_ProfilePermissions' are working with this two changes + it passed them in CQ.

I'm sure how this may end up with breaking pfq here...
I noticed that this breakage happened for R65-10234.0.0 build, that actually shouldn't include any of this changes...
Or, autotest was updated for all builds, including old one, that use old Chrome OS solution? So that autotest expect new way to mount cryptohome, while it's mounted with the old way.

Is it how changes in autotest work?
I asked the question here, because I didn't have much experience: https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/718696/3/client/cros/cryptohome.py#429
Unfortunately, just reverting these changes won't help as I submitted several other CLs that depend on them.
I'm working on a CL that will update autotest to support both old and new solutions.
Owner: poromov@chromium.org
Status: Started (was: Available)
Cc: perezju@chromium.org
Juan, this might be related to your refactoring? Could you try any telemetry test (say testBrowserCreation) with system-guest (from the VM), or cros-chrome-guest from your desktop? This is the chromeos guest browser.
I think the problem is still on my side - https://chromium-review.googlesource.com/c/catapult/+/713747 updated IsCryptohomeMounted() in a wrong way.
I'll send a CL with a fix soon.
Cc: afakhry@chromium.org
+ chrome gradener for the week.
I've just tried the test "security_ProfilePermissions.guest" on a peach_pit device @ ToT Chrome on ToT ChromeOS, and it passes. Is there a chance this was fixed by something else?
login_CryptohomeIncognito also passes.
Cc: apronin@chromium.org
It's possible that the bug is not 100% reproducible.

Andrey do you have any insight in this?
I'm looking at https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/164111136-chromeos-test/chromeos6-row2-rack10-host22/login_CryptohomeIncognito/ from the bug description and I see "Unhandled AttributeError: 'CrOSBrowserBackend' object has no attribute 'HasBrowserFinishedLaunching'". Is that not the actual reason?

sysinfo/messages there doesn't seem to contain any other errors. For cryptohomed, I don't see any crashes or other issues for it.
The most recent failure on peach_pit shows a different error log: https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/165759587-chromeos-test/chromeos6-row2-rack11-host18/login_CryptohomeIncognito/debug/

  File "/usr/local/autotest/bin/screenshot.py", line 20, in <module>
    image = crtcScreenshot(args.crtc)
  File "/usr/local/autotest/cros/graphics/gbm.py", line 229, in crtcScreenshot
    "Unable to take screenshot. There may not be anything on the screen.")
RuntimeError: Unable to take screenshot. There may not be anything on the screen.
]
12/27 14:00:24.720 ERROR|           browser:0049| Failed with LoginException while starting the browser backend.
12/27 14:00:24.733 DEBUG|      global_hooks:0056| ['sh', '-c', '/bin/ps --no-headers -A -o pid,ppid,args:4096,state']
12/27 14:00:24.782 DEBUG|    cros_interface:0367| ListProcesses(<predicate>)->[190 processes]
12/27 14:00:24.784 DEBUG|      global_hooks:0056| ['sh', '-c', '/bin/ps --no-headers -A -o pid,ppid,args:4096,state']
12/27 14:00:24.838 DEBUG|    cros_interface:0367| ListProcesses(<predicate>)->[190 processes]
12/27 14:00:24.841 INFO |           browser:0206| Closing browser (pid=16595) ...
12/27 14:00:24.842 INFO |    cros_interface:0551| (Re)starting the ui (logs the user out)
12/27 14:00:24.843 DEBUG|      global_hooks:0056| ['sh', '-c', 'systemctl']
12/27 14:00:24.857 DEBUG|      global_hooks:0056| ['sh', '-c', 'systemctl']
12/27 14:00:24.870 DEBUG|      global_hooks:0056| ['sh', '-c', 'status ui']
12/27 14:00:24.889 DEBUG|    cros_interface:0443| IsServiceRunning(ui)->True
12/27 14:00:24.890 DEBUG|    cros_interface:0058| sh -c restart ui 
12/27 14:00:24.891 DEBUG|      global_hooks:0056| ['sh', '-c', 'restart ui']
12/27 14:00:26.774 DEBUG|    cros_interface:0067|  > stdout=[ui start/running, process 19397
], stderr=[]
12/27 14:00:26.776 DEBUG|    cros_interface:0058| sh -c cryptohome-path user '$guest' 
12/27 14:00:26.777 DEBUG|      global_hooks:0056| ['sh', '-c', "cryptohome-path user '$guest'"]
12/27 14:00:26.808 DEBUG|    cros_interface:0067|  > stdout=[/home/user/20658402636b0ef08626d3707171c30f5182bcb5
], stderr=[]
12/27 14:00:26.810 DEBUG|    cros_interface:0058| sh -c /bin/df --output=source,target /home/user/20658402636b0ef08626d3707171c30f5182bcb5 
12/27 14:00:26.811 DEBUG|      global_hooks:0056| ['sh', '-c', '/bin/df --output=source,target /home/user/20658402636b0ef08626d3707171c30f5182bcb5']
12/27 14:00:26.834 DEBUG|    cros_interface:0067|  > stdout=[Filesystem     Mounted on
/dev/mmcblk0p1 /home
], stderr=[]
12/27 14:00:26.835 INFO |           browser:0220| Browser is closed.
12/27 14:00:26.836 ERROR|            chrome:0182| Timed out logging in, tries=1, error=LoginException("Timed out going through login screen. Browser didn't launch. ",)
12/27 14:00:26.852 INFO |    cros_interface:0551| (Re)starting the ui (logs the user out)
12/27 14:00:26.854 DEBUG|      global_hooks:0056| ['sh', '-c', 'systemctl']
12/27 14:00:26.872 DEBUG|    cros_interface:0058| sh -c stop ui 
12/27 14:00:26.873 DEBUG|      global_hooks:0056| ['sh', '-c', 'stop ui']
12/27 14:00:27.365 DEBUG|    cros_interface:0067|  > stdout=[ui stop/waiting
], stderr=[]
Comment #13 was for build 5449.
Looking for later builds for veyron-minnie/peach-pit:
5450-5451: multiple (cheets for veyron) tests failures.
5452: Unhandled DevServerException: CrOS auto-update failed for host chromeos4-row9-rack9-host12: ...
5453-5458: BuildPackages issue
5459: this is where login_CryptohomeIncognito and security_ProfilePermissions.* issues seem to start
For build 5459 I see messages like the following in the log (in https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/165759587-chromeos-test/chromeos6-row2-rack11-host18/sysinfo/)

2017-12-27T13:19:44.686352-08:00 INFO cryptohomed[1252]: Asynced Mount() requested. Tracking request sequence id 1 for later PKCS#11 initialization.

I don't remember seeing those in the logs before.

Did we switch to using AsyncMount for Chrome recently? 
Or, alternatively, did we make changes to AsyncMount only recently, which should have been applied to guest users as well (since in comment #14 we seem to have issues logging in a guest user specifically). At least, as of ~10 days ago, guest users were using MountGuest and not AsyncMount.
The latest security_ProfilePermissions.guest failure for peach_pit (https://luci-milo.appspot.com/buildbot/chromeos/peach_pit-chrome-pfq/4618 -> https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/165724056-chromeos-test/chromeos6-row2-rack10-host22/) seems to contain more info in syslogs - see below. 

Can this "getcwd() failed: No such file or directory" / "getcwd: cannot access parent directories: Inappropriate ioctl for device" during restart ui be our error? Or is it just the byproduct of something else?


From the logs:
2017-12-27T18:36:19.978784+00:00 NOTICE ag[14375]: autotest server[stack::compute_checksum|_run_command|run] -> ssh_run(md5sum /usr/local/autotest/packages/client-autotest.tar.bz2 )
2017-12-27T18:36:20.037176+00:00 NOTICE autotest[14379]: 10:36:20.025 ERROR|           browser:0049| Failed with OSError while starting the browser backend.
2017-12-27T18:36:20.149345+00:00 NOTICE autotest[14384]: 10:36:20.138 ERROR|exception_formatte:0040| 
2017-12-27T18:36:20.162583+00:00 NOTICE autotest[14385]: 10:36:20.151 ERROR|exception_formatte:0041| Exception raised while closing platform backend
2017-12-27T18:36:20.175988+00:00 NOTICE autotest[14386]: 10:36:20.165 ERROR|exception_formatte:0057| 
2017-12-27T18:36:20.188965+00:00 NOTICE autotest[14387]: 10:36:20.178 ERROR|exception_formatte:0060| Traceback (most recent call last):
2017-12-27T18:36:20.205271+00:00 NOTICE autotest[14388]: 10:36:20.191 ERROR|exception_formatte:0063|   __init__ at /usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/browser/browser.py:51
2017-12-27T18:36:20.222438+00:00 NOTICE autotest[14389]: 10:36:20.208 ERROR|exception_formatte:0064|     self.Close()
2017-12-27T18:36:20.239677+00:00 NOTICE autotest[14390]: 10:36:20.225 ERROR|exception_formatte:0063|   Close at /usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/browser/browser.py:215
2017-12-27T18:36:20.257697+00:00 NOTICE autotest[14391]: 10:36:20.243 ERROR|exception_formatte:0064|     self._browser_backend.Close()
2017-12-27T18:36:20.276124+00:00 NOTICE autotest[14392]: 10:36:20.260 ERROR|exception_formatte:0063|   traced_function at /usr/local/telemetry/src/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py:52
2017-12-27T18:36:20.298607+00:00 NOTICE autotest[14393]: 10:36:20.280 ERROR|exception_formatte:0064|     return func(*args, **kwargs)
2017-12-27T18:36:20.321241+00:00 NOTICE autotest[14394]: 10:36:20.302 ERROR|exception_formatte:0063|   Close at /usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome/cros_browser_backend.py:216
2017-12-27T18:36:20.346748+00:00 NOTICE autotest[14395]: 10:36:20.326 ERROR|exception_formatte:0064|     self._cri.RestartUI(False) # Logs out.
2017-12-27T18:36:20.361276+00:00 NOTICE ag[14397]: autotest server[stack::exists|_run_command|run] -> ssh_run(test -d /usr/local/autotest)
2017-12-27T18:36:20.370768+00:00 NOTICE autotest[14398]: 10:36:20.351 ERROR|exception_formatte:0063|   RestartUI at /usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/core/cros_interface.py:564
2017-12-27T18:36:20.393502+00:00 NOTICE autotest[14399]: 10:36:20.374 ERROR|exception_formatte:0064|     if self.IsServiceRunning('ui'):
2017-12-27T18:36:20.415834+00:00 NOTICE autotest[14400]: 10:36:20.397 ERROR|exception_formatte:0063|   IsServiceRunning at /usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/core/cros_interface.py:442
2017-12-27T18:36:20.436409+00:00 NOTICE autotest[14401]: 10:36:20.419 ERROR|exception_formatte:0064|     assert stderr == '', stderr
2017-12-27T18:36:20.457109+00:00 NOTICE autotest[14402]: 10:36:20.440 ERROR|exception_formatte:0068| AssertionError: sh: 0: getcwd() failed: No such file or directory
2017-12-27T18:36:20.479484+00:00 NOTICE autotest[14403]: 10:36:20.460 ERROR|exception_formatte:0073| 
2017-12-27T18:36:20.501494+00:00 NOTICE autotest[14404]: 10:36:20.483 ERROR|exception_formatte:0074| Locals:
2017-12-27T18:36:20.525418+00:00 NOTICE autotest[14405]: 10:36:20.505 ERROR|exception_formatte:0085|   running      : True
2017-12-27T18:36:20.548813+00:00 NOTICE autotest[14406]: 10:36:20.530 ERROR|exception_formatte:0085|   service_name : 'ui'
2017-12-27T18:36:20.571086+00:00 NOTICE autotest[14407]: 10:36:20.552 ERROR|exception_formatte:0085|   stderr       : 'sh: 0: getcwd() failed: No such file or directory\n'
2017-12-27T18:36:20.593462+00:00 NOTICE autotest[14408]: 10:36:20.575 ERROR|exception_formatte:0085|   stdout       : 'ui start/running, process 13890\n'
2017-12-27T18:36:20.616689+00:00 NOTICE autotest[14409]: 10:36:20.598 ERROR|exception_formatte:0089| 
2017-12-27T18:36:20.653738+00:00 NOTICE ag[14413]: autotest server[stack::untar_required|_run_command|run] -> ssh_run(cat /usr/local/autotest/.checksum)
2017-12-27T18:36:20.664101+00:00 NOTICE autotest[14415]: 10:36:20.650 ERROR|             utils:0280| [stderr] shell-init: error retrieving current directory: getcwd: cannot access parent directories: Inappropriate ioctl for device
2017-12-27T18:36:20.698982+00:00 NOTICE autotest[14417]: 10:36:20.686 ERROR|             utils:0280| [stderr] shell-init: error retrieving current directory: getcwd: cannot access parent directories: Inappropriate ioctl for device
2017-12-27T18:36:20.714443+00:00 NOTICE autotest[14418]: 10:36:20.701 ERROR|             utils:0280| [stderr] rsync: getcwd(): No such file or directory (2)
2017-12-27T18:36:20.729261+00:00 NOTICE autotest[14419]: 10:36:20.716 ERROR|             utils:0280| [stderr] rsync error: errors selecting input/output files, dirs (code 3) at util.c(1058) [Receiver=3.1.2]
2017-12-27T18:36:20.758788+00:00 NOTICE autotest[14420]: 10:36:20.732 ERROR|      base_sysinfo:0510| Failed to collect loggable site_sysinfo.logdir('/var/spool/crash', ('**autoserv*', '*.core')) to /usr/local/autotest/results/default/security_ProfilePermissions.guest/sysinfo/iteration.1. Continuing...#012Traceback (most recent call last):#012  File "/usr/local/autotest/bin/base_sysinfo.py", line 506, in _run_loggables_ignoring_errors#012  File "/usr/local/autotest/bin/site_sysinfo.py", line 281, in run#012  File "/usr/local/autotest/bin/site_sysinfo.py", line 92, in run#012  File "/usr/local/autotest/common_lib/utils.py", line 1009, in system#012  File "/usr/local/autotest/common_lib/utils.py", line 738, in run#012CmdError: Command <rsync --no-perms --chmod=ugo+r -a --safe-links --exclude=/crash/**autoserv* --exclude=/crash/*.core /var/spool/crash /usr/local/autotest/results/default/security_ProfilePermissions.guest/sysinfo/iteration.1/var/spool> failed, rc=3, Command returned non-zero exit status#012* Command: #012    rsync --no-perms --chmod=ugo+r -a --safe-links#012    --exclude=/crash/**aut
Here is the list of telemetry tests changes between the lkg (5448) and the one with these two tests failing (5459) [thanks, alemate!]: https://chromium.googlesource.com/catapult.git/+log/035dfdbc3ead..3b3f9e1e789d

I also see in other logs a repeating error for the failing login_CryptohomeIncognito:
12/27 13:30:04.419 INFO |devtools_client_ba:0145| Devtools client not yet ready: [Errno 111] Connection refused

So, this one seems suspicious: https://chromium.googlesource.com/catapult.git/+/a586b0072a208470e55680d2d6f097e5b7c674ff


Cc: achuith@chromium.org
+achuith, in case catapult is indeed related
Project Member

Comment 20 by bugdroid1@chromium.org, Dec 28 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/d85dce52942c9823e31d1e49502376000e0fadc9

commit d85dce52942c9823e31d1e49502376000e0fadc9
Author: Sergey Poromov <poromov@google.com>
Date: Thu Dec 28 08:28:28 2017

Update ephemeral cryptohome tests to support old solution.

The way how ephemeral cryptohome is mounted, was changed
in
https://chromium-review.googlesource.com/c/chromiumos/platform2/+/613167,
however, updating autotest to support only the new way is wrong:
https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/718696

This change returns support of the old way of mounting ephemeral
cryptohome.
Tested on both new and old binaries.

BUG= 797620 
TEST=test_that ... e:login_Cryptohome.*
TEST=test_that ... e:security_ProfilePermissions.*
Change-Id: I7555dbda2fd12bd05605e979d11c81c193083646
Reviewed-on: https://chromium-review.googlesource.com/843907
Commit-Ready: Sergey Poromov <poromov@chromium.org>
Tested-by: Sergey Poromov <poromov@chromium.org>
Reviewed-by: Ricky Liang <jcliang@chromium.org>

[modify] https://crrev.com/d85dce52942c9823e31d1e49502376000e0fadc9/client/cros/cryptohome.py
[modify] https://crrev.com/d85dce52942c9823e31d1e49502376000e0fadc9/client/cros/constants.py

The submitted CL above shouldn't completely resolve the issue, but might help in case old Chrome OS is tested with a new autotest.

I have another important fix, that could resolve "Timed out going through login screen. Browser didn't launch." error, as IsCryptohomeMounted() is not working for ephemeral mode right now.
The fix is https://chromium-review.googlesource.com/c/catapult/+/844834, but I can't merge it as all owners are OOO + I can't land it as can't set Code-Review +1: "gerrit_util.GerritError: Forbidden: Applying label "Code-Review": 1 is restricted".
Is there any workaround for this?
Project Member

Comment 22 by bugdroid1@chromium.org, Dec 28 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/catapult/+/d376d8cf98a258a996586e34073ccf0bf1de0f02

commit d376d8cf98a258a996586e34073ccf0bf1de0f02
Author: Sergey Poromov <poromov@chromium.org>
Date: Thu Dec 28 19:18:33 2017

telemetry: Fix IsCryptohomeMounted to support new ephemeral mounts.

Telemetry was not properly updated to support new way of mounting
ephemeral cryptohome on top of a loop device.

Now, instead of mount from 'guestfs' to user home directory,
the following mount is returned by 'df' command:
/dev/loop13  /run/cryptohome/ephemeral_mount/<hash>

TBR=achuith@chromium.org
BUG= chromium:797620 
TEST=./telemetry/bin/run_tests CrOSInterfaceTest --browser=cros-chrome
TEST=./telemetry/bin/run_tests CrOSCryptohomeTest --browser=cros-chrome

Change-Id: I7d304cc863100d701fe96e3098e43d095e6e0664
Reviewed-on: https://chromium-review.googlesource.com/844834
Commit-Queue: Sergey Poromov <poromov@chromium.org>
Reviewed-by: Ned Nguyen <nednguyen@google.com>

[modify] https://crrev.com/d376d8cf98a258a996586e34073ccf0bf1de0f02/telemetry/telemetry/core/cros_interface.py
[modify] https://crrev.com/d376d8cf98a258a996586e34073ccf0bf1de0f02/telemetry/telemetry/core/cros_interface_unittest.py
[modify] https://crrev.com/d376d8cf98a258a996586e34073ccf0bf1de0f02/telemetry/telemetry/internal/backends/chrome/cros_unittest.py

Project Member

Comment 23 by bugdroid1@chromium.org, Dec 28 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/0ae572670f396f1b0ec72055702d568915732739

commit 0ae572670f396f1b0ec72055702d568915732739
Author: catapult-deps-roller@chromium.org <catapult-deps-roller@chromium.org>
Date: Thu Dec 28 20:36:24 2017

Roll src/third_party/catapult/ 08db5c9b2..d376d8cf9 (1 commit)

https://chromium.googlesource.com/catapult.git/+log/08db5c9b2822..d376d8cf98a2

$ git log 08db5c9b2..d376d8cf9 --date=short --no-merges --format='%ad %ae %s'
2017-12-28 poromov telemetry: Fix IsCryptohomeMounted to support new ephemeral mounts.

Created with:
  roll-dep src/third_party/catapult
BUG= 797620 


The AutoRoll server is located here: https://catapult-roll.skia.org

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.


CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel
TBR=sullivan@chromium.org

Change-Id: I498895e72fe89659be765f531cc74450efc8ad61
Reviewed-on: https://chromium-review.googlesource.com/845900
Reviewed-by: <catapult-deps-roller@chromium.org>
Commit-Queue: <catapult-deps-roller@chromium.org>
Cr-Commit-Position: refs/heads/master@{#526322}
[modify] https://crrev.com/0ae572670f396f1b0ec72055702d568915732739/DEPS

It doesn't seem that this fixed helped. Here's a link to the most recent peach_pit informational build: https://uberchromegw.corp.google.com/i/chromeos.chrome/builders/peach_pit-tot-chrome-pfq-informational/builds/9145 which contains the catapult roll-dep in comment #23. It is still suffering from the same failures.
That's bad.
According to the logs, it's indeed using the new code.

However, looks like the failure happens at another line:
https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome/cros_browser_backend.py?l=300&rcl=1e74c0dd40c10f6360bf21ac9b5c858fe977196b

The submitted CL should fix the issue that _IsCryptohomeMounted() could return false.
Now, it looks like HasDevToolsConnection() is returning false.

Could it be because of https://chromium-review.googlesource.com/c/catapult/+/833925#299 ?
Oops, missed that it was actually fixed in https://chromium-review.googlesource.com/c/catapult/+/839140

However, it still looks like HasDevToolsConnection() is constantly returning false, while _IsCryptohomeMounted() is working:

12/28 14:15:19.875 DEBUG|    cros_interface:0058| sh -c cryptohome-path user '$guest' 
12/28 14:15:19.876 DEBUG|      global_hooks:0056| ['sh', '-c', "cryptohome-path user '$guest'"]
12/28 14:15:19.899 DEBUG|    cros_interface:0067|  > stdout=[/home/user/17ef84a4d559a9335bd7347375a08e1bf2d6f85c
], stderr=[]
12/28 14:15:19.901 DEBUG|    cros_interface:0058| sh -c /bin/df --output=source,target /run/cryptohome/ephemeral_mount/17ef84a4d559a9335bd7347375a08e1bf2d6f85c 
12/28 14:15:19.902 DEBUG|      global_hooks:0056| ['sh', '-c', '/bin/df --output=source,target /run/cryptohome/ephemeral_mount/17ef84a4d559a9335bd7347375a08e1bf2d6f85c']
12/28 14:15:19.921 DEBUG|    cros_interface:0067|  > stdout=[Filesystem     Mounted on
/dev/loop1     /run/cryptohome/ephemeral_mount/17ef84a4d559a9335bd7347375a08e1bf2d6f85c
], stderr=[]
12/28 14:15:19.924 INFO |devtools_client_ba:0145| Devtools client not yet ready: ''
Cc: glevin@chromium.org
Owner: perezju@chromium.org
The breaking CL is:
https://chromium-review.googlesource.com/c/catapult/+/833925

First breaking build:
https://uberchromegw.corp.google.com/i/chromiumos.chromium/builders/amd64-generic-tot-chromium-pfq-informational/builds/14630

catapult deps roller only picked one change which is the one above.

+Greg as chromeos gardener for the week.

Juan - please look at comment 6 for repo instructions.



Sorry for the breakage. I think I know what the problem is within the Telemetry cros backend. Will send out a CL for review shortly.
Re ephemeral cryptohome changes: I think that all possible issues (that might have affected only login_CryptohomeIncognito and security_ProfilePermissions.guest as they are only using guest sessions), should already be resolved with my 2 CLs submitted above.

The last time I looked at the logs not they were the causes for the issues, also not only these two tests were failing.
Got CL ready for review:
https://chromium-review.googlesource.com/c/catapult/+/847554

Tried running testBrowserCreation on the vm, but the test passes even before applying the fix.

Is there a Telemetry test we know is currently failing and I could run to test the change?
Just came on duty, and am still trying to get up to speed on this, so apologies if this isn't helpful, but here's the latest error from the logs
(https://uberchromegw.corp.google.com/i/chromeos/builders/amd64-generic-chromium-pfq/builds/11225):

07:23:17 INFO | autoserv| AUTOTEST_STATUS::		FAIL	login_CryptohomeIncognito	login_CryptohomeIncognito	timestamp=1514899397	localtime=Jan 02 07:23:17	Unhandled LoginException: Timed out going through login screen. Browser didn't launch.
07:23:17 INFO | autoserv| FAIL	login_CryptohomeIncognito	login_CryptohomeIncognito	timestamp=1514899397	localtime=Jan 02 07:23:17	Unhandled LoginException: Timed out going through login screen. Browser didn't launch.
07:23:17 INFO | autoserv| AUTOTEST_STATUS::  Traceback (most recent call last):
07:23:17 INFO | autoserv| AUTOTEST_STATUS::    File "/usr/local/autotest/common_lib/test.py", line 831, in _call_test_function
07:23:17 INFO | autoserv| AUTOTEST_STATUS::      return func(*args, **dargs)
07:23:17 INFO | autoserv| AUTOTEST_STATUS::    File "/usr/local/autotest/common_lib/test.py", line 495, in execute
07:23:17 INFO | autoserv| AUTOTEST_STATUS::      dargs)
07:23:17 INFO | autoserv| AUTOTEST_STATUS::    File "/usr/local/autotest/common_lib/test.py", line 362, in _call_run_once_with_retry
07:23:17 INFO | autoserv| AUTOTEST_STATUS::      postprocess_profiled_run, args, dargs)
07:23:17 INFO | autoserv| AUTOTEST_STATUS::    File "/usr/local/autotest/common_lib/test.py", line 400, in _call_run_once
07:23:17 INFO | autoserv| AUTOTEST_STATUS::      self.run_once(*args, **dargs)
07:23:17 INFO | autoserv| AUTOTEST_STATUS::    File "/usr/local/autotest/tests/login_CryptohomeIncognito/login_CryptohomeIncognito.py", line 17, in run_once
07:23:17 INFO | autoserv| AUTOTEST_STATUS::      with chrome.Chrome(logged_in=False):
07:23:17 INFO | autoserv| AUTOTEST_STATUS::    File "/usr/local/autotest/common_lib/cros/chrome.py", line 171, in __init__
07:23:17 INFO | autoserv| AUTOTEST_STATUS::      self._browser = browser_to_create.Create(finder_options)
07:23:17 INFO | autoserv| AUTOTEST_STATUS::    File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome/cros_browser_finder.py", line 51, in Create
07:23:17 INFO | autoserv| AUTOTEST_STATUS::      browser_backend, self._platform_backend)
07:23:17 INFO | autoserv| AUTOTEST_STATUS::    File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome/cros_browser_with_oobe.py", line 14, in __init__
07:23:17 INFO | autoserv| AUTOTEST_STATUS::      backend, platform_backend)
07:23:17 INFO | autoserv| AUTOTEST_STATUS::    File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/browser/browser.py", line 41, in __init__
07:23:17 INFO | autoserv| AUTOTEST_STATUS::      self._browser_backend.Start()
07:23:17 INFO | autoserv| AUTOTEST_STATUS::    File "/usr/local/telemetry/src/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
07:23:17 INFO | autoserv| AUTOTEST_STATUS::      return func(*args, **kwargs)
07:23:17 INFO | autoserv| AUTOTEST_STATUS::    File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome/cros_browser_backend.py", line 205, in Start
07:23:17 INFO | autoserv| AUTOTEST_STATUS::      + self._GetLoginStatus())
07:23:17 INFO | autoserv| AUTOTEST_STATUS::    File "/usr/local/telemetry/src/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 52, in traced_function
07:23:17 INFO | autoserv| AUTOTEST_STATUS::      return func(*args, **kwargs)
07:23:17 INFO | autoserv| AUTOTEST_STATUS::    File "/usr/local/telemetry/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome/cros_browser_backend.py", line 326, in _RaiseOnLoginFailure
07:23:17 INFO | autoserv| AUTOTEST_STATUS::      raise exceptions.LoginException(error)
07:23:17 INFO | autoserv| AUTOTEST_STATUS::  LoginException: Timed out going through login screen. Browser didn't launch.
07:23:17 INFO | autoserv| AUTOTEST_STATUS::	END FAIL	login_CryptohomeIncognito	login_CryptohomeIncognito	timestamp=1514899397	localtime=Jan 02 07:23:17
Yes, that error message is consistent with my theory on what is causing the problem. I'm not sure however how to run that test.

$ python /usr/local/telemetry/src/third_party/catapult/telemetry/bin/run_tests login_CryptohomeIncognito

runs 0 tests.
Juan - don't worry about the autotest. I think you ought to see the same problem with:
achuith@achuithz620:~/code/catapult/telemetry/bin$ ./run_tests --browser=cros-chrome-guest --remote=localhost --remote-ssh-port=9222 testBrowserCreation

(This is after launching a cros VM using go/cros-vm)
Fix in the catapult CQ: https://chromium-review.googlesource.com/c/catapult/+/847554

It'll take a few hours for the catapult deps roller to roll this into chrome, at which point the FYI bots should cycle green. 
Cc: -semenzato@chromium.org
Project Member

Comment 36 by bugdroid1@chromium.org, Jan 2 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/catapult/+/339bfb350cd01653c945d2d2e8d98e4381bfd7a2

commit 339bfb350cd01653c945d2d2e8d98e4381bfd7a2
Author: Juan Antonio Navarro Perez <perezju@google.com>
Date: Tue Jan 02 17:26:13 2018

[Telemetry] Fix binding to devtools agent on CrOS after login

An earlier refactor on how browser backends connect to the devtools
agent broke the connection on CrOS browser after logging in.

Before `HasBrowserFinishedLaunching` (called in L300 within the
_GetLoginStatus) used to implicitly update devtools ports and create a
port forwarder.

This should be now done explicitly by calling `BindDevToolsClient`
before trying to wait for the login status.

Bug:  chromium:797620 
Change-Id: I893d2de7867970cb816134bb2b08c8225c06b829
Reviewed-on: https://chromium-review.googlesource.com/847554
Commit-Queue: Juan Antonio Navarro Pérez <perezju@chromium.org>
Reviewed-by: Achuith Bhandarkar <achuith@chromium.org>

[modify] https://crrev.com/339bfb350cd01653c945d2d2e8d98e4381bfd7a2/telemetry/telemetry/internal/backends/chrome/cros_browser_backend.py

Cc: nedngu...@google.com
Looks like my CL rolled into and was tested at:
https://uberchromegw.corp.google.com/i/chromiumos.chromium/builders/amd64-generic-tot-chromium-pfq-informational/builds/14750

However, login_CryptohomeIncognito is still failing (trimmed to only show the most relevant bits):

Traceback (most recent call last):
  File "login_CryptohomeIncognito.py", line 17, in run_once
    with chrome.Chrome(logged_in=False):
  File "telemetry/internal/backends/chrome/cros_browser_backend.py", line 202, in Start
    self._WaitForLogin()
  File "telemetry/internal/backends/chrome/cros_browser_backend.py", line 316, in _WaitForLogin
    self.BindDevToolsClient()
  File "telemetry/internal/backends/chrome/chrome_browser_backend.py", line 184, in BindDevToolsClient
    raise exceptions.BrowserConnectionGoneException(self.browser, e)
BrowserConnectionGoneException: Timed out while waiting 240s for IsAgentReady.

====

Looking through the logs I see that the first call to self.BindDevToolsClient() at line 176 in the Start method succeeds:
https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/internal/backends/chrome/cros_browser_backend.py?rcl=7365f02611830723d60963f0619d5bab45060849&l=176

01/02 21:47:52.213 INFO |cros_browser_backe:0076| Discovered ephemeral port 42080
01/02 21:47:52.214 INFO |cros_browser_backe:0077| Browser target: /devtools/browser/ee7317d9-2f6c-4580-b69a-4611e988bf9c
01/02 21:47:52.614 INFO |              oobe:0049| Invoking Oobe.guestLoginForTesting

However the second call (which now happens within _WaitForLogin) "discovers" the same devtools port and then fails to connect to it:

01/02 21:47:56.922 INFO |cros_browser_backe:0076| Discovered ephemeral port 42080
01/02 21:47:56.922 INFO |cros_browser_backe:0077| Browser target: /devtools/browser/ee7317d9-2f6c-4580-b69a-4611e988bf9c
01/02 21:47:56.923 INFO |devtools_client_ba:0127| Websocket at ws://127.0.0.1:42080/devtools/browser/ee7317d9-2f6c-4580-b69a-4611e988bf9c not yet ready: [Errno 111] Connection refused
01/02 21:47:56.924 INFO |devtools_client_ba:0127| Websocket at ws://127.0.0.1:42080/devtools/browser/ee7317d9-2f6c-4580-b69a-4611e988bf9c not yet ready: [Errno 111] Connection refused
01/02 21:47:57.025 INFO |devtools_client_ba:0127| Websocket at ws://127.0.0.1:42080/devtools/browser/ee7317d9-2f6c-4580-b69a-4611e988bf9c not yet ready: [Errno 111] Connection refused
[ .. and so on .. ]

Comparing the code before/after my change in https://chromium-review.googlesource.com/c/catapult/+/833925 the problem is that, in the past, when "HasBrowserFinishedLaunching" failed it then retried the entire "read DevToolsActivePort file to discover ports" which, I assume, eventually got updated and produced the new correct ports.

In the new code "BindDevToolsClient" only tries to read the DevToolsActivePort file once, and then repeatedly tries to connect to that port. So there is a race between us trying to read the file and it getting updated by Oobe.guestLoginForTesting.

I can think of two possible ways forward:

1. Remove the DevToolsActivePort file before calling Oobe.guestLoginForTesting.

2. Make BindDevToolsClient also retry reading the DevToolsActivePort file.

I think (1) is a quick cheap fix, but maybe (2) is the correct one. It's tricky because when the cri is not local, the old code also used to "retry" creating a forwarder without properly disposing off the previous one. But when trying to "fix" that bug by moving the forwarder creation out of the retry, it seems that I created now an even larger bug.

I'll start working on (2). +nednguyen FYI since I'll probably have to do a similar change for the android platform too.

Also, not sure why my local tests on the cros vm didn't catch this? Could that be because of the race? Or is there some way of running testBrowserCreation so it would have caught this?
Found that running the test *within* the vm and *with* the guest user, i.e.:

(vm) $ python /usr/local/telemetry/src/third_party/catapult/telemetry/bin/run_tests --browser=cros-chrome-guest testBrowserCreation

reproduces the latest failure (other combinations did not reproduce, precisely because of the race I pointed out).

Also verified locally a fix for that one:
https://chromium-review.googlesource.com/c/catapult/+/848793
Project Member

Comment 39 by bugdroid1@chromium.org, Jan 3 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/catapult/+/19e11600d3a9410651e6c12e801a03eeca7274cc

commit 19e11600d3a9410651e6c12e801a03eeca7274cc
Author: Juan Antonio Navarro Perez <perezju@google.com>
Date: Wed Jan 03 18:26:34 2018

[Telemetry] Re-read DevTools config if agent is not ready

On some platforms, if the agent is not ready it could mean that
the DevTools config we got is for an older agent. It's thus important
to re-read the config on each retry.

Bug:  chromium:797620 
Change-Id: Ic21492f68bb884204b2d3b9adfcec1959467c871
Reviewed-on: https://chromium-review.googlesource.com/848793
Commit-Queue: Juan Antonio Navarro Pérez <perezju@chromium.org>
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Reviewed-by: Achuith Bhandarkar <achuith@chromium.org>

[modify] https://crrev.com/19e11600d3a9410651e6c12e801a03eeca7274cc/telemetry/telemetry/internal/backends/chrome/chrome_browser_backend.py
[modify] https://crrev.com/19e11600d3a9410651e6c12e801a03eeca7274cc/telemetry/telemetry/internal/backends/chrome/android_browser_backend.py
[modify] https://crrev.com/19e11600d3a9410651e6c12e801a03eeca7274cc/telemetry/telemetry/internal/backends/chrome/cros_browser_backend.py

With the CL above in comment 39 we believe the PFQ should pass?

The errors look similar in graphics_Idle failures like https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/167226823-chromeos-test/chromeos4-row6-rack12-host15/graphics_Idle/debug/ 
 (e.g. "Timed out going through login screen. Browser didn't launch.") so I am hoping this fixes that also. 

Is this failure only related to testing or is this likely to impact end users in some way?

Yup, graphics_Idle uses the guest browser (http://crbug.com/798876), so that should be fixed as well. It looks like Juan's latest fix should get everything green again.

This doesn't impact end users.
Status: Fixed (was: Started)
Neat! Bots are looking green now.

Again, sorry for the breakage. Closing this as fixed.

On  issue 787834  I'll follow up adding some tests to prevent a similar problem to arise again in the future, and continue the refactor to make our code more robust in this area.
Sorry for the inconvenience, due to a lack of cros presence in both the catapult CQ and chromium CQ, and thanks for the fixes!

Sign in to add a comment