
Issue 911159

Starred by 2 users

Issue metadata

Status: Fixed
Merged: issue 910029
Owner:
Closed: Dec 4
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Fuchsia
Pri: 1
Type: Bug




[Fuchsia] crash_analyzer failed to launch to analyze crash

Project Member Reported by erikc...@chromium.org, Dec 3

Issue description

Unrelated CL [skia roll]: https://chromium-review.googlesource.com/c/chromium/src/+/1358079/2

Build #1: https://ci.chromium.org/p/chromium/builders/luci.chromium.try/fuchsia_x64/162201

The test NavigationControllerTest.BackSubframe flakes. 

When we retry the test, the test runner hits an exception while installing the package:

Retry:
https://chromium-swarm.appspot.com/task?id=418a07cb730f5c10&refresh=10

"""
[00111.014] devmgr: crash_analyzer_listener: analyzing exception type 0x108
[00111.046] [ERROR:garnet/lib/loader/package_loader.cc(55)] Could not open directory /pkgfs/packages/crashpad_analyzer/0 No such file or directory
[00111.058] devmgr: crash_analyzer_listener: failed to analyze crash: -24 (ZX_ERR_PEER_CLOSED)
[00111.070] [ERROR:garnet/bin/sysmgr/app.cc(154)] Singleton fuchsia-pkg://fuchsia.com/crashpad_analyzer#meta/crashpad_analyzer.cmx died
[00111.088] pkgsvr: error closing file: close /blob/33bcc6c8d26a2327b5b6a29c8665240b643a6e8dadeaac908d36cf2d9b02d056: ErrPeerClosed: zx.Channel.Call
[00111.099] pkgsvr: error closing file: close /blob/33b0ca7c67a9171facfac0c853212463b12828f8c60dd26e0515718e61e8810f: ErrPeerClosed: zx.Channel.Call
[00111.113] pkgsvr: error closing file: close /blob/33c6c5e37041c83b68ae0362587f0b8e5cc4ee6b2baa22bb783bbb60b080df70: ErrPeerClosed: zx.Channel.Call
[00111.126] pkgsvr: error closing file: close /blob/00f997b759cd06e2a35a04832ae767f040937a9c21dea6711fbe6b29d6727ff8: ErrPeerClosed: zx.Channel.Call
Traceback (most recent call last):
  File "/b/s/w/ir/build/fuchsia/test_runner.py", line 123, in <module>
    sys.exit(main())
  File "/b/s/w/ir/build/fuchsia/test_runner.py", line 110, in main
    args.package_dep, child_args, run_package_args)
  File "/b/s/w/ir/build/fuchsia/run_package.py", line 143, in RunPackage
    raise Exception('Error while installing: %s' % '\n'.join(output))
Exception: Error while installing: 2018/12/02 23:19:33 meta.far written, starting blobs

2018/12/02 23:19:42 Error writing to blob "33bcc6c8d26a2327b5b6a29c8665240b643a6e8dadeaac908d36cf2d9b02d056": write /pkgfs/install/blob/33bcc6c8d26a2327b5b6a29c8665240b643a6e8dadeaac908d36cf2d9b02d056: ErrInternal: io.file

2018/12/02 23:19:42 Error writing to blob "33b0ca7c67a9171facfac0c853212463b12828f8c60dd26e0515718e61e8810f": write /pkgfs/install/blob/33b0ca7c67a9171facfac0c853212463b12828f8c60dd26e0515718e61e8810f: ErrInternal: io.file

2018/12/02 23:19:42 Error truncating blob "33c6c5e37041c83b68ae0362587f0b8e5cc4ee6b2baa22bb783bbb60b080df70" to 1112: ErrInternal: io.file

2018/12/02 23:19:42 Error writing to blob "00f997b759cd06e2a35a04832ae767f040937a9c21dea6711fbe6b29d6727ff8": write /pkgfs/install/blob/00f997b759cd06e2a35a04832ae767f040937a9c21dea6711fbe6b29d6727ff8: ErrInternal: io.file

2018/12/02 23:19:42 Error creating blob in blobfs: open /pkgfs/install/blob/33c7bd2ac9fab0aa9194dd67124498e77f520d250ea6e75db5678b0239c6f58b: ErrInternal: io.directory

installation finished with errors
"""
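
To see which blobs were affected in a log like the one above, a small helper can collect the blob names from the error lines. This is only a sketch: `failed_blobs` is a hypothetical name, it is not part of run_package.py, and it assumes blob names are 64 lowercase hex digits, as they appear in this log.

```python
import re

# Assumption: blob names are 64 lowercase hex digits, as in the log above.
BLOB_ID = re.compile(r'\b[0-9a-f]{64}\b')


def failed_blobs(output_lines):
    """Return distinct blob names found on lines mentioning an error, in first-seen order."""
    seen = []
    for line in output_lines:
        # Match both "Error writing to blob ..." and "error closing file ...".
        if 'error' not in line.lower():
            continue
        for blob_id in BLOB_ID.findall(line):
            if blob_id not in seen:
                seen.append(blob_id)
    return seen
```

Passing the installer output and the pkgsvr lines through the same helper makes it easy to check whether the two sets of failing blobs overlap.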

When we retry the whole build:
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/fuchsia_x64/162204

A different test flakes: FrameServiceBaseTest.ConnectionError

But succeeds on retry.
 
Mergedinto: 910029
Owner: ----
Status: Duplicate (was: Assigned)
Components: Internals>PlatformIntegration
Labels: -Pri-3 M-73 Pri-1
Owner: w...@chromium.org
Status: Started (was: Duplicate)
Summary: [Fuchsia] Test steps fail due to blobfs garbage-collection racing against package installation (was: Flakiness + invalid test results in fuchsia content_unittests.)
Initially thought this was PKG-364, which was fixed on Friday, but it looks like it is yet another new platform bug; will take a look.
OK, so there are two issues being reported in this bug:

1. content_unittests is flaking in these runs because of issue 910029, which appears to be a newly introduced test-suite race condition between ScopedTempDir teardown and initialization of SimpleBackendImpl into that directory. I had hoped that it was related to issue 910645, but I've reverted the CLs for that, so I guess not.

2. Retry without patch is failing seemingly because of crash_analyzer being broken or missing; that's a new issue (possibly new since the fix for PKG-364 landed...)

We'll use this bug to track #2, and issue 910029 to track the actual test flake.

Note that, as far as we're aware at this point, the test flakes are not Fuchsia-specific, and content_unittests had been stable for quite some time until US Thanksgiving week. I suspect that these flakes crept in via the existing CQ retries, unfortunately.
Fix the crash analyzer first, then look at the test flake. It is best to always fix the broken critical component first; doing so might reduce the errors seen in the flake on the next run.
Summary: [Fuchsia] crash_analyzer failed to launch to analyze crash (was: [Fuchsia] Test steps fail due to blobfs garbage-collection racing against package installation)
As per the output that erikchen@ provided, we seem to be failing to load the crash_analyzer in response to a crash early during the (without patch) retry.

The timing suggests that it is actually some part of the package filesystem that is crashing.
Status: ExternalDependency (was: Started)
Filed bug PKG-368 against Fuchsia for this issue, and am actively tracking it with that team.
Cc: frousseau@google.com
Is crashpad_analyzer being correctly included? I wonder if this is a side-effect of crashpad being newly always-on but not fully included in the sdk or Chromium builds? (+frousseau)

(I suppose it might be updating at that moment too, in theory, but that seems more far-fetched.)
(I think there could be 3 issues here: 1) a test flake; 2) a crash in storage or wherever; 3) crashpad_analyzer missing so not a very nice log report.)
Re #7: Yup, I've filed a couple of Fuchsia bugs, one for us to always print the name of the crashing process, one about crashpad_analyzer being used but not present. :)
Status: Fixed (was: ExternalDependency)
DX-738: Provide crashpad_analyzer in Fuchsia SDK.
DX-740: Fall back to crash_analyzer if crashpad_analyzer is missing.

DX-738 is done, so we should now get crash reports to diagnose issues of this sort -> closing this out as Fixed; please file new bugs for any observed crashes.
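
For illustration only, the fallback requested in DX-740 could look roughly like the sketch below. It is not the Fuchsia implementation: `pick_analyzer` is a hypothetical helper, and the `<root>/<package>/0` layout is assumed from the `/pkgfs/packages/crashpad_analyzer/0` path in the log above.

```python
import os


def pick_analyzer(packages_root,
                  preferred='crashpad_analyzer',
                  fallback='crash_analyzer'):
    """Return the preferred analyzer if its package directory exists, else the fallback.

    Assumption: packages live under <packages_root>/<name>/0, as suggested by
    the error message in the log; this helper is hypothetical.
    """
    if os.path.isdir(os.path.join(packages_root, preferred, '0')):
        return preferred
    return fallback
```

With a check like this, a missing crashpad_analyzer package would select crash_analyzer instead of failing with ZX_ERR_PEER_CLOSED.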
