Chrome crash in desktopui_MashLogin (chrome-informational suite in caroline-tot-chrome-pfq-informational) |
||||||||||||
Issue descriptionChrome is crashing in desktopui_MashLogin in the chrome-informational test suite in the PFQ informational builders: https://cros-goldeneye.corp.google.com/chromeos/legoland/builderHistory?buildConfig=caroline-tot-chrome-pfq-informational&buildBranch=master The crash is in GetGpuFeatureData(gpu::GpuFeatureInfo and may not be Mash specific.
,
Nov 7
@stevenjb@chromium.org Are you sure you assigned this to the correct person? I don't think I should be fixing this. I'll assign it to you to double check.
,
Nov 7
Nope, sorry, Monorail fail.
,
Nov 7
sadrul/mohsen, do you know of any recent GPU feature data changes? I recall a similar problem a couple years ago in issue 669965. There's a workaround related to GPU data collection in the mash autotests: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/master/client/site_tests/desktopui_MashLogin/desktopui_MashLogin.py#52
,
Nov 8
This issue appears to have resolved itself on ToT; the failure did not appear in the latest run of any of the pfq-informational builders that run chrome-informational. FWIW, I tried to reproduce this locally on amd64-generic, syncing chrome to a version that failed on a couple of the builders and running the following in my chroot to a VM with a deployed chrome build, but the test did not fail: test_that --board=amd64-generic localhost:9222 desktopui_MashLogin
,
Nov 8
Huh. I wonder if it only repros on hardware. That's kinda scary.
,
Nov 8
It may not be VM vs HW, it may be local setup vs. builders. I'm waiting on build_packages for my board so that I can run the auto test on my device.
,
Nov 8
OK, I confirmed that it is failing on a device and not in a VM, and that it is not failing on ToT. There are a number of reason why it might only be failing on hardware. I suspect that the code path leading to the crash simply isn't triggered in a non internal build, or on a VM instead of a device. I don't think it is necessarily worth the effort to investigate further at this point, closing this as WontFix.
,
Nov 8
SGTM, thanks for looking.
,
Nov 14
This showed up again: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?id=3131048 I suspect a CL was un-reverted, I'll start looking for that.
,
Nov 14
,
Nov 14
And here are the changes for the run where this first showed up: https://chromium.googlesource.com/chromium/src/+log/d014a672eb4bf7cb281303a0723c0e4e9cb9d2c5..60802eced856d89133f211ef29df6124f29401bd?pretty=fuller&n=10000 And when it started passing again: https://chromium.googlesource.com/chromium/src/+log/31b3c895a586ba11cd0812ec0aec7af5229bfe72..d0a70f4bc16702f0d32435754f280d933959a810?pretty=fuller&n=10000 That list of changes where we stopped seeing the crash includes the following CL, which landed in the range where we first saw the crash and is a very likely suspect given the call stack: [media] Disable MojoVideoDecoder by default on CrOS. This is a partial revert of commit 44323658c5cc4fff0a2d4a0bcd1d4aa9a7290c87. Bug: issue 902900 Reviewed-on: https://chromium-review.googlesource.com/c/1325072 And the list of recent changes where the problem reappeared includes: MojoVideoDecoder: Add histogram to track initialization and error. The autotest video_ChromeHWDecodeUsed verifies Chrome used the HW decoding when playing a video stream by checking the histogram value, which is reported from GpuVideoDecoder. In order to keep the test pass during switching from GpuVideoDecoder to MojoVideoDecoder, we have to report the same histogram value temporarily from MojoVideoDecoder. With this CL, we can enable MojoVideoDecoder at ChromeOS platform. BUG=issue 902968 Reviewed-on: https://chromium-review.googlesource.com/c/1328634 +sandersd@, +akahuang@
,
Nov 14
This appears to be a solid blame, as both CLs toggle the MojoVideoDecoder feature. I am not familiar with these tests and can't determine the actual cause of failure though. The newer CL was intended to make the new code look like the old code to UMA-based CrOS tests, and we were not expecting to see device-specific failures. This CL is reasonably safe to revert if a fast resolution is required.
,
Nov 14
These are informational tests to prevent --enable_features=Mash from regressing, so it isn't urgent per-se, but we do want to avoid the regression (and it may be indicative of an incorrect assumption). I will confirm the blame before considering a revert, and will try to do at least some preliminary investigation.
,
Nov 14
I have confirmed that: a) change 6aac2af34906d6ed98d74773711cddba39ca9825 (https://chromium-review.googlesource.com/c/chromium/src/+/1328634) is the culprit. b) just enabling the "MojoVideoDecoder" by default causes the test to fail. Given that there are a number of changes behind that switch, I don't feel particularly qualified to investigate much further :) I recommend either: 1) Revert the CL and investigate and test the Mash behavior before re-landing. 2) Make a change to wrap the feature check, e.g. IsMojoVideoDecoderEnabled(), and disable it on Mash with a TODO referencing this bug (or a new one). Since this is an informational builder it isn't urgent, but if we don't have a fix or workaround by tomorrow we will need to do a revert.
,
Nov 14
Without a meaningful log of the test, it's hard for me to shed any light here, but I probably won't be the one developing the fix either. Probably MashLogin shouldn't care how videos are being decoded. I'm assuming that this test is running on the same set of hardware as the other CrOS video tests, so it's not that video decoding is fully broken under MojoVideoDecoder on this device, right? I'd rather revert than add conditional logic.
,
Nov 15
It's possible (likely even) that ModoVideoDecoder is generally broken on Mash, but the crash is not tied to testing that feature per-se. We only run a minimal test suite (desktopui_MashLogin) on Mash to ensure that things do not get too badly broken while the code is in development. There are many known issues and it is a long ways from shipping. Something in the test is triggering the crash. Details can be found here: http://cautotest-prod/tko/retrieve_logs.cgi?job=/results/257017957-chromeos-test/
,
Nov 15
Unfortunately this stack trace isn't very believable, GetGpuFeatureData() doesn't *do* anything related to video decoding. If the line information is to be believed, it's |ImageTransportFactory::GetInstance()->IsGpuCompositingDisabled()| that's crashing, and that certainly doesn't depend on the MojoVideoDecoder feature in any way. I suppose it could also be a fluke that the failures align with these CLs, and the problem is actually in creating the ImageTransportFactory. This at least seems to be affected by Mash flags in some way.
,
Nov 15
Since we don't have another solution, I am going to go ahead and plan to revert https://chromium-review.googlesource.com/c/chromium/src/+/1328634 for now.
,
Nov 16
OK, I will revert the CL first, then figure out what is the best way to do this.
,
Nov 16
https://cros-goldeneye.corp.google.com/chromeos/legoland/builderHistory?buildConfig=caroline-tot-chrome-pfq-informational&buildBranch=master I saw several latest builds passed, and didn't find the failure about desktopui_MashLogin. Do we still need to revert the CL?
,
Nov 16
I believe it's still failing. See https://cros-goldeneye.corp.google.com/chromeos/legoland/builderHistory?buildConfig=tricky-tot-chrome-pfq-informational&buildBranch=master . I'm going to revert.
,
Nov 19
,
Nov 19
The crash happened here: https://cs.chromium.org/chromium/src/content/browser/gpu/compositor_util.cc?q=compositor_util.cc&sq=package:chromium&dr&l=86 ImageTransportFactory::GetInstance() return nullptr after switching to MojoVideoDecoder. We need to handle nullptr first before calling its method.
,
Nov 19
,
Nov 19
,
Nov 20
,
Nov 20
I believe the revert fixed this, right? At least I haven't gotten any new failures since the revert.
,
Nov 20
Yes, #24 is for the crash when I enable MojoVideoDecoder. Because we still want to enable it in the future. Maybe I should create another issue for tracking this?
,
Nov 20
> Maybe I should create another issue for tracking this? SGTM. I recommend closing this bug out.
,
Nov 21
|
||||||||||||
►
Sign in to add a comment |
||||||||||||
Comment 1 by steve...@chromium.org
, Nov 7