"content_shell_crash_test (with patch)" is flaky |
|||||||||||||||
Issue description"content_shell_crash_test (with patch)" is flaky. This issue was created automatically by the chromium-try-flakes app. Please find the right owner to fix the respective test/step and assign this issue to them. If the step/test is infrastructure-related, please add Infra-Troopers label and change issue status to Untriaged. When done, please remove the issue from Sheriff Bug Queue by removing the Sheriff-Chromium label. We have detected 3 recent flakes. List of all flakes can be found at https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyMAsSBUZsYWtlIiVjb250ZW50X3NoZWxsX2NyYXNoX3Rlc3QgKHdpdGggcGF0Y2gpDA. The chromium-try-flakes app is able to file bugs for individual tests when the test launcher is uploading results to the Test Results Server. If recent flakes above are caused by failing tests and you would like to have them filed as invidual bugs, please read more at https://goo.gl/QJKXV4. This flaky test/step was previously tracked in issue 798579.
,
Jan 24 2018
Avi, Do you have an idea why this is, or who has?
,
Jan 24 2018
,
Jan 24 2018
+ thakis as compiler peep We're crashing intentionally, via a call to CrashIntentionally(), which is marked NOINLINE. We need to not inline is so that "CrashIntentionally" shows up in the stack walk. Yet it's not. Has anything changed compiler-wise?
,
Jan 24 2018
iirc there's an (old) crbug somewhere that NOINLINE isn't always honored. Is CrashIntentionally() in a separate CL as the caller? (i'm not aware of any recent changes in any case)
,
Jan 24 2018
"Is CrashIntentionally() in a separate CL as the caller?" CL? Compilation unit? No, both are in content/renderer/render_frame_impl.cc in an anonymous namespace, as they're implementation details of a RenderFrameImpl member function.
,
Jan 24 2018
Different TU, sorry. Maybe try moving it to a separate TU if you're in a rush. Maybe the increase in linux flakes made us notice the Windows problem for the first time? If so, and we're not in a rush, file a dedicated bug for the Windows thing and we (compiler folks) will try to look at it.
,
Jan 24 2018
rhalavati, is this new in Windows, or has this changed recently?
,
Jan 25 2018
There's a deliberate attempt to avoid inlining of this function using the NOINLINE directive: https://cs.chromium.org/chromium/src/content/renderer/render_frame_impl.cc?q=CrashIntentionally&sq=package:chromium&l=823 This is a new regression and it's flaking all over the place on win7_chromium_rel_ng: https://ci.chromium.org/buildbot/tryserver.chromium.win/win7_chromium_rel_ng/?limit=200 specifically, for example: https://ci.chromium.org/buildbot/tryserver.chromium.win/win7_chromium_rel_ng/88066 https://ci.chromium.org/buildbot/tryserver.chromium.win/win7_chromium_rel_ng/88064 https://ci.chromium.org/buildbot/tryserver.chromium.win/win7_chromium_rel_ng/88063 https://ci.chromium.org/buildbot/tryserver.chromium.win/win7_chromium_rel_ng/88057 https://ci.chromium.org/buildbot/tryserver.chromium.win/win7_chromium_rel_ng/88054 https://ci.chromium.org/buildbot/tryserver.chromium.win/win7_chromium_rel_ng/88053 https://ci.chromium.org/buildbot/tryserver.chromium.win/win7_chromium_rel_ng/88052 https://ci.chromium.org/buildbot/tryserver.chromium.win/win7_chromium_rel_ng/88048 https://ci.chromium.org/buildbot/tryserver.chromium.win/win7_chromium_rel_ng/88046 I don't know why it's not failing on the associated waterfall bots: https://ci.chromium.org/buildbot/chromium.win/Win7%20Tests%20%28dbg%29%281%29/?limit=200 https://ci.chromium.org/buildbot/chromium.win/Win10%20Tests%20x64/?limit=200 This is urgent; it's dramatically affecting the CQ. Did Clang roll recently?
,
Jan 25 2018
,
Jan 25 2018
I'll try moving these functions to a different translation unit.
,
Jan 25 2018
Re comment 9: No, the last clang roll was 3 weeks ago (cf https://chromium.googlesource.com/chromium/src/+log/master/tools/clang/scripts/update.py). If it started flaking recently, something else changed and my memory about NOINLINE being ignored is probably incorrect.
,
Jan 25 2018
Hard to tell. Probing back 1600 builds on win7_chromium_rel_ng still shows evidence of it: https://ci.chromium.org/buildbot/tryserver.chromium.win/win7_chromium_rel_ng/86607 but that's only as far back as yesterday. Chromium developers send a *lot* of tryjobs. +stgao as FYI chromium-try-flakes isn't discovering all of these flakes: https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyMAsSBUZsYWtlIiVjb250ZW50X3NoZWxsX2NyYXNoX3Rlc3QgKHdpdGggcGF0Y2gpDA maybe because it usually fails both with and without the patch.
,
Jan 25 2018
I also can't find this test on test-results.appspot.com.
,
Jan 25 2018
Re comment 8: I just saw it on Win7.
,
Jan 25 2018
seanmccullough@ - is it possible to make this builder step upload results to test-results.appspot.com?
,
Jan 25 2018
,
Jan 25 2018
chromium-try-flakes relies on test-results.appspot.com, so it won't be able to detect such flakes if results are not uploaded there.
,
Jan 25 2018
It's possible I made a mistake in my query of test-results.appspot.com. Can anyone else find the results being uploaded there? chromium-try-flakes is seeing at least some of the flakes. https://chromium-review.googlesource.com/885901 is inbound attempting to work around this bug. The CQ seemed to be disabled last night.
,
Jan 25 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/538bdbcc1a8c67077ca10c8d68fb6efff7d40958 commit 538bdbcc1a8c67077ca10c8d68fb6efff7d40958 Author: Ken Russell <kbr@chromium.org> Date: Thu Jan 25 18:13:07 2018 Move crash helper functions to another file. This should help prevent the compiler from attempting to inline them. Bug: 805409 Change-Id: I282c12f09048fef3cb24b8c1b1d14c1a0ca705f0 Reviewed-on: https://chromium-review.googlesource.com/885901 Commit-Queue: Kenneth Russell <kbr@chromium.org> Reviewed-by: Jochen Eisinger <jochen@chromium.org> Reviewed-by: Avi Drissman <avi@chromium.org> Cr-Commit-Position: refs/heads/master@{#531945} [modify] https://crrev.com/538bdbcc1a8c67077ca10c8d68fb6efff7d40958/content/renderer/BUILD.gn [add] https://crrev.com/538bdbcc1a8c67077ca10c8d68fb6efff7d40958/content/renderer/crash_helpers.cc [add] https://crrev.com/538bdbcc1a8c67077ca10c8d68fb6efff7d40958/content/renderer/crash_helpers.h [modify] https://crrev.com/538bdbcc1a8c67077ca10c8d68fb6efff7d40958/content/renderer/render_frame_impl.cc
,
Jan 26 2018
Unfortunately that didn't fix the problem. Here's a tryjob that picked up the Chromium revision in which that CL landed: https://ci.chromium.org/buildbot/tryserver.chromium.win/win7_chromium_rel_ng/88769 It still failed content_shell_crash_test here: https://chromium-swarm.appspot.com/task?id=3b481015cb228310&refresh=10&show_raw=1 There have been some internal email threads about Windows debug information and some work still to be done. Could this failure be related? Can someone download the above isolate and see whether that's likely the issue? I'm going to disable this test on Windows for the moment; it's affecting the CQ too badly.
,
Jan 26 2018
Re comment 21: That thread was about tail call optimization. If we were to tailcall into the NOINLINE function, we'd miss the function below the NOINLINE function from the stack, not the NOINLINE function itself, right? CrashIntentionally() doesn't call anything itself.
,
Jan 26 2018
I'll see if I can repro.
,
Jan 26 2018
It works OK locally on Win10. At first I was thinking that it might be because the pdb isn't getting onto the bot. That might be a problem; I don't see anything obvious in https://cs.chromium.org/chromium/src/content/shell/BUILD.gn where the pdb gets packaged. But looking at a local run compared with the output in https://chromium-swarm.appspot.com/task?id=3b43a6dda2d9b510&refresh=10&show_raw=1 it looks to me like cdb isn't even running. There should at a minimum be some logo header, etc. from cdb when it starts running. I haven't been able to find a swarming run where it did work (does anyone know of one?). Looking at https://cs.chromium.org/chromium/src/testing/buildbot/chromium.win.json I'm not confident that this test is even run on either of the waterfall bots linked to in #c9.
,
Jan 26 2018
Ah, it is run on Win 7 Tests 1 though https://ci.chromium.org/buildbot/chromium.win/Win7%20Tests%20%281%29/76166 and that run includes https://chromium-swarm.appspot.com/task?id=3b3ebfc8fc3d9610&refresh=10&show_raw=1 which shows it working, but without output from cdb. So I guess some of stdout or stderr are dropped on swarming.
,
Jan 26 2018
The swarming output doesn't include the .dmp unfortunately, so can't check that vs. reality.
,
Jan 26 2018
Aha, but on a Win7 VM, reproing the failing swarming run fails: python swarming.py reproduce -S chromium-swarm.appspot.com 3b43a6dda2d9b510 whereas the successful one from the bot passes: python swarming.py reproduce -S chromium-swarm.appspot.com 3b3ebfc8fc3d9610 Progress! Maybe.
,
Jan 26 2018
Running python swarming.py reproduce -S chromium-swarm.appspot.com 3b43a6dda2d9b510 directly quietly fails. But running the isolate script test that I believe it runs direct causes a cdb crash dialog. I'm not 100% certain that this is the problem as I can't see how/where the script would be suppressing that crash dialog, but that might be the problem.
,
Jan 26 2018
Two more pieces of information in support of it just being a cdb crash: - Loading that dump manually in windbg displays the right stack - Running the manual command line per the screenshot above with the same retrieved swarming inputs works as expected on Win10. The crash looks like it's in kernelbase.dll, so I think this is likely that the cdb that it's trying to run is just failing on the trybot machines. But I don't know exactly what we should do to fix it. From #c25, the successful slave (vm73-m4) superficially looks very similar to the failing one (vm1372-m4). I'm not sure how those machines are imaged/provisioned/updated though, so it's possible one is a slightly different patchlevel than the other.
,
Jan 26 2018
Yeah, this looks https://cs.chromium.org/chromium/src/build/win/BUILD.gn?type=cs&q=cdb.exe&sq=package:chromium&l=24 a wee bit sketchy.
,
Jan 26 2018
are the trybots running the latest updates for windows?
,
Jan 26 2018
I take it back somewhat, I think that's semi-reasonable as those are redist. Fiddling around a little more, it seems that an older cdb (10.0.14321.1024) runs, whereas the one we're using now (10.0.15063.468) doesn't on this Win7. So I'd guess an compiler update (not clang, but the Microsoft part) probably is when this broke. (?)
,
Jan 26 2018
My suspect for when this broke would have been way back at: https://bugs.chromium.org/p/chromium/issues/detail?id=683729#c158 (noting that that was flipping on a toolchain packaged at https://bugs.chromium.org/p/chromium/issues/detail?id=683729#c146 which mentions debuggers from 15063.468) But looking at the CL where it landed: https://chromium-review.googlesource.com/c/chromium/src/+/665637 content_shell_crash_test ran https://ci.chromium.org/buildbot/tryserver.chromium.win/win_chromium_rel_ng/534228 and allegedly worked https://chromium-swarm.appspot.com/task?id=38a07ef64ecf8e10&refresh=10&show_raw=1 It ran on vm19-m4. I wonder if the "low numbered" vmNN-m4 (73, 19) that were successful are somehow in a different image bucket than the "high numbered " vm1372-m4 that failed?
,
Jan 26 2018
I found the failing slaves that kbr linked to above: vm1372-m4 vm1320-m4 vm170-m4 vm1292-m4 vm311-m4 vm88-m4 vm1411-m4 vm169-m4 vm15-m4 So the numbering seems irrelevant (as you might expect). I'm going to try to figure out if win 7 tests 1 shares bots with win7_chromium_rel_ng or not.
,
Jan 26 2018
scottmg@: yes, sorry about https://cs.chromium.org/chromium/src/build/win/BUILD.gn?type=cs&q=cdb.exe&sq=package:chromium&l=24 . I didn't know how else to express the redistributable dependencies of cdb. If there's a better way I'm happy to fix it. Do you think we're missing a DLL dependency? If the cdb from the newer Windows toolchain is zipped up manually and run on one of the older Win7 VMs, does it run?
,
Jan 26 2018
Also, can I ask you to take this given that it looks like a Windows toolchain problem? Thanks in advance.
,
Jan 26 2018
win7_chromium_rel_ng almost surely shares bots with "Win7 Tests (1)". These tests are Swarmed, and they surely use the same Swarming dimensions.
,
Jan 26 2018
(I don't work on Windows any more, and I'm in a different CC now, but I'm nerd sniped at this point, so yeah :) Nah, I don't have a better solution, other than offloading to ops to have a working cdb on the machine. If I install windbg/cdb via the Windows SDK 10.0.15063.468, cdb runs fine from that install. So either we're missing a dll, or it updates something in Windows during install...
,
Jan 26 2018
Oh wait! I got confused amongst all the directories and VMs... I believe the correct versions what work/don't are: - 10.0.15063.468 works - 10.0.16299.15 doesn't. I don't know what 16299.15 is or where it came from. The normal download at https://developer.microsoft.com/en-us/windows/hardware/download-windbg gives me the 15063.468 sdk. Did we package the Store Preview version for some reason? It wouldn't surprise me that that doesn't work on 7.
,
Jan 26 2018
Oh ffs Microsoft, that page doesn't actually download the latest sdk. Apparently https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/.
,
Jan 26 2018
Installing Debugging Tools for Windows from 10.0.16299.91 (which is what the link above gets me today) on the Win7 vm installs fine, and runs OK. The packaged version however, continues to not run (that's 10.0.16299.15). I'm not sure if it's the difference in version, or that it updated Windows to make the .91 one work. ... Reverting the VM to pre-install, and copying the 16299.91 cdb on to it (not installing it) runs fine. So it seems plausible that .15 is just broken on Win7.
,
Jan 26 2018
Got it. We're "just" missing good ol' api-ms-win-eventing-provider-l1-1-0.dll. Of course. (The fact that we use an x86 cdb instead of x64 made things extra confusing, but anyway... The upside is that it shouldn't need a toolchain repackage to fix, which is good because I believe the magic toolchain repackager (Bruce) is still on vacation.)
,
Jan 26 2018
BTW, if it's that, can we revert 538bdbcc1a8c67077ca10c8d68fb6efff7d40958?
,
Jan 26 2018
avi: Yeah, I think we should be able to. Let me land a fix first https://crrev.com/c/889701 and confirm things are OK first, and then we can revert that separately.
,
Jan 26 2018
kbr: Just to confirm, you didn't end up disabling anything per #c21, right? I see the test included in "analyze" output in https://ci.chromium.org/buildbot/tryserver.chromium.win/win7_chromium_rel_ng/89655 but just wanted to confirm it wasn't disabled in only some specific configs. +bruce fyi. No packaging problem after all, but cdb-on-win7 is a thing I could imagine causing problems on a future toolchain update, so just so you're aware of this bug.
,
Jan 26 2018
scottmg: correct, nothing was disabled yet. I attempted to do so in https://chromium-review.googlesource.com/888166 but it got kicked out of the CQ due to the infra outage. I think may still be worth putting the crash functions in a separate file. Thanks very much for helping even though this isn't your job any more.
,
Jan 26 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/b91f674a20ce246aa646bb7ecabd217d113c5611 commit b91f674a20ce246aa646bb7ecabd217d113c5611 Author: Scott Graham <scottmg@chromium.org> Date: Fri Jan 26 23:53:13 2018 Add missing cdb dll, causing failures on win7 crash integration tests (Link to working run on Win7 bot: https://chromium-swarm.appspot.com/task?id=3b4d2229fcb0aa10&refresh=10&show_raw=1) Bug: 805409 , 804452 Change-Id: Ieea7271856276162483e9a4240452701a1e062e7 Reviewed-on: https://chromium-review.googlesource.com/889701 Commit-Queue: Scott Graham <scottmg@chromium.org> Reviewed-by: Kenneth Russell <kbr@chromium.org> Cr-Commit-Position: refs/heads/master@{#532094} [modify] https://crrev.com/b91f674a20ce246aa646bb7ecabd217d113c5611/build/win/BUILD.gn [modify] https://crrev.com/b91f674a20ce246aa646bb7ecabd217d113c5611/build/win/copy_cdb_to_output.py
,
Jan 27 2018
|
|||||||||||||||
►
Sign in to add a comment |
|||||||||||||||
Comment 1 by rhalavati@chromium.org
, Jan 24 2018