mac*_blink_rel bots are missing osmesa on first run after midnight PST
Issue description

For example:
- https://luci-milo.appspot.com/buildbot/tryserver.blink/mac10.10_blink_rel/3252
- https://luci-milo.appspot.com/buildbot/tryserver.blink/mac10.12_blink_rel/1122
- https://luci-milo.appspot.com/buildbot/tryserver.blink/mac10.11_blink_rel/3351

In all cases, the tests are invalid because the runner's crashing:

  crash log for gpu (pid <unknown>):
  STDOUT: <empty>
  STDERR: [48456:34819:0731/003512.746204:50198333299512:ERROR:devtools_http_handler.cc(786)]
  STDERR: DevTools listening on 127.0.0.1:50758
  STDERR:
  STDERR: [48459:775:0731/003512.823315:50198410411268:ERROR:gl_initializer_mac.cc(90)] osmesa.so not found at /b/c/b/mac_layout/src/out/Release/osmesa.so
  STDERR: [48459:775:0731/003512.834340:50198421428528:ERROR:gl_initializer_mac.cc(90)] osmesa.so not found at /b/c/b/mac_layout/src/out/Release/osmesa.so
  STDERR: [48459:775:0731/003512.838383:50198425470971:ERROR:gpu_child_thread.cc(253)] Exiting GPU process due to errors during initialization
  STDERR: [48456:33027:0731/003512.847939:50198435031013:ERROR:browser_gpu_channel_host_factory.cc(103)] Failed to launch GPU process.

This doesn't seem to happen all the time (some WPT import jobs pass all the bots).
,
Aug 1 2017
The logs for that set of failures look pretty similar (e.g. https://storage.googleapis.com/chromium-layout-test-archives/mac10_9_blink_rel/3315/layout-test-results/test-expectations.html). Interestingly, the crashing tests seem to be http tests, and some other tests failed in other ways but didn't crash.
,
Aug 1 2017
Multiple people have been affected by this, and besides blocking the wpt importer, this also blocks people from using webkit-patch rebaseline-cl to rebaseline tests. Will look at this tomorrow.
,
Aug 2 2017
Bug 751421 is probably relevant, although in the examples above, I didn't see the analyze step failure.
,
Aug 2 2017
Bug 751421 was likely caused by a commit that landed yesterday/today, whereas the problems reported here have been happening for at least a few days.
,
Aug 2 2017
Good point; now that one is fixed. Next, we can find some more recent examples to confirm whether this is still happening and look through the logs more.
,
Aug 2 2017
Latest case I saw when checking just now was: https://build.chromium.org/p/tryserver.blink/builders/mac10.12_blink_rel/builds/1157, from about 16 hours ago.
,
Aug 4 2017
Haven't seen this again; it was probably a transient issue.
,
Aug 8 2017
It happened again today: https://chromium-review.googlesource.com/c/604878
,
Aug 11 2017
Links to the set of failed jobs for that CL:
https://build.chromium.org/p/tryserver.blink/builders/mac10.9_blink_rel/builds/3464
https://build.chromium.org/p/tryserver.blink/builders/mac10.10_blink_rel/builds/3423
https://build.chromium.org/p/tryserver.blink/builders/mac10.11_blink_rel/builds/3546
https://build.chromium.org/p/tryserver.blink/builders/mac10.11_retina_blink_rel/builds/3476
https://build.chromium.org/p/tryserver.blink/builders/mac10.12_blink_rel/builds/1272

Quick notes about the crash message:
- The error message suggests that osmesa.so is not found in the build directory.
- I haven't found anything about osmesa in the compile step yet, though (although perhaps it should be there?).
- osmesa stands for "Off-screen Mesa".
- osmesa seems to be listed as a dependency of the target webkit_layout_tests, which is a dependency of blink_tests in https://cs.chromium.org/chromium/src/BUILD.gn?l=891
- The code where the message is printed is https://cs.chromium.org/chromium/src/ui/gl/init/gl_initializer_mac.cc?l=71
,
Aug 11 2017
This sort of reminds me of bug 739282 since I've only seen this happen once a day, though the symptoms are quite different from that one.
,
Aug 11 2017
Actually, that's a great point, since the timing of the failed jobs appears to be just after midnight California time, e.g.:

https://build.chromium.org/p/tryserver.blink/builders/mac10.10_blink_rel/builds/3380 Tue Aug 8 00:11:46 2017
https://build.chromium.org/p/tryserver.blink/builders/mac10.10_blink_rel/builds/3423 Fri Aug 11 00:11:46 2017

Other notes:
- On the waterfall, WebKit Mac Builder compiles and includes osmesa.so in the build package (listed in the "package build" step).
- Then the testers on the waterfall unpack that in the "extract build" step, so osmesa.so is present.
- The try bots are different since they "compile (with patch)" for each job.

I just looked at a couple successful try jobs, and they do have several lines in the compile step related to osmesa, including:

  [4386/13966] CXX obj/ui/gl/gl/gl_bindings_autogen_osmesa.o
  [4420/13966] CXX obj/ui/gl/gl/gl_context_osmesa.o
  [4437/13966] CXX obj/ui/gl/gl/gl_surface_osmesa.o

In every non-crashy run I've looked at, these lines are present; but in every crashy run these lines are not present.
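To avoid comparing compile logs by eye, something like the following could pull out the osmesa-related ninja steps from a saved compile log. This is only a rough sketch: the function name `osmesa_steps` is made up for illustration, the sample log is stitched together from lines quoted in this bug plus one hypothetical unrelated line, and it assumes the log uses the usual `[n/total] TOOL output` progress format.

```python
import re

# One ninja progress line whose output path mentions osmesa, e.g.
# "[4420/13966] CXX obj/ui/gl/gl/gl_context_osmesa.o"
STEP_LINE = re.compile(
    r"^\[\d+/\d+\][ \t]+(\S+)[ \t]+(\S*osmesa\S*)[ \t]*$", re.MULTILINE
)


def osmesa_steps(compile_log):
    """Return (tool, output) pairs for every osmesa-related step in the log."""
    return [(m.group(1), m.group(2)) for m in STEP_LINE.finditer(compile_log)]


# Sample input: osmesa lines copied from this bug; the skia line is hypothetical.
SAMPLE_LOG = """\
[4386/13966] CXX obj/ui/gl/gl/gl_bindings_autogen_osmesa.o
[4387/13966] CXX obj/skia/skia/hypothetical_unrelated.o
[5400/21662] SOLINK_MODULE osmesa.so
"""

steps = osmesa_steps(SAMPLE_LOG)
# True when the shared library itself was (re)linked in this compile step.
relinked = any(tool == "SOLINK_MODULE" for tool, _ in steps)
print(steps, relinked)
```

Keeping the tool name in the result makes it possible to tell compiles of the osmesa-related object files apart from an actual link of osmesa.so.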
,
Aug 11 2017
Actually, the CXX obj/ui/gl/gl/gl_context_osmesa.o lines occurred in the compile step for https://build.chromium.org/p/tryserver.blink/builders/mac10.10_blink_rel/builds/3380, but interestingly in build 3381 (the next build), we also get a line that says:

  [5400/21662] SOLINK_MODULE osmesa.so

Also, it's worth noting that for cleanup_disk (which was identified as being related to issue 739282), it says:

  {Path: `/b/c/b/*/src/out/Release*/*`, MaxAge: twoDays},

https://chrome-internal.googlesource.com/infra/infra_internal/+/master/go/src/infra_internal/tools/cleanup_disk/cmd/cleanup_disk/main.go#32

So, if this is happening after cleanup_disk is run, then we expect that this should happen again at:

Sun Aug 13 just after midnight
Tues Aug 15 just after midnight
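For reference, the predicted dates above are just the Aug 11 failure plus multiples of the two-day MaxAge. A small sketch of that arithmetic follows; `expected_recurrences` is a made-up helper, and the assumption that failures recur exactly every MaxAge is only the hypothesis from this comment, not something the logs have confirmed.

```python
from datetime import date, timedelta


def expected_recurrences(last_seen, interval_days=2, count=2):
    """Dates on which the failure would recur if it repeats every interval_days."""
    return [
        last_seen + timedelta(days=interval_days * i) for i in range(1, count + 1)
    ]


# Last observed failure in this bug: Fri Aug 11 2017, just after midnight.
predicted = expected_recurrences(date(2017, 8, 11))
for d in predicted:
    print(d.strftime("%a %b %d %Y"))
# prints:
# Sun Aug 13 2017
# Tue Aug 15 2017
```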
,
Aug 11 2017
Not sure if this is related as I'm not familiar with gn, but when I was poking around during the rotation, I found the following difference in the JSON output of the "analyze" (mb analyze) step:

* The crashed runs "found dependency" with empty compile targets (e.g. https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Ftryserver.blink%2Fmac10.12_blink_rel%2F1272%2F%2B%2Frecipes%2Fsteps%2Fanalyze%2F0%2Flogs%2Fjson.output%2F0)
* The successful runs "found dependency (all)" with some compile targets (e.g. https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Ftryserver.blink%2Fmac10.12_blink_rel%2F1274%2F%2B%2Frecipes%2Fsteps%2Fanalyze%2F0%2Flogs%2Fjson.output%2F0)

I think we probably need to compile the :blink_tests target? It depends on :webkit_layout_tests, which depends on osmesa.
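The difference between the two analyze outputs could be checked mechanically along these lines. This is a sketch under assumptions: it takes the JSON to have "status" and "compile_targets" keys, matching the description above, the sample payloads are invented rather than copied from the linked logs, and `is_suspicious` is a hypothetical name.

```python
import json


def is_suspicious(analyze_json):
    """True if analyze reports a dependency but lists nothing to compile."""
    out = json.loads(analyze_json)
    found = out.get("status", "").lower().startswith("found dependency")
    return found and not out.get("compile_targets")


# Invented payloads shaped like the two cases described in this comment.
crashed_run = '{"status": "Found dependency", "compile_targets": []}'
good_run = '{"status": "Found dependency (all)", "compile_targets": ["blink_tests"]}'

print(is_suspicious(crashed_run), is_suspicious(good_run))
```

A run flagged this way would go on to the test step without rebuilding anything, so it relies entirely on whatever is already in the output directory.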
,
Aug 11 2017
If that's the case, perhaps osmesa.so is still present when `gn analyze` is run, which leads to it not being rebuilt, but by the time the compile step finishes (or at any point before webkit_tests starts) cleanup_disk has run and removed it.
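If that race is real, one way to surface it earlier would be a pre-flight check right before the tests start, failing with a clear message instead of a GPU process crash. A minimal sketch is below; `missing_artifacts` and the `REQUIRED` list are hypothetical, and this is not something the recipes or webkit_tests are known to do today.

```python
import tempfile
from pathlib import Path

# Hypothetical list of files the test runner needs in the build directory.
REQUIRED = ("osmesa.so",)


def missing_artifacts(build_dir, required=REQUIRED):
    """Return the required file names that are absent from build_dir."""
    build_dir = Path(build_dir)
    return [name for name in required if not (build_dir / name).is_file()]


# Demo against a throwaway directory standing in for src/out/Release.
with tempfile.TemporaryDirectory() as tmp:
    before = missing_artifacts(tmp)      # nothing built yet
    (Path(tmp) / "osmesa.so").touch()    # simulate the link step
    after = missing_artifacts(tmp)

print(before, after)
```

This would only narrow the window rather than close it, since cleanup_disk could still remove the file between the check and the test run.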
,
Aug 13 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/4eb6c118161fb1e8890d66ce4d5a0dc00fa81b03

commit 4eb6c118161fb1e8890d66ce4d5a0dc00fa81b03
Author: Raphael Kubo da Costa <raphael.kubo.da.costa@intel.com>
Date: Sun Aug 13 17:12:58 2017

Remove wrong expectations from TestExpectations.

These were added incorrectly in https://chromium-review.googlesource.com/c/612807 due to crashes in the Mac bots as well as bad results from android_blink_rel (discussed at https://groups.google.com/a/chromium.org/d/msg/ecosystem-infra/QzH1LlvP5ao/lEnKNDdxAAAJ).

TBR=foolip@chromium.org,qyearsley@chromium.org
Bug: 750594
Change-Id: Iba524d544e4b29d824add743806c47e9d9cc00f4
Reviewed-on: https://chromium-review.googlesource.com/612077
Reviewed-by: Raphael Kubo da Costa (rakuco) <raphael.kubo.da.costa@intel.com>
Commit-Queue: Raphael Kubo da Costa (rakuco) <raphael.kubo.da.costa@intel.com>
Cr-Commit-Position: refs/heads/master@{#494000}

[modify] https://crrev.com/4eb6c118161fb1e8890d66ce4d5a0dc00fa81b03/third_party/WebKit/LayoutTests/TestExpectations
,
Aug 14 2017
I think the same thing's happened to the win7_blink_rel bot: https://luci-milo.appspot.com/buildbot/tryserver.blink/win7_blink_rel/3677 (from https://luci-milo.appspot.com/buildbot/chromium.infra.cron/wpt-importer/467)
,
Aug 14 2017
> I think the same thing's happened to the win7_blink_rel bot By "same thing" I mean "files getting erased by the cleanup cron job". In this specific case, it looks like icudtl.dat was gone.
,
Sep 7 2017
Haven't seen this recently, but it's probably not actually fixed. Marking as available since I'm not currently working on it.
,
Nov 7 2017
Ecosystem infra bug triage

Ping: robertma, can you see any imports failing due to this in the last little while? Just wondering if it should still be Pri-2 (fix soon) or Pri-3 (backlog).
,
Nov 7 2017
Downgrading to P3 as I haven't seen it for a while (I'm also not aware of any intentional effort investigating/fixing the root cause).
,
Jun 14 2018
I think this went away long ago.
Comment 1 by raphael....@intel.com
, Aug 1 2017