Trybot "Too many open files" and partial logs
Issue description
I ran a test against the trybot "linux_chromium_chromeos_ozone_rel_ng" and expected it to fail. However, the logs for the failing test show an unexpected error:
[0329/154146.881445:FATAL:platform_channel_pair_posix.cc(68)] Check failed: socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == 0. : Too many open files
The logs do not contain the rest of the information about the failing tests, even though the summary says "643 disabled crashed or hung". I'm not sure how to determine the cause of this particular issue, or why the rest of the log is missing.
Sample run (see mash_browser_tests): https://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_chromeos_ozone_rel_ng/builds/351068
Sample output: https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Ftryserver.chromium.linux%2Flinux_chromium_chromeos_ozone_rel_ng%2F351068%2F%2B%2Frecipes%2Fsteps%2Fmash_browser_tests__with_patch_%2F0%2Fstdout
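For context, "Too many open files" is the message for errno EMFILE, which socketpair() reports once the process has reached its open file descriptor limit (RLIMIT_NOFILE). The following is a minimal Python sketch, not taken from the Chromium sources, that reproduces the same failure mode by temporarily lowering the soft limit. The function name and the limit of 64 are illustrative assumptions.

```python
import errno
import resource
import socket


def socketpairs_until_emfile(soft_limit=64):
    """Lower the soft fd limit, then open socket pairs until one fails.

    Returns (number_of_pairs_opened, errno_of_the_failure).
    The name and the default limit are hypothetical, chosen for illustration.
    """
    old_soft, old_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Never try to raise the soft limit above its current value.
    if old_soft != resource.RLIM_INFINITY:
        soft_limit = min(soft_limit, old_soft)
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, old_hard))
    pairs = []
    try:
        while True:
            # Same call as in the FATAL check: each success consumes two fds.
            pairs.append(
                socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM, 0))
    except OSError as e:
        return len(pairs), e.errno
    finally:
        # Release the descriptors and restore the original limit.
        for a, b in pairs:
            a.close()
            b.close()
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, old_hard))


if __name__ == "__main__":
    count, err = socketpairs_until_emfile()
    print(count, errno.errorcode.get(err, err))
```

If a test process hits this error on a bot but not locally, comparing the bot's descriptor limit with the local one, and checking whether descriptors leak across tests, may be a reasonable first step.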
,
Apr 10 2017
This is blocking us from getting mash_browser_tests running properly on the bots. Losing the logs prevents us from finding out what is timing out or failing in the test suite. Is there a way to remote into the trybots after a failure to try to find the logs?
,
Apr 10 2017
This should be a regular infra bug, not a trooper bug. This is not a production emergency. It should have been triaged sooner than it was though. It should be possible to get access to the trybots to see logs, if you can't find enough information on buildbot.
,
Apr 10 2017
dpranke, do you know who owns the browser_tests runner now? Seems like an error in the runner.
,
Apr 10 2017
It's not obvious to me that there's an error in the runner. I'm not sure what's going on.
It looks like mash_browser_tests is running at least some of the time, e.g.:
https://luci-milo.appspot.com/buildbot/tryserver.chromium.linux/linux_chromium_chromeos_ozone_rel_ng/359045
and
https://build.chromium.org/p/chromium.chromiumos/builders/Linux%20ChromiumOS%20Ozone%20Tests%20%281%29/builds/45494
so maybe this bug is no longer an issue?
From the build in question, it looks like mash_browser_tests more-or-less failed on startup:
https://luci-milo.appspot.com/buildbot/tryserver.chromium.linux/linux_chromium_chromeos_ozone_rel_ng/351068
https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Ftryserver.chromium.linux%2Flinux_chromium_chromeos_ozone_rel_ng%2F351068%2F%2B%2Frecipes%2Fsteps%2Fmash_browser_tests__with_patch_%2F0%2Fstdout
I don't know where jonross@ saw the "643 disabled crashed or hung" message.
Reassigning back to jonross@. Is this still an issue? If so, I think you may need to figure out what's going wrong with the test itself, since I don't see any infra problems at the moment. Let me know if you need help debugging things.
,
Apr 11 2017
The test suite currently runs a small subset of the normal browser_tests. I've been working out which tests pass and which fail so that we can expand the filter. I first tried this with the trybot in this review: https://codereview.chromium.org/2786583003/ That is how I encountered the error.
When I ran the suite locally, it eventually failed, with a large number of tests failing or timing out. This correlated with the error shown in #1. Though, as you mention, the error reported in stdout seems to be a startup error (8s runtime) that I did not encounter locally.
I saw the number of failures listed in the results. For example, the test run linked in #5 shows: "179. mash_browser_tests (with patch) ( 8 secs ) mash_browser_tests (with patch) 643 disabled crashed or hung Run on OS: 'Ubuntu-14.04'"
However, on more recent runs in my review, where the majority of failures/timeouts have been blacklisted, the suite runs to completion. While this is no longer blocking the landing of expanded test coverage, I would appreciate any guidance for debugging the trybot in the future.
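One way to work out which tests to exclude when a log is truncated is to list the tests that started but never reported a result. The sketch below assumes the log contains the standard GoogleTest "[ RUN      ]", "[       OK ]" and "[  FAILED  ]" markers; the output of the Chromium test launcher on the bots may differ, and the function names are hypothetical.

```python
import re

# Standard GoogleTest progress markers; test names look like Suite.Test.
RUN_RE = re.compile(r"^\[\s*RUN\s*\]\s+([\w/]+\.[\w/]+)")
DONE_RE = re.compile(r"^\[\s*(OK|FAILED)\s*\]\s+([\w/]+\.[\w/]+)")


def unfinished_tests(log_text):
    """Return tests that printed RUN but never printed OK or FAILED."""
    started = []
    finished = set()
    for raw in log_text.splitlines():
        line = raw.strip()
        m = RUN_RE.match(line)
        if m:
            if m.group(1) not in started:
                started.append(m.group(1))
            continue
        m = DONE_RE.match(line)
        if m:
            finished.add(m.group(2))
    return [t for t in started if t not in finished]


def exclusion_filter(tests):
    """Build a negative --gtest_filter argument that skips the given tests."""
    if not tests:
        return ""
    return "--gtest_filter=-" + ":".join(tests)
```

A test that appears in the result is a candidate for the crash or hang that cut the log short. It could be excluded on the next run to see how much further the suite gets.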
,
Apr 28 2017
Comment 1 by jonr...@chromium.org, Mar 30 2017