
Issue 706869


Issue metadata

Status: WontFix
Closed: Apr 2017
Pri: 1

Blocking: issue 678687

Trybot "Too many open files" and partial logs

Project Member Reported by jonr...@chromium.org, Mar 30 2017

Issue description

I ran a test against the trybot "linux_chromium_chromeos_ozone_rel_ng" that I expected to fail.

However the logs for the failing test show an unexpected error:
[0329/154146.881445:FATAL:platform_channel_pair_posix.cc(68)] Check failed: socketpair(AF_UNIX, SOCK_STREAM, 0, fds) == 0. : Too many open files

The logs do not contain the rest of the information about the failing tests, even though the summary says "643 disabled crashed or hung".

I'm not sure how to determine the cause of this particular issue, or why the rest of the log is missing.
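For context on the error itself: "Too many open files" is the strerror() text for EMFILE, which socketpair() returns when the calling process has hit its per-process file-descriptor limit (RLIMIT_NOFILE). The following is a minimal Python sketch for Linux, not Chromium code; the helper name `exhaust_fds_with_socketpairs` and the limit of 64 are hypothetical. It lowers the soft limit and creates AF_UNIX stream socketpairs until one fails, which should reproduce the same message as the failing CHECK.

```python
import errno
import os
import resource
import socket


def exhaust_fds_with_socketpairs(limit=64):
    """Create AF_UNIX socketpairs under a lowered fd limit until one fails.

    Hypothetical helper for illustration. Returns (pairs_created, errno_code).
    Unix only, because it relies on the resource module.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        limit = min(limit, hard)
    # Lower only the soft limit; the original value is restored in finally.
    resource.setrlimit(resource.RLIMIT_NOFILE, (limit, hard))
    pairs = []
    try:
        while True:
            try:
                # Same kind of call as the failing CHECK:
                # socketpair(AF_UNIX, SOCK_STREAM, 0, fds)
                pairs.append(socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM))
            except OSError as exc:
                return len(pairs), exc.errno
    finally:
        for a, b in pairs:
            a.close()
            b.close()
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))


if __name__ == "__main__":
    created, code = exhaust_fds_with_socketpairs()
    print(f"{created} socketpairs before {errno.errorcode[code]}: {os.strerror(code)}")
```

If the test binary hits this limit only after running for a while, that would point at descriptors accumulating during the run rather than at the trybot environment alone; the sketch does not establish which is the case here.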

Sample run (see mash_browser_tests): https://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_chromeos_ozone_rel_ng/builds/351068
Sample output: https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Ftryserver.chromium.linux%2Flinux_chromium_chromeos_ozone_rel_ng%2F351068%2F%2B%2Frecipes%2Fsteps%2Fmash_browser_tests__with_patch_%2F0%2Fstdout
 
Blocking: 678687
Labels: -Pri-2 Pri-1
This is blocking the ability to get mash_browser_tests properly running on the bots.

The loss of the logs prevents us from finding out what is timing out or failing in the test suite.

Are there ways we could remote into the trybots after a failure to retrieve the logs?
Cc: efoo@chromium.org
Owner: estaab@chromium.org
Status: Assigned (was: Untriaged)
This should be a regular infra bug, not a trooper bug. This is not a production emergency.

It should have been triaged sooner than it was though.

It should be possible to get access to the trybots to see logs, if you can't find enough information on buildbot.

Comment 4 by estaab@chromium.org, Apr 10 2017

Owner: dpranke@chromium.org
dpranke, do you know who owns the browser_tests runner now? Seems like an error in the runner.
Components: -Infra
Labels: -Restrict-View-Google -Infra-Troopers
Owner: jonr...@chromium.org
It's not obvious to me that there's an error in the runner. I'm not sure what's going on.

It looks like mash_browser_tests is running at least some of the time, e.g.:

https://luci-milo.appspot.com/buildbot/tryserver.chromium.linux/linux_chromium_chromeos_ozone_rel_ng/359045

and

https://build.chromium.org/p/chromium.chromiumos/builders/Linux%20ChromiumOS%20Ozone%20Tests%20%281%29/builds/45494

so maybe this bug is no longer an issue?

From the build in question, it looks like mash_browser_tests more-or-less failed on startup:

https://luci-milo.appspot.com/buildbot/tryserver.chromium.linux/linux_chromium_chromeos_ozone_rel_ng/351068
https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Ftryserver.chromium.linux%2Flinux_chromium_chromeos_ozone_rel_ng%2F351068%2F%2B%2Frecipes%2Fsteps%2Fmash_browser_tests__with_patch_%2F0%2Fstdout

I don't know where jonross@ saw the "643 disabled crashed or hung" message.

Reassigning back to jonross@. Is this still an issue? If so, I think you may need to figure out what's going wrong with the test itself, since I don't see any infra problems at the moment. Let me know if you need help debugging things.
So the test suite currently runs a small subset of all the normal browser_tests. 
I've been working on finding out which tests work/fail so that we can expand the filter.

I was initially trying this with the trybot in this review: https://codereview.chromium.org/2786583003/
That is how I encountered the error.

When I ran the suite locally, it eventually failed, with a large number of tests failing or timing out. This correlated with the error shown in #1. However, as you mention, the error reported in stdout looks like a startup error (8 s runtime), which I did not encounter locally.

I saw the number of failures listed in the results. For example the test run linked in #5 shows:
"179. mash_browser_tests (with patch) ( 8 secs ) mash_browser_tests (with patch)
      643 disabled crashed or hung
      Run on OS: 'Ubuntu-14.04'"

However, on more recent runs in my review, where most of the failing and timing-out tests have been blacklisted, the suite runs to completion.

While this is no longer blocking the landing of expanded test coverage, I would appreciate any guidance for debugging the trybot in the future.
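One possible way to check whether a long-running suite is accumulating descriptors is to sample the number of entries in /proc/<pid>/fd over time and compare it with the soft RLIMIT_NOFILE. This is a minimal Linux-only sketch in Python, assuming access to /proc on the machine running the tests; the helper names `open_fd_count` and `fd_headroom` are hypothetical and are not part of the Chromium test launcher.

```python
import os
import resource


def open_fd_count(pid="self"):
    """Return how many descriptors a process has open (Linux /proc only)."""
    # Each entry in /proc/<pid>/fd corresponds to one open descriptor.
    # Listing the directory briefly opens one extra fd, which is counted
    # too, so compare deltas between samples rather than absolute values.
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_headroom():
    """Return (open_count, soft_limit) for the current process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return open_fd_count(), soft


if __name__ == "__main__":
    before = open_fd_count()
    leaked = [open(os.devnull) for _ in range(10)]  # simulate a leak
    print(f"open fds grew by {open_fd_count() - before}")
    for f in leaked:
        f.close()
    print("open/soft limit:", fd_headroom())
```

Sampling another process (for example, a browser_tests child) would pass its pid to `open_fd_count`; its limit would then need to be read from /proc/<pid>/limits rather than from `resource.getrlimit`, which only reports the calling process.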


Status: WontFix (was: Assigned)
