
Issue 767397

Starred by 6 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Mac
Pri: 1
Type: Bug

Blocked on:
issue 825215




many mac10.12 browser tests failing in `bootstrap_check_in`

Project Member Reported by ellyjo...@chromium.org, Sep 21 2017

Issue description

See https://chromium-swarm.appspot.com/task?id=38bcebb531cc5f10&refresh=10&show_raw=1&wide_logs=true:

[ RUN      ] NetworkingPrivateApiTest.GetVisibleNetworks
[0921/040719.186722:ERROR:mach_extensions.cc(68)] bootstrap_check_in org.chromium.crashpad.child_port_handshake.58700.2141078.LXSVGGPFZLMWBIRD: unknown error code (124)
[0921/040719.187049:FATAL:child_port_handshake.cc(111)] Check failed: server_port.is_valid(). 
0   crashpad_handler                    0x00000001078148ac base::debug::StackTrace::StackTrace(unsigned long) + 28
1   crashpad_handler                    0x000000010781aa10 logging::LogMessage::~LogMessage() + 224
2   crashpad_handler                    0x000000010785a16d crashpad::ChildPortHandshake::RunServerForFD(base::ScopedGeneric<int, base::internal::ScopedFDCloseTraits>, crashpad::ChildPortHandshake::PortRightType) + 413
3   crashpad_handler                    0x000000010781058c crashpad::HandlerMain(int, char**, std::__1::vector<std::__1::unique_ptr<crashpad::UserStreamDataSource, std::__1::default_delete<crashpad::UserStreamDataSource> >, std::__1::allocator<std::__1::unique_ptr<crashpad::UserStreamDataSource, std::__1::default_delete<crashpad::UserStreamDataSource> > > > const*) + 4332
4   libdyld.dylib                       0x00007fffcb167235 start + 1
5   ???                                 0x0000000000000008 0x0 + 8

[0921/040719.187800:ERROR:file_io.cc(89)] ReadExactly: expected 4, observed 0
[0921/040719.188690:ERROR:mach_extensions.cc(68)] bootstrap_check_in org.chromium.crashpad.child_port_handshake.58698.2141075.NJBRWZXCFLQMRMPF: unknown error code (124)
[0921/040719.188773:FATAL:child_port_handshake.cc(111)] Check failed: server_port.is_valid(). 
0   crashpad_handler                    0x000000010c8a88ac base::debug::StackTrace::StackTrace(unsigned long) + 28
1   crashpad_handler                    0x000000010c8aea10 logging::LogMessage::~LogMessage() + 224
2   crashpad_handler                    0x000000010c8ee16d crashpad::ChildPortHandshake::RunServerForFD(base::ScopedGeneric<int, base::internal::ScopedFDCloseTraits>, crashpad::ChildPortHandshake::PortRightType) + 413
3   crashpad_handler                    0x000000010c8a458c crashpad::HandlerMain(int, char**, std::__1::vector<std::__1::unique_ptr<crashpad::UserStreamDataSource, std::__1::default_delete<crashpad::UserStreamDataSource> >, std::__1::allocator<std::__1::unique_ptr<crashpad::UserStreamDataSource, std::__1::default_delete<crashpad::UserStreamDataSource> > > > const*) + 4332
4   libdyld.dylib                       0x00007fffcb167235 start + 1
5   ???                                 0x0000000000000009 0x0 + 9

[58671:775:0921/040719.189431:67286091852843:ERROR:file_io.cc(89)] ReadExactly: expected 4, observed 0
Received signal 11 SEGV_MAPERR 000000000008
 [0x00010e4c26cc]
 [0x00010e4c25c1]
 [0x7fffcb376b3a]
 [0x000000000003]
 [0x00010e57a144]
 [0x00010e5071a0]
 [0x00010d37c40e]
 [0x00010d381725]
 [0x00010d37b654]
 [0x00010e4a61ca]
 [0x000110133b1b]
 [0x00010e4a56c4]
 [0x00010eacf7b2]
 [0x00010e587f15]
 [0x00010cb4722a]
 [0x00010cd05cb1]
 [0x00010cd067e0]
 [0x00010cd06d47]
 [0x00010cd0cff7]
 [0x00010cd0cc63]
 [0x00010e5a0147]
 [0x00010e4b3b75]
 [0x00010eb12146]
 [0x00010e4b3f97]
 [0x00010e4b3afc]
 [0x7fffcb167235]
 [0x000000000009]

There's no obvious cause CL. Mark, can you take a peek please? :)
 

Comment 1 by mark@chromium.org, Sep 21 2017

I think that this bot just got sick, either in the test environment or for the login session/boot, and that it cleared up when it moved on to other work or was rebooted.

124 is indeed an unknown error code. It seems like that’s the handiest thing to go on here, but it’s not telling me anything useful. It’s probably coming from launchd (the bootstrap server), but we don’t have any insight into that anymore since it’s closed-source. It’s not coming from the mig stub side of bootstrap_check_in() or anything in the kernel, as far as I can tell.

Anyway, I don’t think that this has happened again. Are you seeing this as a persistent problem?
Status: WontFix (was: Assigned)
No - we've only seen it once. I'll WontFix this bug, and if it reappears I'll reopen it.
Status: Available (was: WontFix)
https://logs.chromium.org/v/?s=chromium%2Fbb%2Fchromium.mac%2FMac10.11_Tests%2F24820%2F%2B%2Frecipes%2Fsteps%2Fbrowser_tests_on_Mac-10.11%2F0%2Fstdout has a bunch of similar-looking errors:


[ RUN      ] JavscriptApiTest.JavasScriptEncodedURL
[0326/155742.122684:ERROR:mach_extensions.cc(68)] bootstrap_check_in org.chromium.crashpad.child_port_handshake.13989.509792.ASETEXTGXUTJAOFT: unknown error code (141)
[0326/155742.122802:FATAL:child_port_handshake.cc(111)] Check failed: server_port.is_valid().
0   crashpad_handler                    0x000000010a01031c base::debug::StackTrace::StackTrace(unsigned long) + 28
1   crashpad_handler                    0x000000010a01545f logging::LogMessage::~LogMessage() + 223
2   crashpad_handler                    0x000000010a0302dd crashpad::ChildPortHandshake::RunServerForFD(base::ScopedGeneric<int, base::internal::ScopedFDCloseTraits>, crashpad::ChildPortHandshake::PortRightType) + 413
3   crashpad_handler                    0x000000010a002086 crashpad::HandlerMain(int, char**, std::__1::vector<std::__1::unique_ptr<crashpad::UserStreamDataSource, std::__1::default_delete<crashpad::UserStreamDataSource> >, std::__1::allocator<std::__1::unique_ptr<crashpad::UserStreamDataSource, std::__1::default_delete<crashpad::UserStreamDataSource> > > > const*) + 4790
4   libdyld.dylib                       0x00007fff97a0f5ad start + 1
[0326/155742.123312:ERROR:file_io.cc(89)] ReadExactly: expected 4, observed 0
[0326/155742.123783:ERROR:mach_extensions.cc(68)] bootstrap_check_in org.chromium.crashpad.child_port_handshake.13987.509790.ZRDPWDQQCDUVWVHU: unknown error code (141)
[0326/155742.123853:FATAL:child_port_handshake.cc(111)] Check failed: server_port.is_valid().
0   crashpad_handler                    0x000000010562b31c base::debug::StackTrace::StackTrace(unsigned long) + 28
1   crashpad_handler                    0x000000010563045f logging::LogMessage::~LogMessage() + 223
2   crashpad_handler                    0x000000010564b2dd crashpad::ChildPortHandshake::RunServerForFD(base::ScopedGeneric<int, base::internal::ScopedFDCloseTraits>, crashpad::ChildPortHandshake::PortRightType) + 413
3   crashpad_handler                    0x000000010561d086 crashpad::HandlerMain(int, char**, std::__1::vector<std::__1::unique_ptr<crashpad::UserStreamDataSource, std::__1::default_delete<crashpad::UserStreamDataSource> >, std::__1::allocator<std::__1::unique_ptr<crashpad::UserStreamDataSource, std::__1::default_delete<crashpad::UserStreamDataSource> > > > const*) + 4790
4   libdyld.dylib                       0x00007fff97a0f5ad start + 1
[13985:775:0326/155742.125375:1927348692830:ERROR:file_io.cc(89)] ReadExactly: expected 4, observed 0
Received signal 11 SEGV_MAPERR 000000000008



Cc: jyasskin@chromium.org

Comment 5 by mark@chromium.org, Mar 28 2018

Cc: rsesek@chromium.org
The problem starts when bootstrap_check_in() fails. The original report has

[0921/040719.186722:ERROR:mach_extensions.cc(68)] bootstrap_check_in org.chromium.crashpad.child_port_handshake.58700.2141078.LXSVGGPFZLMWBIRD: unknown error code (124)

and comment 3 has

[0326/155742.122684:ERROR:mach_extensions.cc(68)] bootstrap_check_in org.chromium.crashpad.child_port_handshake.13989.509792.ASETEXTGXUTJAOFT: unknown error code (141)

But I don’t know what 124 or 141 are either. They’re not bootstrap return codes (which would be in the 1100 range). They’re low-ish, like Mach kern_return_t values (the range 0-255 is reserved for these), but they don’t map to any defined code (defined codes only go up to 52). And it’s weird that we’re seeing this bug with (at least) two different unknown error codes, but that may be a 10.11–10.12 difference.

Robert, do those numbers, 124 and 141, make any sense in an XPC context?
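
The range reasoning in this comment can be written down as a small triage helper. This is only a sketch: the function name and labels are made up for illustration, and the upper bound of the "1100 range" is an assumption, since the comment does not say where that range ends.

```python
# Ranges taken from the comment above. The upper bound of the "1100 range"
# is an assumption made for this sketch.
KERN_RETURN_RESERVED = range(0, 256)   # reserved for Mach kern_return_t values
KERN_RETURN_MAX_DEFINED = 52           # highest defined code, per the comment
BOOTSTRAP_RANGE = range(1100, 1200)    # assumed extent of the "1100 range"


def classify_error_code(code):
    """Return a coarse label for a code logged by bootstrap_check_in (hypothetical helper)."""
    if code in BOOTSTRAP_RANGE:
        return "bootstrap"
    if code in KERN_RETURN_RESERVED:
        if code <= KERN_RETURN_MAX_DEFINED:
            return "kern_return_t (defined)"
        return "kern_return_t range (undefined)"
    return "unknown"


for code in (124, 141):
    print(code, classify_error_code(code))
```

Both codes from the logs land in the reserved kern_return_t range but above the highest defined value, which is why neither reading explains them.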

Comment 6 by rsesek@chromium.org, Mar 28 2018

Yeah, they might. It's hard because error codes have become so overloaded (as you note, kern_return_t, errno, etc.), but my guess is that they correspond to these:

rsesek@hotwire:/Users/rsesek % launchctl error 124
124: Domain is tearing down
rsesek@hotwire:/Users/rsesek % launchctl error 141
141: Reentrancy avoided

This may be a red herring, but I think there may be something odd with the bootstrap context on another bot, per https://bugs.chromium.org/p/chromium/issues/detail?id=817663#c11. Though we're not seeing those specific errors.
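
For triaging other logs, a rough sketch like the following could pull the failing bootstrap_check_in lines out of test output and attach the launchctl messages quoted above. The function and table names are hypothetical, and mapping these codes to launchd errors is still only a guess.

```python
import re

# Messages copied from the `launchctl error` output quoted above. Treating the
# logged codes as launchd errors is a guess, not a confirmed fact.
LAUNCHD_MESSAGES = {
    124: "Domain is tearing down",
    141: "Reentrancy avoided",
}

# Matches lines such as:
#   [...ERROR:mach_extensions.cc(68)] bootstrap_check_in <service>: unknown error code (124)
CHECK_IN_FAILURE = re.compile(
    r"bootstrap_check_in (?P<service>\S+): unknown error code \((?P<code>\d+)\)"
)


def find_check_in_failures(log_text):
    """Return (service, code, guessed launchd message) for each failure line."""
    failures = []
    for match in CHECK_IN_FAILURE.finditer(log_text):
        code = int(match.group("code"))
        message = LAUNCHD_MESSAGES.get(code, "no guess")
        failures.append((match.group("service"), code, message))
    return failures
```

Codes that are not in the table come back as "no guess" rather than being dropped, so a new unknown code would still show up in the results.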

Comment 7 by mark@chromium.org, Mar 28 2018

It does seem like there’s some real badness going on with the bootstrap server. Comment 3 is talking about https://ci.chromium.org/buildbot/chromium.mac/Mac10.11%20Tests/, which is always scheduled on https://build.chromium.org/deprecated/chromium.mac/buildslaves/build179-m1. Can we get that bot rebooted?

Comment 8 by mark@chromium.org, Mar 28 2018

Components: Infra
Labels: Infra-Troopers
+Infra-Troopers for a trooper to try rebooting build179-m1.

Comment 9 by pschm...@google.com, Mar 28 2018

build179-m1 dispatches the tests to swarming and it's some of the swarming slaves that are dying.

I wonder if what you are seeing is related to crbug.com/825215?

Comment 10 by mark@chromium.org, Mar 28 2018

Blockedon: 825215
Yes, that adds up. If the test process is descended from something that started in a UI session, and the UI session disappears (such as a WindowServer crash), then the bootstrap context should be lost and I’d expect all bootstrap_*() calls to fail.

I wouldn’t normally expect suspicious error codes, but now that there’s just one launchd, the bootstrap port won’t really die. bootstrap_check_in() will be able to communicate with it, and launchd may be responding with some weirdo number because the bootstrap context associated with the port is gone.

This is compelling enough that I’m marking bug 825215 as blocking.

Comment 11 by mark@chromium.org, Mar 30 2018

Issue 826159 has been merged into this issue.

Comment 12 by jam@chromium.org, Apr 2 2018

Labels: -Pri-1 Pri-0
Bringing up the priority as Mac10.11 Tests bot has been down for > 10 days.

Are there any changes we can revert in the meantime?
Also see issue 828031.

Comment 14 by mark@chromium.org, Apr 2 2018

Labels: -Pri-0 Pri-1
No, this bug is a symptom, not a cause. It’s also not the only symptom.

Please follow up on bug 825215, which we believe to underlie all of these problems.
Labels: Sheriff-Chromium
I just started today's sheriff shift and am still not sure how it's going.
Let me share the status: Mac 10.11 Tests has been failing continuously from #25114 to #25132.

https://uberchromegw.corp.google.com/i/chromium.mac/builders/Mac10.11%20Tests?numbuilds=200

Labels: -Sheriff-Chromium
Status: Assigned (was: Available)
Still seeing these failures on Mac 10.11. 
Owner: ellyjo...@chromium.org
This bug was originally filed against 10.12 and was WontFixed, but was reopened for 10.11 failures in comment #3. Can we either update the summary or (preferably) just WontFix this again and deal with the 10.11 problem in bug 825215?

Reassigning from mark@, since his description says "on leave".
Cc: guidou@chromium.org mark@chromium.org
Issue 840809 has been merged into this issue.
Cc: rogerm@chromium.org
Issue 852465 has been merged into this issue.

Comment 21 by bugdroid1@chromium.org, Jun 14 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/7220ee517c622e836d57f36fec3a6af4a32bbd6c

commit 7220ee517c622e836d57f36fec3a6af4a32bbd6c
Author: John Budorick <jbudorick@chromium.org>
Date: Thu Jun 14 03:35:25 2018

Mac10.13 Tests: drop browser_tests to experimental and disable window server suspects.

TBR=ellyjones@chromium.org

Bug: 767397
Change-Id: I88764b5dad273ec9dbea975cc6e97ce799ce6fe1
Reviewed-on: https://chromium-review.googlesource.com/1100296
Reviewed-by: John Budorick <jbudorick@chromium.org>
Commit-Queue: John Budorick <jbudorick@chromium.org>
Cr-Commit-Position: refs/heads/master@{#567119}
[modify] https://crrev.com/7220ee517c622e836d57f36fec3a6af4a32bbd6c/testing/buildbot/chromium.mac.json
[modify] https://crrev.com/7220ee517c622e836d57f36fec3a6af4a32bbd6c/testing/buildbot/test_suite_exceptions.pyl


Comment 22 by bugdroid1@chromium.org, Jun 15 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/deda551f496ebb23f44d75fb241fa22ef41088f3

commit deda551f496ebb23f44d75fb241fa22ef41088f3
Author: John Budorick <jbudorick@chromium.org>
Date: Fri Jun 15 19:28:09 2018

Implement --gtest_shuffle in //base/test/launcher.

We were previously passing --gtest_shuffle down to gtest. However,
in some cases (e.g. browser_tests, or other suites using
content::TestLauncherDelegate), gtest wasn't seeing the flag until
it was already in a test-specific worker process -- i.e., until
it only had one test to shuffle.

This moves the shuffling into the test launcher, before we hand
off to the delegate, ensuring that we shuffle across the entire shard.
The implementation is largely based on gtest's: reproducible given a
seed, w/ a seed range of [0, 100000).

This also uses --gtest_shuffle on browser_tests on Mac in an attempt
to collect more data on what's causing VM WindowServer deaths there.

Bug: 767397
Change-Id: I8aced17cdc5a574e3dc127143c0d0a553f62bf62
Reviewed-on: https://chromium-review.googlesource.com/1100966
Reviewed-by: Nico Weber <thakis@chromium.org>
Reviewed-by: Elly Fong-Jones <ellyjones@chromium.org>
Commit-Queue: John Budorick <jbudorick@chromium.org>
Cr-Commit-Position: refs/heads/master@{#567763}
[modify] https://crrev.com/deda551f496ebb23f44d75fb241fa22ef41088f3/base/test/launcher/test_launcher.cc
[modify] https://crrev.com/deda551f496ebb23f44d75fb241fa22ef41088f3/base/test/launcher/test_launcher.h
[modify] https://crrev.com/deda551f496ebb23f44d75fb241fa22ef41088f3/testing/buildbot/chromium.mac.json
[modify] https://crrev.com/deda551f496ebb23f44d75fb241fa22ef41088f3/testing/buildbot/test_suite_exceptions.pyl
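
To illustrate the idea in the commit message above (shuffle the whole shard before handing tests to the delegate, reproducibly for a given seed), here is a minimal sketch. It is not the actual //base/test/launcher code; the function name is hypothetical, and only the seed range [0, 100000) comes from the commit message.

```python
import random

SEED_LIMIT = 100000  # seeds are drawn from [0, 100000), per the commit message


def shuffle_shard(test_names, seed):
    """Return a shuffled copy of the shard's tests, reproducible for a given seed."""
    if not 0 <= seed < SEED_LIMIT:
        raise ValueError(f"seed must be in [0, {SEED_LIMIT})")
    shuffled = list(test_names)
    # A private Random instance keeps the order independent of global RNG state.
    random.Random(seed).shuffle(shuffled)
    return shuffled
```

Rerunning with the same seed gives the same order, so a run that kills the WindowServer could in principle be replayed with the seed it logged.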

Still seeing these errors in browser_tests on Mac-10.12, e.g.:
https://logs.chromium.org/logs/chromium/buildbucket/cr-buildbucket.appspot.com/8935144597685416224/+/steps/browser_tests_on__none__GPU_on_Mac_on_Mac-10.12.6/0/stdout

Issue 862466 also mentions crashes on Mac-10.13 at the end of August.
