New issue
Advanced search Search tips

Issue 801353 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Feb 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Fuchsia
Pri: 2
Type: Bug

Blocking:
issue 738275



Sign in to add a comment

PersistentMemoryAllocatorTest.ParallelismTest flaky on Fuchsia/x64 FYI

Project Member Reported by w...@chromium.org, Jan 11 2018

Issue description

PersistentMemoryAllocatorTest.ParallelismTest regularly takes Too Damn Long to run:

[00092.798] 03895.03941> [ RUN      ] PersistentMemoryAllocatorTest.ParallelismTest
[00092.798] 03895.03941> [       OK ] PersistentMemoryAllocatorTest.ParallelismTest (48128 ms)
[00092.799] 03895.03941> [----------] 3 tests from PersistentMemoryAllocatorTest (48223 ms total)
[00092.799] 03895.03941>
[00092.801] 03895.03941> [----------] Global test environment tear-down
[00092.801] 03895.03941> [==========] 10 tests from 3 test cases ran. (50946 ms total)
[00092.801] 03895.03941> [  PASSED  ] 10 tests.
[00092.801] 03895.03941> [1620/2676] PersistentMemoryAllocatorTest.ParallelismTest (TIMED OUT)


 

Comment 1 by w...@chromium.org, Jan 11 2018

Locally with QEMU+KVM this test takes 305ms to execute.

On the bots, as part of a full base_unittests run, it takes ~38s(!); the flakes look to be variation causing the overall test run to exceed its allotted time.

Comment 2 by w...@chromium.org, Jan 13 2018

Cc: asvitk...@chromium.org
asvitkine: Running this test locally (under Fuchsia) I notice that it makes use of base::RandInt(), which doesn't seem advisable for a unit-test - perhaps we should have it pick (and print out) a RandInt()-based seed, and then use rand() to choose values within the test?
Cc: bcwh...@chromium.org
Setting a seed (and printing it) and then using a seeded random number generated with that seed SGTM. 

Wez, are you planning to make that change or do you want us to look at it?

+bcwhite
As long as the seed itself is random, sure.  There is little benefit to the test if it's always the same sequence.

Why would the bot take 100x longer to run the test?

Comment 5 by w...@chromium.org, Jan 16 2018

Labels: -Pri-2 -M-65 M-66 Pri-1
Re #3: Yes, I have a draft CL in-progress for that, which I'll share later today. In the meantime, we'll filter out this test on Fuchsia (we can easily repro the issue with manual swarming runs ;)

Re #4: There are two main possibilities I'm investigating right now: (1) reading data from CPRNG (which is what RandInt() does under-the-hood on Fuchsia right now) may be especially expensive under VMs or (2) the spin loops in the allocator may be triggering pathological scheduling behaviour under Fuchsia on VMs.
Project Member

Comment 6 by bugdroid1@chromium.org, Jan 17 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/5584757b9aefca3e4836ae15ca7b3ae7b73aeb5f

commit 5584757b9aefca3e4836ae15ca7b3ae7b73aeb5f
Author: Wez <wez@chromium.org>
Date: Wed Jan 17 00:52:04 2018

Filter PersistentMemoryAllocatorTest.ParallelismTest under Fuchsia.

This test takes 3x the time to run on Fuchsia as on Linux, but takes
50x-100x longer to run on the Fuchsia bots, so flakily times-out.

Filtering it out until we can address the underlying issue.

Bug:  801353 
Change-Id: I8aee541a618ca3eccaf90e740d09292f207ef8b5
Reviewed-on: https://chromium-review.googlesource.com/865438
Commit-Queue: Wez <wez@chromium.org>
Reviewed-by: Scott Graham <scottmg@chromium.org>
Cr-Commit-Position: refs/heads/master@{#529533}
[modify] https://crrev.com/5584757b9aefca3e4836ae15ca7b3ae7b73aeb5f/testing/buildbot/filters/fuchsia.base_unittests.filter

Comment 7 by w...@chromium.org, Jan 31 2018

Running this test locally, under QEMU/KVM, I observe that:
1-core VM runs it in ~44ms.
2-core VM runs it in ~305ms
4-core VM runs it in ~304ms.
so inter-processor synchronization emulation is incurring a ~7x increase in time to complete.

Comment 8 by w...@chromium.org, Jan 31 2018

Filed issue ZX-1636 to track zx_cprng_draw() multi-core [[nested] virtualization] performance issue.

Comment 9 by w...@chromium.org, Feb 6 2018

Blocking: 738275

Comment 10 by w...@chromium.org, Feb 7 2018

Labels: -Pri-1 -M-66 M-67 Pri-2
Dropping priority and bumping milestone, since the test is filtered, and we have a well-understood root-cause for this pathological case.

Comment 11 by w...@chromium.org, Feb 7 2018

Status: Assigned (was: Started)

Comment 12 by w...@chromium.org, Feb 15 2018

Status: Fixed (was: Assigned)

Comment 13 by w...@chromium.org, Feb 20 2018

Status: Assigned (was: Fixed)
Re-opening and assigning-to-self to review whether improvements to the platform CPRNG call address this issue.

Comment 14 by w...@chromium.org, Feb 25 2018

Status: Started (was: Assigned)
Confirmed that Zircon kernel changes resolves the issue; working on SDK roll to pull the improved version to Chromium.
Project Member

Comment 15 by bugdroid1@chromium.org, Feb 27 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/193277bcdc3b5c0527eec7f5e36d90588bdddc3d

commit 193277bcdc3b5c0527eec7f5e36d90588bdddc3d
Author: Wez <wez@chromium.org>
Date: Tue Feb 27 03:44:41 2018

Roll Fuchsia SDK to 739d1fd10565f97bacae8215903403b2677d8f48.

- Removes sys/resource.h, requiring some additional conditionals around
  includes of that header under OS_POSIX.
- Improves the system PRNG implementation to reduce the potential for
  lock contention.

We also clean up some test filter entries that either no longer refer
to an existing test, or are no longer required.

Bug: 707030,  801353 ,  809660 , 738275
Change-Id: Ic1f49625f20f2efc6c2509cf0f1fa8265d4e9f7f
Reviewed-on: https://chromium-review.googlesource.com/932822
Reviewed-by: Daniel Cheng <dcheng@chromium.org>
Reviewed-by: Eric Roman <eroman@chromium.org>
Reviewed-by: Scott Graham <scottmg@chromium.org>
Commit-Queue: Wez <wez@chromium.org>
Cr-Commit-Position: refs/heads/master@{#539393}
[modify] https://crrev.com/193277bcdc3b5c0527eec7f5e36d90588bdddc3d/base/allocator/partition_allocator/partition_alloc_unittest.cc
[modify] https://crrev.com/193277bcdc3b5c0527eec7f5e36d90588bdddc3d/base/process/process_metrics_posix.cc
[modify] https://crrev.com/193277bcdc3b5c0527eec7f5e36d90588bdddc3d/base/process/process_util_unittest.cc
[modify] https://crrev.com/193277bcdc3b5c0527eec7f5e36d90588bdddc3d/base/sys_info_posix.cc
[modify] https://crrev.com/193277bcdc3b5c0527eec7f5e36d90588bdddc3d/base/threading/platform_thread_posix.cc
[modify] https://crrev.com/193277bcdc3b5c0527eec7f5e36d90588bdddc3d/build/fuchsia/update_sdk.py
[modify] https://crrev.com/193277bcdc3b5c0527eec7f5e36d90588bdddc3d/net/disk_cache/simple/simple_backend_impl.cc
[modify] https://crrev.com/193277bcdc3b5c0527eec7f5e36d90588bdddc3d/testing/buildbot/filters/fuchsia.base_unittests.filter

Comment 16 by w...@chromium.org, Feb 27 2018

Status: Fixed (was: Started)
Project Member

Comment 17 by bugdroid1@chromium.org, Mar 1 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/26f4cdcf5fea1eb642e3cabd35c1bd1aaf502303

commit 26f4cdcf5fea1eb642e3cabd35c1bd1aaf502303
Author: Wez <wez@chromium.org>
Date: Thu Mar 01 07:39:27 2018

Roll Fuchsia SDK to 9d4016533477903c796470e7ab46c2e1dad31761.

- Removes sys/resource.h, requiring some additional conditionals around
  includes of that header under OS_POSIX.
- Improves the system PRNG implementation to reduce the potential for
  lock contention.

TBR: scottmg
Bug: 707030,  801353 ,  809660 , 738275,  817241 ,  817586 
Change-Id: I4e01a0b23ad66c060ac7a6776a45329775117886
Reviewed-on: https://chromium-review.googlesource.com/942685
Reviewed-by: Wez <wez@chromium.org>
Commit-Queue: Wez <wez@chromium.org>
Cr-Commit-Position: refs/heads/master@{#540093}
[modify] https://crrev.com/26f4cdcf5fea1eb642e3cabd35c1bd1aaf502303/build/fuchsia/update_sdk.py

Sign in to add a comment