New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 780096 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Nov 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Fuchsia
Pri: 3
Type: Bug



Sign in to add a comment

QEMU ARM64 got much slower after sdk roll to f6dffb2fee82a21900fc3a00261dc5844901ea9e

Project Member Reported by scottmg@chromium.org, Oct 31 2017

Issue description

Started here.

https://ci.chromium.org/buildbot/chromium.fyi/Fuchsia%20ARM64/1004

Running the tests individually does not reproduce the failures, only longer/full runs, sadly.
 

Comment 1 by w...@chromium.org, Oct 31 2017

Does that suggest a system resource leak, across multiple tests?
Yeah, could be some sort of cross-test pollution I guess, yeah.

out/fucha/bin/run_base_unittests --test-launcher-filter-file=../../testing/buildbot/filters/fuchsia.base_unittests.filter --gtest_filter=Process*

out/fucha/bin/run_base_unittests --test-launcher-filter-file=../../testing/buildbot/filters/fuchsia.base_unittests.filter --gtest_filter=P*

sometimes times out, but looks a little different than the bot failure of ProcessUtilTest.LaunchWithHandleTransfer. So there's probably one particular test run before it that must be causing the problem in that one's case.
whew emulation is sooo slooooooow

out/fucha/bin/run_base_unittests --test-launcher-filter-file=../../testing/buildbot/filters/fuchsia.base_unittests.filter --gtest_filter=S*:ProcessUtilTest.* does repro

[00098.557] 02728.02761> [ RUN      ] ProcessUtilTest.LaunchWithHandleTransfer
[00098.557] 02728.02761> ../../base/process/process_util_unittest.cc:779: Failure
[00098.558] 02728.02761>       Expected: (0)
[00098.558] 02728.02761>       Which is: 0
[00098.558] 02728.02761> To be equal to: result
[00098.558] 02728.02761>       Which is: -21
[00098.559] 02728.02761> ../../base/process/process_util_unittest.cc:780: Failure
[00098.559] 02728.02761> Value of: signals & ZX_SOCKET_READABLE
[00098.559] 02728.02761>   Actual: false
[00098.559] 02728.02761> Expected: true
[00098.560] 02728.02761> ../../base/process/process_util_unittest.cc:785: Failure
[00098.560] 02728.02761>       Expected: (0)
[00098.560] 02728.02761>       Which is: 0
[00098.562] 02728.02761> To be equal to: result
[00098.562] 02728.02761>       Which is: -22
[00098.563] 02728.02761> ../../base/process/process_util_unittest.cc:786: Failure
[00098.563] 02728.02761>       Expected: 1u
[00098.564] 02728.02761>       Which is: 1
[00098.564] 02728.02761> To be equal to: bytes_read
[00098.565] 02728.02761>       Which is: 0
[00098.566] 02728.02761> ../../base/process/process_util_unittest.cc:787: Failure
[00098.566] 02728.02761>       Expected: kPipeValue
[00098.566] 02728.02761>       Which is: '\xCC' (204)
[00098.567] 02728.02761> To be equal to: buf[0]
[00098.567] 02728.02761>       Which is: '\0'
[00098.567] 02728.02761> [3:1731338641:1031/175357.837545:89839641:WARNING:test_suite.cc(235)] Test launcher output path /tmp/.org.chromium.Chromium.Knnfln/test_results.xml exists. Not adding test launcher result printer.
[00098.568] 02728.02761> [3:1731338641:1031/175357.892973:89894422:FATAL:process_util_unittest.cc(753)] Check failed: (0) == result (0 vs. -24)
#00: base::debug::StackTrace::StackTrace(unsigned long) at base/debug/stack_trace_fuchsia.cc:173
#01: endl<char, std::__1::char_traits<char> > at buildtools/third_party/libc++/trunk/include/ostream:1001
      (inlined by) operator<< at buildtools/third_party/libc++/trunk/include/ostream:195
      (inlined by) ~LogMessage at base/logging.cc:582
#02: CheckEQImpl<unsigned int, unsigned long> at base/logging.h:757
      (inlined by) ProcessUtilsVerifyHandle at base/process/process_util_unittest.cc:754
#03: Run at base/test/test_suite.cc:265
#04: Run at base/callback.h:92
      (inlined by) LaunchUnitTestsInternal at base/test/launcher/unit_test_launcher.cc:216
#05: LaunchUnitTests at base/test/launcher/unit_test_launcher.cc:475
#06: main at base/test/run_all_base_unittests.cc:12
#07: pc 0xe8d30689f3c4 (libc.so,0x1b3c4)
#08: pc 0xe8d30689f5b0 (libc.so,0x1b5b0)

so I'll look at that one first.

-21 is ZX_TIMED_OUT on the 5s wait. I think that's all it is, at least for this test. :(

Setting -smp 1 seems to make these failing tests pass "reliably" and probably makes more sense for non-KVM anyway as we had before for non-kvm x64.

Trying some tryjobs now to see how that goes.

Comment 5 by w...@chromium.org, Oct 31 2017

The difference between ARM64 and x64 is that QEMU can actually use multiple
host cores to emulate ARM64, thanks to the weaker memory access
guarantees.  There may be other aspects of emulation that are impacted
adversely by multi-core, of course. The other consideration is that more
cores means more parallel test processes, each of which can launch child
processes, which consume CPU and memory, which may also hinder things.

FWIW I think we used to run -smp 2 on x64 under emulation.

Comment 6 by w...@chromium.org, Oct 31 2017

Re #4: Perhaps we're looking at the wrong tests; the bot seemed perfectly happy until the SDK roll, and is now regularly failing base_unittests - if this were just QEMU timeouts then we'd have seen them before?
Yup, I don't understand what's going on. I'm going to go back and try to figure out what changed on the Fuchsia side.
Cc: jam...@chromium.org
Back at pre-roll (sdk 04840186b5fc14dc35b12e77f8028330a4bd8333), I got one crash locally in TaskSchedulerWorkerPoolImplPostTaskBeforeStartTest.PostManyTasks.

[00100.249] 02585.02616> [1500/2591] TaskSchedulerLock.AcquirePredecessorWrongOrder (9 ms)
[00100.738] 01192.01265> <== fatal exception: process base_unittests[31798] thread TaskSchedulerSingleThreadAForeg[32583]
[00100.739] 01192.01265> <== fatal page fault, PC at 0x3c5e9f7c43cc
[00100.739] 01192.01265>  x0      0x4911f13f3298 x1                 0x1 x2                   0 x3                   0
[00100.740] 01192.01265>  x4      0x4911f1445730 x5      0x48d3e816f2c0 x6              0xffff x7             0x10000
[00100.740] 01192.01265>  x8                   0 x9                   0 x10                  0 x11     0x4911f13f8468
[00100.740] 01192.01265>  x12                  0 x13                0x1 x14                  0 x15                0x1
[00100.741] 01192.01265>  x16     0x3c5e9fa131a8 x17     0x836ae07f1a10 x18                  0 x19     0x4911f13f3298
[00100.742] 01192.01265>  x20     0x1c8ad9dfd4c0 x21     0x1b46d7250ac0 x22     0x1b46d7250e10 x23                0x1
[00100.743] 01192.01265>  x24                0x1 x25                  0 x26     0x1c8ad9dfd4c0 x27                0x1
[00100.743] 01192.01265>  x28                  0 x29     0x1b46d7250fc0 lr      0x3c5e9f7c3fe4 sp      0x1b46d7250ac0
[00100.744] 01192.01265>  pc      0x3c5e9f7c43cc psr         0x80000000
[00100.745] 01192.01265>  far                  0
[00100.745] 01192.01265> bottom of user stack:
[00100.752] 01192.01265> 0x00001b46d7250ac0: 7e33536d 95fe1daa 9f7c1ac4 00003c5e |mS3~......|.^<..|
[00100.752] 01192.01265> 0x00001b46d7250ad0: 00000000 00000000 9f702b88 00003c5e |.........+p.^<..|
[00100.752] 01192.01265> 0x00001b46d7250ae0: 7e33536d 95fe1daa 9efbedc0 00000100 |mS3~............|
[00100.752] 01192.01265> 0x00001b46d7250af0: 00000002 00004911 d7250c50 00001b46 |.....I..P.%.F...|
[00100.752] 01192.01265> 0x00001b46d7250b00: d7250fc0 00001b46 9efbedc0 00003c5e |..%.F.......^<..|
[00100.753] 01192.01265> 0x00001b46d7250b10: f13ea028 00004911 f13f8460 00004911 |(.>..I..`.?..I..|
[00100.753] 01192.01265> 0x00001b46d7250b20: 7e33536d 95fe1daa 6a3a71f0 00000100 |mS3~.....q:j....|
[00100.753] 01192.01265> 0x00001b46d7250b30: 00000002 00003c5e 9ec10666 00003c5e |....^<..f...^<..|
[00100.753] 01192.01265> 0x00001b46d7250b40: 9ec4c949 00003c5e d9dfd4c0 00001c8a |I...^<..........|
[00100.753] 01192.01265> 0x00001b46d7250b50: 9ec30b7e 00003c5e f13f8460 00004911 |~...^<..`.?..I..|
[00100.753] 01192.01265> 0x00001b46d7250b60: d9dfd4c0 00001c8a 6a3a71f0 000048f1 |.........q:j.H..|
[00100.754] 01192.01265> 0x00001b46d7250b70: d7250b90 00001b46 e0852c68 0000836a |..%.F...h,..j...|
[00100.754] 01192.01265> 0x00001b46d7250b80: 00000001 00000000 e08b4b50 00000000 |........PK......|
[00100.754] 01192.01265> 0x00001b46d7250b90: d7250bf0 00001b46 e0834b10 0000836a |..%.F....K..j...|
[00100.754] 01192.01265> 0x00001b46d7250ba0: 00000548 00000000 00007000 00000000 |H........p......|
[00100.754] 01192.01265> 0x00001b46d7250bb0: 00000001 00000000 36929880 0000489e |...........6.H..|
[00100.755] 01192.01265> arch: aarch64
[00100.842] 01192.01265> dso: id=277daa8040ca05420abd3109825efe325b478d80 base=0xf421bb5a2000 name=<vDSO>
[00100.842] 01192.01265> dso: id=643a4023ea4785cae3460383b7d9e588414dad41 base=0x838d45ff2000 name=liblaunchpad.so
[00100.842] 01192.01265> dso: id=6655f2aa8d413fde5a6af1cd81468f7d4d9cb846 base=0x836ae07d9000 name=libc.so
[00100.842] 01192.01265> dso: id=015e6d02185d0d473a757284752b228dbdf7104b base=0x7fbdf8f3b000 name=libfdio.so
[00100.842] 01192.01265> dso: id=6ec8ca946d9be2bd base=0x3c5e9ebda000 name=app:base_unittests
#01: fetch_add at buildtools/third_party/libc++/trunk/include/atomic:1014
      (inlined by) NoBarrier_AtomicIncrement at base/atomicops_internals_portable.h:86
      (inlined by) DecrementNumTasksBlockingShutdown at base/task_scheduler/task_tracker.cc:157
      (inlined by) AfterRunTask at base/task_scheduler/task_tracker.cc:582
#02: RunNextTask at base/task_scheduler/task_tracker.cc:315
#03: get at base/memory/scoped_refptr.h:174
      (inlined by) scoped_refptr at base/memory/scoped_refptr.h:159
      (inlined by) operator= at base/memory/scoped_refptr.h:207
      (inlined by) ThreadMain at base/task_scheduler/scheduler_worker.cc:72
#04: ThreadFunc at base/threading/platform_thread_posix.cc:77
#05: pc 0x836ae07f18fc sp 0x1b46d7250fc0 (libc.so,0x188fc)
#06: pc 0x836ae0865198 sp 0x1b46d7250fe0 (libc.so,0x8c198)
[00100.940] 02585.02616> [1501/2591] TaskSchedulerLock.PredecessorLongerCycle (11 ms)
[00100.940] 02585.02616> [1502/2591] TaskSchedulerSingleThreadTaskRunnerManagerTest.DifferentThreadsUsed (200 ms)
[00100.940] 02585.02616> [1503/2591] TaskSchedulerSingleThreadTaskRunnerManagerTest.SameThreadUsed (34 ms)
[00100.941] 02585.02616> [1504/2591] TaskSchedulerSingleThreadTaskRunnerManagerTest.RunsTasksInCurrentSequence (33 ms)
[00100.941] 02585.02616> [1505/2591] TaskSchedulerSingleThreadTaskRunnerManagerTest.SharedWithBaseSyncPrimitivesDCHECKs (30 ms)
[00100.941] 02585.02616> [1506/2591] TaskSchedulerSingleThreadTaskRunnerManagerJoinTest.ConcurrentJoin (42 ms)
[00100.942] 02585.02616> [1507/2591] TaskSchedulerSingleThreadTaskRunnerManagerJoinTest.ConcurrentJoinExtraSkippedTask (30 ms)
[00100.943] 02585.02616> [1508/2591] TaskSchedulerSingleThreadTaskRunnerManagerStartTest.PostTaskBeforeStart (117 ms)
[00100.943] 02585.02616> [1509/2591] TaskSchedulerWorkerPoolImplPostTaskBeforeStartTest.PostTasksBeforeStart (163 ms)
[00100.944] 02585.02616> [ RUN      ] TaskSchedulerWorkerPoolImplPostTaskBeforeStartTest.PostManyTasks
[00100.944] 02585.02616> [1510/2591] TaskSchedulerWorkerPoolImplPostTaskBeforeStartTest.PostManyTasks (CRASHED)


Post-roll (sdk f6dffb2fee82a21900fc3a00261dc5844901ea9e), I get some of the bot's failures on a local run.

But more importantly it's substantially slower, without any changes on the Chrome side:
- before, 215s for a full run
- after, 835s for a full run

which I suspect as the root cause for the failures. In the second run there's no hangs, just a general overall much-slowerness.


The Zircon changelog between those two sdks looks relatively insignificant, but since there's not too many revs, hopefully the SDK will be buildable at intermediate revs and I can see what causes the slowness. Trying that now...

[arm64-release-goma] around:/work/fuchsia/zircon$ git log b1f7d54fa4d0eab09b9253924c748c536c2f7c68...56210e4da4e16fb08593bd706c5c2e81729a1a95 --oneline
56210e4da (HEAD) [fbl] Add DECLARE_HAS_MEMBER_FN_WITH_SIGNATURE.
955e38736 [dev][rtc][pl031] add support for arm primecell pl031 RTC
306edcd43 [kernel] Optimize handle dispatcher accessor.
3d6efb68c [dev][platform-bus] Add support for proxying the i2c protocol to platform devices
fc16da4a9 [devmgr] Allow passing args via zircon.autorun.* options
65e5505d2 [hypervisor][virtio] Specify PCI device class code
8168ad907 [hypervisor][pci] Fix unsupported accesses.
661d3ae66 [ulib][hypervisor] Mark LAPIC access as volatile.
0a194bef7 [ddk][i2c] Add new protocol for I2C
f97615d83 [crypto] Add Bytes::Increment.
3bf3bd6a2 [inet6] Improve diagnostics
6876c33c7 [ulib][hypervisor] Quiet Virtio error output.
4b4977b94 [lib][hypervisor] Move common pinning functions.
e50dcce1b [mdi] Add GICH and GICV offsets for GICv2.

Maybe having Time set properly make these tests slower/more flaky. :)
Hmm, maybe I'm going crazy, I can't get a full run at anything like 215s now.
Cc: phosek@chromium.org
Ah! The roll pulls this in

 {
   "name": "buildtools",
   "remote": "https://fuchsia.googlesource.com/buildtools",
   "path": "/work/fuchsia/buildtools",
   "revision": "0d6e0273b910631407b46cbd446c9bdc780b7963",
   "old_revision": "1e8a2ee6e86e0dd7e3c6061a88a27ac3ff2e1dff",
   "cls": [
    {
     "commit": "0d6e0273b910631407b46cbd446c9bdc780b7963",
     "number": 80200,
     "subject": "Update QEMU prebuilt to v2.10.1",
     "url": "https://fuchsia-review.googlesource.com/c/80200"
    }
   ]
  },


That's looking like a pretty prime suspect.
Confirmed that if I roll the SDK forward and use the pre-roll QEMU binary, I see the difference of 3x-4x in perf on arm64.

phosek, can we go back to 2.10.0 qemu?
Summary: QEMU ARM64 got much slower after sdk roll to f6dffb2fee82a21900fc3a00261dc5844901ea9e (was: Fuchsia ARM64 tests broke after sdk roll to f6dffb2fee82a21900fc3a00261dc5844901ea9e)
Thanks
Project Member

Comment 15 by bugdroid1@chromium.org, Nov 1 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/2ec422303b32292b0acd7bd2781f63498040de88

commit 2ec422303b32292b0acd7bd2781f63498040de88
Author: Scott Graham <scottmg@chromium.org>
Date: Wed Nov 01 06:22:49 2017

Roll Fuchsia SDK to ff6b8e980b4e5b0c898341e8a467b9c751857e5d

Reverts the QEMU binary which caused a substantial perf regression on
arm64 emu.

TBR: wez@chromium.org
Bug:  780096 , 707030
Change-Id: Ib48849dcfb261205a6211c9eec0a9e2b997970cd
Reviewed-on: https://chromium-review.googlesource.com/748682
Commit-Queue: Scott Graham <scottmg@chromium.org>
Commit-Queue: Wez <wez@chromium.org>
Reviewed-by: Wez <wez@chromium.org>
Cr-Commit-Position: refs/heads/master@{#513088}
[modify] https://crrev.com/2ec422303b32292b0acd7bd2781f63498040de88/DEPS

Status: Fixed (was: Started)
-smp 1 doesn't affect perf either way for arm64 when on gcc-built qemu (our 2.10.0 binary), so not changing that.

Sign in to add a comment