QEMU ARM64 got much slower after sdk roll to f6dffb2fee82a21900fc3a00261dc5844901ea9e |
|||||
Issue descriptionStarted here. https://ci.chromium.org/buildbot/chromium.fyi/Fuchsia%20ARM64/1004 Running the tests individually does not reproduce the failures, only longer/full runs, sadly.
,
Oct 31 2017
Yeah, could be some sort of cross-test pollution I guess, yeah. out/fucha/bin/run_base_unittests --test-launcher-filter-file=../../testing/buildbot/filters/fuchsia.base_unittests.filter --gtest_filter=Process* out/fucha/bin/run_base_unittests --test-launcher-filter-file=../../testing/buildbot/filters/fuchsia.base_unittests.filter --gtest_filter=P* sometimes times out, but looks a little different than the bot failure of ProcessUtilTest.LaunchWithHandleTransfer. So there's probably one particular test run before it that must be causing the problem in that one's case.
,
Oct 31 2017
whew emulation is sooo slooooooow
out/fucha/bin/run_base_unittests --test-launcher-filter-file=../../testing/buildbot/filters/fuchsia.base_unittests.filter --gtest_filter=S*:ProcessUtilTest.* does repro
[00098.557] 02728.02761> [ RUN ] ProcessUtilTest.LaunchWithHandleTransfer
[00098.557] 02728.02761> ../../base/process/process_util_unittest.cc:779: Failure
[00098.558] 02728.02761> Expected: (0)
[00098.558] 02728.02761> Which is: 0
[00098.558] 02728.02761> To be equal to: result
[00098.558] 02728.02761> Which is: -21
[00098.559] 02728.02761> ../../base/process/process_util_unittest.cc:780: Failure
[00098.559] 02728.02761> Value of: signals & ZX_SOCKET_READABLE
[00098.559] 02728.02761> Actual: false
[00098.559] 02728.02761> Expected: true
[00098.560] 02728.02761> ../../base/process/process_util_unittest.cc:785: Failure
[00098.560] 02728.02761> Expected: (0)
[00098.560] 02728.02761> Which is: 0
[00098.562] 02728.02761> To be equal to: result
[00098.562] 02728.02761> Which is: -22
[00098.563] 02728.02761> ../../base/process/process_util_unittest.cc:786: Failure
[00098.563] 02728.02761> Expected: 1u
[00098.564] 02728.02761> Which is: 1
[00098.564] 02728.02761> To be equal to: bytes_read
[00098.565] 02728.02761> Which is: 0
[00098.566] 02728.02761> ../../base/process/process_util_unittest.cc:787: Failure
[00098.566] 02728.02761> Expected: kPipeValue
[00098.566] 02728.02761> Which is: '\xCC' (204)
[00098.567] 02728.02761> To be equal to: buf[0]
[00098.567] 02728.02761> Which is: '\0'
[00098.567] 02728.02761> [3:1731338641:1031/175357.837545:89839641:WARNING:test_suite.cc(235)] Test launcher output path /tmp/.org.chromium.Chromium.Knnfln/test_results.xml exists. Not adding test launcher result printer.
[00098.568] 02728.02761> [3:1731338641:1031/175357.892973:89894422:FATAL:process_util_unittest.cc(753)] Check failed: (0) == result (0 vs. -24)
#00: base::debug::StackTrace::StackTrace(unsigned long) at base/debug/stack_trace_fuchsia.cc:173
#01: endl<char, std::__1::char_traits<char> > at buildtools/third_party/libc++/trunk/include/ostream:1001
(inlined by) operator<< at buildtools/third_party/libc++/trunk/include/ostream:195
(inlined by) ~LogMessage at base/logging.cc:582
#02: CheckEQImpl<unsigned int, unsigned long> at base/logging.h:757
(inlined by) ProcessUtilsVerifyHandle at base/process/process_util_unittest.cc:754
#03: Run at base/test/test_suite.cc:265
#04: Run at base/callback.h:92
(inlined by) LaunchUnitTestsInternal at base/test/launcher/unit_test_launcher.cc:216
#05: LaunchUnitTests at base/test/launcher/unit_test_launcher.cc:475
#06: main at base/test/run_all_base_unittests.cc:12
#07: pc 0xe8d30689f3c4 (libc.so,0x1b3c4)
#08: pc 0xe8d30689f5b0 (libc.so,0x1b5b0)
so I'll look at that one first.
,
Oct 31 2017
-21 is ZX_TIMED_OUT on the 5s wait. I think that's all it is, at least for this test. :( Setting -smp 1 seems to make these failing tests pass "reliably" and probably makes more sense for non-KVM anyway as we had before for non-kvm x64. Trying some tryjobs now to see how that goes.
,
Oct 31 2017
The difference between ARM64 and x64 is that QEMU can actually use multiple host cores to emulate ARM64, thanks to the weaker memory access guarantees. There may be other aspects of emulation that are impacted adversely by multi-core, of course. The other consideration is that more cores means more parallel test processes, each of which can launch child processes, which consume CPU and memory, which may also hinder things. FWIW I think we used to run -smp 2 on x64 under emulation.
,
Oct 31 2017
Re #4: Perhaps we're looking at the wrong tests; the bot seemed perfectly happy until the SDK roll, and is now regularly failing base_unittests - if this were just QEMU timeouts then we'd have seen them before?
,
Oct 31 2017
Yup, I don't understand what's going on. I'm going to go back and try to figure out what changed on the Fuchsia side.
,
Nov 1 2017
Back at pre-roll (sdk 04840186b5fc14dc35b12e77f8028330a4bd8333), I got one crash locally in TaskSchedulerWorkerPoolImplPostTaskBeforeStartTest.PostManyTasks. [00100.249] 02585.02616> [1500/2591] TaskSchedulerLock.AcquirePredecessorWrongOrder (9 ms) [00100.738] 01192.01265> <== fatal exception: process base_unittests[31798] thread TaskSchedulerSingleThreadAForeg[32583] [00100.739] 01192.01265> <== fatal page fault, PC at 0x3c5e9f7c43cc [00100.739] 01192.01265> x0 0x4911f13f3298 x1 0x1 x2 0 x3 0 [00100.740] 01192.01265> x4 0x4911f1445730 x5 0x48d3e816f2c0 x6 0xffff x7 0x10000 [00100.740] 01192.01265> x8 0 x9 0 x10 0 x11 0x4911f13f8468 [00100.740] 01192.01265> x12 0 x13 0x1 x14 0 x15 0x1 [00100.741] 01192.01265> x16 0x3c5e9fa131a8 x17 0x836ae07f1a10 x18 0 x19 0x4911f13f3298 [00100.742] 01192.01265> x20 0x1c8ad9dfd4c0 x21 0x1b46d7250ac0 x22 0x1b46d7250e10 x23 0x1 [00100.743] 01192.01265> x24 0x1 x25 0 x26 0x1c8ad9dfd4c0 x27 0x1 [00100.743] 01192.01265> x28 0 x29 0x1b46d7250fc0 lr 0x3c5e9f7c3fe4 sp 0x1b46d7250ac0 [00100.744] 01192.01265> pc 0x3c5e9f7c43cc psr 0x80000000 [00100.745] 01192.01265> far 0 [00100.745] 01192.01265> bottom of user stack: [00100.752] 01192.01265> 0x00001b46d7250ac0: 7e33536d 95fe1daa 9f7c1ac4 00003c5e |mS3~......|.^<..| [00100.752] 01192.01265> 0x00001b46d7250ad0: 00000000 00000000 9f702b88 00003c5e |.........+p.^<..| [00100.752] 01192.01265> 0x00001b46d7250ae0: 7e33536d 95fe1daa 9efbedc0 00000100 |mS3~............| [00100.752] 01192.01265> 0x00001b46d7250af0: 00000002 00004911 d7250c50 00001b46 |.....I..P.%.F...| [00100.752] 01192.01265> 0x00001b46d7250b00: d7250fc0 00001b46 9efbedc0 00003c5e |..%.F.......^<..| [00100.753] 01192.01265> 0x00001b46d7250b10: f13ea028 00004911 f13f8460 00004911 |(.>..I..`.?..I..| [00100.753] 01192.01265> 0x00001b46d7250b20: 7e33536d 95fe1daa 6a3a71f0 00000100 |mS3~.....q:j....| [00100.753] 01192.01265> 0x00001b46d7250b30: 00000002 00003c5e 9ec10666 00003c5e |....^<..f...^<..| [00100.753] 01192.01265> 0x00001b46d7250b40: 9ec4c949 00003c5e d9dfd4c0 00001c8a |I...^<..........| [00100.753] 01192.01265> 0x00001b46d7250b50: 9ec30b7e 00003c5e f13f8460 00004911 |~...^<..`.?..I..| [00100.753] 01192.01265> 0x00001b46d7250b60: d9dfd4c0 00001c8a 6a3a71f0 000048f1 |.........q:j.H..| [00100.754] 01192.01265> 0x00001b46d7250b70: d7250b90 00001b46 e0852c68 0000836a |..%.F...h,..j...| [00100.754] 01192.01265> 0x00001b46d7250b80: 00000001 00000000 e08b4b50 00000000 |........PK......| [00100.754] 01192.01265> 0x00001b46d7250b90: d7250bf0 00001b46 e0834b10 0000836a |..%.F....K..j...| [00100.754] 01192.01265> 0x00001b46d7250ba0: 00000548 00000000 00007000 00000000 |H........p......| [00100.754] 01192.01265> 0x00001b46d7250bb0: 00000001 00000000 36929880 0000489e |...........6.H..| [00100.755] 01192.01265> arch: aarch64 [00100.842] 01192.01265> dso: id=277daa8040ca05420abd3109825efe325b478d80 base=0xf421bb5a2000 name=<vDSO> [00100.842] 01192.01265> dso: id=643a4023ea4785cae3460383b7d9e588414dad41 base=0x838d45ff2000 name=liblaunchpad.so [00100.842] 01192.01265> dso: id=6655f2aa8d413fde5a6af1cd81468f7d4d9cb846 base=0x836ae07d9000 name=libc.so [00100.842] 01192.01265> dso: id=015e6d02185d0d473a757284752b228dbdf7104b base=0x7fbdf8f3b000 name=libfdio.so [00100.842] 01192.01265> dso: id=6ec8ca946d9be2bd base=0x3c5e9ebda000 name=app:base_unittests #01: fetch_add at buildtools/third_party/libc++/trunk/include/atomic:1014 (inlined by) NoBarrier_AtomicIncrement at base/atomicops_internals_portable.h:86 (inlined by) DecrementNumTasksBlockingShutdown at base/task_scheduler/task_tracker.cc:157 (inlined by) AfterRunTask at base/task_scheduler/task_tracker.cc:582 #02: RunNextTask at base/task_scheduler/task_tracker.cc:315 #03: get at base/memory/scoped_refptr.h:174 (inlined by) scoped_refptr at base/memory/scoped_refptr.h:159 (inlined by) operator= at base/memory/scoped_refptr.h:207 (inlined by) ThreadMain at base/task_scheduler/scheduler_worker.cc:72 #04: ThreadFunc at base/threading/platform_thread_posix.cc:77 #05: pc 0x836ae07f18fc sp 0x1b46d7250fc0 (libc.so,0x188fc) #06: pc 0x836ae0865198 sp 0x1b46d7250fe0 (libc.so,0x8c198) [00100.940] 02585.02616> [1501/2591] TaskSchedulerLock.PredecessorLongerCycle (11 ms) [00100.940] 02585.02616> [1502/2591] TaskSchedulerSingleThreadTaskRunnerManagerTest.DifferentThreadsUsed (200 ms) [00100.940] 02585.02616> [1503/2591] TaskSchedulerSingleThreadTaskRunnerManagerTest.SameThreadUsed (34 ms) [00100.941] 02585.02616> [1504/2591] TaskSchedulerSingleThreadTaskRunnerManagerTest.RunsTasksInCurrentSequence (33 ms) [00100.941] 02585.02616> [1505/2591] TaskSchedulerSingleThreadTaskRunnerManagerTest.SharedWithBaseSyncPrimitivesDCHECKs (30 ms) [00100.941] 02585.02616> [1506/2591] TaskSchedulerSingleThreadTaskRunnerManagerJoinTest.ConcurrentJoin (42 ms) [00100.942] 02585.02616> [1507/2591] TaskSchedulerSingleThreadTaskRunnerManagerJoinTest.ConcurrentJoinExtraSkippedTask (30 ms) [00100.943] 02585.02616> [1508/2591] TaskSchedulerSingleThreadTaskRunnerManagerStartTest.PostTaskBeforeStart (117 ms) [00100.943] 02585.02616> [1509/2591] TaskSchedulerWorkerPoolImplPostTaskBeforeStartTest.PostTasksBeforeStart (163 ms) [00100.944] 02585.02616> [ RUN ] TaskSchedulerWorkerPoolImplPostTaskBeforeStartTest.PostManyTasks [00100.944] 02585.02616> [1510/2591] TaskSchedulerWorkerPoolImplPostTaskBeforeStartTest.PostManyTasks (CRASHED) Post-roll (sdk f6dffb2fee82a21900fc3a00261dc5844901ea9e), I get some of the bot's failures on a local run. But more importantly it's substantially slower, without any changes on the Chrome side: - before, 215s for a full run - after, 835s for a full run which I suspect as the root cause for the failures. In the second run there's no hangs, just a general overall much-slowerness. The Zircon changelog between those two sdks looks relatively insignificant, but since there's not too many revs, hopefully the SDK will be buildable at intermediate revs and I can see what causes the slowness. Trying that now... [arm64-release-goma] around:/work/fuchsia/zircon$ git log b1f7d54fa4d0eab09b9253924c748c536c2f7c68...56210e4da4e16fb08593bd706c5c2e81729a1a95 --oneline 56210e4da (HEAD) [fbl] Add DECLARE_HAS_MEMBER_FN_WITH_SIGNATURE. 955e38736 [dev][rtc][pl031] add support for arm primecell pl031 RTC 306edcd43 [kernel] Optimize handle dispatcher accessor. 3d6efb68c [dev][platform-bus] Add support for proxying the i2c protocol to platform devices fc16da4a9 [devmgr] Allow passing args via zircon.autorun.* options 65e5505d2 [hypervisor][virtio] Specify PCI device class code 8168ad907 [hypervisor][pci] Fix unsupported accesses. 661d3ae66 [ulib][hypervisor] Mark LAPIC access as volatile. 0a194bef7 [ddk][i2c] Add new protocol for I2C f97615d83 [crypto] Add Bytes::Increment. 3bf3bd6a2 [inet6] Improve diagnostics 6876c33c7 [ulib][hypervisor] Quiet Virtio error output. 4b4977b94 [lib][hypervisor] Move common pinning functions. e50dcce1b [mdi] Add GICH and GICV offsets for GICv2. Maybe having Time set properly make these tests slower/more flaky. :)
,
Nov 1 2017
Hmm, maybe I'm going crazy, I can't get a full run at anything like 215s now.
,
Nov 1 2017
Ah! The roll pulls this in
{
"name": "buildtools",
"remote": "https://fuchsia.googlesource.com/buildtools",
"path": "/work/fuchsia/buildtools",
"revision": "0d6e0273b910631407b46cbd446c9bdc780b7963",
"old_revision": "1e8a2ee6e86e0dd7e3c6061a88a27ac3ff2e1dff",
"cls": [
{
"commit": "0d6e0273b910631407b46cbd446c9bdc780b7963",
"number": 80200,
"subject": "Update QEMU prebuilt to v2.10.1",
"url": "https://fuchsia-review.googlesource.com/c/80200"
}
]
},
That's looking like a pretty prime suspect.
,
Nov 1 2017
Confirmed that if I roll the SDK forward and use the pre-roll QEMU binary, I see the difference of 3x-4x in perf on arm64. phosek, can we go back to 2.10.0 qemu?
,
Nov 1 2017
,
Nov 1 2017
Thanks
,
Nov 1 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/2ec422303b32292b0acd7bd2781f63498040de88 commit 2ec422303b32292b0acd7bd2781f63498040de88 Author: Scott Graham <scottmg@chromium.org> Date: Wed Nov 01 06:22:49 2017 Roll Fuchsia SDK to ff6b8e980b4e5b0c898341e8a467b9c751857e5d Reverts the QEMU binary which caused a substantial perf regression on arm64 emu. TBR: wez@chromium.org Bug: 780096 , 707030 Change-Id: Ib48849dcfb261205a6211c9eec0a9e2b997970cd Reviewed-on: https://chromium-review.googlesource.com/748682 Commit-Queue: Scott Graham <scottmg@chromium.org> Commit-Queue: Wez <wez@chromium.org> Reviewed-by: Wez <wez@chromium.org> Cr-Commit-Position: refs/heads/master@{#513088} [modify] https://crrev.com/2ec422303b32292b0acd7bd2781f63498040de88/DEPS
,
Nov 1 2017
-smp 1 doesn't affect perf either way for arm64 when on gcc-built qemu (our 2.10.0 binary), so not changing that. |
|||||
►
Sign in to add a comment |
|||||
Comment 1 by w...@chromium.org
, Oct 31 2017