Android Nexus perf bots should be using 32-bit builds
Issue description: We ship 32-bit Chrome to users, so the Android swarming perf bot should be using a 32-bit build.
,
Apr 21 2017
The swarming perf bot should be using a 32 bit build. The builder feeding it is configured to build 32 bit. https://cs.chromium.org/chromium/build/scripts/slave/recipe_modules/chromium_tests/chromium_perf_fyi.py?q=chromium_perf_fyi+package:%5Echromium$&l=97 This is weird.
,
Apr 21 2017
I see the problem. I have a CL to fix it. Do we want to switch the builder to run 32 bit builds?
,
Apr 21 2017
We do. At least for now, the Android perf bots are using 32 bit builds, so we need to make sure we use the same version to have an apples-to-apples comparison.
,
Apr 21 2017
Well, the CL might fix it. I'm not sure. It's weird that the bot triggering it is 32 bit, but it somehow still gets a 64 bit build.
,
Apr 21 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/tools/build/+/b32909009cd549cba7c368ebe46d5e42c48283ff

commit b32909009cd549cba7c368ebe46d5e42c48283ff
Author: Stephen Martinis <martiniss@google.com>
Date: Fri Apr 21 20:12:29 2017

Fix target bits for Android Swarming Tester

Bug: 714110
Change-Id: I51d848134afcca6fa193bea88f6fe7df5c8505d2
Reviewed-on: https://chromium-review.googlesource.com/484679
Reviewed-by: rnephew <rnephew@chromium.org>
Commit-Queue: Stephen Martinis <martiniss@chromium.org>

[modify] https://crrev.com/b32909009cd549cba7c368ebe46d5e42c48283ff/scripts/slave/recipe_modules/chromium_tests/chromium_perf_fyi.py
[modify] https://crrev.com/b32909009cd549cba7c368ebe46d5e42c48283ff/scripts/slave/recipes/chromium.expected/full_chromium_perf_fyi_Android_Swarming_N5X_Tester.json
,
Apr 27 2017
Ok, I've looked at this a bunch, and I'm at a loss.

The isolate which is being downloaded by the swarming bots (https://chromium-swarm.appspot.com/task?id=35c635ffd35c0410&refresh=10&show_raw=1 is a task, 985eea1c3cc4f873e3b545d290e903c5d8e3ed9b is the input isolate for that task) seems to have a 32 bit ChromePublic.apk in it. Specifically, running `aapt dump badging out/Release/apks/ChromePublic.apk | grep native-code` gives me 'native-code: armeabi-v7a', which looks like it's 32 bit, after reading through https://developer.android.com/ndk/guides/abis.html

So I'm not sure why the bot is running a 64 bit build.

I also discovered that the trace file has some metadata about what architecture it's run on. A sample trace file run on the swarming bot (https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/trace-file-id_3-2017-04-27_04-09-30-70215.html, gotten from https://chromium-swarm.appspot.com/task?id=35c638f6e9fe6210&refresh=10&show_raw=1) gives us

os-arch: "aarch64",

This looks like it's talking about the phone itself. There was some confusion, since the code looks like it could be talking about the host OS, rather than the OS running the trace, but the processor of build248-m4 (the bot with phones, running the above task) has the following processor (according to /proc/cpuinfo)

"model name : Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz"

I had a thought though; the os-arch value in the trace metadata could be what the architecture of the phone is, but it isn't what the binary it was running is compiled as. A 64 bit phone can run a 32 bit binary, which might be what's happening here.

That doesn't really explain what's happening in the blocking issue. It also doesn't explain why the swarming and non-swarming bots have different os-arch values in their trace files. I'll look into this more.
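
The `aapt` check above can be scripted. This is only a minimal sketch: it parses text that has already been captured from `aapt dump badging` rather than running the tool, and the helper name `apk_native_bits` and the `ABI_BITS` table are illustrative, not part of any Chromium tooling. The ABI names are the ones listed in the NDK ABI guide linked above.

```python
# Bit width of each ABI name from the Android NDK ABI guide.
ABI_BITS = {
    "armeabi": 32,
    "armeabi-v7a": 32,
    "x86": 32,
    "arm64-v8a": 64,
    "x86_64": 64,
}


def apk_native_bits(badging_output):
    """Return the sorted bit widths of the ABIs on any native-code line.

    `badging_output` is text captured from `aapt dump badging <apk>`.
    """
    bits = set()
    for line in badging_output.splitlines():
        if not line.startswith("native-code:"):
            continue
        # Everything after the colon is a list of ABI names, possibly quoted.
        rest = line.split(":", 1)[1]
        for abi in rest.replace("'", " ").split():
            if abi in ABI_BITS:
                bits.add(ABI_BITS[abi])
    return sorted(bits)


print(apk_native_bits("native-code: 'armeabi-v7a'"))  # prints [32]
```

An APK that reports only 32 here can still run on a 64 bit device, so this says what the isolate contains, not what the phone's kernel is.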
,
Apr 27 2017
This is definitely running a 32-bit build. If there's a difference in what tracing is reporting, I'd start by looking at tracing.
,
Apr 27 2017
+Juan, Primiano: please see #7.
,
Apr 27 2017
The original reasons that primiano cited for why it's a 64 bit build are:

"I looked at the traces in #11

1) The tracing overhead is 2x (60 MB vs 30MB) in the non-swarming case. This all comes from the TraceBufferVector size. From the trace metadata, in both cases tracing is started with "record-as-much-as-possible", which in turn causes a fixed size of the vector (kTraceEventVectorBigBufferChunks). The only thing that could make a difference that comes to my mind is 32 vs 64 bit.

2) In the trace metadata one is os-arch: "armv8l", the other one is os-arch: "aarch64"

3) If I look at the virtual addresses (select any blue column in memory-infra, and expand the "Stack" section), the swarming trace is definitely running on a 64 bit address space, the non-swarming trace is running in 32 bit mode"

I addressed 2 in #7. 1 and 3 I'm not sure how to address. For 3, the virtual addresses point, could it be possible that the addresses are 64 bit, but internal to the application they're 32 bit? Not sure...

Regardless of whether the bot itself is using 32 bit builds, it is confusing that things are so different between swarming and non swarming bots. I'm thinking we can close this bug, and move back to issue 705136, and figure out why the metrics are so different. I'll wait for primiano and/or perezju to respond before closing this, though.
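
Point 3 in the quote rests on a simple check: an address that does not fit in 32 bits can only come from a 64 bit address space. A minimal sketch of that reasoning follows; the function name and the sample addresses are made up for illustration and are not taken from the traces.

```python
def needs_64_bit_address_space(addresses):
    """Return True if any address is too large to fit in 32 bits."""
    return any(addr > 0xFFFFFFFF for addr in addresses)


# Hypothetical addresses, not copied from any real trace.
wide = [0x7F8A3C001000, 0x7F8A3C7FF000]
narrow = [0xB6F01000, 0xBEFFF000]

print(needs_64_bit_address_space(wide))
print(needs_64_bit_address_space(narrow))
```

The check only works in one direction: small addresses alone do not prove a process is 32 bit.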
,
Apr 27 2017
+Alex as I know he can navigate the 100s (this is the real scale, right) of APKs we generate with every release build.
,
Apr 28 2017
> This looks like it's talking about the phone itself. There was some confusion, since the code looks like it could be talking about the host OS, rather than the OS running the trace, but the processor of build248-m4 (the bot with phones, running the above task) has the following processor (according to /proc/cpuinfo)

Yes you are correct. The bitness of the OS might not match the bitness of the process, in the cases where we run 32 bit on 64 bit OS.

To the best of my knowledge, on Android:
armeabi-v7a (and any other armv7 spelling) -> 32 bit userspace running on a 32 bit kernel
armv8l -> 32 bit userspace running on a 64 bit kernel (which is what we want for clank)
aarch64 -> 64 bit userspace running on a 64 bit kernel (which is what I expect only for webview)

> I also discovered that the trace file has some metadata about what architecture it's run on. A sample trace file run on the swarming bot (https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/trace-file-id_3-2017-04-27_04-09-30-70215.html, gotten from https://chromium-swarm.appspot.com/task?id=35c638f6e9fe6210&refresh=10&show_raw=1) gives us os-arch: "aarch64",

This trace is definitely a 64 bit process running in 64 bit os. The os_arch is aarch64 AND the addresses are definitely 64 bit

> This is definitely running a 32-bit build.

By looking at the trace, not really, at least in terms of what we are "running". Perhaps here we are "downloading" a 32 bit thing and testing a different, 64 bit, thing?

> 1 and 3 I'm not sure how to address. For 3, the virtual addresses point, could it be possible that the addresses are 64 bit, but internal to the application they're 32 bit? Not sure...

To the best of my knowledge, that's not possible.

> I'm thinking we can close this bug,

Not really, in the trace above "some" 64 bit thing is running. I don't know what, but that is not what we want, and that is what this bug is about.
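
The os-arch mapping given in this comment can be written down as a small lookup. This is a sketch of the commenter's "to the best of my knowledge" table, not an authoritative list of values; the function name `describe_os_arch` is illustrative.

```python
def describe_os_arch(os_arch):
    """Map a trace's os-arch value to (userspace_bits, kernel_bits).

    Follows the mapping stated in the comment above; returns None for
    any value the comment does not cover.
    """
    if os_arch == "aarch64":
        return (64, 64)
    if os_arch == "armv8l":
        return (32, 64)
    if os_arch == "armeabi-v7a" or os_arch.startswith("armv7"):
        return (32, 32)
    return None


print(describe_os_arch("armv8l"))   # prints (32, 64)
print(describe_os_arch("aarch64"))  # prints (64, 64)
```

Under this mapping, the configuration described as wanted for clank is `(32, 64)`.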
,
Apr 28 2017
Juan and I looked into this, I think I know what's going on:

From https://chromium-swarm.appspot.com/task?id=35c638f6e9fe6210&refresh=10&show_raw=1 :
run_telemetry_benchmark_as_googletest.py ... -browser=reference
and later
Downloading gs://chrome-telemetry/binary_dependencies/chrome_stable_03306b04e49ed3b0c4c29da84a128d76659624f2...

Which means that:
1) we are running a binary that we download from GCS, ignoring what we build. This explains why John is so sure the isolate input is 32 bit, and why I am so sure we are running a 64 bit version.
2) why is the reference build 64 bit? Are we by any chance running 64 bit on all our ref builds? That would be quite unfortunate.
3) I don't know enough about swarming, but isn't this violating core swarming principles? Essentially we have a set of isolate inputs, but then we dynamically download and test an arbitrary binary. Isn't this violating the assumption that fixed input + fixed test = same output?
,
Apr 28 2017
To #13: that was looking at a reference build run, so the fact Telemetry downloaded the apk from cloud storage was intended. We need to look at a non reference build run. For example: https://chromium-swarm.appspot.com/task?id=35cab5ab5ddfa510&refresh=10

The command is:
run_telemetry_benchmark_as_googletest.py ... --browser=android-chromium

A trace from this run: https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/trace-file-id_99-2017-04-28_00-55-30-47535.html

*Having said that, the fact that the reference apk Telemetry uses is 64 bit is also a bug.
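
When reading swarming task logs, the distinction drawn here comes down to the value of the `--browser` flag. A minimal sketch, assuming the flag appears in the `--browser=VALUE` form shown in the command above; the helper name and the returned labels are illustrative.

```python
def browser_under_test(argv):
    """Classify a Telemetry command line by its --browser=VALUE argument.

    Returns "reference" when the run uses the downloaded reference build,
    "isolate" for any other --browser value (the run tests the apk from
    the isolate), and None when no --browser=VALUE argument is present.
    """
    for arg in argv:
        if arg.startswith("--browser="):
            value = arg.split("=", 1)[1]
            return "reference" if value == "reference" else "isolate"
    return None


cmd = ["run_telemetry_benchmark_as_googletest.py", "--browser=android-chromium"]
print(browser_under_test(cmd))  # prints isolate
```

Only an "isolate" run says anything about the bitness of the build the bot produced.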
,
Apr 28 2017
The chrome commit hash from the trace in #14 is 8ba4e2e2f7cab6455607be4902f72379c9e49060, so I am pretty sure that for non reference benchmark runs, Telemetry is running tests against the apk stored in the isolate.
,
Apr 28 2017
Wait, so actually what is not done right is the non-swarming run. In other words:
* the current perf bots are 64 bit userspace running on a 64 bit kernel
* the swarming bots are 32 bit userspace running on a 64 bit kernel

Quote:
"armeabi-v7a (and any other armv7 spelling) -> 32 bit userspace running on a 32 bit kernel
armv8l -> 32 bit userspace running on a 64 bit kernel (which is what we want for clank)
aarch64 -> 64 bit userspace running on a 64 bit kernel (which is what I expect only for webview)"

The non-swarming trace (Android Nexus5X Perf (2)): https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/trace-file-id_10-2017-04-26_22-08-29-38454.html
metadata shows: "aarch64"

The swarming trace (Android Swarming N5X Tester): https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/trace-file-id_10-2017-04-27_14-23-38-48740.html
metadata shows: "armv8l"

To make sure that I am not insane after so much clicking, people can double check with the trace links in https://chromeperf.appspot.com/report?sid=d73519308f45565e854fab851005bdf3f148195f8d6bd49665e5a21985182142
,
Apr 28 2017
Yep, the error is in the non-swarmed bot :-(

Copying gs://chrome-perf/Android arm64 Compile/full-build-linux_0c5dce88667d9ceeed1d90daa8485aa6c6009176.zip...

https://luci-logdog.appspot.com/v/?s=chrome%2Fbb%2Fchromium.perf%2FAndroid_Nexus5X_Perf__2_%2F3617%2F%2B%2Frecipes%2Fsteps%2Fextract_build%2F0%2Fstdout
,
Apr 28 2017
> A trace from this run: https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/trace-file-id_99-2017-04-28_00-55-30-47535.html

Fantastic. This trace looks great to me, 32 bit code running on 64 bit os, which is what we ship.

> To make sure that I am not insane after so much clicking, people can double check with the trace links in https://chromeperf.appspot.com/report?sid=d73519308f45565e854fab851005bdf3f148195f8d6bd49665e5a21985182142

Checked. Red (which seems fyi -> I assume non-swarming) is bad (64 bit userspace), blue (seems swarming) is good (32 bit userspace).

So if I am reading everything correctly, the issue here is the other way round. We have been running 64 bit on non-swarming on N5X for a while now (and I am sure we had this bug in the past). And now we are seeing a difference by running 32 bit on swarming.
,
Apr 28 2017
#13.1: ah, that makes sense.

#13.3: no, because the hashes we download are part of the isolate, e.g.
https://isolateserver.appspot.com/browse?namespace=default-gzip&hash=985eea1c3cc4f873e3b545d290e903c5d8e3ed9b
includes
https://isolateserver.appspot.com/browse?namespacedefault-gzip&digest=5bc59ab580e85ad66fb1eda3b30f832cff2554e5
includes
https://isolateserver.appspot.com/browse?namespace=default-gzip&digest=dcc98d581c0d5bb5d5803e7cb1ca30172104e85e&as=chrome_binaries.json
,
Apr 28 2017
Thanks for all the investigation. I realized after I posted the comment that I had linked to a swarming job which was running a reference build. I looked later at a regular swarming task, like you all did, and found it was running a 32 bit build. So, I remember setting up the Android Nexus5X bots. I switched them over to the chromium recipe, and AFAIK I was told (by someone? not sure who) to use the 64 bit build on it. It's possible that there was confusion, in that the nexus5Xs are 64 bit devices, but we should still use 32 bit builds on them.
,
Apr 28 2017
Ok, I misunderstood what someone was saying about 64 bit builds. It's my fault they're running 64 bit builds; before I switched to the chromium recipe, they weren't. See https://chromium.googlesource.com/chromium/tools/build/+/b8c8c287878f1a60e0fb76791158435caae2de32/scripts/slave/recipe_modules/chromium_tests/chromium_perf.py#169 (target bits) CL out to address this: https://chromium-review.googlesource.com/c/490636/
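
A check along the following lines could flag a bot whose configured target bits differ from what we ship. This is a hedged sketch: the dictionary shape and the `target_bits` key are hypothetical stand-ins and do not reproduce the actual recipe config in chromium_perf.py. The bot names and bit values reflect what this thread found before the fix.

```python
def bots_not_matching(bot_specs, expected_bits=32):
    """Return the sorted names of bots whose target bits differ from expected.

    `bot_specs` maps bot name -> spec dict with a hypothetical
    "target_bits" key; a spec without the key is reported as well.
    """
    return sorted(
        name
        for name, spec in bot_specs.items()
        if spec.get("target_bits") != expected_bits
    )


# Values as described in this thread before the fix landed.
specs = {
    "Android Nexus5X Perf (2)": {"target_bits": 64},
    "Android Swarming N5X Tester": {"target_bits": 32},
}
print(bots_not_matching(specs))  # prints ['Android Nexus5X Perf (2)']
```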
,
May 1 2017
,
May 1 2017
We'll need to check the bisect bots too before we close this out.
,
May 2 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/tools/build/+/7e9590a51289827d249a9794913e6918df4d76fc

commit 7e9590a51289827d249a9794913e6918df4d76fc
Author: Simon <simonhatch@chromium.org>
Date: Tue May 02 15:55:57 2017

Bisect - Switch Nexus5X to 32bit builder.

TBR=dtu@chromium.org
Bug: chromium:714110
Change-Id: If5a27306a2b6ba26b37258a9fbfe608468fe8848
Reviewed-on: https://chromium-review.googlesource.com/493487
Reviewed-by: Simon Hatch <simonhatch@chromium.org>
Commit-Queue: Simon Hatch <simonhatch@chromium.org>

[modify] https://crrev.com/7e9590a51289827d249a9794913e6918df4d76fc/scripts/slave/recipe_modules/auto_bisect_staging/bisector.py
,
May 2 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/tools/build/+/eee167b93de7b16b804bd212a6080d02c571c149

commit eee167b93de7b16b804bd212a6080d02c571c149
Author: Stephen Martinis <martiniss@chromium.org>
Date: Tue May 02 18:20:30 2017

Switch nexus 5x to 32 bit build

It turns out we were running tests on the nexus 5x bot with 64 bit builds, which we shouldn't have been doing.

Bug: 714110
Change-Id: I5ba4196774e3fddd17de1e71d08e787e0867ef9e
Reviewed-on: https://chromium-review.googlesource.com/490636
Commit-Queue: Stephen Martinis <martiniss@chromium.org>
Reviewed-by: Michael Case <mikecase@chromium.org>

[modify] https://crrev.com/eee167b93de7b16b804bd212a6080d02c571c149/scripts/slave/recipe_modules/auto_bisect/bisector.py
[modify] https://crrev.com/eee167b93de7b16b804bd212a6080d02c571c149/scripts/slave/recipes/chromium.expected/full_chromium_perf_Android_Nexus5X_Perf__2_.json
[modify] https://crrev.com/eee167b93de7b16b804bd212a6080d02c571c149/scripts/slave/recipes/chromium.expected/full_chromium_perf_Android_arm64_Compile.json
[modify] https://crrev.com/eee167b93de7b16b804bd212a6080d02c571c149/scripts/slave/recipes/chromium.expected/full_chromium_perf_Android_Compile.json
[modify] https://crrev.com/eee167b93de7b16b804bd212a6080d02c571c149/scripts/slave/recipes/chromium.expected/full_chromium_perf_Android_Nexus5X_Perf__1_.json
[modify] https://crrev.com/eee167b93de7b16b804bd212a6080d02c571c149/scripts/slave/recipe_modules/chromium_tests/chromium_perf.py
[modify] https://crrev.com/eee167b93de7b16b804bd212a6080d02c571c149/scripts/slave/recipes/chromium.expected/full_chromium_perf_Android_Nexus5X_Perf__3_.json
,
May 4 2017
This seems fixed; memory measurements now correlate pretty well: https://chromeperf.appspot.com/report?sid=d73519308f45565e854fab851005bdf3f148195f8d6bd49665e5a21985182142&start_rev=465994&end_rev=469268 Thanks Stephen!
,
Jul 19 2017
,
Dec 28 2017
Comment 1 by sullivan@chromium.org
, Apr 21 2017