
Issue 714110


Issue metadata

Status: Fixed
Owner: ----
Closed: Dec 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug

Blocking:
issue 705136




Android Nexus perf bots should be using 32-bit build

Reported by nedngu...@google.com, Apr 21 2017

Issue description

We ship 32-bit Chrome to users, so the Android swarming perf bot should be using a 32-bit build.
 
Cc: benhenry@chromium.org klo...@chromium.org
Grace, how important is 64 bit coverage on the perfbots? We currently have it on N5X, but that's not what we ship to users.
The swarming perf bot should be using a 32 bit build. The builder feeding it is configured to build 32 bit. https://cs.chromium.org/chromium/build/scripts/slave/recipe_modules/chromium_tests/chromium_perf_fyi.py?q=chromium_perf_fyi+package:%5Echromium$&l=97

This is weird.
I see the problem. I have a CL to fix it.

Do we want to switch the builder to run 32 bit builds?
We do. At least for now, the Android perf bots are using 32 bit builds, so we need to make sure we use the same version to get an apples-to-apples comparison.
Well, the CL might fix it. I'm not sure. It's weird that the bot triggering it is 32 bit, but it somehow still gets a 64 bit build.
Cc: jbudorick@chromium.org
Labels: -Pri-3 Pri-2
Status: Started (was: Assigned)
Ok, I've looked at this a bunch, and I'm at a loss.

The isolate being downloaded by the swarming bots (for example, the task https://chromium-swarm.appspot.com/task?id=35c635ffd35c0410&refresh=10&show_raw=1, whose input isolate is 985eea1c3cc4f873e3b545d290e903c5d8e3ed9b) seems to have a 32 bit ChromePublic.apk in it.

Specifically, running `aapt dump badging out/Release/apks/ChromePublic.apk | grep native-code` gives me 'native-code: armeabi-v7a', which, according to https://developer.android.com/ndk/guides/abis.html, is a 32 bit ABI.
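The same check can be scripted so it is repeatable across APKs. This is a minimal sketch, assuming `aapt` is on PATH and that its badging output lists ABIs as single-quoted names on a `native-code:` line (e.g. `native-code: 'armeabi-v7a'`); the helper names and the ABI table are illustrative, not part of any Chromium tool.

```python
import re
import subprocess

# Pointer width for the Android ABI names aapt can report (per the NDK ABI guide).
ABI_BITS = {
    "armeabi": 32,
    "armeabi-v7a": 32,
    "x86": 32,
    "mips": 32,
    "arm64-v8a": 64,
    "x86_64": 64,
    "mips64": 64,
}


def badging(apk_path):
    """Run `aapt dump badging` on an APK and return its stdout (needs aapt on PATH)."""
    return subprocess.run(
        ["aapt", "dump", "badging", apk_path],
        capture_output=True, text=True, check=True,
    ).stdout


def native_abis(badging_output):
    """Return the ABIs listed on the `native-code:` line, or [] if there is none."""
    for line in badging_output.splitlines():
        # startswith() deliberately skips `alt-native-code:` lines.
        if line.startswith("native-code:"):
            return re.findall(r"'([^']+)'", line)
    return []


def apk_bitness(badging_output):
    """Return the set of pointer widths (32 and/or 64) the APK ships native code for."""
    return {ABI_BITS[abi] for abi in native_abis(badging_output) if abi in ABI_BITS}
```

Usage: `apk_bitness(badging("out/Release/apks/ChromePublic.apk"))` should return `{32}` for an armeabi-v7a-only build. Note that this only describes what the APK *contains*; as the rest of this thread shows, it does not prove which binary the device actually *ran*.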

So I'm not sure why the bot is running a 64 bit build.

I also discovered that the trace file has some metadata about the architecture it ran on. A sample trace file from the swarming bot (https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/trace-file-id_3-2017-04-27_04-09-30-70215.html, taken from https://chromium-swarm.appspot.com/task?id=35c638f6e9fe6210&refresh=10&show_raw=1) gives us:
 os-arch: "aarch64",

This looks like it's describing the phone itself. There was some confusion, since the code looks like it could be reporting the host OS rather than the OS running the trace. However, build248-m4 (the host the phones are attached to, which ran the above task) reports the following processor in /proc/cpuinfo, which is x86 and so cannot be the source of "aarch64":

"model name	: Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz"

I had a thought, though: the os-arch value in the trace metadata could be the architecture of the phone, but not the architecture the running binary was compiled for. A 64 bit phone can run a 32 bit binary, which might be what's happening here.

That doesn't really explain what's happening in the blocking issue. It also doesn't explain why the swarming and non-swarming bots have different os-arch values in their trace files.

I'll look into this more.
This is definitely running a 32-bit build. If there's a difference in what tracing is reporting, I'd start by looking at tracing.
Cc: primiano@chromium.org perezju@chromium.org
+Juan, Primiano: please see #7.


The original reasons primiano cited for believing it's a 64 bit build were:
"
I looked at the traces in #11
1) The tracing overhead is 2x (60 MB vs 30MB) in the non-swarming case. This all comes from the TraceBufferVector size. From the trace metadata, in both cases tracing is started with "record-as-much-as-possible", which in turn causes a fixed size of the vector (kTraceEventVectorBigBufferChunks). The only thing that could make a difference that comes to my mind is 32 vs 64 bit.

2) In the trace metadata one is os-arch: "armv8l", the other one is os-arch: "aarch64"

3) If I look at the virtual addresses (select any blue column in memory-infra, and expand the "Stack" section), the swarming trace is definitely running in a 64 bit address space, and the non-swarming trace is running in 32 bit mode
"
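Point 3 rests on a simple bound: a 32-bit process can only address 4 GiB, so any virtual address above 0xFFFFFFFF can only come from a 64-bit process. A minimal sketch of that test, assuming the addresses are copied out of the memory-infra "Stack" section as hex strings (the function name is illustrative):

```python
def requires_64bit_address_space(addresses):
    """Return True if any hex address cannot be represented in 32 bits.

    A True result proves a 64-bit process. A False result is NOT proof of a
    32-bit process, since a 64-bit process may happen to show only low addresses.
    """
    # int(..., 16) accepts both "7fb6a2c000" and "0x7fb6a2c000".
    return any(int(addr, 16) > 0xFFFFFFFF for addr in addresses)
```

For example, `requires_64bit_address_space(["0x7fb6a2c000"])` is True, while `["0xe5a3c000", "0x12c00000"]` gives False.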

I addressed point 2 in #7.

I'm not sure how to address points 1 and 3. For point 3 (the virtual addresses), is it possible that the addresses are 64 bit, but internally the application is 32 bit? Not sure...

Regardless of whether the bot itself is using 32 bit builds, it is confusing that things are so different between swarming and non-swarming bots.

I'm thinking we can close this bug, move back to issue 705136, and figure out why the metrics are so different. I'll wait for primiano and/or perezju to respond before closing this, though.
Cc: amineer@chromium.org
+Alex, as I know he can navigate the hundreds (that's the real scale, right?) of APKs we generate with every release build.
> This looks like it's talking about the phone itself. There was some confusion, since the code looks like it could be talking about the host OS, rather than the OS running the trace, but the processor of build248-m4 (the bot with phones, running the above task) has the following processor (according to /proc/cpuinfo)

Yes you are correct. The bitness of the OS might not match the bitness of the process, in the cases where we run 32 bit on 64 bit OS.
To the best of my knowledge, on Android:

armeabi-v7a (and any other armv7 spelling) -> 32 bit userspace running on a 32 bit kernel
armv8l -> 32 bit userspace running on a 64 bit kernel (which is what we want for clank)
aarch64 -> 64 bit userspace running on a 64 bit kernel (which is what I expect only for webview)
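The mapping above can be encoded directly, so a trace's os-arch can be checked automatically. This is a sketch of the mapping as stated in this comment (not an exhaustive Android table), assuming a JSON trace with a top-level "metadata" object holding "os-arch"; the HTML trace viewer files linked in this thread would need the JSON extracted first.

```python
import json


def classify_os_arch(os_arch):
    """Return (userspace_bits, kernel_bits) for an Android os-arch string."""
    arch = os_arch.strip().lower()
    if arch.startswith("armv7") or arch.startswith("armeabi"):
        return (32, 32)  # 32-bit userspace on a 32-bit kernel
    if arch == "armv8l":
        return (32, 64)  # 32-bit userspace on a 64-bit kernel (what we want for Clank)
    if arch == "aarch64":
        return (64, 64)  # 64-bit userspace on a 64-bit kernel (expected only for WebView)
    raise ValueError("unrecognized os-arch: %r" % os_arch)


def os_arch_from_trace_json(text):
    """Read os-arch from a JSON trace, assuming a top-level "metadata" object."""
    return json.loads(text)["metadata"]["os-arch"]
```

With this, the two traces discussed here classify as `(32, 64)` for "armv8l" and `(64, 64)` for "aarch64", which is exactly the discrepancy being debugged.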


> I also discovered that the trace file has some metadata about what architecture it's run on. A sample trace file run on the swarming bot (https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/trace-file-id_3-2017-04-27_04-09-30-70215.html, gotten from https://chromium-swarm.appspot.com/task?id=35c638f6e9fe6210&refresh=10&show_raw=1) gives us 
 os-arch: "aarch64",

This trace is definitely a 64 bit process running on a 64 bit OS. The os-arch is aarch64 AND the addresses are definitely 64 bit.

> This is definitely running a 32-bit build.

Judging by the trace, not really, at least in terms of what we are "running".
Perhaps here we are "downloading" a 32 bit thing and testing a different, 64 bit, thing?

> 1 and 3 I'm not sure how to address. For 3, the virtual addresses point, could it be possible that the addresses are 64 bit, but internal to the application they're 32 bit? Not sure...
To the best of my knowledge, that's not possible.

> I'm thinking we can close this bug,
Not really: in the trace above, "some" 64 bit thing is running. I don't know what, but that is not what we want, and that is what this bug is about.

Juan and I looked into this, and I think I know what's going on.
From the  https://chromium-swarm.appspot.com/task?id=35c638f6e9fe6210&refresh=10&show_raw=1 :

run_telemetry_benchmark_as_googletest.py ... --browser=reference
and later
Downloading gs://chrome-telemetry/binary_dependencies/chrome_stable_03306b04e49ed3b0c4c29da84a128d76659624f2...

Which means that:
1) we are running a binary that we download from GCS, ignoring what we build. This explains why John is so sure the isolate input is 32 bit, and why I am so sure we are running a 64 bit version.
2) Why is the reference build 64 bit? Are we by any chance running 64 bit on all our ref builds? That would be quite unfortunate.
3) I don't know enough about swarming, but isn't this violating core swarming principles?
Essentially we have a set of isolate inputs, but then we dynamically download and test an arbitrary binary. Isn't this violating the assumption that fixed input + fixed test = same output?
To #13: that was looking at a reference build run, so the fact that Telemetry downloaded the APK from cloud storage is intended.

We need to look at a non reference build run. For example:
https://chromium-swarm.appspot.com/task?id=35cab5ab5ddfa510&refresh=10

The command is:
run_telemetry_benchmark_as_googletest.py ... --browser=android-chromium

A trace from this run:
https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/trace-file-id_99-2017-04-28_00-55-30-47535.html

*Having said that, the fact that the reference APK Telemetry uses is 64 bit is also a bug.
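Since the `--browser` value decides where the tested APK comes from, it is worth checking before reading anything into a task's traces. A minimal sketch, covering only the two browser types discussed here (`reference` is fetched from cloud storage, `android-chromium` uses the APK in the isolate); the function names and the returned labels are hypothetical, and other Telemetry browser types are reported as-is rather than classified.

```python
import shlex


def browser_type(command):
    """Extract the value of --browser from a benchmark command line, or None."""
    args = shlex.split(command)
    for i, arg in enumerate(args):
        if arg.startswith("--browser="):
            return arg.split("=", 1)[1]
        if arg == "--browser" and i + 1 < len(args):
            return args[i + 1]
    return None


def binary_source(command):
    """Say where the tested APK comes from, for the two cases in this thread."""
    browser = browser_type(command)
    if browser is None:
        return "unknown"
    if browser == "reference":
        return "cloud storage"  # Telemetry downloads the reference APK itself
    if browser == "android-chromium":
        return "isolate"  # the APK built and isolated for this revision
    return "unclassified: " + browser
```

Applied to the two tasks above, the first command maps to "cloud storage" and the second to "isolate", which is why only the second says anything about the 32-bit build under test.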
The chrome commit hash from the trace in #14 is 8ba4e2e2f7cab6455607be4902f72379c9e49060, so I am pretty sure that for non reference benchmark runs, Telemetry is running tests against the apk stored in the isolate.
Wait, so what is actually wrong is the non-swarming run. In other words:
* the current perf bots are 64 bit userspace running on a 64 bit kernel
* the swarming bots are 32 bit userspace running on a 64 bit kernel  

Quote:
"armeabi-v7a (and any other armv7 spelling) -> 32 bit userspace running on a 32 bit kernel
armv8l -> 32 bit userspace running on a 64 bit kernel (which is what we want for clank)
aarch64 -> 64 bit userspace running on a 64 bit kernel (which is what I expect only for webview)"


The non-swarming trace (Android Nexus5X Perf (2)):
https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/trace-file-id_10-2017-04-26_22-08-29-38454.html
metadata shows: "aarch64"

The swarming trace (Android Swarming N5X Tester):
https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/trace-file-id_10-2017-04-27_14-23-38-48740.html
metadata shows: "armv8l"

To make sure I haven't gone insane after so much clicking, people can double-check with the trace links in https://chromeperf.appspot.com/report?sid=d73519308f45565e854fab851005bdf3f148195f8d6bd49665e5a21985182142
Yep, the error is in the non-swarmed bot :-(

Copying gs://chrome-perf/Android arm64 Compile/full-build-linux_0c5dce88667d9ceeed1d90daa8485aa6c6009176.zip...
https://luci-logdog.appspot.com/v/?s=chrome%2Fbb%2Fchromium.perf%2FAndroid_Nexus5X_Perf__2_%2F3617%2F%2B%2Frecipes%2Fsteps%2Fextract_build%2F0%2Fstdout
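The `extract_build` log line above is enough to spot the problem, since the archive path names the compile builder. A minimal sketch of a guard, assuming archives live at `gs://<bucket>/<builder name>/<archive>.zip` as in this one log line (the layout is inferred from that single example, and the helper names are hypothetical):

```python
import re


def archive_path_from_log(line):
    """Pull the gs:// .zip path out of a 'Copying gs://...zip...' log line, or None."""
    # Builder names contain spaces, so match lazily up to the first ".zip".
    match = re.search(r"Copying (gs://.+?\.zip)", line)
    return match.group(1) if match else None


def is_arm64_archive(gs_path):
    """True if the builder directory in gs://<bucket>/<builder>/<file> mentions arm64."""
    parts = gs_path[len("gs://"):].split("/")
    if not gs_path.startswith("gs://") or len(parts) < 3:
        raise ValueError("unexpected archive path: %r" % gs_path)
    return "arm64" in parts[1].split()
```

Run on the log line above, this extracts `gs://chrome-perf/Android arm64 Compile/full-build-linux_0c5dce88667d9ceeed1d90daa8485aa6c6009176.zip` and flags it as arm64.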
>  A trace from this run:
https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/trace-file-id_99-2017-04-28_00-55-30-47535.html

Fantastic. This trace looks great to me, 32 bit code running on 64 bit os, which is what we ship.

> To make sure that I am not insane after so many clicking, people can double check with the traces link in https://chromeperf.appspot.com/report?sid=d73519308f45565e854fab851005bdf3f148195f8d6bd49665e5a21985182142

Checked. Red (which seems fyi -> I assume non-swarming) is bad (64 bit userspace), blue (seems swarming) is good (32 bit userspace)

So if I am reading everything correctly, the issue here is the other way round. We have been running 64 bit on non-swarming on N5X for a while now (and I am sure we had this bug in the past)
And now we are seeing a difference by running 32 bit on swarming.


Thanks for all the investigation. I realized after I posted the comment that I had linked to a swarming job which was running a reference build. I looked later at a regular swarming task, like you all did, and found it was running a 32 bit build.

So, I remember setting up the Android Nexus5X bots. I switched them over to the chromium recipe, and AFAIK I was told (by someone? not sure who) to use the 64 bit build on them. It's possible there was confusion, in that the Nexus 5Xs are 64 bit devices, but we should still use 32 bit builds on them.


Summary: Android Nexus5X perf bot should be using 32-bit build (was: Android swarming perf bot should be using 32-bit build)
Ok, I misunderstood what someone was saying about 64 bit builds. It's my fault they're running 64 bit builds; before I switched to the chromium recipe, they weren't. See https://chromium.googlesource.com/chromium/tools/build/+/b8c8c287878f1a60e0fb76791158435caae2de32/scripts/slave/recipe_modules/chromium_tests/chromium_perf.py#169 (target bits)

CL out to address this: https://chromium-review.googlesource.com/c/490636/
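A cheap way to keep this from regressing is a consistency check between each Android perf tester and the bitness of the builder feeding it. This is a hypothetical sketch: the dictionaries below are a simplified stand-in for the real builder specs in chromium_perf.py (whose structure differs), with tester names taken from the expectation files touched by the fix, and 32 assumed as the shipped bitness per the issue description.

```python
# Hypothetical, simplified builder table; NOT the real chromium_perf.py format.
BUILDERS = {
    "Android Compile": {"platform": "android", "target_bits": 32},
    "Android arm64 Compile": {"platform": "android", "target_bits": 64},
}

# Hypothetical tester -> parent builder mapping, as it should look after the fix.
TESTER_PARENTS = {
    "Android Nexus5X Perf (1)": "Android Compile",
    "Android Nexus5X Perf (2)": "Android Compile",
    "Android Nexus5X Perf (3)": "Android Compile",
}

SHIPPED_ANDROID_BITS = 32  # we ship 32-bit Chrome on Android


def mismatched_testers(builders, tester_parents, shipped_bits=SHIPPED_ANDROID_BITS):
    """Return sorted Android testers whose parent builder's bitness differs from what ships."""
    bad = []
    for tester, parent in tester_parents.items():
        spec = builders[parent]
        if spec["platform"] == "android" and spec["target_bits"] != shipped_bits:
            bad.append(tester)
    return sorted(bad)
```

With the post-fix mapping this returns an empty list; pointing "Android Nexus5X Perf (2)" back at "Android arm64 Compile" (the pre-fix state seen in the extract_build log) makes it report that tester.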


Summary: Android Nexus perf bots should be using 32-bit build (was: Android Nexus5X perf bot should be using 32-bit build)
We'll need to check the bisect bots too before we close this out.

Comment 24 by bugdroid1@chromium.org, May 2 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build/+/7e9590a51289827d249a9794913e6918df4d76fc

commit 7e9590a51289827d249a9794913e6918df4d76fc
Author: Simon <simonhatch@chromium.org>
Date: Tue May 02 15:55:57 2017

Bisect - Switch Nexus5X to 32bit builder.

TBR=dtu@chromium.org
Bug:  chromium:714110 
Change-Id: If5a27306a2b6ba26b37258a9fbfe608468fe8848
Reviewed-on: https://chromium-review.googlesource.com/493487
Reviewed-by: Simon Hatch <simonhatch@chromium.org>
Commit-Queue: Simon Hatch <simonhatch@chromium.org>

[modify] https://crrev.com/7e9590a51289827d249a9794913e6918df4d76fc/scripts/slave/recipe_modules/auto_bisect_staging/bisector.py


Comment 25 by bugdroid1@chromium.org, May 2 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build/+/eee167b93de7b16b804bd212a6080d02c571c149

commit eee167b93de7b16b804bd212a6080d02c571c149
Author: Stephen Martinis <martiniss@chromium.org>
Date: Tue May 02 18:20:30 2017

Switch nexus 5x to 32 bit build

It turns out we were running tests on the nexus 5x bot with 64 bit
builds, which we shouldn't have been doing.

Bug:  714110 
Change-Id: I5ba4196774e3fddd17de1e71d08e787e0867ef9e
Reviewed-on: https://chromium-review.googlesource.com/490636
Commit-Queue: Stephen Martinis <martiniss@chromium.org>
Reviewed-by: Michael Case <mikecase@chromium.org>

[modify] https://crrev.com/eee167b93de7b16b804bd212a6080d02c571c149/scripts/slave/recipe_modules/auto_bisect/bisector.py
[modify] https://crrev.com/eee167b93de7b16b804bd212a6080d02c571c149/scripts/slave/recipes/chromium.expected/full_chromium_perf_Android_Nexus5X_Perf__2_.json
[modify] https://crrev.com/eee167b93de7b16b804bd212a6080d02c571c149/scripts/slave/recipes/chromium.expected/full_chromium_perf_Android_arm64_Compile.json
[modify] https://crrev.com/eee167b93de7b16b804bd212a6080d02c571c149/scripts/slave/recipes/chromium.expected/full_chromium_perf_Android_Compile.json
[modify] https://crrev.com/eee167b93de7b16b804bd212a6080d02c571c149/scripts/slave/recipes/chromium.expected/full_chromium_perf_Android_Nexus5X_Perf__1_.json
[modify] https://crrev.com/eee167b93de7b16b804bd212a6080d02c571c149/scripts/slave/recipe_modules/chromium_tests/chromium_perf.py
[modify] https://crrev.com/eee167b93de7b16b804bd212a6080d02c571c149/scripts/slave/recipes/chromium.expected/full_chromium_perf_Android_Nexus5X_Perf__3_.json

This seems fixed now; memory measurements correlate pretty well:
https://chromeperf.appspot.com/report?sid=d73519308f45565e854fab851005bdf3f148195f8d6bd49665e5a21985182142&start_rev=465994&end_rev=469268

Thanks Stephen!

Comment 27 by sheriffbot@chromium.org, Jul 19 2017

Labels: Hotlist-Google
Owner: ----
Status: Fixed (was: Started)
