Device flakiness on chromium.perf: Android Galaxy S5
Issue description
Link to buildbot status page: https://build.chromium.org/p/chromium.perf/builders/Android%20Galaxy%20S5%20Perf%20%282%29
32089721b6a351d1: missing [logdog]
,
Aug 3 2016
Can this wait until next week? bpastene has a script that files a bug listing all the offline phones every Monday, and Hwops handles it. If it can wait, this bot will get fixed sometime next week.
,
Aug 4 2016
(Similar to issue 634054.) We're seeing lots of device issues (purple) on Android Galaxy S5:
Android Galaxy S5 Perf (1): https://build.chromium.org/p/chromium.perf/builders/Android%20Galaxy%20S5%20Perf%20%281%29/builds/3434
Android Galaxy S5 Perf (2): https://build.chromium.org/p/chromium.perf/builders/Android%20Galaxy%20S5%20Perf%20%282%29/builds/3004
Android Galaxy S5 Perf (3): https://build.chromium.org/p/chromium.perf/builders/Android%20Galaxy%20S5%20Perf%20%283%29/builds/2842
Could someone look at them?
#2: +cc sullivan. No, I don't think this can wait. We need the bots to run benchmarks constantly to make sure that Chrome doesn't regress performance. Could the script run more often? Every day? Once an hour?
,
Aug 4 2016
Issue 634027 has been merged into this issue.
,
Aug 5 2016
Ping. Android Galaxy S5 Perf (1) still has some devices offline: https://build.chromium.org/p/chromium.perf/builders/Android%20Galaxy%20S5%20Perf%20%281%29/builds/3446
,
Aug 5 2016
Re: Android Galaxy S5 Perf (1), see https://bugs.chromium.org/p/chromium/issues/detail?id=634027. I thought one of the major benefits of having multiple devices was redundancy? I'll have a look at Android Galaxy S5 Perf (2) later today.
,
Aug 5 2016
The Android perf tests really have no redundancy for when a device goes offline. When a device fails, none of the tests allocated to that device run. This is because the same tests need to run on the same device from run to run: different devices yield different values for the same test, so we cannot do between-run comparisons if the tests run on different devices.
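To illustrate why pinning removes redundancy, here is a minimal sketch. It assumes a hypothetical scheme that pins each benchmark to a device by a stable hash of its name; `assign_tests` and `runnable_tests` are illustrative helpers, not the actual chromium.perf sharding code. When a pinned device is offline, its benchmarks are skipped rather than moved to another phone, because moving them would break between-run comparisons.

```python
import hashlib


def assign_tests(tests, devices):
    """Pin each test to one device via a stable hash (hypothetical scheme).

    A given test name maps to the same device on every run as long as the
    device list, and its order, stay fixed.
    """
    assignment = {}
    for test in tests:
        digest = hashlib.sha1(test.encode("utf-8")).hexdigest()
        assignment[test] = devices[int(digest, 16) % len(devices)]
    return assignment


def runnable_tests(assignment, online):
    """Return the tests whose pinned device is online.

    Tests pinned to an offline device are skipped, not rerouted.
    """
    return sorted(t for t, d in assignment.items() if d in online)
```

Rerouting a skipped test to a healthy phone would keep the bot green, but the resulting numbers could not be compared with earlier runs on the original device.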
,
Aug 5 2016
Understood. What I notice with the Galaxys is that they tend to "come and go" on their own. On Android Galaxy S5 Perf (1) I unplugged one live device, and one that was missing came back. Reconnect the unplugged device and the one that came back goes missing again. So there is extreme flakiness, and trying to narrow down the culprit(s) is a cat-and-mouse game. Side note: it almost appears that trying to reset a particular device in the device_recovery step does more harm than good, stability-wise. It's definitely the most fragile platform you have.
,
Aug 8 2016
Ping on this; it looks like these devices are still offline: https://build.chromium.org/p/chromium.perf/builders/Android%20Galaxy%20S5%20Perf%20%282%29
Device 32085d1787be514b
Device 32089721b6a351d1
,
Aug 9 2016
Another ping: build21-b1 only has 5 devices connected: https://build.chromium.org/p/chromium.perf/builders/Android%20Galaxy%20S5%20Perf%20%281%29
build23-b1 has device 3208d22daac2518f blacklisted: https://build.chromium.org/p/chromium.perf/builders/Android%20Galaxy%20S5%20Perf%20%283%29/builds/2888
,
Aug 9 2016
,
Aug 9 2016
Changing the title of this bug since we are all over the map. Side note: the device_recovery step is what seems to be hosing these S5s.
Re: build21-b1, see https://bugs.chromium.org/p/chromium/issues/detail?id=634027 for context. The 5 devices have been stable. Yesterday I flashed the two replacements and let them charge overnight (these don't support bc). The new devices are 32082067745c515f and 3208df23b0c251e1. So the full complement is now:
Checking 3208851faca351f3... samsung/k3gxx/k3g:5.0/LRX21T/G900HXXE1BOH4:eng/test-keys
Checking 3208e0600bb251f3... samsung/k3gxx/k3g:5.0/LRX21T/G900HXXE1BOH4:eng/test-keys
Checking 3208cf5e05b2517f... samsung/k3gxx/k3g:5.0/LRX21T/G900HXXE1BOH4:eng/test-keys
Checking 320861234c117165... samsung/k3gxx/k3g:5.0/LRX21T/G900HXXE1BOH4:eng/test-keys
Checking 3208584f952c61ef... samsung/k3gxx/k3g:5.0/LRX21T/G900HXXE1BOH4:eng/test-keys
Checking 32082067745c515f... samsung/k3gxx/k3g:5.0/LRX21T/G900HXXE1BOH4:eng/test-keys
Checking 3208df23b0c251e1... samsung/k3gxx/k3g:5.0/LRX21T/G900HXXE1BOH4:eng/test-keys
build22-b1: device_recovery is at play here. I'm going to swap out the hub to see if the situation improves.
https://uberchromegw.corp.google.com/i/chromium.perf/builders/Android%20Galaxy%20S5%20Perf%20%282%29/builds/3060 all good
https://uberchromegw.corp.google.com/i/chromium.perf/builders/Android%20Galaxy%20S5%20Perf%20%282%29/builds/3061 device_recovery blacklists one device for "USB failure"
https://uberchromegw.corp.google.com/i/chromium.perf/builders/Android%20Galaxy%20S5%20Perf%20%282%29/builds/3062 all good
https://uberchromegw.corp.google.com/i/chromium.perf/builders/Android%20Galaxy%20S5%20Perf%20%282%29/builds/3063 device_recovery blacklists 4 devices (offline/missing/offline/USB failure)
https://uberchromegw.corp.google.com/i/chromium.perf/builders/Android%20Galaxy%20S5%20Perf%20%282%29/builds/3064 all good
I'll go back to build23-b1 after I'm done with the hub swap on build22-b1.
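Because devices on these hosts come and go, a quick way to see which expected serials are missing or not ready is to compare the full complement listed above against the output of `adb devices`. A minimal sketch, assuming `adb` is on PATH and that its output uses the usual `serial<TAB>state` lines; `parse_adb_devices`, `find_missing_devices` and `check_host` are hypothetical helpers, not part of devil or the bot scripts.

```python
import subprocess

# Full complement on build21-b1 as listed in the comment above.
EXPECTED = {
    "3208851faca351f3", "3208e0600bb251f3", "3208cf5e05b2517f",
    "320861234c117165", "3208584f952c61ef", "32082067745c515f",
    "3208df23b0c251e1",
}


def parse_adb_devices(output):
    """Map serial -> state ('device', 'offline', 'unauthorized', ...).

    Only tab-separated lines are device entries; the
    'List of devices attached' header and blank lines are skipped.
    """
    states = {}
    for line in output.splitlines():
        if "\t" in line:
            serial, _, state = line.partition("\t")
            states[serial.strip()] = state.strip()
    return states


def find_missing_devices(expected, states):
    """Return (missing, not_ready) serial lists for one host."""
    present = set(states)
    missing = sorted(set(expected) - present)
    not_ready = sorted(s for s in set(expected) & present if states[s] != "device")
    return missing, not_ready


def check_host(expected=EXPECTED):
    """Run `adb devices` on this host (assumes adb is on PATH)."""
    out = subprocess.run(["adb", "devices"], capture_output=True,
                         text=True, check=True).stdout
    return find_missing_devices(expected, parse_adb_devices(out))
```

A check like this only reports what adb sees at one instant, so with devices that flap it is more useful when run repeatedly than as a one-off.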
,
Aug 9 2016
build22-b1. Hub and cables replaced. Effective starting with https://uberchromegw.corp.google.com/i/chromium.perf/builders/Android%20Galaxy%20S5%20Perf%20%282%29/builds/3065
,
Aug 9 2016
build23-b1: same device_recovery-induced flakiness here. Devices are sometimes flagged as missing/offline/USB error. Replacing the hub/cables on this slave.
,
Aug 9 2016
build23-b1: hub and usb cables replaced. Effective starting with https://build.chromium.org/p/chromium.perf/builders/Android%20Galaxy%20S5%20Perf%20%283%29/builds/2891
,
Aug 10 2016
Revisiting build21-b1: replaced what look like flaky devices 320861234c117165 and 3208cf5e05b2517f with 32085a73842c615b and 3208e0a226fa51b7. This will be effective starting with https://uberchromegw.corp.google.com/i/chromium.perf/builders/Android%20Galaxy%20S5%20Perf%20%281%29/builds/3499
,
Aug 10 2016
Looking at build22-b1: +cc jbudorick, stip for some input. Here the device_recovery step seems to blacklist random S5s for "USB failure". It does report offline devices correctly. I've already swapped out the hub and cables. I believe this step is reporting spurious USB failures in a lot of cases. A couple of examples:
https://build.chromium.org/p/chromium.perf/builders/Android%20Galaxy%20S5%20Perf%20%282%29/builds/3075/steps/device_recovery/logs/stdio : 32081d5f765c510d, 3208531995be5145, 3208dd33a9c25169, 32085d1787be514b blacklisted due to "USB failure"
The very next build, https://build.chromium.org/p/chromium.perf/builders/Android%20Galaxy%20S5%20Perf%20%282%29/builds/3076/steps/device_recovery/logs/stdio : 32081d5f765c510d blacklisted due to "USB failure"
The build after that blacklists 3208dd33a9c25169 for the same reason: https://build.chromium.org/p/chromium.perf/builders/Android%20Galaxy%20S5%20Perf%20%282%29/builds/3077/steps/device_recovery/logs/stdio
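One way to check whether the "USB failure" blacklisting follows specific bad phones or hits devices more or less at random is to tally blacklist entries per serial across consecutive builds. A minimal sketch using only the three builds cited above (serials transcribed by hand from those device_recovery logs); `blacklist_counts` is a hypothetical helper, and three builds are far too few to draw a firm conclusion.

```python
from collections import Counter

# Serials blacklisted for "USB failure" on build22-b1, per the logs cited above.
BLACKLISTED_BY_BUILD = {
    3075: ["32081d5f765c510d", "3208531995be5145",
           "3208dd33a9c25169", "32085d1787be514b"],
    3076: ["32081d5f765c510d"],
    3077: ["3208dd33a9c25169"],
}


def blacklist_counts(by_build):
    """Count how many builds blacklisted each serial (each build counted once per serial)."""
    counts = Counter()
    for serials in by_build.values():
        counts.update(set(serials))
    return counts


if __name__ == "__main__":
    for serial, n in blacklist_counts(BLACKLISTED_BY_BUILD).most_common():
        print(serial, n)
```

A serial that keeps recurring across many builds points toward a bad phone, like 32089721b6a351d1 later in this thread. Serials that appear once and then come back healthy point more toward the detection step or the USB path.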
,
Aug 12 2016
Issue 637277 has been merged into this issue.
,
Aug 12 2016
On build22-b1, 32089721b6a351d1 seems to be a persistent bad apple. Just replaced it with 320851777626611b. This will be effective starting with https://uberchromegw.corp.google.com/i/chromium.perf/builders/Android%20Galaxy%20S5%20Perf%20%282%29/builds/3114
,
Aug 19 2016
bpastene: can you take a look at these bots as well? I got confused by the comments in bug 638679; these are the ones that have been down for 2 weeks (the other ones are critical for BattOr testing).
,
Aug 19 2016
,
Aug 19 2016
I talked to Peter in the hallway yesterday. He mentioned that he's going to talk to stip/ben about this.
,
Aug 22 2016
,
Aug 22 2016
Issue 639887 has been merged into this issue.
,
Aug 22 2016
Issue 638743 has been merged into this issue.
,
Aug 22 2016
Issue 638739 has been merged into this issue.
,
Aug 25 2016
On Android Galaxy S5 Perf (1) I noticed that 32082067745c515f is flaky. Just replaced it with 3208e623c90051f7. Effective starting with https://uberchromegw.corp.google.com/i/chromium.perf/builders/Android%20Galaxy%20S5%20Perf%20%281%29/builds/3640
,
Aug 26 2016
I'm seeing two more phones down on "Android Galaxy S5 Perf (3)"; 3208cd5005b25183 and 32089f2db2a351c5. Is it useful to report these in here? Should I file a separate bug? (I'm reporting these because I'm on the perf bot rotation today)
,
Aug 26 2016
Might as well file it here. These devices come and go depending on how the device_recovery step treats them.
,
Sep 2 2016
An update: https://codereview.chromium.org/2295933002 landed, which disabled USB resets in the device_recovery step. Yesterday I cleaned up the actual stale devices on the slaves, so now they report all devices available. Let's see what happens over the weekend. Effective starting with:
https://uberchromegw.corp.google.com/i/chromium.perf/builders/Android%20Galaxy%20S5%20Perf%20%281%29/builds/3761
https://uberchromegw.corp.google.com/i/chromium.perf/builders/Android%20Galaxy%20S5%20Perf%20%282%29/builds/3387
https://uberchromegw.corp.google.com/i/chromium.perf/builders/Android%20Galaxy%20S5%20Perf%20%283%29/builds/3160
,
Sep 19 2016
It looks like this is still continuing: https://uberchromegw.corp.google.com/i/chromium.perf/builders/Android%20Galaxy%20S5%20Perf%20(1) What the heck are we supposed to do here? Samsung Galaxy S5 Perf (1) hasn't had a green run in the last 200 runs (https://uberchromegw.corp.google.com/i/chromium.perf/builders/Android%20Galaxy%20S5%20Perf%20%281%29?numbuilds=400). Nor has Samsung Galaxy S5 Perf (2) (https://uberchromegw.corp.google.com/i/chromium.perf/builders/Android%20Galaxy%20S5%20Perf%20%282%29?numbuilds=200) or Samsung Galaxy S5 Perf (3) (https://uberchromegw.corp.google.com/i/chromium.perf/builders/Android%20Galaxy%20S5%20Perf%20%283%29?numbuilds=200). Even attempting to keep these up is a pretty big burden on the perfbot health sheriffs and infra labs. sullivan@ and nednguyen@, any idea what we should do here?
,
Sep 19 2016
John, did we turn USB resetting back on for all Android bots? If so, we could try making it so that just Samsung devices do not reset USB.
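A minimal sketch of that proposal, assuming the recovery step can read each device's build fingerprint (the `samsung/k3gxx/...` strings quoted earlier in this thread). `should_reset_usb` and `devices_to_reset` are hypothetical helpers, not part of the actual device_recovery code.

```python
def should_reset_usb(fingerprint, skip_brands=("samsung",)):
    """Return False for devices whose fingerprint brand is in skip_brands.

    Assumes the standard Android fingerprint layout
    brand/product/device:release/id/incremental:type/tags, e.g.
    samsung/k3gxx/k3g:5.0/LRX21T/G900HXXE1BOH4:eng/test-keys.
    """
    brand = fingerprint.split("/", 1)[0].strip().lower()
    return brand not in skip_brands


def devices_to_reset(fingerprints, skip_brands=("samsung",)):
    """Filter a {serial: fingerprint} map to the serials that should get a USB reset."""
    return sorted(serial for serial, fp in fingerprints.items()
                  if should_reset_usb(fp, skip_brands))
```

Keying on the brand rather than on individual serials means replacement phones, which happen often on these hosts, are covered without editing a list.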
,
Sep 19 2016
From https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.perf%2FAndroid_Galaxy_S5_Perf__1_%2F4061%2F%2B%2Frecipes%2Fsteps%2Fv8.browsing_mobile_ignition%2F0%2Fstdout:
CRITICAL:root:STDERR: [0917/135846:ERROR:host_forwarder_main.cc(392)] ERROR: Connection to device failed.
ERROR: Existing controllers:
ERROR: 42931:43241
This appears to be device 3208e0600bb251f3, which is not in our weekly ticket (https://gutsv3.corp.google.com/#ticket/23252763). bpastene@, can you investigate why we're not flagging this?
,
Sep 19 2016
Our USB story is not consistent. Let me fix that up. On build21-b1 (Android Galaxy S5 Perf (1)) the devices are connected to a USB 2.0 hub and onto a USB 2.0 host controller. These devices appear to be more stable? On build22-b1 (Android Galaxy S5 Perf (2)) and build23-b1 (Android Galaxy S5 Perf (3)) the devices are connected to a USB 3.0 hub and onto a USB 2.0 host controller. The devices here are much more flaky. As a test, I'm going to switch build23-b1 to a host that supports USB 3.0.
,
Sep 19 2016
#32: I turned it back on for chromium.perf in https://codereview.chromium.org/2318203002
,
Sep 19 2016
Re #33: Because device_status on that build saw all devices as healthy: https://uberchromegw.corp.google.com/i/chromium.perf/builders/Android%20Galaxy%20S5%20Perf%20%281%29/builds/4061/steps/device_status/logs/json.output Look at that device's steps on that build. It runs fine, but the forwarder always crashes after pulling /proc/net/tcp. Whoever owns the forwarder should take a look.
,
Sep 19 2016
While this does look like a forwarder issue (so I'm self-assigning), we only pull /proc/net/tcp as part of failure diagnosis. It's not the cause of the failure.
,
Sep 22 2016
Given the recent runs on this bot, it looks like this might be a device issue and a forwarder issue. Will stop by the lab tomorrow.
,
Sep 22 2016
No clear sign of device malfeasance this morning in the lab.
,
Sep 22 2016
Going to have to try to catch this in the middle of a run in which it's failing to forward to grab the host forwarder daemon log. Unfortunately, 251f3 is blacklisted in the current run.
,
Sep 23 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/883442d271d1257505fb99d0802b5a1a0c201d51
commit 883442d271d1257505fb99d0802b5a1a0c201d51
Author: catapult-deps-roller <catapult-deps-roller@chromium.org>
Date: Fri Sep 23 01:28:56 2016
Roll src/third_party/catapult/ b803018ac..7bd10eda4 (9 commits).
https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/b803018ac776..7bd10eda47f1
$ git log b803018ac..7bd10eda4 --date=short --no-merges --format='%ad %ae %s'
2016-09-22 nednguyen Make use_live_traffic in FakeNetworkController default to False
2016-09-22 jbudorick [Android] Attempt to grab the forwarder daemon logs on map failure.
2016-09-22 aiolos Remove warning when a ref build is set as monitored.
2016-09-22 charliea [trace model] Add .range accessor for Event
2016-09-22 nednguyen [Telemetry] Enable typ's discovery flags for telemetry's unittest_runner framework
2016-09-22 sullivan Add ability to query for test patterns of length 8.
2016-09-22 bccheng Explicitly initialize the network controller
2016-09-22 nednguyen Add logging to _FileLock to debug race condition when multiple processes download a same file
2016-09-22 nednguyen [Telemetry] Start ts_proxy_server with host=None when --use-live-site flag is enabled
BUG=634052,643649,647340,643320
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel
TBR=catapult-sheriff@chromium.org
Review-Url: https://codereview.chromium.org/2365803002
Cr-Commit-Position: refs/heads/master@{#420534}
[modify] https://crrev.com/883442d271d1257505fb99d0802b5a1a0c201d51/DEPS
,
Sep 23 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/cc5e9337589e6bb100b7a0c5d9493557c11b379c
commit cc5e9337589e6bb100b7a0c5d9493557c11b379c
Author: jbudorick <jbudorick@chromium.org>
Date: Fri Sep 23 19:43:03 2016
[Android] Add device serial to all host_controller log messages.
BUG=634052
Review-Url: https://codereview.chromium.org/2366763003
Cr-Commit-Position: refs/heads/master@{#420699}
[modify] https://crrev.com/cc5e9337589e6bb100b7a0c5d9493557c11b379c/tools/android/forwarder2/host_controller.cc
[modify] https://crrev.com/cc5e9337589e6bb100b7a0c5d9493557c11b379c/tools/android/forwarder2/host_controller.h
[modify] https://crrev.com/cc5e9337589e6bb100b7a0c5d9493557c11b379c/tools/android/forwarder2/host_forwarder_main.cc
,
Sep 24 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/2e70970b32c2061fc14fa37a40276f484d772287
commit 2e70970b32c2061fc14fa37a40276f484d772287
Author: catapult-deps-roller <catapult-deps-roller@chromium.org>
Date: Sat Sep 24 09:29:34 2016
Roll src/third_party/catapult/ a8deb272b..efbf303a5 (1 commit).
https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/a8deb272b550..efbf303a5360
$ git log a8deb272b..efbf303a5 --date=short --no-merges --format='%ad %ae %s'
2016-09-23 jbudorick [devil] update the forwarder binaries.
BUG=634052
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel
TBR=catapult-sheriff@chromium.org
Review-Url: https://codereview.chromium.org/2368813002
Cr-Commit-Position: refs/heads/master@{#420838}
[modify] https://crrev.com/2e70970b32c2061fc14fa37a40276f484d772287/DEPS
,
Sep 27 2016
,
Sep 30 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/3d77e97fb46b4f0a9f9255c30b617abf6721ad33
commit 3d77e97fb46b4f0a9f9255c30b617abf6721ad33
Author: jbudorick <jbudorick@chromium.org>
Date: Fri Sep 30 14:59:17 2016
[Android] Add --unmap-all to forwarder2.
In some scenarios (e.g., single-device restart), we want to unmap all ports forwarded from a given device up to the host and clear the existing cached adb port for that device. We want to be able to do this even if the calling process doesn't know all of those ports. This change adds the --unmap-all command to forwarder2 to support such use cases.
BUG=634052,650674
Review-Url: https://codereview.chromium.org/2381063004
Cr-Commit-Position: refs/heads/master@{#422113}
[modify] https://crrev.com/3d77e97fb46b4f0a9f9255c30b617abf6721ad33/tools/android/forwarder2/host_forwarder_main.cc
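The point of `--unmap-all` is that a caller can drop every forwarded port for one device without knowing which ports were mapped. A hypothetical Python model of that bookkeeping follows: a registry keyed by device serial. It is only an illustration of the idea in the commit message, not forwarder2's C++ implementation or its command-line interface; `PortMapRegistry` and its methods are invented for this sketch.

```python
class PortMapRegistry:
    """Hypothetical model of host-side forwarding state, keyed by device serial."""

    def __init__(self):
        self._maps = {}       # serial -> {device_port: host_port}
        self._adb_port = {}   # serial -> cached adb port for that device

    def map_port(self, serial, device_port, host_port, adb_port):
        self._maps.setdefault(serial, {})[device_port] = host_port
        self._adb_port[serial] = adb_port

    def unmap(self, serial, device_port):
        # Per-port unmapping: the caller must already know the port.
        self._maps.get(serial, {}).pop(device_port, None)

    def unmap_all(self, serial):
        """Drop every mapping and the cached adb port for one device.

        Other devices' state is untouched. Returns the device ports removed.
        """
        removed = sorted(self._maps.pop(serial, {}))
        self._adb_port.pop(serial, None)
        return removed

    def mapped_ports(self, serial):
        return sorted(self._maps.get(serial, {}))

    def cached_adb_port(self, serial):
        return self._adb_port.get(serial)
```

Scoping the operation to a single serial matters for the single-device restart case the commit mentions: one phone's forwarding state can be reset without disturbing tests running on the other phones attached to the same host.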
,
Oct 1 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/dccd754c3b5cc5be5c809ffd6a9b742053f25c76
commit dccd754c3b5cc5be5c809ffd6a9b742053f25c76
Author: jbudorick <jbudorick@chromium.org>
Date: Sat Oct 01 01:51:20 2016
[Android] Run shell commands from the forwarder without passing fds.
The forwarder daemon was running commands with system(). This would give the newly forked process copies of the same file handles held by the daemon, notably including the unix domain socket. If the adb server wasn't already running and the daemon called an adb command, the adb server would be forked from the adb client process with those same file handles -- including the unix domain socket. This would interfere both with shutting down the host forwarder daemon (as we'd see the unix domain socket still held by the adb server) and with subsequent attempts to bring it up (same reason).
BUG=634052,650674
Review-Url: https://codereview.chromium.org/2374183008
Cr-Commit-Position: refs/heads/master@{#422263}
[modify] https://crrev.com/dccd754c3b5cc5be5c809ffd6a9b742053f25c76/tools/android/forwarder2/host_forwarder_main.cc
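The leak described in this commit is general POSIX behavior: a process created with fork() and exec(), which is what system() does, keeps every open descriptor that is not marked close-on-exec. The following sketch (POSIX only) demonstrates that effect with a Unix domain socket. It is not the forwarder fix itself, whose exact mechanism is not described here. It shows that a child inherits the socket when descriptors are passed through, and does not inherit it when the spawner closes inherited descriptors, here via `subprocess`'s `close_fds=True`. `child_sees_socket` and `demo` are hypothetical helpers.

```python
import os
import socket
import subprocess
import sys

# Child-side probe: is argv[1] an open *socket* descriptor in this process?
# Checking S_ISSOCK avoids false positives if the child reused the fd number
# for an ordinary file during interpreter startup.
PROBE = (
    "import os, stat, sys\n"
    "try:\n"
    "    mode = os.fstat(int(sys.argv[1])).st_mode\n"
    "    print('socket' if stat.S_ISSOCK(mode) else 'other')\n"
    "except OSError:\n"
    "    print('closed')\n"
)


def child_sees_socket(fd, close_fds):
    """Spawn a child process and report whether it inherited `fd` as a socket."""
    result = subprocess.run(
        [sys.executable, "-c", PROBE, str(fd)],
        capture_output=True, text=True, check=True, close_fds=close_fds,
    )
    return result.stdout.strip() == "socket"


def demo():
    parent_end, other_end = socket.socketpair()
    try:
        # C code gets inheritable descriptors unless it asks for CLOEXEC;
        # Python 3 defaults to non-inheritable, so opt in to mimic C.
        os.set_inheritable(parent_end.fileno(), True)
        leaked = child_sees_socket(parent_end.fileno(), close_fds=False)
        isolated = not child_sees_socket(parent_end.fileno(), close_fds=True)
        return leaked, isolated
    finally:
        parent_end.close()
        other_end.close()
```

On a POSIX host, `demo()` should report the socket as leaked in the first case and absent in the second. The first case is the situation the commit describes: a grandchild such as a freshly started adb server can end up holding the daemon's socket.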
,
Oct 1 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/84526ade9b6d246a8834309d0519d2255c0db91d
commit 84526ade9b6d246a8834309d0519d2255c0db91d
Author: catapult-deps-roller <catapult-deps-roller@chromium.org>
Date: Sat Oct 01 08:06:33 2016
Roll src/third_party/catapult/ f00b66029..507bed462 (2 commits).
https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/f00b66029517..507bed4626dd
$ git log f00b66029..507bed462 --date=short --no-merges --format='%ad %ae %s'
2016-09-30 jbudorick [telemetry] Update {device,host}_forwarder binaries.
2016-09-30 jbudorick [devil] Use --unmap-all in Forwarder.UnmapAllDevicePorts.
BUG=634052,650674,634052,650674
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel
TBR=catapult-sheriff@chromium.org
Review-Url: https://codereview.chromium.org/2378773016
Cr-Commit-Position: refs/heads/master@{#422308}
[modify] https://crrev.com/84526ade9b6d246a8834309d0519d2255c0db91d/DEPS
,
Oct 3 2016
Issue 652251 has been merged into this issue.
,
Oct 3 2016
The following revision refers to this bug:
https://chrome-internal.googlesource.com/chrome-golo/chrome-golo.git/+/b303c1e5595e356c281e0d066934a50225fe7dc0
commit b303c1e5595e356c281e0d066934a50225fe7dc0
Author: pschmidt <pschmidt@google.com>
Date: Mon Oct 03 19:22:19 2016
,
Oct 6 2016
Issue 638404 has been merged into this issue.
,
Oct 6 2016
Re #51: I've been fiddling with that bot; don't look at it for an indication of how the Galaxys are performing in the lab.
,
Oct 6 2016
#51: beware that this issue is for the S5 on chromium.perf, not chromium.perf.fyi.
,
Oct 10 2016
,
Oct 24 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/tools/build.git/+/284d0f102f55bf7c629a297d89c04b8fde110020
commit 284d0f102f55bf7c629a297d89c04b8fde110020
Author: martiniss <martiniss@chromium.org>
Date: Mon Oct 24 21:03:51 2016
Disable galaxy and new mac perf bots for SOM
These should be re-enabled once they're sheriffable
BUG=634052,639530
Review-Url: https://codereview.chromium.org/2286973002
[modify] https://crrev.com/284d0f102f55bf7c629a297d89c04b8fde110020/scripts/slave/gatekeeper.json
,
Nov 15 2016
Comment 1 by benhenry@chromium.org, Aug 3 2016