Issue metadata
Sign in to add a comment
|
chromium.perf/Android One Perf generating multiple chrome processes |
||||||||||||||||||||||
Issue descriptionmemory.top_10_mobile/http_www_amazon_com_gp_aw_s_k_nexus in memory.top_10_mobile failing on chromium.perf/Android One Perf Builders failed on: - Android One Perf: https://build.chromium.org/p/chromium.perf/builders/Android%20One%20Perf
,
Nov 15 2017
It seems to be fairly flaky and a different failing test each time @Ned, could this have to do with the "at most one instance of process org.chromium.chrome expected" exception above? or is that a new fairly common thing in logs that isn't an actual failure cause?
,
Nov 15 2017
It looks like this might be a telemetry bug so I'm going to merge all the related failures to one bug. v8.runtimestats.browsing_mobile failing on chromium.perf/Android One Perf has the same failure
,
Nov 15 2017
Issue 785433 has been merged into this issue.
,
Nov 16 2017
I've seen this a few times in the past, not sure how two Chrome browser processes end up being alive in the device. This is the relevant bit: Exception: At most one instance of process org.chromium.chrome expected but found 'org.chromium.chrome': ['14627', '14765'] +primiano, any clue of why could this be happening? I've always assumed there can only be at most one browser process at any given time?
,
Nov 16 2017
,
Nov 17 2017
,
Nov 20 2017
Chatted offline with primiano about this one. I'll try to see if we can reproduce locally or, failing that, convince the bots to give us more logcat/traces to help debugging.
,
Dec 4 2017
Issue 791138 has been merged into this issue.
,
Dec 4 2017
Sorry, this has been slipping from my task queue. Will try to get on it tomorrow.
,
Dec 5 2017
The following revision refers to this bug: https://chromium.googlesource.com/catapult/+/f8c2071f10ad72688b1634471e51e18e2980ab5d commit f8c2071f10ad72688b1634471e51e18e2980ab5d Author: Juan Antonio Navarro Perez <perezju@google.com> Date: Tue Dec 05 17:03:21 2017 [Telemetry] Raise AppCrashException on multiple browser PIDs Hoping this will help debug crbug.com/785446 . Also taking the chance to replace GetPids with the more specific GetApplicationPids. Bug: chromium:785446 Change-Id: Iad57935c11c6302a85a318e31bbeb5a4bfb97c6f Reviewed-on: https://chromium-review.googlesource.com/808927 Reviewed-by: John Budorick <jbudorick@chromium.org> Reviewed-by: Ned Nguyen <nednguyen@google.com> Commit-Queue: Juan Antonio Navarro Pérez <perezju@chromium.org> [modify] https://crrev.com/f8c2071f10ad72688b1634471e51e18e2980ab5d/devil/devil/android/device_utils.py [modify] https://crrev.com/f8c2071f10ad72688b1634471e51e18e2980ab5d/telemetry/telemetry/internal/backends/chrome/android_browser_backend.py
,
Dec 6 2017
Got more logs and debug info from: https://logs.chromium.org/v/?s=chrome%2Fbb%2Fchromium.perf%2FAndroid_One_Perf%2F816%2F%2B%2Frecipes%2Fsteps%2Fmemory.top_10_mobile_on_Android%2F0%2Fstdout Full logcat of one such failure attached. Relevant highlights here: AppCrashException: Error getting browser PID: (device: AG860440GCE22GC) Expected a single PID for 'org.chromium.chrome' but found: ['22181', '22322']. [ we see the browser starting with pid 22181 ] I/ActivityManager( 523): Start proc 22181:org.chromium.chrome/u0a74 for activity org.chromium.chrome/com.google.android.apps.chrome.Main I/cr_CommandLineInitUtil(22181): Initializing command line from alternative file /data/local/tmp/chrome-command-line I/cr_InstantAppsHandler(22181): Not handling with Instant Apps because Chrome is not default or there's a specialized handler [ and then a bit later 22322 comes up ] I/dex2oat (22322): /system/bin/dex2oat --runtime-arg -classpath --runtime-arg --instruction-set=arm --instruction-set-features=div --runtime-arg -Xrelocate --boot-image=/system/framework/boot.art --dex-file=/data/data/org.chromium.chrome/app_dex/webapk4.dex --oat-fd=72 --oat-location=/data/data/org.chromium.chrome/app_dex/arm/webapk4.dex --runtime-arg -Xms64m --runtime-arg -Xmx512m --compiler-filter=space D/AndroidRuntime(22327): >>>>>> START com.android.internal.os.RuntimeInit uid 0 <<<<<< D/AndroidRuntime(22327): CheckJNI is OFF E/memtrack(22327): Couldn't load memtrack module (No such file or directory) E/android.os.Debug(22327): failed to load memtrack module: -2 W/dex2oat (22322): Compilation of void a.a.a.a.a.<clinit>() took 103.004ms D/AndroidRuntime(22327): Calling main entry com.android.commands.uiautomator.Launcher W/dex2oat (22322): Compilation of void org.chromium.build.BuildHooks.<init>() took 115.562ms W/dex2oat (22322): Compilation of void org.chromium.webapk.lib.runtime_library.IWebApkApi$Stub$Proxy.cancelNotification(java.lang.String, int) took 111.198ms I/dex2oat (22322): dex2oat took 498.897ms (threads: 4) arena alloc=107KB java alloc=14KB native alloc=895KB free=24KB
,
Dec 6 2017
+Pasko & Maria in case they have some clues of what's going on
,
Dec 6 2017
This runs dex2oat (the ART ahead of time compiler) for some parts of Chrome that were not compiled before. I think Telemetry might mistakenly pick up this PID as the one belonging to Chrome because the package name (org.chromium.chrome) is part of the commandline. https://xkcd.com/208/
,
Dec 6 2017
+jbudorick well, this was probably bound to happen at some point or another. Our "GetPids" gets the output of "adb shell ps" and then for each line [1]: ps_data = line.split() pid, process = ps_data[1], ps_data[-1] which will of course fail to detect the correct "process" name if there are spaces on it. This gets trickier because the output of that command on different Android uses sometimes (slightly) different layouts (number of columns). Thinking now what would be the more robust fix for this ... [1]: https://github.com/catapult-project/catapult/blob/a9cf2fa54a3334ff6834e0598c782156e0b3d364/devil/devil/android/device_utils.py#L2299
,
Dec 6 2017
Yeah the state of "ps" and all busybox on android keeps changing on every release. In recent version of android ps behavior even changes depending on whether is invoked with a tty or without (b/33475740) IMHO we should stop depending on any android binary and have our own busybox (checked in tools/android), so we can stop having all these workarounds. Even better we could have something that just spits out json, so we could stop even doing the parsing. We are already in the state where we push md5sum, memtrack_helper, and who knows what else. Maybe we could consolidate them into one binary, so to have: chrome_android_tools md5 "filename" chrome_android_tools memtrackhelper start chrome_android_tools ps but all this sounds a more medium-term solution. I guess in the meantime we have to figure out something to fix this. bonus point to pasko for the xkcd link :P
,
Dec 6 2017
> Even better we could have something that just spits out json, so we could stop even doing the parsing. That would be an awesome mid to long-term project.
,
Dec 6 2017
I *think* the problem is that on this device the 'ps' command outputs something containing "org.chromium.chrome" instead of what we thin it should do: "/system/bin/dex2oat". Seeing the exact line printed by 'ps' on this device would help finding a narrow workaround.
,
Dec 6 2017
Yep, I have a CL ready to do exactly that: https://chromium-review.googlesource.com/c/catapult/+/810789
,
Dec 6 2017
The following revision refers to this bug: https://chromium.googlesource.com/catapult/+/c30328384381cd19ea20ac994b07036a0774ae1c commit c30328384381cd19ea20ac994b07036a0774ae1c Author: Juan Antonio Navarro Perez <perezju@google.com> Date: Wed Dec 06 15:36:15 2017 [Telemetry] Dump output of "ps" when failing to find PID Adding some more debug info to help confirm our current theory for the cause of this bug. Bug: chromium:785446 Change-Id: I1e59abfc0dc7e296122f4694558815de399ece29 Reviewed-on: https://chromium-review.googlesource.com/810789 Reviewed-by: Ned Nguyen <nednguyen@google.com> Commit-Queue: Juan Antonio Navarro Pérez <perezju@chromium.org> [modify] https://crrev.com/c30328384381cd19ea20ac994b07036a0774ae1c/telemetry/telemetry/internal/backends/chrome/android_browser_backend.py
,
Dec 6 2017
#14: https://xkcd.com/1171/ Android's changes to busybox, then toybox, have certainly been a headache for us, but I'm wary of rolling our own set of command-line tools given the maintenance burden. I'm curious to see what ps is seeing in this case.
,
Dec 7 2017
Issue 788785 has been merged into this issue.
,
Dec 7 2017
#20 just rolled into the bot and, luckily, this failure appeared pretty often while running smoothness.sync_scroll.key_mobile_sites_smooth: https://logs.chromium.org/v/?s=chrome%2Fbb%2Fchromium.perf%2FAndroid_One_Perf%2F820%2F%2B%2Frecipes%2Fsteps%2Fsmoothness.sync_scroll.key_mobile_sites_smooth_on_Android%2F0%2Fstdout One of the instances of the error says: Expected a single PID for 'org.chromium.chrome' but found: ['4212', '4308'] And the relevant bits of the ps output show: USER PID PPID VSIZE RSS WCHAN PC NAME [...] u0_a74 4212 156 814964 74556 ffffffff b6e30f78 S org.chromium.chrome u0_i3 4243 156 790568 44736 ffffffff b6e30f78 S org.chromium.chrome:sandboxed_process0 u0_a74 4308 4212 810496 48424 00000000 98abbbee R org.chromium.chrome [ So it wasn't bad parsing or a bad regex after all, ha! ] Now, since I run "ps" twice (the first triggers the error, the second to dump its output to the logs), there is a small time gap between the two. And this, in fact, is a good thing since it tells us a bit more of the story. In some *other* instance of the error, we see that the name of the "second" PID we found for Chrome eventually does change to /system/bin/dex2oat: Expected a single PID for 'org.chromium.chrome' but found: ['3565', '3668']. USER PID PPID VSIZE RSS WCHAN PC NAME [...] u0_a74 3565 156 815012 74552 ffffffff b6e30f78 S org.chromium.chrome u0_i1 3596 156 790556 44752 ffffffff b6e30f78 S org.chromium.chrome:sandboxed_process0 u0_a74 3668 3565 6640 220 c011570c b6fbb0cc D /system/bin/dex2oat [ For those, like me, wanting a reminder of process status codes, those are: S - Interruptible sleep, R - Running, D - Uninterruptible sleep ] I see two workarounds possible: (a) If we find two PIDs for org.chromium.chrome, but one is the parent (PPID) of the other, choose the parent as "the" correct PID. (b) If we find two PIDs for org.chromium.chrome, wait a bit and retry. Hopefully one of those will eventually go away. Other ideas? Any preference?
,
Dec 7 2017
To #23: I think option (a) seems solid. Though in your example, you know what does '156' belong too? Maybe we just search for all chrome processes that are children of that process?
,
Dec 7 2017
Okay, so we indeed can have PPID(process) == PID(browser_process), and I think only in the circumstance we fork and prefetch the native library [1], which is Android-only. This child process might stay up for a few seconds, and then would go away. As Juan noticed, it is a _different_ case from the dex2oat. I'm +1 to matching PID and PPID (the option (a)). Changing from Chrome to dex2oat is something I never heard of (exec?), is there a logcat for this? [1] NativeLibraryPrefetcher::ForkAndPrefetchNativeLibrary https://codesearch.chromium.org/chromium/src/base/android/library_loader/library_prefetcher.cc?l=253
,
Dec 7 2017
Where would you implement option (a)? android_browser_backend or device_utils?
,
Dec 7 2017
Re #24: what does '156' belong too? root 156 1 663320 21372 ffffffff b6e31b00 S zygote Re #25: hmm, ... Ok I am a bit baffled too. Attaching logcat of the first case above. Reproducing highlights from it again: Expected a single PID for 'org.chromium.chrome' but found: ['4212', '4308'] USER PID PPID VSIZE RSS WCHAN PC NAME [...] u0_a74 4212 156 814964 74556 ffffffff b6e30f78 S org.chromium.chrome u0_i3 4243 156 790568 44736 ffffffff b6e30f78 S org.chromium.chrome:sandboxed_process0 u0_a74 4308 4212 810496 48424 00000000 98abbbee R org.chromium.chrome u0_a74 4311 4212 0 0 c0074ba8 00000000 Z main In the logs I find no reference of that PID 4308, but 4311 is dex2oat; note both are children of the main chrome process. I/dex2oat ( 4311): /system/bin/dex2oat --runtime-arg -classpath --runtime-arg --instruction-set=arm --instruction-set-features=div --runtime-arg -Xrelocate --boot-image=/system/framework/boot.art --dex-file=/data/data/org.chromium.chrome/app_dex/webapk4.dex --oat-fd=163 --oat-location=/data/data/org.chromium.chrome/app_dex/arm/webapk4.dex --runtime-arg -Xms64m --runtime-arg -Xmx512m --compiler-filter=space I/dex2oat ( 4311): dex2oat took 166.523ms (threads: 4) arena alloc=102KB java alloc=14KB native alloc=1027KB free=24KB As said, full logcat attached.
,
Dec 7 2017
Re #26: That logic would go in android_browser_backend (seems Chrome-specific), however I might need to extend device_utils.GetPids to also get PPIDs.
,
Dec 7 2017
#27: race between the org.chromium.chrome child finishing and dex2oat starting? #28: SGTM
,
Dec 8 2017
> In the logs I find no reference of that PID 4308 This is expected, the fork is done in the browser process, and the Android Framework does not know about it. We could spit a log with PID number every time we do a library prefetch, if you think it'd be useful for the future. re dex2oat: in this instance we don't see dex2oat being listed in 'adb shell ps' with 'org.chromium.chrome' as a name. +yfriedman for clues on why dex2oat PID (invoked for processing of webapk4.dex) could appear in output of 'ps' named as 'org.chromium.chrome'. I is puzzling to me, I cannot imagine dex2oat calling exec(). See comment #12.
,
Dec 8 2017
On another thought, would it be better to only find 'org.chromium.chrome' which is the child of zygote? The reason is in case the actualy Chrome process dies, and there is only one 'org.chromium.chrome' from dex2oat, we would report error right away.
,
Dec 8 2017
+1 nice idea!
,
Dec 8 2017
Good point! That's much better. Next week I'll get to implement that.
,
Dec 8 2017
+pkotwicz It's puzzling to me as well. I really have no idea why dex2oat is showing up with chrome process name. #31 sounds like a good idea
,
Dec 11 2017
Where does the code for dex2oat live? If you are just fork()+exec()-ing the new process will very likely inherit the same "comm" (which is different from the executable name AND from the main-thread name, i.e. the one that you set via prctl() or pthread_setname(pthread_self())). IIRC you need to write() to /proc/self/comm to change it.
,
Dec 11 2017
Ah sorry just read #27 after posting #35. Loosk like at the end we set the right comm, but there is a race as that necessarily happens some time after the fork(). Both perezju's options in #23 sgtm.
,
Dec 12 2017
The following revision refers to this bug: https://chromium.googlesource.com/catapult/+/a31a5a7b4ab37db1629b97699b8df49692cafc99 commit a31a5a7b4ab37db1629b97699b8df49692cafc99 Author: Juan Antonio Navarro Perez <perezju@google.com> Date: Tue Dec 12 10:00:44 2017 [Devil] Provide device_utils.ListProcesses - Provide a new ListProcesses method that parses the output of the ps command into a convenient list of namedtuples. - Split the implementation into _GetPsOutput (which abscracts away differences between Android versions) and the parsing code itself. - This allows to write PsOutputCompatibilityTests, to test our assumptions about the ps output against real devices. - Rewrite GetPids and and other pid-related device_utils methods to use ListProcesses instead (happily leading to cleaner code!). Bug: chromium:785446 Change-Id: I87e58c4d6be1bcf30978f6e329a015daf9dff0ab Reviewed-on: https://chromium-review.googlesource.com/819639 Commit-Queue: Juan Antonio Navarro Pérez <perezju@chromium.org> Reviewed-by: John Budorick <jbudorick@chromium.org> [modify] https://crrev.com/a31a5a7b4ab37db1629b97699b8df49692cafc99/devil/devil/android/device_utils.py [modify] https://crrev.com/a31a5a7b4ab37db1629b97699b8df49692cafc99/devil/devil/android/device_utils_devicetest.py [modify] https://crrev.com/a31a5a7b4ab37db1629b97699b8df49692cafc99/devil/devil/android/device_utils_test.py
,
Dec 13 2017
This is also causing flakiness on the webrtc benchmark, which has failed four times in the last 50 runs from this: - https://logs.chromium.org/v/?s=chrome%2Fbb%2Fchromium.perf%2FAndroid_One_Perf%2F846%2F%2B%2Frecipes%2Fsteps%2Fwebrtc_on_Android%2F0%2Fstdout - https://logs.chromium.org/v/?s=chrome%2Fbb%2Fchromium.perf%2FAndroid_One_Perf%2F840%2F%2B%2Frecipes%2Fsteps%2Fwebrtc_on_Android%2F0%2Fstdout - https://logs.chromium.org/v/?s=chrome%2Fbb%2Fchromium.perf%2FAndroid_One_Perf%2F824%2F%2B%2Frecipes%2Fsteps%2Fwebrtc_on_Android%2F0%2Fstdout - https://logs.chromium.org/v/?s=chrome%2Fbb%2Fchromium.perf%2FAndroid_One_Perf%2F823%2F%2B%2Frecipes%2Fsteps%2Fwebrtc_on_Android%2F0%2Fstdout
,
Dec 13 2017
Also found at least one system_health.common_mobile failure stemming from this as well.
,
Dec 13 2017
Looks like it's also causing problems on loading.mobile, as well.
,
Dec 14 2017
Yes, I would expect this to affect all telemetry benchmarks. The fix is pretty much ready, just waiting for a final stamp: https://chromium-review.googlesource.com/c/catapult/+/822261
,
Dec 15 2017
Fix landed: The following revision refers to this bug: https://chromium.googlesource.com/catapult/+/846f7660e7d5004afac6052a9d7905695bc1edb7 commit 846f7660e7d5004afac6052a9d7905695bc1edb7 Author: Juan Antonio Navarro Perez <perezju@google.com> Date: Fri Dec 15 09:19:33 2017 [Telemetry] Fix browser_backend.pid for Android The old code used to expect finding a single process matching the name of the browser app. This, however, is not always reliable in cases where the browser forks its main process and, for a short period of time, two processes may be found with the same name. Instead we now look for the single process with the matching name which is also a direct descendant of the Android zygote. Bug: chromium:794178 Change-Id: Ie577e19e6b63bcb5192fbe16ef79b7cbbe09387e Reviewed-on: https://chromium-review.googlesource.com/822261 Commit-Queue: Juan Antonio Navarro Pérez <perezju@chromium.org> Reviewed-by: Ned Nguyen <nednguyen@google.com> [modify] https://crrev.com/846f7660e7d5004afac6052a9d7905695bc1edb7/telemetry/telemetry/internal/backends/chrome/android_browser_backend.py (but I filled in the wrong bug number, so copy/pasting here)
,
Dec 18 2017
This is fixed!
,
Jan 2 2018
|
|||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||
Comment 1 by ashleymarie@chromium.org
, Nov 15 2017