Follow up with why activity manager kills service before onServiceConnected |
||||||
Issue descriptionfollow up for crbug.com/735626
,
Jun 22 2017
Log that kills the service process: ActivityManager: Killing 21768:org.chromium.chrome:sandboxed_process0/u0a139i346 (adj 9): isolated not needed Which comes from this line: https://android.googlesource.com/platform/frameworks/base/+/0f2c1c3f4478a2fc1399b1263339df058240b611/services/core/java/com/android/server/am/ActivityManagerService.java#21156 if (app.isolated && app.services.size() <= 0) { // If this is an isolated process, and there are no // services running in it, then the process is no longer // needed. We agressively kill these because we can by // definition not re-use the same process again, and it is // good to avoid having whatever code was running in them // left sitting around after no longer needed. app.kill("isolated not needed", true); } That block of comment is really helpful. According to the comment, there is no service in the process. This process was only alive for like 5ms, so this is something (probably launching fre activity?) triggering updateOomAdjLocked before a service is loaded in the new process, and the process just got garbage collected before it's even initialized. If that's right, clearly an android bug. But hard to confirm what exactly is triggering updateOomAdjLocked, probably need to talk to some android folks.
,
Jun 22 2017
,
Jun 22 2017
it looks like it doesn't happen on m60 because Main activity directly launches FRE activity, which then calls warmup. In m59, it's Main -> ChromeTabbedActivity -> FRE, but CTA triggers warmup, so warmup races with starting FRE m59 (without fix): 06-22 16:36:03.609 662 1321 I am_restart_activity: [0,1112048536,384,org.chromium.chrome/com.google.android.apps.chrome.Main] 06-22 16:36:03.680 662 1322 I am_create_task: [0,385] 06-22 16:36:03.680 662 1322 I am_create_activity: [0,1120108088,385,org.chromium.chrome/.browser.ChromeTabbedActivity,android.intent.action.MAIN,NULL,NULL,335544320] 06-22 16:36:03.680 662 1322 I wm_task_created: [385,6] 06-22 16:36:03.680 662 1322 I wm_task_moved: [385,1,7] 06-22 16:36:03.680 662 1322 I wm_home_stack_moved: 0 06-22 16:36:03.680 662 1322 I am_pause_activity: [0,1112048536,org.chromium.chrome/com.google.android.apps.chrome.Main] 06-22 16:36:03.683 662 1321 I am_finish_activity: [0,1112048536,384,org.chromium.chrome/com.google.android.apps.chrome.Main,app-request] 06-22 16:36:03.686 662 1320 I am_restart_activity: [0,1120108088,385,org.chromium.chrome/.browser.ChromeTabbedActivity] 06-22 16:36:03.691 662 686 I am_destroy_activity: [0,1112048536,384,org.chromium.chrome/com.google.android.apps.chrome.Main,finish-idle] 06-22 16:36:04.032 141 141 I sf_frame_dur: [com.android.launcher3/com.android.launcher3.Launcher,0,4,0,2,2,0,3] 06-22 16:36:04.057 12201 12201 I am_on_resume_called: [0,org.chromium.chrome.browser.ChromeTabbedActivity] 06-22 16:36:04.067 662 673 I wm_task_removed: [384,removeTask] 06-22 16:36:04.068 662 673 I wm_task_removed: [384,removeAppToken: last token] 06-22 16:36:04.069 662 943 I wm_task_moved: [385,1,6] 06-22 16:36:04.069 662 943 I wm_home_stack_moved: 0 06-22 16:36:04.069 662 943 I am_create_activity: [0,1119824368,385,org.chromium.chrome/.browser.firstrun.FirstRunActivity,NULL,NULL,NULL,0] 06-22 16:36:04.071 662 943 I am_pause_activity: [0,1120108088,org.chromium.chrome/.browser.ChromeTabbedActivity] 06-22 16:36:04.093 662 940 I am_proc_start: [0,12228,99355,org.chromium.chrome:sandboxed_process0,service,org.chromium.chrome/org.chromium.content.app.SandboxedProcessService0] 06-22 16:36:04.118 662 880 I am_proc_bound: [0,12228,org.chromium.chrome:sandboxed_process0] 06-22 16:36:04.121 662 880 I am_kill : [0,12228,org.chromium.chrome:sandboxed_process0,9,isolated not needed] m60: 06-22 17:25:13.736 662 672 I am_proc_start: [0,19233,10140,org.chromium.chrome,activity,org.chromium.chrome/com.google.android.apps.chrome.Main] 06-22 17:25:13.758 662 673 I am_proc_bound: [0,19233,org.chromium.chrome] 06-22 17:25:13.761 662 673 I am_restart_activity: [0,1105003432,396,org.chromium.chrome/com.google.android.apps.chrome.Main] 06-22 17:25:13.868 662 673 I am_create_task: [0,397] 06-22 17:25:13.868 662 673 I am_create_activity: [0,1106098272,397,org.chromium.chrome/.browser.firstrun.FirstRunActivity,NULL,NULL,NULL,8388608] 06-22 17:25:13.868 662 673 I wm_task_created: [397,6] 06-22 17:25:13.868 662 673 I wm_task_moved: [397,1,4] 06-22 17:25:13.868 662 673 I wm_home_stack_moved: 0 06-22 17:25:13.873 662 673 I am_pause_activity: [0,1105003432,org.chromium.chrome/com.google.android.apps.chrome.Main] 06-22 17:25:13.901 662 1321 I am_finish_activity: [0,1105003432,396,org.chromium.chrome/com.google.android.apps.chrome.Main,app-request] 06-22 17:25:13.911 662 942 I am_restart_activity: [0,1106098272,397,org.chromium.chrome/.browser.firstrun.FirstRunActivity] 06-22 17:25:14.021 19233 19233 I am_on_resume_called: [0,org.chromium.chrome.browser.firstrun.FirstRunActivity] 06-22 17:25:14.079 662 941 I am_proc_start: [0,19267,99362,org.chromium.chrome:sandboxed_process0,service,org.chromium.chrome/org.chromium.content.app.SandboxedProcessService0] 06-22 17:25:14.082 662 877 I am_create_service: [0,1108755920,.GoogleLoginService,10008,19196] 06-22 17:25:14.093 662 880 I am_destroy_service: [0,1108755920,19196] 06-22 17:25:14.115 662 879 I am_proc_bound: [0,19267,org.chromium.chrome:sandboxed_process0]
,
Jun 23 2017
,
Jun 23 2017
not gonna bother with m60 * the bad timing no longer happens in m60 as described in #4 * even if bug happens, the result would not be as catastrophic since only that particular renderer launch will hang, and user can just swipe tab away. m59 blocks the whole launcher thread, so every child process launch hangs
,
Jun 23 2017
And we have a working theory * the "isolated not needed" bug exists on K, but appears to be fixed N (based on testing on vanilla nexus devices) * on K nexus device, bug does not cause a hang indefinitely because vanilla android activity manager behavior will restart a crashed service as long as it is bound (even if it's one killed by activity manager itself o_O) with a 1s delay. but after the delay, everything just keeps going as if nothing was wrong * we suspect that blu removed the "restart crashed service" behavior, maybe because they believe it will save memory or something in that case, only workaround is add a timeout and retry kind of horrifying, removing restart breaks all bound services o_O
,
Jun 29 2017
hmm, so should I just add a timeout (thinking 10s) + retry? or should I add some histograms first? I'm thinking of adding * histogram of time between first bindContext and first onServiceConnected * count of timeout and probably count of (first) bindContext as well for comparison thoughts? +jay who probably I'm ask to do reviews here
,
Jul 11 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/792c113f9a19b62e2814a0eb8224572762e76880 commit 792c113f9a19b62e2814a0eb8224572762e76880 Author: Bo Liu <boliu@chromium.org> Date: Tue Jul 11 06:37:49 2017 android: Retry bind if timeout Unbind and rebind if onServiceConnection does not come after a timeout, currently set to 10s. Add a unit test, but do not enable this behavior in production yet. The current plan is then to add some UMA metrics and evaluate result to decide whether it's worth turning this on or not. BUG= 736066 Change-Id: Id06b11005c781932810b32e11592727f580ef38c Reviewed-on: https://chromium-review.googlesource.com/562653 Commit-Queue: Bo Liu <boliu@chromium.org> Reviewed-by: Jay Civelli <jcivelli@chromium.org> Cr-Commit-Position: refs/heads/master@{#485537} [modify] https://crrev.com/792c113f9a19b62e2814a0eb8224572762e76880/content/public/android/BUILD.gn [modify] https://crrev.com/792c113f9a19b62e2814a0eb8224572762e76880/content/public/android/java/src/org/chromium/content/browser/ChildConnectionAllocator.java [modify] https://crrev.com/792c113f9a19b62e2814a0eb8224572762e76880/content/public/android/java/src/org/chromium/content/browser/ChildProcessConnection.java [modify] https://crrev.com/792c113f9a19b62e2814a0eb8224572762e76880/content/public/android/javatests/src/org/chromium/content/browser/ChildProcessLauncherTest.java [modify] https://crrev.com/792c113f9a19b62e2814a0eb8224572762e76880/content/public/android/junit/src/org/chromium/content/browser/BindingManagerImplTest.java [modify] https://crrev.com/792c113f9a19b62e2814a0eb8224572762e76880/content/public/android/junit/src/org/chromium/content/browser/ChildConnectionAllocatorTest.java [add] https://crrev.com/792c113f9a19b62e2814a0eb8224572762e76880/content/public/android/junit/src/org/chromium/content/browser/ChildProcessConnectionTest.java [modify] https://crrev.com/792c113f9a19b62e2814a0eb8224572762e76880/content/public/android/junit/src/org/chromium/content/browser/SpareChildConnectionTest.java [modify] https://crrev.com/792c113f9a19b62e2814a0eb8224572762e76880/content/public/android/junit/src/org/chromium/content/browser/TestChildProcessConnection.java
,
Jul 12 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/226ca87e080ea1eedeb4b35342a4fad4ffb05d92 commit 226ca87e080ea1eedeb4b35342a4fad4ffb05d92 Author: Bo Liu <boliu@chromium.org> Date: Wed Jul 12 21:45:18 2017 android: Add bindService/onServiceConnected UMA Add a time histogram to measure Android's delay between bindService and onServiceConnected. Then add a boolean histogram that counts number of times onServiceConnected times out, which should really be considered an infinite bucket in the time measurement. BUG= 736066 Change-Id: Ia73926a04c01f52216dd24805051a077a500c2b7 Reviewed-on: https://chromium-review.googlesource.com/567369 Reviewed-by: Jay Civelli <jcivelli@chromium.org> Reviewed-by: Ilya Sherman <isherman@chromium.org> Commit-Queue: Bo Liu <boliu@chromium.org> Cr-Commit-Position: refs/heads/master@{#486094} [modify] https://crrev.com/226ca87e080ea1eedeb4b35342a4fad4ffb05d92/base/android/java/src/org/chromium/base/process_launcher/ChildProcessConnection.java [modify] https://crrev.com/226ca87e080ea1eedeb4b35342a4fad4ffb05d92/tools/metrics/histograms/histograms.xml
,
Jul 25 2017
So right now I have a 10s "timeout" purely to collect uma, it doesn't actually retry bind. From dev population: https://uma.googleplex.com/p/chrome/histograms/?endDate=20170724&dayCount=28&histograms=Android.ChildProcessLauncher.OnServiceConnectedTime%2CAndroid.ChildProcessLauncher.OnServiceConnectedTimedOut&fixupData=true&showMax=true&filters=platform%2Ceq%2CA%2Cchannel%2Ceq%2C2%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial * timeout rate is 0.07% * time histogram onServiceConnected for 10s or above is slightly higher at 0.1% * can't know exactly overlap between the two, but at least it suggests if we really have a 10s timeout + retry, then we will have significant false positive (would have succeeded if waited more) than real positive, at least in the dev popluation We can wait for data from beta or stable I guess. But from dev data, looks like a timeout will do more damage than good, and we should just not bother.
,
Aug 28 2017
same conclusion on data from beta. I'm just gonna delete the uma and leave this as wontfix
,
Aug 29 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/eb2f08eaec11c380c67b688679091d0d8282786d commit eb2f08eaec11c380c67b688679091d0d8282786d Author: Bo Liu <boliu@chromium.org> Date: Tue Aug 29 17:11:33 2017 android: Remove OnServiceConnected uma Made decision to do nothing with data already collected. So remove these metrics. This effectively reverts refs/heads/master@{#486094} and refs/heads/master@{#485537}. Bug: 736066 Change-Id: Id8a80e6a6de8a7f732734cae3daf9634cc0cb572 Reviewed-on: https://chromium-review.googlesource.com/639237 Reviewed-by: Yaron Friedman <yfriedman@chromium.org> Reviewed-by: Ilya Sherman <isherman@chromium.org> Reviewed-by: Jay Civelli <jcivelli@chromium.org> Commit-Queue: Bo <boliu@chromium.org> Cr-Commit-Position: refs/heads/master@{#498147} [modify] https://crrev.com/eb2f08eaec11c380c67b688679091d0d8282786d/base/android/java/src/org/chromium/base/process_launcher/ChildConnectionAllocator.java [modify] https://crrev.com/eb2f08eaec11c380c67b688679091d0d8282786d/base/android/java/src/org/chromium/base/process_launcher/ChildProcessConnection.java [modify] https://crrev.com/eb2f08eaec11c380c67b688679091d0d8282786d/base/android/junit/src/org/chromium/base/process_launcher/ChildConnectionAllocatorTest.java [modify] https://crrev.com/eb2f08eaec11c380c67b688679091d0d8282786d/base/android/junit/src/org/chromium/base/process_launcher/ChildProcessConnectionTest.java [modify] https://crrev.com/eb2f08eaec11c380c67b688679091d0d8282786d/base/test/android/javatests/src/org/chromium/base/test/TestChildProcessConnection.java [modify] https://crrev.com/eb2f08eaec11c380c67b688679091d0d8282786d/content/public/android/junit/src/org/chromium/content/browser/BindingManagerImplTest.java [modify] https://crrev.com/eb2f08eaec11c380c67b688679091d0d8282786d/tools/metrics/histograms/histograms.xml
,
Aug 29 2017
|
||||||
►
Sign in to add a comment |
||||||
Comment 1 by aelias@chromium.org
, Jun 22 2017Labels: ReleaseBlock-Stable M-60