New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 736066 link

Starred by 3 users

Issue metadata

Status: WontFix
Owner:
Closed: Aug 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 2
Type: Bug



Sign in to add a comment

Follow up with why activity manager kills service before onServiceConnected

Project Member Reported by boliu@chromium.org, Jun 22 2017

Issue description

follow up for  crbug.com/735626 
 

Comment 1 by aelias@chromium.org, Jun 22 2017

Cc: aelias@chromium.org
Labels: ReleaseBlock-Stable M-60
Marking 60 RBS per  http://crbug.com/735626#c27 

Comment 2 by boliu@chromium.org, Jun 22 2017

Log that kills the service process:
ActivityManager: Killing 21768:org.chromium.chrome:sandboxed_process0/u0a139i346 (adj 9): isolated not needed

Which comes from this line:
https://android.googlesource.com/platform/frameworks/base/+/0f2c1c3f4478a2fc1399b1263339df058240b611/services/core/java/com/android/server/am/ActivityManagerService.java#21156

                if (app.isolated && app.services.size() <= 0) {
                    // If this is an isolated process, and there are no
                    // services running in it, then the process is no longer
                    // needed.  We agressively kill these because we can by
                    // definition not re-use the same process again, and it is
                    // good to avoid having whatever code was running in them
                    // left sitting around after no longer needed.
                    app.kill("isolated not needed", true);
                }

That block of comment is really helpful. According to the comment, there is no service in the process. This process was only alive for like 5ms, so this is something (probably launching fre activity?) triggering updateOomAdjLocked before a service is loaded in the new process, and the process just got garbage collected before it's even initialized.

If that's right, clearly an android bug. But hard to confirm what exactly is triggering updateOomAdjLocked, probably need to talk to some android folks.

Comment 3 by boliu@chromium.org, Jun 22 2017

b/62913309

Comment 4 by boliu@chromium.org, Jun 22 2017

it looks like it doesn't happen on m60 because Main activity directly launches FRE activity, which then calls warmup. In m59, it's Main -> ChromeTabbedActivity -> FRE, but CTA triggers warmup, so warmup races with starting FRE

m59 (without fix):
06-22 16:36:03.609   662  1321 I am_restart_activity: [0,1112048536,384,org.chromium.chrome/com.google.android.apps.chrome.Main]
06-22 16:36:03.680   662  1322 I am_create_task: [0,385]
06-22 16:36:03.680   662  1322 I am_create_activity: [0,1120108088,385,org.chromium.chrome/.browser.ChromeTabbedActivity,android.intent.action.MAIN,NULL,NULL,335544320]
06-22 16:36:03.680   662  1322 I wm_task_created: [385,6]
06-22 16:36:03.680   662  1322 I wm_task_moved: [385,1,7]
06-22 16:36:03.680   662  1322 I wm_home_stack_moved: 0
06-22 16:36:03.680   662  1322 I am_pause_activity: [0,1112048536,org.chromium.chrome/com.google.android.apps.chrome.Main]
06-22 16:36:03.683   662  1321 I am_finish_activity: [0,1112048536,384,org.chromium.chrome/com.google.android.apps.chrome.Main,app-request]
06-22 16:36:03.686   662  1320 I am_restart_activity: [0,1120108088,385,org.chromium.chrome/.browser.ChromeTabbedActivity]
06-22 16:36:03.691   662   686 I am_destroy_activity: [0,1112048536,384,org.chromium.chrome/com.google.android.apps.chrome.Main,finish-idle]
06-22 16:36:04.032   141   141 I sf_frame_dur: [com.android.launcher3/com.android.launcher3.Launcher,0,4,0,2,2,0,3]
06-22 16:36:04.057 12201 12201 I am_on_resume_called: [0,org.chromium.chrome.browser.ChromeTabbedActivity]
06-22 16:36:04.067   662   673 I wm_task_removed: [384,removeTask]
06-22 16:36:04.068   662   673 I wm_task_removed: [384,removeAppToken: last token]
06-22 16:36:04.069   662   943 I wm_task_moved: [385,1,6]
06-22 16:36:04.069   662   943 I wm_home_stack_moved: 0
06-22 16:36:04.069   662   943 I am_create_activity: [0,1119824368,385,org.chromium.chrome/.browser.firstrun.FirstRunActivity,NULL,NULL,NULL,0]
06-22 16:36:04.071   662   943 I am_pause_activity: [0,1120108088,org.chromium.chrome/.browser.ChromeTabbedActivity]
06-22 16:36:04.093   662   940 I am_proc_start: [0,12228,99355,org.chromium.chrome:sandboxed_process0,service,org.chromium.chrome/org.chromium.content.app.SandboxedProcessService0]
06-22 16:36:04.118   662   880 I am_proc_bound: [0,12228,org.chromium.chrome:sandboxed_process0]
06-22 16:36:04.121   662   880 I am_kill : [0,12228,org.chromium.chrome:sandboxed_process0,9,isolated not needed]


m60:
06-22 17:25:13.736   662   672 I am_proc_start: [0,19233,10140,org.chromium.chrome,activity,org.chromium.chrome/com.google.android.apps.chrome.Main]
06-22 17:25:13.758   662   673 I am_proc_bound: [0,19233,org.chromium.chrome]
06-22 17:25:13.761   662   673 I am_restart_activity: [0,1105003432,396,org.chromium.chrome/com.google.android.apps.chrome.Main]
06-22 17:25:13.868   662   673 I am_create_task: [0,397]
06-22 17:25:13.868   662   673 I am_create_activity: [0,1106098272,397,org.chromium.chrome/.browser.firstrun.FirstRunActivity,NULL,NULL,NULL,8388608]
06-22 17:25:13.868   662   673 I wm_task_created: [397,6]
06-22 17:25:13.868   662   673 I wm_task_moved: [397,1,4]
06-22 17:25:13.868   662   673 I wm_home_stack_moved: 0
06-22 17:25:13.873   662   673 I am_pause_activity: [0,1105003432,org.chromium.chrome/com.google.android.apps.chrome.Main]
06-22 17:25:13.901   662  1321 I am_finish_activity: [0,1105003432,396,org.chromium.chrome/com.google.android.apps.chrome.Main,app-request]
06-22 17:25:13.911   662   942 I am_restart_activity: [0,1106098272,397,org.chromium.chrome/.browser.firstrun.FirstRunActivity]
06-22 17:25:14.021 19233 19233 I am_on_resume_called: [0,org.chromium.chrome.browser.firstrun.FirstRunActivity]
06-22 17:25:14.079   662   941 I am_proc_start: [0,19267,99362,org.chromium.chrome:sandboxed_process0,service,org.chromium.chrome/org.chromium.content.app.SandboxedProcessService0]
06-22 17:25:14.082   662   877 I am_create_service: [0,1108755920,.GoogleLoginService,10008,19196]
06-22 17:25:14.093   662   880 I am_destroy_service: [0,1108755920,19196]
06-22 17:25:14.115   662   879 I am_proc_bound: [0,19267,org.chromium.chrome:sandboxed_process0]

Comment 5 by klo...@chromium.org, Jun 23 2017

Cc: klo...@chromium.org

Comment 6 by boliu@chromium.org, Jun 23 2017

Labels: -ReleaseBlock-Stable -M-60
not gonna bother with m60
* the bad timing no longer happens in m60 as described in #4
* even if bug happens, the result would not be as catastrophic since only that particular renderer launch will hang, and user can just swipe tab away. m59 blocks the whole launcher thread, so every child process launch hangs

Comment 7 by boliu@chromium.org, Jun 23 2017

Labels: -Pri-1 Pri-2
And we have a working theory
* the "isolated not needed" bug exists on K, but appears to be fixed N (based on testing on vanilla nexus devices)
* on K nexus device, bug does not cause a hang indefinitely because vanilla android activity manager behavior will restart a crashed service as long as it is bound (even if it's one killed by activity manager itself o_O) with a 1s delay. but after the delay, everything just keeps going as if nothing was wrong
* we suspect that blu removed the "restart crashed service" behavior, maybe because they believe it will save memory or something

in that case, only workaround is add a timeout and retry

kind of horrifying, removing restart breaks all bound services o_O

Comment 8 by boliu@chromium.org, Jun 29 2017

Cc: jcivelli@chromium.org
hmm, so should I just add a timeout (thinking 10s) + retry? or should I add some histograms first?

I'm thinking of adding
* histogram of time between first bindContext and first onServiceConnected
* count of timeout and probably count of (first) bindContext as well for comparison

thoughts?

+jay who probably I'm ask to do reviews here
Project Member

Comment 9 by bugdroid1@chromium.org, Jul 11 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/792c113f9a19b62e2814a0eb8224572762e76880

commit 792c113f9a19b62e2814a0eb8224572762e76880
Author: Bo Liu <boliu@chromium.org>
Date: Tue Jul 11 06:37:49 2017

android: Retry bind if timeout

Unbind and rebind if onServiceConnection does not come after a timeout,
currently set to 10s. Add a unit test, but do not enable this behavior
in production yet.

The current plan is then to add some UMA metrics and evaluate result
to decide whether it's worth turning this on or not.

BUG= 736066 

Change-Id: Id06b11005c781932810b32e11592727f580ef38c
Reviewed-on: https://chromium-review.googlesource.com/562653
Commit-Queue: Bo Liu <boliu@chromium.org>
Reviewed-by: Jay Civelli <jcivelli@chromium.org>
Cr-Commit-Position: refs/heads/master@{#485537}
[modify] https://crrev.com/792c113f9a19b62e2814a0eb8224572762e76880/content/public/android/BUILD.gn
[modify] https://crrev.com/792c113f9a19b62e2814a0eb8224572762e76880/content/public/android/java/src/org/chromium/content/browser/ChildConnectionAllocator.java
[modify] https://crrev.com/792c113f9a19b62e2814a0eb8224572762e76880/content/public/android/java/src/org/chromium/content/browser/ChildProcessConnection.java
[modify] https://crrev.com/792c113f9a19b62e2814a0eb8224572762e76880/content/public/android/javatests/src/org/chromium/content/browser/ChildProcessLauncherTest.java
[modify] https://crrev.com/792c113f9a19b62e2814a0eb8224572762e76880/content/public/android/junit/src/org/chromium/content/browser/BindingManagerImplTest.java
[modify] https://crrev.com/792c113f9a19b62e2814a0eb8224572762e76880/content/public/android/junit/src/org/chromium/content/browser/ChildConnectionAllocatorTest.java
[add] https://crrev.com/792c113f9a19b62e2814a0eb8224572762e76880/content/public/android/junit/src/org/chromium/content/browser/ChildProcessConnectionTest.java
[modify] https://crrev.com/792c113f9a19b62e2814a0eb8224572762e76880/content/public/android/junit/src/org/chromium/content/browser/SpareChildConnectionTest.java
[modify] https://crrev.com/792c113f9a19b62e2814a0eb8224572762e76880/content/public/android/junit/src/org/chromium/content/browser/TestChildProcessConnection.java

Project Member

Comment 10 by bugdroid1@chromium.org, Jul 12 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/226ca87e080ea1eedeb4b35342a4fad4ffb05d92

commit 226ca87e080ea1eedeb4b35342a4fad4ffb05d92
Author: Bo Liu <boliu@chromium.org>
Date: Wed Jul 12 21:45:18 2017

android: Add bindService/onServiceConnected UMA

Add a time histogram to measure Android's delay between bindService and
onServiceConnected. Then add a boolean histogram that counts number of
times onServiceConnected times out, which should really be considered an
infinite bucket in the time measurement.

BUG= 736066 

Change-Id: Ia73926a04c01f52216dd24805051a077a500c2b7
Reviewed-on: https://chromium-review.googlesource.com/567369
Reviewed-by: Jay Civelli <jcivelli@chromium.org>
Reviewed-by: Ilya Sherman <isherman@chromium.org>
Commit-Queue: Bo Liu <boliu@chromium.org>
Cr-Commit-Position: refs/heads/master@{#486094}
[modify] https://crrev.com/226ca87e080ea1eedeb4b35342a4fad4ffb05d92/base/android/java/src/org/chromium/base/process_launcher/ChildProcessConnection.java
[modify] https://crrev.com/226ca87e080ea1eedeb4b35342a4fad4ffb05d92/tools/metrics/histograms/histograms.xml

Comment 11 by boliu@chromium.org, Jul 25 2017

So right now I have a 10s "timeout" purely to collect uma, it doesn't actually retry bind. From dev population:
https://uma.googleplex.com/p/chrome/histograms/?endDate=20170724&dayCount=28&histograms=Android.ChildProcessLauncher.OnServiceConnectedTime%2CAndroid.ChildProcessLauncher.OnServiceConnectedTimedOut&fixupData=true&showMax=true&filters=platform%2Ceq%2CA%2Cchannel%2Ceq%2C2%2Cisofficial%2Ceq%2CTrue&implicitFilters=isofficial

* timeout rate is 0.07%
* time histogram onServiceConnected for 10s or above is slightly higher at 0.1%
* can't know exactly overlap between the two, but at least it suggests if we really have a 10s timeout + retry, then we will have significant false positive (would have succeeded if waited more) than real positive, at least in the dev popluation

We can wait for data from beta or stable I guess. But from dev data, looks like a timeout will do more damage than good, and we should just not bother.

Comment 12 by boliu@chromium.org, Aug 28 2017

same conclusion on data from beta. I'm just gonna delete the uma and leave this as wontfix
Project Member

Comment 13 by bugdroid1@chromium.org, Aug 29 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/eb2f08eaec11c380c67b688679091d0d8282786d

commit eb2f08eaec11c380c67b688679091d0d8282786d
Author: Bo Liu <boliu@chromium.org>
Date: Tue Aug 29 17:11:33 2017

android: Remove OnServiceConnected uma

Made decision to do nothing with data already collected. So remove these
metrics.

This effectively reverts refs/heads/master@{#486094} and
refs/heads/master@{#485537}.

Bug:  736066 
Change-Id: Id8a80e6a6de8a7f732734cae3daf9634cc0cb572
Reviewed-on: https://chromium-review.googlesource.com/639237
Reviewed-by: Yaron Friedman <yfriedman@chromium.org>
Reviewed-by: Ilya Sherman <isherman@chromium.org>
Reviewed-by: Jay Civelli <jcivelli@chromium.org>
Commit-Queue: Bo <boliu@chromium.org>
Cr-Commit-Position: refs/heads/master@{#498147}
[modify] https://crrev.com/eb2f08eaec11c380c67b688679091d0d8282786d/base/android/java/src/org/chromium/base/process_launcher/ChildConnectionAllocator.java
[modify] https://crrev.com/eb2f08eaec11c380c67b688679091d0d8282786d/base/android/java/src/org/chromium/base/process_launcher/ChildProcessConnection.java
[modify] https://crrev.com/eb2f08eaec11c380c67b688679091d0d8282786d/base/android/junit/src/org/chromium/base/process_launcher/ChildConnectionAllocatorTest.java
[modify] https://crrev.com/eb2f08eaec11c380c67b688679091d0d8282786d/base/android/junit/src/org/chromium/base/process_launcher/ChildProcessConnectionTest.java
[modify] https://crrev.com/eb2f08eaec11c380c67b688679091d0d8282786d/base/test/android/javatests/src/org/chromium/base/test/TestChildProcessConnection.java
[modify] https://crrev.com/eb2f08eaec11c380c67b688679091d0d8282786d/content/public/android/junit/src/org/chromium/content/browser/BindingManagerImplTest.java
[modify] https://crrev.com/eb2f08eaec11c380c67b688679091d0d8282786d/tools/metrics/histograms/histograms.xml

Comment 14 by boliu@chromium.org, Aug 29 2017

Status: WontFix (was: Assigned)

Sign in to add a comment