New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 720317 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: May 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug



Sign in to add a comment

Sometimes failing to launch browser on android-nexus5x perf bot after swarming

Project Member Reported by perezju@chromium.org, May 10 2017

Issue description

On this recently swarmed android perf bot at:
https://luci-milo.appspot.com/buildbot/chromium.perf/Android%20Nexus5X%20Perf

Some stories on system health benchmarks are often failing when trying to open a browser. It appears that the browser did not pick up command line flags correctly.

	UI dump
	- (no package):
	  - (no id)
	- org.chromium.chrome:
	  - (no id)
	  - android:id/content
	  - android:id/navigationBarBackground
	  - android:id/statusBarBackground
	  - org.chromium.chrome:id/action_bar_root
	  - org.chromium.chrome:id/fre_content_wrapper
	  - org.chromium.chrome:id/fre_image_and_content
	  - org.chromium.chrome:id/fre_main_layout
	  - org.chromium.chrome:id/fre_pager
	  - org.chromium.chrome:id/image
	  - org.chromium.chrome:id/send_report_checkbox['Help make Chrome better by sending usage statistics and crash reports to Google.']
	  - org.chromium.chrome:id/terms_accept['Accept & continue']
	  - org.chromium.chrome:id/title['Welcome to Chrome']
	  - org.chromium.chrome:id/tos_and_privacy[u'By using this application, you agree to Chrome\u2019s Terms of Service and Privacy Notice.']
	********************************************************************************

Interestingly, not all stories fail (most return OK), but the failing ones are all preceded by an exception:

Traceback (most recent call last):
  File "/b/swarming/w/ir/third_party/android_platform/development/scripts/stack", line 240, in <module>
    sys.exit(main(sys.argv[1:]))
  File "/b/swarming/w/ir/third_party/android_platform/development/scripts/stack", line 196, in main
    constants.CheckOutputDirectory()
  File "/b/swarming/w/ir/third_party/android_platform/development/scripts/../../../../build/android/pylib/constants/__init__.py", line 208, in CheckOutputDirectory
    raise Exception('Neither CHROMIUM_OUTPUT_DIR nor CHROMIUM_OUT_DIR '
Exception: Neither CHROMIUM_OUTPUT_DIR nor CHROMIUM_OUT_DIR has been set

https://luci-logdog.appspot.com/v/?s=chrome%2Fbb%2Fchromium.perf%2FAndroid_Nexus5X_Perf%2F24%2F%2B%2Frecipes%2Fsteps%2Fsystem_health.common_mobile_on_Android%2F0%2Fstdout

Not sure why some stories trigger this, while some do not?

Additionally:
- Screenshots for failed stories are being take too late go/catabug/3548/
- Disabled stories are also trying to open the browser and failing go/catabug/3549/

For the moment assigning to myself to investigate further.
 
Cc: primiano@chromium.org
+primiano, wee need some help debugging.

Regarding this specific problem, the most relevant piece of information appears to be (from the device logcat):

	05-09 23:31:26.248   933  3933 I ActivityManager: START u0 {dat=about:blank flg=0x10000000 cmp=org.chromium.chrome/com.google.android.apps.chrome.Main} from uid 0 on display 0
	05-09 23:31:26.262  4120  6385 W SearchService: Abort, client detached.
	05-09 23:31:26.269  4120  4120 I HotwordDetector: Closing mic
	05-09 23:31:26.269  4120  4383 I MicrophoneInputStream: mic_close com.google.android.apps.gsa.speech.audio.x@fe4349e
	05-09 23:31:26.269  4120 22598 E AudioRecord-JNI: Error -4 during AudioRecord native read
	05-09 23:31:26.289 22653 22653 I cr_InstantAppsHandler: Not handling with Instant Apps because Chrome is not default or there's a specialized handler
	05-09 23:31:26.295   933  3946 I ActivityManager: START u0 {cmp=org.chromium.chrome/.browser.firstrun.FirstRunActivity (has extras)} from uid 10092 on display 0


The first "ActivityManager: START" is triggered by Telemetry when trying to launch the browser, _something_ then triggers the second one with "FirstRunActivity" which prevents us from actually gathering control of the browser.

Not sure exactly how that happens. Is this the normal behavior when Chrome is launched for the first time with an empty set of flags? Or something else is launching the second intent? What's the deal with that "Instant Apps" thing?

On a side note: I've spent yesterday all day looking at the logs; there are a bunch of problems with our logging and exception handling in Telemetry. The exception with the huge diagnostic info is dumped 3 times per incidence in the logs, and we attempt to close the browser_backend twice! -- I'll file separate bugs for these.
A few more updates. From the Telemetry side of things, this is what is happening:

1. We set command line flags (to /data/local/tmp/chrome-command-line)

2. We launch chrome:
   $ adb shell am start -W -d about:blank -n org.chromium.chrome/com.google.android.apps.chrome.Main

3. We set up port forwarding:
   $ adb forward --no-rebind tcp:58975 localabstract:chrome_devtools_remote

4. We check whether we can connect to the devtools client:
   websocket.create_connection('ws://127.0.0.1:58975/devtools/browser', timeout=xxx, sockopt=[(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)])

5. We then try to set an http connection to it and send a request:
   conn = httplib.HTTPConnection('127.0.0.1:58975', timeout=30)
   conn.request('GET', '/json')
   
Either 4 or 5 failing cause the error we're seeing.

Locally I tried to explicitly trash out the flags (writing always '--foo' or removing the file enrirely) and ignoring what Telemetry wants to set.

The browser does come up anyway *and* the devtools connection is possible. (The error is caught much later because pages never load on the browser).

So I think the problem is in one of the steps 3-5. I'll fire up a CL to add some extra logging on those areas.
Cc: mikec...@chromium.org
 Issue 719613  has been merged into this issue.
Very interested to see what comes of your investigations, perezju@. This is causing quite a few problems on the waterfall.
Summary: Sometimes failing to launch browser on android-nexus5x perf bot after swarming (was: Failures on android-nexus5x perf: Neither CHROMIUM_OUTPUT_DIR nor CHROMIUM_OUT_DIR)
Charlie, do the failures on other benchmarks have the same symptoms? In particular failing while trying to launch the browser with a UI Dump similar to the one in #0?

Also forgot to mention, the "Neither CHROMIUM_OUTPUT_DIR nor CHROMIUM_OUT_DIR has been set" is a red-herring. There is a problem when trying to run the "symbolize stack" script [1], but that happens *after* all things started going south; and there is no stack to symbolize in this case anyway.

[1]: https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/internal/platform/android_platform_backend.py?rcl=9e43159fc62c2ba87c5dd5202e38aeb5e207bb68&l=741

Filed issue 721684 for that.

Project Member

Comment 6 by bugdroid1@chromium.org, May 12 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/810daba795ae080b446ab7ef573ed2386028f940

commit 810daba795ae080b446ab7ef573ed2386028f940
Author: catapult-deps-roller@chromium.org <catapult-deps-roller@chromium.org>
Date: Fri May 12 12:48:43 2017

Roll src/third_party/catapult/ 2cbeef5c4..fd3eb1dce (2 commits)

https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/2cbeef5c42f5..fd3eb1dce85a

$ git log 2cbeef5c4..fd3eb1dce --date=short --no-merges --format='%ad %ae %s'
2017-05-12 perezju [telemety] Reduce log spam of exceptions when creating browser
2017-05-12 perezju Add extra logging to adb.Forward and IsDevToolsAgentAvailable

Created with:
  roll-dep src/third_party/catapult
BUG= 720317 


Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, see:
http://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium#TOC-Failures-due-to-DEPS-rolls


CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel
TBR=sullivan@chromium.org

Change-Id: I3e3ec1ea285cedc6b05573a0dd043cca0cfa5d7d
Reviewed-on: https://chromium-review.googlesource.com/504088
Reviewed-by: <catapult-deps-roller@chromium.org>
Commit-Queue: <catapult-deps-roller@chromium.org>
Cr-Commit-Position: refs/heads/master@{#471288}
[modify] https://crrev.com/810daba795ae080b446ab7ef573ed2386028f940/DEPS

First build that ran after #6 is:
https://luci-milo.appspot.com/buildbot/chromium.perf/Android%20Nexus5X%20Perf/36

From that log the error is:
(INFO) 2017-05-12 14:33:58,800 flag_changer._UpdateCommandLineFile:206  Flags now set on the device: ['--disable-fre', '--no-default-browser-check', '--metrics-recording-only', ...]
(INFO) 2017-05-12 14:33:58,800 cmd_helper._ValidateAndLogCommand:161  [host]> /b/swarming/w/ir/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb -s 01d30cfd7bc7d319 shell '( am start -W -d about:blank -n org.chromium.chrome/com.google.android.apps.chrome.Main );echo %$?'
(INFO) 2017-05-12 14:34:00,802 cmd_helper._ValidateAndLogCommand:161  [host]> /b/swarming/w/ir/third_party/catapult/devil/bin/deps/linux2/x86_64/bin/adb -s 01d30cfd7bc7d319 forward --no-rebind tcp:58267 localabstract:chrome_devtools_remote
(INFO) 2017-05-12 14:34:00,867 devtools_client_backend._IsInspectorWebsocketAvailable:56  Websocket at port 58267 not yet available: [Errno 104] Connection reset by peer
(INFO) 2017-05-12 14:34:00,868 devtools_client_backend._IsInspectorWebsocketAvailable:56  Websocket at port 58267 not yet available: [Errno 104] Connection reset by peer
(INFO) 2017-05-12 14:34:00,968 devtools_client_backend._IsInspectorWebsocketAvailable:56  Websocket at port 58267 not yet available: [Errno 104] Connection reset by peer
(INFO) 2017-05-12 14:34:01,069 devtools_client_backend._IsInspectorWebsocketAvailable:56  Websocket at port 58267 not yet available: [Errno 104] Connection reset by peer
[...]

Before failing as in #0. This means we're failing exactly when trying to:

4.   websocket.create_connection('ws://127.0.0.1:58975/devtools/browser', timeout=xxx, sockopt=[(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)])


Sounds like Chrome is not responding to devtools requests?

Logcat keeps showing two intents to launch Chrome:

	05-12 16:29:37.924   928  3539 I ActivityManager: START u0 {dat=about:blank flg=0x10000000 cmp=org.chromium.chrome/com.google.android.apps.chrome.Main} from uid 0 on display 0
	05-12 16:29:37.933  3893  3915 W SearchService: Abort, client detached.
	05-12 16:29:37.936  3893  3893 I HotwordDetector: Closing mic
	05-12 16:29:37.936  3893  4164 I MicrophoneInputStream: mic_close com.google.android.apps.gsa.speech.audio.x@6f2982d
	05-12 16:29:37.936  3893 28641 E AudioRecord-JNI: Error -4 during AudioRecord native read
	05-12 16:29:37.984   690 28643 D audio_hw_primary: disable_audio_route: reset and update mixer path: audio-record
	05-12 16:29:37.985   690 28643 D audio_hw_primary: disable_snd_device: snd_device(61: voice-rec-mic)
	05-12 16:29:37.989   690 28643 I soundtrigger: audio_extn_sound_trigger_update_device_status: device 0x3d of type 1 for Event 0
	05-12 16:29:37.989   690 28643 D sound_trigger_platform: platform_stdev_check_and_update_concurrency: concurrency active 0, tx 0, rx 0, concurrency session_allowed 1
	05-12 16:29:37.989   690  3090 I SoundTriggerHwService::Module: void android::SoundTriggerHwService::Module::onCallbackEvent(const android::sp<android::SoundTriggerHwService::CallbackEvent>&) mClient == 0
	05-12 16:29:37.991  3893  4182 I MicroRecognitionRnrImpl: Stopping hotword detection.
	05-12 16:29:37.992  3893 28640 I MicroRecognitionRnrImpl: Detection finished
	05-12 16:29:38.103 28702 28702 I cr_InstantAppsHandler: Not handling with Instant Apps because Chrome is not default or there's a specialized handler
	05-12 16:29:38.150   928  3683 I ActivityManager: START u0 {cmp=org.chromium.chrome/.browser.firstrun.FirstRunActivity (has extras)} from uid 10089 on display 0

Any further ideas?
Ok, I still don't know what the problem is, but I've gotten a bit closer: The stories failing are always run *after* "running" a story was disabled (and therefore it gets skipped).

For example, on one of the recent builds, the status of stories as being run is: -DF------DF------------------DF--------- ... ('-': ok, 'D': disabled, 'F': failed).

I'm assuming that the logic to skip stories is not clearing some state and causing the next story to fail. Tomorrow I'll keep digging into it.
Ah interesting pattern in #8. Your theory sounds really plausible (we are not clearing up something when we skip).
Nothing comes to my mind right now.

A bit confused by seeing FirstRunActivity, don't we pass --disable-fre? I wonder if this is the cmdline getting deleted at the wrong time and not being there when we need for the run after the skip.
Project Member

Comment 10 by bugdroid1@chromium.org, May 17 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/26d71b4f8cbf1fd071ae1817918857613d9338f7

commit 26d71b4f8cbf1fd071ae1817918857613d9338f7
Author: catapult-deps-roller@chromium.org <catapult-deps-roller@chromium.org>
Date: Wed May 17 05:16:56 2017

Roll src/third_party/catapult/ 37015fb47..dab2aa44c (13 commits)

https://chromium.googlesource.com/external/github.com/catapult-project/catapult.git/+log/37015fb470d8..dab2aa44c6cd

$ git log 37015fb47..dab2aa44c --date=short --no-merges --format='%ad %ae %s'
2017-05-16 nednguyen Revert of Remove webrtc_rendering_stats TBMv1 metrics (patchset #2 id:20001 of https://codereview.chromium.org/2890443002/ )
2017-05-16 rnephew Revert of [Telemetry][CodeHealth] Make unittests with fake platforms use fakes.FakePlatform (patchset #3 id:60001 of https://codereview.chromium.org/2889583003/ )
2017-05-16 rnephew [Telemetry][CodeHealth] Make unittests with fake platforms use fakes.FakePlatform
2017-05-16 ctzsm Remove android-webview-shell
2017-05-16 rnephew [Telemetry] Add Expectation module that enables disabling benchmarks/stories
2017-05-16 perezju [common/battor] Fix errors while logging
2017-05-16 nednguyen Remove webrtc_rendering_stats TBMv1 metrics
2017-05-16 perezju [Telemetry] Stop tracing/metrics before closing the browser
2017-05-16 nednguyen Remove v8_gc_latency TBMv1 metric
2017-05-16 ulan Revert of Fix main frame detection in loading metrics. (patchset #7 id:120001 of https://codereview.chromium.org/2862103002/ )
2017-05-15 jreck Quick fix to avoid corruption from binder_parser
2017-05-15 ctzsm Rename AndroidWebView.apk to WebViewInstrumentation.apk
2017-05-15 ulan Fix main frame detection in loading metrics.

Created with:
  roll-dep src/third_party/catapult
BUG= 701938 ,711065, 719447 , 720317 , 692112 , 701938 , 692112 


Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, see:
http://www.chromium.org/developers/tree-sheriffs/sheriff-details-chromium#TOC-Failures-due-to-DEPS-rolls


CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel
TBR=sullivan@chromium.org

Change-Id: I01ecaf24a0ad46e599b4e1272e8bb6204cabf75f
Reviewed-on: https://chromium-review.googlesource.com/507032
Reviewed-by: <catapult-deps-roller@chromium.org>
Commit-Queue: <catapult-deps-roller@chromium.org>
Cr-Commit-Position: refs/heads/master@{#472336}
[modify] https://crrev.com/26d71b4f8cbf1fd071ae1817918857613d9338f7/DEPS

Status: Fixed (was: Assigned)
I'm going to call this fixed, the failure is now gone from the most recent 7 builds.

Interestingly, the failure disappeared a bit earlier than my fix landing. My fix landed at r472336.

But the errors were gone after some CL in:
http://test-results.appspot.com/revision_range?start=472069&end=472153

Just recording this info in case we need to revisit the issue.

Some failures remain on this Android 5X bot, but they have different causes as far as I can tell. New bugs should be opened to track those.

Sign in to add a comment