Issue metadata
Sign in to add a comment
|
media.mobile reference builds fail but error points to telemetry startup code. |
||||||||||||||||||||||
Issue descriptionRef graph busted. Time to audio play regression.
,
Apr 9 2018
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/12c0108cc40000
,
Apr 9 2018
📍 Couldn't reproduce a difference. https://pinpoint-dot-chromeperf.appspot.com/job/12c0108cc40000
,
Apr 9 2018
📍 Pinpoint job started. https://pinpoint-dot-chromeperf.appspot.com/job/148a5a6cc40000
,
Apr 10 2018
📍 Couldn't reproduce a difference. https://pinpoint-dot-chromeperf.appspot.com/job/148a5a6cc40000
,
Apr 10 2018
The ref graph hasn't produced a value in days - its build is failing. https://ci.chromium.org/buildbot/chromium.perf/Android%20Nexus7v2%20Perf/ https://chromeperf.appspot.com/group_report?bug_id=830834 I'm tempted to close the bug (2 pinpoints, no repro), but I want to make sure we have eyes on this ref graph issue. Who should I follow up with for that bot?
,
Apr 10 2018
Chris, if you can find the error message from the failing build and post it here or file a new bug, then we can figure out what is wrong.
,
Apr 10 2018
Sure. Of the last 25 builds (going back to April 5), 22 failed. One thing I don't follow is, why didn't the passing builds update the ref graph? We have one as recently as April 6? The failures have a varied failure reasons, but one thing that is common is: "failed loading.mobile on Android" When I pop open the last 3 failed builds, I see "Unexpected Failures: loading.mobile/FlipKart" Looking at the log, we see the renderer is crashing, but the stack is not symbolized. I'm building android chrome now...
,
Apr 11 2018
I'm not able to get a crash directing my dev build to flipkart.com. I'm using a nexus7 and the exact build from the swarming bot. I've attempted to run the benchmark using my binary, but can't seem to get that to work... please help. $ tools/perf/run_benchmark media.mobile --device="android" --browser=exact --browser-executable=/ssd/blink3/src/out/Android/apks/ChromePublic.apk (...lots of debug output...) Cannot find browser of type exact. To list out all available browsers, rerun your command with --browser=list $ tools/perf/run_benchmark loading.mobile --device="android" --browser=list (... lots of debug output ...) Available browsers: Android device 079030cf $ tools/perf/run_benchmark loading.mobile --device="android" --browser="Android device 079030cf" Cannot find browser of type Android device 079030cf. To list out all available browsers, rerun your command with --browser=list
,
Apr 12 2018
flipkart tests are independent of media.mobile. Their failure should have no effect on whether the media.mobile.reference benchmark is run. I got to https://ci.chromium.org/buildbot/chromium.perf/Android%20Nexus7v2%20Perf/ are search for "media.mobile.reference" I see this error: Unexpected Failures: * media.mobile/video.html?src=crowd.ogg&type=audio so I click the "stdout" link under that error. That takes me to a log file that has all the logs from running media.mobile.reference. in those logs I see a couple errors: " Traceback (most recent call last): _RunStoryAndProcessErrorIfNeeded at /b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/story_runner.py:147 test.DidRunStory(state.platform, results) DidRunStory at /b/swarming/w/ir/third_party/catapult/telemetry/telemetry/web_perf/timeline_based_measurement.py:289 trace_result, _ = platform.tracing_controller.StopTracing() StopTracing at /b/swarming/w/ir/third_party/catapult/telemetry/telemetry/core/tracing_controller.py:53 return self._tracing_controller_backend.StopTracing() StopTracing at /b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/tracing_controller_backend.py:130 self._IssueClockSyncMarker() _IssueClockSyncMarker at /b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/tracing_controller_backend.py:234 self._RecordIssuerClockSyncMarker) RecordClockSyncMarker at /b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/platform/tracing_agent/chrome_tracing_agent.py:178 raise ChromeClockSyncError('Cannot issue clock sync. No devtools clients') " " Traceback (most recent call last): File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/story_runner.py", line 116, in _RunStoryAndProcessErrorIfNeeded state.WillRunStory(story) File "/b/swarming/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 75, in traced_function return func(*args, **kwargs) File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/page/shared_page_state.py", line 234, in WillRunStory self._StartBrowser(page) File "/b/swarming/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 75, in traced_function return func(*args, **kwargs) File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/page/shared_page_state.py", line 198, in _StartBrowser self._browser = self._possible_browser.Create() File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome/android_browser_finder.py", line 165, in Create return self._GetBrowserInstance(existing=False) File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome/android_browser_finder.py", line 187, in _GetBrowserInstance find_existing=existing) File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/browser/browser.py", line 49, in __init__ self._browser_backend.Start(startup_args, startup_url=startup_url) File "/b/swarming/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 75, in traced_function return func(*args, **kwargs) File "/b/swarming/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome/android_browser_backend.py", line 73, in Start blocking=True) File "/b/swarming/w/ir/third_party/catapult/devil/devil/android/decorators.py", line 57, in timeout_retry_wrapper retry_if_func=retry_if_func) File "/b/swarming/w/ir/third_party/catapult/devil/devil/utils/timeout_retry.py", line 158, in Run error_log_func=error_log_func) File "/b/swarming/w/ir/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 189, in JoinAll self._JoinAll(watcher, timeout) File "/b/swarming/w/ir/third_party/catapult/devil/devil/utils/reraiser_thread.py", line 154, in _JoinAll (len(alive_threads), len(self._threads))) CommandTimeoutError: Timed out waiting for 1 of 1 threads. " None of the errors have any media-specific code in their traces. This is another example of Telemetry failures at startup being incorrectly summarized as a failure of a specific story. This type of thing makes it really difficult for people who don't know the details of the whole system to understand what is going on, and I think it needs to be fixed by the Telemetry team once they are done putting out fires.
,
Apr 12 2018
This is definitely a browser crash: https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/ba8e63d8-3e70-11e8-a2d6-0242ac110005 (Tip: search for "Uploading screenshot".. in the log to find these. Note that we are also working with SOM team to surface these links in SOM directly - issue 772212 ) Also apologize for the confusing log. Making the log easier to understand for people definitely on our radar once he fire are putting out.
,
Apr 12 2018
I think the Telemetry bug here is why we don't see Android logcat upon the browser crash.. Hmhh
,
Apr 12 2018
+Juan: I think we may lost the ability to get system log upon browser startup crash with the refactoring to move browser startup to browser_finder*
,
Apr 12 2018
,
Apr 12 2018
,
Apr 12 2018
I don't have expertise to debug Clank crash, so leave this bug available. However, filing issue 832207 as blocker for this
,
Apr 12 2018
Thanks a ton for looking. New questions: 0. > I got to https://ci.chromium.org/buildbot/chromium.perf/Android%20Nexus7v2%20Perf/ > are search for "media.mobile.reference" When I search (cntrl f) for media.mobile.reference on that page, I don't find anything? 1. From comment #11, sounds like the browser (not renderer?) really did crash. I'm not able to open this link (Should I be able? I get 403): https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/ba8e63d8-3e70-11e8-a2d6-0242ac110005 So still not clear whether media/ code is possibly at fault. 2. Re comment #10 - do we also have some telemetry errors in addition to the browser crash? 3. Anyone know how to fix my tools/perf/run_benchmark commands from Comment #9 to run the test on my own android build?
,
Apr 12 2018
0. after you go to https://ci.chromium.org/buildbot/chromium.perf/Android%20Nexus7v2%20Perf/ and click on #1603 to get to https://ci.chromium.org/buildbot/chromium.perf/Android%20Nexus7v2%20Perf/1603 and then you will be able to find "media.mobile.reference". https://ci.chromium.org/buildbot/chromium.perf/Android%20Nexus7v2%20Perf/ is the main page that links to each of the runs on that bot. That bot is running tests continuously. 1. It works for me, so maybe there is some permission issue. Are you logged into your @google.com account on pantheon.corp.google.com ? 2. Ned, please look into these. it's difficult to find the real error source when there are a bunch of stack traces all over the place. 3. I run $tools/perf/run_benchmark loading.mobile --device="android" --browser=list (WARNING) 2018-04-12 13:14:00,068 binary_manager._WrapDependencyManagerFunction:59 Cannot find path for chrome_stable on platform android_n_armeabi-v7a. Falling back to android_armeabi-v7a. Available browsers: Android device ZX1G22L8XD android-chrome-dev android-system-chrome so then I run $ tools/perf/run_benchmark loading.mobile --device="android" --browser=android-chrome-dev Perhaps your device doesn't have Chrome on it? Go to the play store and install Chrome Dev.
,
Apr 12 2018
0. thanks 1. fixed now 2. I'd love to see a stack trace for the *chrome* crash - not finding one? 3. My device has my dev build of chromium on it. Is there some way to tell it to use that? It also has dev channel chrome... but my --browser=list command just yields this cryptic result: $ tools/perf/run_benchmark loading.mobile --device="android" --browser=list (... tons of debug output... ) Available browsers: Android device 079030cf Finally, a new set of questions... A. how often does the ref get updated? B. how do I determine the current revision is for the ref? What I'd like to do in these situations where the ref graph has been busted for a long time is just check in on A and B above and make sure things are still working as expected. Also, getting the call stack for the crash would help me be able to query crbug for that stack and see that the issue was addressed.
,
Apr 12 2018
#18: simpifying stack trace in Telemetry in general a hard problem. The reason is because is the life cycle of a benchmark is complex, there are many actors involved in it, e.g: telemetry, the browser, the OS, the web page replay service, battor... So when a crash happens, we need to make sure that all the available logging is there to make sure that we the right data that capture the error state. The problem with this is the log then becomes difficult to consume. So we have two choices: simple log with the worry that we won't have the right data to debug; complex log which is hard for people who are new to the framework to understand. So far we have picked former (as difficult to debug is better than impossible to debug). Having said that, a few things we are trying now is to split the log to artifacts which will be stored separately ( issue 772212 ). I am also open to other suggestions in how we make the log simpler yet still contain all the relevant info for debugging.
,
Apr 12 2018
> complex log which is hard for people who are new to the framework to understand. So far we have picked former (as difficult to debug is better than impossible to debug) I agree, this is the safer bet. What I'm confused about is that the log doesn't appear to have a chrome stack trace? Also, bump to my other questions in comment 19.
,
Apr 12 2018
#21: In general, if browser crashes at startup, we won't get chrome stack trace. The best we have is system log which can shows stuff relevant to Chrome (issue 832207)
,
Apr 12 2018
#22: Is this true of debugging android in general, or some limitation of the test infra? My thinking is I can catch chrome stack traces at startup easily if I just start the logcat before starting chrome.
,
Apr 12 2018
#23: what I meant is we won't get chrome stack trace from the tombstone. But the logcat usually contain Chrome stack as you mentioned.
,
Apr 12 2018
Thanks. Is the logcat uploaded somewhere?
,
Apr 13 2018
#25: not yet (until we fix issue 832207)
,
Apr 13 2018
Ok, thanks. With that I think the bug is ready to be closed. Regression could not be reproduced, and I assume the ref crash is fixed in later revisions (as usual). I just have these 2 lingering questions for next time I go looking ref-debugging. A. how often does the ref get updated? B. how do I determine the current revision is for the ref?
,
Apr 13 2018
,
Apr 16 2018
The following revision refers to this bug: https://chromium.googlesource.com/catapult/+/5d0c16d725ca28dcadddd44d2309ea2257e8b92b commit 5d0c16d725ca28dcadddd44d2309ea2257e8b92b Author: Caleb Rouleau <crouleau@chromium.org> Date: Mon Apr 16 17:26:32 2018 [Telemetry] Explain when no browsers available for a device Bug: chromium:830834 Change-Id: Idd8f25d2863f2235951783665ae0a6972c76616e Reviewed-on: https://chromium-review.googlesource.com/1011107 Commit-Queue: Caleb Rouleau <crouleau@chromium.org> Reviewed-by: Ned Nguyen <nednguyen@google.com> [modify] https://crrev.com/5d0c16d725ca28dcadddd44d2309ea2257e8b92b/telemetry/telemetry/internal/browser/browser_options.py
,
Apr 16 2018
Ned, re comment 20: do any of the non-chromium stacks point to some other telemetry issue we should be tracking in this bug (i.e. is the current title accurate?) > A. how often does the ref get updated? I have mostly answered this by just mousing over the graph and checking the revs. Looks like refs are usually updated often - anywhere from every 2-10 revisions.
,
Apr 16 2018
#30: > do any of the non-chromium stacks point to some other telemetry issue we should be tracking in this bug Not that I am aware of. This is purely a Chrome Android startup problem > how often does the ref get updated? Ideally once per quarter. Right now it's a script that someone needs to run. I will try to make a update now.
,
Apr 16 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/4c4a55dce59dae529d82f7f4371a459e12e6373c commit 4c4a55dce59dae529d82f7f4371a459e12e6373c Author: catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com> Date: Mon Apr 16 22:03:58 2018 Roll src/third_party/catapult/ 037198303..5d0c16d72 (1 commit) https://chromium.googlesource.com/catapult.git/+log/0371983030f2..5d0c16d725ca $ git log 037198303..5d0c16d72 --date=short --no-merges --format='%ad %ae %s' 2018-04-12 crouleau [Telemetry] Explain when no browsers available for a device Created with: roll-dep src/third_party/catapult BUG= chromium:830834 The AutoRoll server is located here: https://catapult-roll.skia.org Documentation for the AutoRoller is here: https://skia.googlesource.com/buildbot/+/master/autoroll/README.md If the roll is causing failures, please contact the current sheriff, who should be CC'd on the roll, and stop the roller if necessary. TBR=sullivan@chromium.org Change-Id: I89bb2ef6dda43a6c723fc7686e01f9278876c1d3 Reviewed-on: https://chromium-review.googlesource.com/1014276 Commit-Queue: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com> Reviewed-by: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com> Cr-Commit-Position: refs/heads/master@{#551139} [modify] https://crrev.com/4c4a55dce59dae529d82f7f4371a459e12e6373c/DEPS
,
Apr 16 2018
>> do any of the non-chromium stacks point to some other telemetry issue we should be tracking in this bug > Not that I am aware of. This is purely a Chrome Android startup problem Ack. In that case we can close and the follow up work of getting the adb log with chrome stack traces is already tracked in the separate bug you filed. >> how often does the ref get updated? > Ideally once per quarter. Right now it's a script that someone needs to run. I will try to make a update now. Interesting! So, when I click around in ref graph in the UI and observe that the "Chromium commit position range" is regularly increasing - is that just a bad info? Or am I misinterpreting? Also, that makes me wonder how it is that ref graph's ever get to be persistently busted. If it worked at the time of the last roll, I would expect it to just continue working (save for the occasional flake)?
,
Apr 17 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/4c4a55dce59dae529d82f7f4371a459e12e6373c commit 4c4a55dce59dae529d82f7f4371a459e12e6373c Author: catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com> Date: Mon Apr 16 22:03:58 2018 Roll src/third_party/catapult/ 037198303..5d0c16d72 (1 commit) https://chromium.googlesource.com/catapult.git/+log/0371983030f2..5d0c16d725ca $ git log 037198303..5d0c16d72 --date=short --no-merges --format='%ad %ae %s' 2018-04-12 crouleau [Telemetry] Explain when no browsers available for a device Created with: roll-dep src/third_party/catapult BUG= chromium:830834 The AutoRoll server is located here: https://catapult-roll.skia.org Documentation for the AutoRoller is here: https://skia.googlesource.com/buildbot/+/master/autoroll/README.md If the roll is causing failures, please contact the current sheriff, who should be CC'd on the roll, and stop the roller if necessary. TBR=sullivan@chromium.org Change-Id: I89bb2ef6dda43a6c723fc7686e01f9278876c1d3 Reviewed-on: https://chromium-review.googlesource.com/1014276 Commit-Queue: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com> Reviewed-by: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com> Cr-Commit-Position: refs/heads/master@{#551139} [modify] https://crrev.com/4c4a55dce59dae529d82f7f4371a459e12e6373c/DEPS
,
Jul 27
Closing per comment 33.
,
Jan 16
(6 days ago)
,
Jan 16
(6 days ago)
|
|||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||
Comment 1 by 42576172...@developer.gserviceaccount.com
, Apr 9 2018