New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 886621 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Oct 19
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: ----

Blocking:
issue 760498



Sign in to add a comment

startup.mobile: Make FCP resilient to missing browser process names (or fix browser process name issue in traces)

Project Member Reported by sheriff-...@appspot.gserviceaccount.com, Sep 19

Issue description

Filed by sheriff-o-matic@appspot.gserviceaccount.com on behalf of pasko@google.com

startup.mobile failing on multiple builders

Builders failed on: 
- health-plan-clankium-low-end-phone: 
  https://ci.chromium.org/buildbot/internal.client.clank/health-plan-clankium-low-end-phone
- health-plan-clankium-phone: 
  https://ci.chromium.org/buildbot/internal.client.clank/health-plan-clankium-phone

(ERROR) 2018-09-18 21:35:52,642 page_test_results.Fail:540  Failure recorded: Error: Number of FCP events (10) differs from number of browser starts (9)
    at new androidStartupMetric (/tracing/metrics/android_startup_metric.html:74:13)
    at runMetrics (/b/build/slave/health-plan-clankium-phone/build/src/third_party/catapult/tracing/tracing/metrics/metric_map_function.html:61:16)
    at metricMapFunction (/b/build/slave/health-plan-clankium-phone/build/src/third_party/catapult/tracing/tracing/metrics/metric_map_function.html:190:24)
    at Object.mapSingleTrace (/tracing/mre/map_single_trace.html:39:7)
    at eval (/b/build/slave/health-plan-clankium-phone/build/src/third_party/catapult/tracing/tracing/mre/map_single_trace_cmdline.html:61:18)
    at Object.runAndConvertErrorsToFailures (/tracing/mre/map_single_trace.html:24:10)
    at mapSingleTraceWithResult (/b/build/slave/health-plan-clankium-phone/build/src/third_party/catapult/tracing/tracing/mre/map_single_trace_cmdline.html:52:12)
    at Object.mapSingleTraceMain (/b/build/slave/health-plan-clankium-phone/build/src/third_party/catapult/tracing/tracing/mre/map_single_trace_cmdline.html:77:20)
    at eval (/b/build/slave/health-plan-clankium-phone/build/src/third_party/catapult/tracing/tracing/mre/map_single_trace_cmdline.html:90:15)
    at Object.loadHTMLFile (/b/build/slave/health-plan-clankium-phone/build/src/third_party/catapult/third_party/vinn/vinn/html_imports_loader.js:140:5)

 
Cc: -pasko@google.com perezju@chromium.org lizeb@chromium.org benjhayden@chromium.org nednguyen@chromium.org
Owner: pasko@chromium.org
Status: Started (was: Available)
Attaching the trace from the run that produces "Error: Number of FCP events (10) differs from number of browser starts (9)".

The number of browser starts is computed by iterating over all events 'Startup.BrowserMessageLoopStartTimeFromMainEntry3' on the trace. There are 10 of them, as expected, but iterating with model.getOrCreateHelper(tr.model.helpers.ChromeModelHelper) reveals only 9. Also if we use the search box in the html, the event will be found 9 times. The event that is not found starts at 171871.xxx ms, in "Process 6255". Other browser processes on the trace are marked as "Browser (pid: XXXX), uptime: XXs".

I think this incorrect process name causes the event not to be found. We saw this happening with traces from users before.

To eliminate the crash I will remove the FCP metric shortly.

Possible fixes and workarounds:
1. figure out why we don't set the process name reliably (could be platform bug)
2. add device name to the trace and avoid recording the FCP for known bad devices
3. ignore the FCP event that follows the process start that is unknown
4. look for the event in all processes, not just browser process (is there a Helper for that?)

I am leaning towards (4) for a longer term fix, but not sure how to do it yet. benjhayden: is there a way to iterate over _all_events in the trace?
bug_886621.html.gz
5.5 MB Download
to repro with the trace: choose androidStartupMetric in the metrics dropdown
Blocking: 760498
Labels: -Restrict-View-Google
Summary: startup.mobile: Make FCP resilient to missing browser process names (or fix browser process name issue in traces) (was: startup.mobile failing on multiple builders)
remove the metric temporarily: https://chromium-review.googlesource.com/c/catapult/+/1233578
Project Member

Comment 4 by bugdroid1@chromium.org, Sep 19

The following revision refers to this bug:
  https://chromium.googlesource.com/catapult/+/deef6ea33bc9da62ac8d59cd28e1edc03e202335

commit deef6ea33bc9da62ac8d59cd28e1edc03e202335
Author: Egor Pasko <pasko@chromium.org>
Date: Wed Sep 19 12:49:25 2018

androidStartupMetric: remove the FCP

The metric is flaky and sometimes produces nonsense results. Remove it
to fix the bots. More details are on the bug.

TBR=benjhayden@chromium.org

Bug:  chromium:886621 
Change-Id: If8198b3eb587c5e3211ff87e55da9c2c3c814053
Reviewed-on: https://chromium-review.googlesource.com/1233578
Reviewed-by: Egor Pasko <pasko@chromium.org>
Commit-Queue: Egor Pasko <pasko@chromium.org>

[modify] https://crrev.com/deef6ea33bc9da62ac8d59cd28e1edc03e202335/tracing/tracing/metrics/android_startup_metric_test.html
[modify] https://crrev.com/deef6ea33bc9da62ac8d59cd28e1edc03e202335/tracing/tracing/metrics/android_startup_metric.html

Cc: mythria@chromium.org dproy@chromium.org
> benjhayden: is there a way to iterate over _all_events in the trace?

for (const proc of Object.values(model.processes)) {
  for (const event of proc.getDescendantEvents()) {
    ...
  }
}

Of course, that can take a long time for large traces.

+dproy: Can you take a look at why that browser process isn't being recognized?
Cc: primiano@chromium.org oysteine@chromium.org
thanks!!

Started happening on perf-monochrome-n-phone: https://ci.chromium.org/buildbot/internal.client.clank/perf-monochrome-n-phone/8806

I suspect the incorrect naming of the browser process on the trace may be related to perfetto being enabled: crbug.com/886918,  crbug.com/886943 
Project Member

Comment 7 by bugdroid1@chromium.org, Sep 19

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/20136a68eadb1aaeda4252b590f0bcc047c04ebe

commit 20136a68eadb1aaeda4252b590f0bcc047c04ebe
Author: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Date: Wed Sep 19 18:59:44 2018

Roll src/third_party/catapult c968ea0b6538..92ef0386b8b6 (15 commits)

https://chromium.googlesource.com/catapult.git/+log/c968ea0b6538..92ef0386b8b6


git log c968ea0b6538..92ef0386b8b6 --date=short --no-merges --format='%ad %ae %s'
2018-09-19 dtu@chromium.org [pinpoint] Data migration basic error handling.
2018-09-19 dtu@chromium.org [pinpoint] Basic stats page.
2018-09-19 dtu@chromium.org [pinpoint] Update fields in Jobs table.
2018-09-19 cbruni@chromium.org results2.html debounce search
2018-09-19 simonhatch@chromium.org Dashboard - Fix alert and comparison_magnitude.
2018-09-19 simonhatch@chromium.org Dashboard - Replace last_alerted_revision with query
2018-09-19 slobodan@google.com Fixing frames in animations following response.
2018-09-19 pasko@chromium.org androidStartupMetric: remove the FCP
2018-09-18 dtu@chromium.org [pinpoint] Remove unused <job-chart-icon> import.
2018-09-18 achuith@chromium.org oobe: Catch WebSocketException instead.
2018-09-18 dtu@chromium.org [pinpoint] Remove list of differences and reformat arguments list.
2018-09-18 sadrul@chromium.org rendering: Remove some redundant mean_ metrics.
2018-09-18 sadrul@chromium.org rendering: Remove discrepancy metrics.
2018-09-18 chiniforooshan@chromium.org Telemetry: Fix SF events
2018-09-18 mattm@chromium.org netlog_viewer/README.md: Re-order with user-focused sections at top.


Created with:
  gclient setdep -r src/third_party/catapult@92ef0386b8b6

The AutoRoll server is located here: https://autoroll.skia.org/r/catapult-autoroll

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.

CQ_INCLUDE_TRYBOTS=luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel

BUG=chromium:878390, chromium:868073 , chromium:883079 ,chromium:883715, chromium:886621 , chromium:879353 , chromium:884950 , chromium:884950 , chromium:863926 
TBR=sullivan@chromium.org

Change-Id: I9fa8e95d9aad524c032b0bf637dfc07926ce7795
Reviewed-on: https://chromium-review.googlesource.com/1234576
Reviewed-by: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Commit-Queue: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#592499}
[modify] https://crrev.com/20136a68eadb1aaeda4252b590f0bcc047c04ebe/DEPS

I used html2trace to extract the offending trace [attached], and the metadata event for process name ({"ph": "M", args {"name": "Browser"}...}) is missing, which is why we're not recognizing the browser process. This looks like a bug in chromium. 

+oysteine do you know what could be going on here?
trace.7
6.4 MB Download
dproy@: My suspicion is that it's missing trace events due to a threading issue fixed in https://bugs.chromium.org/p/chromium/issues/detail?id=886528 but I'm not sure how easily I can validate that until I try to re-enable Perfetto on the perf waterfall again.
Components: Speed>Tracing Speed>Metrics
Project Member

Comment 11 by bugdroid1@chromium.org, Oct 10

The following revision refers to this bug:
  https://chromium.googlesource.com/catapult/+/86bdcbf37ffcfaa2c9064a14e24f6195a56baefe

commit 86bdcbf37ffcfaa2c9064a14e24f6195a56baefe
Author: Egor Pasko <pasko@chromium.org>
Date: Wed Oct 10 02:45:50 2018

androidStartupMetric: slow fallback for incorrect process names

Scan all processes for event
"Startup.BrowserMessageLoopStartTimeFromMainEntry3" in case no event is
found ChromeModelHelper..browserHelpers. This happens often when the
trace is combined with a systace.

The way to reproduce this issue with merging the trace+atrace is
demonstrated in:
  https://chromium-review.googlesource.com/c/chromium/src/+/1264543

Bug:  chromium:886621 
Bug: chromium:891797
Change-Id: Ibdc3d4a30dd25153a7d4b0d824101ab1ba737168
Reviewed-on: https://chromium-review.googlesource.com/c/1270856
Reviewed-by: Ben Hayden <benjhayden@chromium.org>
Commit-Queue: Egor Pasko <pasko@chromium.org>

[modify] https://crrev.com/86bdcbf37ffcfaa2c9064a14e24f6195a56baefe/tracing/tracing/metrics/android_startup_metric.html

Project Member

Comment 12 by bugdroid1@chromium.org, Oct 10

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/ba688e0187826d1ba43799dea76bb0eea45e8008

commit ba688e0187826d1ba43799dea76bb0eea45e8008
Author: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Date: Wed Oct 10 13:08:08 2018

Roll src/third_party/catapult 357c5c287bba..86bdcbf37ffc (3 commits)

https://chromium.googlesource.com/catapult.git/+log/357c5c287bba..86bdcbf37ffc


git log 357c5c287bba..86bdcbf37ffc --date=short --no-merges --format='%ad %ae %s'
2018-10-10 pasko@chromium.org androidStartupMetric: slow fallback for incorrect process names
2018-10-09 benjhayden@chromium.org Add Request classes for v2spa
2018-10-09 mlippautz@chromium.org Ignore forced Blink GCs in metrics


Created with:
  gclient setdep -r src/third_party/catapult@86bdcbf37ffc

The AutoRoll server is located here: https://autoroll.skia.org/r/catapult-autoroll

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.

CQ_INCLUDE_TRYBOTS=luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel

BUG= chromium:886621 ,chromium:891797,chromium:843903
TBR=sullivan@chromium.org

Change-Id: I9aedff776241a9f68f0d9efa700c0e769eb29d2c
Reviewed-on: https://chromium-review.googlesource.com/c/1273339
Reviewed-by: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Commit-Queue: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#598298}
[modify] https://crrev.com/ba688e0187826d1ba43799dea76bb0eea45e8008/DEPS

Project Member

Comment 13 by bugdroid1@chromium.org, Oct 12

The following revision refers to this bug:
  https://chromium.googlesource.com/catapult/+/1dc9e0a000109cf1190fb37a4834d9ad8a71081a

commit 1dc9e0a000109cf1190fb37a4834d9ad8a71081a
Author: Egor Pasko <pasko@chromium.org>
Date: Fri Oct 12 10:58:58 2018

androidStartupMetric: re-introduce First Contentful Paint

The previous failures were caused by enabling Perfetto, which since has
been reverted.

Re-introducing the metric with a fallback to scanning all processes,
just like for 'messageloop_start_time' - this should make the metric
resilient to missed process names. The issues with missing process name
remain for traces that are combined with systrace.

Note: this is not a trivial revert of [1] in the test because since then
the metric started skipping the first events on the trace.

[1] commit "androidStartupMetric: remove the FCP"
    https://github.com/catapult-project/catapult/commit/deef6ea3

Bug:  chromium:886621 
Bug: chromium:893514
Change-Id: Ib1c2ca5b426a67c7071209ae7969fef4d5651d08
Reviewed-on: https://chromium-review.googlesource.com/c/1273054
Commit-Queue: Egor Pasko <pasko@chromium.org>
Reviewed-by: Ben Hayden <benjhayden@chromium.org>

[modify] https://crrev.com/1dc9e0a000109cf1190fb37a4834d9ad8a71081a/tracing/tracing/metrics/android_startup_metric_test.html
[modify] https://crrev.com/1dc9e0a000109cf1190fb37a4834d9ad8a71081a/tracing/tracing/metrics/android_startup_metric.html

Project Member

Comment 14 by bugdroid1@chromium.org, Oct 12

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/bb11cbc66e56a475bdf0f5bf3d962cf80c3afe27

commit bb11cbc66e56a475bdf0f5bf3d962cf80c3afe27
Author: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Date: Fri Oct 12 22:54:58 2018

Roll src/third_party/catapult c8b97e37ec9c..99b88b0a2d58 (21 commits)

https://chromium.googlesource.com/catapult.git/+log/c8b97e37ec9c..99b88b0a2d58


git log c8b97e37ec9c..99b88b0a2d58 --date=short --no-merges --format='%ad %ae %s'
2018-10-12 simonhatch@chromium.org Pinpoint - Merge bugs with same root cause.
2018-10-12 benjhayden@chromium.org Change deprecation warnings in /api/alerts to errors.
2018-10-12 benjhayden@chromium.org Cache report names, test suites, and descriptors in v2spa service worker
2018-10-12 chiniforooshan@chromium.org Telemetry: process all rendering pipeline events
2018-10-12 benjhayden@chromium.org Cache session ids in v2spa service worker.
2018-10-12 benjhayden@chromium.org Fix links in rollback.md
2018-10-12 benjhayden@chromium.org Add KeyValueCacheRequest for v2spa service worker.
2018-10-12 benjhayden@chromium.org Add ts_mon metrics to common/timing.py
2018-10-12 benjhayden@chromium.org Add CacheRequestBase for v2spa service worker.
2018-10-12 nednguyen@google.com [Telemetry] Add --test-filter flag that support exact matching of multiple tests
2018-10-12 sadrul@chromium.org rendering: Generate metrics at the 95%ile.
2018-10-12 pasko@chromium.org androidStartupMetric: re-introduce First Contentful Paint
2018-10-12 chrishtr@chromium.org Clean up categories to match current tracing in Blink.
2018-10-11 jbudorick@chromium.org Revert "Enable orderfile memory optimization in benchmarks"
2018-10-11 benjhayden@chromium.org Add task queue for v2spa service worker.
2018-10-11 benjhayden@chromium.org Remove unnecessary file dashboard/test/index.html
2018-10-11 perezju@chromium.org [soundwave] Allow points with missing r_chromium
2018-10-11 chiniforooshan@chromium.org Telemetry: migrate frame_times & percentage_smooth
2018-10-11 ulan@chromium.org [tracing] Restore the minimum mutator utilization metric.
2018-10-11 seanmccullough@chromium.org [chromeperf] add prefix to wct script so tests get found.
2018-10-11 pasko@chromium.org Enable orderfile memory optimization in benchmarks


Created with:
  gclient setdep -r src/third_party/catapult@99b88b0a2d58

The AutoRoll server is located here: https://autoroll.skia.org/r/catapult-autoroll

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.

CQ_INCLUDE_TRYBOTS=luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel

BUG=chromium:840872, chromium:893199 ,chromium:867060,chromium:894261,chromium:894287, chromium:886621 ,chromium:893514,chromium:758566, chromium:879526 , chromium:890757 , chromium:892385 , chromium:877660 ,chromium:758566
TBR=sullivan@chromium.org

Change-Id: I469ed20d66790fa12e290ad65e2ea51c16cfdba7
Reviewed-on: https://chromium-review.googlesource.com/c/1279078
Reviewed-by: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Commit-Queue: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#599392}
[modify] https://crrev.com/bb11cbc66e56a475bdf0f5bf3d962cf80c3afe27/DEPS

Status: Fixed (was: Started)
FCP looks more adequate now:

https://chromeperf.appspot.com/report?sid=36d4a5149ba2094b1d048368008439adadaa5397178ec884bf0b43343b4d0f0b&start_rev=1539538154&end_rev=1539912613

FCP metrics do not crash -> Fixed

Sign in to add a comment