New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 839127 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Last visit > 30 days ago
Closed: May 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocking:
issue 838161



Sign in to add a comment

Telemetry cache temperature tests fail if error pages are isolated

Project Member Reported by nasko@chromium.org, May 2 2018

Issue description

Error pages currently commit in either the source or destination process, which is how Chrome has been working for a while. We do need to isolate them in their own process to ensure security properties we want to uphold. This is tracked in  issue 838161  with a design doc and the current implementation is in https://chromium-review.googlesource.com/c/chromium/src/+/762520.

With error pages being isolated the following telemetry unit tests are failing:

telemetry.page.cache_temperature_unittest.CacheTemperatureTests.testEnsureHotFromScratch
telemetry.page.cache_temperature_unittest.CacheTemperatureTests.testEnsureCold
telemetry.page.cache_temperature_unittest.CacheTemperatureTests.testEnsureWarmFromScratch

Tracing navigations during the execution of these tests, it is clear that none of them succeed and they are all failing with ERR_SOCKS_CONNECTION_FAILED (-120). It is expected that the error page will be committed in a separate process, but this should work the exact same way as any other cross-process navigation.

It is also puzzling that these tests tend to actually pass in local testing on Linux, but fail on Mac and Windows.

Assigning to nednguyen@ for initial triage and understanding what the root cause of the failure is.
 

Comment 1 by nasko@chromium.org, May 2 2018

Blocking: 838161
Cc: kouhei@chromium.org
Status: Started (was: Untriaged)
Working on reproduce this...

I am able to reproduce and grab a passing trace (with stable Chrome), failing trace (with Chrome build in nasko@'s CL) in it.

pass_trace.html
4.0 MB View Download
fail_trace.html
4.0 MB View Download
We can see that telemetry.internal.warm_cache.warm.start is in the pass_trace.html (Renderer (pid 11036)) but not fail_trace.html

Also the fact that there are two renderer processes in fail_trace.html confirm my theory that this test is broken because the trace event start call & trace event end call now live in two different processes, thus breaking the trace event.
Hmhh, actually looking at this carefully, the problem is different from what I originally thought.

We *do* create 2 trace markers before & after the cache warm up:
"start" marker in https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/page/cache_temperature.py?rcl=5b2cfb7fa5f197e561c81012a7cc17e5c19c84ec&l=45

"end" marker in https://cs.chromium.org/chromium/src/third_party/catapult/telemetry/telemetry/page/cache_temperature.py?rcl=5b2cfb7fa5f197e561c81012a7cc17e5c19c84ec&l=54

Looking at the trace, these markers only take 1.5ms-ish due to them being two continuous js calls from Telemetry.

So this is the problem of some how the "start" trace event cannot be created in the renderer process of "about:blank" page with nasko's change. 
Components: Speed>Telemetry
Ok, I think I know what's going on. The order of the events are as follows:
  Start at about://blank
  Issue telemetry.internal.warm_cache.warm.start
  Navigate to http://google.com (which does not exists)
  Issue telemetry.internal.warm_cache.warm.end
  Issue telemetry.internal.warm_cache.hot.start
  Navigate to http://google.com (which does not exists)
  Issue telemetry.internal.warm_cache.hot.end

So because the page google.com doesn't have associated archive with it in the test, it is a an error page. nasko's CL make a new process with that error page, thus killing the old renderer process of "about:blank" page, making us losing all the trace events of that page.

This is a bug due to the fact that tracing v1 isn't able to keep the trace events from dead processes during recording. I will try to make the test use a valid page and hope that unblock nasko@'s CL.
Ok, use a working page fixed the test! I will make the patch fix tomorrow.
Cc: perezju@chromium.org
Project Member

Comment 10 by bugdroid1@chromium.org, May 4 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/f3cb54067e379373eebb8a58399e36f9046a7c35

commit f3cb54067e379373eebb8a58399e36f9046a7c35
Author: catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Date: Fri May 04 00:44:42 2018

Roll src/third_party/catapult/ 19282cf9d..153134ef2 (4 commits)

https://chromium.googlesource.com/catapult.git/+log/19282cf9d32d..153134ef26c3

$ git log 19282cf9d..153134ef2 --date=short --no-merges --format='%ad %ae %s'
2018-05-03 dtu [pinpoint] Increase task queue rate.
2018-05-03 szager Add support for profiling on Android using simpleperf
2018-05-02 ssid Fix symbolization of heap profiles when so is mapped from apk
2018-05-03 nednguyen Fix cache_temperature_unittes to use working pages from test archives

Created with:
  roll-dep src/third_party/catapult
BUG=chromium:734705, chromium:839127 


The AutoRoll server is located here: https://catapult-roll.skia.org

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.


TBR=sullivan@chromium.org

Change-Id: I97dd890c7eda177ae8b603b830e60bfbb215da98
Reviewed-on: https://chromium-review.googlesource.com/1043079
Commit-Queue: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Reviewed-by: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#555933}
[modify] https://crrev.com/f3cb54067e379373eebb8a58399e36f9046a7c35/DEPS

Status: Fixed (was: Started)

Comment 12 by benhenry@google.com, Jan 16 (6 days ago)

Components: Test>Telemetry

Comment 13 by benhenry@google.com, Jan 16 (6 days ago)

Components: -Speed>Telemetry

Sign in to add a comment