New issue
Advanced search Search tips

Issue 673461 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Dec 2016
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 1
Type: Bug-Regression



Sign in to add a comment

loading.mobile and media.mse_cases failing on chromium.perf Android One Perf (3)

Project Member Reported by charliea@chromium.org, Dec 12 2016

Issue description

Revision range: 435222-435255
Link to failing step log: https://luci-logdog.appspot.com/v/?s=chrome%2Fbb%2Fchromium.perf%2FAndroid_One_Perf__3_%2F4509%2F%2B%2Frecipes%2Fsteps%2Floading.mobile%2F0%2Fstdout

Loading.mobile also failed two runs before this revision range, but it seems that it was due to AdbCommandFailed errors, whereas all failures afterwards were due to javascript expression timeouts.

I'm going to kick off a bisect.

If the test is disabled, please downgrade to Pri-2.

 
Cc: -charliea@google.com
Summary: loading.mobile and media.mse_cases failing on chromium.perf Android One Perf (3) (was: loading.mobile failure on chromium.perf Android One Perf (3))
Also going to lump in the media.mse_cases failure on the Android Ones: both loading.mobile and media.mse_cases started failing in the same revision range with the same error (timed out waiting for IsJavascriptExpressionTrue), so I highly suspect that fixing one will fix the other.
Project Member

Comment 4 by 42576172...@developer.gserviceaccount.com, Dec 12 2016


===== BISECT JOB RESULTS =====
Status: started


===== TESTED REVISIONS =====
Revision         Exit Code  Std Dev  N  Good?
chromium@435221  0          N/A      5  good
chromium@435255  0          N/A      5  bad

Bisect job ran on: android_one_perf_bisect
Bug ID: 673461

Test Command: src/tools/perf/run_benchmark -v --browser=android-chromium --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=flower loading.mobile
Test Metric: timeToFirstContentfulPaint_avg/https___www.google.com_search?q_flower_q_flower+delivery
Relative Change: 0.00%

Buildbot stdio: http://build.chromium.org/p/tryserver.chromium.perf/builders/android_one_perf_bisect/builds/1873
Job details: https://chromeperf.appspot.com/buildbucket_job_status/8993450393562116736


Not what you expected? We'll investigate and get back to you!
  https://chromeperf.appspot.com/bad_bisect?try_job_id=5883138138439680

| O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq
|  X  | for more information addressing perf regression bugs. For feedback,
| / \ | file a bug with component Tests>AutoBisect.  Thank you!
Weird: when I look through the logs for the latest run (https://luci-logdog.appspot.com/v/?s=chrome%2Fbb%2Fchromium.perf%2FAndroid_One_Perf__3_%2F4512%2F%2B%2Frecipes%2Fsteps%2Floading.mobile%2F0%2Fstdout), I'm seeing some strange results. When I search for 

failed  ] https://www.google.com/search?q=flower#q=flower+delivery

I get two results. When I search for 

ok ] https://www.google.com/search?q=flower#q=flower+delivery

I also get two results. 

Intuitively, it looks like it makes sense that there's one "failed" and one "ok": it looks like we're trying to run the test twice, once with traffic shaping and one w/o traffic shaping. But why are we running it four times?
Owner: charliea@chromium.org
Status: Assigned (was: Untriaged)
Ah, it's obvious that this regression was caused by the traffic-shaped 3G benchmarks being introduced in the first place:

https://codereview.chromium.org/2493063002

Here's an example stack trace from a failure:

https://paste.googleplex.com/4615545282363392

This certainly makes it look like we're timing out waiting for the page to load. This screenshot also seems to confirm this:

https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/profiler-file-id_59-2016-12-12_08-42-2061933.png

Strangely, in the stack trace I linked to, one of the error messages reads:

TimeoutException: Timed out while waiting 26s for IsJavaScriptExpressionTrue.

I also found:

"TimeoutException: Timed out while waiting 0s for IsJavaScriptExpressionTrue."

This doesn't make any frickin' sense. I need to look into how this timeout reaches the error message in order to make sure that we're doing the right thing here.
It's worth noting that some of these timeouts have messages like "TimeoutException: Timed out while waiting 0s for IsJavaScriptExpressionTrue." See: https://paste.googleplex.com/5494691936075776

My suspicion is that we're passing along a duration that means something like "time that this test has left to run", and eventually that number hits zero before we even get a chance to run WaitForDocumentReadyStateToBeInteractiveOrBetter.
And here's (https://paste.googleplex.com/5845565799661568) another similar, but slightly different stack trace. Both of these stack traces indicate that Telemetry is timing out while waiting for a Javascript expression to become true, but in two different ways.
Project Member

Comment 11 by bugdroid1@chromium.org, Dec 16 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/1d47683961dc5963f758463508404ff114133d9a

commit 1d47683961dc5963f758463508404ff114133d9a
Author: charliea <charliea@chromium.org>
Date: Fri Dec 16 00:48:10 2016

Double the navigation timeouts for page cycler stories

This is in response to problems with loading.mobile on Android One/3G,
where almost all stories were timing out.

BUG= 673461 

Review-Url: https://codereview.chromium.org/2577243002
Cr-Commit-Position: refs/heads/master@{#438969}

[modify] https://crrev.com/1d47683961dc5963f758463508404ff114133d9a/tools/perf/page_sets/page_cycler_story.py

My CL yesterday seems to have drastically helped the situation, but it doesn't seem to have made the test green: we're down from 23 failing stories to just one, http://www.ibicn.com. This story is listed as "tough_tti", so it's not too surprising that this one is the one that's causing problems. I'm going to play around with timeouts on my local Android One device in order to see what a good timeout for this story might be.
I've attached the failure image. This thing's ridiculously close to being done loading.

I talked to Ned and we decided that increasing the timeout by another 50% should probably be sufficient to resolve the issue.
failure_image.png
320 KB View Download
Project Member

Comment 14 by bugdroid1@chromium.org, Dec 16 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/33dea85c0961db70473126d594815c5f1739bc84

commit 33dea85c0961db70473126d594815c5f1739bc84
Author: charliea <charliea@chromium.org>
Date: Fri Dec 16 22:18:01 2016

Further increase the navigation timeout to 180s for page cycler stories

The increase done in https://codereview.chromium.org/2577243002 was
effective in decreasing the number of failing 3G/Android One stories
from 23 to 1, and hopefully this increase gets that number to zero.

BUG= 673461 

Review-Url: https://codereview.chromium.org/2576403005
Cr-Commit-Position: refs/heads/master@{#439214}

[modify] https://crrev.com/33dea85c0961db70473126d594815c5f1739bc84/tools/perf/page_sets/page_cycler_story.py

Status: Fixed (was: Assigned)

Sign in to add a comment