
Issue 859073


Issue metadata

Status: Verified
Owner:
Closed: Oct 8
Cc:
EstimatedDays: ----
NextAction: ----
OS: Android, Mac
Pri: 1
Type: Bug-Regression




Lots of dashboard failures on chromium perf across android and mac

Reported by eyaich@chromium.org, Jun 29 2018

Issue description

We are seeing lots of errors on the chromium.perf waterfall across android and mac. 

From what I can see, I *think* it is only the reference builds.

https://uberchromegw.corp.google.com/i/chromium.perf/waterfall

The failures are cases where valid JSON is not produced. Specifically, it is failing on this line:
https://cs.chromium.org/chromium/build/scripts/slave/upload_perf_dashboard_results.py?q=%22Error:+No+perf+dashboard+JSON+was+produced.%22&sq=package:chromium&dr=C&l=153

That means the code at https://cs.chromium.org/chromium/build/scripts/slave/upload_perf_dashboard_results.py?q=%22Error:+No+perf+dashboard+JSON+was+produced.%22&sq=package:chromium&dr=C&l=153 didn't get valid JSON back.

Take one of the failing runs, let's say blink_perf.svg.reference on Nexus 5X (https://uberchromegw.corp.google.com/i/chromium.perf/builders/Android%20Nexus5X%20Perf/builds/1948).

This is the perf output: https://isolateserver.appspot.com/browse?namespace=default-gzip&digest=ad95edb78aea97c79f684f78208c35e1ca37ba45&as=perftest-output.json

I'm not sure whether that is valid; it definitely looks odd.

Therefore, it is failing before it tries to upload to the dashboard, and it is happening on OBBS Mac as well.

 
In the log output from the perftest-output.json run you linked, 23 tests seem to have been unexpectedly skipped, and the last one failed. A lot of the messages on the waterfall seem to be about unexpected skips. When did this start happening? Was there a change to the expectations file?
By the way, yes, that output is valid. It's not particularly useful: it's just a naked ownership diagnostic that Telemetry outputs, but since it's not attached to anything and there is nothing else in the file, the dashboard would reject it anyway if you tried to upload it.
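The distinction drawn above, valid JSON that still has nothing worth uploading, can be checked mechanically. The sketch below is illustrative only: `contains_histograms` is a hypothetical helper, and it assumes that histogram entries in a HistogramSet JSON list carry a `"name"` key while shared diagnostics (such as a GenericSet) carry only keys like `"type"`, `"guid"` and `"values"`. The payload is made up, not the actual perftest-output.json.

```python
import json


def contains_histograms(histogram_set):
    """Return True if a HistogramSet-style JSON list holds at least one histogram.

    Assumption: histogram entries carry a "name" key, while shared diagnostics
    (for example a GenericSet) carry only keys such as "type", "guid" and "values".
    """
    return any(isinstance(entry, dict) and "name" in entry
               for entry in histogram_set)


# Hypothetical payload: a single diagnostic that no histogram refers to.
naked_diagnostic = json.loads(
    '[{"type": "GenericSet", "guid": "abc-123", "values": ["owner@example.com"]}]')

print(contains_histograms(naked_diagnostic))  # False
```

A file like this parses cleanly, so a JSON validity check alone would not flag it; only a content check like the one above distinguishes it from a real result.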

Comment 3 by eyaich@chromium.org, Jun 29 2018

Yes, we started outputting unexpected skips when a benchmark fails to run part of the test suite, because we weren't getting any information about skipped tests in buildbot. See crbug.com/850503.

That being said, that isn't the problem here. The problem is the perf results. One test did fail, causing the whole test suite to fail, but the real problem is that MakeHistogramSetWithDiagnostics fails on the perf data it receives.

Line 227 of this method says "TODO handle reference builds": https://cs.chromium.org/chromium/build/scripts/slave/results_dashboard.py?l=227

Not sure what that is referring to.
From digging into the blame, it looks like that was written before reference build support, which was added a few weeks later, but the TODO was never removed. We should just remove it; reference builds have been uploading fine since the histogram format was enabled.

So to be clear, there are 2 things going on here:

1) The failures running the actual reference build. Those seem to be hitting unexpected failures all over. These also end up producing a useless histogram output.

2) That near-empty histogram output is piped into MakeHistogramSetWithDiagnostics, which then fails with "No perf dashboard JSON was produced."

Is that about right? Do you want to vc to go dig into this?

Comment 5 by eyaich@chromium.org, Jun 29 2018

Right.  #1 is a valid error case that the perf sheriff would in theory handle if it was reported correctly.  I can dig into that. 

#2 is failing somewhere, I assume in add_reserved_diagnostics, since we output the result to a file and when we try to open it, it says it is not valid JSON.

Can you dig into #2 and I will look at #1?

Comment 6 by eyaich@chromium.org, Jun 29 2018

For #2, when we try to read in the JSON, it is invalid and we get this stack trace (https://logs.chromium.org/v/?s=chrome%2Fbb%2Fchromium.perf%2Fmac-10_13_laptop_high_end-perf%2F175%2F%2B%2Frecipes%2Fsteps%2Fperformance_test_suite_on_ATI_GPU_on_Mac%2F0%2Fstdout):

Benchmark: tab_switching.typical_25.reference, file: /var/folders/2j/22s2gz0s7hn48k32d47clxf80000gm/T/tmpRzYmJqoutputresults/128420b3-068c-4fa1-b382-fb4323c600a0tab_switching.typical_25.reference
Traceback (most recent call last):
  File "/b/c/b/mac_10_13_laptop_high_end_perf/src/tools/perf/process_perf_results.py", line 515, in <module>
    sys.exit(main())
  File "/b/c/b/mac_10_13_laptop_high_end_perf/src/tools/perf/process_perf_results.py", line 511, in main
    args.smoke_test_mode)
  File "/b/c/b/mac_10_13_laptop_high_end_perf/src/tools/perf/process_perf_results.py", line 276, in _process_perf_results
    configuration_name, build_properties, service_account_file, extra_links)
  File "/b/c/b/mac_10_13_laptop_high_end_perf/src/tools/perf/process_perf_results.py", line 422, in _handle_perf_results
    is_reference, failure)
  File "/b/c/b/mac_10_13_laptop_high_end_perf/src/tools/perf/process_perf_results.py", line 447, in _write_perf_data_to_logfile
    results = json.load(f)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 290, in load
    **kw)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 384, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
WARNING:root:merge_cmd had non-zero return code: 1
step returned non-zero exit code: 1
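The traceback ends in `json.load` raising `ValueError`. The sketch below (Python 3; `try_parse` is a hypothetical helper, not part of process_perf_results.py) shows which kinds of payload trip that error: an empty or truncated file fails to parse, while an empty list or a bare `null` is valid JSON. Which of these the failing benchmark actually wrote is not established in the log above.

```python
import json


def try_parse(text):
    """Return (True, value) if `text` is valid JSON, else (False, error message)."""
    try:
        return True, json.loads(text)
    except ValueError as e:
        # Python 2.7 raises ValueError("No JSON object could be decoded");
        # Python 3 raises json.JSONDecodeError, a subclass of ValueError.
        return False, str(e)


for payload in ["", "[]", "null"]:
    print(repr(payload), try_parse(payload))
```

Catching `ValueError` rather than `json.JSONDecodeError` keeps the same handler working under both the Python 2.7 interpreter in the traceback and Python 3.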

Comment 7 by eyaich@chromium.org, Jun 29 2018

Owner: eyaich@chromium.org
OK, Simon and I met offline.

I am going to handle #1. This is the reference build, so we need to determine whether or not we should turn the recipe red if the upload fails. I will update OBBS to fail gracefully and report an upload error, since it currently does not, and I will chat with Ned to determine how we want to handle presentation.

For #2, the upload would fail either here or when we go to upload, since the benchmark never ran and never produced valid perf output. Simon and I agreed that it is OK if it fails here, since it makes no difference in terms of presentation to the recipe. It is still a failure.

I will take ownership of this
Status: Assigned (was: Untriaged)

Comment 9 by bugdroid1@chromium.org, Jul 2

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/933e64e8d889dbe40b69074038b09e67892551d4

commit 933e64e8d889dbe40b69074038b09e67892551d4
Author: Emily Hanley <eyaich@google.com>
Date: Mon Jul 02 14:49:31 2018

Reland parallelizing perf dashboard uploads

Failures in this revert https://chromium-review.googlesource.com/c/chromium/src/+/1120125
were on Mac, but everything else was succeeding; we were
failing when trying to write out invalid JSON as perf results.

I have added logic in line 447 of process_perf_results.py to catch
these errors in the future.  Note this was also an issue in the old
recipe, it just shows one step as failed instead of the entire suite.

Bug:  713357 , 854162 ,  859073 , 858995
Change-Id: I37c8f8fe3d7973962a17bbd64b758c7c98517799
Reviewed-on: https://chromium-review.googlesource.com/1122478
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Commit-Queue: Emily Hanley <eyaich@chromium.org>
Cr-Commit-Position: refs/heads/master@{#571893}
[modify] https://crrev.com/933e64e8d889dbe40b69074038b09e67892551d4/tools/perf/core/oauth_api.py
[modify] https://crrev.com/933e64e8d889dbe40b69074038b09e67892551d4/tools/perf/core/results_dashboard.py
[modify] https://crrev.com/933e64e8d889dbe40b69074038b09e67892551d4/tools/perf/process_perf_results.py

Cc: charliea@chromium.org eyaich@chromium.org
Issue 859558 has been merged into this issue.
I know how this got triggered. It was because of Emily's work to remove the benchmark_duration.

Some of those benchmarks weren't empty before, but now they are completely empty because benchmark_duration was removed.

Comment 12 by bugdroid1@chromium.org, Jul 3

The following revision refers to this bug:
  https://chromium.googlesource.com/catapult/+/cbfa46069e2bfc4da5704f278b378b1fdf758521

commit cbfa46069e2bfc4da5704f278b378b1fdf758521
Author: Nghia Nguyen <nednguyen@google.com>
Date: Tue Jul 03 01:38:14 2018

Only output the total histogram format if there is at least a single page that succeed

Note: this is just a bandaid fix to ensure that if there is no page that succeed
without being skipped, Telemetry will output a completely empty histogram which enables perf dashboard upload to handle gracefully.

Bug:  chromium:859073 
Change-Id: I3ac5e83b95416fcf9831f8ce39919dcc4c4a5012

TBR=eyaich@chromium.org, simonhatch@chromium.org

Change-Id: I3ac5e83b95416fcf9831f8ce39919dcc4c4a5012
Reviewed-on: https://chromium-review.googlesource.com/1123690
Commit-Queue: Ned Nguyen <nednguyen@google.com>
Reviewed-by: Ned Nguyen <nednguyen@google.com>

[modify] https://crrev.com/cbfa46069e2bfc4da5704f278b378b1fdf758521/telemetry/telemetry/internal/story_runner.py
[modify] https://crrev.com/cbfa46069e2bfc4da5704f278b378b1fdf758521/telemetry/telemetry/internal/results/histogram_set_json_output_formatter.py
[modify] https://crrev.com/cbfa46069e2bfc4da5704f278b378b1fdf758521/telemetry/telemetry/internal/story_runner_unittest.py


Comment 13 by bugdroid1@chromium.org, Jul 3

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/ec796af8f8d63809cb0cb2f46fe8b8521ffdc62f

commit ec796af8f8d63809cb0cb2f46fe8b8521ffdc62f
Author: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Date: Tue Jul 03 03:46:24 2018

Roll src/third_party/catapult 153acbd707c0..cbfa46069e2b (1 commits)

https://chromium.googlesource.com/catapult.git/+log/153acbd707c0..cbfa46069e2b


git log 153acbd707c0..cbfa46069e2b --date=short --no-merges --format='%ad %ae %s'
2018-07-03 nednguyen@google.com Only output the total histogram format if there is at least a single page that succeed


Created with:
  gclient setdep -r src/third_party/catapult@cbfa46069e2b

The AutoRoll server is located here: https://catapult-roll.skia.org

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.

CQ_INCLUDE_TRYBOTS=luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel

BUG= chromium:859073 
TBR=sullivan@chromium.org

Change-Id: I36cf6a327e921f83587cd3188f85f7e9d83e5e2f
Reviewed-on: https://chromium-review.googlesource.com/1123781
Reviewed-by: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Commit-Queue: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#572098}
[modify] https://crrev.com/ec796af8f8d63809cb0cb2f46fe8b8521ffdc62f/DEPS


Comment 14 by bugdroid1@chromium.org, Jul 3

The following revision refers to this bug:
  https://chromium.googlesource.com/catapult/+/c61a0380486a3c952109d0024cf77dac8a05bbb5

commit c61a0380486a3c952109d0024cf77dac8a05bbb5
Author: Ned Nguyen <nednguyen@google.com>
Date: Tue Jul 03 07:08:37 2018

Revert "Only output the total histogram format if there is at least a single page that succeed"

This reverts commit cbfa46069e2bfc4da5704f278b378b1fdf758521.

Reason for revert: does not fix the problem

Original change's description:
> Only output the total histogram format if there is at least a single page that succeed
> 
> Note: this is just a bandaid fix to ensure that if there is no page that succeed
> without being skipped, Telemetry will output a completely empty histogram which enables perf dashboard upload to handle gracefully.
> 
> Bug:  chromium:859073 
> Change-Id: I3ac5e83b95416fcf9831f8ce39919dcc4c4a5012
> 
> TBR=eyaich@chromium.org, simonhatch@chromium.org
> 
> Change-Id: I3ac5e83b95416fcf9831f8ce39919dcc4c4a5012
> Reviewed-on: https://chromium-review.googlesource.com/1123690
> Commit-Queue: Ned Nguyen <nednguyen@google.com>
> Reviewed-by: Ned Nguyen <nednguyen@google.com>

TBR=simonhatch@chromium.org,nednguyen@google.com,eyaich@chromium.org

Change-Id: I4e217b0c5f6295a81c5dcd9ed813654b5d085042
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug:  chromium:859073 
Reviewed-on: https://chromium-review.googlesource.com/1124179
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Commit-Queue: Ned Nguyen <nednguyen@google.com>

[modify] https://crrev.com/c61a0380486a3c952109d0024cf77dac8a05bbb5/telemetry/telemetry/internal/story_runner.py
[modify] https://crrev.com/c61a0380486a3c952109d0024cf77dac8a05bbb5/telemetry/telemetry/internal/results/histogram_set_json_output_formatter.py
[modify] https://crrev.com/c61a0380486a3c952109d0024cf77dac8a05bbb5/telemetry/telemetry/internal/story_runner_unittest.py


Comment 15 by bugdroid1@chromium.org, Jul 3

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/5d6c7964edd6e0fe1ec1d86ab3c49f75450b5c55

commit 5d6c7964edd6e0fe1ec1d86ab3c49f75450b5c55
Author: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Date: Tue Jul 03 12:29:19 2018

Roll src/third_party/catapult cbfa46069e2b..c61a0380486a (1 commits)

https://chromium.googlesource.com/catapult.git/+log/cbfa46069e2b..c61a0380486a


git log cbfa46069e2b..c61a0380486a --date=short --no-merges --format='%ad %ae %s'
2018-07-03 nednguyen@google.com Revert "Only output the total histogram format if there is at least a single page that succeed"


Created with:
  gclient setdep -r src/third_party/catapult@c61a0380486a

The AutoRoll server is located here: https://catapult-roll.skia.org

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.

CQ_INCLUDE_TRYBOTS=luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel

BUG= chromium:859073 
TBR=sullivan@chromium.org

Change-Id: Iaa7a6b9160b2e0235e160f8014e48b85bc4e2a82
Reviewed-on: https://chromium-review.googlesource.com/1123945
Reviewed-by: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Commit-Queue: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#572179}
[modify] https://crrev.com/5d6c7964edd6e0fe1ec1d86ab3c49f75450b5c55/DEPS


Comment 16 by bugdroid1@chromium.org, Jul 3

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/aff4d711c8f96053382b17b4b5519728c54e8835

commit aff4d711c8f96053382b17b4b5519728c54e8835
Author: Emily Hanley <eyaich@google.com>
Date: Tue Jul 03 14:53:58 2018

Not failing upload on telemetry outputting null

Bug:859073
Cq-Include-Trybots: master.tryserver.chromium.perf:obbs_fyi
Change-Id: I338021866a0a7c975d25198b194999f2c2f3f45a

NOTRY=true # obbs_fyi failure are flakes

Change-Id: I338021866a0a7c975d25198b194999f2c2f3f45a
Reviewed-on: https://chromium-review.googlesource.com/1124380
Commit-Queue: Ned Nguyen <nednguyen@google.com>
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Cr-Commit-Position: refs/heads/master@{#572210}
[modify] https://crrev.com/aff4d711c8f96053382b17b4b5519728c54e8835/tools/perf/core/upload_results_to_perf_dashboard.py
[modify] https://crrev.com/aff4d711c8f96053382b17b4b5519728c54e8835/tools/perf/process_perf_results.py

Status: Verified (was: Assigned)
