Lots of dashboard failures on chromium perf across android and mac
Issue description
We are seeing lots of errors on the chromium.perf waterfall (https://uberchromegw.corp.google.com/i/chromium.perf/waterfall) across Android and Mac. From what I can see, I *think* it is just the reference builds, where valid JSON is not produced. Specifically, it is failing on this line: https://cs.chromium.org/chromium/build/scripts/slave/upload_perf_dashboard_results.py?q=%22Error:+No+perf+dashboard+JSON+was+produced.%22&sq=package:chromium&dr=C&l=153, meaning that https://cs.chromium.org/chromium/build/scripts/slave/upload_perf_dashboard_results.py?q=%22Error:+No+perf+dashboard+JSON+was+produced.%22&sq=package:chromium&dr=C&l=153 didn't return valid JSON.

If you look at the perf output for one of the failing benchmarks, say blink_perf.svg.reference on Nexus 5X (https://uberchromegw.corp.google.com/i/chromium.perf/builders/Android%20Nexus5X%20Perf/builds/1948), this is the perf output: https://isolateserver.appspot.com/browse?namespace=default-gzip&digest=ad95edb78aea97c79f684f78208c35e1ca37ba45&as=perftest-output.json

Not sure if that is valid or not; it definitely looks funny. So it is failing before it tries to upload to the dashboard, and it is happening on OBBS Mac as well.
Jun 29 2018
Btw, yes, that output is valid. It's not particularly useful: it's just a naked ownership diagnostic that Telemetry outputs, but since it's not attached to anything and there is nothing else in the file, the dashboard would reject it anyway if you tried to upload it.
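To make the "naked diagnostic" point concrete, here is a minimal Python sketch of a pre-upload check that asks whether a decoded HistogramSet contains any histogram at all. It assumes a simplified JSON shape (histogram entries carry `name` and `unit`; shared diagnostics carry `type` and `guid` but no `unit`). The helper `has_histograms` and the sample entries are hypothetical, not the catapult API.

```python
import json


# Assumption (simplified shape): in a HistogramSet JSON list, histogram entries
# carry a "name" and a "unit"; shared diagnostics (e.g. an ownership diagnostic)
# carry a "type" and a "guid" but no "unit".
def has_histograms(histogram_set):
    """Return True if the decoded HistogramSet contains at least one histogram."""
    if not isinstance(histogram_set, list):
        return False
    return any(
        isinstance(entry, dict) and "name" in entry and "unit" in entry
        for entry in histogram_set
    )


# Hypothetical near-empty output: a lone diagnostic that no histogram references.
naked_diagnostic_only = json.loads(
    '[{"type": "GenericSet", "guid": "abc-123", "values": ["owner@chromium.org"]}]'
)
with_histogram = naked_diagnostic_only + [
    {"name": "timeToFirstPaint", "unit": "ms", "guid": "def-456", "sampleValues": [12.5]}
]

print(has_histograms(naked_diagnostic_only))  # False
print(has_histograms(with_histogram))  # True
```

A check like this would let an uploader report "no data" explicitly instead of sending a file the dashboard will reject.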
Jun 29 2018
Yes, we started outputting unexpected skips when a benchmark failed to run part of the test suite, because we weren't getting any information about skipped tests in buildbot. See crbug.com/850503. That being said, that isn't the problem here. The problem is the perf results. One test did fail, causing the whole test suite to fail, but the real problem is that MakeHistogramSetWithDiagnostics fails on the perf data it receives. Line 227 of this method says "TODO handle reference builds": https://cs.chromium.org/chromium/build/scripts/slave/results_dashboard.py?l=227 Not sure what that is referring to.
Jun 29 2018
From digging into the blame, it looks like that was written before reference build support existed; support was added a few weeks later, but the TODO wasn't removed. We should just remove it: reference builds have been uploading fine since the histogram format was enabled.

So, to be clear, there are two things going on here:
1) The failures running the actual reference builds. Those seem to be hitting unexpected failures all over, and they also end up producing a useless histogram output.
2) That near-empty histogram output is piped into MakeHistogramSetWithDiagnostics, which then fails with "No perf dashboard JSON was produced."

Is that about right? Do you want to VC to go dig into this?
Jun 29 2018
Right. #1 is a valid error case that the perf sheriff would in theory handle if it were reported correctly. I can dig into that. #2 is failing somewhere, I assume in add_reserved_diagnostics, since we output the result to a file and, when we try to open it, it says it is not valid JSON. Can you dig into #2 while I look at #1?
Jun 29 2018
For #2, when we try to read in the JSON, it is invalid and we get this stack trace (https://logs.chromium.org/v/?s=chrome%2Fbb%2Fchromium.perf%2Fmac-10_13_laptop_high_end-perf%2F175%2F%2B%2Frecipes%2Fsteps%2Fperformance_test_suite_on_ATI_GPU_on_Mac%2F0%2Fstdout):

Benchmark: tab_switching.typical_25.reference, file: /var/folders/2j/22s2gz0s7hn48k32d47clxf80000gm/T/tmpRzYmJqoutputresults/128420b3-068c-4fa1-b382-fb4323c600a0tab_switching.typical_25.reference
Traceback (most recent call last):
  File "/b/c/b/mac_10_13_laptop_high_end_perf/src/tools/perf/process_perf_results.py", line 515, in <module>
    sys.exit(main())
  File "/b/c/b/mac_10_13_laptop_high_end_perf/src/tools/perf/process_perf_results.py", line 511, in main
    args.smoke_test_mode)
  File "/b/c/b/mac_10_13_laptop_high_end_perf/src/tools/perf/process_perf_results.py", line 276, in _process_perf_results
    configuration_name, build_properties, service_account_file, extra_links)
  File "/b/c/b/mac_10_13_laptop_high_end_perf/src/tools/perf/process_perf_results.py", line 422, in _handle_perf_results
    is_reference, failure)
  File "/b/c/b/mac_10_13_laptop_high_end_perf/src/tools/perf/process_perf_results.py", line 447, in _write_perf_data_to_logfile
    results = json.load(f)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 290, in load
    **kw)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 384, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
WARNING:root:merge_cmd had non-zero return code: 1
step returned non-zero exit code: 1
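The traceback shows `json.load` raising `ValueError` on an output file that holds no JSON. Below is a minimal sketch of a defensive reader, assuming the caller only needs to know whether the file decoded. `read_perf_results` is a hypothetical helper, not the actual code in process_perf_results.py. It returns a success flag alongside the data so that a file containing a literal JSON `null` is not confused with an unreadable one.

```python
import json
import logging
import os
import tempfile


def read_perf_results(path):
    """Return (True, data) if path holds valid JSON, otherwise (False, None).

    Python 2's json module raises ValueError ("No JSON object could be decoded")
    on empty or malformed input; Python 3 raises json.JSONDecodeError, a subclass
    of ValueError, so one except clause covers both.
    """
    try:
        with open(path) as f:
            return True, json.load(f)
    except (IOError, OSError) as e:
        logging.warning("Could not read perf results %s: %s", path, e)
    except ValueError as e:
        logging.warning("Invalid perf results JSON in %s: %s", path, e)
    return False, None


if __name__ == "__main__":
    tmp_dir = tempfile.mkdtemp()
    empty_path = os.path.join(tmp_dir, "empty.json")
    open(empty_path, "w").close()  # zero-byte file, like a run that wrote nothing
    print(read_perf_results(empty_path))  # (False, None)
```

Whether an unreadable file should then turn the step red is a policy choice made by the caller, which is exactly the question discussed in the next comment.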
Jun 29 2018
Ok, Simon and I met offline. I am going to handle #1. This is the reference build, so we need to determine whether or not we should turn the recipe red if the upload fails. I will update the OBBS to fail gracefully and report an upload error, since it currently does not, and I will chat with Ned to determine how we want to handle presentation. For #2, the upload would fail either here or when we go to upload, since the benchmark never ran and never produced valid perf output. Simon and I agreed that it is OK if it fails here, since it makes no difference in terms of presentation to the recipe; it is still a failure. I will take ownership of this.
Jun 30 2018
Jul 2
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/933e64e8d889dbe40b69074038b09e67892551d4

commit 933e64e8d889dbe40b69074038b09e67892551d4
Author: Emily Hanley <eyaich@google.com>
Date: Mon Jul 02 14:49:31 2018

Reland parallelizing perf dashboard uploads

Failures in this revert https://chromium-review.googlesource.com/c/chromium/src/+/1120125 were for failing mac, but the everything was succeeding, we were failing when trying to right out not valid json as perf results. I have added logic in line 447 of process_perf_results.py to catch these errors in the future.

Note this was also an issue in the old recipe, it just shows one step as failed instead of the entire suite.

Bug: 713357, 854162, 859073, 858995
Change-Id: I37c8f8fe3d7973962a17bbd64b758c7c98517799
Reviewed-on: https://chromium-review.googlesource.com/1122478
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Commit-Queue: Emily Hanley <eyaich@chromium.org>
Cr-Commit-Position: refs/heads/master@{#571893}
[modify] https://crrev.com/933e64e8d889dbe40b69074038b09e67892551d4/tools/perf/core/oauth_api.py
[modify] https://crrev.com/933e64e8d889dbe40b69074038b09e67892551d4/tools/perf/core/results_dashboard.py
[modify] https://crrev.com/933e64e8d889dbe40b69074038b09e67892551d4/tools/perf/process_perf_results.py
Jul 2
Jul 2
I know how this got triggered. It was Emily's work to remove benchmark_duration. Some of those benchmarks weren't empty before; now they are completely empty because benchmark_duration was removed.
Jul 3
The following revision refers to this bug:
https://chromium.googlesource.com/catapult/+/cbfa46069e2bfc4da5704f278b378b1fdf758521

commit cbfa46069e2bfc4da5704f278b378b1fdf758521
Author: Nghia Nguyen <nednguyen@google.com>
Date: Tue Jul 03 01:38:14 2018

Only output the total histogram format if there is at least a single page that succeed

Note: this is just a bandaid fix to ensure that if there is no page that succeed without being skipped, Telemetry will output a completely empty histogram which enables perf dashboard upload to handle gracefully.

Bug: chromium:859073
Change-Id: I3ac5e83b95416fcf9831f8ce39919dcc4c4a5012

TBR=eyaich@chromium.org, simonhatch@chromium.org

Change-Id: I3ac5e83b95416fcf9831f8ce39919dcc4c4a5012
Reviewed-on: https://chromium-review.googlesource.com/1123690
Commit-Queue: Ned Nguyen <nednguyen@google.com>
Reviewed-by: Ned Nguyen <nednguyen@google.com>
[modify] https://crrev.com/cbfa46069e2bfc4da5704f278b378b1fdf758521/telemetry/telemetry/internal/story_runner.py
[modify] https://crrev.com/cbfa46069e2bfc4da5704f278b378b1fdf758521/telemetry/telemetry/internal/results/histogram_set_json_output_formatter.py
[modify] https://crrev.com/cbfa46069e2bfc4da5704f278b378b1fdf758521/telemetry/telemetry/internal/story_runner_unittest.py
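The idea behind this bandaid (which was later reverted as not fixing the problem) can be sketched in Python: if no story succeeded, write an empty HistogramSet (`[]`) instead of shared diagnostics with nothing attached. The function `format_histogram_output`, the `(story_name, status)` pairs, and the status strings are hypothetical stand-ins for Telemetry's internal results objects; this is not the code in story_runner.py or histogram_set_json_output_formatter.py.

```python
import json


def format_histogram_output(histogram_dicts, story_results):
    """Return the HistogramSet JSON text to write for one benchmark run.

    story_results is a list of (story_name, status) pairs, where status is one of
    the hypothetical strings "PASS", "FAIL", or "SKIP".
    """
    any_story_succeeded = any(status == "PASS" for _, status in story_results)
    if not any_story_succeeded:
        # Nothing ran successfully: emit a completely empty HistogramSet so a
        # downstream uploader sees "no data" instead of diagnostics attached
        # to no histogram.
        return json.dumps([])
    return json.dumps(histogram_dicts)


diagnostic = {"type": "GenericSet", "guid": "abc-123", "values": ["owner@chromium.org"]}
print(format_histogram_output([diagnostic], [("story1", "FAIL"), ("story2", "SKIP")]))  # []
```

Note that an empty list is still valid JSON, so this approach only helps if the uploader treats `[]` as "nothing to upload" rather than as an error.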
Jul 3
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/ec796af8f8d63809cb0cb2f46fe8b8521ffdc62f

commit ec796af8f8d63809cb0cb2f46fe8b8521ffdc62f
Author: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Date: Tue Jul 03 03:46:24 2018

Roll src/third_party/catapult 153acbd707c0..cbfa46069e2b (1 commits)

https://chromium.googlesource.com/catapult.git/+log/153acbd707c0..cbfa46069e2b

git log 153acbd707c0..cbfa46069e2b --date=short --no-merges --format='%ad %ae %s'
2018-07-03 nednguyen@google.com Only output the total histogram format if there is at least a single page that succeed

Created with:
  gclient setdep -r src/third_party/catapult@cbfa46069e2b

The AutoRoll server is located here: https://catapult-roll.skia.org

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should be CC'd on the roll, and stop the roller if necessary.

CQ_INCLUDE_TRYBOTS=luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
BUG=chromium:859073
TBR=sullivan@chromium.org

Change-Id: I36cf6a327e921f83587cd3188f85f7e9d83e5e2f
Reviewed-on: https://chromium-review.googlesource.com/1123781
Reviewed-by: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Commit-Queue: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#572098}
[modify] https://crrev.com/ec796af8f8d63809cb0cb2f46fe8b8521ffdc62f/DEPS
Jul 3
The following revision refers to this bug:
https://chromium.googlesource.com/catapult/+/c61a0380486a3c952109d0024cf77dac8a05bbb5

commit c61a0380486a3c952109d0024cf77dac8a05bbb5
Author: Ned Nguyen <nednguyen@google.com>
Date: Tue Jul 03 07:08:37 2018

Revert "Only output the total histogram format if there is at least a single page that succeed"

This reverts commit cbfa46069e2bfc4da5704f278b378b1fdf758521.

Reason for revert: does not fix the problem

Original change's description:
> Only output the total histogram format if there is at least a single page that succeed
>
> Note: this is just a bandaid fix to ensure that if there is no page that succeed
> without being skipped, Telemetry will output a completely empty histogram which enables perf dashboard upload to handle gracefully.
>
> Bug: chromium:859073
> Change-Id: I3ac5e83b95416fcf9831f8ce39919dcc4c4a5012
>
> TBR=eyaich@chromium.org, simonhatch@chromium.org
>
> Change-Id: I3ac5e83b95416fcf9831f8ce39919dcc4c4a5012
> Reviewed-on: https://chromium-review.googlesource.com/1123690
> Commit-Queue: Ned Nguyen <nednguyen@google.com>
> Reviewed-by: Ned Nguyen <nednguyen@google.com>

TBR=simonhatch@chromium.org,nednguyen@google.com,eyaich@chromium.org

Change-Id: I4e217b0c5f6295a81c5dcd9ed813654b5d085042
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: chromium:859073
Reviewed-on: https://chromium-review.googlesource.com/1124179
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Commit-Queue: Ned Nguyen <nednguyen@google.com>
[modify] https://crrev.com/c61a0380486a3c952109d0024cf77dac8a05bbb5/telemetry/telemetry/internal/story_runner.py
[modify] https://crrev.com/c61a0380486a3c952109d0024cf77dac8a05bbb5/telemetry/telemetry/internal/results/histogram_set_json_output_formatter.py
[modify] https://crrev.com/c61a0380486a3c952109d0024cf77dac8a05bbb5/telemetry/telemetry/internal/story_runner_unittest.py
Jul 3
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/5d6c7964edd6e0fe1ec1d86ab3c49f75450b5c55

commit 5d6c7964edd6e0fe1ec1d86ab3c49f75450b5c55
Author: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Date: Tue Jul 03 12:29:19 2018

Roll src/third_party/catapult cbfa46069e2b..c61a0380486a (1 commits)

https://chromium.googlesource.com/catapult.git/+log/cbfa46069e2b..c61a0380486a

git log cbfa46069e2b..c61a0380486a --date=short --no-merges --format='%ad %ae %s'
2018-07-03 nednguyen@google.com Revert "Only output the total histogram format if there is at least a single page that succeed"

Created with:
  gclient setdep -r src/third_party/catapult@c61a0380486a

The AutoRoll server is located here: https://catapult-roll.skia.org

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should be CC'd on the roll, and stop the roller if necessary.

CQ_INCLUDE_TRYBOTS=luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
BUG=chromium:859073
TBR=sullivan@chromium.org

Change-Id: Iaa7a6b9160b2e0235e160f8014e48b85bc4e2a82
Reviewed-on: https://chromium-review.googlesource.com/1123945
Reviewed-by: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Commit-Queue: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#572179}
[modify] https://crrev.com/5d6c7964edd6e0fe1ec1d86ab3c49f75450b5c55/DEPS
Jul 3
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/aff4d711c8f96053382b17b4b5519728c54e8835

commit aff4d711c8f96053382b17b4b5519728c54e8835
Author: Emily Hanley <eyaich@google.com>
Date: Tue Jul 03 14:53:58 2018

Not failing upload on telemetry outputting null

Bug:859073
Cq-Include-Trybots: master.tryserver.chromium.perf:obbs_fyi
Change-Id: I338021866a0a7c975d25198b194999f2c2f3f45a

NOTRY=true # obbs_fyi failure are flakes
Change-Id: I338021866a0a7c975d25198b194999f2c2f3f45a
Reviewed-on: https://chromium-review.googlesource.com/1124380
Commit-Queue: Ned Nguyen <nednguyen@google.com>
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Cr-Commit-Position: refs/heads/master@{#572210}
[modify] https://crrev.com/aff4d711c8f96053382b17b4b5519728c54e8835/tools/perf/core/upload_results_to_perf_dashboard.py
[modify] https://crrev.com/aff4d711c8f96053382b17b4b5519728c54e8835/tools/perf/process_perf_results.py
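The title of this change suggests treating a literal `null` from Telemetry as "nothing to upload" rather than as an error. Below is a minimal sketch of that kind of classification, assuming the caller has the raw output text. `classify_telemetry_output` and its three outcome strings are hypothetical; the change itself modified upload_results_to_perf_dashboard.py and process_perf_results.py, whose actual logic may differ.

```python
import json


def classify_telemetry_output(raw_text):
    """Classify Telemetry perf output before a dashboard upload (hypothetical policy).

    Returns "skip" for JSON null or an empty list (nothing to report, not a
    failure), "invalid" for text that is not JSON at all, and "upload" otherwise.
    """
    try:
        data = json.loads(raw_text)
    except ValueError:  # json.JSONDecodeError subclasses ValueError in Python 3
        return "invalid"
    if data is None or data == []:
        return "skip"
    return "upload"


for sample in ["null", "[]", "", '[{"name": "timeToFirstPaint", "unit": "ms"}]']:
    print(repr(sample), classify_telemetry_output(sample))
```

Keeping "invalid" separate from "skip" preserves the distinction drawn earlier in the thread: a run that produced no data need not fail the upload, while genuinely unparseable output still can.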
Oct 8
Comment 1 by simonhatch@chromium.org, Jun 29 2018