Perf dashboard upload failures on chromium.perf/linux-perf
Issue description
Filed by sheriff-o-matic@appspot.gserviceaccount.com on behalf of sullivan@google.com
performance_test_suite failing on chromium.perf/linux-perf
Builders failed on:
- linux-perf: https://ci.chromium.org/buildbot/chromium.perf/linux-perf
There is no reason given in the buildbot status page for the failure, it looks like all tests either passed or skipped. Why was it red? What action is needed?
https://ci.chromium.org/buildbot/chromium.perf/linux-perf/245
Emily, can you take a look?
,
Jun 20 2018
Thanks, Emily! Changing the bug title.
,
Jun 20 2018
Can you expose the data you're uploading somewhere on the waterfall or elsewhere? Difficult to diagnose this without the final data and the commandline that generated it.
,
Jun 21 2018
The stack trace was for smoothness.tough_animation_cases. Here is a link to the results that were uploaded and failed: https://logs.chromium.org/v/?s=chrome%2Fbb%2Fchromium.perf%2Flinux-perf%2F251%2F%2B%2Fsmoothness.tough_animation_cases
,
Jun 21 2018
So here are the 2 diagnostics that are mismatching:
ad691ff7-21c6-4555-ae3c-bb3203edead8 - [33696694272]
83664074-4f0d-4238-a2a2-d77254012dfe - [33696591872]
These are "memoryAmounts" diagnostics, which to my understanding *must* match and be consistent from run to run. AFAIK they come from Telemetry, but I don't know how they're generated. Ethan/Ben, any ideas?
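A minimal sketch of how such a mismatch could be spotted in a merged results file before uploading. It assumes the merged JSON is a flat list in which shared diagnostics are dicts carrying `guid` and `values`, and histograms reference them by GUID under `diagnostics`. That shape, the function name `distinct_memory_amounts`, and the histogram names are assumptions for illustration, not the dashboard's actual code.

```python
def distinct_memory_amounts(histogram_set, name="memoryAmounts"):
    """Map each GUID referenced under `name` to its list of values."""
    # Shared diagnostics: entries with both "guid" and "values" (assumed shape).
    shared = {
        entry["guid"]: entry["values"]
        for entry in histogram_set
        if "guid" in entry and "values" in entry
    }
    found = {}
    for entry in histogram_set:
        ref = entry.get("diagnostics", {}).get(name)
        if isinstance(ref, str) and ref in shared:
            found[ref] = shared[ref]
    return found


# GUIDs and values are the two diagnostics quoted above; the histogram
# names are hypothetical placeholders.
merged = [
    {"guid": "ad691ff7-21c6-4555-ae3c-bb3203edead8",
     "type": "GenericSet", "values": [33696694272]},
    {"guid": "83664074-4f0d-4238-a2a2-d77254012dfe",
     "type": "GenericSet", "values": [33696591872]},
    {"name": "hypothetical_metric_shard_0",
     "diagnostics": {"memoryAmounts": "ad691ff7-21c6-4555-ae3c-bb3203edead8"}},
    {"name": "hypothetical_metric_shard_1",
     "diagnostics": {"memoryAmounts": "83664074-4f0d-4238-a2a2-d77254012dfe"}},
]

found = distinct_memory_amounts(merged)
# More than one distinct value list means the upload would be rejected.
mismatch = len({tuple(values) for values in found.values()}) > 1
print(found, mismatch)
```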
,
Jun 21 2018
To be clear, all the ones that failed were sharded, so we merged the results. Not all sharded results failed, though, and this seems to be a problem only on Linux.
,
Jun 21 2018
Is there any chance of getting the pre-merged JSON files (one from each shard)? I'm curious whether anything interesting is in there.
,
Jun 21 2018
For smoothness.tough_animation_cases:
first shard output: https://isolateserver.appspot.com/browse?namespace=default-gzip&digest=d1ee1402e38d69bf93a834e69d183bcc7fc14a56&as=perf_results.json
second shard output: https://isolateserver.appspot.com/browse?namespace=default-gzip&digest=60813fdac30478afb08116bf1f1fa7264b2b41a7&as=perf_results.json
,
Jun 21 2018
Followup: Spoke with Ethan and he's going to file a bug about the inconsistent memoryAmounts issue in Telemetry. It might warrant some investigation as to why the values differ slightly from machine to machine, but a possible course of action is to simply round to the nearest MB or even GB.
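A minimal sketch of the rounding idea, applied to the two values reported earlier in this thread. It assumes binary units (MiB and GiB); the helper name `round_to_unit` is hypothetical and this is not what Telemetry does.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def round_to_unit(num_bytes, unit):
    """Round num_bytes to the nearest multiple of unit using integer math."""
    return (num_bytes + unit // 2) // unit * unit


# The two mismatching memoryAmounts values from the failing upload.
shard_a = 33696694272
shard_b = 33696591872

same_at_mib = round_to_unit(shard_a, MIB) == round_to_unit(shard_b, MIB)
same_at_gib = round_to_unit(shard_a, GIB) == round_to_unit(shard_b, GIB)
print(same_at_mib, same_at_gib)
```

Rounding narrows the problem rather than removing it: two machines whose values fall on either side of a rounding boundary would still disagree, and a coarser unit makes that less likely.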
,
Jun 22 2018
Adding Ned to see if he has any thoughts on the memory issues. This is actively blocking uploads on linux perf for these benchmarks so I am bumping up the priority.
,
Jun 22 2018
Maybe Ben can chime in here since he looked into this a bit last night. IIRC he mentioned that memoryAmounts is derived from MemTotal on linux, which isn't actually a constant. We may want to look into something like https://stackoverflow.com/questions/20348007/how-can-i-find-out-the-total-physical-memory-ram-of-my-linux-box-suitable-to-b which reports the actual physical memory instead and isn't expected to change.
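For reference, a minimal sketch of reading MemTotal from /proc/meminfo, which reports the value in kB. The function names are hypothetical and this is not Telemetry's code. The sample line is constructed so that it corresponds to the first value quoted earlier (33696694272 bytes).

```python
import re


def parse_mem_total_bytes(meminfo_text):
    """Extract MemTotal from /proc/meminfo contents and convert kB to bytes."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal not found in meminfo text")
    return int(match.group(1)) * 1024


def read_mem_total_bytes(path="/proc/meminfo"):
    """Read MemTotal from a Linux host; the path is only valid on Linux."""
    with open(path) as meminfo:
        return parse_mem_total_bytes(meminfo.read())


# Constructed sample, not captured from a bot.
sample = "MemTotal:       32906928 kB\nMemFree:         1234567 kB\n"
mem_total = parse_mem_total_bytes(sample)
print(mem_total)
```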
,
Jun 22 2018
The memoryAmounts diagnostic was intended to be used to group low-memory devices separately from high-memory devices without maintaining bot names or device model names. Measuring physical memory instead of MemTotal would make that easier, since MemTotal can vary slightly depending on how the kernel is feeling, as we're seeing.
dmidecode dumps the Desktop Management Interface aka System Management BIOS, which includes physical memory as well as a ton of other information. It requires root access since it reads files that are chmod 0400.
$ sudo dmidecode | wc -l
1419
$ sudo dmidecode -t 17|grep 'Size.*MB'|awk '{s+=$2}END{print s/1024, "GB"}'
128 GB
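The same sum can be sketched in Python over captured `dmidecode -t 17` output. This assumes each populated slot reports a line of the form `Size: <n> MB`, as the awk command above does; some dmidecode versions may report other units, which this sketch ignores. The function name and the sample text are hypothetical.

```python
import re


def total_installed_mb(dmidecode_type17_text):
    """Sum the 'Size: <n> MB' lines of `dmidecode -t 17` output, in MB."""
    sizes = re.findall(r"^\s*Size:\s+(\d+)\s+MB\s*$",
                       dmidecode_type17_text, re.MULTILINE)
    return sum(int(size) for size in sizes)


# Constructed sample: two populated slots and one empty slot.
sample = """Memory Device
\tSize: 16384 MB
\tForm Factor: DIMM
Memory Device
\tSize: 16384 MB
\tForm Factor: DIMM
Memory Device
\tSize: No Module Installed
"""
total_mb = total_installed_mb(sample)
print(total_mb / 1024, "GB")
```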
Google suggests that dmidecode is not generally available on Android. Googling [android get physical memory size] only turns up /proc/meminfo, which is what telemetry is already doing.
I straced dmidecode to see what it reads. Maybe Android will also have these files?
/sys/firmware/dmi/tables/smbios_entry_point
/sys/firmware/dmi/tables/DMI
https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-firmware-dmi-tables
`strings $(which dmidecode)` shows that dmidecode hardcodes those filenames, so that's probably what we could do if Android has those files.
It looks like Android has /sys/firmware/dmi/entries/, but I can't tell if it also has tables/.
https://android.googlesource.com/kernel/msm/+/android-7.1.0_r0.2/Documentation/ABI/testing/sysfs-firmware-dmi
Can somebody play with a rooted device to see if it has dmi/tables/?
Alternatively, we could write a table in telemetry mapping from (Build.MANUFACTURER+" "+Build.MODEL) to pre-measured physical memory amounts and try not to worry about people hacking individual devices, at the cost of updating it when new devices are released.
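A minimal sketch of what such a table could look like. The device names and sizes are placeholders rather than measurements, and `physical_memory_bytes` is a hypothetical helper, not an existing Telemetry function.

```python
GIB = 1024 ** 3

# Placeholder entries only; real values would have to be measured per device.
PHYSICAL_MEMORY_BYTES = {
    "ExampleCo Model A": 2 * GIB,
    "ExampleCo Model B": 4 * GIB,
}


def physical_memory_bytes(manufacturer, model, table=PHYSICAL_MEMORY_BYTES):
    """Look up memory by Build.MANUFACTURER + " " + Build.MODEL.

    Returns None for devices missing from the table, so callers can fall
    back to another source or omit the diagnostic.
    """
    return table.get(manufacturer + " " + model)


known = physical_memory_bytes("ExampleCo", "Model A")
unknown = physical_memory_bytes("ExampleCo", "Model Z")
print(known, unknown)
```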
,
Jun 22 2018
Can we not add the memory metric diagnostic for now? Right now this is preventing many benchmarks from uploading any data to the perf dashboard. So any band-aid fix that keeps the key benchmark metrics uploading to the perf dashboard would be great.
,
Jun 22 2018
If you just want to make things upload then sure, you can either not add it, strip it post-merge, or strip it on the dashboard.
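A minimal sketch of the "strip it post-merge" option. It assumes the merged JSON is a flat list in which histograms reference shared diagnostics by GUID under `diagnostics`; that shape, the function name `strip_diagnostic`, the histogram names, and the `benchmarks` GUID are assumptions for illustration.

```python
import copy


def strip_diagnostic(histogram_set, name="memoryAmounts"):
    """Return a copy of histogram_set with every `name` diagnostic removed."""
    stripped = copy.deepcopy(histogram_set)
    removed = set()
    for entry in stripped:
        refs = entry.get("diagnostics")
        if isinstance(refs, dict) and name in refs:
            ref = refs.pop(name)
            if isinstance(ref, str):
                removed.add(ref)
    # Keep any shared diagnostic still referenced under another name.
    still_used = {
        ref
        for entry in stripped
        if isinstance(entry.get("diagnostics"), dict)
        for ref in entry["diagnostics"].values()
        if isinstance(ref, str)
    }
    orphaned = removed - still_used
    return [entry for entry in stripped if entry.get("guid") not in orphaned]


merged = [
    {"guid": "ad691ff7-21c6-4555-ae3c-bb3203edead8",
     "type": "GenericSet", "values": [33696694272]},
    {"guid": "83664074-4f0d-4238-a2a2-d77254012dfe",
     "type": "GenericSet", "values": [33696591872]},
    {"guid": "hypothetical-benchmarks-guid",
     "type": "GenericSet", "values": ["smoothness.tough_animation_cases"]},
    {"name": "hypothetical_metric_a",
     "diagnostics": {"memoryAmounts": "ad691ff7-21c6-4555-ae3c-bb3203edead8",
                     "benchmarks": "hypothetical-benchmarks-guid"}},
    {"name": "hypothetical_metric_b",
     "diagnostics": {"memoryAmounts": "83664074-4f0d-4238-a2a2-d77254012dfe",
                     "benchmarks": "hypothetical-benchmarks-guid"}},
]

cleaned = strip_diagnostic(merged)
print(len(merged), len(cleaned))
```

The input list is deep-copied so the original merged results stay available for debugging.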
,
Jun 22 2018
I'd vote for just having Telemetry not output it.
,
Jun 23 2018
The following revision refers to this bug:
https://chromium.googlesource.com/catapult/+/2e625dcb82db59c137245a55abc5fae34eceb361
commit 2e625dcb82db59c137245a55abc5fae34eceb361
Author: Nghia Nguyen <nednguyen@google.com>
Date: Sat Jun 23 07:07:36 2018
Disable MEMORY_AMOUNTS diagnostic
Bug: chromium:854676
Change-Id: I92aca56fbb11087278536940da324a43f43ae351
TBR=benjhayde@chromium.org, simonhatch@chromium.org
Reviewed-on: https://chromium-review.googlesource.com/1112765
Commit-Queue: Ned Nguyen <nednguyen@google.com>
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Reviewed-by: Simon Hatch <simonhatch@chromium.org>
[modify] https://crrev.com/2e625dcb82db59c137245a55abc5fae34eceb361/telemetry/telemetry/internal/story_runner.py
[modify] https://crrev.com/2e625dcb82db59c137245a55abc5fae34eceb361/telemetry/telemetry/internal/story_runner_unittest.py
,
Jun 23 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/bca5514b0a5cb6c8592b76a294483f6751c0ee75
commit bca5514b0a5cb6c8592b76a294483f6751c0ee75
Author: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Date: Sat Jun 23 15:55:12 2018
Roll src/third_party/catapult 34451063bc29..2e625dcb82db (1 commits)
https://chromium.googlesource.com/catapult.git/+log/34451063bc29..2e625dcb82db
git log 34451063bc29..2e625dcb82db --date=short --no-merges --format='%ad %ae %s'
2018-06-23 nednguyen@google.com Disable MEMORY_AMOUNTS diagnostic
Created with:
gclient setdep -r src/third_party/catapult@2e625dcb82db
The AutoRoll server is located here: https://catapult-roll.skia.org
Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md
If the roll is causing failures, please contact the current sheriff, who should be CC'd on the roll, and stop the roller if necessary.
CQ_INCLUDE_TRYBOTS=luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
BUG= chromium:854676
TBR=sullivan@chromium.org
NOTREECHECKS=true
Change-Id: If0ebad6d392890f8aaf4b2336dad28367d83973d
Reviewed-on: https://chromium-review.googlesource.com/1112138
Commit-Queue: Ned Nguyen <nednguyen@google.com>
Reviewed-by: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#569878}
[modify] https://crrev.com/bca5514b0a5cb6c8592b76a294483f6751c0ee75/DEPS
,
Jun 23 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/3161a1dfc1d7e801187a1cdcf60e63bc57b75d72
commit 3161a1dfc1d7e801187a1cdcf60e63bc57b75d72
Author: Nico Weber <thakis@chromium.org>
Date: Sat Jun 23 20:39:41 2018
Revert "Roll src/third_party/catapult 34451063bc29..2e625dcb82db (1 commits)"
This reverts commit bca5514b0a5cb6c8592b76a294483f6751c0ee75.
Reason for revert: TestShardingMapGenerator.testGeneratePerfSharding fails on many mac bots, e.g. here:
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac10.10%20Tests/33316
Error:
[18/22] core.sharding_map_generator_unittest.TestShardingMapGenerator.testGeneratePerfSharding queued/b/s/w/ir/.swarming_module_cache/vpython/fe1f6b/bin/python: can't open file 'tools/perf/generate_perf_sharding': [Errno 2] No such file or directory
File is probably missing from isolate and needs to be in some gn data list.
Original change's description:
> Roll src/third_party/catapult 34451063bc29..2e625dcb82db (1 commits)
>
> https://chromium.googlesource.com/catapult.git/+log/34451063bc29..2e625dcb82db
>
> git log 34451063bc29..2e625dcb82db --date=short --no-merges --format='%ad %ae %s'
> 2018-06-23 nednguyen@google.com Disable MEMORY_AMOUNTS diagnostic
>
> Created with:
> gclient setdep -r src/third_party/catapult@2e625dcb82db
>
> The AutoRoll server is located here: https://catapult-roll.skia.org
>
> Documentation for the AutoRoller is here:
> https://skia.googlesource.com/buildbot/+/master/autoroll/README.md
>
> If the roll is causing failures, please contact the current sheriff, who should
> be CC'd on the roll, and stop the roller if necessary.
>
> CQ_INCLUDE_TRYBOTS=luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
>
> BUG= chromium:854676
> TBR=sullivan@chromium.org
> NOTREECHECKS=true
>
> Change-Id: If0ebad6d392890f8aaf4b2336dad28367d83973d
> Reviewed-on: https://chromium-review.googlesource.com/1112138
> Commit-Queue: Ned Nguyen <nednguyen@google.com>
> Reviewed-by: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
> Cr-Commit-Position: refs/heads/master@{#569878}
TBR=sullivan@chromium.org,nednguyen@google.com,catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com
Change-Id: I6e51636a5151616cc5ade6f3ddd4a556d05303fb
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: chromium:854676
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Reviewed-on: https://chromium-review.googlesource.com/1112978
Reviewed-by: Nico Weber <thakis@chromium.org>
Commit-Queue: Nico Weber <thakis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#569880}
[modify] https://crrev.com/3161a1dfc1d7e801187a1cdcf60e63bc57b75d72/DEPS
,
Jun 23 2018
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/fdfa48da09cf9ec14187c3ead2f5bfd128d9561b
commit fdfa48da09cf9ec14187c3ead2f5bfd128d9561b
Author: Ned Nguyen <nednguyen@google.com>
Date: Sat Jun 23 21:53:50 2018
Reland "Roll src/third_party/catapult 34451063bc29..2e625dcb82db (1 commits)"
This reverts commit 3161a1dfc1d7e801187a1cdcf60e63bc57b75d72.
Reason for revert: the roll wasn't causing the test failure. The test is also disabled in https://chromium-review.googlesource.com/c/chromium/src/+/1112946
Original change's description:
> Revert "Roll src/third_party/catapult 34451063bc29..2e625dcb82db (1 commits)"
>
> This reverts commit bca5514b0a5cb6c8592b76a294483f6751c0ee75.
>
> Reason for revert: TestShardingMapGenerator.testGeneratePerfSharding
> fails on many mac bots, e.g. here:
> https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac10.10%20Tests/33316
>
> Error:
> [18/22] core.sharding_map_generator_unittest.TestShardingMapGenerator.testGeneratePerfSharding queued/b/s/w/ir/.swarming_module_cache/vpython/fe1f6b/bin/python: can't open file 'tools/perf/generate_perf_sharding': [Errno 2] No such file or directory
>
> File is probably missing from isolate and needs to be in some gn data list.
>
> Original change's description:
> > Roll src/third_party/catapult 34451063bc29..2e625dcb82db (1 commits)
> >
> > https://chromium.googlesource.com/catapult.git/+log/34451063bc29..2e625dcb82db
> >
> > git log 34451063bc29..2e625dcb82db --date=short --no-merges --format='%ad %ae %s'
> > 2018-06-23 nednguyen@google.com Disable MEMORY_AMOUNTS diagnostic
> >
> > Created with:
> > gclient setdep -r src/third_party/catapult@2e625dcb82db
> >
> > The AutoRoll server is located here: https://catapult-roll.skia.org
> >
> > Documentation for the AutoRoller is here:
> > https://skia.googlesource.com/buildbot/+/master/autoroll/README.md
> >
> > If the roll is causing failures, please contact the current sheriff, who should
> > be CC'd on the roll, and stop the roller if necessary.
> >
> > CQ_INCLUDE_TRYBOTS=luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
> >
> > BUG= chromium:854676
> > TBR=sullivan@chromium.org
> > NOTREECHECKS=true
> >
> > Change-Id: If0ebad6d392890f8aaf4b2336dad28367d83973d
> > Reviewed-on: https://chromium-review.googlesource.com/1112138
> > Commit-Queue: Ned Nguyen <nednguyen@google.com>
> > Reviewed-by: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
> > Cr-Commit-Position: refs/heads/master@{#569878}
>
> TBR=sullivan@chromium.org,nednguyen@google.com,catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com
>
> Change-Id: I6e51636a5151616cc5ade6f3ddd4a556d05303fb
> No-Presubmit: true
> No-Tree-Checks: true
> No-Try: true
> Bug: chromium:854676
> Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
> Reviewed-on: https://chromium-review.googlesource.com/1112978
> Reviewed-by: Nico Weber <thakis@chromium.org>
> Commit-Queue: Nico Weber <thakis@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#569880}
TBR=thakis@chromium.org,sullivan@chromium.org,nednguyen@google.com,catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com
Change-Id: I64954dacf7cc9ef68610f479f34ca9bf5eec7189
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: chromium:854676
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Reviewed-on: https://chromium-review.googlesource.com/1112919
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Commit-Queue: Ned Nguyen <nednguyen@google.com>
Cr-Commit-Position: refs/heads/master@{#569890}
[modify] https://crrev.com/fdfa48da09cf9ec14187c3ead2f5bfd128d9561b/DEPS
,
Aug 2
,
Oct 4
We ended up removing the diagnostic since it wasn't adding any value.
Comment 1 by eyaich@chromium.org, Jun 20 2018
Owner: simonhatch@chromium.org
There are upload failures happening. If you look at buildbot the link says "Results Dashboard Upload Failure". If SOM doesn't display any failed stories you know it isn't an individual test failure. The logs I got out of the dashboard are the following:
logMessage: "memoryAmounts diagnostics must be the same for all histograms
Traceback (most recent call last):
  File "/base/alloc/tmpfs/dynamic_runtimes/python27g/be731db3a3e23b1a/python27/python27_lib/versions/third_party/webapp2-2.5.1/webapp2.py", line 1536, in __call__
    rv = self.handle_exception(request, response, e)
  File "/base/alloc/tmpfs/dynamic_runtimes/python27g/be731db3a3e23b1a/python27/python27_lib/versions/third_party/webapp2-2.5.1/webapp2.py", line 1530, in __call__
    rv = self.router.dispatch(request, response)
  File "/base/alloc/tmpfs/dynamic_runtimes/python27g/be731db3a3e23b1a/python27/python27_lib/versions/third_party/webapp2-2.5.1/webapp2.py", line 1278, in default_dispatcher
    return route.handler_adapter(request, response)
  File "/base/alloc/tmpfs/dynamic_runtimes/python27g/be731db3a3e23b1a/python27/python27_lib/versions/third_party/webapp2-2.5.1/webapp2.py", line 1102, in __call__
    return handler.dispatch()
  File "/base/alloc/tmpfs/dynamic_runtimes/python27g/be731db3a3e23b1a/python27/python27_lib/versions/third_party/webapp2-2.5.1/webapp2.py", line 572, in dispatch
    return self.handle_exception(e, self.app.debug)
  File "/base/alloc/tmpfs/dynamic_runtimes/python27g/be731db3a3e23b1a/python27/python27_lib/versions/third_party/webapp2-2.5.1/webapp2.py", line 570, in dispatch
    return method(*args, **kwargs)
  File "/base/data/home/apps/s~chromeperf/clean-dtu-f704817c.410552350631385266/dashboard/api/api_request_handler.py", line 46, in post
    results = self.AuthorizedPost(*args)
  File "/base/data/home/apps/s~chromeperf/clean-dtu-f704817c.410552350631385266/dashboard/add_histograms.py", line 80, in AuthorizedPost
    ProcessHistogramSet(histogram_dicts)
  File "/base/data/home/apps/s~chromeperf/clean-dtu-f704817c.410552350631385266/dashboard/add_histograms.py", line 147, in ProcessHistogramSet
    histograms, suite_key, revision, internal_only)
  File "/base/data/home/apps/s~chromeperf/clean-dtu-f704817c.410552350631385266/dashboard/add_histograms.py", line 319, in FindSuiteLevelSparseDiagnostics
    name + ' diagnostics must be the same for all histograms')
ValueError: memoryAmounts diagnostics must be the same for all histograms"
severity: "ERROR"
time: "2018-06-20T04:53:24.688904Z"
Simon is going to investigate.