
Issue 854676 link


Issue metadata

Status: Fixed
Owner:
Closed: Oct 4
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux
Pri: 1
Type: ----




Perf dashboard upload failures on chromium.perf/linux-perf

Project Member Reported by sheriff-...@appspot.gserviceaccount.com, Jun 20 2018

Issue description

Filed by sheriff-o-matic@appspot.gserviceaccount.com on behalf of sullivan@google.com

performance_test_suite failing on chromium.perf/linux-perf

Builders failed on: 
- linux-perf: 
  https://ci.chromium.org/buildbot/chromium.perf/linux-perf

The buildbot status page gives no reason for the failure; it looks like all tests either passed or were skipped. Why was it red? What action is needed?
https://ci.chromium.org/buildbot/chromium.perf/linux-perf/245

Emily, can you take a look?
 

Comment 1 by eyaich@chromium.org, Jun 20 2018

Cc: eyaich@chromium.org
Owner: simonhatch@chromium.org
There are upload failures happening. If you look at buildbot, the link says "Results Dashboard Upload Failure".

If SOM doesn't display any failed stories, you know it isn't an individual test failure.

The logs I got out of the dashboard are the following: 

    logMessage: "memoryAmounts diagnostics must be the same for all histograms
Traceback (most recent call last):
  File "/base/alloc/tmpfs/dynamic_runtimes/python27g/be731db3a3e23b1a/python27/python27_lib/versions/third_party/webapp2-2.5.1/webapp2.py", line 1536, in __call__
    rv = self.handle_exception(request, response, e)
  File "/base/alloc/tmpfs/dynamic_runtimes/python27g/be731db3a3e23b1a/python27/python27_lib/versions/third_party/webapp2-2.5.1/webapp2.py", line 1530, in __call__
    rv = self.router.dispatch(request, response)
  File "/base/alloc/tmpfs/dynamic_runtimes/python27g/be731db3a3e23b1a/python27/python27_lib/versions/third_party/webapp2-2.5.1/webapp2.py", line 1278, in default_dispatcher
    return route.handler_adapter(request, response)
  File "/base/alloc/tmpfs/dynamic_runtimes/python27g/be731db3a3e23b1a/python27/python27_lib/versions/third_party/webapp2-2.5.1/webapp2.py", line 1102, in __call__
    return handler.dispatch()
  File "/base/alloc/tmpfs/dynamic_runtimes/python27g/be731db3a3e23b1a/python27/python27_lib/versions/third_party/webapp2-2.5.1/webapp2.py", line 572, in dispatch
    return self.handle_exception(e, self.app.debug)
  File "/base/alloc/tmpfs/dynamic_runtimes/python27g/be731db3a3e23b1a/python27/python27_lib/versions/third_party/webapp2-2.5.1/webapp2.py", line 570, in dispatch
    return method(*args, **kwargs)
  File "/base/data/home/apps/s~chromeperf/clean-dtu-f704817c.410552350631385266/dashboard/api/api_request_handler.py", line 46, in post
    results = self.AuthorizedPost(*args)
  File "/base/data/home/apps/s~chromeperf/clean-dtu-f704817c.410552350631385266/dashboard/add_histograms.py", line 80, in AuthorizedPost
    ProcessHistogramSet(histogram_dicts)
  File "/base/data/home/apps/s~chromeperf/clean-dtu-f704817c.410552350631385266/dashboard/add_histograms.py", line 147, in ProcessHistogramSet
    histograms, suite_key, revision, internal_only)
  File "/base/data/home/apps/s~chromeperf/clean-dtu-f704817c.410552350631385266/dashboard/add_histograms.py", line 319, in FindSuiteLevelSparseDiagnostics
    name + ' diagnostics must be the same for all histograms')
ValueError: memoryAmounts diagnostics must be the same for all histograms"     
    severity: "ERROR"     
    time: "2018-06-20T04:53:24.688904Z"     

Simon is going to investigate.
Summary: Perf dashboard upload failures on chromium.perf/linux-perf (was: performance_test_suite failing on chromium.perf/linux-perf)
Thanks, Emily! Changing the bug title.
Can you expose the data you're uploading somewhere on the waterfall or elsewhere? It's difficult to diagnose this without the final data and the command line that generated it.

Comment 4 by eyaich@chromium.org, Jun 21 2018

The stack trace was for smoothness.tough_animation_cases.

Here is a link to the results that were uploaded and failed: https://logs.chromium.org/v/?s=chrome%2Fbb%2Fchromium.perf%2Flinux-perf%2F251%2F%2B%2Fsmoothness.tough_animation_cases
Cc: eakuefner@chromium.org benjhayden@chromium.org

So here are the 2 diagnostics that are mismatching:

  ad691ff7-21c6-4555-ae3c-bb3203edead8 - [33696694272]
  83664074-4f0d-4238-a2a2-d77254012dfe - [33696591872]

These are "memoryAmounts" diagnostics, which to my understanding *must* match and be consistent from run to run. AFAIK they come from Telemetry, but I don't know how they're generated. Ethan/Ben, any ideas?
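
For readers following along, here is a minimal sketch of the kind of consistency check that produces the error in the traceback above. It is not the dashboard's actual code from add_histograms.py; the function name and the input shape (one list of values per histogram) are assumptions made for illustration. Only the error message text is taken from the log.

```python
def find_suite_level_diagnostic(name, per_histogram_values):
    """Return the single value list shared by all histograms.

    Hypothetical helper: `per_histogram_values` holds one list of values
    per histogram. Raises ValueError when the lists are not all identical.
    """
    distinct = {tuple(values) for values in per_histogram_values}
    if len(distinct) > 1:
        # Same message text as in the logged traceback.
        raise ValueError(
            name + ' diagnostics must be the same for all histograms')
    return list(distinct.pop()) if distinct else None
```

Feeding it the two mismatching values listed above, [33696694272] and [33696591872], raises the ValueError, which is why the merged results from different shards were rejected.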

Comment 6 by eyaich@chromium.org, Jun 21 2018

To be clear, all the ones that failed were sharded, so we merged the results. Not all sharded results failed, though, and this seems to be a problem only on Linux.
Is there any chance of getting the pre-merged JSON files (one from each shard)? I'm curious whether there's anything interesting in there.
Followup: Spoke with Ethan and he's going to file a bug about the inconsistent memoryAmounts issue in Telemetry. It might warrant some investigation as to why the values differ slightly from machine to machine, but a possible course of action is to simply round to the nearest MB or even GB.
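
A small sketch of the rounding idea, applied to the two values reported earlier in this bug. The helper name is hypothetical, and binary units (MiB, GiB) are an assumption since the comment does not say which convention was meant.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def round_to_unit(num_bytes, unit):
    """Round a byte count to the nearest whole multiple of `unit`."""
    # Integer arithmetic avoids floating-point surprises on large values.
    return (num_bytes + unit // 2) // unit


a = 33696694272
b = 33696591872
same_mib = round_to_unit(a, MIB) == round_to_unit(b, MIB)
same_gib = round_to_unit(a, GIB) == round_to_unit(b, GIB)
```

For these two values both comparisons come out equal. Rounding does not remove the problem in general, though: two readings that sit on either side of a rounding boundary would still differ, and a coarser unit only makes that less likely.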
Cc: nednguyen@chromium.org
Components: Speed>Benchmarks>Waterfall
Labels: -Pri-2 OS-Linux Pri-1
Adding Ned to see if he has any thoughts on the memory issues. This is actively blocking uploads on Linux perf for these benchmarks, so I am bumping up the priority.


Maybe Ben can chime in here since he looked into this a bit last night. IIRC he mentioned that memoryAmounts is derived from MemTotal on Linux, which isn't actually a constant. We may want to look into something like https://stackoverflow.com/questions/20348007/how-can-i-find-out-the-total-physical-memory-ram-of-my-linux-box-suitable-to-b which reports the actual physical memory instead and isn't expected to change.
The memoryAmounts diagnostic was intended to be used to group low-memory devices separately from high-memory devices without maintaining bot names or device model names. Measuring physical memory instead of MemTotal would make that easier, since MemTotal can vary slightly depending on how the kernel is feeling, as we're seeing.
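
To make the MemTotal source concrete, here is a minimal sketch of reading it from the text of /proc/meminfo. This is an illustration only, not Telemetry's implementation; the function name is hypothetical, and the text is passed in as a string so the sketch also runs on machines without /proc.

```python
import re


def parse_mem_total_bytes(meminfo_text):
    """Extract MemTotal from /proc/meminfo contents and return it in bytes.

    /proc/meminfo reports MemTotal in kB, e.g. 'MemTotal:  32906928 kB'.
    """
    match = re.search(r'^MemTotal:\s+(\d+)\s+kB\s*$', meminfo_text,
                      re.MULTILINE)
    if match is None:
        raise ValueError('MemTotal not found in meminfo text')
    return int(match.group(1)) * 1024
```

On a Linux machine it could be called as parse_mem_total_bytes(open('/proc/meminfo').read()). The two mismatching values in this bug correspond to MemTotal readings of 32906928 kB and 32906828 kB, a difference of 100 kB.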

dmidecode dumps the Desktop Management Interface aka System Management BIOS, which includes physical memory as well as a ton of other information. It requires root access since it reads files that are chmod 0400.

$ sudo dmidecode | wc -l
1419
$ sudo dmidecode -t 17|grep 'Size.*MB'|awk '{s+=$2}END{print s/1024, "GB"}'
128 GB
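
The grep/awk pipeline above can be sketched in Python as follows. The function name is hypothetical, and the parser assumes the same thing the one-liner does: that each populated memory device prints a line of the form "Size: <number> MB". Other dmidecode versions may report sizes in different units, which neither the one-liner nor this sketch handles.

```python
import re


def total_physical_memory_gb(dmidecode_type17_output):
    """Sum the 'Size: N MB' lines of `dmidecode -t 17` output, in GB."""
    total_mb = 0
    for line in dmidecode_type17_output.splitlines():
        # Lines such as 'Size: No Module Installed' do not match and are
        # skipped, just as grep 'Size.*MB' skips them.
        match = re.match(r'\s*Size:\s+(\d+)\s+MB\s*$', line)
        if match:
            total_mb += int(match.group(1))
    return total_mb / 1024
```

The output text would still have to come from running dmidecode as root, for example through subprocess, so this does not get around the permission requirement.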

Google suggests that dmidecode is not generally available on Android. Googling [android get physical memory size] only turns up /proc/meminfo, which is what telemetry is already doing.
I straced dmidecode to see what it reads. Maybe Android will also have these files?
/sys/firmware/dmi/tables/smbios_entry_point
/sys/firmware/dmi/tables/DMI

https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-firmware-dmi-tables

`strings $(which dmidecode)` shows that dmidecode hardcodes those filenames, so that's probably what we could do if Android has those files.
It looks like Android has /sys/firmware/dmi/entries/, but I can't tell if it also has tables/.
https://android.googlesource.com/kernel/msm/+/android-7.1.0_r0.2/Documentation/ABI/testing/sysfs-firmware-dmi

Can somebody play with a rooted device to see if it has dmi/tables/?

Alternatively, we could write a table in telemetry mapping from (Build.MANUFACTURER+" "+Build.MODEL) to pre-measured physical memory amounts and try not to worry about people hacking individual devices, at the cost of updating it when new devices are released.
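
A minimal sketch of what such a table could look like. The device names and sizes below are made-up placeholders, not measurements of real devices, and the function name is hypothetical.

```python
GIB = 1024 ** 3

# Keys are Build.MANUFACTURER + ' ' + Build.MODEL. Entries are hypothetical.
PHYSICAL_MEMORY_BYTES = {
    'ExampleCorp Phone A': 2 * GIB,
    'ExampleCorp Phone B': 4 * GIB,
}


def lookup_physical_memory(manufacturer, model, table=PHYSICAL_MEMORY_BYTES):
    """Return the pre-measured memory size in bytes, or None if unknown."""
    return table.get(manufacturer + ' ' + model)
```

Returning None for unknown devices makes the maintenance cost visible: every new device model needs a table entry before it can be grouped.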
Can we not add the memory metric diagnostic for now? Right now this is preventing many benchmarks from uploading data to the perf dashboard at all, so any band-aid fix that keeps the key benchmark metrics uploading to the perf dashboard would be great.
If you just want to make things upload then sure, you can either not add it, strip it post-merge, or strip it on the dashboard.
I'd vote just having telemetry not output it.
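
For comparison, the "strip it post-merge" option could look roughly like the sketch below. It assumes the merged results are a JSON list of dicts in which shared diagnostics carry a 'guid' key and histograms reference them by GUID under a 'diagnostics' dict. That layout is an assumption about the HistogramSet JSON format, and the function name is hypothetical.

```python
import copy


def strip_diagnostic(histogram_dicts, name):
    """Return a copy of the list without the named diagnostic.

    Removes the reference from every histogram's 'diagnostics' dict and
    drops the shared diagnostic entries those references pointed to.
    """
    dicts = copy.deepcopy(histogram_dicts)
    removed_guids = set()
    for entry in dicts:
        diagnostics = entry.get('diagnostics')
        if isinstance(diagnostics, dict) and name in diagnostics:
            ref = diagnostics.pop(name)
            if isinstance(ref, str):
                # A string is taken to be a GUID reference to a shared entry.
                removed_guids.add(ref)
    return [entry for entry in dicts
            if entry.get('guid') not in removed_guids]
```

This keeps every histogram and every other diagnostic intact, at the cost of one more post-processing step on each upload.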
Comment 17 by bugdroid1@chromium.org, Jun 23 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/bca5514b0a5cb6c8592b76a294483f6751c0ee75

commit bca5514b0a5cb6c8592b76a294483f6751c0ee75
Author: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Date: Sat Jun 23 15:55:12 2018

Roll src/third_party/catapult 34451063bc29..2e625dcb82db (1 commits)

https://chromium.googlesource.com/catapult.git/+log/34451063bc29..2e625dcb82db


git log 34451063bc29..2e625dcb82db --date=short --no-merges --format='%ad %ae %s'
2018-06-23 nednguyen@google.com Disable MEMORY_AMOUNTS diagnostic


Created with:
  gclient setdep -r src/third_party/catapult@2e625dcb82db

The AutoRoll server is located here: https://catapult-roll.skia.org

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.

CQ_INCLUDE_TRYBOTS=luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel

BUG= chromium:854676 
TBR=sullivan@chromium.org
NOTREECHECKS=true

Change-Id: If0ebad6d392890f8aaf4b2336dad28367d83973d
Reviewed-on: https://chromium-review.googlesource.com/1112138
Commit-Queue: Ned Nguyen <nednguyen@google.com>
Reviewed-by: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#569878}
[modify] https://crrev.com/bca5514b0a5cb6c8592b76a294483f6751c0ee75/DEPS

Comment 18 by bugdroid1@chromium.org, Jun 23 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/3161a1dfc1d7e801187a1cdcf60e63bc57b75d72

commit 3161a1dfc1d7e801187a1cdcf60e63bc57b75d72
Author: Nico Weber <thakis@chromium.org>
Date: Sat Jun 23 20:39:41 2018

Revert "Roll src/third_party/catapult 34451063bc29..2e625dcb82db (1 commits)"

This reverts commit bca5514b0a5cb6c8592b76a294483f6751c0ee75.

Reason for revert: TestShardingMapGenerator.testGeneratePerfSharding
fails on many mac bots, e.g. here:
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac10.10%20Tests/33316

Error:
[18/22] core.sharding_map_generator_unittest.TestShardingMapGenerator.testGeneratePerfSharding queued/b/s/w/ir/.swarming_module_cache/vpython/fe1f6b/bin/python: can't open file 'tools/perf/generate_perf_sharding': [Errno 2] No such file or directory

File is probably missing from isolate and needs to be in some gn data list.

Original change's description:
> Roll src/third_party/catapult 34451063bc29..2e625dcb82db (1 commits)
> 
> https://chromium.googlesource.com/catapult.git/+log/34451063bc29..2e625dcb82db
> 
> 
> git log 34451063bc29..2e625dcb82db --date=short --no-merges --format='%ad %ae %s'
> 2018-06-23 nednguyen@google.com Disable MEMORY_AMOUNTS diagnostic
> 
> 
> Created with:
>   gclient setdep -r src/third_party/catapult@2e625dcb82db
> 
> The AutoRoll server is located here: https://catapult-roll.skia.org
> 
> Documentation for the AutoRoller is here:
> https://skia.googlesource.com/buildbot/+/master/autoroll/README.md
> 
> If the roll is causing failures, please contact the current sheriff, who should
> be CC'd on the roll, and stop the roller if necessary.
> 
> CQ_INCLUDE_TRYBOTS=luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
> 
> BUG= chromium:854676 
> TBR=sullivan@chromium.org
> NOTREECHECKS=true
> 
> Change-Id: If0ebad6d392890f8aaf4b2336dad28367d83973d
> Reviewed-on: https://chromium-review.googlesource.com/1112138
> Commit-Queue: Ned Nguyen <nednguyen@google.com>
> Reviewed-by: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
> Cr-Commit-Position: refs/heads/master@{#569878}

TBR=sullivan@chromium.org,nednguyen@google.com,catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com

Change-Id: I6e51636a5151616cc5ade6f3ddd4a556d05303fb
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug:  chromium:854676 
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Reviewed-on: https://chromium-review.googlesource.com/1112978
Reviewed-by: Nico Weber <thakis@chromium.org>
Commit-Queue: Nico Weber <thakis@chromium.org>
Cr-Commit-Position: refs/heads/master@{#569880}
[modify] https://crrev.com/3161a1dfc1d7e801187a1cdcf60e63bc57b75d72/DEPS

Comment 19 by bugdroid1@chromium.org, Jun 23 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/fdfa48da09cf9ec14187c3ead2f5bfd128d9561b

commit fdfa48da09cf9ec14187c3ead2f5bfd128d9561b
Author: Ned Nguyen <nednguyen@google.com>
Date: Sat Jun 23 21:53:50 2018

Reland "Roll src/third_party/catapult 34451063bc29..2e625dcb82db (1 commits)"

This reverts commit 3161a1dfc1d7e801187a1cdcf60e63bc57b75d72.

Reason for revert: the roll wasn't causing the test failure. The test is also disabled in https://chromium-review.googlesource.com/c/chromium/src/+/1112946

Original change's description:
> Revert "Roll src/third_party/catapult 34451063bc29..2e625dcb82db (1 commits)"
> 
> This reverts commit bca5514b0a5cb6c8592b76a294483f6751c0ee75.
> 
> Reason for revert: TestShardingMapGenerator.testGeneratePerfSharding
> fails on many mac bots, e.g. here:
> https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Mac10.10%20Tests/33316
> 
> Error:
> [18/22] core.sharding_map_generator_unittest.TestShardingMapGenerator.testGeneratePerfSharding queued/b/s/w/ir/.swarming_module_cache/vpython/fe1f6b/bin/python: can't open file 'tools/perf/generate_perf_sharding': [Errno 2] No such file or directory
> 
> File is probably missing from isolate and needs to be in some gn data list.
> 
> Original change's description:
> > Roll src/third_party/catapult 34451063bc29..2e625dcb82db (1 commits)
> > 
> > https://chromium.googlesource.com/catapult.git/+log/34451063bc29..2e625dcb82db
> > 
> > 
> > git log 34451063bc29..2e625dcb82db --date=short --no-merges --format='%ad %ae %s'
> > 2018-06-23 nednguyen@google.com Disable MEMORY_AMOUNTS diagnostic
> > 
> > 
> > Created with:
> >   gclient setdep -r src/third_party/catapult@2e625dcb82db
> > 
> > The AutoRoll server is located here: https://catapult-roll.skia.org
> > 
> > Documentation for the AutoRoller is here:
> > https://skia.googlesource.com/buildbot/+/master/autoroll/README.md
> > 
> > If the roll is causing failures, please contact the current sheriff, who should
> > be CC'd on the roll, and stop the roller if necessary.
> > 
> > CQ_INCLUDE_TRYBOTS=luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
> > 
> > BUG= chromium:854676 
> > TBR=sullivan@chromium.org
> > NOTREECHECKS=true
> > 
> > Change-Id: If0ebad6d392890f8aaf4b2336dad28367d83973d
> > Reviewed-on: https://chromium-review.googlesource.com/1112138
> > Commit-Queue: Ned Nguyen <nednguyen@google.com>
> > Reviewed-by: catapult-chromium-autoroll <catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
> > Cr-Commit-Position: refs/heads/master@{#569878}
> 
> TBR=sullivan@chromium.org,nednguyen@google.com,catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com
> 
> Change-Id: I6e51636a5151616cc5ade6f3ddd4a556d05303fb
> No-Presubmit: true
> No-Tree-Checks: true
> No-Try: true
> Bug:  chromium:854676 
> Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
> Reviewed-on: https://chromium-review.googlesource.com/1112978
> Reviewed-by: Nico Weber <thakis@chromium.org>
> Commit-Queue: Nico Weber <thakis@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#569880}

TBR=thakis@chromium.org,sullivan@chromium.org,nednguyen@google.com,catapult-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com

Change-Id: I64954dacf7cc9ef68610f479f34ca9bf5eec7189
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug:  chromium:854676 
Cq-Include-Trybots: luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
Reviewed-on: https://chromium-review.googlesource.com/1112919
Reviewed-by: Ned Nguyen <nednguyen@google.com>
Commit-Queue: Ned Nguyen <nednguyen@google.com>
Cr-Commit-Position: refs/heads/master@{#569890}
[modify] https://crrev.com/fdfa48da09cf9ec14187c3ead2f5bfd128d9561b/DEPS

Status: Assigned (was: Available)
Status: Fixed (was: Assigned)
We ended up removing the diagnostic since it wasn't adding any value.
