
Issue 923564

Starred by 2 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug



Plan for gtest_perf_tests conversion to Histograms

Project Member Reported by crouleau@chromium.org, Jan 18 (4 days ago)

Issue description

gtest_perf_tests have been using the chromeperf dashboard's old /add_point API, which supports IP whitelisting. With the switch to LUCI, we want to discourage IP whitelisting in favor of LOAS. The downside is that the API that supports LOAS requires a new data format called histograms.

Brian attempted to do a conversion to histograms and wrote some useful scripts. However, this ended up breaking some of the tests.

See https://bugs.chromium.org/p/chromium/issues/detail?id=918601#c16
and https://bugs.chromium.org/p/chromium/issues/detail?id=921342
and https://bugs.chromium.org/p/chromium/issues/detail?id=923536

It looks like we need to do at least the following for each of the gtest_perf_tests:

"
1. Update the conversion script to not automatically append _smallerIsBetter if the unit already ends with _smallerIsBetter or _biggerIsBetter, and instead append whatever the existing suffix is: https://cs.chromium.org/chromium/src/third_party/catapult/tracing/tracing/value/gtest_json_converter.py?l=97

2. Update your metric name to be clockless_playback_biggerIsBetter, which should then be converted to unitless_biggerIsBetter.
"

We may also need to update the allowed units for histograms.

We need a real owner for this. Brian was simply trying to help out, but I think it's too much to ask him to convert all the gtest_perf_tests given that it's not his team's responsibility.
 

Comment 1 by crouleau@chromium.org, Jan 18 (4 days ago)

Blockedon: 921342

Comment 2 by crouleau@chromium.org, Jan 18 (4 days ago)

Blockedon: 923536 918601

Comment 3 by crouleau@chromium.org, Jan 18 (4 days ago)

Cc: ushesh@chromium.org

Comment 4 by crouleau@chromium.org, Jan 18 (4 days ago)

Cc: jbudorick@chromium.org

Comment 5 by crouleau@chromium.org, Jan 18 (4 days ago)

Blocking: 919503

Comment 6 by crouleau@chromium.org, Jan 18 (4 days ago)

Owner: benjhayden@chromium.org
Status: Assigned (was: Untriaged)
As a straw man, I propose one of the following:

1. Speed Dashboard team migrates everyone to histograms

or 

2. Speed Dashboard team starts supporting /add_point with LOAS

Comment 7 by crouleau@chromium.org, Jan 18 (4 days ago)

Owner: bsheedy@chromium.org
I spoke with Brian and we came up with a plan:

1. Brian fixes the conversion script gtest_json_converter.py to not automatically append _smallerIsBetter if the unit already ends with _smallerIsBetter or _biggerIsBetter. He also fixes the issue where, for https://chrome-isolated.appspot.com/browse?namespace=default-gzip&hash=26ea4e13d253cc91b55701074e678a3e8f14a86c, the units for playout_resolution get set to ms even though playout_resolution should have units "lines", per this line from https://chrome-swarming.appspot.com/task?id=427b546cbc0c1210&refresh=10&show_raw=1 (see the parsing sketch after this list):

*RESULT CastV2Performance_gpu_30fps_bad_autothrottling: playout_resolution= {890.000000,0.000000} lines


2. Caleb figures out all the places where the gtest_perf_tests don't have _biggerIsBetter or _smallerIsBetter on their units and reaches out to the owners to get them to add that suffix.


3. After waiting a couple of months for owners to start using the suffix, we do the switch to histograms. Owners who haven't migrated will start getting alerts with the improvement direction inverted, but hopefully we will have migrated all the important owners by then.
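As an aside on the playout_resolution units problem in step 1, here is a rough sketch of scraping the *RESULT line quoted above. The regex is illustrative only; the real log-scraping code may use a different pattern:

import re

RESULT_RE = re.compile(
    r'\*RESULT (?P<graph>[^:]+): (?P<trace>[^=]+)= '
    r'\{(?P<mean>[^,]+),(?P<std>[^}]+)\} (?P<units>\S+)')

line = ('*RESULT CastV2Performance_gpu_30fps_bad_autothrottling: '
        'playout_resolution= {890.000000,0.000000} lines')
match = RESULT_RE.match(line)
assert match.group('units') == 'lines'  # the unit is "lines", not ms

Whatever the converter does downstream, the unit it should carry forward for playout_resolution is "lines".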


Comment 8 by crouleau@chromium.org, Jan 18 (4 days ago)

Labels: LUCI-Chromium

Comment 9 by crouleau@chromium.org, Jan 19 (4 days ago)

Spoke with Yuri and Brian again. 

We're wondering if gtest_perf_tests can output histograms natively. "Histograms" are the same thing that UMA uses, right? If so, could we reuse that implementation so the gtests do the same thing?

We will schedule a meeting to discuss with Ben.

Brian will explain the exact data format issue. TL;DR: we think the majority of gtest_perf_tests are doing this wrong. Useful reference: https://bugs.chromium.org/p/chromium/issues/detail?id=900677#c10

Comment 10 by bsheedy@chromium.org, Jan 19 (4 days ago)

TL;DR: I think we'll also have to change the functions used to print the perf results, since the way they're currently implemented (or at least the way they're currently used) makes the results on the dashboard appear different from results from other sources (e.g. Telemetry tests). Ideally, we'd remove the log-scraping conversion entirely and output data directly in JSON.

Data on the dashboard is generally drilled down in the following order:

Waterfall/Master -> Builder/Bot -> Benchmark/Test Suite -> Metric -> Story/Specific Test Case

E.g. this memory Telemetry benchmark https://chromeperf.appspot.com/report?sid=35abc411e74356b766f45827cfd0a87377f3a0d21d4a5dbfa53e993d24a677c1

Most gtest perf tests, however, report results in the following order:

Waterfall/Master -> Builder/Bot -> Benchmark/Test Suite -> Story/Specific Test Case -> Metric

E.g. this Angle test https://chromeperf.appspot.com/report?sid=16dfb0a1f500dc5c0ffed7ecd42a7b89c2aa7c8992434284c8111119fc087324

This is caused by the uploaded data being in the wrong format. Gtest perf data is supposed to be in the following format:

{
  'metric1': {
    'units': 'unit1',
    'traces': {
      'story1': ['mean', 'std_dev'],
      'story2': ['mean', 'std_dev'],
    },
    'important': ['story1', 'story2'],
  },
  'metric2': {
    'units': 'unit2',
    'traces': {
      'story1': ['mean', 'std_dev'],
      'story2': ['mean', 'std_dev'],
    },
    'important': ['story1', 'story2'],
  },
  ...
}

But most tests have the metric and story swapped. This causes the data display issues mentioned above but, more critically, makes it so that each story can only have one unit type. I believe this is also what was causing the weird halving of data when converting from gtest JSON to histograms, as the conversion script expects the metric -> story format, not the story -> metric format.
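For illustration, the swapped shape that most gtest perf tests actually emit looks something like this (same placeholder names as above):

{
  'story1': {
    'units': 'unit1',
    'traces': {
      'metric1': ['mean', 'std_dev'],
      'metric2': ['mean', 'std_dev'],
    },
  },
  ...
}

Because 'units' sits at the top level of each entry, a story in this shape can only ever carry a single unit type.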

This appears to be caused by user error when calling the functions to print data during a test. At some point, people accidentally swapped metric and story, and the swapped usage became the norm, I assume because it was copied as an example for later benchmarks.

For example, look at https://cs.chromium.org/chromium/src/cc/paint/paint_op_perftest.cc?q=perf_test::PrintResult&sq=package:chromium&dr=C&l=70.

PrintResult takes the arguments measurement, modifier, trace, value, units, and important, in that order (https://cs.chromium.org/chromium/src/testing/perf/perf_test.cc?gsn=PrintResult&l=94), where measurement + modifier = metric and trace = story. However, in the linked example, the story name is passed for measurement and "serialize" is passed for trace.

The easiest but hacky solution would be to swap the order of measurement and trace in the function definition. That would fix the issue for every test assuming all of them use the wrong ordering, but we have no way to verify that that's the case.

The idea we're leaning towards right now is rewriting the result printing as a class instead of a bunch of static functions, so that it can't be misused in this way.
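The actual printing code is C++ (testing/perf/perf_test.cc), but the shape of the idea, sketched here in Python with hypothetical names, is to bind the story once at construction so it can't be passed in the metric position:

class PerfResultReporter:
  """Hypothetical sketch; the real class would live in C++."""

  def __init__(self, metric_basename, story):
    self._metric_basename = metric_basename
    self._story = story

  def AddResult(self, metric_suffix, value, units):
    # The story is fixed at construction, so callers can no longer
    # accidentally swap it with the metric.
    print('*RESULT %s%s: %s= %s %s' % (
        self._metric_basename, metric_suffix, self._story, value, units))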

Comment 11 by crouleau@chromium.org, Jan 19 (4 days ago)

Blockedon: 923616

Comment 12 by benjhayden@google.com, Jan 19 (3 days ago)

> We're wondering if gtest_perf_tests can output histograms natively. "histograms" are the same thing that UMA uses, right? So then we can use that implementation for the gtests to do the same thing?

Not quite. UMA histograms work a bit differently from catapult histograms. dproy is porting catapult histograms to C++ for TBMv3.
Looking forward to chatting on Tuesday.
