New issue
Advanced search Search tips

Issue 799079 link

Starred by 1 user

Issue metadata

Status: Started
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug



Sign in to add a comment

test result data missing for perf alerts

Project Member Reported by seanmccullough@chromium.org, Jan 4 2018

Issue description

Should check logs to see if we got some errors requesting data from test-results.appspot.com while generating these alerts.
Seeing lots of error messages like this in the chromium.perf analyzer logs:

name != f.Step.Name: "speedometer2-future.reference" vs "speedometer2-future.reference on ATI GPU on Windows on Windows-2008ServerR2-SP1"

This happens in the getTestNames func: https://cs.chromium.org/chromium/infra/go/src/infra/appengine/sheriff-o-matic/som/analyzer/step/test_step.go?l=215 so it looks like it's getting confused by test names that include configuration information (which get "normalized" using some heuristics that may be wrong in these cases).
Cc: -seanmccullough@chromium.org
Owner: seanmccullough@chromium.org
Status: Assigned (was: Available)
For the specific step and builder in the screenshots:

"couldn't get test results history for "" "chromium.perf" "Android Nexus7v2 Perf" "v8.browsing_mobile-future on Android" :: {"error":"json: cannot unmarshal number 1515060931.119271 into Go struct field BuilderTestHistory.secondsSinceEpoch of type int"}"

I see a trivial fix for this error, but trying it out locally doesn't make alerts show test results. I'll submit a fix for this regardless.
Project Member

Comment 4 by bugdroid1@chromium.org, Jan 5 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/infra/+/1da28cfeaed402105db1624352832ac0300e2244

commit 1da28cfeaed402105db1624352832ac0300e2244
Author: Sean McCullough <seanmccullough@chromium.org>
Date: Fri Jan 05 22:45:33 2018

[som] Fix test result parsing for secondsSinceEpoch.

Bug: 799079
Change-Id: I651b589fb984da5cc96f34841a2d04ac33c4927d
Reviewed-on: https://chromium-review.googlesource.com/852523
Reviewed-by: Tiffany Zhang <zhangtiff@chromium.org>
Commit-Queue: Sean McCullough <seanmccullough@chromium.org>

[modify] https://crrev.com/1da28cfeaed402105db1624352832ac0300e2244/go/src/infra/appengine/sheriff-o-matic/som/client/testresults.go

Labels: Milestone-Perf
Status: Started (was: Assigned)
Project Member

Comment 6 by bugdroid1@chromium.org, Jan 9 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/infra/+/462e3ac67ab4549da9425e53e0ee2bb58b842f68

commit 462e3ac67ab4549da9425e53e0ee2bb58b842f68
Author: Sean McCullough <seanmccullough@chromium.org>
Date: Tue Jan 09 19:41:25 2018

[som] Get rid of messages.TestResults and custom parsing logic.

This switches over to just use the same model classes from the
test-results server code.  Also deletes some now un-used bits.

I love fixing bugs by deleting code :D

Bug: 799079
Change-Id: I86a7baa2c0ce91ef78cb6979346614036a91fb46
Reviewed-on: https://chromium-review.googlesource.com/855598
Reviewed-by: Tiffany Zhang <zhangtiff@chromium.org>
Commit-Queue: Sean McCullough <seanmccullough@chromium.org>

[delete] https://crrev.com/69e7ba262c6bfb135b5dd39e35cf088d196dd0db/go/src/infra/monitoring/messages/testresults.go
[modify] https://crrev.com/462e3ac67ab4549da9425e53e0ee2bb58b842f68/go/src/infra/appengine/sheriff-o-matic/som/client/testresults.go
[modify] https://crrev.com/462e3ac67ab4549da9425e53e0ee2bb58b842f68/go/src/infra/appengine/sheriff-o-matic/som/analyzer/analyzer_test.go
[modify] https://crrev.com/462e3ac67ab4549da9425e53e0ee2bb58b842f68/go/src/infra/appengine/sheriff-o-matic/som/analyzer/step/test_step_test.go
[modify] https://crrev.com/462e3ac67ab4549da9425e53e0ee2bb58b842f68/go/src/infra/appengine/sheriff-o-matic/som/client/test/test.go
[modify] https://crrev.com/462e3ac67ab4549da9425e53e0ee2bb58b842f68/go/src/infra/appengine/sheriff-o-matic/som/analyzer/step/test_step.go

Anecdotally, after the above fix has been in prod for a bit: I've noticed some of the alerts that still do not have test result data/links usually have a build *in progress* (after the alert was generated) that have reported the previously failing tests as passing. The test goes green between the time that we generate the alert and the time that we check for test results for the most recent build.

This means that if the analyzer just happens to scan the builder when it's most recent completed build reported a test failure, but the currently running build reports a passing result, we end up with a test step failure that tests-results doesn't have any failures to report for.

I think one heuristic we could implement for perf would be to require tests to fail at least two builds in a row before we consider them alertable. That's kind of a blunt instrument but I think it might decrease the noise.

The analyzer logic already looks at in-progress builds, but there may be some lag between what appears in milo and what has reported results to test-results.appspot.com.

Sign in to add a comment