test result data missing for perf alerts |
|||
Issue descriptionlooking at nexus 7v2: https://lh3.googleusercontent.com/-0gK-RBsnfcs/Wk5d14I2i_I/AAAAAAAAB2I/LCuWCA_ievkk663iFOMCxVzCgJXHMS_dgCL0BGAYYCw/h992/2018-01-04.png consistent failures, including one in the last run here's what it looks like in sheriff-o-matic https://lh3.googleusercontent.com/-XriK00RrDAM/Wk5d84SO7YI/AAAAAAAAB2M/veilfJagb20zZW8S4fCoYhJnNoN317bkwCL0BGAYYCw/h602/2018-01-04.png (no test result data available)
,
Jan 4 2018
Seeing lots of error messages like this in the chromium.perf analyzer logs: name != f.Step.Name: "speedometer2-future.reference" vs "speedometer2-future.reference on ATI GPU on Windows on Windows-2008ServerR2-SP1" This happens in the getTestNames func: https://cs.chromium.org/chromium/infra/go/src/infra/appengine/sheriff-o-matic/som/analyzer/step/test_step.go?l=215 so it looks like it's getting confused by test names that include configuration information (which get "normalized" using some heuristics that may be wrong in these cases).
,
Jan 5 2018
For the specific step and builder in the screenshots:
"couldn't get test results history for "" "chromium.perf" "Android Nexus7v2 Perf" "v8.browsing_mobile-future on Android" :: {"error":"json: cannot unmarshal number 1515060931.119271 into Go struct field BuilderTestHistory.secondsSinceEpoch of type int"}"
I see a trivial fix for this error, but trying it out locally doesn't make alerts show test results. I'll submit a fix for this regardless.
,
Jan 5 2018
The following revision refers to this bug: https://chromium.googlesource.com/infra/infra/+/1da28cfeaed402105db1624352832ac0300e2244 commit 1da28cfeaed402105db1624352832ac0300e2244 Author: Sean McCullough <seanmccullough@chromium.org> Date: Fri Jan 05 22:45:33 2018 [som] Fix test result parsing for secondsSinceEpoch. Bug: 799079 Change-Id: I651b589fb984da5cc96f34841a2d04ac33c4927d Reviewed-on: https://chromium-review.googlesource.com/852523 Reviewed-by: Tiffany Zhang <zhangtiff@chromium.org> Commit-Queue: Sean McCullough <seanmccullough@chromium.org> [modify] https://crrev.com/1da28cfeaed402105db1624352832ac0300e2244/go/src/infra/appengine/sheriff-o-matic/som/client/testresults.go
,
Jan 9 2018
,
Jan 9 2018
The following revision refers to this bug: https://chromium.googlesource.com/infra/infra/+/462e3ac67ab4549da9425e53e0ee2bb58b842f68 commit 462e3ac67ab4549da9425e53e0ee2bb58b842f68 Author: Sean McCullough <seanmccullough@chromium.org> Date: Tue Jan 09 19:41:25 2018 [som] Get rid of messages.TestResults and custom parsing logic. This switches over to just use the same model classes from the test-results server code. Also deletes some now un-used bits. I love fixing bugs by deleting code :D Bug: 799079 Change-Id: I86a7baa2c0ce91ef78cb6979346614036a91fb46 Reviewed-on: https://chromium-review.googlesource.com/855598 Reviewed-by: Tiffany Zhang <zhangtiff@chromium.org> Commit-Queue: Sean McCullough <seanmccullough@chromium.org> [delete] https://crrev.com/69e7ba262c6bfb135b5dd39e35cf088d196dd0db/go/src/infra/monitoring/messages/testresults.go [modify] https://crrev.com/462e3ac67ab4549da9425e53e0ee2bb58b842f68/go/src/infra/appengine/sheriff-o-matic/som/client/testresults.go [modify] https://crrev.com/462e3ac67ab4549da9425e53e0ee2bb58b842f68/go/src/infra/appengine/sheriff-o-matic/som/analyzer/analyzer_test.go [modify] https://crrev.com/462e3ac67ab4549da9425e53e0ee2bb58b842f68/go/src/infra/appengine/sheriff-o-matic/som/analyzer/step/test_step_test.go [modify] https://crrev.com/462e3ac67ab4549da9425e53e0ee2bb58b842f68/go/src/infra/appengine/sheriff-o-matic/som/client/test/test.go [modify] https://crrev.com/462e3ac67ab4549da9425e53e0ee2bb58b842f68/go/src/infra/appengine/sheriff-o-matic/som/analyzer/step/test_step.go
,
Jan 12 2018
Anecdotally, after the above fix has been in prod for a bit: I've noticed some of the alerts that still do not have test result data/links usually have a build *in progress* (after the alert was generated) that have reported the previously failing tests as passing. The test goes green between the time that we generate the alert and the time that we check for test results for the most recent build. This means that if the analyzer just happens to scan the builder when it's most recent completed build reported a test failure, but the currently running build reports a passing result, we end up with a test step failure that tests-results doesn't have any failures to report for. I think one heuristic we could implement for perf would be to require tests to fail at least two builds in a row before we consider them alertable. That's kind of a blunt instrument but I think it might decrease the noise. The analyzer logic already looks at in-progress builds, but there may be some lag between what appears in milo and what has reported results to test-results.appspot.com. |
|||
►
Sign in to add a comment |
|||
Comment 1 by seanmccullough@chromium.org
, Jan 4 2018