Issue 621193

Starred by 1 user

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocking:
issue 545408




test-results returns 404 when requesting full_results.json that previously successfully uploaded

Project Member Reported by martiniss@chromium.org, Jun 17 2016

Issue description

Flaky tests are currently hard to deal with in SOM.

Currently, the best practice is probably to link a bug to the alert when you first see it. When the alert shows up again, it already has a bug linked to it, and you can see information about it over time.

Really though, we should link up to the test-results dashboard, and give you information about flakiness history for the bug, etc...

We should also have test-results tell us which tests are flaky, and have those show up as needing the sheriffs' attention as well.
 

Comment 1 by jpar...@google.com, Jun 17 2016

Cc: serg...@chromium.org
martiniss@, can you elaborate more on the last point? The flakiness pipeline work by sergiyb@ is already surfacing flaky tests to sheriffs in SOM ...?
I've heard feedback from android sheriffs that it's not as easy for them. I had forgotten about that, though. It does answer the last point I posted in comment 1. 

We should definitely use the same bug queue style surfacing of flaky tests that chromium has for android sheriffs; I'll need to hook that up. I'm not sure if they have a bug label that they use for sheriff bugs though.

For people who aren't familiar with this, chromium sheriffs have a bug queue they're responsible for triaging and fixing, and there is an automated system (built by sergiyb@) which files bugs to this queue for flaky tests it detects. 
Hm, I wonder if we don't have something hooked up properly on the Android side. Do you have an example of the flakiness surfacing?
and #2: are you referring to chromium-try-flakes?
Re test-results, I can add a link to test-results page to the flaky test page easily. However, note that not all test suites upload to test-results and we only detect specific failing tests if they do. I am not sure what Android tests do... can you please give me an example of a failing test on Android?
#5: I don't see any three-try failures at the moment, but an example Android test suite is here: http://test-results.appspot.com/dashboards/flakiness_dashboard.html#testType=chrome_public_test_apk

Comment 7 by jpar...@google.com, Jun 20 2016

jbudorick@: No, this isn't chromium-try-flakes, this is the flakiness pipeline sergiyb@ has been working on for a few months now.  If we can get it working for android tests, that would be ideal.

sergiyb@ is there a Design Doc/One Pager you can point clank folks at?
jparent: ah, ok, it wasn't clear from martiniss's comment. If there's a doc to look at for that pipeline (or an available example of bugs it has filed), that'd be great.

Comment 9 by jpar...@google.com, Jun 20 2016

Check out https://sheriff-o-matic.appspot.com/chromium right now.

The top section has "Bug queue (what to do with this?):". All the issues listed there were flakes detected by, and filed by, the bug queue. The "What to do" link will take you to the info on the triage process from there.
All three of the current bugs were filed by chromium-try-flakes...? I'm confused.
Created issue 621498 for adding flakiness dashboard link.
Example of a bug filed by the flakiness pipeline: https://bugs.chromium.org/p/chromium/issues/detail?id=621315. It is filed by the service account on the chromium-try-flakes app. The Flakiness Dashboard is hosted on test-results, which is a different app. The chromium-try-flakes app talks to test-results to get the list of failures for a given step, so if you want the flakiness pipeline to file bugs for chrome_public_test_apk, then someone will need to teach the test launcher that runs it to upload results to test-results in the standard JSON format: https://www.chromium.org/developers/the-json-test-results-format.

Unfortunately I do not have an up-to-date design doc, but I'll try to draw a diagram now and post it here.
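
For illustration only, here is a minimal Python sketch of how a client might pull unexpected per-test results out of a file in the JSON test results format linked above. It is not the chromium-try-flakes implementation. It assumes the nested "tests" trie with space-separated "expected" and "actual" strings on each leaf, and a "path_delimiter" field that falls back to "/". The helper names and the sample data are hypothetical.

```python
import json

# Hypothetical sample shaped like a full_results.json file (not real data).
SAMPLE = json.loads("""
{
  "version": 3,
  "path_delimiter": ".",
  "tests": {
    "org": {"chromium": {"FooTest": {
      "testA": {"expected": "PASS", "actual": "FAIL PASS"},
      "testB": {"expected": "PASS", "actual": "PASS"}
    }}}
  }
}
""")


def iter_tests(node, prefix, delimiter):
    """Yield (full_test_name, leaf) pairs from the nested "tests" trie."""
    for name, child in node.items():
        full_name = prefix + delimiter + name if prefix else name
        # Assumption: a leaf is a dict whose "actual" value is a string.
        if isinstance(child.get("actual"), str):
            yield full_name, child
        else:
            yield from iter_tests(child, full_name, delimiter)


def unexpected_results(results):
    """Map each test with at least one unexpected result to its list of runs."""
    delimiter = results.get("path_delimiter", "/")  # assumed default
    found = {}
    for name, leaf in iter_tests(results.get("tests", {}), "", delimiter):
        actual = leaf["actual"].split()
        expected = set(leaf.get("expected", "PASS").split())
        if any(result not in expected for result in actual):
            found[name] = actual
    return found


print(unexpected_results(SAMPLE))
```

A test whose "actual" field lists both a failure and a pass, like testA in the sample, is the kind of entry a flakiness tool would be interested in.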
I think we already do (at least for the two chromium.linux bots), based on both them showing up in test-results and on the URL that chromium-try-flakes loads, e.g. http://test-results.appspot.com/testfile?builder=Android%20Tests&name=full_results.json&master=chromium.linux&testtype=chrome_public_test_apk&buildnumber=27740
We appear to be uploading from chromium.android and tryserver.chromium.android (i.e., larng) as well, but those don't even show up in test-results. (maybe because neither waterfall is configured here: https://chromium.googlesource.com/infra/infra/+/master/appengine/test_results/appengine_module/test_results/handlers/master_config.py#10 ?)
These are results from the main waterfall, and we do not yet process results from the main waterfall, only from the CQ. Please track issue 587323 for progress on this front.
Sorry, that was a reply to #14. Re #15, I'm not very familiar with the test-results app, but I'd imagine adding the android tryserver to the config would not harm it :-).
adding chromium.android + tryserver.chromium.android to test-results here: https://codereview.chromium.org/2083623002/

Comment 19 by bugdroid1@chromium.org, Jun 20 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/infra.git/+/76fdb7373de8bb8fb401325aedee7f2d5c90f06a

commit 76fdb7373de8bb8fb401325aedee7f2d5c90f06a
Author: jbudorick <jbudorick@chromium.org>
Date: Mon Jun 20 16:02:34 2016

[test-results] Add {tryserver.,}chromium.android to the test_results configuration.

BUG=621193

Review-Url: https://codereview.chromium.org/2083623002

[modify] https://crrev.com/76fdb7373de8bb8fb401325aedee7f2d5c90f06a/appengine/test_results/appengine_module/test_results/handlers/master_config.py

I've deployed a new version with the changes.

Btw, I've searched for chrome_public_test_apk results and found quite a few, including ones for tryserver.chromium.android: https://test-results.appspot.com/testfile?testtype=chrome_public_test_apk%20(with%20patch).

Apparently the Flakiness Pipeline also registers flakes on chrome_public_test_apk, so I tried to trace one flake: https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/90094. It seems, however, that test-results returns 500 when requesting the JSON for this run: https://test-results.appspot.com/testfile?builder=linux_android_rel_ng&name=full_results.json&master=tryserver.chromium.android&testtype=chrome_public_test_apk%20(with%20patch)&buildnumber=90094.
Oh. Turns out in the last CL there was a typo. Fix: https://codereview.chromium.org/2080873002.
Blocking: 545408
Labels: -Pri-2 Pri-1
Owner: serg...@chromium.org
Status: Assigned (was: Available)
Summary: chromium-try-flakes fails to get individual tests for chrome_public_test_apk from test-results (was: Better support flaky tests in SOM)
Ok. Now that the endpoint is working, I was able to verify that the needed data for this test suite is on test-results. Looks like this is an actual bug in chromium-try-flakes. Thanks for reporting, I'll look into it.
Cc: estaab@chromium.org
After looking through the logs, I've found that the JSON file was uploaded from buildbot on 2016-06-20 at 08:48:26 UTC and requested by chromium-try-flakes at 09:25:01 UTC. Despite the upload being successful, the request returned 404. The two times are also far enough apart to rule out eventual consistency as the cause. Not sure what happened here. Here are the logs:

https://pantheon.corp.google.com/logs?project=test-results-hrd&minLogLevel=0&expandAll=false&resource=appengine.googleapis.com&logName=&advancedFilter=metadata.serviceName%3D%22appengine.googleapis.com%22%0Alog%3D%22appengine.googleapis.com%2Frequest_log%22%0A%2290094%22%0A(%22chrome_public_test_apk%20(with%20patch)%22%20OR%20%22chrome_public_test_apk%2520(with%2520patch)%22)%0A%22linux_android_rel_ng%22%0A%22tryserver.chromium.android%22%0A&lastVisibleTimestampNanos=1466412506414690000 (set Just To Date field to 2016-06-20 09:30 UTC).

Erik, do you have an idea what could have happened here?
There was a warning during upload, but it was about updating incremental JSON for an unknown master, which would be fixed by John's CL. Also, this warning was reported after the success log about storage of the full_results.json file.
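
To make the timing argument concrete, the gap between the two timestamps quoted above can be computed directly. This is only a worked calculation on the times given in the comment; the variable names are illustrative.

```python
from datetime import datetime, timezone

# Timestamps quoted in the comment above (UTC).
uploaded = datetime(2016, 6, 20, 8, 48, 26, tzinfo=timezone.utc)
requested = datetime(2016, 6, 20, 9, 25, 1, tzinfo=timezone.utc)

gap = requested - uploaded
minutes, seconds = divmod(int(gap.total_seconds()), 60)
print(f"{minutes} min {seconds} s between upload and request")
```

That is a gap of 36 minutes and 35 seconds between the successful upload and the request that returned 404.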
Owner: estaab@chromium.org
Since this is a test-results app issue and not chromium-try-flakes, assigning to Erik.
I have a CL out (https://chromium-review.googlesource.com/c/354202/) which makes test results show up for android builders. I've tested it locally, and it works, and I've manually confirmed the data for android builders is on the server. So yay!
#21/22: thanks for fixing that, Sergiy.
Summary: test-results returns 404 after uploading full_results.json was successful (was: chromium-try-flakes fails to get individual tests for chrome_public_test_apk from test-results)
Summary: test-results returns 404 when requesting full_results.json that previously successfully uploaded (was: test-results returns 404 after uploading full_results.json was successful)
Components: Infra>Flakiness>Pipeline
Components: Infra>Sheriffing>SheriffOMatic
Components: -Infra>Sheriffing
Labels: Milestone-SoMNGFollowUp
Components: -Infra>Flakiness>Pipeline Infra>Flakiness>Dashboard
The flakiness dashboard has shown nothing for me recently (perhaps since 1 or 2 days ago). Is this because of this bug, or a new bug?
#36 seems unrelated. Filed bug 632484.
Labels: -Milestone-SoMNGFollowUp Milestone-Workflow
Ping - please provide an update to your high priority bug. This bug is stale. Is it really P-1?
Components: -Infra>Sheriffing>SheriffOMatic
