Issue 728139

Issue metadata

Status: Untriaged
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: ----

Blocked on:
issue 695061
issue 710897



Develop user stories and improve Flakiness Surface UI

Project Member Reported by serg...@chromium.org, May 31 2017

Issue description

Collect feedback from team leads and developers and develop user stories based on that feedback. This will drive the improvements to the UI as discussed in the project review.

Some action items (AIs) from the project review:
 - include relative ratios for flakiness compared to 7 days ago
 - investigate reducing the window used for averaging flakiness to 1 hour
   - devs will see progress soon after fixing the flake
   - this will reduce the precision of the flakiness estimate, but we should still be able to estimate flakiness to about 1% precision
   - some tests may have too few runs in the last hour, so we may need to change the window dynamically and somehow display it in the UI
   - alternatively, we can average over a window based on the number of runs
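
The windowing ideas above could be sketched roughly as follows. This is only an illustration, not how Flakiness Surface computes anything: the `Run` record, the function names, the window-doubling strategy and the default thresholds are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import sqrt
from typing import List, Optional, Tuple


@dataclass
class Run:
    """Hypothetical record of one test run; `flaky` marks a flaky result."""
    time: datetime
    flaky: bool


def flakiness_in_window(
    runs: List[Run],
    end: datetime,
    min_runs: int = 100,                        # assumed threshold
    start_window: timedelta = timedelta(hours=1),
    max_window: timedelta = timedelta(days=7),  # assumed upper bound
) -> Tuple[Optional[float], timedelta]:
    """Return (flaky fraction, window used).

    Starts with a 1-hour window and doubles it until it holds at least
    `min_runs` runs or reaches `max_window`. The returned window is what
    the UI would need to display.
    """
    window = start_window
    while True:
        selected = [r for r in runs if end - window < r.time <= end]
        if len(selected) >= min_runs or window >= max_window:
            break
        window = min(window * 2, max_window)
    if not selected:
        return None, window
    return sum(r.flaky for r in selected) / len(selected), window


def relative_ratio(current: Optional[float], week_ago: Optional[float]) -> Optional[float]:
    """Flakiness now divided by flakiness 7 days ago; None if undefined."""
    if current is None or not week_ago:
        return None
    return current / week_ago


def standard_error(p: float, n: int) -> float:
    """Binomial standard error of a flaky fraction p measured over n runs."""
    return sqrt(p * (1 - p) / n)
```

The standard error gives a rough sense of how many runs a window needs before the estimate is precise enough, which is the trade-off behind shrinking the window to 1 hour.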
 
Status: Started (was: Assigned)
Sent follow-up emails to leads who replied that a problem with Flakiness Surface is blocking them from using it. I've also demoed Flakiness Surface at a regular "Demo Days" meeting in SF. The feedback was very positive, but most people did not have any concrete improvement suggestions since they had not used it yet.
Feedback from jochen@:

 - provide a way to see when the test started/stopped being flaky, where "when" may mean a revision, bot name, OS version or any other metadata
   - FindIt provides some of this information already, therefore adding its results to the Unified Flakiness Page will help
   - Adding dimensions to the Test Results will allow users to write custom queries in Dremel
 - need a way to have a look at a specific subset of tests
   - educate users to use substring search in omnibox on test-results.appspot.com/flakiness
 - add search by component to the omnibox on https://test-results.appspot.com/flakiness
 - feature request: subscribe to a group of tests (same criteria as used in omnibox on https://test-results.appspot.com/flakiness)
   - notify when test has become flaky
   - notify when test has been disabled
Received other feedback as well:
 - falken@ noticed that mapping is out of date
   - investigation revealed that our OWNERS extraction pipeline is broken, created issue 729037
 - people need more raw data to investigate flakes
   - this maps well into Unified Flakiness Page project
 - some people wanted better tools to investigate reliable failures
   - wrote a Dremel script to provide some help with that
   - this is out of scope of the Flakiness Project
 - received feedback about Flakiness Dashboard being slow
   - explained that it's a known issue, but that we do not have resources to improve it
 - filed issue 729035 for jochen@'s feature request for flakiness alerts
Created issue 731140 to track request for adding search by component on https://test-results.appspot.com/flakiness.
Cc: estaab@chromium.org
Team leader user story:

  1. Team leader opens chromiumdash.chromium.org.
  2. They open the "Teams" page and select their team.
  3. The graphs for the team show current health of the system, including the number of flaky tests and number of disabled tests.
  4. The team lead has an option to load the same graphs at a more granular level: component, directory or test suite owned by the team.
  5. They can then discuss the health with the team and create an OKR to drive down the number of flaky/disabled tests.

Developer user story:

  1. Developer needs to reduce the number of flaky tests and opens test-results.chromium.org.
  2. They enter a team, component, directory or a test suite name into the omnibox.
  3. The page loads a list of tests sorted by the number of flaky build failures on CQ caused by each test.
  4. Developer has an option to change the sorting column to test name or estimated flakiness (both desc and asc).
  5. Developer picks a test to work on and clicks on the row.
  6. Page loads and shows two lists: recent flaky build failures on CQ, and runs where the test passed after a few failed runs. Each list contains entries with links to builds and their respective logs.
  7. Developer may click on the links to builds and logs to examine them and fix the flaky test.
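
Steps 2 to 4 of the developer story could be sketched as below. This is a hypothetical illustration only: `FlakyTestRow`, its fields, and both helper functions are invented for the example, and the filter models just the substring search on test names rather than lookup by team, component or directory.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FlakyTestRow:
    """Hypothetical row in the list of flaky tests."""
    name: str
    flaky_cq_failures: int       # flaky build failures on CQ caused by this test
    estimated_flakiness: float   # fraction of runs estimated to be flaky


SORTABLE_COLUMNS = ("name", "flaky_cq_failures", "estimated_flakiness")


def filter_rows(rows: List[FlakyTestRow], query: str) -> List[FlakyTestRow]:
    """Keep rows whose test name contains `query` (case-insensitive)."""
    needle = query.lower()
    return [r for r in rows if needle in r.name.lower()]


def sort_rows(
    rows: List[FlakyTestRow],
    column: str = "flaky_cq_failures",
    descending: bool = True,
) -> List[FlakyTestRow]:
    """Sort by the chosen column; the default matches step 3 of the story."""
    if column not in SORTABLE_COLUMNS:
        raise ValueError(f"cannot sort by {column!r}")
    return sorted(rows, key=lambda r: getattr(r, column), reverse=descending)
```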

Erik, do these sound good to you? Do you think it makes sense to share this with some developers and/or team leads and gather their feedback?

Comment 6 by estaab@chromium.org, Jun 21 2017

Yeah, these seem reasonable. Is reporting/alerting separate from this? It might be useful for team leads to know if things have gotten better or worse over a period (e.g. a monthly report).
Alerting/reporting is not on my agenda at this point. That's what chromium-try-flakes is for and it's already working. My focus is on creating tools for triaging/monitoring flakiness and I expect teams to learn to visit this page regularly via Chromium Dash. If necessary, we can add reporting/alerting later, but not yet.
Blockedon: 710897 695061
Status: Assigned (was: Started)
This is a tracking bug and should be blocked on all work needed to implement those user stories. I am not working on it actively, so moving to Assigned status.

Another blocker is adding Teams view to Chromium Dash. Design Doc is here: https://docs.google.com/document/d/1CwvhWfOIPYjGZImqYeBJSW7tgQzYZEmrd3k08SizpbI/edit (sorry, internal only). I am not sure if there is a tracking bug yet or who will be working on this.
Labels: Pri-3
Reducing priority for Flakiness Surface work.
Cc: seanmccullough@chromium.org
Owner: ----
Status: Available (was: Assigned)
Removing ownership as I'm transitioning to another team and Sean is taking over the flakiness effort.
Project Member

Comment 11 by sheriffbot@chromium.org, Aug 13

Labels: Hotlist-Recharge-Cold
Status: Untriaged (was: Available)
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue.

Sorry for the inconvenience if the bug really should have been left as Available.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
