
Issue metadata

Status: Fixed
Owner:
Closed: Feb 15
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 1
Type: ----




Issue 853248: Android test failure not showing up on Sheriff-o-Matic

Reported by huayinz@chromium.org, Jun 15 2018 Project Member

Issue description

Recent example failure: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/KitKat%20Tablet%20Tester/485

When looking at Sheriff-o-Matic, the test failure doesn't show up.
 

Comment 1 by huayinz@chromium.org, Jun 15 2018

Labels: sheriff-android

Comment 2 by zhangtiff@chromium.org, Jun 16 2018

Labels: Milestone-Data

Comment 3 by twelling...@chromium.org, Jun 25 2018

This bot is still red and is still not showing up in sheriff-o-matic. In fact, sheriff-o-matic currently shows no failing bots at all, while the waterfall shows quite a few red ones.

Any updates here?

Comment 4 by perezju@chromium.org, Jun 27 2018

Cc: jbudorick@chromium.org
Indeed, the bot remains red and is still not showing up. Is this tablet bot expected to be excluded from SoM? I couldn't find it in:
https://cs.chromium.org/chromium/build/scripts/slave/gatekeeper.json

+jbudorick FYI

Comment 5 by perezju@chromium.org, Aug 10

Components: Infra>Client>Chrome
All of the "tablet" testers are now purple (they appear to be timing out after 3 hours):

https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/KitKat%20Tablet%20Tester
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Lollipop%20Tablet%20Tester
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Marshmallow%20Tablet%20Tester

And indeed they do not show up on SoM. Is this intended?

Comment 6 by martiniss@chromium.org, Aug 10

Cc: bpastene@chromium.org
SoM team can look at why it's not showing up on SoM. Might be that internal failures somehow get filtered out?

The bots are having issues because the tasks are pending for much too long. Example build: https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Marshmallow%20Tablet%20Tester/1006. In that build, https://chromium-swarm.appspot.com/task?id=3f3d547c32492610&refresh=10&show_raw=1 is a chrome_public_apk task with a pending time of 2 hours. The overall build timeout is 3 hours, so the build times out before all the tasks are done. Probably a capacity issue?
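
The timing arithmetic above can be sketched as follows. This is only an illustration: the function name is hypothetical, and it assumes, as the comment describes, that time spent pending counts against the overall build timeout. It does not use any LUCI or Swarming API.

```python
from datetime import timedelta


def remaining_run_budget(build_timeout, pending_time):
    """Time left for a task to run once it has stopped pending.

    Assumes that pending time counts against the overall build
    timeout. Never returns a negative value.
    """
    return max(build_timeout - pending_time, timedelta(0))


# Figures from the comment above: 3 hour timeout, 2 hours pending.
budget = remaining_run_budget(timedelta(hours=3), timedelta(hours=2))
print(budget)  # 1:00:00
```

With 2 of the 3 hours spent pending, a task has at most 1 hour left to run, so any task that needs longer pushes the build past its timeout.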

Comment 7 by martiniss@chromium.org, Aug 11

Status: Available (was: Untriaged)
The problem we have right now is the LUCI swarming tasks are hitting a 3 hour execution timeout. That can be remedied fairly easily. We are still out of capacity though. I'll make a CL to bump the timeout for the tablet builders.

Comment 8 by martiniss@chromium.org, Aug 11

Status: Assigned (was: Available)
Assigning to sean to figure out what's happening with it not showing up on sheriff-o-matic.

Comment 9 by bugdroid1@chromium.org, Aug 11

Project Member
The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/711748ea30b592183f2d37f54091dda9f4fee9c2

commit 711748ea30b592183f2d37f54091dda9f4fee9c2
Author: Stephen Martinis <martiniss@chromium.org>
Date: Sat Aug 11 01:49:40 2018

Bump tablet tester execution timeout

They seem to need longer timeout for now. Hopefully will be temporary.

TBR=bpastene

Bug:  853248 
Change-Id: I6665013085631c69e6d78fe1f43f195e12d3295c
Reviewed-on: https://chromium-review.googlesource.com/1171909
Reviewed-by: Stephen Martinis <martiniss@chromium.org>
Commit-Queue: Stephen Martinis <martiniss@chromium.org>
Cr-Commit-Position: refs/heads/master@{#582420}
[modify] https://crrev.com/711748ea30b592183f2d37f54091dda9f4fee9c2/infra/config/global/cr-buildbucket.cfg

Comment 10 by perezju@chromium.org, Aug 13

Owner: seanmccullough@chromium.org

Comment 11 by dewittj@chromium.org, Aug 22

ping, is this still happening, and is the owner working on this issue?

Comment 12 by perezju@chromium.org, Sep 13

All tablet testers listed in #5 are still failing.

Do we care about those bots? Should they be removed?

Comment 13 by twelling...@chromium.org, Sep 13

I care about these bots. We have tablet-only code and tests that need to run somewhere and be monitored.

Comment 14 by twelling...@chromium.org, Oct 11

It looks like the K, L, and M tablets are all still failing consistently. I'd love to get these bots greened up, which I think is likely going to require getting them back on sheriff-o-matic so sheriffs are actually looking at the failures and filing bugs.

Did we ever get to the bottom of why these bots aren't appearing?

Comment 15 by perezju@chromium.org, Nov 1

Labels: -Pri-2 Pri-1
All still very red, and still not showing up on SoM.

SoM team, any clue why that is?

Raising Pri to get some attention to this, as it has been going like this for quite a while.

Comment 16 by agrieve@google.com, Nov 20

Ping

Comment 17 by seanmccullough@chromium.org, Nov 20

Cc: hinoka@chromium.org martiniss@chromium.org
No mention of these builders in the analyzer logs, so I don't think SoM even checks them.

I don't see any of these builders (or "luci.chromium.ci" master, for that matter) mentioned in https://cs.chromium.org/chromium/build/scripts/slave/gatekeeper_trees.json?l=14

Are these builders part of a tree that's supposed to be handled by gatekeeper? That's the config file that SoM gets its "Tree" definitions from.

martiniss@, hinoka@ do you know if luci.chromium.ci builders are supposed to show up in GetCompressedMasterJSON responses from Milo?

Comment 18 by hinoka@chromium.org, Nov 20

Yes it should.

Using this: https://paste.googleplex.com/6151095676043264
I see:
$ ./get_master.py chromium.android
...
KitKat Tablet Tester [12127, 12126, 12125, 12124, 12123, 12122]
...

It looks like the build number wasn't incremented when it was flipped (luci is at 1840, buildbot was at 12127), so Milo keeps returning the buildbot results.  I incremented the build number here:
https://apis-explorer.appspot.com/apis-explorer/?base=https://cr-buildbucket.appspot.com/_ah/api#p/swarmbucket/v1/swarmbucket.set_next_build_number
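
A toy model of the shadowing described above. It assumes (as a simplification, not a description of Milo's actual code) that builds from both systems are merged for a builder and that the highest build numbers win. The `latest_builds` helper and every sample number other than 12127 and 1840 are hypothetical.

```python
def latest_builds(buildbot_numbers, luci_numbers, limit=6):
    """Return the `limit` highest-numbered builds as (number, source) pairs."""
    tagged = [(n, "buildbot") for n in buildbot_numbers]
    tagged += [(n, "luci") for n in luci_numbers]
    return sorted(tagged, reverse=True)[:limit]


buildbot = range(12122, 12128)   # 12122..12127, as in the output above
luci_before = range(1835, 1841)  # hypothetical LUCI run ending at 1840
luci_after = [12128, 12129]      # hypothetical builds after the bump

before = latest_builds(buildbot, luci_before)
after = latest_builds(buildbot, luci_after)

print({source for _, source in before})  # {'buildbot'}
print(after[:2])  # [(12129, 'luci'), (12128, 'luci')]
```

Under this model the stale buildbot builds hide every LUCI build until the LUCI build numbers move past 12127.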

Comment 19 by hinoka@chromium.org, Nov 20

More explanation: Milo knows the builder is a chromium.android builder by looking at the "mastername" property in the buildbucket config.

Comment 20 by seanmccullough@chromium.org, Nov 20

It looks like https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/KitKat%20Tablet%20Tester is still building 1841, so I presume it will only start using the larger build numbers on the next build?

hinoka@: can you or someone else write a script that goes through all the builders that have migrated to luci and resets their build numbers so they're ahead of the buildbot build numbers? This isn't the first time this particular issue has hidden failures from sheriffs, and I have a bad feeling it won't be the last.
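
A minimal sketch of the planning step such a script would need, under stated assumptions: the last build number on each side is already known for every builder, and `plan_build_number_bumps` is a hypothetical helper name. Applying the plan (for example through the `swarmbucket.set_next_build_number` endpoint linked above) is left out, since its request format is not shown in this thread.

```python
def plan_build_number_bumps(builders):
    """Map builder name -> next LUCI build number for builders that lag.

    `builders` maps a builder name to a tuple of
    (last_buildbot_number, last_luci_number). Builders whose LUCI
    numbers are already ahead of buildbot are left out of the plan.
    """
    plan = {}
    for name, (buildbot_last, luci_last) in sorted(builders.items()):
        if luci_last <= buildbot_last:
            plan[name] = buildbot_last + 1
    return plan


builders = {
    "KitKat Tablet Tester": (12127, 1841),  # numbers from this thread
    "Example Already Ahead": (500, 900),    # hypothetical builder
}
print(plan_build_number_bumps(builders))
# {'KitKat Tablet Tester': 12128}
```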

Comment 21 by seanmccullough@chromium.org, Nov 21

Owner: hinoka@chromium.org
"KitKat Tablet Tester" failures now show up in SoM, but the Lollipop Tablet Tester and Marshmallow Tablet Tester failures still do not.

I'm not sure what values are safe to set in the form at https://apis-explorer.appspot.com/apis-explorer/?base=https://cr-buildbucket.appspot.com/_ah/api#p/swarmbucket/v1/swarmbucket.set_next_build_number, or how to monitor potential side effects.

Handing over to hinoka to complete those steps.

Comment 22 by dewittj@chromium.org, Dec 10

ping, this is sheriff P1 - does it still warrant that priority?

Comment 23 by digit@google.com, Dec 12

Ping, new sheriff here, is this still a P1?

Comment 24 by seanmccullough@chromium.org, Dec 12

Priority is the sheriff's call, I think. Are there still broken Android builds missing from SoM alerts?

Comment 25 by na...@chromium.org, Jan 2

New sheriff -- I can confirm that "Lollipop tablet tester" failures (most recent one from today) aren't showing up on SoM. P1 still applies.

Comment 26 by bsazonov@chromium.org, Feb 11

New sheriff. Lollipop Phone Tester, Lollipop Tablet Tester and Marshmallow Tablet Tester failures still aren't showing up on SoM.

Comment 27 by hinoka@chromium.org, Feb 15

Status: Fixed (was: Assigned)
I bumped the build numbers by marking the builder as non-prod, then prod again, on luci-migration.  They show up on SoM now:
https://screenshot.googleplex.com/CvkQScpiN7b

This should only happen to builders that were migrated early, before automatic build number bumping was added.
