
Issue 889399

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Oct 4
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: iOS
Pri: 1
Type: Bug

Blocking:
issue 888476




https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/ToTiOSDevice/99 shows "Internal Failure"

Project Member Reported by h...@chromium.org, Sep 26

Issue description

On the bot page (https://ci.chromium.org/buildbot/chromium.clang/ToTiOSDevice/) it says that build 99 failed with a compile error. However, visiting the page for build 99 (https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/ToTiOSDevice/99) shows internal failure and doesn't let me see the build steps.

Marking P1 because investigating the compile failure, which I can't see, is blocking the clang roll.
 
Labels: -Pri-1 Pri-0
Actually, I can't access any of the builds on this bot -- red or green ones, they all show "Infra Failure"

P0 for ongoing outage.
Cc: justincohen@chromium.org
Labels: OS-iOS
Cc: sergeybe...@chromium.org jbudorick@chromium.org
Cc: mar...@chromium.org
Per https://chromium-swarm.appspot.com/task?id=3fd032a7d6e67110, slice 1 isn't running because nothing has the builder cache. Slice 2 should run, but doesn't?
Labels: Foundation-Troopers
Digging more, this appears to be an issue w/ the builder's migration to luci.

https://build.chromium.org/deprecated/chromium.clang/builders/ToTiOSDevice has the full history of the buildbot (& shows the compile failures).
Labels: Infra-Troopers

Comment 8 by bugdroid1@chromium.org, Sep 26

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/0465ccf6c78bd8a13e4ce87bb92cf9868e3b716b

commit 0465ccf6c78bd8a13e4ce87bb92cf9868e3b716b
Author: John Budorick <jbudorick@chromium.org>
Date: Wed Sep 26 15:06:46 2018

luci: pull misconfigured ios chromium.clang bots off the console.

TBR=hinoka@chromium.org

No-Try: true
Bug: 889399
Change-Id: I0184ec4ab05b26c9e7abd0d5405f400e118cda2e
Reviewed-on: https://chromium-review.googlesource.com/1246241
Commit-Queue: John Budorick <jbudorick@chromium.org>
Reviewed-by: John Budorick <jbudorick@chromium.org>
Cr-Commit-Position: refs/heads/master@{#594314}
[modify] https://crrev.com/0465ccf6c78bd8a13e4ce87bb92cf9868e3b716b/infra/config/global/luci-milo.cfg

Bumped ToTiOSDevice's buildbot build number.

Comment 10 by bugdroid1@chromium.org, Sep 26

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/c3f089dc172f7175f7d866597f25d0a56eb21598

commit c3f089dc172f7175f7d866597f25d0a56eb21598
Author: John Budorick <jbudorick@chromium.org>
Date: Wed Sep 26 15:53:34 2018

luci: fix recipe configs for chromium.clang:ToTiOS{,Device}.

TBR=hinoka@chromium.org

Bug: 889399, 861396, 885799
Change-Id: I7cb770cbe0d4b94f3de501c1e8bc55e37ccd77ce
Reviewed-on: https://chromium-review.googlesource.com/1246283
Reviewed-by: John Budorick <jbudorick@chromium.org>
Commit-Queue: John Budorick <jbudorick@chromium.org>
Cr-Commit-Position: refs/heads/master@{#594329}
[modify] https://crrev.com/c3f089dc172f7175f7d866597f25d0a56eb21598/infra/config/global/cr-buildbucket.cfg

ToTiOSDevice is not flipped to LUCI yet, but for some reason it's not listed in luci-migration (probably because it was added more recently), so Milo assumes it must be a LUCI builder.

Will need to add this into luci-migration somehow.
Cc: no...@chromium.org
Oh it's totally on luci-migration (just not in the right order).  I think this happened due to the luci-migration outage previously, so the luci build got marked as "prod".

I can't think of a quick way to fix this. It might be better to turn off the buildbot -> luci build redirection logic; I think it causes more user confusion than it solves.
Actually it's best not to turn it off, instead just bump the build number on the buildbot side.
Did that in #9.
(though after build 103 started)
I see.  The build takes 8 hours to cycle, maybe we should kill #103 and let the next one start?
I hadn't done so thus far because the build is still visible on the deprecated buildbot endpoint (https://build.chromium.org/deprecated/chromium.clang/builders/ToTiOSDevice/builds/103), but if hans/thakis would prefer killing it, that's fine with me.
Owner: sergeybe...@chromium.org
Status: Assigned (was: Untriaged)
There is a page & ticket (see issue 889490); I wonder if it's related. I was busy with that page and didn't notice this issue; let me look at both.
#103 will not show up in RPC endpoints or tools because the latest build number (i.e. max(luci, buildbot)) is already 4000+, and Milo will only return the latest build number (and assume that there is only one "real" build for any build number, for the purposes of emulation mode).

My inclination is that since this is a P0 wrt some autoroller not being able to see the build, the best way forward will be to kill the current build so that the next can start.
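To make the numbering collision concrete, here is a toy Python model of the behaviour described above: the "latest" build number is max(luci, buildbot), and only one build is treated as real for any given number. Everything in it (the `BuilderHistory` class, its fields, and the rule that a LUCI build wins a collision) is a hypothetical illustration, not Milo's actual code.

```python
"""Toy model (hypothetical, not Milo's code) of build-number emulation
for a builder that exists on both buildbot and LUCI."""

from dataclasses import dataclass


@dataclass
class BuilderHistory:
    buildbot: set[int]  # build numbers produced by the buildbot master
    luci: set[int]      # build numbers produced by the LUCI builder

    def latest(self) -> int:
        # The "latest build number" is max(luci, buildbot).
        return max(self.buildbot | self.luci)

    def resolve(self, number: int) -> str:
        # Assumption: emulation treats one build per number as "real",
        # and a LUCI build wins whenever both systems used the same
        # number (as when the LUCI builder is marked as prod).
        if number in self.luci:
            return "luci"
        if number in self.buildbot:
            return "buildbot"
        raise KeyError(number)


# Illustrative numbers only: buildbot is at #103, LUCI is past 4000.
history = BuilderHistory(buildbot=set(range(1, 104)), luci=set(range(1, 4001)))
print(history.resolve(99))   # the buildbot build 99 link lands on LUCI
print(history.latest())      # buildbot's #103 is never the latest

# Bumping the buildbot number far past LUCI's removes the collision.
history.buildbot.add(10000)
print(history.latest(), history.resolve(10000))
```

Under this model, bumping the buildbot side to a number LUCI has not reached makes new buildbot builds both the latest and unambiguous. That only holds until the LUCI builders catch up, which is why the distance matters.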
OK, it's not related to issue 889490.
hinoka@ - are you already handling this issue? Want to grab it? Do you need any of my help (as a CCI trooper)?
..a-and build 103 is dead with NO_RESOURCE. No killing required.
Owner: hinoka@chromium.org
Sure I'll take this.

If you click on the link, you get taken to the luci build, which is incorrect.  jbudorick fixed this by bumping the buildbot build number, but it hasn't yet taken effect.  Buildbot still thinks #103 is the latest build number.

It's past work hours in MUC, so I don't think we're going to get a response. I'll kill #103 so that we can downgrade this from a P0.
Thanks! Ping me if you need any help.
#18: the page is due to a network issue; see the ops chat.

#19: my impression is that this is a P0 because hans couldn't see what the compile failure was, and fixing it blocks the manual clang roll. (I don't think clang rolls are automated?)
oh, and hinoka: note that I bumped the buildbot number *considerably* -- to 10000, well beyond the minimum safe distance -- given the rate at which the LUCI builders were cycling. w/ #10, they should be cycling more slowly given that they're not failing immediately.
I'm fine with downgrading from P0; I was able to guess and go to the logdog URL directly.
Labels: -Pri-0 Pri-1
I see. I mistakenly assumed all of our rolls are automated, and that this was a tools issue. Since the build is still accessible via uberchromegw, I'll downgrade this to P1.

The next build has started, but the number is #104:
https://uberchromegw.corp.google.com/i/chromium.clang/builders/ToTiOSDevice/builds/104

Maybe the master needs to be restarted?

Comment 28 by bugdroid1@chromium.org, Sep 26

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infradata/master-manager/+/98fe49e35c11dca59f2fb10448da8d25df24f1ee

commit 98fe49e35c11dca59f2fb10448da8d25df24f1ee
Author: Ryan Tseng <hinoka@google.com>
Date: Wed Sep 26 17:04:05 2018

#27: yeah, seems like it.
Status: Fixed (was: Assigned)
I didn't quite understand what happened, but I can see the builds now so it seems like this is fixed.
So which bot is correct?  I see:

https://ci.chromium.org/buildbot/chromium.clang/ToTiOSDevice/
Running on build155-m1

And I see:
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/ToTiOSDevice
Running on:
  build281-m9
  build282-m9
  build283-m9
  build284-m9
  build285-m9
  build286-m9
  
For the latter, all the builds are failing because the iOS device cert isn't installed.  I can fix this -- but which one is correct?  Is the old machine (build155-m1) being turned down?
  
Cc: efoo@chromium.org
Status: Assigned (was: Fixed)
+efoo who I think worked on this migration. Do you know what's going on here?
This one is correct right now:
https://ci.chromium.org/buildbot/chromium.clang/ToTiOSDevice/

The latter is LUCI, and because of the issues you're seeing, they haven't been flipped yet (But if you can fix them, that would be pretty fantastic for us).
Cc: d...@chromium.org
dba@ can you please add the iOS device cert and mobileprovision available at https://drive.google.com/corp/drive/folders/1lB7bCARh1DvHoOKtB0OPvpVqwY-TKoZ8 to the -m9 bots listed in #31?

Thanks!

justincohen - just to confirm (since I'm the one that set these bots up), even though they need device certs, they don't need actual devices because the tests are swarmed, is that correct?
#34 - Done.
dba@ thanks!

hinoka@ gn now passes.
Status: Fixed (was: Assigned)
