New issue
Advanced search Search tips

Issue 653290 link

Starred by 1 user

Issue metadata

Status: Verified
Owner:
Closed: Oct 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug



Sign in to add a comment

All builds show some duplicated steps in the waterfall

Project Member Reported by nxia@chromium.org, Oct 5 2016

Issue description

Looks like now all builds show duplicated steps in the waterfall page.


ex. 
https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/12539

see the BuildStart, CleanUp and CommitQueueSync stages (basically all the stages before BuildReexecutionFinished)

 

Comment 1 by nxia@chromium.org, Oct 5 2016

Cc: pprabhu@chromium.org
Looks like it's introduced by this CL. Previously completed stage should skip printing step logs in the Begin()?

https://chromium-review.googlesource.com/#/c/386141/

pprabhu@ do you want to take a look?
Owner: pprabhu@chromium.org
Status: Assigned (was: Untriaged)
Not surprised: https://chromium-review.googlesource.com/#/c/386141/5/cbuildbot/stages/generic_stages.py@472

:)
Ugh, that does look ugly.

Note that this is at least partly WAI -- if a stage were to raise an exception while being skipped, that exception did not have a logical home before this change (overall build fails, but no stage to fail).

I'd suggest that we add a BUILDBOT_TEXT saying SKIPPED on the landing page and keep the skipped stages there.
Thoughts?
There is a related problem though.
The repeated steps print the exact same banner in step start. The way buildbot parses the logs, it ends up linking from both instances of the stage to the logs from the first one.

for example, for the build above, if you look at the combined logs, you'll find that BuildStart was processed twice with different logs (second one says skipped, as expected), but the link from the two stages both go to the first instance's logs
combined logs: https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/12539/steps/steps/logs/stdio
Try clicking BuildStart's stdio from both instances here: https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/12539
The luci log links work correctly because luci inserts a counter in the url.

Comment 5 by nxia@chromium.org, Oct 5 2016

right. also with the change, both skipped and previously_completed stages will show up in the page. but we do need to handle the exceptions in the right places. 

Can we create a separate try/except block for them, and print Begin and Finish in the except block. so if there's exception caught, create the step and flush the std output.
Status: Started (was: Assigned)
try/except block isn't enough. Any logs that HandleSkip creates from the stage would show up before the call to _Begin, and hence outside the stage on the waterfall (in the previous stage).

How about just making it clear that this is a reexecution / skipped stage on the waterfall landing page.
e.g.:https://chromium-review.googlesource.com/#/c/394148/
and example build: https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/pre_cq/builds/2170/

Note that this still can not distinguish between three iterations of the same stage (second and third will point to the same logs). iirc, that never happens.

Also, I'm not entirely sure why PreCQSync's second instance there was not recognized as previously_completed in the run above... That stage actually failed trying to re-apply the CL since it's already applied. Might have something to do with the fact that I ran with --test-bootstrap. (cbuildbot tried a bit too hard ;) )

My first thought was that the stages which fail to run the second time should exit via "WaitForReady".

But that now creates a stage on the waterfall, even if it exits with False, doesn't it? That's actually bad in other cases, because stages that never run at all will appear.

Maybe we do both.

Have the repeat stages exit in WaitForReady, then have the "WaitForReady" handler change stage names to be "<name> - Skipped" when False is returned.
Re #7, NACK.

WaitForReady means that the stage was waiting for some external condition to become true. Let's not conflate that with "we've already run, so let's skip".

As the CL stands currently: https://chromium-review.googlesource.com/#/c/394148/
- Skipped and previously processed stages are marked distinctly (by deciding early what we'll do with the stage
BUT
- The actual HandleSkip handlers are run later, so that logs and exceptions from these stages show up under the correct stage.

There is the minor annoyance that skipped stages now show up on the waterfall, but imo, it's better to show these stages always than only when exceptions occur, because it's more confusing that "from the waterfall, the stage that just failed in HandleSkip was never even run in the passing runs" from a sheriffing POV.
Ping.... CL waiting on discussion here.
Part of me says that we should hide the extra stages, because of the confusion they cause. Our bootstrap mechanism is confusing for the people that have worked on it, much less the average sheriff.
Labels: BuildHealth iptaskforce
Cc: aaboagye@chromium.org
Aseda, I was thinking we should ask the taskforce for design guidance.

The general question is.... should all skipped stages appear on the waterfall.

Pros:
  Much more straight forward code.
  Clearly showing what we actually do.

Cons:
  Really confusing, because it exposes a lot of complexity that is mostly uninteresting/hard to follow without prior knowledge.
  Examples:
    Stages repeated (and skipped) during bootstrap.
    Stages that fail to run because requirements not met.
      Paygen if signing fails.
      Paygen, if paygen is disabled for a given build config.

I think that we shouldn't show the skipped stages (at least by default). It adds noise and it's not _that_ useful to know that it was skipped (for the average person). Perhaps add a link to the *full* build log/waterfall that shows the skipped stages for those that actually want to see the build in its gory details.

But yeah, let's bring it up in the taskforce and see what people think.
The consensus from the meeting today was to show all of the stages, but to make sure that the title shows "- Skipped" or something to make it obvious to anyone investigating.
Labels: Week-1641 Week-1642
Project Member

Comment 16 by bugdroid1@chromium.org, Oct 17 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/e1ea26bfdd880f89950e687329fe7b9c445efd9c

commit e1ea26bfdd880f89950e687329fe7b9c445efd9c
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Thu Oct 06 00:16:12 2016

cbuildbot: Mark skipped and repeated stages clearly.

After crosreview.com/386141, skipped and repeated stages show up on the
waterfall. This CL clearly denotes the skipped / repeated stage so that
buildbot treats it as a separate stage.

BUG= chromium:653290 
TEST=unittests (1 added), tryjob with repeated stages.

Change-Id: Ie3f7d5b06e8e01d57441fdec2446db23392bcff9
Reviewed-on: https://chromium-review.googlesource.com/394148
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Ningning Xia <nxia@chromium.org>

[modify] https://crrev.com/e1ea26bfdd880f89950e687329fe7b9c445efd9c/cbuildbot/stages/generic_stages.py
[modify] https://crrev.com/e1ea26bfdd880f89950e687329fe7b9c445efd9c/cbuildbot/stages/generic_stages_unittest.py

Sign in to add a comment