All builds show some duplicated steps in the waterfall |
|||||||
Issue descriptionLooks like now all builds show duplicated steps in the waterfall page. ex. https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/12539 see the BuildStart, CleanUp and CommitQueueSync stages (basically all the stages before BuildReexecutionFinished)
,
Oct 5 2016
Not surprised: https://chromium-review.googlesource.com/#/c/386141/5/cbuildbot/stages/generic_stages.py@472 :) Ugh, that does look ugly. Note that this is at least partly WAI -- if a stage were to raise an exception while being skipped, that exception did not have a logical home before this change (overall build fails, but no stage to fail). I'd suggest that we add a BUILDBOT_TEXT saying SKIPPED on the landing page and keep the skipped stages there. Thoughts?
,
Oct 5 2016
There is a related problem though. The repeated steps print the exact same banner in step start. The way buildbot parses the logs, it ends up linking from both instances of the stage to the logs from the first one. for example, for the build above, if you look at the combined logs, you'll find that BuildStart was processed twice with different logs (second one says skipped, as expected), but the link from the two stages both go to the first instance's logs combined logs: https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/12539/steps/steps/logs/stdio Try clicking BuildStart's stdio from both instances here: https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/12539
,
Oct 5 2016
The luci log links work correctly because luci inserts a counter in the url.
,
Oct 5 2016
right. also with the change, both skipped and previously_completed stages will show up in the page. but we do need to handle the exceptions in the right places. Can we create a separate try/except block for them, and print Begin and Finish in the except block. so if there's exception caught, create the step and flush the std output.
,
Oct 6 2016
try/except block isn't enough. Any logs that HandleSkip creates from the stage would show up before the call to _Begin, and hence outside the stage on the waterfall (in the previous stage). How about just making it clear that this is a reexecution / skipped stage on the waterfall landing page. e.g.:https://chromium-review.googlesource.com/#/c/394148/ and example build: https://uberchromegw.corp.google.com/i/chromiumos.tryserver/builders/pre_cq/builds/2170/ Note that this still can not distinguish between three iterations of the same stage (second and third will point to the same logs). iirc, that never happens. Also, I'm not entirely sure why PreCQSync's second instance there was not recognized as previously_completed in the run above... That stage actually failed trying to re-apply the CL since it's already applied. Might have something to do with the fact that I ran with --test-bootstrap. (cbuildbot tried a bit too hard ;) )
,
Oct 6 2016
My first thought was that the stages which fail to run the second time should exit via "WaitForReady". But that now creates a stage on the waterfall, even if it exits with False, doesn't it? That's actually bad in other cases, because stages that never run at all will appear. Maybe we do both. Have the repeat stages exit in WaitForReady, then have the "WaitForReady" handler change stage names to be "<name> - Skipped" when False is returned.
,
Oct 6 2016
Re #7, NACK. WaitForReady means that the stage was waiting for some external condition to become true. Let's not conflate that with "we've already run, so let's skip". As the CL stands currently: https://chromium-review.googlesource.com/#/c/394148/ - Skipped and previously processed stages are marked distinctly (by deciding early what we'll do with the stage BUT - The actual HandleSkip handlers are run later, so that logs and exceptions from these stages show up under the correct stage. There is the minor annoyance that skipped stages now show up on the waterfall, but imo, it's better to show these stages always than only when exceptions occur, because it's more confusing that "from the waterfall, the stage that just failed in HandleSkip was never even run in the passing runs" from a sheriffing POV.
,
Oct 7 2016
Ping.... CL waiting on discussion here.
,
Oct 7 2016
Part of me says that we should hide the extra stages, because of the confusion they cause. Our bootstrap mechanism is confusing for the people that have worked on it, much less the average sheriff.
,
Oct 7 2016
,
Oct 7 2016
Aseda, I was thinking we should ask the taskforce for design guidance.
The general question is.... should all skipped stages appear on the waterfall.
Pros:
Much more straight forward code.
Clearly showing what we actually do.
Cons:
Really confusing, because it exposes a lot of complexity that is mostly uninteresting/hard to follow without prior knowledge.
Examples:
Stages repeated (and skipped) during bootstrap.
Stages that fail to run because requirements not met.
Paygen if signing fails.
Paygen, if paygen is disabled for a given build config.
,
Oct 7 2016
I think that we shouldn't show the skipped stages (at least by default). It adds noise and it's not _that_ useful to know that it was skipped (for the average person). Perhaps add a link to the *full* build log/waterfall that shows the skipped stages for those that actually want to see the build in its gory details. But yeah, let's bring it up in the taskforce and see what people think.
,
Oct 10 2016
The consensus from the meeting today was to show all of the stages, but to make sure that the title shows "- Skipped" or something to make it obvious to anyone investigating.
,
Oct 10 2016
,
Oct 17 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/chromite/+/e1ea26bfdd880f89950e687329fe7b9c445efd9c commit e1ea26bfdd880f89950e687329fe7b9c445efd9c Author: Prathmesh Prabhu <pprabhu@chromium.org> Date: Thu Oct 06 00:16:12 2016 cbuildbot: Mark skipped and repeated stages clearly. After crosreview.com/386141, skipped and repeated stages show up on the waterfall. This CL clearly denotes the skipped / repeated stage so that buildbot treats it as a separate stage. BUG= chromium:653290 TEST=unittests (1 added), tryjob with repeated stages. Change-Id: Ie3f7d5b06e8e01d57441fdec2446db23392bcff9 Reviewed-on: https://chromium-review.googlesource.com/394148 Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org> Tested-by: Prathmesh Prabhu <pprabhu@chromium.org> Reviewed-by: Ningning Xia <nxia@chromium.org> [modify] https://crrev.com/e1ea26bfdd880f89950e687329fe7b9c445efd9c/cbuildbot/stages/generic_stages.py [modify] https://crrev.com/e1ea26bfdd880f89950e687329fe7b9c445efd9c/cbuildbot/stages/generic_stages_unittest.py
,
Oct 17 2016
Duplicate steps are being clearly marked and skipped now: https://uberchromegw.corp.google.com/i/chromeos/builders/master-paladin/builds/12642 https://luci-logdog.appspot.com/v/?s=chromeos%2Fbb%2Fchromeos%2Fmaster-paladin%2F12642%2F%2B%2Frecipes%2Fsteps%2FCleanUp_%3A__PREVIOUSLY_PROCESSED_%2F0%2Fstdout |
|||||||
►
Sign in to add a comment |
|||||||
Comment 1 by nxia@chromium.org
, Oct 5 2016