New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.
Starred by 1 user
Status: Untriaged
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Sign in to add a comment
Android PFQs failed due to getting half of circular dependent uprevs
Project Member Reported by diand...@chromium.org, Jan 9 Back to list
A whole pile of PFQs failed:

betty-arc64-nyc-android-pfq:543 failed
betty-nyc-android-pfq:1190 failed
bob-nyc-android-pfq:1189 failed
caroline-nyc-android-pfq:1619 failed
coral-nyc-android-pfq:308 failed
cyan-nyc-android-pfq:1415 failed
hana-nyc-android-pfq:1189 failed
reef-nyc-android-pfq:1618 failed
samus-nyc-android-pfq:1415 failed
veyron_jaq-nyc-android-pfq:1563 failed
veyron_minnie-nyc-android-pfq:1415 failed

--

Digging into betty-arc64-nyc-android-pfq:543, AKA <https://luci-milo.appspot.com/buildbot/chromeos/betty-arc64-nyc-android-pfq/543>

Build packages was failing with stuff like this:

dlm-0.0.1-r17: tk/libPOSIXDLM/EvdiLibrary.cpp:201:112: error: cannot initialize a member subobject of type 'void (*)(struct evdi_cursor_set, void *)' with an rvalue of type 'void *'
dlm-0.0.1-r17:   evdi_event_context context = { dpms_handler, mode_changed_handler, update_ready_handler, crtc_state_handler, (void*)handler };


...which made us think of the dlm change that was listed in the blame list.  AKA:

dlm | michal.lukaszek | 508612
https://chromium-review.googlesource.com/c/508612/

...but it made no sense since that CL passed the normal CQ and there should be nothing special and different here.

--

Digging deeper, it turns out that CL was't actually part of the PFQ.  This PFQ appears to be an inconsistent run.  Specifically see these two CLs:

https://chromium-review.googlesource.com/c/chromiumos/overlays/chromiumos-overlay/+/743561
https://chrome-internal-review.googlesource.com/c/chromeos/vendor/dlm/+/508612


These two CLs are mutually dependent on each other.

The first patch uprevs evdi to evdi-1.5.0.ebuild.

For the DLM patch (the second one) to actually take effect we need the uprev since DLM is a cros-workon package.  AKA logically to "depend on" the DLM change you need to actually "depend on" the automatic uprev from:
  src/private-overlays/chromeos-partner-overlay/media-libs/dlm

We need this commit:

  commit 326ffcc813179a5d1785aeb4cc96713ff4b5bda0 (HEAD, m/master, cros-internal/master)
  Author:     chrome-bot <chrome-bot@chromium.org>
  AuthorDate: Tue Jan 9 08:22:53 2018 -0800
  Commit:     chrome-bot <chrome-bot@chromium.org>
  CommitDate: Tue Jan 9 08:22:53 2018 -0800

      Marking set of ebuilds as stable
  ...
  ...
   media-libs/dlm/{dlm-0.0.1-r17.ebuild => dlm-0.0.1-r18.ebuild} | 4 ++--

--

As you can see from the build failure, the PFQs above somehow ended up trying to compile evdi-1.5.0 together with the old dlm-0.0.1-r17.  That was an incorrect thing for the builders to do.

Also a bit confusing is that the builders listed the DLM change in the blamelist even though the DLM uprev wasn't in the build...

--

Presumably all the above happened because the android PFQs kicked off at just the wrong time as the CQ was midway through committing changes.

====

There's nothing we instantly need to do since we expect the next builds to work just fine.  ...but it would be good to avoid the above situations in the future if possible.
 
Cc: akes...@chromium.org displaylink@chromium.org
Thanks for the analysis, it does appear we have a race with the PFQs and the CQ for cases where CLs that have cross repository dependencies can break. I suspect it is not feasible to atomically commit everything the CQ needs to do at once?

Perhaps we should adjust the timing of the PFQs starting to be relative to the CQ?

I am glad we don't have to revert anything, displaylink@chromium.org for FYI.
Cc: davidjames@chromium.org
> Thanks for the analysis, it does appear we have a race with the PFQs and the CQ for cases where CLs that have cross repository dependencies can break. I suspect it is not feasible to atomically commit everything the CQ needs to do at once?


If we started using Gerrit topics, we might be able to atomically submit to multiple repositires on the same gerrit host atomically. I say *might* because I'm not sure how reliably the feature works (+davidjames?)

For submitting atomically to repos that are on different gerrit hosts, there is no support for this. And in many cases we need that anyway, so unfortunately I think the request is not feasible.

> Perhaps we should adjust the timing of the PFQs starting to be relative to the CQ?

The CQ starts at random times, so the PFQ would have to learn to wait for the CQ to reach a likely-to-be-lazy state before starting.
PFQ seems to be back to green now, so I'll abandon the reverts.
Sign in to add a comment