New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 829055 link

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Last visit > 30 days ago
Closed: May 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Sign in to add a comment

whirlwind-pre-cq BuildPackages/unittest flake

Project Member Reported by smbar...@chromium.org, Apr 4 2018

Issue description

whirlwind builds are flaky due to missing DEPENDs

Example in Unittests stage: actions requires a retry until ap-infra is merged: https://uberchromegw.corp.google.com/i/chromeos/builders/whirlwind-paladin/builds/11096

Example failure in BuildPackages stage: https://uberchromegw.corp.google.com/i/chromeos/builders/whirlwind-paladin/builds/11107
 
Cc: jkardatzke@chromium.org
As far as I can tell, this breakage is causing every pre-cq that runs on whirlwind-no-vmtest-pre-cq to fail since yesterday. Can we get this out of the pre-cq if its not a quick fix?
Agreed, I'm trying to do that now.
Labels: -Pri-2 Pri-0
blocking the pre-cq
Cc: jclinton@chromium.org
One option is to remove it from the default set:  https://chromium-review.googlesource.com/#/c/chromiumos/chromite/+/996640

Note that everybody who is affected will need to rebase their CLs after this lands (and the pre-cq-launcher restart) in order for this to take effect.

+jclinton as this was also a failure of alerting.
Project Member

Comment 6 by bugdroid1@chromium.org, Apr 4 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/bca1fd053debef0c43ec2b2955fd62a33190905f

commit bca1fd053debef0c43ec2b2955fd62a33190905f
Author: Aviv Keshet <akeshet@chromium.org>
Date: Wed Apr 04 23:11:27 2018

remove whirlwind-no-vmtest-pre-cq from default set

BUG= chromium:829055 
TEST=None

Change-Id: Id9d8305099150a181c622f6fcab146af9730e52c
Reviewed-on: https://chromium-review.googlesource.com/996640
Tested-by: Aviv Keshet <akeshet@chromium.org>
Trybot-Ready: Aviv Keshet <akeshet@chromium.org>
Reviewed-by: Stephen Barber <smbarber@chromium.org>

[modify] https://crrev.com/bca1fd053debef0c43ec2b2955fd62a33190905f/lib/constants.py

Components: Infra>Client>ChromeOS>CI
Labels: -Pri-0 Pri-1
Project Member

Comment 8 by bugdroid1@chromium.org, Apr 5 2018

Cc: jintao@chromium.org mxt@chromium.org vapier@chromium.org sa...@chromium.org bingxue@chromium.org
From jintao@:

Stephen,  I don't think the following CL is related to the dbus generation error.  Generating dbus-proxies.h should not require the DEPEND definition in ebuild file.  
https://chrome-internal-review.googlesource.com/c/chromeos/overlays/project-jetstream-private/+/602180

mxt@ from my team identified a recent CL that very likely have caused this issue.  
https://chromium-review.googlesource.com/c/chromiumos/platform2/+/959844

After reverting this CL, dbus proxies and adaptors can be properly generated.  This is also consistent with our observation that this dbus header generation error started occurring last week. 

The build log shows that the build process failed to detect the updated dbus definition and failed to re-generate the dbus proxies.  I think we need to include sources in the dbus generator inputs in order to establish the dependencies 

Cc: -mxt@chromium.org briannorris@chromium.org
Missed adding briannorris@ from the thread

Agreed that DEPEND isn't helping here. Sometimes the build_packages or unittest stage would succeed on retry, which is often a sign of missing DEPENDs entries. We can revert my CL.

sammc@'s change seems to imply that if we just revert that change, with the newer version of GYP we're using now the dbus generator gypi files shouldn't work at all. Is there something else we're missing?
Cc: mxt@chromium.org
Cc: mnissler@chromium.org
+ non-PDT sheriff
The gyp has underspecified dependencies: see also https://gyp.gsrc.io/docs/InputFormatReference.md#linking-dependencies.

Essentially, if you export a generated header by including it in a header in a particular target, that target needs to set 'hard_dependency': 1. Otherwise, ninja is free to reorder the generate step and the build step.

In this case, the dependency chain is libvorlonclient-test -> libvorlonclient -> metric-reporter -> ap-monlog-dbus-proxies. However, only ap-monlog-dbus-proxies sets hard_dependency, so the only build order dependency is metric-reporter -> ap-monlog-dbus-proxies. libvorlonclient-test and libvorlonclient have no build order constraints with respect to generating the headers so their .cc files can be compiled before the generation occurs.

libvorlonclient-test and libvorlonclient need a hard_dependency chain to ap-monlog-dbus-proxies, either by depending on it directly (it already has hard_dependency set) or having all intermediate steps set hard_dependency.
Sam, thanks a lot for the explanation. We will try the hard_dependency solution and will upload CL once it's verified
Here's a completely untested CL (I don't even know whether it compiles) that tries to break the header dependencies so the hard_dependency change wouldn't be necessary: https://chrome-internal-review.googlesource.com/#/c/chromeos/ap-daemons/+/602528

I'll check whether it fixes things.
The pre-cq now at least got a green build_packages step, so that's a good sign.

I'm still trying to validate locally (I've seen the bug once when rebuilding my whirlwind sysroot) to gain confidence that the fix actually works.
Project Member

Comment 18 by bugdroid1@chromium.org, Apr 5 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/ec1cd17614f7aff652dc7983eb4d14f8a5b40257

commit ec1cd17614f7aff652dc7983eb4d14f8a5b40257
Author: chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Date: Thu Apr 05 13:31:20 2018

Roll src/third_party/chromite/ ec8d7a766..8c177b33b (6 commits)

https://chromium.googlesource.com/chromiumos/chromite.git/+log/ec8d7a766a7e..8c177b33bdde

$ git log ec8d7a766..8c177b33b --date=short --no-merges --format='%ad %ae %s'
2018-04-05 chrome-bot Update config settings by config-updater.
2018-04-05 smbarber Revert "cbuildbot_launch: Clean distfiles cache when too old"
2018-04-04 dgarrett chromeos_config: Move fuzzer template to 'informational'.
2018-04-03 pprabhu cbuildbot_launch: Clean distfiles cache when too old
2018-04-04 chrome-bot Update config settings by config-updater.
2018-04-04 akeshet remove whirlwind-no-vmtest-pre-cq from default set

Created with:
  roll-dep src/third_party/chromite
BUG=chromium:None,chromium:814989,chromium:829055


The AutoRoll server is located here: https://chromite-chromium-roll.skia.org

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.


TBR=chrome-os-gardeners@chromium.org

Change-Id: Ic84d9c712043bce2cfaa2452eb101b70ec29843e
Reviewed-on: https://chromium-review.googlesource.com/997754
Reviewed-by: Chromite Chromium Autoroll <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Commit-Queue: Chromite Chromium Autoroll <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#548406}
[modify] https://crrev.com/ec1cd17614f7aff652dc7983eb4d14f8a5b40257/DEPS

My pre-cq run with the proposed fix runs into apparently unrelated unit-test failure: https://luci-logdog.appspot.com/v/?s=chromeos/buildbucket/cr-buildbucket.appspot.com/8950083994102712528/+/steps/UnitTest/0/stdout

Maybe the compilation bug has masked another bug that got introduced meanwhile?
It looks like it was relying on a transitive include of brillo/bind_lambda.h. https://cs.corp.google.com/chromeos_public/src/aosp/external/libbrillo/brillo/bind_lambda.h?l=53 seems to be the missing template specialization.
More bad news: In an attempt to validate the fix locally, I've replaced generate-chromeos-dbus-bindings with this script:

#!/bin/sh
sleep 10
echo "$0" "$@" >> /tmp/generator_log
exec /usr/bin/generate-chromeos-dbus-bindings.real "$@"

It turns out there is at least one more instance of the missing hard dependency problem:
                     
FAILED: platform/ap-daemons/common/libipv6mtumonitor.ipv6_upstream_mtu_monitor.o
armv7a-cros-linux-gnueabi-clang++ -MMD -MF platform/ap-daemons/common/libipv6mtumonitor.ipv6_upstream_mtu_monitor.o.d -DGWIFI_CHROMEOS -DUSE_TLSDATED -DUSE_METRICS_UPLOADER=1 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -Iplatform/ap-daemons/libipv6mtumonitor.gen/include -Igen/include -I/build/whirlwind/tmp/portage/chromeos-base/ap-daemons-9999/work/ap-daemons-9999/platform2 -I/build/whirlwind/tmp/portage/chromeos-base/ap-daemons-9999/work/ap-daemons-9999/platform -I/build/whirlwind/usr/include -Igen -Wall -Wno-psabi -Wunused -Wno-unused-parameter -ggdb3 -fstack-protector-strong -Wformat=2 -fvisibility=internal -Wa,--noexecstack -Werror --sysroot=/build/whirlwind -DUSE_RTTI_FOR_TYPE_TAGS -std=c++11 -pthread -Wno-c++11-extensions -Wno-unused
-local-typedefs -DBASE_VER=395517 -pthread -I/build/whirlwind/usr/include/shill-client -I/build/whirlwind/usr/include/chromeos -I/build/whirlwind/usr/include/base-395517 -I/build/whirlwind/usr/include/glib-2.0 -I/build/whirlwind/usr/lib/glib-2.0/include -I/build/whirlwind/usr/include/nss -I/build/whirlwind/usr/include/nspr -I/build/whirlwind/usr/include/dbus-1.0 -I/build/whirlwind/usr/lib/dbus-1.0/include -I/build/whirlwind/usr/include/power_manager-client -fPIE -std=gnu++14 -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -O2 -O2 -O2 -pipe -march=armv7-a -mfpu=vfpv3-d16 -mfloat-abi=hard -g -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables -clang-syntax -clang-syntax -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables  -c ../../../../../../../tmp/portage/chromeos-base/ap-daemons-9999/work/ap-daemons-9999/platform/ap-daemons/common/ipv6_upstream_mtu_monitor.cc -o p
latform/ap-daemons/common/libipv6mtumonitor.ipv6_upstream_mtu_monitor.oIn file included from ../../../../../../../tmp/portage/chromeos-base/ap-daemons-9999/work/ap-daemons-9999/platform/ap-daemons/common/ipv6_upstream_mtu_monitor.cc:5:
In file included from /build/whirlwind/tmp/portage/chromeos-base/ap-daemons-9999/work/ap-daemons-9999/platform/ap-daemons/common/ipv6_upstream_mtu_monitor.h:16:
In file included from /build/whirlwind/tmp/portage/chromeos-base/ap-daemons-9999/work/ap-daemons-9999/platform/ap/common/wan_configuration.h:8:
/build/whirlwind/tmp/portage/chromeos-base/ap-daemons-9999/work/ap-daemons-9999/platform/ap/common/configuration_monitor.h:13:10: fatal error: 'ap-daemons/controller/dbus-proxies.h' file not found
#include "ap-daemons/controller/dbus-proxies.h"
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.

I'm inclined to just put hard_dependency: 1 into target_defaults to get things building again while we figure out a longer-term solution. This CL should do the trick: https://chrome-internal-review.googlesource.com/#/c/chromeos/ap-daemons/+/602571
Project Member

Comment 22 by bugdroid1@chromium.org, Apr 5 2018

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/chromeos/ap-daemons/+/bb62b5cc2018e19e94b566562be967ab311d9a4b

commit bb62b5cc2018e19e94b566562be967ab311d9a4b
Author: Mattias Nissler <mnissler@chromium.org>
Date: Thu Apr 05 16:04:47 2018

Is PreCQ supposed to be fine? https://chromium-review.googlesource.com/c/chromiumos/platform2/+/760486 just failed (20 mins ago) on whirlwind BuildPackages.
PreCQ has passed on the above CL.
It looks like whirlwind-no-vmtest-pre-cq is green again: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?id=2447085
I've encountered a case locally that has issue with even direct dependencies after the fix in comment#21:

Local change deletes a dbus method in ap-monlog (package ap-infra), successfully ran build_packages, switch back to master where the deleted method still exists (both ap-infra and ap-daemons) and run build_packages, it would complain the deleted method does not exist for ap-daemons, looking into ap-daemons/out/Default/gen/include/ap/monlog reveals the dbus-proxies have an older timestamp and is not updated in the most recent run of build_packages.

The error message complains about libapmonitor which directly depends on ap-monlog-dbus-proxies, and ap-monlog-dbus-proxies is set with hard_dependency:

x -fno-exceptions -fno-unwind-tables -fno-asynchronous-unwind-tables  -c ../../../../../../../tmp/portage/chromeos-base/ap-daemons-9999/work/ap-daemons-9999/platform/ap-daemons/monitor/monitor_daemon.cc -o platform/ap-daemons/monitor/libapmonitor.monitor_daemon.o
ap-daemons-9999: ../../../../../../../tmp/portage/chromeos-base/ap-daemons-9999/work/ap-daemons-9999/platform/ap-daemons/monitor/monitor_daemon.cc:597:31: error: no member named 'RecordStructuredDataByRequestUid' in 'org::chromium::ap::Monlog::ManagerProxyInterface'
ap-daemons-9999:   if (!monlog_manager_proxy_->RecordStructuredDataByRequestUid(
ap-daemons-9999:        ~~~~~~~~~~~~~~~~~~~~~  ^
ap-daemons-9999: 1 error generated.
ap-daemons-9999: ninja: build stopped: subcommand failed.
ap-daemons-9999:
ap-daemons-9999: cmd=['ninja', '-C', '/build/gale/var/cache/portage/chromeos-base/ap-daemons/out/Default', '-j', '12', 'all']





Attempt to set hard_dependency in target_defaults for ap-infra.gyp and monlog.gypi doesn't solve this problem still
#c26 a Portage-level problem with invalidation of reverse deps. https://b.corp.google.com/issues/75396373 is tracking fixing that when using build_packages.
re #c27, adding ap-infra to RDEPEND in ap-daemons does not resolve the issue either, is there a separate bug tracking portage-level problem with outdated deps?  
RDEPEND is for *runtime* dependencies, not reverse dependencies so is not expected to affect reverse dependencies.

Comment 30 by mxt@google.com, Apr 5 2018

whirlwind-no-vmtest-pre-cq has been green consecutively, pls put whirlwind back to the pre_cq_default_config.
Before we put this back into a place where it can block all ChromeOS development, I think we need to address the ownership question that allowed it to remain broken for so long. https://b.corp.google.com/issues/37222958

Comment 32 by mxt@chromium.org, Apr 5 2018

In this particular case, whirlwind was a victim.

The breakage is noticed 4/4 and immediate actions were taken by jetstream team

https://buganizer.corp.google.com/issues/77601144
Project Member

Comment 33 by bugdroid1@chromium.org, Apr 9 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/12a067168f2523c71ef27f1e3e48a6d47692f7c6

commit 12a067168f2523c71ef27f1e3e48a6d47692f7c6
Author: Mu Lin <mxt@chromium.org>
Date: Mon Apr 09 22:18:57 2018

Revert "remove whirlwind-no-vmtest-pre-cq from default set"

This reverts commit bca1fd053debef0c43ec2b2955fd62a33190905f.

Reason for revert: https://bugs.chromium.org/p/chromium/issues/detail?id=829055#c32

Original change's description:
> remove whirlwind-no-vmtest-pre-cq from default set
>
> BUG= chromium:829055 
> TEST=None
>
> Change-Id: Id9d8305099150a181c622f6fcab146af9730e52c
> Reviewed-on: https://chromium-review.googlesource.com/996640
> Tested-by: Aviv Keshet <akeshet@chromium.org>
> Trybot-Ready: Aviv Keshet <akeshet@chromium.org>
> Reviewed-by: Stephen Barber <smbarber@chromium.org>

Bug:  chromium:829055 
Change-Id: I81956a4f70f2a896aad09a02ff372cbd166d3abe
Reviewed-on: https://chromium-review.googlesource.com/998393
Commit-Ready: Mu Lin <mxt@google.com>
Tested-by: Jason Clinton <jclinton@chromium.org>
Reviewed-by: Jason Clinton <jclinton@chromium.org>
Reviewed-by: Aviv Keshet <akeshet@chromium.org>
Reviewed-by: Stephen Barber <smbarber@chromium.org>

[modify] https://crrev.com/12a067168f2523c71ef27f1e3e48a6d47692f7c6/lib/constants.py

Project Member

Comment 34 by bugdroid1@chromium.org, Apr 10 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/4d1bbb3fd11879ae03b79f844719703b9d4b959c

commit 4d1bbb3fd11879ae03b79f844719703b9d4b959c
Author: chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Date: Tue Apr 10 04:36:17 2018

Roll src/third_party/chromite/ 02ef76b01..871e7778e (6 commits)

https://chromium.googlesource.com/chromiumos/chromite.git/+log/02ef76b018ac..871e7778ef8b

$ git log 02ef76b01..871e7778e --date=short --no-merges --format='%ad %ae %s'
2018-03-28 pprabhu gerrit: Add 'private' subcommand.
2018-04-09 dgarrett cros tryjob: Remove --waterfall option.
2018-04-09 cmtice [clang-tidy builder] Change board from 'kahlee' to 'grunt'.
2018-04-05 mxt Revert "remove whirlwind-no-vmtest-pre-cq from default set"
2018-04-06 xzhou Update tss group test to add bootlockboxd to tss.
2018-04-06 dgarrett cbuildbot_launch: RunCbuildbot -> Cbuildbot.

Created with:
  roll-dep src/third_party/chromite
BUG=chromium:None,chromium:752342,chromium:830691,chromium:829055,chromium:None


The AutoRoll server is located here: https://chromite-chromium-roll.skia.org

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.


TBR=chrome-os-gardeners@chromium.org

Change-Id: Iad8d29ecd5f47e06ce4867453d3ba4a4606d835c
Reviewed-on: https://chromium-review.googlesource.com/1004003
Commit-Queue: Chromite Chromium Autoroll <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Reviewed-by: Chromite Chromium Autoroll <chromite-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#549423}
[modify] https://crrev.com/4d1bbb3fd11879ae03b79f844719703b9d4b959c/DEPS

Owner: mxt@chromium.org
Status: Assigned (was: Untriaged)
I believe that this is fixed.

mxt@ - can you please confirm?

Labels: -Restrict-View-Google
Status: Fixed (was: Assigned)
Project Member

Comment 38 by bugdroid1@chromium.org, Jul 7

Sign in to add a comment