
Issue 892309

Starred by 1 user

Issue metadata

Status: Assigned
Owner:
Cc:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug




Better CQ recipe recovery from Cygwin/Nacl related compile errors

Project Member Reported by erikc...@chromium.org, Oct 4

Issue description

Transient, machine-related issues like the one below cause the CQ recipe to fail immediately. We need a more graceful recovery mechanism.

Alternatively, we can discover and fix the root cause of the compile error.

https://chromium-cq-status.appspot.com/v2/patch-status/chromium-review.googlesource.com/1252086/1
https://ci.chromium.org/p/chromium/builders/luci.chromium.try/win7_chromium_rel_ng/98399

Compile error causes immediate exit.
"""
FAILED: glibc_x64/obj/chrome/test/data/nacl/exit_status_test_nexe/pm_exit_status_test.o 
  C:\b\s\w\ir\cache\goma\client/gomacc ../../native_client/toolchain/win_x86/nacl_x86_glibc/bin/x86_64-nacl-g++.exe -MMD -MF glibc_x64/obj/chrome/test/data/nacl/exit_status_test_nexe/pm_exit_status_test.o.d -DNACL_TC_REV=9ff1dc0c05b45941b86bed303a87a9eac17192ea -DV8_DEPRECATION_WARNINGS -DDCHECK_ALWAYS_ON=1 -DNO_TCMALLOC -DFULL_SAFE_BROWSING -DSAFE_BROWSING_CSD -DSAFE_BROWSING_DB_LOCAL -DCHROMIUM_BUILD -DFIELDTRIAL_TESTING_ENABLED -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D_FORTIFY_SOURCE=2 -DNDEBUG -DNVALGRIND -D_POSIX_C_SOURCE=199506 -D_XOPEN_SOURCE=600 -D_GNU_SOURCE=1 -D__STDC_LIMIT_MACROS=1 -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DDYNAMIC_ANNOTATIONS_PREFIX=NACL_ -I../.. -Iglibc_x64/gen -fno-strict-aliasing -Wno-builtin-macro-redefined -D__DATE__= -D__TIME__= -D__TIMESTAMP__= -m64 -march=x86-64 -Wall -Werror -Wno-unused-local-typedefs -Wno-maybe-uninitialized -Wno-deprecated-declarations -fno-delete-null-pointer-checks -Wno-comments -Wno-missing-field-initializers -Wno-unused-parameter -O2 -fno-ident -fdata-sections -ffunction-sections -fomit-frame-pointer -g1 -fvisibility=hidden -Wno-narrowing -fno-exceptions -fno-rtti -fvisibility-inlines-hidden -c ../../chrome/test/data/nacl/exit_status/pm_exit_status_test.cc -o glibc_x64/obj/chrome/test/data/nacl/exit_status_test_nexe/pm_exit_status_test.o
        2 [main] x86_64-nacl-g++ (8972) C:\b\s\w\ir\cache\builder\src\native_client\toolchain\win_x86\nacl_x86_glibc\libexec\x86_64-nacl-g++.exe: *** fatal error - cygheap base mismatch detected - 0x28E0970/0x28D0970.
  This problem is probably due to using incompatible versions of the cygwin DLL.
  Search for cygwin1.dll using the Windows Start->Find/Search facility
  and delete all but the most recent version.  The most recent version *should*
  reside in x:\cygwin\bin, where 'x' is the drive on which you have
  installed the cygwin distribution.  Rebooting is also suggested if you
  are unable to find another cygwin DLL.
        1 [main] x86_64-nacl-g++ 808 fork: child -1 - forked process 8972 died unexpectedly, retry 0, exit code 0xC0000142, errno 11
"""
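For illustration only, here is a minimal sketch of how a recipe might recognize this class of failure in compile output. The function name and the idea of a pattern list are assumptions, not part of any existing recipe; the two patterns are copied from the log above.

```python
import re

# Signatures taken from the log in this issue. The list itself is a
# hypothetical construct; a real recipe may classify failures differently.
TRANSIENT_PATTERNS = [
    re.compile(r"fatal error - cygheap base mismatch detected"),
    re.compile(r"forked process \d+ died unexpectedly"),
]


def is_transient_compile_failure(log_text):
    """Return True if the compile log matches a known machine-related signature."""
    return any(pattern.search(log_text) for pattern in TRANSIENT_PATTERNS)


if __name__ == "__main__":
    sample = (
        "2 [main] x86_64-nacl-g++ (8972) x86_64-nacl-g++.exe: "
        "*** fatal error - cygheap base mismatch detected - 0x28E0970/0x28D0970."
    )
    print(is_transient_compile_failure(sample))
```

A match would only tell the recipe that the failure is likely machine-specific; what to do with that signal (retry, report as an infra failure, or something else) is the open question in this issue.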
 

Comment 1 Deleted

In the comment at https://bugs.chromium.org/p/chromium/issues/detail?id=888734#c6 that led to this bug, the suggestion is that it was a machine-specific issue.

If that was the case, I would expect that there wouldn't be a good way to recover from this apart from triggering a new build, i.e., the CQ-level retry that you're trying to get rid of *is* the recovery mechanism. I'm not sure what else would be an option (apart from fixing the build to be reliable, of course).
Cc: tandrii@chromium.org
> If that was the case, I would expect that there wouldn't be a good way to recover from this apart from triggering a new build, i.e., the CQ-level retry that you're trying to get rid of *is* the recovery mechanism.

Agreed. If/when we get remote compile, then we can revisit. Until then, we'll need CQ-level retry to recover from machine-specific compile failures.

+ tandrii

I suspect the right approach here is to add state to the build job which indicates whether the failure should be retried by the CQ. I'd like to reach a state where compile failures get retried by the CQ, but test failures do not.
I understand. See also https://crbug.com/874117. Having a well-defined output property instructing CQ not to retry a builder might be what you want.
On the CQ side, it's a few hours of work to support. But before that, the CCI team should agree with this idea.
Labels: Infra-Platform-Test
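
As a rough sketch of the proposal above (build-side state that tells the CQ whether to retry), the following shows one possible shape. Every name here is hypothetical, including the `do_not_retry` property, `FailureKind`, and both functions; none of them is an existing recipe or CQ API.

```python
from enum import Enum


class FailureKind(Enum):
    """Hypothetical classification of why a try job failed."""
    COMPILE = "compile"
    TEST = "test"


def build_output_properties(failure_kind):
    """Build side: emit properties describing the failure.

    Per the discussion, compile failures stay retryable and test
    failures do not. "do_not_retry" is an assumed property name.
    """
    return {
        "failure_kind": failure_kind.value,
        "do_not_retry": failure_kind is FailureKind.TEST,
    }


def cq_should_retry(output_properties):
    """CQ side: retry unless the build explicitly asked not to."""
    return not output_properties.get("do_not_retry", False)


if __name__ == "__main__":
    for kind in FailureKind:
        props = build_output_properties(kind)
        print(kind.value, "-> retry" if cq_should_retry(props) else "-> no retry")
```

Defaulting to retry when the property is absent keeps the current behavior for builders that do not set it, which seems like the safer choice while the idea is still being agreed on.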
