Various goma and cloudtail exceptions on PDFium waterfall.
Issue description
These have been appearing randomly over the PDFium waterfall [1].
stop cloudtail
https://build.chromium.org/p/client.pdfium/builders/windows_xfa/builds/1887
https://build.chromium.org/p/client.pdfium/builders/windows_xfa_clang/builds/1250
https://build.chromium.org/p/client.pdfium/builders/drm_win_xfa/builds/1157
https://build.chromium.org/p/client.pdfium/builders/windows_xfa_rel/builds/509
start goma
https://build.chromium.org/p/client.pdfium/builders/windows_xfa_rel/builds/511
[1] https://build.chromium.org/p/client.pdfium/console
,
Oct 19 2016
sorry, I'm ooo today. could you fix this yyanagisawa?
,
Oct 19 2016
Let me check what is happening in each of the errors one by one.
https://build.chromium.org/p/client.pdfium/builders/windows_xfa/builds/1887
https://build.chromium.org/p/client.pdfium/builders/windows_xfa_clang/builds/1250
https://build.chromium.org/p/client.pdfium/builders/windows_xfa_rel/builds/509
The builds above show the following, which must be a bug in cloudtail_utils.py: it uses signal.SIGKILL, but Windows does not have signal.SIGKILL.
Traceback (most recent call last):
File "E:\b\build\scripts/slave\recipe_modules\goma\resources\cloudtail_utils.py", line 127, in <module>
sys.exit(main())
File "E:\b\build\scripts/slave\recipe_modules\goma\resources\cloudtail_utils.py", line 121, in main
os.kill(pid, signal.SIGKILL)
AttributeError: 'module' object has no attribute 'SIGKILL'
step returned non-zero exit code: 1
https://build.chromium.org/p/client.pdfium/builders/drm_win_xfa/builds/1157
Did an unknown operation take too long and get killed? Would it be better to print more logs?
https://build.chromium.org/p/client.pdfium/builders/windows_xfa_rel/builds/511
Something seems to have gone wrong in the Google API server. I hope the Google Cloud folks will solve this issue.
gs://chrome-goma-log/2016/10/17/vm51-m3/compiler_proxy.exe.VM51-M3.chrome-bot.log.INFO.20161017-064426.148.gz
W1017 06:44:31.515723 3988 http.cc:1758] oauth2Refresh read http=500 path=//oauth2/v4/token Details:HTTP/1.1 500 Internal Server Error
Vary: X-Origin
Content-Type: application/json; charset=UTF-8
Date: Mon, 17 Oct 2016 13:44:31 GMT
Expires: Mon, 17 Oct 2016 13:44:31 GMT
Cache-Control: private, max-age=0
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
Server: GSE
Alt-Svc: quic=":443"; ma=2592000; v="36,35,34,33,32"
Accept-Ranges: none
Vary: Origin,Accept-Encoding
Connection: close
{ "error": "internal_failure", "error_description": "Backend Error" }
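To illustrate the AttributeError above: signal.SIGKILL is not defined in Python's signal module on Windows, so code shared between POSIX and Windows has to pick a signal that exists on both. The following is only a minimal sketch of one way to do that; the helper name pick_kill_signal and its signal_module parameter are hypothetical and are not taken from cloudtail_utils.py.

```python
import signal


def pick_kill_signal(signal_module=signal):
    """Return SIGKILL where it exists, otherwise fall back to SIGTERM.

    Hypothetical helper: `signal_module` is injectable only so the
    fallback can be exercised on platforms that do define SIGKILL.
    """
    return getattr(signal_module, 'SIGKILL', signal_module.SIGTERM)
```

A simpler alternative is to use signal.SIGTERM unconditionally, since it is defined on both platforms.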
,
Oct 19 2016
FYI working on https://codereview.chromium.org/2430123002/
,
Oct 19 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/tools/build.git/+/4f6540ce526f62da04741be7d86861db8550705e
commit 4f6540ce526f62da04741be7d86861db8550705e
Author: yyanagisawa <yyanagisawa@chromium.org>
Date: Wed Oct 19 04:46:48 2016
Use signal.SIGTERM instead of signal.SIGKILL to kill process.
Since Windows does not have signal.SIGKILL, we need to use signal.SIGTERM to use the same code in both posix and Windows.
Additional changes:
- minimize scope file handler is opened.
- make more chatty for ease of understanding what happens in where.
- ignore exceptions caused in wait_terminate. I believe making the service running is more important than completely killing cloudtail the way we thought.
BUG= 656846
Review-Url: https://chromiumcodereview.appspot.com/2430123002
[modify] https://crrev.com/4f6540ce526f62da04741be7d86861db8550705e/scripts/slave/recipe_modules/goma/resources/cloudtail_utils.py
,
Oct 19 2016
,
Oct 19 2016
The following revision refers to this bug: https://chrome-internal.googlesource.com/chrome/tools/build_limited/scripts/slave/+/78c706f5e2c837c5cf2c9f7acee99c5b444be8e4 commit 78c706f5e2c837c5cf2c9f7acee99c5b444be8e4 Author: recipe-roller <recipe-roller@chromium.org> Date: Wed Oct 19 04:54:17 2016
,
Oct 19 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/c0efd536bca76095cf6619fbcd8ec62cb99208a1
commit c0efd536bca76095cf6619fbcd8ec62cb99208a1
Author: recipe-roller <recipe-roller@chromium.org>
Date: Wed Oct 19 11:11:44 2016
Roll recipe dependencies (trivial).
This is an automated CL created by the recipe roller. This CL rolls recipe changes from upstream projects (e.g. depot_tools) into downstream projects (e.g. tools/build).
More info is at https://goo.gl/zkKdpD. Use https://goo.gl/noib3a to file a bug (or complain)
build:
https://crrev.com/2a932686de7f14f155a66e02e9c21c30aa2926a9 Roll recipe dependencies (trivial). (recipe-roller@chromium.org)
https://crrev.com/4f6540ce526f62da04741be7d86861db8550705e Use signal.SIGTERM instead of signal.SIGKILL to kill process. (yyanagisawa@chromium.org)
depot_tools:
https://crrev.com/6ff1fc0e0163002596edbfbca2335325b043b823 Automatically map urls to their raw appengine forms (agable@chromium.org)
TBR=martiniss@chromium.org,phajdan.jr@chromium.org
BUG= 656846 , 657216
Recipe-Tryjob-Bypass-Reason: Autoroller
Bugdroid-Send-Email: False
Review-Url: https://chromiumcodereview.appspot.com/2436493002
Cr-Commit-Position: refs/heads/master@{#426149}
[modify] https://crrev.com/c0efd536bca76095cf6619fbcd8ec62cb99208a1/infra/config/recipes.cfg
,
Oct 19 2016
The following revision refers to this bug:
https://chromium.googlesource.com/infra/infra.git/+/8fa9582d564549815771a7b920c2f4e08e7d6beb
commit 8fa9582d564549815771a7b920c2f4e08e7d6beb
Author: recipe-roller <recipe-roller@chromium.org>
Date: Wed Oct 19 11:22:52 2016
Roll recipe dependencies (trivial).
This is an automated CL created by the recipe roller. This CL rolls recipe changes from upstream projects (e.g. depot_tools) into downstream projects (e.g. tools/build).
More info is at https://goo.gl/zkKdpD. Use https://goo.gl/noib3a to file a bug (or complain)
build:
https://crrev.com/2a932686de7f14f155a66e02e9c21c30aa2926a9 Roll recipe dependencies (trivial). (recipe-roller@chromium.org)
https://crrev.com/4f6540ce526f62da04741be7d86861db8550705e Use signal.SIGTERM instead of signal.SIGKILL to kill process. (yyanagisawa@chromium.org)
https://crrev.com/d688a695ab02001fa8eb03c371d205725ecc37b7 V8: Bump shards on x87 bot (machenbach@chromium.org)
depot_tools:
https://crrev.com/6ff1fc0e0163002596edbfbca2335325b043b823 Automatically map urls to their raw appengine forms (agable@chromium.org)
TBR=martiniss@chromium.org,phajdan.jr@chromium.org
BUG= 656846 , 657216
Recipe-Tryjob-Bypass-Reason: Autoroller
Bugdroid-Send-Email: False
Review-Url: https://chromiumcodereview.appspot.com/2433003003
[modify] https://crrev.com/8fa9582d564549815771a7b920c2f4e08e7d6beb/infra/config/recipes.cfg
,
Oct 20 2016
Thank you for fixing this.
,
Oct 20 2016
There still seem to be issues around this. [1] has been sitting for > 20 minutes in the 'stop cloudtail' task with:
SIGINT has been sent to process 596. Going to wait for the process finishes.
[1] https://build.chromium.org/p/tryserver.client.pdfium/builders/win_xfa_clang/builds/1258
,
Oct 20 2016
Also randomly fails on the linux bots [1]
Going to send SIGTERM to process 6134 due to Error [Errno 3] No such process
Traceback (most recent call last):
File "/mnt/data/b/build/scripts/slave/recipe_modules/goma/resources/cloudtail_utils.py", line 132, in <module>
sys.exit(main())
File "/mnt/data/b/build/scripts/slave/recipe_modules/goma/resources/cloudtail_utils.py", line 126, in main
os.kill(pid, signal.SIGTERM)
OSError: [Errno 3] No such process
step returned non-zero exit code: 1
@@@STEP_EXCEPTION@@@
[1] https://build.chromium.org/p/tryserver.client.pdfium/builders/linux_xfa/builds/2512
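The traceback above shows os.kill raising OSError with errno 3 (ESRCH) because the target process had already exited. As a hedged sketch, and not necessarily how the actual fix was written, a stop script can treat that case as "already stopped" rather than as a step failure. The function name terminate_if_running and the injectable kill parameter are hypothetical.

```python
import errno
import os
import signal


def terminate_if_running(pid, kill=os.kill):
    """Send SIGTERM to pid; return False if the process was already gone.

    Hypothetical helper: `kill` defaults to os.kill and is a parameter
    only so the error path can be exercised without a real process.
    """
    try:
        kill(pid, signal.SIGTERM)
    except OSError as e:
        if e.errno == errno.ESRCH:
            # "No such process": nothing left to stop, so not an error.
            return False
        raise  # Anything else (for example EPERM) is still a real failure.
    return True
```

Re-raising every other error keeps permission problems visible instead of silently hiding them.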
,
Oct 21 2016
,
Oct 21 2016
Ah, there seem to be two kinds of issues.
#12: no fix seems to have been implemented yet. We thought waitpid would eventually finish, but it might actually not.
#13: thanks Vadim, it has been fixed as crbug.com/658049.
I will try to fix #12 here.
,
Oct 24 2016
Issue 658444 has been merged into this issue.
,
Oct 24 2016
,
Oct 26 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/tools/build.git/+/0d3a30427b4a190ca371310172455c0ea0c9064d
commit 0d3a30427b4a190ca371310172455c0ea0c9064d
Author: yyanagisawa <yyanagisawa@chromium.org>
Date: Wed Oct 26 02:40:53 2016
Not wait cloudtail finish forerver on Windows.
Let me provide better way of stopping cloudtail on Windows.
1. Use WaitForSingleObject to timeout in 10 seconds.
2. use handler to waitpid instead of pid. Also, handler is got before sending signal.
3. use signal.CTRL_C_EVENT if possible.
BUG= 656846
Review-Url: https://codereview.chromium.org/2444233002
[modify] https://crrev.com/0d3a30427b4a190ca371310172455c0ea0c9064d/scripts/slave/recipe_modules/goma/resources/cloudtail_utils.py
,
Oct 26 2016
I believe #18 fixes #12. We use WaitForSingleObject instead of waitpid so that the wait times out after 10 seconds.
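The bounded wait described here can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from cloudtail_utils.py: the function name stop_and_wait and its send_signal callback are hypothetical, the Windows branch calls the Win32 functions OpenProcess, WaitForSingleObject and CloseHandle through ctypes, and the non-Windows branch is a polling fallback added only so the sketch runs elsewhere.

```python
import ctypes
import errno
import os
import sys
import time

SYNCHRONIZE = 0x00100000    # access right required to wait on a process handle
WAIT_OBJECT_0 = 0x00000000  # WaitForSingleObject result: the process exited


def _pid_exists(pid):
    """POSIX only: signal 0 checks for existence without sending anything."""
    try:
        os.kill(pid, 0)
    except OSError as e:
        if e.errno == errno.ESRCH:
            return False
        # Any other error (for example EPERM) means the process is still there.
    return True


def stop_and_wait(pid, send_signal, timeout_sec=10):
    """Call send_signal(), then wait at most timeout_sec; True if pid exited."""
    if sys.platform != 'win32':
        # Fallback (an assumption of this sketch): poll until the deadline.
        # An unreaped child of the caller still counts as existing here.
        send_signal()
        deadline = time.time() + timeout_sec
        while _pid_exists(pid):
            if time.time() >= deadline:
                return False
            time.sleep(0.1)
        return True

    kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
    kernel32.OpenProcess.restype = ctypes.c_void_p
    kernel32.OpenProcess.argtypes = [ctypes.c_uint32, ctypes.c_int,
                                     ctypes.c_uint32]
    kernel32.WaitForSingleObject.restype = ctypes.c_uint32
    kernel32.WaitForSingleObject.argtypes = [ctypes.c_void_p, ctypes.c_uint32]
    kernel32.CloseHandle.argtypes = [ctypes.c_void_p]

    # Open the handle before signaling so there is still something to wait on.
    handle = kernel32.OpenProcess(SYNCHRONIZE, False, pid)
    if not handle:
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        send_signal()
        result = kernel32.WaitForSingleObject(handle, int(timeout_sec * 1000))
        return result == WAIT_OBJECT_0
    finally:
        kernel32.CloseHandle(handle)
```

A caller would pass a callable that delivers the stop signal to the process and could log a warning, rather than block the build, when the function returns False.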
Comment 1 by dsinclair@chromium.org, Oct 18 2016
Labels: -Pri-3 Pri-1