Issue 656846 link

Issue metadata

Status: Fixed
Owner:
Closed: Oct 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocking:
issue 656540



Various goma and cloudtail exceptions on PDFium waterfall.

Project Member Reported by dsinclair@chromium.org, Oct 18 2016

Issue description

Cc: thestig@chromium.org npm@chromium.org weili@chromium.org tsepez@chromium.org
Labels: -Pri-3 Pri-1
These keep causing the PDFium tree to close, bumping to P1.

Comment 2 by tikuta@chromium.org, Oct 19 2016

Components: Infra>Goma
Owner: yyanagisawa@chromium.org
Sorry, I'm OOO today.
Could you fix this, yyanagisawa?
Let me check what is happening in each error, one by one.

https://build.chromium.org/p/client.pdfium/builders/windows_xfa/builds/1887
https://build.chromium.org/p/client.pdfium/builders/windows_xfa_clang/builds/1250
https://build.chromium.org/p/client.pdfium/builders/windows_xfa_rel/builds/509
The builds above show the following traceback, which must be a bug in cloudtail_utils.py: it uses signal.SIGKILL, which Windows does not have.

Traceback (most recent call last):
  File "E:\b\build\scripts/slave\recipe_modules\goma\resources\cloudtail_utils.py", line 127, in <module>
    sys.exit(main())
  File "E:\b\build\scripts/slave\recipe_modules\goma\resources\cloudtail_utils.py", line 121, in main
    os.kill(pid, signal.SIGKILL)
AttributeError: 'module' object has no attribute 'SIGKILL'
step returned non-zero exit code: 1
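
For illustration, here is a minimal sketch of one way to avoid the AttributeError above. It is not the code in cloudtail_utils.py: the helper names pick_kill_signal and force_stop are hypothetical, and falling back via getattr is an assumption rather than the fix that actually landed (which simply switched to signal.SIGTERM).

```python
import os
import signal
import subprocess
import sys


def pick_kill_signal():
    """Return the strongest termination signal this platform defines."""
    # signal.SIGKILL is missing from the signal module on Windows, so
    # referencing it directly raises AttributeError there. SIGTERM is
    # defined on both POSIX and Windows.
    return getattr(signal, "SIGKILL", signal.SIGTERM)


def force_stop(pid):
    """Send the portable kill signal to pid (hypothetical helper)."""
    os.kill(pid, pick_kill_signal())


if __name__ == "__main__":
    # Demo: start a sleeping child, stop it, and show its exit status.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"])
    force_stop(child.pid)
    print("child exit status:", child.wait())
```

On Windows, os.kill with any signal other than CTRL_C_EVENT or CTRL_BREAK_EVENT terminates the process unconditionally, so SIGTERM behaves as a hard kill there.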


https://build.chromium.org/p/client.pdfium/builders/drm_win_xfa/builds/1157
Did some unknown operation take too long, so that the step got killed?
It would probably be better to print more logs.


https://build.chromium.org/p/client.pdfium/builders/windows_xfa_rel/builds/511
Something seems to have gone wrong in the Google API server. I hope the Google Cloud folks will solve this issue.
gs://chrome-goma-log/2016/10/17/vm51-m3/compiler_proxy.exe.VM51-M3.chrome-bot.log.INFO.20161017-064426.148.gz
W1017 06:44:31.515723  3988 http.cc:1758] oauth2Refresh read  http=500 path=//oauth2/v4/token Details:HTTP/1.1 500 Internal Server Error
Vary: X-Origin
Content-Type: application/json; charset=UTF-8
Date: Mon, 17 Oct 2016 13:44:31 GMT
Expires: Mon, 17 Oct 2016 13:44:31 GMT
Cache-Control: private, max-age=0
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
Server: GSE
Alt-Svc: quic=":443"; ma=2592000; v="36,35,34,33,32"
Accept-Ranges: none
Vary: Origin,Accept-Encoding
Connection: close

{
 "error": "internal_failure",
 "error_description": "Backend Error"
}


Project Member

Comment 5 by bugdroid1@chromium.org, Oct 19 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build.git/+/4f6540ce526f62da04741be7d86861db8550705e

commit 4f6540ce526f62da04741be7d86861db8550705e
Author: yyanagisawa <yyanagisawa@chromium.org>
Date: Wed Oct 19 04:46:48 2016

Use signal.SIGTERM instead of signal.SIGKILL to kill process.

Since Windows does not have signal.SIGKILL, we need to use signal.SIGTERM to use the same code in both posix and Windows.

Additional changes:
- minimize scope file handler is opened.
- make more chatty for ease of understanding what happens in where.
- ignore exceptions caused in wait_terminate.
  I believe making the service running is more important than
  completely killing cloudtail the way we thought.

BUG= 656846 

Review-Url: https://chromiumcodereview.appspot.com/2430123002

[modify] https://crrev.com/4f6540ce526f62da04741be7d86861db8550705e/scripts/slave/recipe_modules/goma/resources/cloudtail_utils.py

Status: Fixed (was: Assigned)
Project Member

Comment 9 by bugdroid1@chromium.org, Oct 19 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/c0efd536bca76095cf6619fbcd8ec62cb99208a1

commit c0efd536bca76095cf6619fbcd8ec62cb99208a1
Author: recipe-roller <recipe-roller@chromium.org>
Date: Wed Oct 19 11:11:44 2016

Roll recipe dependencies (trivial).

This is an automated CL created by the recipe roller. This CL rolls recipe
changes from upstream projects (e.g. depot_tools) into downstream projects
(e.g. tools/build).

More info is at https://goo.gl/zkKdpD. Use https://goo.gl/noib3a to file a bug
(or complain)

build:
  https://crrev.com/2a932686de7f14f155a66e02e9c21c30aa2926a9 Roll recipe dependencies (trivial). (recipe-roller@chromium.org)
  https://crrev.com/4f6540ce526f62da04741be7d86861db8550705e Use signal.SIGTERM instead of signal.SIGKILL to kill process. (yyanagisawa@chromium.org)
depot_tools:
  https://crrev.com/6ff1fc0e0163002596edbfbca2335325b043b823 Automatically map urls to their raw appengine forms (agable@chromium.org)

TBR=martiniss@chromium.org,phajdan.jr@chromium.org
BUG= 656846 , 657216 

Recipe-Tryjob-Bypass-Reason: Autoroller
Bugdroid-Send-Email: False
Review-Url: https://chromiumcodereview.appspot.com/2436493002
Cr-Commit-Position: refs/heads/master@{#426149}

[modify] https://crrev.com/c0efd536bca76095cf6619fbcd8ec62cb99208a1/infra/config/recipes.cfg

Project Member

Comment 10 by bugdroid1@chromium.org, Oct 19 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/infra.git/+/8fa9582d564549815771a7b920c2f4e08e7d6beb

commit 8fa9582d564549815771a7b920c2f4e08e7d6beb
Author: recipe-roller <recipe-roller@chromium.org>
Date: Wed Oct 19 11:22:52 2016

Roll recipe dependencies (trivial).

This is an automated CL created by the recipe roller. This CL rolls recipe
changes from upstream projects (e.g. depot_tools) into downstream projects
(e.g. tools/build).

More info is at https://goo.gl/zkKdpD. Use https://goo.gl/noib3a to file a bug
(or complain)

build:
  https://crrev.com/2a932686de7f14f155a66e02e9c21c30aa2926a9 Roll recipe dependencies (trivial). (recipe-roller@chromium.org)
  https://crrev.com/4f6540ce526f62da04741be7d86861db8550705e Use signal.SIGTERM instead of signal.SIGKILL to kill process. (yyanagisawa@chromium.org)
  https://crrev.com/d688a695ab02001fa8eb03c371d205725ecc37b7 V8: Bump shards on x87 bot (machenbach@chromium.org)
depot_tools:
  https://crrev.com/6ff1fc0e0163002596edbfbca2335325b043b823 Automatically map urls to their raw appengine forms (agable@chromium.org)

TBR=martiniss@chromium.org,phajdan.jr@chromium.org
BUG= 656846 , 657216 

Recipe-Tryjob-Bypass-Reason: Autoroller
Bugdroid-Send-Email: False
Review-Url: https://chromiumcodereview.appspot.com/2433003003

[modify] https://crrev.com/8fa9582d564549815771a7b920c2f4e08e7d6beb/infra/config/recipes.cfg

Thank you for fixing this.
Cc: tikuta@chromium.org
Status: Assigned (was: Fixed)
There still seem to be issues around this. [1] has been sitting for > 20 minutes in the 'stop cloudtail' task with:

  SIGINT has been sent to process 596. Going to wait for the process finishes.


[1] https://build.chromium.org/p/tryserver.client.pdfium/builders/win_xfa_clang/builds/1258
Also randomly fails on the linux bots [1]

Going to send SIGTERM to process 6134 due to Error [Errno 3] No such process
Traceback (most recent call last):
  File "/mnt/data/b/build/scripts/slave/recipe_modules/goma/resources/cloudtail_utils.py", line 132, in <module>
    sys.exit(main())
  File "/mnt/data/b/build/scripts/slave/recipe_modules/goma/resources/cloudtail_utils.py", line 126, in main
    os.kill(pid, signal.SIGTERM)
OSError: [Errno 3] No such process
step returned non-zero exit code: 1
@@@STEP_EXCEPTION@@@


[1] https://build.chromium.org/p/tryserver.client.pdfium/builders/linux_xfa/builds/2512
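
The Linux failure above happens when the target process has already exited by the time os.kill runs. A minimal sketch of tolerating that case on POSIX follows. The helper name kill_if_running is hypothetical, and this is only an illustration, not necessarily how the problem was fixed upstream.

```python
import errno
import os
import signal
import subprocess
import sys


def kill_if_running(pid, sig=signal.SIGTERM):
    """Send sig to pid. Return False if the process was already gone."""
    try:
        os.kill(pid, sig)
    except OSError as e:
        # ESRCH ("No such process") means the process has already exited,
        # which is the desired end state, so it is not treated as an error.
        if e.errno == errno.ESRCH:
            return False
        raise
    return True


if __name__ == "__main__":
    # Demo: a child that has already exited and been reaped.
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    child.wait()
    print("signal delivered:", kill_if_running(child.pid))
```

Any other OSError (for example EPERM) is re-raised, so real failures still surface in the step log.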
Mergedinto: 658049
Status: Duplicate (was: Assigned)
Status: Started (was: Duplicate)
Ah, there seem to be two kinds of issues.
#12: a fix may not have been implemented yet.  We thought waitpid would eventually finish, but it might actually never return.
#13: thanks Vadim, it has been fixed in crbug.com/658049.

I will try to fix #12 here.

Issue 658444 has been merged into this issue.
Blocking: 656540
Project Member

Comment 18 by bugdroid1@chromium.org, Oct 26 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build.git/+/0d3a30427b4a190ca371310172455c0ea0c9064d

commit 0d3a30427b4a190ca371310172455c0ea0c9064d
Author: yyanagisawa <yyanagisawa@chromium.org>
Date: Wed Oct 26 02:40:53 2016

Not wait cloudtail finish forerver on Windows.

Let me provide better way of stopping cloudtail on Windows.
1. Use WaitForSingleObject to timeout in 10 seconds.
2. use handler to waitpid instead of pid.
   Also, handler is got before sending signal.
3. use signal.CTRL_C_EVENT if possible.

BUG= 656846 

Review-Url: https://codereview.chromium.org/2444233002

[modify] https://crrev.com/0d3a30427b4a190ca371310172455c0ea0c9064d/scripts/slave/recipe_modules/goma/resources/cloudtail_utils.py

Status: Fixed (was: Started)
I believe #18 fixes #12.
We now use WaitForSingleObject instead of waitpid, so the wait times out after 10 seconds.
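
The bounded-wait idea can be sketched as below. This is not the code from the commit: it swaps in Python 3's subprocess.Popen.wait(timeout=...) as a stand-in for calling WaitForSingleObject directly (as far as I know, CPython implements that wait on top of WaitForSingleObject on Windows). The function name stop_with_timeout is hypothetical, and the 10 second default is taken from the comment above.

```python
import subprocess
import sys


def stop_with_timeout(proc, timeout_sec=10.0):
    """Ask proc to exit, wait up to timeout_sec, then force-kill it."""
    proc.terminate()
    try:
        # Bounded wait: unlike a bare waitpid, this cannot block forever.
        proc.wait(timeout=timeout_sec)
        return "exited"
    except subprocess.TimeoutExpired:
        # The process ignored the request, so escalate and reap it.
        proc.kill()
        proc.wait()
        return "killed"


if __name__ == "__main__":
    # Demo: a sleeping child that exits as soon as it is asked to.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"])
    print(stop_with_timeout(child))
```

Returning a status string instead of raising keeps the calling step alive even when the process had to be force-killed.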
