
Issue 599838

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Apr 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 3
Type: Bug




"start_crash_service" is flaky

Project Member Reported by chromium...@appspot.gserviceaccount.com, Apr 1 2016

Issue description

"start_crash_service" is flaky.

This issue was created automatically by the chromium-try-flakes app. Please find the right owner to fix the respective test/step and assign this issue to them. If the step/test is infrastructure-related, please add Infra-Troopers label and change issue status to Untriaged. When done, please remove the issue from Sheriff Bug Queue by removing the Sheriff-Chromium label.

We have detected 3 recent flakes. List of all flakes can be found at https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyHgsSBUZsYWtlIhNzdGFydF9jcmFzaF9zZXJ2aWNlDA.

Flaky tests should be disabled within 30 minutes unless culprit CL is found and reverted. Please see more details here: https://sites.google.com/a/chromium.org/dev/developers/tree-sheriffs/sheriffing-bug-queues#triaging-auto-filed-flakiness-bugs

This flaky test/step was previously tracked in issue 595200.
 
Labels: Infra
Labels: -Sheriff-Chromium
Owner: pfeldman@chromium.org
pfeldman@, I wonder if you could find an owner for this bug? This seems to have the same symptoms as issue 595200 from 2 weeks ago:

https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/197823/steps/start_crash_service/logs/stdio:

Traceback (most recent call last):
  File "E:\b\build\scripts\slave\chromium\run_crash_handler.py", line 71, in <module>
    sys.exit(main())
  File "E:\b\build\scripts\slave\chromium\run_crash_handler.py", line 50, in main
    raise chromium_utils.PathNotFound('Unable to find %s' % exe_path)
common.chromium_utils.PathNotFound: Unable to find E:\b\build\slave\win\build\src\out\Release\crash_service.exe
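The traceback shows that run_crash_handler.py fails as soon as crash_service.exe is missing from the build output directory. The check presumably looks something like the sketch below. This is a hypothetical reconstruction, not the actual script: the function name `find_crash_service`, its parameters, and the `<build_dir>/out/<target>/` layout are inferred from the logged path. `PathNotFound` is a local stand-in for `common.chromium_utils.PathNotFound`.

```python
import os


class PathNotFound(Exception):
    """Local stand-in for common.chromium_utils.PathNotFound (assumed plain Exception subclass)."""


def find_crash_service(build_dir, target="Release", exe_name="crash_service.exe"):
    """Return the path to crash_service.exe, or raise PathNotFound.

    Hypothetical sketch: the <build_dir>/out/<target>/<exe_name> layout is
    inferred from the path in the log, e.g.
    E:\\b\\build\\slave\\win\\build\\src\\out\\Release\\crash_service.exe
    """
    exe_path = os.path.join(build_dir, "out", target, exe_name)
    if not os.path.exists(exe_path):
        # Same message format as the error in the step log.
        raise PathNotFound("Unable to find %s" % exe_path)
    return exe_path
```

A check like this turns "the compile step did not produce crash_service.exe" into a hard step failure. That fits the intermittent, per-bot pattern discussed below better than a bug in the crash service itself.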
Cc: lukasza@chromium.org
Oh, and my apologies for throwing a more-or-less random bug at you. I found your name in the history of changes for scripts/slave/chromium/run_crash_handler.py, which is why I hope you might know who can investigate this bug further.
Project Member

Comment 4 by chromium...@appspot.gserviceaccount.com, Apr 3 2016

Labels: Sheriff-Chromium
Detected 3 new flakes for test/step "start_crash_service". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyHgsSBUZsYWtlIhNzdGFydF9jcmFzaF9zZXJ2aWNlDA. This message was posted automatically by the chromium-try-flakes app. Since flakiness is ongoing, the issue was moved back into Sheriff Bug Queue (unless already there).
Cc: ivanpe@chromium.org
Components: Internals>CrashReporting
Labels: OS-Windows
Labels: -Sheriff-Chromium Infra-Troopers
I'm guessing this is something for the Trooper to look at:

common.chromium_utils.PathNotFound: Unable to find E:\b\build\slave\win\build\src\out\Release\crash_service.exe


Cc: manisca...@chromium.org pfeldman@chromium.org
Owner: ----
The issue had an owner, and thus didn't show properly in the trooper queue.

As today's trooper, I'll take a look, but my hunch is I won't be able to do much for a missing executable.
Owner: sergeybe...@chromium.org
Status: Assigned (was: Untriaged)
It appears the step has run successfully on win_chromium_rel_ng since Apr 4.

E.g. https://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/202696

It also hasn't failed since Apr 8 globally. The step seems to fail in bursts about once a week. Not sure what to do about it, since the logs are now gone. 

Sample Dremel query to confirm:

SELECT
  ANY(master) sample_master, ANY(builder) as sample_builder, ANY(build_number) as sample_build, COUNT(build_number) as num_builds, host_name,
  FORMAT_UTC_USEC(TIME_USEC_TO_DAY(build_sched_msec * 1000)) as timestamp_utc
FROM chrome_infra.completed_steps
WHERE
  step_name == 'start_crash_service'
  AND result IN ('INFRA_FAILURE', 'FAILURE')
  -- AND builder = 'win_chromium_rel_ng'
GROUP BY host_name, timestamp_utc
ORDER BY timestamp_utc DESC, num_builds DESC
LIMIT 1000
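The query above finds bursts by counting failing builds per (host, UTC day). For readers without Dremel access, the sketch below applies the same filter, grouping, and ordering in plain Python to rows exported from the table. The row dict keys are assumed to match the `chrome_infra.completed_steps` column names used in the query. The function name `failures_by_host_and_day` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def failures_by_host_and_day(rows, step_name="start_crash_service", limit=1000):
    """Group failing runs of a step by (host_name, UTC day), newest day first.

    Mirrors the Dremel query: filter on step_name and result, group by host
    and day, count failing rows, keep one sample build per group.
    """
    groups = defaultdict(list)
    for row in rows:
        if row["step_name"] != step_name:
            continue
        if row["result"] not in ("INFRA_FAILURE", "FAILURE"):
            continue
        # build_sched_msec is milliseconds since the epoch; bucket by UTC day.
        day = datetime.fromtimestamp(row["build_sched_msec"] / 1000, tz=timezone.utc).date()
        groups[(row["host_name"], day)].append(row)

    summary = []
    for (host, day), members in groups.items():
        sample = members[0]  # like ANY(): an arbitrary representative row
        summary.append({
            "host_name": host,
            "day_utc": day.isoformat(),
            "num_builds": len(members),
            "sample_master": sample["master"],
            "sample_builder": sample["builder"],
            "sample_build": sample["build_number"],
        })
    # ORDER BY timestamp_utc DESC, num_builds DESC (ISO dates sort chronologically).
    summary.sort(key=lambda r: (r["day_utc"], r["num_builds"]), reverse=True)
    return summary[:limit]
```

If one host dominates a given day, the failures are bot-local (for example a bad checkout or output directory on that machine) rather than caused by a particular CL.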
Labels: -Infra
Now that we have LogDog with permanent log retention, I'd like to wait a few more days to see another instance of the failure.
Labels: -Pri-1 Pri-3
Status: Fixed (was: Assigned)
This step should be gone now everywhere, I think (we're getting rid of crash_service as it's not actually needed).
