NOTREACHED in GlobalActivityTracker::RecordProcessLaunch
Issue description
This is hitting in my local dcheck_always_on build:
void GlobalActivityTracker::RecordProcessLaunch(
    ProcessId process_id,
    const FilePath::StringType& cmd) {
  const int64_t pid = process_id;
  DCHECK_NE(GetProcessId(), pid);
  DCHECK_NE(0, pid);

  base::AutoLock lock(global_tracker_lock_);
  if (base::ContainsKey(known_processes_, pid)) {
    // TODO(bcwhite): Measure this in UMA.
    NOTREACHED() << "Process #" << process_id  // <<< HERE
                 << " was previously recorded as \"launched\""
                 << " with no corresponding exit.";
    known_processes_.erase(pid);
  }

#if defined(OS_WIN)
  known_processes_.insert(std::make_pair(pid, UTF16ToUTF8(cmd)));
#else
  known_processes_.insert(std::make_pair(pid, cmd));
#endif
}
,
Aug 22 2017
What type of process was being launched that caused this CHECK?
,
Aug 22 2017
Issue 756149 has been merged into this issue.
,
Aug 23 2017
On deliberation, what might be happening here is that either events are being lost, or not synchronized. PIDs in Windows are aggressively reused, so it's generally unwise to key on PID alone. {PID|CreationTime} does however uniquely identify a process instance.
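To make the {PID|CreationTime} suggestion concrete, here is a minimal sketch of keying a registry on the pair rather than on the PID alone. `ProcessKey` and `ProcessRegistry` are hypothetical names invented for illustration; this is not the real GlobalActivityTracker, and how the creation time would be obtained from the OS is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical identity for one process instance. The OS may recycle the
// PID, but a recycled PID arrives with a different creation time.
struct ProcessKey {
  int64_t pid;
  int64_t creation_time;  // Assumed: a timestamp reported by the OS.

  bool operator<(const ProcessKey& other) const {
    return std::make_pair(pid, creation_time) <
           std::make_pair(other.pid, other.creation_time);
  }
};

// Hypothetical registry keyed on {pid, creation_time}.
class ProcessRegistry {
 public:
  // Returns false only if this exact process instance is already recorded.
  bool RecordLaunch(const ProcessKey& key, const std::string& cmd) {
    return known_.insert(std::make_pair(key, cmd)).second;
  }

  // Returns true if the instance was known and has now been removed.
  bool RecordExit(const ProcessKey& key) { return known_.erase(key) > 0; }

  std::size_t size() const { return known_.size(); }

 private:
  std::map<ProcessKey, std::string> known_;
};
```

With this keying, a reused PID with a new creation time is simply a new entry, so a lost exit event leaves a stale entry behind instead of colliding with the next launch.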
,
Aug 23 2017
This hasn't reproduced in my DCHECK build over a couple of hours of usage; I don't think it needs to block shipping DCHECK.
,
Aug 24 2017
,
Oct 17 2017
This hits me every couple of days or so. See e.g. crash/e7582a26b9c65088.
,
Oct 17 2017
Do you have a way to reproduce it at will?
,
Oct 17 2017
Nopes, the check just hits every other day or so. You can repro locally with go/syzyasan-optin, then set chrome://flags/#dcheck-is-fatal to "Enabled" and run under "windbg -g -G -o". That'll drop you into a debugger when the DCHECK hits.
,
Oct 30 2017
,
Oct 30 2017
,
Oct 30 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/e7f9a9155ea03d25da4e5269c9532fce9d5f77ac

commit e7f9a9155ea03d25da4e5269c9532fce9d5f77ac
Author: Brian White <bcwhite@chromium.org>
Date: Mon Oct 30 14:53:47 2017

Include command line in error message.

In order to find the process that isn't being cleaned up, it's necessary to know what the process is.

Bug: 757946
Change-Id: I4c7d8cf139451fdfc194d6aa02d28ad8799dddc8
Reviewed-on: https://chromium-review.googlesource.com/726599
Reviewed-by: Sigurður Ásgeirsson <siggi@chromium.org>
Commit-Queue: Brian White <bcwhite@chromium.org>
Cr-Commit-Position: refs/heads/master@{#512493}

[modify] https://crrev.com/e7f9a9155ea03d25da4e5269c9532fce9d5f77ac/base/debug/activity_tracker.cc
,
Oct 30 2017
Another failure:
https://ci.chromium.org/buildbot/chromium.gpu.fyi/Win7%20Experimental%20Release%20%28NVIDIA%29/260

The test:
WebglConformance_conformance2_textures_video_tex_3d_rgb565_rgb_unsigned_short_5_6_5
from the webgl2_conformance_tests suite failed. Stack trace:

********************************************************************************
Last event: 5c4.5a4: Break instruction exception - code 80000003 (first/second chance not available)
debugger time: Mon Oct 30 08:15:17.764 2017 (UTC - 7:00)
ChildEBP RetAddr Args to Child
06cde6ac 6b50d217 6d2da4fd 00000587 0c261e94 chrome!base::debug::BreakDebugger+0xc
06cde6d4 6b43ce9e 0294fc88 06cde720 06cde724 chrome!?Run@?$Invoker@U?$BindState@P6AXPBDHV?$BasicStringPiece@V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@@base@@1@Z$$V@internal@base@@$$A6AXPBDHV?$BasicStringPiece@V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@@3@1@Z@internal@base@@SAXPAVBindStateBase@23@$$QAPBD$$QAH$$QAV?$BasicStringPiece@V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@@3@3@Z+0x25
06cdeb68 6b4cdbe4 02610358 0297ffc8 000003b0 chrome!logging::LogMessage::~LogMessage+0x41e
06cdec70 6b49cbbc 00000978 06cdee04 084db101 chrome!base::debug::GlobalActivityTracker::RecordProcessLaunch+0x2f4
06cdedf0 6b49c57d 06cdeed8 06cdee04 06cdeef0 chrome!base::LaunchProcess+0x5ec
06cdee2c 6c06ba71 06cdeed8 08687820 06cdeef0 chrome!base::LaunchProcess+0x2d
06cdf10c 6aa9d5bd 08687820 06cdf180 06cdf1d4 chrome!service_manager::SandboxWin::StartSandboxedProcess+0x10b
06cdf1b0 6ad977bf 086d7750 08687820 06cdf1d4 chrome!content::StartSandboxedProcess+0x14c
06cdf2bc 6ad96ada 06cdf2f0 06cdf308 00000000 chrome!content::internal::ChildProcessLauncherHelper::LaunchProcessOnLauncherThread+0x19b
06cdf3cc 6b4fdf95 07114f68 00310fb3 6b44080d chrome!content::internal::ChildProcessLauncherHelper::LaunchOnLauncherThread+0x102
06cdf4f0 6b4f8e19 6d2e0b07 0c21bc00 01010170 chrome!base::debug::TaskAnnotator::RunTask+0xe5
06cdf62c 6b4f844b 0c21bc00 02666550 00000001 chrome!base::internal::TaskTracker::RunOrSkipTask+0x279
06cdf72c 6b507a93 06cdf758 02666550 02666500 chrome!base::internal::TaskTracker::RunNextTask+0x12b
06cdf820 6b458cd3 02692c60 0000042c 0000042c chrome!base::internal::SchedulerWorker::Thread::ThreadMain+0x263
*** WARNING: Unable to verify checksum for kernel32.dll
*** ERROR: Symbol file could not be found. Defaulted to export symbols for kernel32.dll -
06cdf844 771b338a 026933c8 06cdf890 77849902 chrome!base::PlatformThread::GetCurrentThreadPriority+0x1d3
WARNING: Stack unwind information not available. Following frames may be wrong.
06cdf850 77849902 026933c8 7008d84c 00000000 kernel32!BaseThreadInitThunk+0x12
06cdf890 778498d5 6b458c30 026933c8 ffffffff ntdll!RtlInitializeExceptionChain+0x63
06cdf8a8 00000000 6b458c30 026933c8 00000000 ntdll!RtlInitializeExceptionChain+0x36

It's only an intermittent failure but is affecting random tests. To run these, build the telemetry_gpu_integration_test target (which has Chrome as a dependency) and run:

./content/test/gpu/run_gpu_integration_test.py webgl_conformance --browser=release --webgl-conformance-version=2.0.1 --test-filter=conformance2_textures_video_tex_3d_rgb565_rgb_unsigned_short_5_6_5

Alternatively, use --browser=debug if you have a debug build. It's unlikely to fail with just that one test. Perhaps use conformance2_textures_video as the filter, or conformance2_textures.
,
Nov 1 2017
Issue 780465 has been merged into this issue.
,
Nov 1 2017
This is now affecting the ANGLE project's commit queue, causing random test flakiness and preventing CLs from landing. See Issue 780465. Recent failures:

https://build.chromium.org/p/tryserver.chromium.angle/builders/win_angle_rel_ng/builds/7541
https://chromium-swarm.appspot.com/task?id=39908a6ee4dd5810&refresh=10&show_raw=1
https://build.chromium.org/p/tryserver.chromium.angle/builders/win_angle_rel_ng/builds/7538
https://chromium-swarm.appspot.com/task?id=399063b9cf126110&refresh=10&show_raw=1
https://build.chromium.org/p/tryserver.chromium.angle/builders/win_angle_rel_ng/builds/7536
https://chromium-swarm.appspot.com/task?id=39904a4522527810&refresh=10&show_raw=1

There is no way we can work around this. Upgrading this bug to P1. bcwhite@, please do something to make this stop crashing, whether it is downgrading the NOTREACHED to something else, improving the tracking, etc.
,
Nov 1 2017
Full error info:

[1156:908:1101/081036.329:FATAL:activity_tracker.cc(1415)] Check failed: false. Process #2556 was previously recorded as "launched" with no corresponding exit.
"c:\b\swarm_slave\w\ir\out\Release\chrome.exe" --type=gpu-process --field-trial-handle=1300,18379742025283163801,14681353244698629401,131072 --disable-gpu-sandbox --enable-logging=stderr --use-cmd-decoder=validating --noerrdialogs --user-data-dir="c:\b\swarm_slave\w\itsx5riw\tmpr9wmxh" --gpu-preferences=GAAAAAAAAAAQBwAAAQAAAAAAAAAAAGAA --gpu-vendor-id=0x1002 --gpu-device-id=0x6613 --gpu-driver-vendor="Advanced Micro Devices, Inc." --gpu-driver-version=21.19.137.1 --gpu-driver-date=9-16-2016 --gpu-secondary-vendor-ids=0x102b --gpu-secondary-device-ids=0x0534 --noerrdialogs --user-data-dir="c:\b\swarm_slave\w\itsx5riw\tmpr9wmxh" --enable-logging=stderr --service-request-channel-token=F8113125735ACD9F00D8540775107014 --mojo-platform-channel-handle=2472 /prefetch:2

Backtrace:
base::debug::StackTrace::StackTrace [0x66B93350+32]
base::debug::StackTrace::StackTrace [0x66B6DD0D+13]
logging::LogMessage::~LogMessage [0x66B05BBE+78]
base::debug::GlobalActivityTracker::RecordProcessLaunch [0x66B97274+884]
base::LaunchProcess [0x66B6581C+1516]
base::LaunchProcess [0x66B651DD+45]
service_manager::SandboxWin::StartSandboxedProcess [0x677465E9+267]
content::StartSandboxedProcess [0x6615D939+332]
content::internal::ChildProcessLauncherHelper::LaunchProcessOnLauncherThread [0x66459B7F+411]
content::internal::ChildProcessLauncherHelper::LaunchOnLauncherThread [0x66458E36+258]
base::debug::TaskAnnotator::RunTask [0x66BC7745+229]
base::internal::TaskTracker::RunOrSkipTask [0x66BC2758+664]
base::internal::TaskTracker::RunNextTask [0x66BC1D6B+299]
base::internal::SchedulerWorker::Thread::ThreadMain [0x66BD1353+611]
base::PlatformThread::GetCurrentThreadPriority [0x66B21F83+467]
BaseThreadInitThunk [0x75FA337A+18]
RtlInitializeExceptionChain [0x775892B2+99]
RtlInitializeExceptionChain [0x77589285+54]
,
Nov 2 2017
The flakiness induced by this assertion failure is causing CQ jobs to be retried multiple times, causing excessive load on the tryservers. This was part of the reason for a machine outage today. See Issue 780662 and Issue 780665. I can't stress how important a fix is for this. Please acknowledge that this is being worked on.
,
Nov 2 2017
Pretty sure I've got it nailed down: https://chromium-review.googlesource.com/c/chromium/src/+/750281
,
Nov 2 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/ae2a8b9a9455eede005c731e053e22ab98a0f6d7

commit ae2a8b9a9455eede005c731e053e22ab98a0f6d7
Author: Brian White <bcwhite@chromium.org>
Date: Thu Nov 02 19:10:36 2017

Record browser process exits in tracker.

The tracker needs to be notified when a process exits but the Process object doesn't do that (and can't do that) on its own so it needs to be called where the termination status is fetched. It still makes sense to pipe this through the Process object just to keep things in one place and make it easier to be called from other places where necessary.

Bug: 757946
Change-Id: I9e039bcd143277461e6b8323f1756e03a46f005b
Reviewed-on: https://chromium-review.googlesource.com/750281
Reviewed-by: Mark Mentovai <mark@chromium.org>
Reviewed-by: Antoine Labour <piman@chromium.org>
Commit-Queue: Jamie Madill <jmadill@chromium.org>
Cr-Commit-Position: refs/heads/master@{#513577}

[modify] https://crrev.com/ae2a8b9a9455eede005c731e053e22ab98a0f6d7/base/process/process.h
[modify] https://crrev.com/ae2a8b9a9455eede005c731e053e22ab98a0f6d7/base/process/process_fuchsia.cc
[modify] https://crrev.com/ae2a8b9a9455eede005c731e053e22ab98a0f6d7/base/process/process_posix.cc
[modify] https://crrev.com/ae2a8b9a9455eede005c731e053e22ab98a0f6d7/base/process/process_win.cc
[modify] https://crrev.com/ae2a8b9a9455eede005c731e053e22ab98a0f6d7/content/browser/child_process_launcher.cc
,
Nov 2 2017
,
Nov 2 2017
Big thanks Brian for fixing this!
,
Nov 8 2017
FYI, just observed this in Albatross build 64.0.3261.1, crash id f740e36fb21b4cfb - I think the CL from comment #19 should be present already in this build? Might this be related to the issue of PID = 0u processes sometimes cropping up in Task Manager?
,
Nov 8 2017
And another instance (crash id abfd38180cd8ae9a) - this one was launching a NativeMessaging child process, FWIW.
,
Nov 8 2017
Both of these are the gnubbyagent process not being cleaned up. I'll take a look.
,
Nov 9 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/d8cc0afb39ed0c1bf7e701d800c11d7a5d29d820

commit d8cc0afb39ed0c1bf7e701d800c11d7a5d29d820
Author: Brian White <bcwhite@chromium.org>
Date: Thu Nov 09 12:01:45 2017

Record process exit from EnsureProcessTerminated.

When a process is terminated, the tracker has to be notified so that it can be cleaned up.

Bug: 757946
Change-Id: I1b894893b3c15228923f88573aa19d46faaa33c0
Reviewed-on: https://chromium-review.googlesource.com/759088
Reviewed-by: Mark Mentovai <mark@chromium.org>
Commit-Queue: Brian White <bcwhite@chromium.org>
Cr-Commit-Position: refs/heads/master@{#515144}

[modify] https://crrev.com/d8cc0afb39ed0c1bf7e701d800c11d7a5d29d820/base/process/kill_posix.cc
[modify] https://crrev.com/d8cc0afb39ed0c1bf7e701d800c11d7a5d29d820/base/process/kill_win.cc
,
Nov 9 2017
In case the cause of these is not clear... Chrome has common code for handling the launch of new processes but has no common code for reaping them. Thus, tracking code has to be added to the many places where this is done and it's easy to miss them. Please reopen if another case arises.
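To illustrate the asymmetry described above, here is a rough sketch with one shared launch path and several reap paths, one of which forgets to notify the tracker. `LaunchTracker`, `Launch`, `ReapWithNotify` and `ReapWithoutNotify` are hypothetical names for illustration only; the real code paths in Chrome are more involved.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical tracker, loosely modelled on the snippet in the issue
// description. It is not the real GlobalActivityTracker.
class LaunchTracker {
 public:
  // Returns true if `pid` was already recorded with no matching exit,
  // which is the condition the NOTREACHED reports. The stale entry is
  // replaced by the new launch.
  bool RecordLaunch(int64_t pid, const std::string& cmd) {
    assert(pid != 0);
    const bool stale = known_.erase(pid) > 0;
    known_[pid] = cmd;
    return stale;
  }

  void RecordExit(int64_t pid) { known_.erase(pid); }

 private:
  std::map<int64_t, std::string> known_;
};

// One common launch path: every launch is recorded.
inline bool Launch(LaunchTracker& tracker, int64_t pid,
                   const std::string& cmd) {
  return tracker.RecordLaunch(pid, cmd);
}

// Reap paths are scattered; each one has to remember to notify.
inline void ReapWithNotify(LaunchTracker& tracker, int64_t pid) {
  tracker.RecordExit(pid);
}

// A reap path that was missed: the process is gone, the tracker is not told.
inline void ReapWithoutNotify(LaunchTracker&, int64_t) {}
```

If a PID is launched, reaped through `ReapWithoutNotify`, and then handed out again by the OS, the next `Launch` with that PID returns true, which corresponds to the failure reported in this bug.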
,
Nov 14 2017
crash/e1b6250cc508b5ec from 64.0.3267.1.
,
Nov 14 2017
This one is a Renderer. Renderer processes ARE reaped, though. Hmmm... It seems that a Renderer is created during startup, perhaps for the NTP, that dies without being reported. Other renderers are seen to exit, though. I wonder what's special about this first one. Now if VC2017 would just stop crashing all the time, perhaps I could figure this out...
,
Nov 14 2017
There's a race condition in Process::Terminate: if the process happens to exit before it can be terminated, the termination isn't recognized, the process is still believed (by that code) to be running, and so the exit is never recorded.
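A simplified model of that race, for anyone following along. `FakeProcess`, `TerminateRacy` and `TerminateChecked` are made-up names, and this is only a sketch of the general idea, not the contents of the actual fix.

```cpp
#include <cassert>

// Hypothetical stand-in for an OS process handle.
struct FakeProcess {
  bool exited = false;

  // Models an OS kill call: it fails if the process is already gone.
  bool Kill() {
    if (exited)
      return false;
    exited = true;
    return true;
  }

  bool HasExited() const { return exited; }
};

// Racy version: a failed kill is taken to mean "still running", so an
// exit that happened just before the kill is never recorded.
inline bool TerminateRacy(FakeProcess& process, int& exits_recorded) {
  if (!process.Kill())
    return false;
  ++exits_recorded;
  return true;
}

// Checked version: if the kill fails, look at whether the process has
// already exited and record the exit in that case as well.
inline bool TerminateChecked(FakeProcess& process, int& exits_recorded) {
  if (!process.Kill() && !process.HasExited())
    return false;
  ++exits_recorded;
  return true;
}
```

For a process that exited on its own just before the call, `TerminateRacy` records nothing while `TerminateChecked` records one exit.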
,
Nov 14 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/ccb2c6e22528fb2926067b46c62eb2ee7e9af86e

commit ccb2c6e22528fb2926067b46c62eb2ee7e9af86e
Author: Brian White <bcwhite@chromium.org>
Date: Tue Nov 14 19:09:44 2017

Fix race condition in Process::Terminate.

Bug: 757946
Change-Id: Ia7e92e2a44f9f17f5bfae10a43ea05ccde74c9eb
Reviewed-on: https://chromium-review.googlesource.com/768968
Commit-Queue: Brian White <bcwhite@chromium.org>
Reviewed-by: Mark Mentovai <mark@chromium.org>
Cr-Commit-Position: refs/heads/master@{#516374}

[modify] https://crrev.com/ccb2c6e22528fb2926067b46c62eb2ee7e9af86e/base/process/process_win.cc
,
Nov 14 2017
Another one down. Re-open if another one pops up.
,
Nov 16 2017
Report Id 37f0d220efded90f appears to be a renderer Pid being re-used; looks like that build 64.0.3269.1 should include the patch from #30.
,
Nov 16 2017
I also see report id 8c1a097ac8f909b6 in my chrome://crashes, for this signature.
,
Nov 22 2017
More crash Ids: 0400d024e8cb955b and 39bf42c855e593c1, in Chrome 64.0.3274.1.
,
Nov 22 2017
,
Nov 27 2017
And another (Id 45bbaf5f52eb3489) - this and the ones from #34 were all missed renderer teardowns.
,
Dec 1 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/6c8f3b9c3906cd4d9b96d264277d52b2c47e7564

commit 6c8f3b9c3906cd4d9b96d264277d52b2c47e7564
Author: Brian White <bcwhite@chromium.org>
Date: Fri Dec 01 18:55:31 2017

Add timeout waiting for exit of process.

Turns out that there are cases where a process has started exiting (and thus can't be terminated) but still isn't ready to be reaped. Thus, a wait timeout is required in this case as well.

Bug: 757946
Change-Id: Iaf79092a991200c14d321963d9a20bfc191867bc
Reviewed-on: https://chromium-review.googlesource.com/803837
Reviewed-by: Mark Mentovai <mark@chromium.org>
Commit-Queue: Brian White <bcwhite@chromium.org>
Cr-Commit-Position: refs/heads/master@{#521001}

[modify] https://crrev.com/6c8f3b9c3906cd4d9b96d264277d52b2c47e7564/base/process/process_win.cc
,
Dec 5 2017
The last CL went out with the first M65 build. Any new reports can be found with this: https://crash.corp.google.com/browse?q=product.Version%3E%3D%2765.0%27%20AND%20product.Version%20LIKE%20%27__.%25.%25.%25%27%20AND%20product.name%20CONTAINS%20%27Chrome%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%20CONTAINS%20%27RecordProcessLaunch%27&sql_dialect=dremelsql&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D
,
Dec 6 2017
Another hit me: crash/b1f9ab898ef97a0f. This is SyzyASAN version 64.0.3282.1; unfortunately, the SyzyASAN build is broken ATM.
,
Dec 6 2017
Only M65 is invulnerable. (It is "inconceivable" that any more crashes will occur with M65 or above. :-)
,
Dec 7 2017
,
Dec 14 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/c3bff5937cbf29ecc6ea51f189446bce4e2cff9d

commit c3bff5937cbf29ecc6ea51f189446bce4e2cff9d
Author: Brian White <bcwhite@chromium.org>
Date: Thu Dec 14 17:57:52 2017

Track process only if launch is successful.

If the process launch fails it ends up trying to track PID #0 which causes a DCHECK.

Bug: 757946
Change-Id: If4558b1707aad237273127365c5b5be64d32f327
Reviewed-on: https://chromium-review.googlesource.com/826085
Reviewed-by: Will Harris <wfh@chromium.org>
Reviewed-by: Mark Mentovai <mark@chromium.org>
Commit-Queue: Will Harris <wfh@chromium.org>
Commit-Queue: Brian White <bcwhite@chromium.org>
Cr-Commit-Position: refs/heads/master@{#524111}

[modify] https://crrev.com/c3bff5937cbf29ecc6ea51f189446bce4e2cff9d/services/service_manager/sandbox/win/sandbox_win.cc
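The guard that commit message describes can be sketched roughly as follows. `LaunchResult`, `PidTracker` and `TrackIfLaunched` are hypothetical names, and the assumption that a failed launch reports PID 0 is taken from the commit message above; this is not the code from sandbox_win.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical launch result; a failed launch is assumed to carry pid 0.
struct LaunchResult {
  bool ok;
  int64_t pid;
};

// Hypothetical tracker that, like the snippet in the issue description,
// refuses PID 0.
class PidTracker {
 public:
  void RecordLaunch(int64_t pid) {
    assert(pid != 0);  // Mirrors the DCHECK_NE(0, pid) in the description.
    pids_.insert(pid);
  }

  std::size_t size() const { return pids_.size(); }

 private:
  std::set<int64_t> pids_;
};

// Only hand the PID to the tracker when the launch actually succeeded.
inline void TrackIfLaunched(PidTracker& tracker, const LaunchResult& result) {
  if (result.ok && result.pid != 0)
    tracker.RecordLaunch(result.pid);
}
```

A failed launch then leaves the tracker untouched instead of tripping the PID 0 check.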
,
Dec 14 2017
This last fix was for a related but DIFFERENT crash. So far, no more of the "inconceivable" type. New query for monitoring crashes: https://crash.corp.google.com/browse?q=product.Version>%3D%2765.0.3295%27%20AND%20product.Version%20LIKE%20%27__.%25.%25.%25%27%20AND%20product.name%20CONTAINS%20%27Chrome%27%20AND%20custom_data.ChromeCrashProto.magic_signature_1.name%20CONTAINS%20%27RecordProcessLaunch%27&sql_dialect=dremelsql&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D |
Comment 1 by siggi@chromium.org
, Aug 22 2017