wizpig-paladin , winky-paladin failed: login_LoginSuccess: Timed out waiting for condition: Session stopped. |
||||||||||||||||
Issue descriptionFiled by sheriff-o-matic@appspot.gserviceaccount.com on behalf of tfiga@chromium.org wizpig-paladin:3846 failed login_LoginSuccess: retry_count: 2, ERROR: Timed out waiting for condition: Session stopped. Builders failed on: - wizpig-paladin: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8937626561988762208 Same failure on 3844 and 3845: https://luci-milo.appspot.com/buildbot/chromeos/wizpig-paladin/3844 https://luci-milo.appspot.com/buildbot/chromeos/wizpig-paladin/3845 08/20 12:00:16.622 INFO | cros_interface:0575| (Re)starting the ui (logs the user out) 08/20 12:00:16.624 DEBUG| global_hooks:0056| ['sh', '-c', 'systemctl'] 08/20 12:00:16.639 DEBUG| global_hooks:0056| ['sh', '-c', 'systemctl'] 08/20 12:00:16.653 DEBUG| global_hooks:0056| ['sh', '-c', 'status ui'] 08/20 12:00:16.672 DEBUG| cros_interface:0454| IsServiceRunning(ui)->True 08/20 12:00:16.673 DEBUG| cros_interface:0058| sh -c restart ui 08/20 12:00:16.673 DEBUG| global_hooks:0056| ['sh', '-c', 'restart ui'] 08/20 12:00:39.979 DEBUG| cros_interface:0067| > stdout=[ui start/running, process 22584 ], stderr=[] 08/20 12:00:39.980 DEBUG| cros_interface:0058| sh -c cryptohome-path user 'test@test.test' 08/20 12:00:39.980 DEBUG| global_hooks:0056| ['sh', '-c', "cryptohome-path user 'test@test.test'"] 08/20 12:00:40.016 DEBUG| cros_interface:0067| > stdout=[/home/user/580bcfa450e2f41f4333b0e8067f8fd279423bfa ], stderr=[] 08/20 12:00:40.018 DEBUG| cros_interface:0058| sh -c /bin/df --output=source,target /run/cryptohome/ephemeral_mount/580bcfa450e2f41f4333b0e8067f8fd279423bfa 08/20 12:00:40.018 DEBUG| global_hooks:0056| ['sh', '-c', '/bin/df --output=source,target /run/cryptohome/ephemeral_mount/580bcfa450e2f41f4333b0e8067f8fd279423bfa'] 08/20 12:00:40.039 DEBUG| cros_interface:0067| > stdout=[], stderr=[df: /run/cryptohome/ephemeral_mount/580bcfa450e2f41f4333b0e8067f8fd279423bfa: No such file or directory ] 08/20 12:00:40.040 DEBUG| cros_interface:0058| sh -c cryptohome-path user 'test@test.test' 08/20 12:00:40.040 DEBUG| global_hooks:0056| ['sh', '-c', "cryptohome-path user 'test@test.test'"] 08/20 12:00:40.058 DEBUG| cros_interface:0067| > stdout=[/home/user/580bcfa450e2f41f4333b0e8067f8fd279423bfa ], stderr=[] 08/20 12:00:40.059 DEBUG| cros_interface:0058| sh -c /bin/df --output=source,target /home/user/580bcfa450e2f41f4333b0e8067f8fd279423bfa 08/20 12:00:40.059 DEBUG| global_hooks:0056| ['sh', '-c', '/bin/df --output=source,target /home/user/580bcfa450e2f41f4333b0e8067f8fd279423bfa'] 08/20 12:00:40.080 DEBUG| cros_interface:0067| > stdout=[Filesystem Mounted on /dev/mmcblk0p1 /home ], stderr=[] 08/20 12:00:40.081 INFO | browser:0221| Browser is closed. 08/20 12:01:40.023 ERROR| utils:2767| Timed out waiting for condition: Session stopped. 08/20 12:01:40.034 DEBUG| test:0381| Test failed due to Timed out waiting for condition: Session stopped.. Exception log follows the after_iteration_hooks.
,
Aug 21
Same failure also happened on winky-paladin: https://luci-milo.appspot.com/buildbot/chromeos/winky-paladin/5891 https://luci-milo.appspot.com/buildbot/chromeos/winky-paladin/5892 https://luci-milo.appspot.com/buildbot/chromeos/winky-paladin/5893
,
Aug 21
I don't see how I am the right owner for this bug.
This should be handled initially by the chrome gardner or the sheriff to figure out which component of ChromeOS is at fault.
The initial report should at least contain the stack trace reported in the test logs:
Traceback (most recent call last):
File "/usr/local/autotest/common_lib/test.py", line 600, in _exec
_call_test_function(self.execute, *p_args, **p_dargs)
File "/usr/local/autotest/common_lib/test.py", line 800, in _call_test_function
return func(*args, **dargs)
File "/usr/local/autotest/common_lib/test.py", line 464, in execute
postprocess_profiled_run, args, dargs)
File "/usr/local/autotest/common_lib/test.py", line 371, in _call_run_once
self.run_once(*args, **dargs)
File "/usr/local/autotest/tests/login_LoginSuccess/login_LoginSuccess.py", line 58, in run_once
timeout=self._SESSION_STOP_TIMEOUT)
File "/usr/local/autotest/common_lib/cros/session_manager.py", line 55, in wait_for_signals
timeout=timeout)
File "/usr/local/autotest/common_lib/utils.py", line 2768, in poll_for_condition
raise TimeoutError(message=desc)
TimeoutError: Timed out waiting for condition: Session stopped.
Next step is to figure out from chrome / ui/ arc++ logs why the session was stopped on the DUT.
This is a ChromeOS bug.
,
Aug 22
+other sheriffs Prathmesh, I'm the sheriff this week and you're the person who modified this test last time. If you know a better way of finding the right people to investigate test failures, please let me know. (I missed achuith@ among the other, inactive authors of this test, though. Some tests don't even seem to have any active authors anymore.) Also, the trace is just a generic timeout trace from autotest and is not very useful, so I went with direct log from the test on the DUT. achuith@, among the 3 authors listed in control files, you're the only one still in Chrome OS. Do you think you could take a look at this?
,
Aug 22
I don't think it's a problem with the test. Looks like the session manager died? As noted above, the sheriff should handle this since it seems like a regression. Did you look at the logs to see what's happening? Is this a reliable failure that can be bisected?
,
Aug 22
Fixing regressions is not a part of sheriff's responsibilities... I can quickly look through the list of CLs, but I have completely no idea about session manager or this particular test, so if there isn't anything obvious, I can't do too much. The main responsibility of the sheriff is to catch failures, file bugs with any potentially useful details picked from the builders and delegate to appropriate owners. +derat@ and apronin@, since they seem to have made some progress on investigating another session manager issue 876197 .
,
Aug 22
https://storage.cloud.google.com/chromeos-autotest-results/229433642-chromeos-test/chromeos6-row2-rack19-host1/login_LoginSuccess/sysinfo/var/log_diff/messages I see messages like: 2018-08-20T18:15:14.469112+00:00 WARNING crash_reporter[7824]: [user] Received crash notification for chrome[7278] sig 6, user 1000 group 1000 (ignoring call by kernel - chrome crash; waiting for chrome to call us directly) 2018-08-20T18:15:14.472614+00:00 WARNING crash_reporter[7825]: Could not load the device policy file. 2018-08-20T18:15:14.472854+00:00 WARNING crash_reporter[7825]: [user] Received crash notification for chrome[7232] sig 6, user 1000 group 1000 (ignoring call by kernel - chrome crash; waiting for chrome to call us directly) I don't know if this is a problem with chrome crashing?
,
Aug 22
My usual approach is to delete flaky tests that nobody wants to own, or at least remove them from bvt-inline. I'm not familiar with this test (achuith@ is the lucky person left in its AUTHOR attribute... :-P), but it looks like it logs in, starts listening for SessionStateChanged D-Bus signals, and then logs out and hopes to see a "stopped" signal. Here's the failure from login_LoginSuccess.INFO in http://stainless/browse/chromeos-autotest-results/229425820-chromeos-test/: 08/20 11:10:58.623 INFO | browser:0208| Closing browser (pid=6498) ... 08/20 11:10:58.626 INFO | cros_interface:0575| (Re)starting the ui (logs the user out) 08/20 11:11:20.630 INFO | browser:0221| Browser is closed. 08/20 11:12:20.581 ERROR| utils:2767| Timed out waiting for condition: Session stopped. And here's what the messages log says: ... 2018-08-20T18:10:58.750307+00:00 INFO session_manager[6419]: [INFO:session_manager_impl.cc(388)] Emitting D-Bus signal SessionStateChanged:stopping 2018-08-20T18:10:58.752037+00:00 INFO session_manager[6419]: [INFO:browser_job.cc(167)] Terminating browser process 6498 with signal 15: exiting cleanly 2018-08-20T18:10:58.752616+00:00 INFO session_manager[6419]: [INFO:system_utils_impl.cc(94)] Sending 15 to 6498 as 1000 2018-08-20T18:10:58.752958+00:00 INFO session_manager[6419]: [INFO:system_utils_impl.cc(94)] Sending 15 to 6685 as 1000 2018-08-20T18:10:58.909715+00:00 INFO chrome[6498]: [6498:6498:0820/111058.909620:INFO:remote_commands_invalidator.cc(45)] Shutdown RemoteCommandsInvalidator. 2018-08-20T18:10:58.915966+00:00 INFO chrome[6498]: [6498:6498:0820/111058.915896:INFO:remote_commands_invalidator.cc(68)] Stopping RemoteCommandsInvalidator. 2018-08-20T18:10:58.916179+00:00 INFO chrome[6498]: [6498:6498:0820/111058.916142:INFO:remote_commands_invalidator.cc(167)] Unregister RemoteCommandsInvalidator. 2018-08-20T18:11:18.771862+00:00 WARNING kernel: [ 1386.003086] init: ui main process (6419) killed by KILL signal So it looks like the "stopping" state is sent, but then session_manager receives SIGKILL before it emits "stopped". And SIGKILL comes 20 seconds after Chrome gets SIGTERM, which seems like way too long. void SessionManagerService::CleanupChildren(base::TimeDelta timeout, ExitCode code) { std::string reason = ExitCodeToString(code); browser_->Kill(SIGTERM, reason); key_gen_.RequestJobExit(reason); android_container_->RequestJobExit( code == ExitCode::SUCCESS ? ArcContainerStopReason::SESSION_MANAGER_SHUTDOWN : ArcContainerStopReason::BROWSER_SHUTDOWN); browser_->WaitAndAbort(timeout); key_gen_.EnsureJobExit(timeout); } The "Terminating browser process 6498 with signal 15" comes from browser_->Kill, so we at least made it that far.
,
Aug 22
So, same thing as https://crbug.com/876197#c4 ? Occasionally chrome/session_manager takes too long to properly shutdown, and is force-killed instead. Sounds like an actual issue (not sure how important, though), which that test helped to uncover. The waiting-for-stopped logic also looked sound after a quick glance (it first starts to wait and then asks to shutdown the session, so no race). Maybe don't delete the test yet? :)
,
Aug 22
Here's the code that calls CleanupChildren:
void SessionManagerService::SetExitAndScheduleShutdown(ExitCode code) {
shutting_down_ = true;
exit_code_ = code;
impl_->AnnounceSessionStoppingIfNeeded();
child_exit_dispatcher_.reset();
liveness_checker_->Stop();
CleanupChildren(GetKillTimeout(), code);
impl_->AnnounceSessionStopped();
brillo::MessageLoop::current()->PostTask(
FROM_HERE, base::Bind(&brillo::MessageLoop::BreakLoop,
base::Unretained(brillo::MessageLoop::current())));
LOG(INFO) << "SessionManagerService quitting run loop";
}
AnnounceSessionStopped logs a "Emitting D-Bus signal SessionStateChanged:stopped" message that we didn't see here (and it not running is the reason for the test failure), so it looks like session_manager got stuck in CleanupChildren. Looking through CleanupChildren:
- BrowserJob::Kill doesn't block.
- KeyGenerator::RequestJobExit doesn't block.
- AndroidOciWrapper::RequestJobExit runs "/usr/bin/run_oci --signal=KILL kill <id>" synchronously. From the logs, I don't think that ARC was even running here, so AndroidOciWrapper::RequestJobExit seems unlikely to be the culprit.
- BrowserJob::WaitAndAbort at least attempts to block for no more than 3 seconds. If it returned, it would've logged either "Cleaned up browser process" or "Aborting browser process", so I think it's safe to say we didn't get past here.
Here's WaitAndAbort:
void BrowserJob::WaitAndAbort(base::TimeDelta timeout) {
const pid_t pid = subprocess_->GetPid();
if (pid < 0)
return;
if (!system_->ProcessGroupIsGone(pid, timeout)) {
LOG(WARNING) << "Aborting browser process " << pid << "'s process group "
<< timeout.InSeconds() << " seconds after sending signal";
std::string message = base::StringPrintf("Browser took more than %" PRId64
" seconds to exit after signal.",
timeout.InSeconds());
KillEverything(SIGABRT, message);
} else {
DLOG(INFO) << "Cleaned up browser process " << pid;
}
}
SystemUtilsImpl::ProcessGroupIsGone seems like the likely culprit since it's the only blocking operation before the log messages that we didn't see. This method is tricky and has had at least one race in the past ( issue 754076 ).
This comment is long enough, so I'll spend some time staring at the code.
,
Aug 23
Seems to be happening on other boards too. wolf-paladin: https://luci-milo.appspot.com/buildbot/chromeos/wolf-paladin/18609 cave-paladin: https://luci-milo.appspot.com/buildbot/chromeos/cave-paladin/4527 link-paladin: https://luci-milo.appspot.com/buildbot/chromeos/link-paladin/32599
,
Aug 23
I uploaded https://crrev.com/c/1186107 to make session_manager not call waitpid() so quickly and to improve its logging. I couldn't find any bugs in ProcessGroupIsGone that would result in it hanging, though. Additional eyes on the code might help. And note that in addition to session_manager apparently hanging instead of aborting Chrome, it may be the case that Chrome didn't exit when it received SIGTERM.
,
Aug 23
Looking at http://stainless/browse/chromeos-image-archive/betty-paladin/R70-10993.0.0-rc3/vm_test_results_1/smoke/test_harness/all/SimpleTestVerify/1_autotest_tests/results-33-login_LoginSuccess/ (the betty-paladin run from #12), the timestamps in the messages log that Autotest collected don't even line up with the test failure: 08/22 23:35:18.537 INFO | cros_interface:0575| (Re)starting the ui (logs the user out) 08/22 23:35:41.166 INFO | browser:0221| Browser is closed. 08/22 23:36:41.166 ERROR| utils:2767| Timed out waiting for condition: Session stopped. Whereas messages goes from 2018-08-23T03:52:05.756243+00:00 to 2018-08-23T04:03:32.051352+00:00. Hard to say for sure that it's the same cause.
,
Aug 29
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/4ce3b598d951eb4082eade187e9704e402afc4da commit 4ce3b598d951eb4082eade187e9704e402afc4da Author: Daniel Erat <derat@chromium.org> Date: Wed Aug 29 18:15:25 2018 login: Sleep for 10 ms between waitpid() calls. Make session_manager's SystemUtilsImpl::Wait method sleep for 10 milliseconds between waitpid calls instead of calling base::PlatformThread::YieldCurrentThread. Using YieldCurrentThread can result in hundreds of thousands of waitpid calls while Chrome is exiting, which seems unnecessary at best. Also make logging clearer to help in debugging hang-during-signout issues: log a new "Waiting up to 3 seconds for ___'s process group to exit" message just before waiting for Chrome to exit, and remove a "Will wait 3s for graceful browser exit." message that was confusingly logged at startup. BUG= chromium:876196 TEST=manual: normal chrome-initiated signout still works; also verified that session_manager still sends SIGABRT chrome when i send SIGSTOP to chrome and then make a StopSession d-bus method call to session_manager Change-Id: I5d8a8700032c6fe8deae613615f7d4a54b3a458f Reviewed-on: https://chromium-review.googlesource.com/1186107 Commit-Ready: Dan Erat <derat@chromium.org> Tested-by: Dan Erat <derat@chromium.org> Reviewed-by: Yusuke Sato <yusukes@chromium.org> [modify] https://crrev.com/4ce3b598d951eb4082eade187e9704e402afc4da/login_manager/session_manager_main.cc [modify] https://crrev.com/4ce3b598d951eb4082eade187e9704e402afc4da/login_manager/system_utils_impl.cc [modify] https://crrev.com/4ce3b598d951eb4082eade187e9704e402afc4da/login_manager/system_utils.h [modify] https://crrev.com/4ce3b598d951eb4082eade187e9704e402afc4da/login_manager/browser_job.cc
,
Sep 3
I don't see any of these failures on -release builders after R70-10993.0.0-rc4 on 2018-08-23: https://stainless.corp.google.com/search?view=matrix&row=model&col=build&first_date=2018-08-21&last_date=2018-09-03&builder_name=-paladin%24&test=^login_LoginSuccess%24&status=GOOD&status=WARN&status=FAIL&status=ERROR&exclude_cts=false&exclude_not_run=false&exclude_non_release=false&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=true #15 didn't go in until 08-29, so I'm not sure that it could've been what helped. These failures look like they were happening frequently before, though, and there haven't been any in the last 10 days, so I'm closing this out.
,
Sep 13
We're having a rash of these again in the CQ right now.
,
Sep 13
Can you link to some of the failures here?
,
Sep 13
,
Sep 13
I may have missed some, but recent builds from wolf, winky, and wizpig (is this a w* issue?!) are failing. Most recent failures wolf: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8935493585021104320 winky: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8935493586137712544 wizpig: https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8935493585566160640
,
Sep 13
auron_yuna and cyan show the same *pattern* of failing builds, but a different failure: cheets_Midis: retry_count: 2, FAIL: midis didn't restart.
,
Sep 13
Thanks. The logs for wolf appear to be missing, but here's what I see in the winky logs at http://stainless/browse/chromeos-autotest-results/237499877-chromeos-test/ : 09/13 09:11:22.821 INFO | browser:0208| Closing browser (pid=24902) ... 09/13 09:11:22.839 INFO | cros_interface:0575| (Re)starting the ui (logs the user out) 09/13 09:11:44.791 INFO | browser:0221| Browser is closed. 09/13 09:12:44.745 ERROR| utils:2769| Timed out waiting for condition: Session stopped. 2018-09-13T16:11:22.993407+00:00 INFO session_manager[24839]: [INFO:session_manager_impl.cc(383)] Emitting D-Bus signal SessionStateChanged:stopping 2018-09-13T16:11:22.994715+00:00 INFO session_manager[24839]: [INFO:browser_job.cc(167)] Terminating browser process 24902 with signal 15: exiting cleanly 2018-09-13T16:11:22.994904+00:00 INFO session_manager[24839]: [INFO:system_utils_impl.cc(94)] Sending 15 to 24902 as 1000 2018-09-13T16:11:22.998046+00:00 INFO session_manager[24839]: [INFO:system_utils_impl.cc(94)] Sending 15 to 25105 as 1000 2018-09-13T16:11:22.998254+00:00 INFO session_manager[24839]: [INFO:browser_job.cc(177)] Waiting up to 60 seconds for 24902's process group to exit 2018-09-13T16:11:23.207940+00:00 INFO chrome[24902]: [24902:24902:0913/091123.207861:INFO:remote_commands_invalidator.cc(45)] Shutdown RemoteCommandsInvalidator. 2018-09-13T16:11:23.208054+00:00 INFO chrome[24902]: [24902:24902:0913/091123.208018:INFO:remote_commands_invalidator.cc(68)] Stopping RemoteCommandsInvalidator. 2018-09-13T16:11:23.208120+00:00 INFO chrome[24902]: [24902:24902:0913/091123.208085:INFO:remote_commands_invalidator.cc(167)] Unregister RemoteCommandsInvalidator. 2018-09-13T16:11:23.509709+00:00 WARNING cryptohomed[1266]: RemoveKeyset: key to remove not found 2018-09-13T16:11:43.021102+00:00 WARNING kernel: [ 1658.399765] init: ui main process (24839) killed by KILL signal So if I'm reading this correctly: a) The test waits a minute for the SessionStateChanged "stopped" D-Bus signal. b) session_manager gives Chrome's process group a minute to exit. c) init sends KILL to session_manager 20 seconds after it's been instructed to stop. The delay for b) looks like it's coming from session_manager_service.cc: // TODO(mkrebs): Remove CollectChrome timeout and file when // crosbug.com/5872 is fixed. // When crash-reporter based crash reporting of Chrome is enabled // (which should only be during test runs) we use // kKillTimeoutCollectChrome instead of the kill timeout specified at // the command line. const int SessionManagerService::kKillTimeoutCollectChrome = 60; const char SessionManagerService::kCollectChromeFile[] = "/mnt/stateful_partition/etc/collect_chrome_crashes"; ... base::TimeDelta SessionManagerService::GetKillTimeout() { if (base::PathExists(base::FilePath(kCollectChromeFile))) return base::TimeDelta::FromSeconds(kKillTimeoutCollectChrome); else return kill_timeout_; } --- https://crbug.com/187808 (the bug mentioned in that TODO) looks like it's been closed since 2013, so maybe that code can be deleted now so that session_manager uses a reasonable delay before killing Chrome?
,
Sep 13
Deleting old buggy code is my favorite thing, so I'll take this.
,
Sep 13
The midis restart failure looks plausibly related: https://chromium.googlesource.com/chromiumos/third_party/autotest/+/master/client/site_tests/cheets_Midis/cheets_Midis.py?pli=1#70 derat@: Can you take a look at this as well? I can imagine removing the timeout would help here too.
,
Sep 13
I will happily hope that fixing the session_manager timeout will fix other timeout-related failures as well, but if it doesn't, someone who knows something about midis is going to need to investigate that one. :-)
,
Sep 13
Code spelunking: the one-minute delay in session_manager dates back to 2011, in https://crrev.com/c/5289. It was tracked by https://crbug.com/201001 . I haven't connected all of the dots here yet, but it seems like the one-minute delay was initially meant to be used in cases where crash-reporter is responsible for reporting Chrome crashes. //src/third_party/autotest/files/client/common_lib/cros/chrome.py still touches collect_chrome_crashes (so that core files will be generated when Chrome crashes, I think). Was the delay added because Chrome needs more time to write those files? It doesn't seem useful for session_manager to wait a minute for Chrome if session_manager is going to be killed after 20 seconds. Unless anyone tells me not to, I'm going to reduce SessionManagerService::kKillTimeoutCollectChrome to 12 seconds so it'll complete before session_manager is killed. I'm choosing 12 to match kKillTimeoutLongSeconds in session_manager_main.cc, which we use as the timeout on systems with rotating disks (instead of the default, 3 seconds). I'll reiterate that the core problem is probably that Chrome isn't exiting properly.
,
Sep 13
when collect_chrome_crashes is active, then yes, coredumps will be collected & saved for Chrome (vs just being read in memory & discarded) which can take sometime to read+write to disk. the exact mechanics don't matter that much (who exactly is doing the collection) as the files will be large regardless. if we're seeing early timeouts on bots, then imo that just says the timeouts are too low here. make them like 5min when testing. or we turn off the coredump collection ... i'm dubious how useful these actually are to test systems the majority of the time beyond the existing minidump crash reports.
,
Sep 13
Chrome doesn't appear to have crashed in the logs that I quoted above, so I'm not convinced that these delays should be increased or that disabling core dump collection would help. I think the important thing here is that session_manager kills Chrome before Upstart kills session_manager, and it's easier to tweak the delay on the session_manager side than on the Upstart side.
,
Sep 13
maybe we could improve things whereby crash-reporter would create a flag file when it's actively collecting a crash, and session manager (in general) could look for that in its shutdown phase. if it sees the file, it would increase its gracetime significantly. maybe we create /run/crash_reporter/collection-inprogress/ and session manager can simply check if the dir is missing/empty.
,
Sep 13
> I'll reiterate that the core problem is probably that Chrome isn't exiting properly. derat@: Are you working on the underlying issue as well?
,
Sep 13
,
Sep 13
Nope, and I don't even know if there's information that could be used to investigate that right now. The Chrome log in the results dir I linked to above just ends with this: [24902:24902:0913/091123.207861:INFO:remote_commands_invalidator.cc(45)] Shutdown RemoteCommandsInvalidator. [24902:24902:0913/091123.208018:INFO:remote_commands_invalidator.cc(68)] Stopping RemoteCommandsInvalidator. [24902:24902:0913/091123.208085:INFO:remote_commands_invalidator.cc(167)] Unregister RemoteCommandsInvalidator. We might get a Chrome crash that someone can look at after I reduce the timeout, but if so, please point the gardener at it.
,
Sep 14
Jumping in as on-call today, crbug.com/883875 (blocked by this bug) is blocking CQ. I've marked it P0. Any updates on this bug?
,
Sep 14
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform2/+/11f7b9712f6807e31087d10e7da2e29287385f3e commit 11f7b9712f6807e31087d10e7da2e29287385f3e Author: Daniel Erat <derat@chromium.org> Date: Fri Sep 14 19:08:47 2018 login: Reduce collect_chrome_crashes timeout to 12 seconds. Change the time that session_manager gives Chrome to exit when /mnt/stateful_partition/etc/collect_chrome_crashes is set to be 12 seconds rather than a minute. The ui Upstart job specifies that session_manager should be sent SIGKILL after 20 seconds, and the longer delay in session_manager appears to be resulting in some Autotest tests failing when session_manager is killed before it can emit a SessionStateChanged "stopped" D-Bus signal while waiting for a hanging Chrome process. BUG= chromium:876196 TEST=none Change-Id: I1239cd41e381a926f97adcfb088084ef788959ef Reviewed-on: https://chromium-review.googlesource.com/1224644 Commit-Ready: Dan Erat <derat@chromium.org> Tested-by: Dan Erat <derat@chromium.org> Reviewed-by: Jorge Lucangeli Obes <jorgelo@chromium.org> Reviewed-by: Mike Frysinger <vapier@chromium.org> [modify] https://crrev.com/11f7b9712f6807e31087d10e7da2e29287385f3e/login_manager/session_manager_service.cc [modify] https://crrev.com/11f7b9712f6807e31087d10e7da2e29287385f3e/login_manager/session_manager_service.h
,
Sep 17
No failures on -paladin since R71-11062.0.0-rc1, as far as I can see: https://stainless.corp.google.com/search?view=matrix&row=model&col=build&builder_name=-paladin%24&test=%5Elogin_LoginSuccess%24&status=GOOD&status=WARN&status=FAIL&status=ERROR&exclude_cts=false&exclude_not_run=true&exclude_non_release=false&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false&days=7 The only failures on -release since Sept 11 were due to guado_moblab Python import bugginess, not this error: https://stainless.corp.google.com/search?view=matrix&row=model&col=build&builder_name=-release%24&test=%5Elogin_LoginSuccess%24&status=GOOD&status=WARN&status=FAIL&status=ERROR&exclude_cts=false&exclude_not_run=true&exclude_non_release=false&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false&days=7 |
||||||||||||||||
►
Sign in to add a comment |
||||||||||||||||
Comment 1 by tfiga@chromium.org
, Aug 21Labels: OS-Chrome
Owner: pprabhu@chromium.org