chrome_all_tast_tests fails due to disk/file issues in VM |
||||||
Issue descriptionPer https://crbug.com/909736#c23: In the top failure at https://findit-for-me.appspot.com/flake/occurrences?key=ag9zfmZpbmRpdC1mb3ItbWVyPwsSBUZsYWtlIjRjaHJvbWl1bUBjaHJvbWVfYWxsX3Rhc3RfdGVzdHNAY2hyb21lX2FsbF90YXN0X3Rlc3RzDA, ui.SingleProcessMashLogin failed in https://ci.chromium.org/p/chromium/builders/luci.chromium.try/chromeos-amd64-generic-rel/147906: 2018/12/07 14:21:37 Removing cryptohome for testuser@gmail.com ... 2018/12/07 14:23:31 Error: [single_process_mash_login.go:26] Chrome login failed: OOBE not dismissed: browser process 6289 replaced by 6319; Chrome probably crashed Looking at the corresponding Chrome log file, it looks like the VM ran out of space: [6289:6340:1207/142203.052733:WARNING:important_file_writer.cc(97)] temp file failure: /home/chronos/Default/Preferences : error flushing: Input/output error (5) [6289:6344:1207/142203.084488:ERROR:model_type_store_service_impl.cc(34)] Failed to initialize ModelTypeStore backend: Init@../../components/sync/model_impl/model_type_store_backend.cc:99: IO error: /home/chronos/u-f5e09967f7a158e1c1085cb51f7d3d8007c8088b/Sync Data/LevelDB/MANIFEST-000001: FILE_ERROR_NO_SPACE (ChromeMethodBFE: 3::WritableFileAppend::8) [6289:6341:1207/142202.822680:FATAL:database.cc(1821)] database or disk is full #0 0x5a574ee3acbf base::debug::StackTrace::StackTrace() #1 0x5a574ed8f2bb logging::LogMessage::~LogMessage() #2 0x5a5750282c41 sql::Database::OnSqliteError() #3 0x5a575027fb16 sql::Database::Execute() #4 0x5a5750285ded sql::MetaTable::Init() #5 0x5a575139f866 password_manager::LoginDatabase::Init() #6 0x5a5751384c55 password_manager::PasswordStoreDefault::InitOnBackgroundSequence() #7 0x5a574eac2c28 _ZN4base8internal7InvokerINS0_9BindStateIMN24DeviceMediaAsyncFileUtil22MediaPathFilterWrapperEFbRKNS_8FilePathEEJ13scoped_refptrIS4_ES5_EEEFbvEE7RunOnceEPNS0_13BindStateBaseE #8 0x5a574b09c9a0 base::internal::ReturnAsParamAdapter<>() #9 0x5a574b09cb37 _ZN4base8internal7InvokerINS0_9BindStateIPFvNS_12OnceCallbackIFbvEEEPNSt3__110unique_ptrIbNS6_14default_deleteIbEEEEEJS5_SB_EEEFvvEE7RunOnceEPNS0_13BindStateBaseE #10 0x5a574ee0961b base::(anonymous namespace)::PostTaskAndReplyRelay::RunTaskAndPostReply() #11 0x5a574ee09a9b _ZN4base8internal7InvokerINS0_9BindStateIPFvNS_12_GLOBAL__N_121PostTaskAndReplyRelayEEJS4_EEEFvvEE7RunOnceEPNS0_13BindStateBaseE #12 0x5a574ee60607 base::debug::TaskAnnotator::RunTask() #13 0x5a574ee00105 base::internal::TaskTracker::RunOrSkipTask() #14 0x5a574ee4d643 base::internal::TaskTrackerPosix::RunOrSkipTask() #15 0x5a574edff40f base::internal::TaskTracker::RunAndPopNextTask() #16 0x5a574ee6e54d base::internal::SchedulerWorker::RunWorker() #17 0x5a574ee6e221 base::internal::SchedulerWorker::RunPooledWorker() #18 0x5a574ee4e128 base::(anonymous namespace)::ThreadFunc() #19 0x7896fce3b4fe <unknown> #20 0x7896fc293bef clone
,
Dec 8
Continuing down the list, I see the same stack in https://ci.chromium.org/p/chromium/builders/luci.chromium.try/chromeos-amd64-generic-rel/147865: [2880:2941:1207/133147.732815:FATAL:database.cc(1821)] database is locked #0 0x597bf08160bf base::debug::StackTrace::StackTrace() #1 0x597bf076a6bb logging::LogMessage::~LogMessage() #2 0x597bf1c5df61 sql::Database::OnSqliteError() #3 0x597bf1c59994 sql::Database::OpenInternal() #4 0x597bf1c594b4 sql::Database::Open() #5 0x597bf2d7a860 password_manager::LoginDatabase::Init() #6 0x597bf2d5fc95 password_manager::PasswordStoreDefault::InitOnBackgroundSequence() #7 0x597bf049e028 _ZN4base8internal7InvokerINS0_9BindStateIMN24DeviceMediaAsyncFileUtil22MediaPathFilterWrapperEFbRKNS_8FilePathEEJ13scoped_refptrIS4_ES5_EEEFbvEE7RunOnceEPNS0_13BindStateBaseE #8 0x597beca789a0 base::internal::ReturnAsParamAdapter<>() #9 0x597beca78b37 _ZN4base8internal7InvokerINS0_9BindStateIPFvNS_12OnceCallbackIFbvEEEPNSt3__110unique_ptrIbNS6_14default_deleteIbEEEEEJS5_SB_EEEFvvEE7RunOnceEPNS0_13BindStateBaseE #10 0x597bf07e4a1b base::(anonymous namespace)::PostTaskAndReplyRelay::RunTaskAndPostReply() #11 0x597bf07e4e9b _ZN4base8internal7InvokerINS0_9BindStateIPFvNS_12_GLOBAL__N_121PostTaskAndReplyRelayEEJS4_EEEFvvEE7RunOnceEPNS0_13BindStateBaseE #12 0x597bf083ba07 base::debug::TaskAnnotator::RunTask() #13 0x597bf07db505 base::internal::TaskTracker::RunOrSkipTask() #14 0x597bf0828a43 base::internal::TaskTrackerPosix::RunOrSkipTask() #15 0x597bf07da80f base::internal::TaskTracker::RunAndPopNextTask() #16 0x597bf084994d base::internal::SchedulerWorker::RunWorker() #17 0x597bf0849621 base::internal::SchedulerWorker::RunPooledWorker() #18 0x597bf0829528 base::(anonymous namespace)::ThreadFunc() #19 0x7889f486c4fe <unknown> #20 0x7889f3cc4bef clone
,
Dec 8
And again in https://ci.chromium.org/p/chromium/builders/luci.chromium.try/chromeos-amd64-generic-rel/147869. security.UserFilesGuest is the first test to run, and it looks like it's the only one that's failing in most of these runs: 2018/12/07 13:30:42 Waiting for DUT to be ready for testing 2018/12/07 13:30:44 Running arc.Boot 2018/12/07 13:30:44 Skipped arc.Boot for missing deps: [android] 2018/12/07 13:30:44 -------------------------------------------------------------------------------- 2018/12/07 13:30:44 Running security.UserFilesGuest 2018/12/07 13:30:45 Restarting ui job 2018/12/07 13:30:49 Waiting for org.chromium.SessionManager D-Bus service 2018/12/07 13:30:49 Asking session_manager to enable Chrome testing 2018/12/07 13:30:49 Waiting for Chrome to write its debugging port to /home/chronos/DevToolsActivePort 2018/12/07 13:30:50 Removing cryptohome for $guest 2018/12/07 13:30:50 Finding OOBE DevTools target 2018/12/07 13:32:44 Error: [user_files_guest.go:26] Login failed: OOBE target not found: browser process 2915 exited; Chrome probably crashed 2018/12/07 13:32:44 Finished security.UserFilesGuest 2018/12/07 13:32:44 -------------------------------------------------------------------------------- 2018/12/07 13:32:44 Running security.UserFilesLoggedIn 2018/12/07 13:32:44 Restarting ui job 2018/12/07 13:32:46 Waiting for org.chromium.SessionManager D-Bus service 2018/12/07 13:32:46 Asking session_manager to enable Chrome testing 2018/12/07 13:32:46 Waiting for Chrome to write its debugging port to /home/chronos/DevToolsActivePort 2018/12/07 13:32:47 Removing cryptohome for testuser@gmail.com 2018/12/07 13:32:47 Finding OOBE DevTools target 2018/12/07 13:32:48 Connecting to Chrome at ws://127.0.0.1:34773/devtools/page/AA7C116419F56E6EAE5E2AD0C35A3C9A 2018/12/07 13:32:48 Waiting for OOBE 2018/12/07 13:32:59 Logging in as user "testuser@gmail.com" 2018/12/07 13:33:01 Waiting for cryptohome for user "testuser@gmail.com" 2018/12/07 13:33:09 Waiting for OOBE to be dismissed 2018/12/07 13:33:18 Checking /home/chronos 2018/12/07 13:33:18 Examined 111 path(s) under /home/chronos 2018/12/07 13:33:18 Checking /home/user/b6f496913a1b13ddbbb92063906f919a26c972ad 2018/12/07 13:33:18 Examined 246 path(s) under /home/user/b6f496913a1b13ddbbb92063906f919a26c972ad 2018/12/07 13:33:18 Finished security.UserFilesLoggedIn ... Is there any chance that an old Chrome process (or something else) has locked various DB files at the time when local_test_runner is being executed? I don't think I know anything about the VM environment that chrome_all_tast_tests uses...
,
Dec 8
,
Dec 8
Would a chrome crash leave the DB files in locked state? Does tast not delete the cryptohome between runs?
,
Dec 8
> Does tast not delete the cryptohome between runs? It does -- see the "Removing cryptohome for ___" messages in the log snippet in #3. (But also, we're logging in as a guest for this test, so there already shouldn't be a recycled cryptohome, I'd think.)
,
Dec 8
We're currently not striping symbols from the chrome we deploy to the VM so that we can get full stack traces. We have just enough disk space for that in the VM, so it doesn't leave much headroom for tests that generate crash/dumps/logs/etc. Assuming that increased disk usage in tests is here to stay, I uploaded https://chromium-review.googlesource.com/c/chromium/src/+/1369036 to start stripping them again, though that's more of a stop-gap IMO since a bunch of hex isn't very useful. We may want to consider increasing partition sizes in the VM afterwards.
,
Dec 8
I'll look into increasing the VM disk size
,
Dec 10
Issue 913356 has been merged into this issue.
,
Dec 10
,
Dec 10
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/7a1760fa9078626f62c18733b569bf298ccb871d commit 7a1760fa9078626f62c18733b569bf298ccb871d Author: Ben Pastene <bpastene@chromium.org> Date: Mon Dec 10 18:07:58 2018 Strip symbols when deploying chrome to cros VMs during tast tests. It's looking like we don't have enough disk space in the VM for this. I'll look into increasing partition sizes before turning this back on. TBR=dpranke Bug: 913153 Change-Id: I7d0ac50ed45171f69d294596a0c35b3cd46e4a94 Reviewed-on: https://chromium-review.googlesource.com/c/1369036 Commit-Queue: Ben Pastene <bpastene@chromium.org> Reviewed-by: Achuith Bhandarkar <achuith@chromium.org> Cr-Commit-Position: refs/heads/master@{#615172} [modify] https://crrev.com/7a1760fa9078626f62c18733b569bf298ccb871d/build/chromeos/run_vm_test.py
,
Dec 11
Flake rate is down after #11. Apparently stripping symbols from the browser gives enough room for the tests to run. Hopefully if/when we expand partition sizes in the VM, we can go back to not stripping. That said, it appears that crash dumps (again?) are the culprit for why we were running into space issues in the first place. Running the full set of browser-related tast tests in a local VM generates about 200MB of crash dump files that look like: /var/spool/crash/cros_camera_service.20181211.134027.2419.core And it's all cros_camera_service files. Digging further, a new set of these crash files are generated every time chrome starts up. ie: running "restart ui" generates 40MB of cruft on the disk every invocation. I can confirm that running "stop cros-camera" prevents the dumps from getting generated. Perhaps we should run that preemptively before each test. Though I'm not sure what those dumps signify, or their nature.
,
Dec 11
#12: I couldn't find any existing bugs about the cros_camera_service crashes, so I filed issue 914110 . |
||||||
►
Sign in to add a comment |
||||||
Comment 1 by derat@chromium.org
, Dec 8