New issue
Advanced search Search tips

Issue 913153 link

Starred by 4 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug

Blocked on:
issue 914110

Blocking:
issue 909975



Sign in to add a comment

chrome_all_tast_tests fails due to disk/file issues in VM

Project Member Reported by derat@chromium.org, Dec 8

Issue description

Per https://crbug.com/909736#c23: In the top failure at https://findit-for-me.appspot.com/flake/occurrences?key=ag9zfmZpbmRpdC1mb3ItbWVyPwsSBUZsYWtlIjRjaHJvbWl1bUBjaHJvbWVfYWxsX3Rhc3RfdGVzdHNAY2hyb21lX2FsbF90YXN0X3Rlc3RzDA, ui.SingleProcessMashLogin failed in https://ci.chromium.org/p/chromium/builders/luci.chromium.try/chromeos-amd64-generic-rel/147906:

2018/12/07 14:21:37 Removing cryptohome for testuser@gmail.com
...
2018/12/07 14:23:31 Error: [single_process_mash_login.go:26] Chrome login failed: OOBE not dismissed: browser process 6289 replaced by 6319; Chrome probably crashed

Looking at the corresponding Chrome log file, it looks like the VM ran out of space:

[6289:6340:1207/142203.052733:WARNING:important_file_writer.cc(97)] temp file failure: /home/chronos/Default/Preferences : error flushing: Input/output error (5)
[6289:6344:1207/142203.084488:ERROR:model_type_store_service_impl.cc(34)] Failed to initialize ModelTypeStore backend: Init@../../components/sync/model_impl/model_type_store_backend.cc:99: IO error: /home/chronos/u-f5e09967f7a158e1c1085cb51f7d3d8007c8088b/Sync Data/LevelDB/MANIFEST-000001: FILE_ERROR_NO_SPACE (ChromeMethodBFE: 3::WritableFileAppend::8)
[6289:6341:1207/142202.822680:FATAL:database.cc(1821)] database or disk is full
#0 0x5a574ee3acbf base::debug::StackTrace::StackTrace()
#1 0x5a574ed8f2bb logging::LogMessage::~LogMessage()
#2 0x5a5750282c41 sql::Database::OnSqliteError()
#3 0x5a575027fb16 sql::Database::Execute()
#4 0x5a5750285ded sql::MetaTable::Init()
#5 0x5a575139f866 password_manager::LoginDatabase::Init()
#6 0x5a5751384c55 password_manager::PasswordStoreDefault::InitOnBackgroundSequence()
#7 0x5a574eac2c28 _ZN4base8internal7InvokerINS0_9BindStateIMN24DeviceMediaAsyncFileUtil22MediaPathFilterWrapperEFbRKNS_8FilePathEEJ13scoped_refptrIS4_ES5_EEEFbvEE7RunOnceEPNS0_13BindStateBaseE
#8 0x5a574b09c9a0 base::internal::ReturnAsParamAdapter<>()
#9 0x5a574b09cb37 _ZN4base8internal7InvokerINS0_9BindStateIPFvNS_12OnceCallbackIFbvEEEPNSt3__110unique_ptrIbNS6_14default_deleteIbEEEEEJS5_SB_EEEFvvEE7RunOnceEPNS0_13BindStateBaseE
#10 0x5a574ee0961b base::(anonymous namespace)::PostTaskAndReplyRelay::RunTaskAndPostReply()
#11 0x5a574ee09a9b _ZN4base8internal7InvokerINS0_9BindStateIPFvNS_12_GLOBAL__N_121PostTaskAndReplyRelayEEJS4_EEEFvvEE7RunOnceEPNS0_13BindStateBaseE
#12 0x5a574ee60607 base::debug::TaskAnnotator::RunTask()
#13 0x5a574ee00105 base::internal::TaskTracker::RunOrSkipTask()
#14 0x5a574ee4d643 base::internal::TaskTrackerPosix::RunOrSkipTask()
#15 0x5a574edff40f base::internal::TaskTracker::RunAndPopNextTask()
#16 0x5a574ee6e54d base::internal::SchedulerWorker::RunWorker()
#17 0x5a574ee6e221 base::internal::SchedulerWorker::RunPooledWorker()
#18 0x5a574ee4e128 base::(anonymous namespace)::ThreadFunc()
#19 0x7896fce3b4fe <unknown>
#20 0x7896fc293bef clone
 
In https://ci.chromium.org/p/chromium/builders/luci.chromium.try/chromeos-amd64-generic-rel/147896, it looks like Chrome aborted due to another disk-related issue (so maybe the disk isn't really full):

[2893:2958:1207/140515.982658:FATAL:database.cc(1821)] database is locked
#0 0x560f8e0fe97f base::debug::StackTrace::StackTrace()
#1 0x560f8e052f7b logging::LogMessage::~LogMessage()
#2 0x560f8f546901 sql::Database::OnSqliteError()
#3 0x560f8f542334 sql::Database::OpenInternal()
#4 0x560f8f541e54 sql::Database::Open()
#5 0x560f906634e0 password_manager::LoginDatabase::Init()
#6 0x560f90648915 password_manager::PasswordStoreDefault::InitOnBackgroundSequence()
#7 0x560f8dd868e8 _ZN4base8internal7InvokerINS0_9BindStateIMN24DeviceMediaAsyncFileUtil22MediaPathFilterWrapperEFbRKNS_8FilePathEEJ13scoped_refptrIS4_ES5_EEEFbvEE7RunOnceEPNS0_13BindStateBaseE
#8 0x560f8a3609a0 base::internal::ReturnAsParamAdapter<>()
#9 0x560f8a360b37 _ZN4base8internal7InvokerINS0_9BindStateIPFvNS_12OnceCallbackIFbvEEEPNSt3__110unique_ptrIbNS6_14default_deleteIbEEEEEJS5_SB_EEEFvvEE7RunOnceEPNS0_13BindStateBaseE
#10 0x560f8e0cd2db base::(anonymous namespace)::PostTaskAndReplyRelay::RunTaskAndPostReply()
#11 0x560f8e0cd75b _ZN4base8internal7InvokerINS0_9BindStateIPFvNS_12_GLOBAL__N_121PostTaskAndReplyRelayEEJS4_EEEFvvEE7RunOnceEPNS0_13BindStateBaseE
#12 0x560f8e1242c7 base::debug::TaskAnnotator::RunTask()
#13 0x560f8e0c3dc5 base::internal::TaskTracker::RunOrSkipTask()
#14 0x560f8e111303 base::internal::TaskTrackerPosix::RunOrSkipTask()
#15 0x560f8e0c30cf base::internal::TaskTracker::RunAndPopNextTask()
#16 0x560f8e13220d base::internal::SchedulerWorker::RunWorker()
#17 0x560f8e131ee1 base::internal::SchedulerWorker::RunPooledWorker()
#18 0x560f8e111de8 base::(anonymous namespace)::ThreadFunc()
#19 0x7e814cd124fe <unknown>
#20 0x7e814c16abef clone
Continuing down the list, I see the same stack in https://ci.chromium.org/p/chromium/builders/luci.chromium.try/chromeos-amd64-generic-rel/147865:

[2880:2941:1207/133147.732815:FATAL:database.cc(1821)] database is locked
#0 0x597bf08160bf base::debug::StackTrace::StackTrace()
#1 0x597bf076a6bb logging::LogMessage::~LogMessage()
#2 0x597bf1c5df61 sql::Database::OnSqliteError()
#3 0x597bf1c59994 sql::Database::OpenInternal()
#4 0x597bf1c594b4 sql::Database::Open()
#5 0x597bf2d7a860 password_manager::LoginDatabase::Init()
#6 0x597bf2d5fc95 password_manager::PasswordStoreDefault::InitOnBackgroundSequence()
#7 0x597bf049e028 _ZN4base8internal7InvokerINS0_9BindStateIMN24DeviceMediaAsyncFileUtil22MediaPathFilterWrapperEFbRKNS_8FilePathEEJ13scoped_refptrIS4_ES5_EEEFbvEE7RunOnceEPNS0_13BindStateBaseE
#8 0x597beca789a0 base::internal::ReturnAsParamAdapter<>()
#9 0x597beca78b37 _ZN4base8internal7InvokerINS0_9BindStateIPFvNS_12OnceCallbackIFbvEEEPNSt3__110unique_ptrIbNS6_14default_deleteIbEEEEEJS5_SB_EEEFvvEE7RunOnceEPNS0_13BindStateBaseE
#10 0x597bf07e4a1b base::(anonymous namespace)::PostTaskAndReplyRelay::RunTaskAndPostReply()
#11 0x597bf07e4e9b _ZN4base8internal7InvokerINS0_9BindStateIPFvNS_12_GLOBAL__N_121PostTaskAndReplyRelayEEJS4_EEEFvvEE7RunOnceEPNS0_13BindStateBaseE
#12 0x597bf083ba07 base::debug::TaskAnnotator::RunTask()
#13 0x597bf07db505 base::internal::TaskTracker::RunOrSkipTask()
#14 0x597bf0828a43 base::internal::TaskTrackerPosix::RunOrSkipTask()
#15 0x597bf07da80f base::internal::TaskTracker::RunAndPopNextTask()
#16 0x597bf084994d base::internal::SchedulerWorker::RunWorker()
#17 0x597bf0849621 base::internal::SchedulerWorker::RunPooledWorker()
#18 0x597bf0829528 base::(anonymous namespace)::ThreadFunc()
#19 0x7889f486c4fe <unknown>
#20 0x7889f3cc4bef clone
And again in https://ci.chromium.org/p/chromium/builders/luci.chromium.try/chromeos-amd64-generic-rel/147869.

security.UserFilesGuest is the first test to run, and it looks like it's the only one that's failing in most of these runs:

2018/12/07 13:30:42 Waiting for DUT to be ready for testing
2018/12/07 13:30:44 Running arc.Boot
2018/12/07 13:30:44 Skipped arc.Boot for missing deps: [android]
2018/12/07 13:30:44 --------------------------------------------------------------------------------
2018/12/07 13:30:44 Running security.UserFilesGuest
2018/12/07 13:30:45 Restarting ui job
2018/12/07 13:30:49 Waiting for org.chromium.SessionManager D-Bus service
2018/12/07 13:30:49 Asking session_manager to enable Chrome testing
2018/12/07 13:30:49 Waiting for Chrome to write its debugging port to /home/chronos/DevToolsActivePort
2018/12/07 13:30:50 Removing cryptohome for $guest
2018/12/07 13:30:50 Finding OOBE DevTools target
2018/12/07 13:32:44 Error: [user_files_guest.go:26] Login failed: OOBE target not found: browser process 2915 exited; Chrome probably crashed
2018/12/07 13:32:44 Finished security.UserFilesGuest
2018/12/07 13:32:44 --------------------------------------------------------------------------------
2018/12/07 13:32:44 Running security.UserFilesLoggedIn
2018/12/07 13:32:44 Restarting ui job
2018/12/07 13:32:46 Waiting for org.chromium.SessionManager D-Bus service
2018/12/07 13:32:46 Asking session_manager to enable Chrome testing
2018/12/07 13:32:46 Waiting for Chrome to write its debugging port to /home/chronos/DevToolsActivePort
2018/12/07 13:32:47 Removing cryptohome for testuser@gmail.com
2018/12/07 13:32:47 Finding OOBE DevTools target
2018/12/07 13:32:48 Connecting to Chrome at ws://127.0.0.1:34773/devtools/page/AA7C116419F56E6EAE5E2AD0C35A3C9A
2018/12/07 13:32:48 Waiting for OOBE
2018/12/07 13:32:59 Logging in as user "testuser@gmail.com"
2018/12/07 13:33:01 Waiting for cryptohome for user "testuser@gmail.com"
2018/12/07 13:33:09 Waiting for OOBE to be dismissed
2018/12/07 13:33:18 Checking /home/chronos
2018/12/07 13:33:18 Examined 111 path(s) under /home/chronos
2018/12/07 13:33:18 Checking /home/user/b6f496913a1b13ddbbb92063906f919a26c972ad
2018/12/07 13:33:18 Examined 246 path(s) under /home/user/b6f496913a1b13ddbbb92063906f919a26c972ad
2018/12/07 13:33:18 Finished security.UserFilesLoggedIn
...

Is there any chance that an old Chrome process (or something else) has locked various DB files at the time when local_test_runner is being executed? I don't think I know anything about the VM environment that chrome_all_tast_tests uses...
Summary: chrome_all_tast_tests fails due to disk/file issues in VM (was: chrome_all_tast_tests fails due to running out of disk space in VM)
Would a chrome crash leave the DB files in locked state?

Does tast not delete the cryptohome between runs?
> Does tast not delete the cryptohome between runs?

It does -- see the "Removing cryptohome for ___" messages in the log snippet in #3. (But also, we're logging in as a guest for this test, so there already shouldn't be a recycled cryptohome, I'd think.)
We're currently not striping symbols from the chrome we deploy to the VM so that we can get full stack traces. We have just enough disk space for that in the VM, so it doesn't leave much headroom for tests that generate crash/dumps/logs/etc.

Assuming that increased disk usage in tests is here to stay, I uploaded https://chromium-review.googlesource.com/c/chromium/src/+/1369036 to start stripping them again, though that's more of a stop-gap IMO since a bunch of hex isn't very useful. We may want to consider increasing partition sizes in the VM afterwards.
Cc: -achuith@chromium.org bpastene@chromium.org
Owner: achuith@chromium.org
I'll look into increasing the VM disk size
Cc: tzik@chromium.org derat@chromium.org achuith@chromium.org shik@chromium.org keiichiw@chromium.org
 Issue 913356  has been merged into this issue.
Blocking: 909975
Project Member

Comment 11 by bugdroid1@chromium.org, Dec 10

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/7a1760fa9078626f62c18733b569bf298ccb871d

commit 7a1760fa9078626f62c18733b569bf298ccb871d
Author: Ben Pastene <bpastene@chromium.org>
Date: Mon Dec 10 18:07:58 2018

Strip symbols when deploying chrome to cros VMs during tast tests.

It's looking like we don't have enough disk space in the VM for this.
I'll look into increasing partition sizes before turning this back on.

TBR=dpranke

Bug: 913153
Change-Id: I7d0ac50ed45171f69d294596a0c35b3cd46e4a94
Reviewed-on: https://chromium-review.googlesource.com/c/1369036
Commit-Queue: Ben Pastene <bpastene@chromium.org>
Reviewed-by: Achuith Bhandarkar <achuith@chromium.org>
Cr-Commit-Position: refs/heads/master@{#615172}
[modify] https://crrev.com/7a1760fa9078626f62c18733b569bf298ccb871d/build/chromeos/run_vm_test.py

Flake rate is down after #11. Apparently stripping symbols from the browser gives enough room for the tests to run. Hopefully if/when we expand partition sizes in the VM, we can go back to not stripping.

That said, it appears that crash dumps (again?) are the culprit for why we were running into space issues in the first place. Running the full set of browser-related tast tests in a local VM generates about 200MB of crash dump files that look like:
/var/spool/crash/cros_camera_service.20181211.134027.2419.core

And it's all cros_camera_service files. Digging further, a new set of these crash files are generated every time chrome starts up. ie: running "restart ui" generates 40MB of cruft on the disk every invocation. 

I can confirm that running "stop cros-camera" prevents the dumps from getting generated. Perhaps we should run that preemptively before each test. Though I'm not sure what those dumps signify, or their nature.
Blockedon: 914110
#12: I couldn't find any existing bugs about the cros_camera_service crashes, so I filed  issue 914110 .

Sign in to add a comment