tast.ui.MashLogin failures due to mangled command line missing --mash-service-name |
||||||||
Issue descriptiontast.ui.MashLogin uses the command line to detect if a mash service process (e.g. the ash+ws process) is running. Sometimes this fails: https://stainless.corp.google.com/browse/chromeos-autotest-results/244127085-chromeos-test/ No chrome process containing "mash-service-name" among... The problem is the command line is mangled. See below, it only contains "--mash-s" instead of "--mash-service-name". chronos 16285 52.7 7.5 587424 295504 ? Rl 19:15 0:05 \_ /opt/google/chrome/chrome --type=utility --field-trial-handle=5055908869115835639,2119901972152640347,131072 --enable-features=Crostini,ExperimentalCrostiniUI,Mash,Pepper3DImageChromium,TouchOptimizedUi --lang=en-US --no-sandbox --enable-logging --log-level=1 --use-gl=egl --vmodule=*arc/*=1,nss_cert_database_chromeos=1,*/assistant/*=1,*chromeos/login/*=1,auto_enrollment_controller=1,*/ui/ozone/*=1,*/ui/display/manager/chromeos/*=1,*night_light*=1,update_engine=1,component_updater_service=1,power_button_observer=2,webui_login_view=2,lock_state_controller=2,webui_screen_locker=2,screen_locker=2 --message-loop-type-ui --mash-s ash --arc-availability=none --ash-disable-system-sounds --ash-power-button-position={"edge":"bottom","position":0.05} --autoplay-policy=no-user-gesture-required --child-wallpaper-large=/usr/share/chromeos-assets/wallpaper/child_large.jpg --child-wallpaper-small=/usr/share/chromeos-assets/wallpaper/child_small.jpg --default-wallpaper-large=/usr/share/chromeos-assets/wallpaper/default_large.jpg --default-wallpaper-small=/usr/share/chromeos-assets/wallpaper/default_small.jpg --disable-gaia-services --disable-logging-redirect --enable-arc-oobe-optin-no-skip --enable-experimental-extension-apis --enable-features=Pepper3DImageChromium,TouchOptimizedUi,Crostini,ExperimentalCrostiniUI,Mash --enable-logging --enable-touchview --enable-wayland-server --enable-webgl-image-chromium --enterprise-enrollment-initial-modulus=15 --enterprise-enrollment-modulus-limit=19 --force-tablet-power-button --gpu-sandbox-failures-fatal=yes --gpu-sandbox-start-early --guest-wallpaper-large=/usr/share/chromeos-assets/wallpaper/guest_large.jpg --guest-wallpaper-small=/usr/share/chromeos-assets/wallpaper/guest_small.jpg --has-chromeos-keyboard --has-internal-stylus --load-extension=/tmp/tast_chrome_extensions.154056820/test_api_extension --log-level=1 --login-manager --login-profile=user --num-raster-threads=2 --oobe-skip-postlogin --ppapi-flash-path=/opt/google/chrome/pepper/libpepflashplayer.so --ppapi-flash-version=31.0.0.122 --remote-debugging-port=0 --system-developer-mode --testing-channel=NamedTestingInterface:/tmp/.org.chromium.Chromium.uSlpUO/.org.chromium.Chromium.eCrKDt --ui-prioritize-in-gpu-process --use-cras --use-gl=egl --user-data-dir=/home/chronos --vmodule=*arc/*=1,nss_cert_database_chromeos=1,*/assistant/*=1,*chromeos/login/*=1,auto_enrollment_controller=1,*/ui/ozone/*=1,*/ui/display/manager/chromeos/*=1,*night_light*=1,update_engine=1,component_updater_service=1,power_button_observer=2,webui_login_view=2,lock_state_controller=2,webui_screen_locker=2,screen_locker=2 --whitelisted-extension-id=behllobkkfkfnphdnhnkndlbkcpglgmj --enable-crash-reporter=,unknown --noerrdialogs --user-data-dir=/home/chronos --homedir=/ --enable-experimental-extension-apis --whitelisted-extension-id=behllobkkfkfnphdnhnkndlbkcpglgmj --enable-logging --log-level=1 --vmodule=*arc/*=1,nss_cert_database_chromeos=1,*/assistant/*=1,*chromeos/login/*=1,auto_enrollment_controller=1,*/ui/ozone/*=1,*/ui/display/manager/chromeos/*=1,*night_light*=1,update_engine=1,component_updater_service=1,power_button_observer=2,webui_login_view=2,lock_state_controller=2,webui_screen_locker=2,screen_locker=2 --service-request-channel-token=11450729603940349271 --shared-files=v8_natives_data:100,v8_snapshot_data:101 I've seen something like this before. Chrome on Linux and Chrome OS does something to mangle its command line, replacing the per-switch NUL terminators with a command line consisting of a string with spaces. See issue 772099 . Dan, do you have any ideas here?
,
Oct 3
http://g/chromeos-chatty-eng/YAAkxrVjxe8 may be relevant. In the meantime, are you okay with updating the test to (also?) match the truncated version, James?
,
Oct 3
we're tracking the rework in issue 887875. might be worth spending time in addressing that rather than pushing the hacks around some more.
,
Oct 3
+sky fwiw, i bisected to https://chromium-review.googlesource.com/c/chromium/src/+/1239371, though from the comments it seems the real bug lies somewhere else. It seems to fail pretty consistently on caroline, tricky and minnie now.
,
Oct 3
As James says, this is likely because of issue 772099 .
,
Oct 3
I'm OK with modifying the test to look for "--mash*". I spent a while last year trying to sort out a similar problem and failed to figure it out.
,
Oct 3
sammiequon Re #4 - is it possible that CL changed the command line length, which might affect where truncations happened?
,
Oct 3
,
Oct 3
My patch shouldn't have impacted CommandLine at all.
,
Oct 3
Re #7: I was in the middle of a bisect before the other bug got merged, so i thought i finish it and post my findings in case it helps. I didn't dig too deeper since it seemed like this was under control, but happy to take a closer look if needed!
,
Oct 5
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/tast-tests/+/f95075b921b5041b81dc53cdae911c5bfb83239e commit f95075b921b5041b81dc53cdae911c5bfb83239e Author: James Cook <jamescook@chromium.org> Date: Fri Oct 05 06:00:17 2018 tast-tests: Use arg prefix for mash processes in ui.MashLogin Chrome sometimes truncates arguments in its command line. Instead of looking for "mash-service-name", just look for "mash-". The command line problem is issue 887875 BUG= chromium:891470 ,chromium:887875 TEST=this is the test Change-Id: I812ffd147ec36148cd6185b73510dfef40b70135 Reviewed-on: https://chromium-review.googlesource.com/c/1260054 Tested-by: James Cook <jamescook@chromium.org> Trybot-Ready: James Cook <jamescook@chromium.org> Reviewed-by: Dan Erat <derat@chromium.org> Reviewed-by: James Cook <jamescook@chromium.org> [modify] https://crrev.com/f95075b921b5041b81dc53cdae911c5bfb83239e/src/chromiumos/tast/local/bundles/cros/ui/mash_login.go
,
Oct 5
One of the runs after this landed passed, but a couple runs have been failing, just showing missing "--mash-s" instead of "--mash-service-name"
,
Oct 5
Yeah, I think this is still a real problem. Here's a recent failure: https://stainless.corp.google.com/search?exclude_retried=true&first_date=2018-10-03&master_builder_name=&builder_name_number=&shard=&exclude_acts=true&builder_name=-tot-chrom.*-pfq-informational%24&master_builder_name_number=&owner=&retry=&exclude_cts=false&exclude_non_production=false&hostname=&board=%5Etricky%24&test=%5Etast%5C..*Mash&suite=&build=%5ER71%5C-11129%5C.0%5C.0%5C-b3009867%24&status=FAIL&status=ERROR&status=ABORT&reason=&waterfall=&exclude_not_run=false&last_date=2018-10-05&exclude_non_release=false&exclude_au=true&model=&view=list There are a variety of .dmp files here, including: │ ├── chromium-ash-minidump-1c1d1f24b5b199f4.dmp.txt [0 Bytes] │ ├── cryptohome.14135.dmp.txt [6.6 kB] │ ├── cryptohomed.871.dmp.txt [41.4 kB] │ └── tcsd.834.dmp.txt [9.1 kB] I don't know what chromium-ash-minidump is, nor tcsd. Hypothesis: Some other regression came in while dealing with the truncated cmd line problem, and the ash process is crashing. The weird thing is the ash process should show as a chrome crash (it's a chrome utility process.) Dan, do you know what any of those other processes are?
,
Oct 5
/var/log/messages shows a chrome crash: 2018-10-05T15:08:26.797141+00:00 WARNING crash_reporter[17047]: [user] Received crash notification for chrome[16579] sig 11, user 1000 group 1000 (ignoring call by kernel - chrome crash; waiting for chrome to call us directly) Tast logs show a failure at roughly the same time: 2018/10/05 08:08:11 Started test ui.MashLogin 2018/10/05 08:08:11 [08:08:10.241] Restarting ui job 2018/10/05 08:08:13 [08:08:12.175] Waiting for org.chromium.SessionManager D-Bus service 2018/10/05 08:08:13 [08:08:12.210] Asking session_manager to enable Chrome testing 2018/10/05 08:08:13 [08:08:12.216] Waiting for Chrome to write its debugging port to /home/chronos/DevToolsActivePort 2018/10/05 08:08:15 [08:08:14.479] Checking cryptohomed service 2018/10/05 08:08:15 [08:08:14.480] Removing cryptohome for testuser@gmail.com 2018/10/05 08:08:15 [08:08:14.580] Finding OOBE DevTools target 2018/10/05 08:08:15 [08:08:14.733] Connecting to Chrome at ws://127.0.0.1:37769/devtools/page/9990A5696EF6134F3263BD1F6464367E 2018/10/05 08:08:16 [08:08:15.005] Waiting for OOBE 2018/10/05 08:08:21 [08:08:19.987] Logging in as user "testuser@gmail.com" 2018/10/05 08:08:21 [08:08:20.129] Waiting for cryptohome for user "testuser@gmail.com" 2018/10/05 08:08:22 [08:08:21.125] Waiting for OOBE to be dismissed 2018/10/05 08:08:30 [08:08:28.296] Error at mash_login.go:66: No chrome process containing "mash-s" I see a surprising number of renderers running (22), but no utility processes, which makes me think the ash+ws process isn't running.
,
Oct 5
tast symbolize on the chromium-ash-minidump shows:
Crash reason: SIGSEGV /SEGV_MAPERR
Crash address: 0x2b8
Process uptime: not available
Thread 0 (crashed)
0 chrome!cc::SurfaceLayer::SetFallbackSurfaceId(viz::SurfaceId const&) [optional.h : 612 + 0x0]
rax = 0x1638a1f361277500 rdx = 0x00003ee409a3ca00
rcx = 0x0000000000000000 rbx = 0x00003ee409941770
rsi = 0x00007ffd3c3d8d20 rdi = 0x0000000000000000
rbp = 0x00007ffd3c3d8c40 rsp = 0x00007ffd3c3d8b40
r8 = 0x00003ee4099f5380 r9 = 0x0000000000000000
r10 = 0x0000000000000003 r11 = 0x00003ee40b3661e0
r12 = 0x00003ee408c335a0 r13 = 0x0000000000000000
r14 = 0x00007ffd3c3d8d20 r15 = 0x00007ffd3c3d8d20
rip = 0x00007f79269a1707
Found by: given as instruction pointer in context
1 chrome!ui::Layer::SetFallbackSurfaceId(viz::SurfaceId const&) [layer.cc : 805 + 0x5]
rbx = 0x00003ee409941770 rbp = 0x00007ffd3c3d8c70
rsp = 0x00007ffd3c3d8c50 r12 = 0x00003ee408c335a0
r13 = 0x00003ee408c97180 r14 = 0x00003ee40931a500
r15 = 0x00007ffd3c3d8d20 rip = 0x00007f7923b5552c
Found by: call frame info
2 chrome!ws::ClientRoot::OnFirstSurfaceActivation(viz::SurfaceInfo const&) [client_surface_embedder.cc : 49 + 0x8]
rbx = 0x00003ee409941770 rbp = 0x00007ffd3c3d8ca0
rsp = 0x00007ffd3c3d8c80 r12 = 0x00003ee408c335a0
r13 = 0x00003ee408c97180 r14 = 0x00003ee409941700
r15 = 0x00007ffd3c3d8d20 rip = 0x00007f7924d2352a
Found by: call frame info
3 chrome!viz::mojom::FrameSinkManagerClientStubDispatch::Accept(viz::mojom::FrameSinkManagerClient*, mojo::Message*) [frame_sink_manager.mojom.cc : 2205 + 0x6]
rbx = 0x00003ee40a821368 rbp = 0x00007ffd3c3d8eb0
rsp = 0x00007ffd3c3d8cb0 r12 = 0x00003ee408c335a0
r13 = 0x00003ee408c97180 r14 = 0x00007ffd3c3d8f20
r15 = 0x00007ffd3c3d8f20 rip = 0x00007f79201dcd7d
Found by: call frame info
4 chrome!mojo::InterfaceEndpointClient::HandleIncomingMessageThunk::Accept(mojo::Message*) [interface_endpoint_client.cc : 423 + 0x6]
rbx = 0x0000000000000000 rbp = 0x00007ffd3c3d8f00
rsp = 0x00007ffd3c3d8ec0 r12 = 0x00003ee408c97140
r13 = 0x00003ee408c97180 r14 = 0x00007ffd3c3d8f20
r15 = 0x00007ffd3c3d90d8 rip = 0x00007f79224a95e0
Found by: call frame info
5 chrome!mojo::internal::MultiplexRouter::ProcessIncomingMessage(mojo::internal::MultiplexRouter::MessageWrapper*, mojo::internal::MultiplexRouter::ClientCallBehavior, base::SequencedTaskRunner*) [interface_endpoint_client.cc : 306 + 0xb]
...
sky has already landed a fix in https://chromium.googlesource.com/chromium/src/+/12ef71460252b042f52da8eddc26adf641e2d50b so this should go away shortly
,
Oct 5
Thank you for looking into this James!
,
Oct 5
#13: I don't understand Autotest's logic for determining which crashes and logs to collect. The files under tast/results/crashes are the ones that the tast process collects, and they're just the ones that appeared between when testing started and ended.
,
Oct 5
Test is green on the builders now.
,
Oct 13
I still see a few "No chrome process containing "mash-s" among [...]" failures, oddly both on scarlet-arcnext: http://stainless/browse/chromeos-autotest-results/247580820-chromeos-test/ http://stainless/browse/chromeos-autotest-results/247795193-chromeos-test/ James, any ideas?
,
Oct 13
,
Oct 14
@derat - This is the same command-line truncation problem. For example, one process contains "--message-loop-type-ui --mash e=ash". I think that should be "--mash-service-name=ash" but the middle got eaten. Maybe scarlet-arcnext has additional command-line switches that cause truncation to happen more. We really need a fix for the command-line issue 887875, which is currently unassigned. I tried to fix it a few months ago, but my low-level POSIX-fu is weak. In the mean time, do you think I should just skip the part of the test that looks for the command line switch?
,
Oct 15
Skipping the check seems like the only approach for now until the command line is fixed.
,
Oct 17
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/tast-tests/+/ae7903f4abf6c3e4eb554f11effb282e00e3e466 commit ae7903f4abf6c3e4eb554f11effb282e00e3e466 Author: James Cook <jamescook@chromium.org> Date: Wed Oct 17 16:07:14 2018 tast-tests: Don't parse command line in ui.MashLogin This avoids flaky failures due to command line arguments sometimes being truncated by Chrome's implementation of setproctitle(). BUG= chromium:891470 TEST=Tast ui.MashLogin Change-Id: I911961ea09a4a8c48db70c1d25d8f6449948f26f Reviewed-on: https://chromium-review.googlesource.com/c/1281682 Reviewed-by: Dan Erat <derat@chromium.org> Tested-by: James Cook <jamescook@chromium.org> [modify] https://crrev.com/ae7903f4abf6c3e4eb554f11effb282e00e3e466/src/chromiumos/tast/local/bundles/cros/ui/mash_login.go
,
Oct 19
|
||||||||
►
Sign in to add a comment |
||||||||
Comment 1 by derat@chromium.org
, Oct 3