New issue
Advanced search Search tips

Issue 891470 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Oct 19
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug

Blocking:
issue 847901



Sign in to add a comment

tast.ui.MashLogin failures due to mangled command line missing --mash-service-name

Project Member Reported by jamescook@chromium.org, Oct 2

Issue description

tast.ui.MashLogin uses the command line to detect if a mash service process (e.g. the ash+ws process) is running. Sometimes this fails:

https://stainless.corp.google.com/browse/chromeos-autotest-results/244127085-chromeos-test/

No chrome process containing "mash-service-name" among...

The problem is the command line is mangled. See below, it only contains "--mash-s" instead of "--mash-service-name".

chronos  16285 52.7  7.5 587424 295504 ?       Rl   19:15   0:05      \_ /opt/google/chrome/chrome --type=utility --field-trial-handle=5055908869115835639,2119901972152640347,131072 --enable-features=Crostini,ExperimentalCrostiniUI,Mash,Pepper3DImageChromium,TouchOptimizedUi --lang=en-US --no-sandbox --enable-logging --log-level=1 --use-gl=egl --vmodule=*arc/*=1,nss_cert_database_chromeos=1,*/assistant/*=1,*chromeos/login/*=1,auto_enrollment_controller=1,*/ui/ozone/*=1,*/ui/display/manager/chromeos/*=1,*night_light*=1,update_engine=1,component_updater_service=1,power_button_observer=2,webui_login_view=2,lock_state_controller=2,webui_screen_locker=2,screen_locker=2 --message-loop-type-ui --mash-s ash --arc-availability=none --ash-disable-system-sounds --ash-power-button-position={"edge":"bottom","position":0.05} --autoplay-policy=no-user-gesture-required --child-wallpaper-large=/usr/share/chromeos-assets/wallpaper/child_large.jpg --child-wallpaper-small=/usr/share/chromeos-assets/wallpaper/child_small.jpg --default-wallpaper-large=/usr/share/chromeos-assets/wallpaper/default_large.jpg --default-wallpaper-small=/usr/share/chromeos-assets/wallpaper/default_small.jpg --disable-gaia-services --disable-logging-redirect --enable-arc-oobe-optin-no-skip --enable-experimental-extension-apis --enable-features=Pepper3DImageChromium,TouchOptimizedUi,Crostini,ExperimentalCrostiniUI,Mash --enable-logging --enable-touchview --enable-wayland-server --enable-webgl-image-chromium --enterprise-enrollment-initial-modulus=15 --enterprise-enrollment-modulus-limit=19 --force-tablet-power-button --gpu-sandbox-failures-fatal=yes --gpu-sandbox-start-early --guest-wallpaper-large=/usr/share/chromeos-assets/wallpaper/guest_large.jpg --guest-wallpaper-small=/usr/share/chromeos-assets/wallpaper/guest_small.jpg --has-chromeos-keyboard --has-internal-stylus --load-extension=/tmp/tast_chrome_extensions.154056820/test_api_extension --log-level=1 --login-manager --login-profile=user --num-raster-threads=2 --oobe-skip-postlogin --ppapi-flash-path=/opt/google/chrome/pepper/libpepflashplayer.so --ppapi-flash-version=31.0.0.122 --remote-debugging-port=0 --system-developer-mode --testing-channel=NamedTestingInterface:/tmp/.org.chromium.Chromium.uSlpUO/.org.chromium.Chromium.eCrKDt --ui-prioritize-in-gpu-process --use-cras --use-gl=egl --user-data-dir=/home/chronos --vmodule=*arc/*=1,nss_cert_database_chromeos=1,*/assistant/*=1,*chromeos/login/*=1,auto_enrollment_controller=1,*/ui/ozone/*=1,*/ui/display/manager/chromeos/*=1,*night_light*=1,update_engine=1,component_updater_service=1,power_button_observer=2,webui_login_view=2,lock_state_controller=2,webui_screen_locker=2,screen_locker=2 --whitelisted-extension-id=behllobkkfkfnphdnhnkndlbkcpglgmj --enable-crash-reporter=,unknown --noerrdialogs --user-data-dir=/home/chronos --homedir=/ --enable-experimental-extension-apis --whitelisted-extension-id=behllobkkfkfnphdnhnkndlbkcpglgmj --enable-logging --log-level=1 --vmodule=*arc/*=1,nss_cert_database_chromeos=1,*/assistant/*=1,*chromeos/login/*=1,auto_enrollment_controller=1,*/ui/ozone/*=1,*/ui/display/manager/chromeos/*=1,*night_light*=1,update_engine=1,component_updater_service=1,power_button_observer=2,webui_login_view=2,lock_state_controller=2,webui_screen_locker=2,screen_locker=2 --service-request-channel-token=11450729603940349271 --shared-files=v8_natives_data:100,v8_snapshot_data:101

I've seen something like this before. Chrome on Linux and Chrome OS does something to mangle its command line, replacing the per-switch NUL terminators with a command line consisting of a string with spaces. See  issue 772099 .

Dan, do you have any ideas here?

 
Cc: agawronska@chromium.org sammiequon@chromium.org
 Issue 891842  has been merged into this issue.
Cc: hidehiko@chromium.org vapier@chromium.org
http://g/chromeos-chatty-eng/YAAkxrVjxe8 may be relevant.

In the meantime, are you okay with updating the test to (also?) match the truncated version, James?
we're tracking the rework in issue 887875.  might be worth spending time in addressing that rather than pushing the hacks around some more.
Cc: sky@chromium.org
Labels: OS-Chrome
+sky

fwiw, i bisected to https://chromium-review.googlesource.com/c/chromium/src/+/1239371, though from the comments it seems the real bug lies somewhere else.

It seems to fail pretty consistently on caroline, tricky and minnie now.
As James says, this is likely because of  issue 772099 .
I'm OK with modifying the test to look for "--mash*". I spent a while last year trying to sort out a similar problem and failed to figure it out.

sammiequon Re #4 - is it possible that CL changed the command line length, which might affect where truncations happened?

My patch shouldn't have impacted CommandLine at all.
Re #7: I was in the middle of a bisect before the other bug got merged, so i thought i finish it and post my findings in case it helps. I didn't dig too deeper since it seemed like this was under control, but happy to take a closer look if needed!
Project Member

Comment 11 by bugdroid1@chromium.org, Oct 5

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/tast-tests/+/f95075b921b5041b81dc53cdae911c5bfb83239e

commit f95075b921b5041b81dc53cdae911c5bfb83239e
Author: James Cook <jamescook@chromium.org>
Date: Fri Oct 05 06:00:17 2018

tast-tests: Use arg prefix for mash processes in ui.MashLogin

Chrome sometimes truncates arguments in its command line.
Instead of looking for "mash-service-name", just look for
"mash-". The command line problem is issue 887875

BUG= chromium:891470 ,chromium:887875
TEST=this is the test

Change-Id: I812ffd147ec36148cd6185b73510dfef40b70135
Reviewed-on: https://chromium-review.googlesource.com/c/1260054
Tested-by: James Cook <jamescook@chromium.org>
Trybot-Ready: James Cook <jamescook@chromium.org>
Reviewed-by: Dan Erat <derat@chromium.org>
Reviewed-by: James Cook <jamescook@chromium.org>

[modify] https://crrev.com/f95075b921b5041b81dc53cdae911c5bfb83239e/src/chromiumos/tast/local/bundles/cros/ui/mash_login.go

One of the runs after this landed passed, but a couple runs have been failing, just showing missing "--mash-s" instead of "--mash-service-name"
Labels: -Pri-3 Pri-1
Owner: jamescook@chromium.org
Status: Started (was: Untriaged)
Yeah, I think this is still a real problem. Here's a recent failure:

https://stainless.corp.google.com/search?exclude_retried=true&first_date=2018-10-03&master_builder_name=&builder_name_number=&shard=&exclude_acts=true&builder_name=-tot-chrom.*-pfq-informational%24&master_builder_name_number=&owner=&retry=&exclude_cts=false&exclude_non_production=false&hostname=&board=%5Etricky%24&test=%5Etast%5C..*Mash&suite=&build=%5ER71%5C-11129%5C.0%5C.0%5C-b3009867%24&status=FAIL&status=ERROR&status=ABORT&reason=&waterfall=&exclude_not_run=false&last_date=2018-10-05&exclude_non_release=false&exclude_au=true&model=&view=list

There are a variety of .dmp files here, including:

     │   ├── chromium-ash-minidump-1c1d1f24b5b199f4.dmp.txt [0 Bytes]
     │   ├── cryptohome.14135.dmp.txt [6.6 kB]
     │   ├── cryptohomed.871.dmp.txt [41.4 kB]
     │   └── tcsd.834.dmp.txt [9.1 kB]

I don't know what chromium-ash-minidump is, nor tcsd.

Hypothesis: Some other regression came in while dealing with the truncated cmd line problem, and the ash process is crashing. The weird thing is the ash process should show as a chrome crash (it's a chrome utility process.)

Dan, do you know what any of those other processes are?

/var/log/messages shows a chrome crash:

2018-10-05T15:08:26.797141+00:00 WARNING crash_reporter[17047]: [user] Received crash notification for chrome[16579] sig 11, user 1000 group 1000 (ignoring call by kernel - chrome crash; waiting for chrome to call us directly)

Tast logs show a failure at roughly the same time:

2018/10/05 08:08:11 Started test ui.MashLogin
2018/10/05 08:08:11 [08:08:10.241] Restarting ui job
2018/10/05 08:08:13 [08:08:12.175] Waiting for org.chromium.SessionManager D-Bus service
2018/10/05 08:08:13 [08:08:12.210] Asking session_manager to enable Chrome testing
2018/10/05 08:08:13 [08:08:12.216] Waiting for Chrome to write its debugging port to /home/chronos/DevToolsActivePort
2018/10/05 08:08:15 [08:08:14.479] Checking cryptohomed service
2018/10/05 08:08:15 [08:08:14.480] Removing cryptohome for testuser@gmail.com
2018/10/05 08:08:15 [08:08:14.580] Finding OOBE DevTools target
2018/10/05 08:08:15 [08:08:14.733] Connecting to Chrome at ws://127.0.0.1:37769/devtools/page/9990A5696EF6134F3263BD1F6464367E
2018/10/05 08:08:16 [08:08:15.005] Waiting for OOBE
2018/10/05 08:08:21 [08:08:19.987] Logging in as user "testuser@gmail.com"
2018/10/05 08:08:21 [08:08:20.129] Waiting for cryptohome for user "testuser@gmail.com"
2018/10/05 08:08:22 [08:08:21.125] Waiting for OOBE to be dismissed
2018/10/05 08:08:30 [08:08:28.296] Error at mash_login.go:66: No chrome process containing "mash-s"

I see a surprising number of renderers running (22), but no utility processes, which makes me think the ash+ws process isn't running.

tast symbolize on the chromium-ash-minidump shows:

Crash reason:  SIGSEGV /SEGV_MAPERR
Crash address: 0x2b8
Process uptime: not available

Thread 0 (crashed)
 0  chrome!cc::SurfaceLayer::SetFallbackSurfaceId(viz::SurfaceId const&) [optional.h : 612 + 0x0]
    rax = 0x1638a1f361277500   rdx = 0x00003ee409a3ca00
    rcx = 0x0000000000000000   rbx = 0x00003ee409941770
    rsi = 0x00007ffd3c3d8d20   rdi = 0x0000000000000000
    rbp = 0x00007ffd3c3d8c40   rsp = 0x00007ffd3c3d8b40
     r8 = 0x00003ee4099f5380    r9 = 0x0000000000000000
    r10 = 0x0000000000000003   r11 = 0x00003ee40b3661e0
    r12 = 0x00003ee408c335a0   r13 = 0x0000000000000000
    r14 = 0x00007ffd3c3d8d20   r15 = 0x00007ffd3c3d8d20
    rip = 0x00007f79269a1707
    Found by: given as instruction pointer in context
 1  chrome!ui::Layer::SetFallbackSurfaceId(viz::SurfaceId const&) [layer.cc : 805 + 0x5]
    rbx = 0x00003ee409941770   rbp = 0x00007ffd3c3d8c70
    rsp = 0x00007ffd3c3d8c50   r12 = 0x00003ee408c335a0
    r13 = 0x00003ee408c97180   r14 = 0x00003ee40931a500
    r15 = 0x00007ffd3c3d8d20   rip = 0x00007f7923b5552c
    Found by: call frame info
 2  chrome!ws::ClientRoot::OnFirstSurfaceActivation(viz::SurfaceInfo const&) [client_surface_embedder.cc : 49 + 0x8]
    rbx = 0x00003ee409941770   rbp = 0x00007ffd3c3d8ca0
    rsp = 0x00007ffd3c3d8c80   r12 = 0x00003ee408c335a0
    r13 = 0x00003ee408c97180   r14 = 0x00003ee409941700
    r15 = 0x00007ffd3c3d8d20   rip = 0x00007f7924d2352a
    Found by: call frame info
 3  chrome!viz::mojom::FrameSinkManagerClientStubDispatch::Accept(viz::mojom::FrameSinkManagerClient*, mojo::Message*) [frame_sink_manager.mojom.cc : 2205 + 0x6]
    rbx = 0x00003ee40a821368   rbp = 0x00007ffd3c3d8eb0
    rsp = 0x00007ffd3c3d8cb0   r12 = 0x00003ee408c335a0
    r13 = 0x00003ee408c97180   r14 = 0x00007ffd3c3d8f20
    r15 = 0x00007ffd3c3d8f20   rip = 0x00007f79201dcd7d
    Found by: call frame info
 4  chrome!mojo::InterfaceEndpointClient::HandleIncomingMessageThunk::Accept(mojo::Message*) [interface_endpoint_client.cc : 423 + 0x6]
    rbx = 0x0000000000000000   rbp = 0x00007ffd3c3d8f00
    rsp = 0x00007ffd3c3d8ec0   r12 = 0x00003ee408c97140
    r13 = 0x00003ee408c97180   r14 = 0x00007ffd3c3d8f20
    r15 = 0x00007ffd3c3d90d8   rip = 0x00007f79224a95e0
    Found by: call frame info
 5  chrome!mojo::internal::MultiplexRouter::ProcessIncomingMessage(mojo::internal::MultiplexRouter::MessageWrapper*, mojo::internal::MultiplexRouter::ClientCallBehavior, base::SequencedTaskRunner*) [interface_endpoint_client.cc : 306 + 0xb]
...

sky has already landed a fix in https://chromium.googlesource.com/chromium/src/+/12ef71460252b042f52da8eddc26adf641e2d50b so this should go away shortly

Thank you for looking into this James!
#13: I don't understand Autotest's logic for determining which crashes and logs to collect. The files under tast/results/crashes are the ones that the tast process collects, and they're just the ones that appeared between when testing started and ended.
Status: Fixed (was: Started)
Test is green on the builders now.

Status: Assigned (was: Fixed)
I still see a few "No chrome process containing "mash-s" among [...]" failures, oddly both on scarlet-arcnext:

http://stainless/browse/chromeos-autotest-results/247580820-chromeos-test/
http://stainless/browse/chromeos-autotest-results/247795193-chromeos-test/

James, any ideas?
Blocking: 847901
@derat - This is the same command-line truncation problem. For example, one process contains "--message-loop-type-ui --mash e=ash". I think that should be "--mash-service-name=ash" but the middle got eaten.

Maybe scarlet-arcnext has additional command-line switches that cause truncation to happen more.

We really need a fix for the command-line issue 887875, which is currently unassigned. I tried to fix it a few months ago, but my low-level POSIX-fu is weak.

In the mean time, do you think I should just skip the part of the test that looks for the command line switch?

Skipping the check seems like the only approach for now until the command line is fixed.
Project Member

Comment 23 by bugdroid1@chromium.org, Oct 17

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform/tast-tests/+/ae7903f4abf6c3e4eb554f11effb282e00e3e466

commit ae7903f4abf6c3e4eb554f11effb282e00e3e466
Author: James Cook <jamescook@chromium.org>
Date: Wed Oct 17 16:07:14 2018

tast-tests: Don't parse command line in ui.MashLogin

This avoids flaky failures due to command line arguments sometimes
being truncated by Chrome's implementation of setproctitle().

BUG= chromium:891470 
TEST=Tast ui.MashLogin

Change-Id: I911961ea09a4a8c48db70c1d25d8f6449948f26f
Reviewed-on: https://chromium-review.googlesource.com/c/1281682
Reviewed-by: Dan Erat <derat@chromium.org>
Tested-by: James Cook <jamescook@chromium.org>

[modify] https://crrev.com/ae7903f4abf6c3e4eb554f11effb282e00e3e466/src/chromiumos/tast/local/bundles/cros/ui/mash_login.go

Status: Fixed (was: Assigned)

Sign in to add a comment