
Issue 777905 link

Starred by 6 users

Issue metadata

Status: Fixed
Owner:
Closed: Jan 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug

Blocked on:
issue 733875

Blocking:
issue 776512




chrome has no oobe UI on VMs

Project Member Reported by achuith@chromium.org, Oct 24 2017

Issue description

I believe this started with https://chromium-review.googlesource.com/c/chromium/src/+/726247

Enter the sdk and download a VM:
achuith@achuith:~/code/chrome$ cros chrome-sdk --log-level=debug --download-vm --internal --board=amd64-generic --clear-sdk-cache

I get version 10041.

Start the vm:
(sdk amd64-generic R64-10041.0.0-b19954) achuith@achuith ~/code/chrome $ cros_vm --start

Restart ui:
(sdk amd64-generic R64-10041.0.0-b19954) achuith@achuith ~/code/chrome $ cros_vm --cmd "restart ui"

Chrome is up, but there's no oobe UI. You can see the clock and some shelf parts. See screenshot.

I did a chrome bisect, and now I'm pretty sure this is a chromeos problem. We would need to walk back revisions from 10041 to identify the chromeos CL.

Telemetry 'fake' login still works because chrome is responsive, but telemetry GAIA login fails because it can't find the webview. That in turn breaks a number of other tests, such as cheets tests, power_LoadTest, and hotrod tests.

I haven't checked to see if this is also happening on devices.

I'm at the end of my day here in MUC, so assigning to the current gardener. 
 
Screenshot from 2017-10-24 18:34:13.png
Blocking: 777541
Blocking: 776512

Comment 3 Deleted

Comment 4 Deleted

If this is relatively recent, it could be a duplicate of issue 777250.
Very strange. I went back to chrome on Oct 17 and Oct 10, and was still able to repro this, which is what convinced me that this was a cros issue. Maybe I was doing something wrong. I'll check with TOT chrome.
I tried with TOT chrome (with Hashimoto's revert), and this is still happening, so I think this is an independent issue, and probably still a cros issue.
Cc: ihf@chromium.org
Cc: jinsong@chromium.org mruthven@chromium.org akahuang@chromium.org
I did a bisect based on version and it looks like the first failing version is 10003.0.0. Looks like this was around Oct 4. I believe the builder is amd64-generic-full: 
https://uberchromegw.corp.google.com/i/chromiumos/builders/amd64-generic-full?numbuilds=200

Looks like the build we're interested in has scrolled off the bottom :(

I've added the cros sheriffs who may be able to help?
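For reference, the version bisect described above can be sketched as a binary search over an ordered list of build versions. This is only an illustration, not the procedure that was actually used: `first_failing`, `version_key`, and `oobe_missing` are hypothetical names, the contiguous version list is made up, and the predicate is a stand-in for booting the VM at a given version and checking whether the OOBE UI appears.

```python
from typing import Callable, Sequence, Tuple


def version_key(version: str) -> Tuple[int, ...]:
    """Turn a version string such as '10003.0.0' into a sortable tuple."""
    return tuple(int(part) for part in version.split("."))


def first_failing(versions: Sequence[str],
                  is_bad: Callable[[str], bool]) -> str:
    """Return the first version for which is_bad() is true.

    Assumes versions are sorted oldest to newest, the first entry is
    known good, the last entry is known bad, and once a version fails
    every later version fails too.
    """
    lo, hi = 0, len(versions) - 1  # invariant: versions[lo] good, versions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]


# Hypothetical data: a contiguous range of versions ending at 10041.0.0.
versions = [f"{n}.0.0" for n in range(9990, 10042)]


def oobe_missing(version: str) -> bool:
    # Stand-in for "start the VM at this version and look for the OOBE UI".
    return version_key(version) >= (10003, 0, 0)


print(first_failing(versions, oobe_missing))
```

With a real check in place of `oobe_missing`, each probe costs one VM download and boot, so the number of probes grows only logarithmically with the size of the version range.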

There was a 3 week break where LKGM was not updated (Sep 25 to Oct 18), and these failures are a result of the update.
I updated 10002.0.0 with TOT chrome, and it works, so I feel pretty certain that this is a chromeos issue.
We have a build from the amd64-chromium-pfq, which updates less frequently than amd64-generic-full. Unfortunately there was a 2 day gap, so the list of chromeos CLs is a bit large, but it should be one of these:
https://uberchromegw.corp.google.com/i/chromeos/builders/amd64-generic-chromium-pfq/builds/10689

Cc: marc...@chromium.org
The failing change should be in:
https://crosland.corp.google.com/log/10002.0.0..10003.0.0

Maybe one of the mesa changes?
Cc: malaykeshav@chromium.org
Owner: warx@chromium.org
*ping*
Owner: steve...@chromium.org
Assigning to current gardener.
Cc: norvez@chromium.org
If this isn't a chrome issue then we really need to find a cros owner, or someone on the graphics team if this is a mesa issue.

Do we have a test that can detect the failure that we can use to bisect? Can we write one?

That looks like a combination of:
- removal of cirrus
- outdated qemu (2.0) in Goobuntu
See discussion at the end of Issue 710629

I tried with qemu-2.10 and replacing '-vga cirrus' with '-vga virtio', and amd64-generic started and I could go through OOBE.
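In terms of the QEMU command line, the swap mentioned above is a change to a single argument pair. A minimal sketch, assuming the arguments are held as a Python list; `use_virtio_vga` is a hypothetical helper and the argument list is a shortened illustration, not the exact command any of the cros scripts generate:

```python
from typing import List


def use_virtio_vga(qemu_args: List[str]) -> List[str]:
    """Return a copy of qemu_args with '-vga cirrus' changed to '-vga virtio'."""
    out = list(qemu_args)
    for i in range(len(out) - 1):
        if out[i] == "-vga" and out[i + 1] == "cirrus":
            out[i + 1] = "virtio"
    return out


# Shortened, illustrative argument list.
args = ["qemu-system-x86_64", "-enable-kvm", "-m", "2G", "-smp", "4",
        "-vga", "cirrus"]
print(" ".join(use_virtio_vga(args)))
```

The helper returns a new list and leaves the input untouched, so the original command is still available for comparison.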
Thanks for digging into this norvez@, are you OK with owning this?

Also, for my edification / reference, could you elaborate on 'replacing '-vga cirrus' with '-vga virtio' ?

FWIW, this is as far as I was able to get investigating the issue (I haven't had to run a VM from the chroot in a very long time):

(cr) ~/trunk/src/scripts $ ./image_to_vm.sh --board=${BOARD} --test_image

(cr) ~/trunk/src/scripts $ ./bin/cros_start_vm --image_path=../build/images/${BOARD}/latest/chromiumos_qemu_image.bin
INFO    : QEMU binary: /mnt/host/source/chroot/usr/bin/qemu-system-x86_64
INFO    : QEMU version: QEMU emulator version 2.6.0, Copyright (c) 2003-2008 Fabrice Bellard
Starting a KVM instance
INFO    : Launching: /mnt/host/source/chroot/usr/bin/qemu-system-x86_64 -enable-kvm -m 2G -smp 4 -vga cirrus -pidfile /tmp/kvm.139280.pid -chardev pipe,id=control_pipe,path=/tmp/kvm.139280.monitor -serial file:/tmp/kvm.139280.serial -mon chardev=control_pipe -daemonize -net nic,model=virtio,vlan=9222 -net user,hostfwd=tcp:127.0.0.1:9222-:22,vlan=9222 -drive file=../build/images/amd64-generic/latest/chromiumos_qemu_image.bin,index=0,media=disk,cache=unsafe
qemu-system-x86_64: -net user,hostfwd=tcp:127.0.0.1:9222-:22,vlan=9222: could not set up host forwarding rule 'tcp:127.0.0.1:9222-:22'
qemu-system-x86_64: -net user,hostfwd=tcp:127.0.0.1:9222-:22,vlan=9222: Device 'user' could not be initialized

For the 3 questions:

1. Err, I don't think I'm the right owner. OOBE looks fine in amd64-generic+10090.0.0. My understanding of Issue 777541 is that the bug still happens on M63 on panther/guado, seems unrelated to VMs. Not sure who would be a good owner either, sorry!

2. If you look at the qemu command line that's generated by cros_start_vm in your example (and by cros_vm in simplechrome), there's a '-vga cirrus' argument in there. This makes the graphics stack use cirrus instead of virtio_gpu, and support for cirrus has recently been removed (which triggered the discussion at the end of Issue 710629).

Note, I think that if you pass '--board=amd64-generic' to cros_start_vm it will actually use '-vga virtio' instead of '-vga cirrus' automatically; that's the intent behind cs/chromeos_public/src/scripts/lib/cros_vm_lib.sh?sq&l=169

3. Do you already have another instance of qemu running that is also redirecting port 9222 to the guest? If you stop it you should be able to start the new instance
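The "could not set up host forwarding rule" error in the log above is consistent with the host port already being in use, as answer 3 suggests. A quick way to check before launching a new instance is to try binding the port. This is a rough sketch; `port_is_free` is a hypothetical helper, and 9222 is simply the port from the log:

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently bind to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use": something else holds the port.
            return False
    return True


if port_is_free(9222):
    print("port 9222 is free")
else:
    print("port 9222 is in use; another qemu instance may still be running")
```

The check is only a hint: another process could still grab the port between the check and the qemu launch.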
Blockedon: 733875
I can confirm that this problem goes away with qemu 2.6.0 and virtio, which makes fixing 733875 a pretty high priority.

Nicolas - how did you get qemu-2.10 on your system? This could at least be a work-around while we sort out how to create a downloadable package with the chrome sdk.
Blocking: -777541
It's built from source, not a suitable workaround I'm afraid.
Cc: akes...@chromium.org steve...@chromium.org
Owner: abodenha@chromium.org
-> abodenha@ to figure out who "owns" VM support for Chrome OS.

This needs to include working with Infra to ensure that we have the proper versions of qemu (and its dependencies) installed on any builders that need to run VM tests, and that these packages are available to devs on supported environments.

We have been working to get CrOS tests running on a VM on the chromium waterfall via Simple Chrome for a very long time, and I believe that we are finally close. Having that break because we did not coordinate with Chrome Infra to ensure that the chromium builders will support a change to the VM requirements would be a huge step backwards.

+akeshet@ since this is related to CrOS Infra.

Yup, I'm building it myself, but that's not going to work for most devs. And I can use the qemu that's in chroot/usr/bin/qemu-system-x86_64, but that's only going to work for chromium-os devs. We need the VM to be available for chromium/telemetry devs without having to download/setup a chroot.
We have a revert here that I'd prefer to be landed:
https://bugs.chromium.org/p/chromium/issues/detail?id=710629#c37
This is not just about this revert. The qemu that you find outside of the chroot can't reliably connect to the network on boot (see https://bugs.chromium.org/p/chromium/issues/detail?id=748634 ).

Overall, you are trying to build a new qemu setup that is different from the one used in Chrome OS, and that is the problem here. That sort of divergence is not sustainable: for example, the network issue above will introduce flakiness, and so will the cirrus driver's numerous bugs. We certainly don't have the headcount to duplicate that sort of effort.
Owner: achuith@chromium.org
achuith@, this seems like something we need to figure out before we can move forward on the vmtests effort. It's really beyond the scope of anything gardeners should handle. I hate to throw this back in your lap, but I don't see other options.

Can you pull together the right people once you're back in the office next week and try to work out a sustainable solution here?
Components: UI>Shell>OOBE
Status: Fixed (was: Assigned)
We have a workaround.
