New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 703219 link

Starred by 3 users

Issue metadata

Status: Verified
Owner:
Last visit 16 days ago
Closed: Mar 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Sign in to add a comment

PFQ Failures in android.core.tests.libcore.package.harmony_java_math

Project Member Reported by steve...@chromium.org, Mar 20 2017

Issue description

First failure on cyan informational:
https://uberchromegw.corp.google.com/i/chromeos.chrome/builders/cyan-tot-chrome-pfq-informational/builds/1944

Failure message:

03/16 20:37:48.813 WARNI|              test:0615| The test failed with the following exception
Traceback (most recent call last):
  File "/build/cyan/usr/local/build/autotest/client/common_lib/test.py", line 609, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/build/cyan/usr/local/build/autotest/client/common_lib/test.py", line 817, in _call_test_function
    return func(*args, **dargs)
  File "/build/cyan/usr/local/build/autotest/client/common_lib/test.py", line 470, in execute
    dargs)
  File "/build/cyan/usr/local/build/autotest/client/common_lib/test.py", line 347, in _call_run_once_with_retry
    postprocess_profiled_run, args, dargs)
  File "/build/cyan/usr/local/build/autotest/client/common_lib/test.py", line 380, in _call_run_once
    self.run_once(*args, **dargs)
  File "/build/cyan/usr/local/build/autotest/server/site_tests/cheets_CTS/cheets_CTS.py", line 320, in run_once
    self._ready_arc()
  File "/build/cyan/usr/local/build/autotest/server/cros/tradefed_test.py", line 367, in _ready_arc
    self._connect_adb()
  File "/build/cyan/usr/local/build/autotest/server/cros/tradefed_test.py", line 323, in _connect_adb
    sleep_interval=_ADB_POLLING_INTERVAL_SECONDS)
  File "/build/cyan/usr/local/build/autotest/client/common_lib/site_utils.py", line 971, in poll_for_condition
    raise exception
TestFail: Error: Failed to set up adb connection

 
Cc: yusukes@chromium.org norvez@chromium.org
There is also a string of messages that look like:

03/16 20:36:32.779 INFO |     tradefed_test:0260| adb connect 127.0.0.1:9228:
* daemon not running. starting it now on port 5037 *
* daemon started successfully *
connected to 127.0.0.1:9228

03/16 20:36:32.831 INFO |     tradefed_test:0265| adb devices:
List of devices attached
127.0.0.1:9228	offline


03/16 20:36:33.885 INFO |     tradefed_test:0260| adb connect 127.0.0.1:9228:
already connected to 127.0.0.1:9228

03/16 20:36:33.937 INFO |     tradefed_test:0265| adb devices:
List of devices attached
127.0.0.1:9228	offline


Note: build #1943 failed in BuildPackages so the regression could also have been introduced there:
https://uberchromegw.corp.google.com/i/chromeos.chrome/builders/cyan-tot-chrome-pfq-informational/builds/1943

Components: Platform>ARC
Labels: -Pri-3 Build-PFQ-Failures OS-Chrome Pri-1

Comment 5 by norvez@chromium.org, Mar 20 2017

Issue 702814 has been merged into this issue.
Cc: lpique@chromium.org

Comment 7 by norvez@chromium.org, Mar 20 2017

Not sure about that particular build. The build_packages failure seems to be triggered by this CL: https://chromium-review.googlesource.com/c/456579/ (or the one it's reverting, not sure) and it seems to have been fixed right after.

But yes, it's around that time, https://uberchromegw.corp.google.com/i/chromeos/builders/veyron_minnie-chrome-pfq/builds/905 was the first PFQ to fail.
The last good one https://uberchromegw.corp.google.com/i/chromeos/builders/veyron_minnie-chrome-pfq/builds/904 started a bit after https://uberchromegw.corp.google.com/i/chromeos.chrome/builders/cyan-tot-chrome-pfq-informational/builds/1944 which is the first one to fail the same way as the PFQ.
Cc: jdufault@chromium.org
Right. We use the -informational builders to help narrow the range of changes.

Unfortunately in this case because of the (presumably) unrelated BuildPackages failure, we do not know whether the failure was introduced in 1943 or 1944.

Hopefully someone more knowledgeable about Arc can look through the changes and identify a likely culprit.

Comment 9 by norvez@chromium.org, Mar 20 2017

Cc: abhishekbh@chromium.org
I wonder about this one: https://codereview.chromium.org/2756443003
Since the ARC++ tests start Chrome without a real owner (as far as I understand it), that may cause problems?
Cc: cernekee@chromium.org
re comment#9: That seems like a reasonable guess.

I also found this in arc-network.log (and confirmed that it didn't occur in the most recent successful run):

[FATAL:arc_ip_config.cc(246)] Could not look up routing table ID in /opt/google/containers/android/rootfs/android-data/data/misc/net/rt_tables
/usr/lib64/libbase-core-395517.so(base::debug::StackTrace::StackTrace()+0x13) [0x74c5f63f6d13]

[FATAL:manager.cc(184)] Subprocess 3893 exited unexpectedly
/usr/lib64/libbase-core-395517.so(base::debug::StackTrace::StackTrace()+0x13) [0x7e086d2f9d13]

https://codereview.chromium.org/2749853003 looks a bit suspicious too since it possibly messes with Chromes/ARC++ communications.

I'm trying to reproduce locally, I'll update the bug when I have more info.
Hey norvez@ did you succeed at locally reproducing the bug ? if so do you mind sharing your steps? 
Yes, I downloaded 2 VM images (from a successful and a failing build) and I'm running them locally in qemu. ARC++ is also failing to start with the failing build.
Still looking, I'm seeing this in logcat:

ArcAuthService: Requesting ARC bridge proxy
<snip lots of messages>
03-20 19:15:29.019   667   667 E ArcProvisioning: Failed to connect to ArcAuthService
03-20 19:15:29.019   667   667 E ArcProvisioning: java.util.concurrent.TimeoutException
03-20 19:15:29.019   667   667 E ArcProvisioning: 	at org.chromium.arc.DeadlineCallbackNotifier$2.onFinish(DeadlineCallbackNotifier.java:108)
03-20 19:15:29.019   667   667 E ArcProvisioning: 	at android.os.CountDownTimer$1.handleMessage(CountDownTimer.java:127)
03-20 19:15:29.019   667   667 E ArcProvisioning: 	at android.os.Handler.dispatchMessage(Handler.java:102)
03-20 19:15:29.019   667   667 E ArcProvisioning: 	at android.os.Looper.loop(Looper.java:148)
03-20 19:15:29.019   667   667 E ArcProvisioning: 	at android.app.ActivityThread.main(ActivityThread.java:5607)
03-20 19:15:29.019   667   667 E ArcProvisioning: 	at java.lang.reflect.Method.invoke(Native Method)
03-20 19:15:29.019   667   667 E ArcProvisioning: 	at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:730)
03-20 19:15:29.019   667   667 E ArcProvisioning: 	at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:620)
03-20 19:15:29.020   667   667 E ArcAppLauncherService: Provisioning failed
03-20 19:15:29.020   667   667 E ArcAppLauncherService: java.util.concurrent.TimeoutException
03-20 19:15:29.020   667   667 E ArcAppLauncherService: 	at org.chromium.arc.DeadlineCallbackNotifier$2.onFinish(DeadlineCallbackNotifier.java:108)
03-20 19:15:29.020   667   667 E ArcAppLauncherService: 	at android.os.CountDownTimer$1.handleMessage(CountDownTimer.java:127)
03-20 19:15:29.020   667   667 E ArcAppLauncherService: 	at android.os.Handler.dispatchMessage(Handler.java:102)
03-20 19:15:29.020   667   667 E ArcAppLauncherService: 	at android.os.Looper.loop(Looper.java:148)
03-20 19:15:29.020   667   667 E ArcAppLauncherService: 	at android.app.ActivityThread.main(ActivityThread.java:5607)
03-20 19:15:29.020   667   667 E ArcAppLauncherService: 	at java.lang.reflect.Method.invoke(Native Method)
03-20 19:15:29.020   667   667 E ArcAppLauncherService: 	at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:730)
03-20 19:15:29.020   667   667 E ArcAppLauncherService: 	at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:620)
03-20 19:15:29.022   667   667 I ArcAppLauncherService: Stopping service...


That's the big difference I can see between the passing and failing builds.

Cc: jcivelli@chromium.org
Right, I think it's this one: https://codereview.chromium.org/2749853003

I tried reverting it locally and Android is starting now.
The Mojo channel protocol (legacy or new) is now decided dynamically, this is probably what's causing the failure.
You need to call mojo::edk::Channel::SetUseLegacyTransportProtocol(true) before calling mojo::edk:Init.
Sorry for not giving you guys notice of this change before it broke stuff.

Cc: uekawa@chromium.org
Revert here -> https://codereview.chromium.org/2765673002/

Thanks guys.
Cc: nya@chromium.org
Merge of revert to 3047 branch:  https://codereview.chromium.org/2765963002

Status: Verified (was: Assigned)
PFQ turned green last night, closing the bug.

Sign in to add a comment