
Issue 703708

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Mar 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 1
Type: Bug-Regression

Blocked on:
issue 704260




DCHECK(!g_launcher_delgate) failing on Android tablet testers

Project Member Reported by joh...@chromium.org, Mar 21 2017

Issue description

The following DCHECK:
    DCHECK(!g_launcher_delgate)
in content/public/test/test_launcher.cc's LaunchTests function started failing flakily but consistently on Android tablet test bots.

Regression range: https://crrev.com/458066..458116?pretty=fuller

See failures:
https://test-results.appspot.com/dashboards/flakiness_dashboard.html#testType=content_browsertests&builder=chromium.android%3ALollipop%20Tablet%20Tester
(about 80 DCHECK failures per run of content_browsertests)
and
https://test-results.appspot.com/dashboards/flakiness_dashboard.html#testType=content_browsertests&builder=chromium.android%3AKitKat%20Tablet%20Tester
(about 12 DCHECK failures per run of content_browsertests)

though it seems other test suites are also affected in addition to content_browsertests.

At first glance g_launcher_delgate seems to be uninitialized, but in fact C++ guarantees that variables with static storage duration are zero-initialized (just in case, I've landed https://codereview.chromium.org/2762173002).
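To illustrate the zero-initialization point, here is a minimal sketch. FakeDelegate and both globals are hypothetical stand-ins, not the actual declarations in test_launcher.cc. It only shows that a namespace-scope pointer without an initializer starts out null, the same as one explicitly set to nullptr.

```cpp
#include <cassert>

// Hypothetical stand-in for content::TestLauncherDelegate.
struct FakeDelegate {};

// Static storage duration, no initializer: zero-initialized before any
// other initialization happens, so this pointer starts out null.
FakeDelegate* g_without_initializer;

// Explicit initializer: same starting value, just spelled out.
FakeDelegate* g_with_initializer = nullptr;

// Returns true if both globals are null, as the language guarantees at
// program start.
bool BothGlobalsStartNull() {
  return g_without_initializer == nullptr && g_with_initializer == nullptr;
}

// Convenience wrapper that asserts instead of returning a value.
void VerifyZeroInitialized() {
  assert(BothGlobalsStartNull());
}
```

An explicit initializer therefore changes nothing at runtime; it only makes the intent visible to readers.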

So the only remaining conclusion seems to be that the |main| function (the only caller of LaunchTests) is getting called twice!
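The reasoning can be sketched as follows. This is a simplified, hypothetical model (FakeDelegate, g_sketch_delegate and LaunchTestsSketch are made-up names, and the real LaunchTests does much more): a process-wide global that is set on entry and never cleared will trip the check on any second call in the same process.

```cpp
#include <cassert>

// Hypothetical stand-in for content::TestLauncherDelegate.
struct FakeDelegate {};

namespace {
// Process-wide state: keeps its value for as long as the process lives.
FakeDelegate* g_sketch_delegate = nullptr;
}  // namespace

// Hypothetical simplification of LaunchTests. Returns false in the
// situation where the real code would hit DCHECK(!g_launcher_delgate).
bool LaunchTestsSketch(FakeDelegate* delegate) {
  if (g_sketch_delegate != nullptr)
    return false;  // Global already set: this is a second call.
  g_sketch_delegate = delegate;
  return true;
}
```

Under this model the first call in a process succeeds and every later call in that same process fails, no matter which delegate is passed.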

Bo found that the Java code apparently starts its own GPU and renderer process before calling into the native main launcher, and that the test launcher's main isn't really the entry point; it actually gets called from the following stack:

Stack Trace:
  RELADDR   FUNCTION                                                                   FILE:LINE
  0076d90b  base::debug::StackTrace::StackTrace()+10                                   /b/c/b/Android_arm_Builder__dbg_/src/base/debug/stack_trace.cc:199
  00780b1f  logging::LogMessage::~LogMessage()+234                                     /b/c/b/Android_arm_Builder__dbg_/src/base/logging.cc:537
  0231104b  content::LaunchTests(content::TestLauncherDelegate*, int, int, char**)+90  /b/c/b/Android_arm_Builder__dbg_/src/content/public/test/test_launcher.cc:487
  0035b099  main+48                                                                    /b/c/b/Android_arm_Builder__dbg_/src/content/test/content_test_launcher.cc:131
  v------>  RunTests                                                                   /b/c/b/Android_arm_Builder__dbg_/src/testing/android/native_test/native_test_launcher.cc:127
  00758843  Java_org_chromium_native_1test_NativeTest_nativeRunTests+598               /b/c/b/Android_arm_Builder__dbg_/src/out/Debug/gen/testing/android/native_test/native_test_jni_headers/testing/jni/NativeTest_jni.h:54
  009d1529  <unknown>                                                                  /data/dalvik-cache/arm/data@app@org.chromium.content_browsertests_apk-1@base.apk@classes.dex

jcivelli, it seems you worked on how main gets called recently in https://codereview.chromium.org/2611323002 - any insights into what could be going on here?
 
Owner: jcivelli@chromium.org
Not sure of what's going on yet but I'll investigate.

Comment 2 by boliu@chromium.org, Mar 21 2017

I tried running all content_browsertests on an L Nexus 7 (same as the bot) while keeping logcat. It ran for ~90 minutes before a cubemate said the device was making noise and I had to turn it off. There was no repro in the logs from those 90 minutes. This was a component release build with DCHECKs on.

Comment 3 by boliu@chromium.org, Mar 21 2017

Both the KitKat and Lollipop tablet testers had a purple build before content_browsertests started failing consistently:
https://build.chromium.org/p/chromium.android/builders/Lollipop%20Tablet%20Tester/builds/7233
https://build.chromium.org/p/chromium.android/builders/KitKat%20Tablet%20Tester/builds/7048

The KitKat one finished all tests and got interrupted at the symbolize logcat step.

Not sure if this means anything or not..
#3: I think the interruption was something that happened to the chromium.android master and is unrelated. (All of the builders on chromium.android were also interrupted around the same time.)
Owner: ----
I thought it was related to my previous multiprocess test work, but it isn't, so unmarking myself as the owner.

I am not entirely clear on how the Python script launches the tests, but could there be a case where a process gets reused? (like we had happen for services)
Meaning the test activity starts and ends, but another one is started right after and the framework decides to reuse the existing process. That is fine for the Java code (I think in the services case we saw the classes get loaded in a different class loader), but not for the native code, which still holds all of its previous state.

Comment 6 by boliu@chromium.org, Mar 21 2017

> I am not entirely clear on how the Python script launches the tests, but could there be a case where a process gets reused?

no evidence of reuse from logs

Comment 7 by boliu@chromium.org, Mar 21 2017

So blamelist on kk bot is more trustworthy, since the purple run had all green tests and no DCHECK in logs. Looking through that:

1) https://chromium.googlesource.com/chromium/src/+/4ac59a56d54826dbdfc5b17b2e8875863065b0b8
My refactor CL that should only move code. Should have no behavior change.

2) https://chromium.googlesource.com/chromium/src/+/c7a0947f9803da749b8f498252a5118e9b08d9ad
My CL to enable a process launcher experiment that was forgotten. Yes behavior change.

3) https://chromium.googlesource.com/chromium/src/+/43cc0fba21372346c981396d719443b05e7a2a0c
[build/android] Fix device.RunShellCommand usages

4) https://chromium.googlesource.com/chromium/src/+/df14e3f7dd968466a056279eed4839ae9a900f18
Fix null argument to base::GetProcId in RenderProcessHostImpl::CreateMessageFilters.

5) https://chromium.googlesource.com/chromium/src/+/8a58df54d52b781b9628e7adee559157a8602150
PlzNavigate: send SourceLocation when mixed content is found
I wonder if this is specific to unswarmed bots rather than specific to tablets?

Comment 9 by boliu@chromium.org, Mar 22 2017

tried the exact build config bot uses, still no repro when running the specific failing tests :/

Can I blame this on bad device or something? How are these devices configured exactly (because I learned the hard way they don't have wifi/network, but anything else?)
Unlikely to be a bad device, especially with this popping up on both the L and K tablets. The devices get set up w/ //third_party/catapult/devil/devil/android/tools/provision_devices.py. I'm wondering if something is getting left over from a previous test or if the provisioning logic between that script and swarming has drifted in some consequential way.
Owner: jbudorick@chromium.org
Status: Assigned (was: Untriaged)
possibly related to https://bugs.chromium.org/p/chromium/issues/detail?id=704260
Status: Fixed (was: Assigned)
This is fixed as of https://build.chromium.org/p/chromium.android/builders/Lollipop%20Tablet%20Tester/builds/7259

In particular, that build included https://codereview.chromium.org/2773543003 (a7f56a486070ad1881202f894e8296918f287fb5 @{#458957}), which is the patch from issue 704260, so the catapult changes may well have been the cause. Thanks!
Blockedon: 704260
When I say fixed, I mean the flakiness is fixed. We never tracked down why DCHECK(!g_launcher_delgate) was failing, and that might still be worthwhile.
