DCHECK(!g_launcher_delgate) failing on Android tablet testers
Issue description
The following DCHECK:
DCHECK(!g_launcher_delgate)
in content/public/test/test_launcher.cc's LaunchTests function started failing consistently on Android tablet test bots.
Regression range: https://crrev.com/458066..458116?pretty=fuller
See failures:
https://test-results.appspot.com/dashboards/flakiness_dashboard.html#testType=content_browsertests&builder=chromium.android%3ALollipop%20Tablet%20Tester
(about 80 DCHECK failures per run of content_browsertests)
and
https://test-results.appspot.com/dashboards/flakiness_dashboard.html#testType=content_browsertests&builder=chromium.android%3AKitKat%20Tablet%20Tester
(about 12 DCHECK failures per run of content_browsertests)
though it seems other test suites are also affected in addition to content_browsertests.
At first glance g_launcher_delgate seems to be uninitialized, but in fact C++ guarantees that variables with static storage duration are zero-initialized (just in case, I've landed https://codereview.chromium.org/2762173002).
So the only remaining conclusion seems to be that the |main| function (the only caller of LaunchTests) is getting called twice!
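The reasoning above can be sketched in a few lines of C++. This is only an illustration under stated assumptions, not the code in test_launcher.cc: `g_delegate_sketch`, `LaunchTestsSketch` and `CountFailedChecks` are hypothetical names, and the DCHECK is modeled as a boolean return value instead of a crash. It shows that a namespace-scope pointer with no initializer starts out null, so the check can only fail if the function is entered a second time in the same process.

```cpp
#include <cassert>

// Hypothetical stand-in for content::TestLauncherDelegate.
struct TestLauncherDelegate {};

// No initializer on purpose: a variable with static storage duration is
// zero-initialized, so this pointer is null before any code runs.
TestLauncherDelegate* g_delegate_sketch;

// Models the DCHECK(!g_launcher_delgate) pattern. Returns true if the check
// would have passed (the global was still null on entry), false otherwise.
bool LaunchTestsSketch(TestLauncherDelegate* delegate) {
  const bool was_unset = (g_delegate_sketch == nullptr);
  g_delegate_sketch = delegate;
  return was_unset;
}

// Simulates the entry point being called `times` times in one process and
// counts how many of those entries would have tripped the check.
int CountFailedChecks(int times) {
  assert(times >= 0);
  static TestLauncherDelegate delegate;  // outlives the call, no dangling pointer
  int failures = 0;
  for (int i = 0; i < times; ++i) {
    if (!LaunchTestsSketch(&delegate)) ++failures;
  }
  return failures;
}
```

In this model a single entry never fails the check; only a repeated entry into the same process does, which is why a double call of main looked like the only remaining explanation.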
Bo found that the Java code apparently starts its own GPU and renderer processes before calling into the native main launcher, and that the test launcher's main isn't really the entry point; it actually gets called from the following stack:
Stack Trace:
RELADDR FUNCTION FILE:LINE
0076d90b base::debug::StackTrace::StackTrace()+10 /b/c/b/Android_arm_Builder__dbg_/src/base/debug/stack_trace.cc:199
00780b1f logging::LogMessage::~LogMessage()+234 /b/c/b/Android_arm_Builder__dbg_/src/base/logging.cc:537
0231104b content::LaunchTests(content::TestLauncherDelegate*, int, int, char**)+90 /b/c/b/Android_arm_Builder__dbg_/src/content/public/test/test_launcher.cc:487
0035b099 main+48 /b/c/b/Android_arm_Builder__dbg_/src/content/test/content_test_launcher.cc:131
v------> RunTests /b/c/b/Android_arm_Builder__dbg_/src/testing/android/native_test/native_test_launcher.cc:127
00758843 Java_org_chromium_native_1test_NativeTest_nativeRunTests+598 /b/c/b/Android_arm_Builder__dbg_/src/out/Debug/gen/testing/android/native_test/native_test_jni_headers/testing/jni/NativeTest_jni.h:54
009d1529 <unknown> /data/dalvik-cache/arm/data@app@org.chromium.content_browsertests_apk-1@base.apk@classes.dex
jcivelli, it seems you worked on how main gets called recently in https://codereview.chromium.org/2611323002 - any insights into what could be going on here?
,
Mar 21 2017
I tried running all content_browsertests on an L Nexus 7 (same as the bot) while keeping logcat. It ran for ~90 minutes before my cubemate said the device was making noise and it had to be turned off. Looking at the logs, there was no repro of this in those 90 minutes. This was a component release build with DCHECKs on.
,
Mar 21 2017
Both the KitKat and Lollipop tablet testers had a purple build before content_browsertests started failing consistently: https://build.chromium.org/p/chromium.android/builders/Lollipop%20Tablet%20Tester/builds/7233 https://build.chromium.org/p/chromium.android/builders/KitKat%20Tablet%20Tester/builds/7048 The KitKat one finished all tests and got interrupted at the symbolize logcat step. Not sure whether this means anything.
,
Mar 21 2017
#3: I think the interruption was something that happened to the chromium.android master and is unrelated. (All of the builders on chromium.android were also interrupted around the same time.)
,
Mar 21 2017
I thought it was related to my previous multiprocess test work, but it isn't, so I'm unmarking myself as the owner. I am not entirely clear on how the Python script launches the tests, but could there be a case where a process gets reused? (Like we had happen for services.) That is, the test activity starts and ends, but another one is started right afterwards and the framework decides to reuse the existing process. That is fine for the Java code (I think in the services case we saw the classes get loaded in a different class loader), but not for the native code, which keeps all its previous state.
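The process-reuse hypothesis can be modeled roughly as follows. This is a simplified sketch, not the actual launcher code: `ActivityState`, `g_native_launcher_registered`, `LaunchResult` and `StartTestActivity` are all hypothetical names. Per-launch state (standing in for Java classes reloaded in a fresh class loader) is rebuilt on every activity start, while a native global lives as long as the process does.

```cpp
#include <cassert>

// Hypothetical per-launch state: rebuilt on every activity start, much like
// Java classes that get loaded again in a new class loader.
struct ActivityState {
  bool launcher_registered = false;
};

// Hypothetical native state: static storage duration, so it starts out false
// and then persists for the lifetime of the process.
bool g_native_launcher_registered;

struct LaunchResult {
  bool java_side_clean;    // per-launch state looked fresh on entry
  bool native_side_clean;  // native global looked fresh on entry
};

// Models one start of the test activity inside the current process.
LaunchResult StartTestActivity() {
  ActivityState activity;  // fresh object for every launch
  LaunchResult result{!activity.launcher_registered,
                      !g_native_launcher_registered};
  assert(result.java_side_clean);  // always true in this model

  activity.launcher_registered = true;
  g_native_launcher_registered = true;  // never reset when the activity ends
  return result;
}
```

Calling `StartTestActivity()` twice in the same process reports a clean native side on the first call only. That matches the shape of the DCHECK failure, although the logs in this thread showed no evidence that reuse actually happened.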
,
Mar 21 2017
> I am not entirely clear on how the Python script launches the tests, but could there be a case where a process gets reused?
No evidence of reuse from the logs.
,
Mar 21 2017
So the blamelist on the KK bot is more trustworthy, since the purple run had all green tests and no DCHECK in the logs. Looking through that:
1) https://chromium.googlesource.com/chromium/src/+/4ac59a56d54826dbdfc5b17b2e8875863065b0b8
My refactor CL that should only move code. Should have no behavior change.
2) https://chromium.googlesource.com/chromium/src/+/c7a0947f9803da749b8f498252a5118e9b08d9ad
My CL to enable a process launcher experiment that was forgotten. Yes, behavior change.
3) https://chromium.googlesource.com/chromium/src/+/43cc0fba21372346c981396d719443b05e7a2a0c
[build/android] Fix device.RunShellCommand usages
4) https://chromium.googlesource.com/chromium/src/+/df14e3f7dd968466a056279eed4839ae9a900f18
Fix null argument to base::GetProcId in RenderProcessHostImpl::CreateMessageFilters.
5) https://chromium.googlesource.com/chromium/src/+/8a58df54d52b781b9628e7adee559157a8602150
PlzNavigate: send SourceLocation when mixed content is found
,
Mar 22 2017
I wonder if this is specific to unswarmed bots rather than specific to tablets?
,
Mar 22 2017
Tried the exact build config the bot uses; still no repro when running the specific failing tests :/ Can I blame this on a bad device or something? How exactly are these devices configured? (I learned the hard way that they don't have wifi/network, but is there anything else?)
,
Mar 22 2017
Unlikely to be a bad device, especially with this popping up on both the L and K tablets. The devices get set up w/ //third_party/catapult/devil/devil/android/tools/provision_devices.py. I'm wondering if something is getting left over from a previous test or if the provisioning logic between that script and swarming has drifted in some consequential way.
,
Mar 22 2017
possibly related to https://bugs.chromium.org/p/chromium/issues/detail?id=704260
,
Mar 23 2017
This is fixed as of https://build.chromium.org/p/chromium.android/builders/Lollipop%20Tablet%20Tester/builds/7259 In particular, that build included https://codereview.chromium.org/2773543003 (a7f56a486070ad1881202f894e8296918f287fb5 @{#458957}), which is the patch from issue 704260, so the catapult changes may well have been the cause. Thanks!
,
Mar 23 2017
,
Mar 23 2017
When I say fixed, I mean the flakiness is fixed. We never tracked down why DCHECK(!g_launcher_delgate) was failing, and that might still be worthwhile.
Comment 1 by jcivelli@chromium.org
, Mar 21 2017