New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 842927 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Jun 2018
Cc:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug-Regression



Sign in to add a comment

cheets_GTS.GtsAdminTestCases failing on eve-tot-chrome-pfq-informational

Project Member Reported by derat@chromium.org, May 14 2018

Issue description

In HWTest [bvt-arc] on eve-tot-chrome-pfq-informational, cheets_GTS.GtsAdminTestCases failed: https://ci.chromium.org/p/chromeos/builds/b8946537013290657920

I don't see anything meaningful in the logs:

05/14 12:28:37.477 ERROR|             utils:0282| [stderr] cat: /tmp/sysinfo/autoserv-MMoFrk/.checksum: No such file or directory
05/14 12:29:25.956 ERROR|             utils:0282| [stderr] SELinux: Loaded file_contexts contexts from /file_contexts.bin.
05/14 12:30:40.473 ERROR|             utils:0282| [stderr] SELinux: Loaded file_contexts contexts from /file_contexts.bin.
05/14 12:31:40.128 ERROR|             utils:0282| [stderr] SELinux: Loaded file_contexts contexts from /file_contexts.bin.
05/14 12:32:22.186 ERROR|             utils:0282| [stderr] cat: /tmp/sysinfo/autoserv-MMoFrk/.checksum: No such file or directory
05/14 12:32:46.860 ERROR|   logging_manager:0626| tko parser: {'cidb_build_stage_id': 79688286, 'builds': "{'cros-version': 'eve-tot-chrome-pfq-informational/R68-10678.0.0-b2569696'}", 'job_started': 1526326068, 'hostname': 'chromeos6-row4-rack11-host3', 'cidb_build_id': 2569696, 'status_version': 1, 'label': 'eve-tot-chrome-pfq-informational/R68-10678.0.0-b2569696/bvt-arc/cheets_GTS.5.1_r4.GtsAdminTestCases', 'parent_job_id': 200094324, 'drone': 'cros-full-0004.mtv.corp.google.com', 'user': 'chromeos-test', 'suite': 'bvt-arc', 'job_queued': 1526321990, 'experimental': 'False', 'build': 'eve-tot-chrome-pfq-informational/R68-10678.0.0-b2569696'}
05/14 12:32:46.861 ERROR|   logging_manager:0626| tko parser: MACHINE NAME: chromeos6-row4-rack11-host3
05/14 12:32:46.861 ERROR|   logging_manager:0626| tko parser: MACHINE GROUP: eve
05/14 12:32:46.861 ERROR|   logging_manager:0626| tko parser: parsing partial test ---- SERVER_JOB
05/14 12:32:46.861 ERROR|   logging_manager:0626| tko parser: parsing partial test cheets_GTS.GtsAdminTestCases cheets_GTS.GtsAdminTestCases
05/14 12:32:46.862 ERROR|   logging_manager:0626| tko parser: RUNNING: RUNNING
05/14 12:32:46.862 ERROR|   logging_manager:0626| Subdir: cheets_GTS.GtsAdminTestCases
05/14 12:32:46.862 ERROR|   logging_manager:0626| Testname: cheets_GTS.GtsAdminTestCases
05/14 12:32:46.862 ERROR|   logging_manager:0626| 
05/14 12:32:46.862 ERROR|   logging_manager:0626| tko parser: update RUNNING reason: Failed: after 2 retries giving up. passed=2, failed=0, waived=0, notexec>=1. run(p=2, f=0, w=0) retry(p=2, f=0, w=0) retry(p=2, f=0, w=0)
05/14 12:32:46.862 ERROR|   logging_manager:0626| tko parser: parsing test cheets_GTS.GtsAdminTestCases cheets_GTS.GtsAdminTestCases
05/14 12:32:46.863 ERROR|   logging_manager:0626| tko parser: ADD: FAIL
05/14 12:32:46.863 ERROR|   logging_manager:0626| Subdir: cheets_GTS.GtsAdminTestCases
05/14 12:32:46.863 ERROR|   logging_manager:0626| Testname: cheets_GTS.GtsAdminTestCases
05/14 12:32:46.863 ERROR|   logging_manager:0626| Failed: after 2 retries giving up. passed=2, failed=0, waived=0, notexec>=1. run(p=2, f=0, w=0) retry(p=2, f=0, w=0) retry(p=2, f=0, w=0)
05/14 12:32:46.863 ERROR|   logging_manager:0626| tko parser: parsing test ---- SERVER_JOB

I guess it's saying that one or more tests weren't run?
 

Comment 1 by kinaba@chromium.org, May 14 2018

Cc: yusukes@chromium.org
Owner: kinaba@chromium.org
Non-PST ARC constable started looking.

> I guess it's saying that one or more tests weren't run?
Yes, that's what the error message says. Often this is due to Android or Chrome crash...

Comment 2 by kinaba@chromium.org, May 15 2018

In the Android log, the test process is crashing in:

05-14 12:32:02.325  2071  2071 E AndroidRuntime: FATAL EXCEPTION: main
05-14 12:32:02.325  2071  2071 E AndroidRuntime: Process: com.google.android.admin.gts, PID: 2071
05-14 12:32:02.325  2071  2071 E AndroidRuntime: java.lang.NullPointerException: Attempt to invoke virtual method 'android.content.res.Configuration android.content.res.Resources.getConfiguration()' on a null object reference
05-14 12:32:02.325  2071  2071 E AndroidRuntime: 	at android.app.ActivityThread.updateLocaleListFromAppContext(ActivityThread.java:5204)
05-14 12:32:02.325  2071  2071 E AndroidRuntime: 	at android.app.ActivityThread.handleBindApplication(ActivityThread.java:5441)
05-14 12:32:02.325  2071  2071 E AndroidRuntime: 	at android.app.ActivityThread.-wrap2(ActivityThread.java)
05-14 12:32:02.325  2071  2071 E AndroidRuntime: 	at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1595)
05-14 12:32:02.325  2071  2071 E AndroidRuntime: 	at android.os.Handler.dispatchMessage(Handler.java:102)
05-14 12:32:02.325  2071  2071 E AndroidRuntime: 	at android.os.Looper.loop(Looper.java:154)
05-14 12:32:02.325  2071  2071 E AndroidRuntime: 	at android.app.ActivityThread.main(ActivityThread.java:6320)
05-14 12:32:02.325  2071  2071 E AndroidRuntime: 	at java.lang.reflect.Method.invoke(Native Method)
05-14 12:32:02.325  2071  2071 E AndroidRuntime: 	at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:891)
05-14 12:32:02.325  2071  2071 E AndroidRuntime: 	at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:781)
05-14 12:32:02.326  2071  2071 I Process : Sending signal. PID: 2071 SIG: 9

Comment 3 by kinaba@chromium.org, May 15 2018

Looks like an Android issue not specific to ARC++/ChromeOS..?
b/issues?q=updateLocaleListFromAppContext%20status:open
b/63333966


But that said, we seemed to have mitigated the flakiness by retries (orange cells) in the past.
https://stainless.corp.google.com/search?view=matrix&row=build&col=test&first_date=2018-04-15&last_date=2018-05-15&test=cheets_GTS.5.*Gts%28Admin%7CNet%7CPlacement%29TestCases&board=%5Eeve%24&status=GOOD&status=WARN&status=FAIL&status=ERROR&exclude_cts=false&exclude_not_run=false&exclude_non_release=false&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false
Recently, even after the retries the failures tend to alive more.

Comment 4 by kinaba@chromium.org, May 15 2018

Owner: yusukes@chromium.org
(passing to the PST constable. I have no progress since #3)
Looked at the logs, no conclusion yet.

Our crash/ server (ChromeOS_ARC category) does have crashes at updateLocaleListFromAppContext() as I wrote in b/63333966#comment16 but I'm not sure if this particular GTS failure shares the same root cause with the crashes because GTS test failures seem to happen more often than crashes in crash/.

The root cause of the upstream bug b/63333966 seems also unknown but the log from GTS does not match the one in b/63333966#comment8 at least. It's hard to say if this is the same as upstream's.

The logcat log when the test fails is really strange. Before the FATAL EXCEPTION, it has

05-14 12:32:02.310   168   178 I ActivityManager: Force stopping com.google.android.admin.gts appid=10058 user=0: start instr
05-14 12:32:02.321   168   178 I ActivityManager: Start proc 2071:com.google.android.admin.gts/u0a58 for added application com.google.android.admin.gts
05-14 12:32:02.324  2071  2071 W System  : ClassLoader referenced unknown path: /data/app/com.google.android.admin.gts-1/base.apk
05-14 12:32:02.324  2071  2071 W System  : ClassLoader referenced unknown path: /data/app/com.google.android.admin.gts-1/lib/x86
05-14 12:32:02.324  2071  2071 W linker  : Warning: unable to resolve "/data/app/com.google.android.admin.gts-1/base.apk": No such file or directory
05-14 12:32:02.325  2071  2071 W linker  : Warning: unable to resolve "/data/app/com.google.android.admin.gts-1/base.apk": No such file or directory
05-14 12:32:02.325  2071  2071 W art     : JIT profile information will not be recorded: profile file does not exits.
05-14 12:32:02.325  2071  2071 W asset   : Asset path /data/app/com.google.android.admin.gts-1/base.apk is neither a directory nor file (type=1).
05-14 12:32:02.325  2071  2071 E ResourcesManager: failed to add asset path /data/app/com.google.android.admin.gts-1/base.apk
05-14 12:32:02.325  2071  2071 D AndroidRuntime: Shutting down VM
05-14 12:32:02.325  2071  2071 E AndroidRuntime: FATAL EXCEPTION: main

and the linker line is from Bionic.

        if (realpath(zip_path.c_str(), resolved_path) == nullptr) {
          DL_WARN("Warning: unable to resolve \"%s\": %s", zip_path.c_str(), strerror(errno));
          continue;
        }

This means that the base.apk file does not exist at all at this point. Just in case, I searched for SELinux denials, but found none.

I'm comparing two logs (test failure vs success) and one thing I've noticed is when it fails, the log only shows dex2oat for com.google.android.admin.gts for isa=x86:

05-14 12:32:01.388   168   194 I PackageManager.DexOptimizer: Running dexopt (dex2oat) on: /data/app/vmdl91353407.tmp/base.apk pkg=com.google.android.admin.gts isa=x86 vmSafeMode=false debuggable=false target-filter=interpret-only oatDir = /data/app/vmdl91353407.tmp/oat sharedLibraries=null
05-14 12:32:01.396  2032  2032 I dex2oat : /system/bin/dex2oat --no-digest-validation --compiler-filter=interpret-only
05-14 12:32:01.570  2032  2036 W dex2oat : Class android.support.test.rule.logging.AtraceLogger failed lock verification and will run slower.
05-14 12:32:01.570  2032  2036 W dex2oat : Common causes for lock verification issues are non-optimized dex code
05-14 12:32:01.570  2032  2036 W dex2oat : and incorrect proguard optimizations.
05-14 12:32:01.766  2032  2032 I dex2oat : dex2oat took 370.840ms (threads: 4) arena alloc=17KB (18352B) java alloc=2MB (2987816B) native alloc=7MB (8337248B) free=3MB (3197088B)
05-14 12:32:01.769   168   194 I BackgroundDexOptService: onBootPhase
05-14 12:32:01.769   168   194 W SELinux : SELinux: setxattr restorecon_last failed (expected): /data/app/com.google.android.admin.gts-1: Operation not permitted.
05-14 12:32:01.805    62    62 D installd: Detected label change from u:object_r:system_data_file:s0 to u:object_r:app_data_file:s0:c512,c768 at /data/data/com.google.android.admin.gts; running recursive restorecon
05-14 12:32:01.806    62    62 D installd: Detected label change from u:object_r:system_data_file:s0 to u:object_r:app_data_file:s0:c512,c768 at /data/user_de/0/com.google.android.admin.gts; running recursive restorecon
05-14 12:32:01.806   168   194 V BackupManagerService: restoreAtInstall pkg=com.google.android.admin.gts token=4 restoreSet=0
05-14 12:32:01.806   168   194 V BackupManagerService: Finishing install immediately
05-14 12:32:01.806   168   194 I art     : Starting a blocking GC Explicit
05-14 12:32:01.815   168   194 I art     : Explicit concurrent mark sweep GC freed 5889(462KB) AllocSpace objects, 5(104KB) LOS objects, 33% free, 7MB/11MB, paused 167us total 8.941ms
05-14 12:32:01.816    62    62 E         : Couldn't opendir /data/app/vmdl91353407.tmp: No such file or directory              // this line seems benign btw.

When it succeeds, dex2oat runs for com.google.android.admin.gts for isa=x86_64 happens first, then start com.google.android.admin.gts, then dex2oat runs for isa=x86.


What I'm probably going to do next is to check if the test fails if I limit the isa to only x86, and if it actually fails, investigate why sometimes only x86 is used for dex2oat. (but if kinaba@ has other ideas, please do.)

x86 vs x86_64 was not related to the issue. The bad case only tried x86 simply because it was a retry and tradefed explicitly excluded x86_64 variant from the test.

I checked the first trial's log instead, and it did use x86_64 and still failed in the same way. I again compared good and bad logcat logs, but the only first obvious difference is that the bad case had the following:

 System  : ClassLoader referenced unknown path: /data/app/com.google.android.admin.gts-1/base.apk

which basically said base.apk was neither a directory or file (maybe nonexistent.)

My best guess is that the code for renaming /data/app/vmdl*.tpm to /data/app/com.google.android.admin.gts-1 was not called for some reason??? This is the code and unless DEBUG_INSTALL is enabled, nothing is logged here:

            // PackageManagerService.java

	    if (DEBUG_INSTALL) Slog.d(TAG, "Renaming " + beforeCodeFile + " to " + afterCodeFile);
            try {
                Os.rename(beforeCodeFile.getAbsolutePath(), afterCodeFile.getAbsolutePath());
            } catch (ErrnoException e) {
                Slog.w(TAG, "Failed to rename", e);
                return false;
            }

kinaba@, I have two questions:

1) Can I temporarily enable the debug log above? (will send a CL shortly)
2) According to the host side logcat log, P's tradefed command is used for this rather than N's. Is that expected?



Comment 7 Deleted

Landed: ag/4067574

Comment 9 by kinaba@chromium.org, May 17 2018

Answered offline, but for record:
Re: (2), GTS is built from the latest release branch (now O MR1) targeting all the existing branches. So, newer GTS running on N is expected/intended.
Labels: ArcConstable
10688.0.0 has the logging patch. Waiting for the gts test to fail...
Cc: -domlasko...@chromium.org -hashimoto@chromium.org levarum@chromium.org
Owner: nya@chromium.org
Unfortunately, #8 didn't tell us anything.

https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/200963879-chromeos-test this failure for example, has the log (== /data/app/app-name-1/ was created) but right after that, the log says /data/app/app-name-1/base.apk is missing. It's unclear what happened in between.

I already reverted #8 since it was useless. I guess we need even more logging to know what's going on...
I sent ag/4112534 to you (nya@).

nya@, I'm reassigning this to you, next week's constable. Can you take a look? I think this is the most important cheets auto test flake at this point.

Comment 13 by nya@chromium.org, May 21 2018

Cc: nya@chromium.org
Owner: levarum@chromium.org
Sorry, I don't have any news about this issue. Assigning to PST constable.

Added some more logging at ag/4112534. Please revert once this is fixed (or the logs are useless.)

Comment 15 by nya@chromium.org, May 24 2018

Recent GTS crashes left different logs:

https://storage.cloud.google.com/chromeos-autotest-results/202442445-chromeos-test/chromeos6-row4-rack11-host7/cheets_GTS.GtsNetTestCases/results/android-gts/logs/2018.05.23_21.28.13/inv_564407203681777484/device_logcat_setup_7538128073790608880.txt.gz

05-23 21:27:54.221    75    75 F libc    : Fatal signal 6 (SIGABRT), code -6 in tid 75 (system_server)
05-23 21:27:54.221     6     6 I chatty  : uid=0(root) /system/bin/debuggerd64 expire 1 line
05-23 21:27:54.226   160   160 I chatty  : uid=0(root) expire 1 line
05-23 21:27:54.277   160   160 F DEBUG   : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
05-23 21:27:54.277   160   160 F DEBUG   : Build fingerprint: 'google/eve/eve_cheets:7.1.1/R68-10714.0.0/4799854:user/release-keys'
05-23 21:27:54.277   160   160 F DEBUG   : Revision: '0'
05-23 21:27:54.277   160   160 F DEBUG   : ABI: 'x86_64'
05-23 21:27:54.277   160   160 F DEBUG   : pid: 75, tid: 75, name: system_server  >>> system_server <<<
05-23 21:27:54.277   160   160 F DEBUG   : signal 6 (SIGABRT), code -6 (SI_TKILL), fault addr --------
05-23 21:27:54.278   160   160 F DEBUG   : Abort message: 'art/runtime/gc/space/image_space.cc:256] Unable to find a relocated version of image file /system/framework/boot.art'
05-23 21:27:54.278   160   160 F DEBUG   :     rax 0000000000000000  rbx 00007fed1b3a7be8  rcx 00007fed1623a587  rdx 0000000000000006
05-23 21:27:54.278   160   160 F DEBUG   :     rsi 000000000000004b  rdi 000000000000004b
05-23 21:27:54.278   160   160 F DEBUG   :     r8  ffffffffffffffc8  r9  000000000026c92e  r10 0000000000000008  r11 0000000000000206
05-23 21:27:54.278   160   160 F DEBUG   :     r12 000000000000004b  r13 0000000000000006  r14 00007fed0b999fd0  r15 00007fed15c95a00
05-23 21:27:54.278   160   160 F DEBUG   :     cs  0000000000000033  ss  000000000000002b
05-23 21:27:54.278   160   160 F DEBUG   :     rip 00007fed1623a587  rbp 000000000000000d  rsp 00007ffc2e140978  eflags 0000000000000206
05-23 21:27:54.321    60    60 I ServiceManager: Waiting for service media.camera.proxy...
05-23 21:27:54.421    60    60 I ServiceManager: Waiting for service media.camera.proxy...
05-23 21:27:54.522    60    60 I ServiceManager: Waiting for service media.camera.proxy...
05-23 21:27:54.596   160   160 F DEBUG   : 
05-23 21:27:54.596   160   160 F DEBUG   : backtrace:
05-23 21:27:54.596   160   160 F DEBUG   :     #00 pc 000000000008d587  /system/lib64/libc.so (tgkill+7)
05-23 21:27:54.597   160   160 F DEBUG   :     #01 pc 000000000008a041  /system/lib64/libc.so (pthread_kill+65)
05-23 21:27:54.597   160   160 F DEBUG   :     #02 pc 0000000000030201  /system/lib64/libc.so (raise+17)
05-23 21:27:54.597   160   160 F DEBUG   :     #03 pc 000000000002887d  /system/lib64/libc.so (abort+77)
05-23 21:27:54.597   160   160 F DEBUG   :     #04 pc 00000000004ed81a  /system/lib64/libart.so (_ZN3art7Runtime5AbortEPKc+474)
05-23 21:27:54.597   160   160 F DEBUG   :     #05 pc 00000000001509d9  /system/lib64/libart.so (_ZN3art10LogMessageD1Ev+1785)
05-23 21:27:54.597   160   160 F DEBUG   :     #06 pc 00000000002ad4a3  /system/lib64/libart.so (_ZN3art2gc5space10ImageSpace20ReadImageHeaderOrDieEPKcNS_14InstructionSetE+147)
05-23 21:27:54.597   160   160 F DEBUG   :     #07 pc 0000000000460f73  /system/lib64/libart.so (_ZN3art16OatFileAssistant12GetImageInfoEv+451)
05-23 21:27:54.597   160   160 F DEBUG   :     #08 pc 000000000045ff52  /system/lib64/libart.so (_ZN3art16OatFileAssistant23GivenOatFileIsOutOfDateERKNS_7OatFileE+722)
05-23 21:27:54.597   160   160 F DEBUG   :     #09 pc 00000000004603d4  /system/lib64/libart.so (_ZN3art16OatFileAssistant22GivenOatFileIsUpToDateERKNS_7OatFileE+36)
05-23 21:27:54.597   160   160 F DEBUG   :     #10 pc 000000000045aa88  /system/lib64/libart.so (_ZN3art16OatFileAssistant15GetDexOptNeededENS_14CompilerFilter6FilterE+344)
05-23 21:27:54.597   160   160 F DEBUG   :     #11 pc 0000000000421303  /system/lib64/libart.so (_ZN3artL23DexFile_getDexOptNeededEP7_JNIEnvP7_jclassP8_jstringS5_S5_h+435)
05-23 21:27:54.597   160   160 F DEBUG   :     #12 pc 0000000000857160  /system/framework/x86_64/boot-core-libart.oat (offset 0x666000) (int dalvik.system.DexFile.getDexOptNeeded(java.lang.String, java.lang.String, java.lang.String, boolean)+268)
05-23 21:27:54.597   160   160 F DEBUG   :     #13 pc 00000000018f2f0c  /system/framework/oat/x86_64/services.odex (offset 0x1083000) (com.android.server.pm.PackageManagerService.<init>+6632)
05-23 21:27:54.597   160   160 F DEBUG   :     #14 pc 000000000121fefe  /system/framework/oat/x86_64/services.odex (offset 0x1083000) (com.android.server.SystemServer.startBootstrapServices+1274)
05-23 21:27:54.597   160   160 F DEBUG   :     #15 pc 000000000121f637  /system/framework/oat/x86_64/services.odex (offset 0x1083000) (com.android.server.SystemServer.run+1347)
05-23 21:27:54.597   160   160 F DEBUG   :     #16 pc 000000000121eae7  /system/framework/oat/x86_64/services.odex (offset 0x1083000) (com.android.server.SystemServer.main+83)
05-23 21:27:54.597   160   160 F DEBUG   :     #17 pc 000000000013c016  /system/lib64/libart.so (art_quick_invoke_static_stub+806)
05-23 21:27:54.597   160   160 F DEBUG   :     #18 pc 0000000000149edc  /system/lib64/libart.so (_ZN3art9ArtMethod6InvokeEPNS_6ThreadEPjjPNS_6JValueEPKc+268)
05-23 21:27:54.597   160   160 F DEBUG   :     #19 pc 00000000004e66c7  /system/lib64/libart.so (_ZN3artL18InvokeWithArgArrayERKNS_33ScopedObjectAccessAlreadyRunnableEPNS_9ArtMethodEPNS_8ArgArrayEPNS_6JValueEPKc+87)
05-23 21:27:54.597   160   160 F DEBUG   :     #20 pc 00000000004e87b4  /system/lib64/libart.so (_ZN3art12InvokeMethodERKNS_33ScopedObjectAccessAlreadyRunnableEP8_jobjectS4_S4_m+1444)
05-23 21:27:54.597   160   160 F DEBUG   :     #21 pc 0000000000448cf8  /system/lib64/libart.so (_ZN3artL13Method_invokeEP7_JNIEnvP8_jobjectS3_S3_+56)
05-23 21:27:54.597   160   160 F DEBUG   :     #22 pc 0000000000813527  /system/framework/x86_64/boot.oat (offset 0x73b000) (java.lang.Object java.lang.reflect.Method.invoke(java.lang.Object, java.lang.Object[])+227)
05-23 21:27:54.597   160   160 F DEBUG   :     #23 pc 0000000002fb0faa  /system/framework/x86_64/boot-framework.oat (offset 0x1fcb000) (void com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run()+134)
05-23 21:27:54.597   160   160 F DEBUG   :     #24 pc 0000000002fb219f  /system/framework/x86_64/boot-framework.oat (offset 0x1fcb000) (void com.android.internal.os.ZygoteInit.main(java.lang.String[])+1355)
05-23 21:27:54.597   160   160 F DEBUG   :     #25 pc 000000000013c016  /system/lib64/libart.so (art_quick_invoke_static_stub+806)
05-23 21:27:54.597   160   160 F DEBUG   :     #26 pc 0000000000149edc  /system/lib64/libart.so (_ZN3art9ArtMethod6InvokeEPNS_6ThreadEPjjPNS_6JValueEPKc+268)
05-23 21:27:54.597   160   160 F DEBUG   :     #27 pc 00000000004e66c7  /system/lib64/libart.so (_ZN3artL18InvokeWithArgArrayERKNS_33ScopedObjectAccessAlreadyRunnableEPNS_9ArtMethodEPNS_8ArgArrayEPNS_6JValueEPKc+87)
05-23 21:27:54.597   160   160 F DEBUG   :     #28 pc 00000000004e6274  /system/lib64/libart.so (_ZN3art17InvokeWithVarArgsERKNS_33ScopedObjectAccessAlreadyRunnableEP8_jobjectP10_jmethodIDP13__va_list_tag+388)
05-23 21:27:54.597   160   160 F DEBUG   :     #29 pc 00000000003e288d  /system/lib64/libart.so (_ZN3art3JNI21CallStaticVoidMethodVEP7_JNIEnvP7_jclassP10_jmethodIDP13__va_list_tag+781)
05-23 21:27:54.597   160   160 F DEBUG   :     #30 pc 00000000000ac1d2  /system/lib64/libandroid_runtime.so (_ZN7_JNIEnv20CallStaticVoidMethodEP7_jclassP10_jmethodIDz+146)
05-23 21:27:54.597   160   160 F DEBUG   :     #31 pc 00000000000af039  /system/lib64/libandroid_runtime.so (_ZN7android14AndroidRuntime5startEPKcRKNS_6VectorINS_7String8EEEb+665)
05-23 21:27:54.597   160   160 F DEBUG   :     #32 pc 00000000000027f7  /system/bin/app_process64
05-23 21:27:54.597   160   160 F DEBUG   :     #33 pc 000000000001c9d4  /system/lib64/libc.so (__libc_init+84)
05-23 21:27:54.597   160   160 F DEBUG   :     #34 pc 000000000000209f  /system/bin/app_process64
05-23 21:27:54.597   160   160 F DEBUG   :     #35 pc 0000000000000005  <unknown>



Owner: kinaba@chromium.org
Recent run still fails like comment#2:

https://stainless.corp.google.com/browse/chromeos-autotest-results/203643264-chromeos-test/

One interesting thing is that the test runs 3 ABIs, and in the failed run the first one (x86_64) pass, and the other two (x86, armeabi-v7) fail.

=========================================

5/29 03:10:21.896 DEBUG|             utils:0283| [stdout] 05-29 03:10:21 I/CompatibilityTest: ========================================
05/29 03:10:21.896 DEBUG|             utils:0283| [stdout] 05-29 03:10:21 I/CompatibilityTest: Starting a run with 3 unique modules.
05/29 03:10:21.897 DEBUG|             utils:0283| [stdout] 05-29 03:10:21 I/CompatibilityTest: ========================================
05/29 03:10:21.898 DEBUG|             utils:0283| [stdout] 05-29 03:10:21 I/ModuleRepo: chromeos6-row4-rack11-host13:22 running 3 test sub-modules, expected to complete in 3m 0s.
05/29 03:10:21.900 DEBUG|             utils:0283| [stdout] 05-29 03:10:21 I/CompatibilityTest: Starting 3 test sub-modules on chromeos6-row4-rack11-host13:22
05/29 03:10:24.469 DEBUG|             utils:0283| [stdout] 05-29 03:10:24 I/ConsoleReporter: [chromeos6-row4-rack11-host13:22] Starting x86_64 GtsAdminTestCases with 2 tests
05/29 03:10:24.473 DEBUG|             utils:0283| [stdout] 05-29 03:10:24 I/ConsoleReporter: [1/2 x86_64 GtsAdminTestCases chromeos6-row4-rack11-host13:22] com.google.android.admin.gts.EnterprisePrivacySettingsTest#testEnterprisePrivacySettings pass
05/29 03:10:24.475 DEBUG|             utils:0283| [stdout] 05-29 03:10:24 I/ConsoleReporter: [2/2 x86_64 GtsAdminTestCases chromeos6-row4-rack11-host13:22] com.google.android.admin.gts.ProfileOwnerTest#testSetProfileOwnerActivityExists pass
05/29 03:10:24.476 DEBUG|             utils:0283| [stdout] 05-29 03:10:24 I/ConsoleReporter: [chromeos6-row4-rack11-host13:22] x86_64 GtsAdminTestCases completed in 8 ms. 2 passed, 0 failed, 0 not executed
05/29 03:10:24.891 DEBUG|             utils:0283| [stdout] 05-29 03:10:24 W/CompatibilityTest: Inaccurate runtime hint for x86_64 GtsAdminTestCases, expected 1m 0s was 2s
05/29 03:10:25.544 DEBUG|             utils:0283| [stdout] 05-29 03:10:25 D/KeyguardStatusChecker: KeyguardControllerState is not supported by the device. Skipping System Checker.
05/29 03:10:28.082 DEBUG|             utils:0283| [stdout] 05-29 03:10:28 W/InstrumentationTest: Run failure Instrumentation run failed due to 'Process crashed.' when collecting tests to run for com.google.android.admin.gts on device chromeos6-row4-rack11-host13:22.
05/29 03:10:28.410 DEBUG|             utils:0283| [stdout] 05-29 03:10:28 I/ConsoleReporter: [chromeos6-row4-rack11-host13:22] Starting x86 GtsAdminTestCases with 0 test
05/29 03:10:28.411 DEBUG|             utils:0283| [stdout] 05-29 03:10:28 I/ConsoleReporter: [chromeos6-row4-rack11-host13:22] Instrumentation run failed due to 'Process crashed.'
05/29 03:10:28.411 DEBUG|             utils:0283| [stdout] 05-29 03:10:28 I/ConsoleReporter: [chromeos6-row4-rack11-host13:22] x86 GtsAdminTestCases completed in 0 ms. 0 passed, 0 failed, 0 not executed
05/29 03:10:29.344 DEBUG|             utils:0283| [stdout] 05-29 03:10:29 W/CompatibilityTest: Inaccurate runtime hint for x86 GtsAdminTestCases, expected 1m 0s was 3s
05/29 03:10:29.943 DEBUG|             utils:0283| [stdout] 05-29 03:10:29 D/KeyguardStatusChecker: KeyguardControllerState is not supported by the device. Skipping System Checker.
05/29 03:10:32.346 DEBUG|             utils:0283| [stdout] 05-29 03:10:32 W/InstrumentationTest: Run failure Instrumentation run failed due to 'Process crashed.' when collecting tests to run for com.google.android.admin.gts on device chromeos6-row4-rack11-host13:22.
05/29 03:10:32.630 DEBUG|             utils:0283| [stdout] 05-29 03:10:32 I/ConsoleReporter: [chromeos6-row4-rack11-host13:22] Starting armeabi-v7a GtsAdminTestCases with 0 test
05/29 03:10:32.630 DEBUG|             utils:0283| [stdout] 05-29 03:10:32 I/ConsoleReporter: [chromeos6-row4-rack11-host13:22] Instrumentation run failed due to 'Process crashed.'
05/29 03:10:32.631 DEBUG|             utils:0283| [stdout] 05-29 03:10:32 I/ConsoleReporter: [chromeos6-row4-rack11-host13:22] armeabi-v7a GtsAdminTestCases completed in 0 ms. 0 passed, 0 failed, 0 not executed

=============================================

It looks like sometimes the harness cannot replace the apk correctly (maybe some race?).

@kinaba, is there any known issues with running multiple ABIs in the same invocation?
Should we run them separately instead, using distinct gts-tradefed commands?
Otherwise, please reassign to current constable for further analysis.

Also we might consider running only primary/default abi until root cause is diagnosed to get rid of the flakiness


More observations (maybe irrelevant):

1. Failing run has many suspicious Finsky logs (good run doesn't have them):
05-29 03:12:31.227  1147  1147 W Finsky  : [1] com.google.android.finsky.application.d.cY(896): No account configured on this device

2. eve-tot-chrome-pfq-informational seems to fail much more often that other builders:

https://stainless.corp.google.com/search?view=list&first_date=2018-05-03&last_date=2018-05-30&builder_name=%5E.*eve.*%24&test=cheets_GTS.*GtsAdminTestCases&status=FAIL&status=ERROR&status=ABORT&exclude_cts=false&exclude_not_run=false&exclude_non_release=false&exclude_au=true&exclude_acts=true&exclude_retried=false&exclude_non_production=false

Maybe the flakiness depends on the specific hardware/network properties.



Cc: cmtm@chromium.org hidehiko@chromium.org
Cc: pwang@chromium.org
Sorry I just now noticed I got assigned.

> @kinaba, is there any known issues with running multiple ABIs in the same invocation?
> Should we run them separately instead, using distinct gts-tradefed commands?

I haven't heard any such issues... at least I haven't ever bitten by this kind of bug
on regularly running arc-gts suite. As you wrote, eve-tot-chrome-pfq-informational looks
much more affected. Weird.

Separating the GTS execution per ABI is technically possible, but GTS is usually not run
in that way... I mean, running multiple ABIs at once has to be a perfectly valid execution. 
If it failed, the test detected some badness correctly. I don't think we want to change the test to ignore bugs.

That said,
> Also we might consider running only primary/default abi until root cause is diagnosed to get rid of the flakiness

that sounds like a practical choice we need to consider. Is there any way to detect (from the test itself) from which suite the test is run?
Currently the GtsAdminTestCases is shared between bvt and arc-gts(-qual), where the latter is used for generating the certification of Android compatibility hence we cannot omit any supported ABI.
Looking at the dashboard:

https://stainless.corp.google.com/search?view=matrix&row=build&col=test&first_date=2018-05-04&last_date=2018-05-31&test=cheets_GTS.5.*r4.*Gts%28Admin%7CNet%7CPlacement%29TestCases&board=%5Eeve%24&status=GOOD&status=WARN&status=FAIL&status=ERROR&exclude_cts=false&exclude_not_run=false&exclude_non_release=false&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false

(1) The failures are only seen on non-release builds (-b#######).
(2) Until R68-10655, retry made the test passing (=yellow cells). Since R68-10658 the failure started to persist.

The "retry" of (2) restarts the browser and recreates the ChromeOS user and hence Android /data, so to me it looked weird that any bad state kept beyond the retries.
@cmtm, I wonder if https://chromium-review.googlesource.com/1043488 in the regression range may explain it. The delayed timing of unmount may be leaving something unexpectedly.
#15 may be closer to the root cause.
Actually it is not a different logs.

All the failing instances are first crashing by #15 (at the container start up phase), and then crashing by #2 while running the GTS test. Passing runs show neither of them.

So probably the situation may probably be that #15 crashed art/system_server/whatever and left the container in a bad state that triggered #2.
And IIUC #15 is triggered when the checksum of
/system/framework/<image_isa>/boot.art
and
/data/dalvik-cache/<image_isa>/system@framework@boot.art
did not match, or the latter did not exist.

https://android.googlesource.com/platform/art/+/nougat-mr1-dev/runtime/gc/space/image_space.cc#274
arc-boot-continue.log comes with "Unexpected mount info".
Does it have something in common with  crbug.com/834479 ?

[0529/031154:INFO:arc_setup.cc(653)] Unmounting: /opt/google/containers/android/rootfs/android-data/data/dalvik-cache/arm
[0529/031154:INFO:arc_setup.cc(654)] Running arc_mounter_->UmountLazily(mount_path)...
[0529/031154:ERROR:arc_setup_util.cc(357)] Failed to lazy-umount /opt/google/containers/android/rootfs/android-data/data/dalvik-cache/arm: Invalid argument
[0529/031154:INFO:arc_setup.cc(654)] Ignoring failures: arc_mounter_->UmountLazily(mount_path)
[0529/031154:INFO:arc_setup.cc(653)] Unmounting: /opt/google/containers/android/rootfs/android-data/data/dalvik-cache/x86
[0529/031154:INFO:arc_setup.cc(654)] Running arc_mounter_->UmountLazily(mount_path)...
[0529/031154:ERROR:arc_setup.cc(242)] Unexpected mount info: 113 18 259:1 /unencrypted/art-data/dalvik-cache/x86 /opt/google/containers/android/rootfs/android-data/data/dalvik-cache/x86 rw,nosuid,nodev,noexec,noatime - ext4 /dev/nvme0n1p1 rw,seclabel,commit=600,data=ordered
[0529/031154:ERROR:arc_setup.cc(242)] Unexpected mount info: 114 18 259:1 /unencrypted/art-data/dalvik-cache/x86_64 /opt/google/containers/android/rootfs/android-data/data/dalvik-cache/x86_64 rw,nosuid,nodev,noexec,noatime - ext4 /dev/nvme0n1p1 rw,seclabel,commit=600,data=ordered
[0529/031154:ERROR:arc_setup.cc(242)] Unexpected mount info: 120 114 259:1 /unencrypted/art-data/dalvik-cache/x86_64 /opt/google/containers/android/rootfs/android-data/data/dalvik-cache/x86_64 rw,nosuid,nodev,noexec,noatime - ext4 /dev/nvme0n1p1 rw,seclabel,commit=600,data=ordered
[0529/031154:INFO:arc_setup.cc(653)] Unmounting: /opt/google/containers/android/rootfs/android-data/data/dalvik-cache/x86_64
[0529/031154:INFO:arc_setup.cc(654)] Running arc_mounter_->UmountLazily(mount_path)...
[0529/031154:ERROR:arc_setup.cc(242)] Unexpected mount info: 114 18 259:1 /unencrypted/art-data/dalvik-cache/x86_64 /opt/google/containers/android/rootfs/android-data/data/dalvik-cache/x86_64 rw,nosuid,nodev,noexec,noatime - ext4 /dev/nvme0n1p1 rw,seclabel,commit=600,data=ordered

Comment 25 by pwang@chromium.org, May 31 2018

To #19, after the suite is expanded to job, it only records its parent suite id, not the suite name. I do not know any method to fetch it from the test itself without reaching out the database for the information.

BTW, dynamically adjust test's behavior based on which suite it is scheduled on sounds not right to me. Suite only serves as a simple wrapper of test collection. If we want different behavior of the test, we should create new test control file for it if needed.

#24
The cheets_ContainerMount failure happens occasionally. Not very often, but it sometimes does.

I'm trying to fix the issue by moving the mount operation from arc-setup to config.json, but haven't finished it yet. For totally unknown reasons, betty pre-cq freezes with my CL..... 
Project Member

Comment 27 by bugdroid1@chromium.org, Jun 8 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/3031d6000a517cb20d4d61d6e80704fa828817fb

commit 3031d6000a517cb20d4d61d6e80704fa828817fb
Author: yusukes <yusukes@google.com>
Date: Fri Jun 08 02:56:16 2018

arc: Mount /data/dalvik-cache/<isa> in the container namespace

With run_oci, there's no reason to do that in the init namespace.
This will fix the occasional mount point leak reported at
 crbug.com/834479  too.

BUG= chromium:834479 
BUG= chromium:842927 
TEST=ARC++ still starts, cheets_ContainerMount
CQ-DEPEND=CL:1065582

Change-Id: I4b442d4702c3a09f020a5cc7a40463b5bd5dff59
Reviewed-on: https://chromium-review.googlesource.com/1065587
Commit-Ready: Yusuke Sato <yusukes@chromium.org>
Tested-by: Yusuke Sato <yusukes@chromium.org>
Reviewed-by: Luis Hector Chavez <lhchavez@chromium.org>

[modify] https://crrev.com/3031d6000a517cb20d4d61d6e80704fa828817fb/arc/container-bundle/pi/config.json
[modify] https://crrev.com/3031d6000a517cb20d4d61d6e80704fa828817fb/arc/container-bundle/nyc/config.json
[modify] https://crrev.com/3031d6000a517cb20d4d61d6e80704fa828817fb/arc/container-bundle/master/config.json
[modify] https://crrev.com/3031d6000a517cb20d4d61d6e80704fa828817fb/arc/setup/arc_setup.cc

#27 should remove #24's unexpected mount info errors. Let's see if this fixes the flake...
Is the test still flaky?
Owner: yusukes@chromium.org
Status: Fixed (was: Assigned)
I assume it fixed.

Sign in to add a comment