Library loading blocks UI thread |
|||||
Issue descriptionDevice: gobo-userdebug 8.1.0 OMB1.171211.001 At some point I realized that I've never seen "Looper.dispatchMessage: android.app.ActivityThread$H" event (first event that contains onCreate) ending before LibraryLoader.preloadAlreadyLocked finishes (hmm.png). As a test I added a global object with sleep(1) in it, and voila, ActivityThread$H extended for 1 second (voila.png).
,
Dec 21 2017
Here are all dlopen / android_dlopen_ext calls: Thread 12 "AsyncTask #1" hit Breakpoint 2, 0xaaae4106 in android_dlopen_ext () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libdl.so (gdb) p (char*)$r0 $1 = 0x8e2c2000 "/data/user_de/0/com.google.android.gms/app_chimera/m/0000000c/oat/arm/DynamiteLoader_GmsCore_prodmnc_hdpi_release.odex" (gdb) bt #0 0xaaae4106 in android_dlopen_ext () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libdl.so #1 0xa85979b2 in art::DlOpenOatFile::Dlopen(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, unsigned char*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*) () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libart.so #2 0xa85986ee in art::OatFileBase* art::OatFileBase::OpenOatFile<art::DlOpenOatFile>(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, unsigned char*, unsigned char*, bool, bool, bool, char const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*) () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libart.so #3 0xa8598492 in art::OatFile::Open(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, unsigned char*, unsigned char*, bool, bool, char const*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >*) () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libart.so #4 0xa859dfb2 in art::OatFileAssistant::OatFileInfo::GetFile() () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libart.so #5 0xa859c730 in art::OatFileAssistant::OatFileInfo::Status() () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libart.so #6 0xa859c6da in art::OatFileAssistant::IsUpToDate() () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libart.so #7 0xa85a23c0 in art::OatFileManager::OpenDexFilesFromOat(char const*, _jobject*, _jobjectArray*, art::OatFile const**, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >*) () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libart.so #8 0xa85728a8 in art::DexFile_openDexFileNative(_JNIEnv*, _jclass*, _jstring*, _jstring*, int, _jobject*, _jobjectArray*) () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libart.so #9 0x70767864 in ?? () Backtrace stopped: previous frame identical to this frame (corrupt stack?) Thread 1 "oid.apps.chrome" hit Breakpoint 1, 0xaaae40b0 in dlopen () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libdl.so (gdb) p (char*)$r0 $2 = 0xab065044 <fbcInit()+364> "libperfctl.so" (gdb) bt #0 0xaaae40b0 in dlopen () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libdl.so #1 0xab064f48 in fbcInit() () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libgui.so #2 0xab065132 in notifyFbc(int const&, unsigned long long const&, int const&) () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libgui.so #3 0xab059bd2 in android::Surface::connect(int, android::sp<android::IProducerListener> const&, bool) () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libgui.so #4 0xab05ab82 in android::Surface::lock(ANativeWindow_Buffer*, ARect*) () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libgui.so #5 0xa8e42e38 in android::nativeLockCanvas(_JNIEnv*, _jclass*, long long, _jobject*, _jobject*) () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libandroid_runtime.so #6 0x710e34d6 in ?? () Backtrace stopped: previous frame identical to this frame (corrupt stack?) Thread 1 "oid.apps.chrome" hit Breakpoint 2, 0xaaae4106 in android_dlopen_ext () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libdl.so (gdb) p (char*)$r0 $3 = 0x8db9cc00 "/vendor/lib/hw/android.hardware.graphics.mapper@2.0-impl.so" (gdb) bt #0 0xaaae4106 in android_dlopen_ext () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libdl.so #1 0xab8af962 in android_load_sphal_library () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libvndksupport.so #2 0xab323316 in android::hardware::PassthroughServiceManager::openLibs(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::function<bool (void*, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&)>) () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libhidltransport.so #3 0xab32450e in android::hardware::PassthroughServiceManager::get(android::hardware::hidl_string const&, android::hardware::hidl_string const&) () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libhidltransport.so #4 0xaaa97d68 in android::hardware::graphics::mapper::V2_0::IMapper::getService(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, bool) () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/android.hardware.graphics.mapper@2.0.so #5 0xab4a5798 in android::Gralloc2::Mapper::Mapper() () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libui.so #6 0xab4a7b96 in android::GraphicBufferMapper::GraphicBufferMapper() () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libui.so #7 0xab4a708e in android::Singleton<android::GraphicBufferMapper>::getInstance() () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libui.so #8 0xab4a6330 in android::GraphicBuffer::GraphicBuffer() () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libui.so #9 0xab04ec84 in android::BpGraphicBufferProducer::requestBuffer(int, android::sp<android::GraphicBuffer>*) () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libgui.so #10 0xab0589c0 in android::Surface::dequeueBuffer(ANativeWindowBuffer**, int*) () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libgui.so #11 0xab05ab58 in android::Surface::lock(ANativeWindow_Buffer*, ARect*) () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libgui.so #12 0xa8e42e38 in android::nativeLockCanvas(_JNIEnv*, _jclass*, long long, _jobject*, _jobject*) () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libandroid_runtime.so #13 0x710e34d6 in ?? () Backtrace stopped: previous frame identical to this frame (corrupt stack?) Thread 1 "oid.apps.chrome" hit Breakpoint 2, 0xaaae4106 in android_dlopen_ext () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libdl.so (gdb) p (char*)$r0 $4 = 0xbec49690 "/vendor/lib/hw/gralloc.mt6737m.so" (gdb) bt #0 0xaaae4106 in android_dlopen_ext () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libdl.so #1 0xab8af962 in android_load_sphal_library () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libvndksupport.so #2 0x9e977b22 in hw_get_module_by_class () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/vndk-sp/libhardware.so #3 0x9e8cceee in ?? () Backtrace stopped: previous frame identical to this frame (corrupt stack?) Thread 13 "AsyncTask #2" hit Breakpoint 2, 0xaaae4106 in android_dlopen_ext () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libdl.so (gdb) p (char*)$r0 $5 = 0x9e7354b8 "libmonochrome.so" (gdb) bt #0 0xaaae4106 in android_dlopen_ext () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libdl.so #1 0x9bc54f1c in android::(anonymous namespace)::LoadWithRelroFile(_JNIEnv*, _jclass*, _jstring*, _jstring*, _jstring*, _jobject*) () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libwebviewchromium_loader.so #2 0x70e8a584 in ?? () Backtrace stopped: previous frame identical to this frame (corrupt stack?) Thread 1 "oid.apps.chrome" hit Breakpoint 2, 0xaaae4106 in android_dlopen_ext () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libdl.so (gdb) p (char*)$r0 $6 = 0xbec49700 "/vendor/lib/hw/gralloc.mt6737m.so" (gdb) bt #0 0xaaae4106 in android_dlopen_ext () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libdl.so #1 0xab8af962 in android_load_sphal_library () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/libvndksupport.so #2 0x9e977b22 in hw_get_module_by_class () from /tmp/adb-gdb-libs-A6R9K17624900707/system/lib/vndk-sp/libhardware.so #3 0x8e02d442 in ?? () Backtrace stopped: previous frame identical to this frame (corrupt stack?) So there are plenty of library loads libmonochrome.so can block.
,
Dec 21 2017
Filed Android bug: b/70935371
,
Dec 22 2017
Thanks! That explains weird things I was seeing in traces. Realistically, what can we do about that? The set of libraries is unlikely to be known with OEMs adding their own, or changing the names. As Chrome's library loading happens on a background thread and at least some of Android's are on the UI thread, does it argue for delaying Chrome's library loading? Another possibility would be to delay Chrome's library loading, but to prefetch the start of the native library, so that the subsequent dlopen() is not blocked on IO, and only CPU-bound. What do you think?
,
Dec 22 2017
The idea with prefetching before dlopen is interesting, and it generally brings the question of where exactly dlopen() spends its time. Right now I'm experimenting with moving LoadTask around. First idea was to move it past first "ActivityThread$H" event (by posting to UI thread), but alas Android does dlopen() before drawing for the first time (I suspect it's the android::Surface::lock in the traces above).
,
Dec 22 2017
,
Dec 26 2017
,
Dec 27 2017
I've got good results with crrev.com/c/843719, which does two things: 1. Starts loading libraries before setContentView. This hides Chrome's dlopen() behind a task of comparable length (setContentView takes ~200ms). 2. Delays warming up a renderer until libraries are loaded. This is to reduce stress to the OS, because otherwise three CPU / IO consuming tasks are happening in parallel (library loading, UI inflation, and renderer process startup). To evaluate the change I've done 20 warm/cold runs, in the following conditions: * "gobo" device running OMB1.171211.001. * WiFi simulating 3G speeds. * Chrome is started using MAIN intent, and reloads a single tab with "https://en.m.wikipedia.org/wiki/California". I've collected the following metrics: * setContentView: time it takes to inflate and initialize UI (duration of "onCreate->setContentView()" event). * preloadAlreadyLocked: time it takes to dlopen() Chrome library (duration of "LibraryLoader.preloadAlreadyLocked" event). * StartupToFirstDraw: time it takes for Chrome to draw its UI for the first time (distance between "ChromeApplication.onCreate" event and first "ViewHierarchy:draw" event; refer to crrev.com/c/775182 to see how "ViewHierarchy:draw" is reported). * MessageLoopStartTime: duration of "Startup.BrowserMessageLoopStartTimeFromMainEntry2" event. * StartupToNavigation: time it takes to start navigation (distance between "ChromeApplication.onCreate" and "NavigationTiming navigationStart" events). * StartupToLoadEnd: time it takes to get first contentful paint (distance between "ChromeApplication.onCreate" and "firstContentfulPaint" events). The results are here: https://docs.google.com/a/chromium.org/spreadsheets/d/1uQoP0fQ-0cW37ThOn2ZxfJEyO1lR3EL4dfnIKsCIZW0/edit?usp=sharing Most of the metrics improved in both warm and cold runs. Some interesting bits: * preloadAlreadyLocked metric improved from 185ms to 87ms for warm runs. This is likely because I moved renderer warmup, so it doesn't fight with library loading anymore. * 100ms improvement propagated from StartupToFirstDraw all the way to StartupToLoadEnd. * setContentView slightly regressed for warm runs 235ms -> 262ms. This is because library loading steals CPU/IO from UI inflation. The regression is small though, and there is no regression for cold runs.
,
Dec 28 2017
Dmitry: I'm impressed! Both of the tricks are clever and produce exciting improvements! The analysis looks solid (though as usual I cannot miss the opportunity to point out that Colab would make it more readable and extensible .. "because python" (compared to spreadsh**t formulas) .. ok sorry for this grumbling). Is it easy to rerun with only one of the changes and see that both are in fact needed? Given that the patched setContentView is two stdevs away from the unpatched, I'd say the effect is likely small or not there at all (we could look get more samples and look closer at their histograms, but not sure it is worth diving this deep). Do you have example traces with/without patch cold/warm around median messageloop start? I haven't looked at startup traces in a while, curious to see how the interleaving of tasks looks in comparison. What are the next steps? Do we think that dlopen() will take roughly the same amount of time as setContentView on other devices or it's different on high end? If it is different, does it hurt to still parallelize at this spot? Should we get smarter with delaying renderer warmup by saying 'after dlopen you are immediately good' or is it soing roughly this already?
,
Jan 9 2018
The effect is definitely there - StartupToFirstDraw improved from 737ms to 632ms, with STDEVs of 30ms. Two sample traces attached - as you can see android.app.ActivityThread$H even in new_3g_loadbeforeinflate_warmupafter_20-10s doesn't have a tail, and is ~100ms shorter (463ms vs 559ms) . That 100ms is then propagated to StartupToFirstDraw and BrowserMessageLoopStartTimeFromMainEntry2. Next I'm going to check whether moving renderer startup after loading is sufficient.
,
Jan 9 2018
,
Jan 11 2018
I redid measurements on gobo 1GiB with two variants of the patch: * patch_full: all changes from crrev.com/c/843719 (starting loading libraries before setContentView; start warmup renderer after loading) * patch_warmup: only AsyncInitializationActivity.java change from crrev.com/c/843719 (start warmup renderer after loading libraries) The results for gobo are here: https://docs.google.com/spreadsheets/d/1AGzUBHWy2yEblGehyI9DurMr6-qffn_QB4IYvvwuy54/edit?usp=sharing patch_warmup shows ~half of the improvement of patch_full, i.e. we need both changes. I also measured the same thing on 1GiB Android One (sprout) device. And surprisingly the improvement is greater there! See for yourself: https://docs.google.com/spreadsheets/d/1ATKCynzFC0eqqWADt9EXsWwS2TknWt7PF4B_tcDQhgw/edit?usp=sharing patch_full improved BrowserMessageLoopStartTimeFromMainEntry2 1600ms -> 1150ms / 2350ms -> 1780ms for warm / cold runs. Turned out on sprout library loading takes ~600ms, and main thread just waits for it to complete. See the screenshots (LibraryLoader.loadAlreadyLocked event is marked).
,
Jan 11 2018
Sample cold runs on sprout.
,
Jan 11 2018
Wow, that's an impressive improvement on sprout. We should land this and watch the UMA to see how this affects loop start timing in production.
,
Jan 11 2018
Hi, Cannot find a bug with the experimental results I got 3 years ago, so I'll attach an ipython notebook here. You can look at it with jupyter. I also attach the HTML output. The tl;dr was at the time that it was helping on high-end devices, not on low-end ones, so we didn't do it. However sprout was the "android one" device on which it didn't help, so looks like things have changed. Yeah!
,
Jan 12 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/54ed83456d7384933afb78bfc5b1a702e71e17d9 commit 54ed83456d7384933afb78bfc5b1a702e71e17d9 Author: Dmitry Skiba <dskiba@chromium.org> Date: Fri Jan 12 20:18:37 2018 Reorder content inflation, library loading, and renderer warmup. On Android Go devices dlopen() called to load Chrome's library can take a long time. That blocks UI thread, because Android also calls dlopen() during startup, and dlopen() is synchronous. The solution is to "hide" Chrome's library loading behind UI inflation done in setContentView(), since they are comparable in length. To reduce pressure on the OS we also delay starting warmup renderer until library is loaded. Surprisingly this change also helps Android One devices running M, because library loading takes even longer there, and so UI thread spends long time idling, waiting for onCreateWithNative() to be called. Bug: 796957 Change-Id: Icb476e66c85f51ead63f1972fc19c0d404104e4c Reviewed-on: https://chromium-review.googlesource.com/862390 Reviewed-by: Maria Khomenko <mariakhomenko@chromium.org> Reviewed-by: Egor Pasko <pasko@chromium.org> Commit-Queue: Dmitry Skiba <dskiba@chromium.org> Cr-Commit-Position: refs/heads/master@{#529044} [modify] https://crrev.com/54ed83456d7384933afb78bfc5b1a702e71e17d9/chrome/android/java/src/org/chromium/chrome/browser/init/AsyncInitTaskRunner.java [modify] https://crrev.com/54ed83456d7384933afb78bfc5b1a702e71e17d9/chrome/android/java/src/org/chromium/chrome/browser/init/AsyncInitializationActivity.java
,
Jan 12 2018
Good news: experimental.startup.android.coldish/messageloop_start_time_avg dropped from 811ms to 706ms, and in the CL range the only relevant commit is mine. Bad news: looking at the traces for those points I don't see any changes in Startup.BrowserMessageLoopStartTimeFromMainEntry2 values (I see the effect of my CL though). The values are 856ms for 811ms point and 842ms for 706ms point. I thought messageloop_start_time is derived from Startup.BrowserMessageLoopStartTimeFromMainEntry2?
,
Jan 15 2018
Congrats! the improvement is real!! > The values are 856ms for 811ms point and 842ms for 706ms point. Thank you for noticing this!! Together with Juan we confirmed a few things: * the time being reported is the average of 10 runs on the bot * the link to the trace is from the very first run, which became quite non-typical when we switched to 'coldish' start * the increased stdev 'bars' on the graph indicate that there are these 2 outliers from the first 2 runs, other measurements are a lot less noisy * to get all the traces, click on 'Buildbot status page' for the point, then go to step "experimental.startup.android.coldish on Android" -> bottom of stdout, see lines like "View generated trace files online at https://console.developers.google.com/m/cloudstorage/b/chrome-telemetry-output/o/http___bbc_co_uk_2018-01-15_14-27-04_95045.html for story http://bbc.co.uk" Let me file a request for improving UI / reporting / bisecting in this setup. > I thought messageloop_start_time is derived from Startup.BrowserMessageLoopStartTimeFromMainEntry2? Almost. Recently switched to "...FromMainEntry3", but they are near each other on this bot.
,
Jan 15 2018
Proposed a few possibilities to make it less confusing: https://github.com/catapult-project/catapult/issues/4161
,
Jan 18 2018
This improves start_with_url.cold.startup_pages/messageloop_start_time by 80-100ms on our downstream bot: https://chromeperf.appspot.com/report?sid=15c2861109c546f72304a81d85a3692578ce2256e4b012df32602fa8f7b72089&start_rev=1515607362&end_rev=1516148025 I attached this bug to the alert.
,
Jan 18 2018
Thanks for checking! I think we can close this now.
,
Jan 23 2018
dskiba: perhaps you can post here a few answers (even if partial) to questions I asked in #9? The most thing I can be nervous about: do we expect all device types to benefit from this optimizations or it could show up as a regression for some?
,
Jan 23 2018
Apologies for not answering #9 in time, pasko@. Here we go. First, here are traces that are close to average BrowserMessageLoopStartTimeFromMainEntry2 values (gobo).
,
Jan 23 2018
Traces for spout.
,
Jan 23 2018
As for regressions, I would expect things to stay the same on high end devices, where everything is fast, so the order doesn't matter. There is an increase in preloadAlreadyLocked times for cold gobo runs, but it nevertheless stays within setContentView times. I.e. even if Android removes dlopen() calls from startup path, the patch will cause degradation to pre-patch levels, but won't cause a regression. There is one possibility of a regression though - if dlopen() is called during setContentView(), it will now block until library loading is complete (N+ though, where we use system loader). |
|||||
►
Sign in to add a comment |
|||||
Comment 1 by dskiba@chromium.org
, Dec 21 2017