Potential jank caused by about_flags::RecordUMAStatistics() |
||||
Issue descriptionWe collected slow reports from users facing janks and found that the function mentioned above is taking too much time on the main thread, potentially causing janks. The issue can be found in reports: eab795ac841c2cde - 329.12 ms c9ba3267bff3fea0 - 282.694 ms e7332bd10180506a - 280.837 ms 71d72b2b9da48b76 - 204.081 ms bc26edb209a8d1bd - 202 ms Go to crash/ReportID to view the traces. The stack trace that was hit the most: strlen std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >::basic_string(char const*) flags_ui::FlagsState::GenerateFlagsToSwitchesMapping(flags_ui::FlagsStorage*, std::__ndk1::set<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, std::__ndk1::less<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::allocator<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > > >*, std::__ndk1::map<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, flags_ui::FlagsState::SwitchEntry, std::__ndk1::less<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::allocator<std::__ndk1::pair<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const, flags_ui::FlagsState::SwitchEntry> > >*) flags_ui::FlagsState::GetSwitchesFromFlags(flags_ui::FlagsStorage*) about_flags::RecordUMAStatistics(flags_ui::FlagsStorage*) ChromeBrowserMainExtraPartsMetrics::PreBrowserStart() ChromeBrowserMainParts::PreBrowserStart() ChromeBrowserMainParts::PreMainMessageLoopRun() content::BrowserMainLoop::PreMainMessageLoopRun() base::internal::Invoker<base::internal::BindState<int (content::BrowserMainLoop::*)(), base::internal::UnretainedWrapper<content::BrowserMainLoop> >, int ()>::Run(base::internal::BindStateBase*) content::StartupTaskRunner::WrappedTask() base::internal::Invoker<base::internal::BindState<void (media::AudioInputController::*)(), base::internal::UnretainedWrapper<media::AudioInputController> >, void ()>::Run(base::internal::BindStateBase*) base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) base::MessageLoop::RunTask(base::PendingTask*) base::MessageLoop::DoWork() base::MessagePumpForUI::OnNonDelayedLooperCallback() base::(anonymous namespace)::NonDelayedLooperCallback(int, int, void*) __futex_syscall3 Unknown dlmalloc malloc operator new(unsigned int) std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >::__init(char const*, unsigned int) std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >::basic_string(char const*) flags_ui::FeatureEntry::NameForOption(int) const flags_ui::(anonymous namespace)::AddInternalName(flags_ui::FeatureEntry const&, std::__ndk1::set<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, std::__ndk1::less<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::allocator<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > > >*) flags_ui::FlagsState::GetSanitizedEnabledFlags(flags_ui::FlagsStorage*, std::__ndk1::set<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, std::__ndk1::less<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::allocator<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > > >*) flags_ui::FlagsState::GetSanitizedEnabledFlagsForCurrentPlatform(flags_ui::FlagsStorage*, std::__ndk1::set<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, std::__ndk1::less<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::allocator<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > > >*) flags_ui::FlagsState::GenerateFlagsToSwitchesMapping(flags_ui::FlagsStorage*, std::__ndk1::set<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, std::__ndk1::less<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::allocator<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > > >*, std::__ndk1::map<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, flags_ui::FlagsState::SwitchEntry, std::__ndk1::less<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::allocator<std::__ndk1::pair<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const, flags_ui::FlagsState::SwitchEntry> > >*) flags_ui::FlagsState::GetSwitchesFromFlags(flags_ui::FlagsStorage*) about_flags::RecordUMAStatistics(flags_ui::FlagsStorage*) ChromeBrowserMainExtraPartsMetrics::PreBrowserStart() ChromeBrowserMainParts::PreBrowserStart() ChromeBrowserMainParts::PreMainMessageLoopRun() content::BrowserMainLoop::PreMainMessageLoopRun() base::internal::Invoker<base::internal::BindState<int (content::BrowserMainLoop::*)(), base::internal::UnretainedWrapper<content::BrowserMainLoop> >, int ()>::Run(base::internal::BindStateBase*) content::StartupTaskRunner::WrappedTask() base::internal::Invoker<base::internal::BindState<void (media::AudioInputController::*)(), base::internal::UnretainedWrapper<media::AudioInputController> >, void ()>::Run(base::internal::BindStateBase*) base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) base::MessageLoop::RunTask(base::PendingTask*) base::MessageLoop::DoWork() base::MessagePumpForUI::OnNonDelayedLooperCallback() base::(anonymous namespace)::NonDelayedLooperCallback(int, int, void*) std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >::compare(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&) const std::__ndk1::map<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, flags_ui::FlagsState::SwitchEntry, std::__ndk1::less<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::allocator<std::__ndk1::pair<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const, flags_ui::FlagsState::SwitchEntry> > >::operator[](std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&) flags_ui::FlagsState::AddFeatureMapping(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, bool, std::__ndk1::map<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, flags_ui::FlagsState::SwitchEntry, std::__ndk1::less<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::allocator<std::__ndk1::pair<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const, flags_ui::FlagsState::SwitchEntry> > >*) flags_ui::FlagsState::GenerateFlagsToSwitchesMapping(flags_ui::FlagsStorage*, std::__ndk1::set<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, std::__ndk1::less<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::allocator<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > > >*, std::__ndk1::map<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, flags_ui::FlagsState::SwitchEntry, std::__ndk1::less<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::allocator<std::__ndk1::pair<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const, flags_ui::FlagsState::SwitchEntry> > >*) flags_ui::FlagsState::GetFeaturesFromFlags(flags_ui::FlagsStorage*) about_flags::RecordUMAStatistics(flags_ui::FlagsStorage*) ChromeBrowserMainExtraPartsMetrics::PreBrowserStart() ChromeBrowserMainParts::PreBrowserStart() ChromeBrowserMainParts::PreMainMessageLoopRun() content::BrowserMainLoop::PreMainMessageLoopRun() base::internal::Invoker<base::internal::BindState<int (content::BrowserMainLoop::*)(), base::internal::UnretainedWrapper<content::BrowserMainLoop> >, int ()>::Run(base::internal::BindStateBase*) content::StartupTaskRunner::WrappedTask() base::internal::Invoker<base::internal::BindState<void (media::AudioInputController::*)(), base::internal::UnretainedWrapper<media::AudioInputController> >, void ()>::Run(base::internal::BindStateBase*) base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*) base::MessageLoop::RunTask(base::PendingTask*) base::MessageLoop::DoWork() base::MessagePumpForUI::OnNonDelayedLooperCallback() base::(anonymous namespace)::NonDelayedLooperCallback(int, int, void*)
,
Nov 13
This is surprising because the code is not doing IO as far as I can tell. It seems like something else must be wrong on those systems for this to be so slow. e.g. is something hooking the allocator?
,
Nov 13
Also, how can I see those stack traces from the linked reports?
,
Nov 13
Otherwise, could look at what call trees look like on other platforms, e.g. on Mac: https://uma.googleplex.com/callstacks?sid=5e53c8e3392c4e4cceedb92fa4071e56 We do see 7ms spent there which is not nothing. One thing that jumps out is maybe the use of std::map is not the best here and we can try other containers.
,
Nov 13
The other bit is that GetFeaturesFromFlags and GetSwitchesFromFlags seem to duplicate some work. For example, they both call GenerateFlagsToSwitchesMapping(). So with some refactoring, we could eliminate one of those calls and half the execution time. That still won't explain why we're spending hundreds of milliseconds in it on Android...
,
Nov 13
I have a CL to combine the two calls. The other thing we can do is have name_to_switch_map and enabled_entries collections use StringPieces instead of std::strings to avoid copying them. I'll look at doing that as well.
,
Nov 13
Actually switching those to StringPieces probably won't do much. Those just contain the enabled ones which should be pretty minimal. We're paying the main cost for the list of all possible flag entries. The problem is we have some temporary copies - e.g. when we get them from local state: https://cs.chromium.org/chromium/src/components/flags_ui/pref_service_flags_storage.cc?rcl=f3f15d41dd781e8cf6370f544c2356c1cd627213&l=22 And then filter for current platform: https://cs.chromium.org/chromium/src/components/flags_ui/flags_state.cc?rcl=fc5cd59f92f7f94996856d096d48ab1c1f0d572f&l=824 Probably we can just make the GetFlags() API take a filtering callback so that we only add the ones that we need there. I can take a look at that.
,
Nov 13
Hmm, but even that doesn't make much sense. GetFlags() should only be returning toggled switches already...
,
Nov 13
Sorry I will provide details on how to get stack traces. You can go to memlog/trace?reportid=<> to get a symbolized trace with stackframes. But, unfortunately not all traces linked contain stack frames since they were old versions which did not support sampling profiler. I have attached 2 traces from newer chrome versions which has samples. I identified old traces from the trace event "ChromeBrowserMainParts::PreBrowserStart", which is usually long because of flags parsing. You can look at the trace event "ChromeBrowserMainParts::PreBrowserStart" which is long. If you scroll down to thread "StackSamplingProfiler" you'll see stack traces for every 50ms under the args of event "StackCpuSampling". I know it is hard right now, and we are making a better UI for this. This is the current way to get to samples.
,
Nov 13
The median cases are still ok. It is the high percentiles that take long time to parse.
,
Nov 14
I read through the code a bit. I can't tell if FlagState can be accessed from background thread? If so can we just move all this computation in background? Are we using this UMA at all on Android? Do we care about what flags people set? Maybe we can just remove them on android.
,
Nov 14
,
Nov 14
"I read through the code a bit. I can't tell if FlagState can be accessed from background thread? If so can we just move all this computation in background?" It uses the Local State pref store which needs to be done on the main thread. In general, I'd prefer we just make this code fast than add complexity to do it on a background thread. So that's what the changes I've sent out for review are doing (making it fast). We can see if after those there's still a problem and then decide if we need to do anything else. "Are we using this UMA at all on Android? Do we care about what flags people set? Maybe we can just remove them on android." It's as useful on Android as any other platforms. It's needed when we need to look at people opting out of features or when we want to clean up old flag entries.
,
Nov 14
OK. I'd be OK with waiting and seeing how the updated code behaves. Maybe we could add a NextAction to this bug to see what the reports are telling us?
,
Nov 14
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/103faf893940889d189a6620ddb46ff871666b32 commit 103faf893940889d189a6620ddb46ff871666b32 Author: Alexei Svitkine <asvitkine@chromium.org> Date: Wed Nov 14 16:43:15 2018 Optimize about_flags::RecordUMAStatistics(). Combines the GetSwitchesFromFlags() and GetFeaturesFromFlags() calls to avoid GenerateFlagsToSwitchesMapping() being called twice, which we have evidence of being expensive. Bug: 904575 Change-Id: Ie89acad8175fdfe029a951074b9aa1f4f7678f95 Reviewed-on: https://chromium-review.googlesource.com/c/1333875 Commit-Queue: Alexei Svitkine <asvitkine@chromium.org> Reviewed-by: ssid <ssid@chromium.org> Cr-Commit-Position: refs/heads/master@{#608005} [modify] https://crrev.com/103faf893940889d189a6620ddb46ff871666b32/chrome/browser/about_flags.cc [modify] https://crrev.com/103faf893940889d189a6620ddb46ff871666b32/components/flags_ui/flags_state.cc [modify] https://crrev.com/103faf893940889d189a6620ddb46ff871666b32/components/flags_ui/flags_state.h
,
Nov 14
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/ea9e39617b2a09963eb374327874bd6be686bd5d commit ea9e39617b2a09963eb374327874bd6be686bd5d Author: Alexei Svitkine <asvitkine@chromium.org> Date: Wed Nov 14 21:28:03 2018 Optimize some startup code related to about:flags handling. Previously, the code would create a set from all the known flags and compare the specified flags against them. But actually this is less efficient than just the O(n*m) search, because n being the number of flags set by the user and m being the number of available flags, since n will likely be small or 0 and then we don't need to do a lot of work. In particular, this also avoids copying a bunch of strings. Bug: 904575 Change-Id: I63e59d439c432338aea71eb6258e8f8b42f04921 Reviewed-on: https://chromium-review.googlesource.com/c/1334413 Commit-Queue: Alexei Svitkine <asvitkine@chromium.org> Reviewed-by: ssid <ssid@chromium.org> Cr-Commit-Position: refs/heads/master@{#608128} [modify] https://crrev.com/ea9e39617b2a09963eb374327874bd6be686bd5d/components/flags_ui/feature_entry.cc [modify] https://crrev.com/ea9e39617b2a09963eb374327874bd6be686bd5d/components/flags_ui/feature_entry.h [modify] https://crrev.com/ea9e39617b2a09963eb374327874bd6be686bd5d/components/flags_ui/flags_state.cc [modify] https://crrev.com/ea9e39617b2a09963eb374327874bd6be686bd5d/components/flags_ui/flags_state.h
,
Nov 14
Let's see what things look like after those changes. We can definitely check in UMA sampling profiler on Mac/Windows how things look like. (Not sure how quickly we can see the effect on slow reports effects.)
,
Nov 15
,
Nov 19
So the result of the optimizations is a 75% reduction in runtime in that function on Mac: https://uma.googleplex.com/callstacks?sid=db83529fecc3595891825a25aeb5b7d3 (5.6ms reduction from 7.5ms originally) Do we have any data about how this looks like on Android?
,
Nov 26
I'm going to mark this as Fixed based on the changes that landed. If there are still reports post those changes, please re-open and link to them and we can discuss further steps.
,
Nov 28
Thanks! This issue no longer shows up as top jank in my dashboard. Less than 0.2% of users have long record task, it was 10% earlier. |
||||
►
Sign in to add a comment |
||||
Comment 1 by ssid@chromium.org
, Nov 12