New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 904575 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Nov 26
Cc:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 2
Type: Bug



Sign in to add a comment

Potential jank caused by about_flags::RecordUMAStatistics()

Project Member Reported by ssid@chromium.org, Nov 12

Issue description

We collected slow reports from users facing janks and found that the function mentioned above is taking too much time on the main thread, potentially causing janks.
The issue can be found in reports:

eab795ac841c2cde - 329.12 ms
c9ba3267bff3fea0 - 282.694 ms
e7332bd10180506a - 280.837 ms
71d72b2b9da48b76 - 204.081 ms
bc26edb209a8d1bd - 202 ms

Go to crash/ReportID to view the traces.

The stack trace that was hit the most:

strlen
std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >::basic_string(char const*)
flags_ui::FlagsState::GenerateFlagsToSwitchesMapping(flags_ui::FlagsStorage*, std::__ndk1::set<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, std::__ndk1::less<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::allocator<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > > >*, std::__ndk1::map<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, flags_ui::FlagsState::SwitchEntry, std::__ndk1::less<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::allocator<std::__ndk1::pair<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const, flags_ui::FlagsState::SwitchEntry> > >*)
flags_ui::FlagsState::GetSwitchesFromFlags(flags_ui::FlagsStorage*)
about_flags::RecordUMAStatistics(flags_ui::FlagsStorage*)
ChromeBrowserMainExtraPartsMetrics::PreBrowserStart()
ChromeBrowserMainParts::PreBrowserStart()
ChromeBrowserMainParts::PreMainMessageLoopRun()
content::BrowserMainLoop::PreMainMessageLoopRun()
base::internal::Invoker<base::internal::BindState<int (content::BrowserMainLoop::*)(), base::internal::UnretainedWrapper<content::BrowserMainLoop> >, int ()>::Run(base::internal::BindStateBase*)
content::StartupTaskRunner::WrappedTask()
base::internal::Invoker<base::internal::BindState<void (media::AudioInputController::*)(), base::internal::UnretainedWrapper<media::AudioInputController> >, void ()>::Run(base::internal::BindStateBase*)
base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*)
base::MessageLoop::RunTask(base::PendingTask*)
base::MessageLoop::DoWork()
base::MessagePumpForUI::OnNonDelayedLooperCallback()
base::(anonymous namespace)::NonDelayedLooperCallback(int, int, void*)


__futex_syscall3
Unknown
dlmalloc
malloc
operator new(unsigned int)
std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >::__init(char const*, unsigned int)
std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >::basic_string(char const*)
flags_ui::FeatureEntry::NameForOption(int) const
flags_ui::(anonymous namespace)::AddInternalName(flags_ui::FeatureEntry const&, std::__ndk1::set<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, std::__ndk1::less<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::allocator<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > > >*)
flags_ui::FlagsState::GetSanitizedEnabledFlags(flags_ui::FlagsStorage*, std::__ndk1::set<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, std::__ndk1::less<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::allocator<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > > >*)
flags_ui::FlagsState::GetSanitizedEnabledFlagsForCurrentPlatform(flags_ui::FlagsStorage*, std::__ndk1::set<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, std::__ndk1::less<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::allocator<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > > >*)
flags_ui::FlagsState::GenerateFlagsToSwitchesMapping(flags_ui::FlagsStorage*, std::__ndk1::set<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, std::__ndk1::less<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::allocator<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > > >*, std::__ndk1::map<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, flags_ui::FlagsState::SwitchEntry, std::__ndk1::less<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::allocator<std::__ndk1::pair<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const, flags_ui::FlagsState::SwitchEntry> > >*)
flags_ui::FlagsState::GetSwitchesFromFlags(flags_ui::FlagsStorage*)
about_flags::RecordUMAStatistics(flags_ui::FlagsStorage*)
ChromeBrowserMainExtraPartsMetrics::PreBrowserStart()
ChromeBrowserMainParts::PreBrowserStart()
ChromeBrowserMainParts::PreMainMessageLoopRun()
content::BrowserMainLoop::PreMainMessageLoopRun()
base::internal::Invoker<base::internal::BindState<int (content::BrowserMainLoop::*)(), base::internal::UnretainedWrapper<content::BrowserMainLoop> >, int ()>::Run(base::internal::BindStateBase*)
content::StartupTaskRunner::WrappedTask()
base::internal::Invoker<base::internal::BindState<void (media::AudioInputController::*)(), base::internal::UnretainedWrapper<media::AudioInputController> >, void ()>::Run(base::internal::BindStateBase*)
base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*)
base::MessageLoop::RunTask(base::PendingTask*)
base::MessageLoop::DoWork()
base::MessagePumpForUI::OnNonDelayedLooperCallback()
base::(anonymous namespace)::NonDelayedLooperCallback(int, int, void*)


std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >::compare(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&) const
std::__ndk1::map<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, flags_ui::FlagsState::SwitchEntry, std::__ndk1::less<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::allocator<std::__ndk1::pair<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const, flags_ui::FlagsState::SwitchEntry> > >::operator[](std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&)
flags_ui::FlagsState::AddFeatureMapping(std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const&, bool, std::__ndk1::map<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, flags_ui::FlagsState::SwitchEntry, std::__ndk1::less<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::allocator<std::__ndk1::pair<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const, flags_ui::FlagsState::SwitchEntry> > >*)
flags_ui::FlagsState::GenerateFlagsToSwitchesMapping(flags_ui::FlagsStorage*, std::__ndk1::set<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, std::__ndk1::less<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::allocator<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > > >*, std::__ndk1::map<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> >, flags_ui::FlagsState::SwitchEntry, std::__ndk1::less<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > >, std::__ndk1::allocator<std::__ndk1::pair<std::__ndk1::basic_string<char, std::__ndk1::char_traits<char>, std::__ndk1::allocator<char> > const, flags_ui::FlagsState::SwitchEntry> > >*)
flags_ui::FlagsState::GetFeaturesFromFlags(flags_ui::FlagsStorage*)
about_flags::RecordUMAStatistics(flags_ui::FlagsStorage*)
ChromeBrowserMainExtraPartsMetrics::PreBrowserStart()
ChromeBrowserMainParts::PreBrowserStart()
ChromeBrowserMainParts::PreMainMessageLoopRun()
content::BrowserMainLoop::PreMainMessageLoopRun()
base::internal::Invoker<base::internal::BindState<int (content::BrowserMainLoop::*)(), base::internal::UnretainedWrapper<content::BrowserMainLoop> >, int ()>::Run(base::internal::BindStateBase*)
content::StartupTaskRunner::WrappedTask()
base::internal::Invoker<base::internal::BindState<void (media::AudioInputController::*)(), base::internal::UnretainedWrapper<media::AudioInputController> >, void ()>::Run(base::internal::BindStateBase*)
base::debug::TaskAnnotator::RunTask(char const*, base::PendingTask*)
base::MessageLoop::RunTask(base::PendingTask*)
base::MessageLoop::DoWork()
base::MessagePumpForUI::OnNonDelayedLooperCallback()
base::(anonymous namespace)::NonDelayedLooperCallback(int, int, void*)
 
Owner: asvitk...@chromium.org
Getting switches and features from flags seems to take a lot of time for string parsing and adding metrics to UMA. Sometimes I also see PersistentHistogram storage in the stack traces, but lesser compared to the parsing time.

+asvitkine It looks like this method can be simply moved to a background thread?

Stats show that about 10% of users are impacted by upto 100ms in this method. I can try to get more accurate stats if needed.
This is surprising because the code is not doing IO as far as I can tell. It seems like something else must be wrong on those systems for this to be so slow. e.g. is something hooking the allocator?
Also, how can I see those stack traces from the linked reports?
Otherwise, could look at what call trees look like on other platforms, e.g. on Mac:

https://uma.googleplex.com/callstacks?sid=5e53c8e3392c4e4cceedb92fa4071e56

We do see 7ms spent there which is not nothing. One thing that jumps out is maybe the use of std::map is not the best here and we can try other containers.
The other bit is that GetFeaturesFromFlags and GetSwitchesFromFlags seem to duplicate some work. For example, they both call GenerateFlagsToSwitchesMapping().

So with some refactoring, we could eliminate one of those calls and half the execution time.

That still won't explain why we're spending hundreds of milliseconds in it on Android...
I have a CL to combine the two calls. The other thing we can do is have name_to_switch_map and enabled_entries collections use StringPieces instead of std::strings to avoid copying them. I'll look at doing that as well.
Actually switching those to StringPieces probably won't do much. Those just contain the enabled ones which should be pretty minimal. We're paying the main cost for the list of all possible flag entries.

The problem is we have some temporary copies - e.g. when we get them from local state:
https://cs.chromium.org/chromium/src/components/flags_ui/pref_service_flags_storage.cc?rcl=f3f15d41dd781e8cf6370f544c2356c1cd627213&l=22

And then filter for current platform:

https://cs.chromium.org/chromium/src/components/flags_ui/flags_state.cc?rcl=fc5cd59f92f7f94996856d096d48ab1c1f0d572f&l=824

Probably we can just make the GetFlags() API take a filtering callback so that we only add the ones that we need there. I can take a look at that.
Hmm, but even that doesn't make much sense. GetFlags() should only be returning toggled switches already...
Sorry I will provide details on how to get stack traces. You can go to 
memlog/trace?reportid=<> to get a symbolized trace with stackframes.
But, unfortunately not all traces linked contain stack frames since they were old versions which did not support sampling profiler. I have attached 2 traces from newer chrome versions which has samples. I identified old traces from the trace event "ChromeBrowserMainParts::PreBrowserStart", which is usually long because of flags parsing. 

You can look at the trace event "ChromeBrowserMainParts::PreBrowserStart" which is long. If you scroll down to thread "StackSamplingProfiler" you'll see stack traces for every 50ms under the args of event "StackCpuSampling". I know it is hard right now, and we are making a better UI for this. This is the current way to get to samples.

trace-bc26edb209a8d1bd.gz
232 KB Download
trace-0edaaf9613860a15.gz
225 KB Download
The median cases are still ok. It is the high percentiles that take long time to parse.
I read through the code a bit. I can't tell if FlagState can be accessed from background thread? If so can we just move all this computation in background?

Are we using this UMA at all on Android? Do we care about what flags people set? Maybe we can just remove them on android.
Cc: nyquist@chromium.org
"I read through the code a bit. I can't tell if FlagState can be accessed from background thread? If so can we just move all this computation in background?"

It uses the Local State pref store which needs to be done on the main thread.
In general, I'd prefer we just make this code fast than add complexity to do it on a background thread. So that's what the changes I've sent out for review are doing (making it fast). We can see if after those there's still a problem and then decide if we need to do anything else.

"Are we using this UMA at all on Android? Do we care about what flags people set? Maybe we can just remove them on android."

It's as useful on Android as any other platforms. It's needed when we need to look at people opting out of features or when we want to clean up old flag entries.
OK. I'd be OK with waiting and seeing how the updated code behaves. Maybe we could add a NextAction to this bug to see what the reports are telling us?
Project Member

Comment 15 by bugdroid1@chromium.org, Nov 14

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/103faf893940889d189a6620ddb46ff871666b32

commit 103faf893940889d189a6620ddb46ff871666b32
Author: Alexei Svitkine <asvitkine@chromium.org>
Date: Wed Nov 14 16:43:15 2018

Optimize about_flags::RecordUMAStatistics().

Combines the GetSwitchesFromFlags() and GetFeaturesFromFlags()
calls to avoid GenerateFlagsToSwitchesMapping() being called
twice, which we have evidence of being expensive.

Bug:  904575 
Change-Id: Ie89acad8175fdfe029a951074b9aa1f4f7678f95
Reviewed-on: https://chromium-review.googlesource.com/c/1333875
Commit-Queue: Alexei Svitkine <asvitkine@chromium.org>
Reviewed-by: ssid <ssid@chromium.org>
Cr-Commit-Position: refs/heads/master@{#608005}
[modify] https://crrev.com/103faf893940889d189a6620ddb46ff871666b32/chrome/browser/about_flags.cc
[modify] https://crrev.com/103faf893940889d189a6620ddb46ff871666b32/components/flags_ui/flags_state.cc
[modify] https://crrev.com/103faf893940889d189a6620ddb46ff871666b32/components/flags_ui/flags_state.h

Project Member

Comment 16 by bugdroid1@chromium.org, Nov 14

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/ea9e39617b2a09963eb374327874bd6be686bd5d

commit ea9e39617b2a09963eb374327874bd6be686bd5d
Author: Alexei Svitkine <asvitkine@chromium.org>
Date: Wed Nov 14 21:28:03 2018

Optimize some startup code related to about:flags handling.

Previously, the code would create a set from all the known
flags and compare the specified flags against them.

But actually this is less efficient than just the O(n*m) search,
because n being the number of flags set by the user and m being
the number of available flags, since n will likely be small or 0
and then we don't need to do a lot of work. In particular, this
also avoids copying a bunch of strings.

Bug:  904575 
Change-Id: I63e59d439c432338aea71eb6258e8f8b42f04921
Reviewed-on: https://chromium-review.googlesource.com/c/1334413
Commit-Queue: Alexei Svitkine <asvitkine@chromium.org>
Reviewed-by: ssid <ssid@chromium.org>
Cr-Commit-Position: refs/heads/master@{#608128}
[modify] https://crrev.com/ea9e39617b2a09963eb374327874bd6be686bd5d/components/flags_ui/feature_entry.cc
[modify] https://crrev.com/ea9e39617b2a09963eb374327874bd6be686bd5d/components/flags_ui/feature_entry.h
[modify] https://crrev.com/ea9e39617b2a09963eb374327874bd6be686bd5d/components/flags_ui/flags_state.cc
[modify] https://crrev.com/ea9e39617b2a09963eb374327874bd6be686bd5d/components/flags_ui/flags_state.h

Let's see what things look like after those changes. We can definitely check in UMA sampling profiler on Mac/Windows how things look like. (Not sure how quickly we can see the effect on slow reports effects.)
Status: Started (was: Untriaged)
So the result of the optimizations is a 75% reduction in runtime in that function on Mac:

https://uma.googleplex.com/callstacks?sid=db83529fecc3595891825a25aeb5b7d3

(5.6ms reduction from 7.5ms originally)

Do we have any data about how this looks like on Android?
Status: Fixed (was: Started)
I'm going to mark this as Fixed based on the changes that landed.

If there are still reports post those changes, please re-open and link to them and we can discuss further steps.
Thanks! This issue no longer shows up as top jank in my dashboard. Less than 0.2% of users have long record task, it was 10% earlier.

Sign in to add a comment