New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 761684 link

Starred by 0 users

Issue metadata

Status: Fixed
Owner:
Closed: Sep 25
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 2
Type: Bug

Blocked on:
issue 776071

Blocking:
issue 735799



Sign in to add a comment

SetupFieldTrials is slow on 512 Go device

Project Member Reported by dskiba@chromium.org, Sep 3 2017

Issue description

Larger part of ChromeBrowserMainParts::PreCreateThreadsImpl (~200ms) was not covered by trace events, so I started digging.

It turned out that most of the time was spent in ChromeBrowserMainParts::SetupFieldTrials.

However, when I dug deeper I found that:

1. VariationsSeed::ParseFromString() in VariationsSeedStore::LoadSeed() takes 57ms
2. VariationsSeed destructor in VariationsFieldTrialCreator::CreateTrialsFromSeed() takes 26ms

This might be a sign of a general slowness of alloc / free functions, I'll keep digging.
 
Blocking: 735799
Trace attached.
fieldtrials-chrome-10-nourl.html.gz
1.3 MB Download

Comment 2 by dskiba@chromium.org, Sep 11 2017

I added some instrumentation, and found out that:

1. ParseFromString allocates a lot:

malloc_time_ms      "131 - 46 = 85"
allocation_count    "63992 - 16697 = 47295"
allocated_bytes     "5750470 - 4264334 = 1486136"
free_count          "23164 - 13377 = 9787"

2. ~VariationsSeed frees it all:

malloc_time_ms      "186 - 132 = 54"
allocation_count    "64284 - 64284 = 0"
allocated_bytes     "5759566 - 5759566 = 0"
free_count          "60884 - 23375 = 37509"


The slowdown is probably coming from enormous number of alloc/free operations, but might also be because of mmap/munmap/madvise calls by jemalloc.
malloc_counters.html.gz
1.9 MB Download

Comment 3 by dskiba@chromium.org, Sep 12 2017

Modifying jemalloc's decay time doesn't seem to do anything:

Columns below are: |total time (ms)|total allocated (KiB)|#allocations|#frees|

***** DECAY=1 (default) *****

Browser
869	34418	275931	220139
940	34686	277598	221144
920	34536	275357	219635
887	34570	276952	220939
901	34654	277127	220940
923	34980	278359	221499
876	34295	274765	219541
1064	34615	276824	220681
865	34172	274900	219042
885	34223	274792	219425

Renderer
778	34629	133924	107295
796	34704	134836	108052
825	35104	135630	108774
756	34997	135299	108500
737	34610	134564	107827
742	34614	134618	107913
782	35053	135491	108665
817	33970	132720	106216
759	34588	133746	107140
797	35047	135360	108535

***** DECAY=0 *****

Browser
971	34986	278411	221922
902	34608	275502	219802
982	34839	277009	220741
877	34478	275513	219914
980	34612	276799	220559
937	34273	275253	219371
986	34760	277059	220938
875	34159	274450	219134
917	34893	278699	221958
938	34932	278197	221429

Renderer
725	34692	134913	108127
742	34390	134278	107561
777	34546	134827	108023
748	34667	134736	107981
797	34623	134569	107843
751	34335	133423	106856
769	34230	134516	107894
761	34508	133471	106913
778	35013	135346	108533
762	34672	134702	107942

***** DECAY=1000 *****

Browser
860	34115	274458	219014
859	34516	277054	220780
947	36113	286245	226816
915	34019	274031	218731
936	34730	278310	222043
872	34754	276710	220735
913	34892	278039	221860
861	34576	276569	220587
914	34818	277680	221275
858	34293	276271	220522

Renderer
733	34555	133998	107445
776	34530	133636	107099
832	33559	134685	107884
788	34261	133160	106633
751	34487	134155	107491
800	34529	134297	107625
772	34565	134284	107582
798	34870	134731	107995
802	34678	134739	107974
826	34636	133925	107299

Comment 4 by dskiba@chromium.org, Sep 14 2017

Cc: isherman@chromium.org asvitk...@chromium.org
I experimented with memory arenas (cc_enable_arenas = true), but that didn't move the needle. I think the reason is that VariationsSeed mostly consists of strings, and they are not affected by memory arenas (i.e. string's data is still allocated from the heap; google3 version of protobuf has support for this).

There is [ctype = STRING_PIECE] thing, but it's not complete (mainly because protobuf has its own copy of StringPiece, and doesn't want to expose it).

So I think we can't do anything here, unfortunately. We can't cache anything, because the logic that handles VariationsSeed is pretty complex. We can't speed up the allocator (theoretically possible, but hard). And we can't make protobuf not to create tons of strings.

I guess I can dig deeper to see what are those strings are, and if most of them are formatted like "45514eb0-1b151ae8", i.e. hex-hex, then we can change proto format and encode those parts as two ints. But that feels like a big project.

Guys, what do you think?
Components: Internals>Metrics
We could do more on the server-side to reduce the payload size. Right now there's a bunch of expired trials there that we can probably not send to clients.

But that wouldn't solve the root problem of why allocs are slow here. It would be good to find a solution to that.

For example, perhaps a protobuf API can be added that would use StringPieces for the string data - and have those string pieces point into the original protobuf data? (It would require making sure the lifetime of that byte buffer outlive the lifetime of the VariationSeed - but that should be possible to do in VariationsService.)

I think such a change would greatly reduce the amount of allocations needed - since all the strings would no longer need to be allocated and would just point to the existing data.
Components: -Internals>Metrics Internals>Metrics>Variations

Comment 7 by dskiba@chromium.org, Sep 14 2017

Yes, StringPiece + memory arena would be ideal, but [ctype = STRING_PIECE] is not fully supported, see https://cs.chromium.org/chromium/src/third_party/protobuf/src/google/protobuf/compiler/cpp/cpp_string_field.cc?l=115  and  https://www.mail-archive.com/protobuf@googlegroups.com/msg11752.html

I guess almost any allocator implementation is slow on 44K allocations / frees, it's just more visible on low-end devices because they're slower. (BTW Chrome does ~250K allocations during startup, so this is 1/5th of them.)

Not sending expired trials sounds good, because we're also spending time on checking them. Is it hard to do on the server-side?
There's a lot of trials marked as "expired". These specify a group to put users into when they're expired. (For new studies, we prefer they use end_date such when they're ended - it uses the client default rather than the expired group in the config.)

It should be possible to omit the ones that are ended. For expired ones, we need to do an audit to figure out whether there's still any code that relies on their behavior. If there is, we can't remove them.

Then, there are studies that specify a min/max version. These ones are still sent to all clients because we don't know the Chrome version of the client. (We can't user agent because client can override that.) So to properly filter them out, we'd need to add a milestone URL param on the request and get that approved by privacy before we can do the filtering.

So not a trivial change to handle all those cases. If we just want to handle end_date, it would be easy enough but it won't have enough of an effect, I believe.
By the way, I think the steps I mention in comment 8 are worthwhile to make and we should do them at some point. If it's something you'd like to help with since it aligns with the Go effort, I definitely welcome the help and happy to advise on the relevant parts to change.

I also think exploring adding StringPiece support to protobuf to be worthwhile as well.
By the way, here's how ChromeBrowserMainParts::SetupFieldTrials looks like on Windows:

https://uma.googleplex.com/callstacks?sid=172d64fb259ca5157d2ad166178a8565

There, we see most of the bottleneck being IO stuff (e.g. setting up files for persistent histograms.)

I sent server-side CL http://CL/168873380 to omit studies with end date in the past. It reduces size by about 10%.
Owner: asvitk...@chromium.org
Status: Assigned (was: Untriaged)
Just to follow-up, the server-side CL landed.

We've tentatively decided to have a Q4 OKR to do the other things I mentioned in comment 8 about server-side clean up of expired studies.

We're not currently planning to pursue the protobuf library optimizations I suggested in earlier comments. I think this would still be worthwhile orthogonally to reducing the number of studies and I encourage anyone interested to pursue this.
Owner: gayane@chromium.org
Assigning to gayane@ who'll be looking at making the server-side changes to omit sending old studies to the client.
BTW, I asked protobuf folks - they don't have a timeline for open-sourcing STRING_PIECE: https://groups.google.com/a/google.com/forum/#!topic/protobuf-discuss/eItJoGwj-pg

Any idea what's the latest numbers on this? Would it ever or at all be feasible to lazily init field trials? I'm sure it's not today cause we have some experiments that rely on it but an alternative we've employed elsewhere is to leverage android shared prefs when the config is needed really early. It means that it requires an extra browser restart for configs to take affect but it also means we don't have to parse the whole config early on in startup. 
Cc: yfried...@chromium.org
I don't see lazy init field trials happening any time soon. It has lots of challenges - mostly related to the fact that Chrome is multi-process and processes needed to have a consistent view of field trials. So you need to do the randomization early so that it can be made available to all processes in a consistent fashion.

Gayane is away this week, but she should have some numbers when she's back next week. I think it might be on the ballpark of 50% savings from the server-side changes.

Otherwise, if we want to do more beyond that, I think the protobuf optimization is the solution to pursue here. Because that way, we can simply not alloc all those strings during protobuf deserialization and just have StringPieces that point into the original buffer.
Do we know what those strings are, exactly? Are they experiment names, or IDs?

(Always wanted to check, but never had time...)
Study and experiment names as well as parameters and feature names. You can see the proto definition here:

https://cs.chromium.org/chromium/src/components/variations/proto/study.proto?q=Study.prot&sq=package:chromium&l=1
Also, Chrome max/min versions, countries, locales, hardware classes (all filters that can be specified) - so lots of things.
Hmm - sounds like servicifying the field trials component could solve some of that in terms of isolating deps and providing a consistent view? Some amount of work would still need to be done early, of course. 

Protobuf optimization sounds nice but also a really long way out :/
We already provide state via shared memory. Field trial APIs are synchronous so servicifying doesn't help since that would provide an async API. (And changing semantics would add a ton of code complexity to any code that wants to use field trial API.)

I don't necessarily buy that "Protobuf optimization sounds nice but also a really long way out". It is if we just wait for protobuf team's plans to some day materialize. But we could also work on this more proactively. If necessary extend the stock protobuf library with the functionality we want (and work with them to upstream it).
I also think that changing protobuf to support StringPiece is the cleanest way to optimize this. But protobuf folks want to use Abseil, which will complicate our life as we'll have to bring Abseil into Chrome too. However, benefits of bringing Abseil are not clear (there is a thread somewhere about this).

Maybe we need to invent a new protobuf feature to be able to use custom string classes. That way we could use our base::StringPiece.
Yeah, I was also thinking something along those lines.
In theory, could we just do something like concatenate all of the strings into a single string field and replace all of the string fields with a "StringPiece { int start, int end }" message that points into that buffer?

That might be simpler than needing to modify protobuf library directly.
It would work. Not sure how I feel about it, though. On one hand, it would be a very ugly workaround - adding lots of complexity to the protobuf format we use.

On the other hand, it would allow us to re-use some of the strings - e.g. when they appear in multiple configs, they could point to the same data. Though, this might be moot with gzip compression. I'd be curious to see how this would affect bandwidth (i.e. how big is the Finch seed size in such a format vs. now and same for gzipped sizes).

In vacuum, I would lean towards putting work on the protobuf library - however at the same time if such an approach could get us savings elsewhere (i.e. less bandwidth used), then maybe it's worth it.

FWIW, I don't think the protobuf stuff is necessarily a lot of work. All that's needed is to change the output code it produces to print StringPiece fields instead of std::string and then have some way to enable this mode for certain protos. So it might actually be less work than the solution of changing out protobuf format.
Labels: -Performance-Loading Performance-Startup
Project Member

Comment 28 by bugdroid1@chromium.org, Dec 14 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/159ff63c16fd55bfdd6732bcfebedd77b9350677

commit 159ff63c16fd55bfdd6732bcfebedd77b9350677
Author: Dmitry Skiba <dskiba@chromium.org>
Date: Thu Dec 14 01:02:18 2017

Make CreateTrialsFromSeed visible in traces.

VariationsFieldTrialCreator::CreateTrialsFromSeed() function takes
~100ms on Android Go devices (see the bug).

This CL adds TRACE_EVENT to the function to make it visible in startup
traces.

Bug:  761684 
Change-Id: I339de72e6e04cfa2ba30e65ce8d5888d467fead2
Reviewed-on: https://chromium-review.googlesource.com/825647
Reviewed-by: Ilya Sherman <isherman@chromium.org>
Commit-Queue: Ilya Sherman <isherman@chromium.org>
Cr-Commit-Position: refs/heads/master@{#523964}
[modify] https://crrev.com/159ff63c16fd55bfdd6732bcfebedd77b9350677/components/variations/service/variations_field_trial_creator.cc

Blockedon: 776071
We have made changes to send smaller variation seeds to users by filtering field trials server side as well as clean up of expired field trials. Now when M64 is in stable we can see big drop in full variation seed size https://uma.googleplex.com/timeline_v2/?sid=90fcf0409da6c71a97f1cda0568cd881

This should make VariationsFieldTrialCreator::CreateTrialsFromSeed faster.
\o/
Very cool! Thanks! This is the first time I realized that Android has more experiments than any other platform.
Cc: dskiba@chromium.org
This is awesome! dskiba: have you observed any impact on startup traces?
Wondering whether this is worth monitoring this metric somehow? Chirp or system health for each milestone?
Status: Fixed (was: Assigned)
Closing this bug as Fixed per comment 30. We can file follow-up bugs if there's more to do (e.g. ideas around the proto stringpiece optimization), but the main size reduction via reducing number of studies has been done.

Sign in to add a comment