SetupFieldTrials is slow on 512 Go device
Issue description
A large part of ChromeBrowserMainParts::PreCreateThreadsImpl (~200 ms) was not covered by trace events, so I started digging. It turned out that most of the time is spent in ChromeBrowserMainParts::SetupFieldTrials. Digging deeper, I found that:
1. VariationsSeed::ParseFromString() in VariationsSeedStore::LoadSeed() takes 57 ms.
2. The VariationsSeed destructor in VariationsFieldTrialCreator::CreateTrialsFromSeed() takes 26 ms.
This might be a sign of general slowness in the alloc/free functions; I'll keep digging.
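For reference, a minimal sketch of how this kind of hotspot can be timed and surfaced in startup traces. The helper, the "startup" category, and the event name are illustrative only, not the actual instrumentation used for these numbers:

#include <string>

#include "base/logging.h"
#include "base/timer/elapsed_timer.h"
#include "base/trace_event/trace_event.h"
#include "components/variations/proto/variations_seed.pb.h"

// Sketch only: wrap the parse in a trace event plus a wall-clock timer so it
// shows up both in startup traces and in local logs.
bool ParseSeedWithTiming(const std::string& seed_data,
                         variations::VariationsSeed* seed) {
  TRACE_EVENT0("startup", "VariationsSeedStore::LoadSeed.ParseFromString");
  base::ElapsedTimer timer;
  const bool ok = seed->ParseFromString(seed_data);
  VLOG(1) << "ParseFromString took " << timer.Elapsed().InMilliseconds()
          << " ms (success=" << ok << ")";
  return ok;
}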
Sep 11 2017
I added some instrumentation and found that:
1. ParseFromString allocates a lot:
   malloc_time_ms: 131 - 46 = 85
   allocation_count: 63992 - 16697 = 47295
   allocated_bytes: 5750470 - 4264334 = 1486136
   free_count: 23164 - 13377 = 9787
2. ~VariationsSeed frees it all:
   malloc_time_ms: 186 - 132 = 54
   allocation_count: 64284 - 64284 = 0
   allocated_bytes: 5759566 - 5759566 = 0
   free_count: 60884 - 23375 = 37509
The slowdown probably comes from the enormous number of alloc/free operations, but it might also be due to mmap/munmap/madvise calls made by jemalloc.
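For clarity, each value above is a before/after snapshot delta: counter after the call minus counter before it. A sketch of that pattern is below; GetAllocatorStats() is a hypothetical hook standing in for whatever instrumentation (e.g. an allocator shim) was actually used:

#include <cstdint>

#include "base/logging.h"

// Hypothetical snapshot of global allocator counters; the real instrumentation
// behind the numbers in this comment is not shown here.
struct AllocStats {
  int64_t malloc_time_ms = 0;
  int64_t allocation_count = 0;
  int64_t allocated_bytes = 0;
  int64_t free_count = 0;
};

AllocStats GetAllocatorStats();  // Hypothetical: reads the global counters.

// Runs |fn| and logs how the allocator counters changed while it ran.
template <typename Fn>
void ReportAllocDelta(const char* label, Fn fn) {
  const AllocStats before = GetAllocatorStats();
  fn();
  const AllocStats after = GetAllocatorStats();
  LOG(INFO) << label
            << " malloc_time_ms=" << (after.malloc_time_ms - before.malloc_time_ms)
            << " allocations=" << (after.allocation_count - before.allocation_count)
            << " bytes=" << (after.allocated_bytes - before.allocated_bytes)
            << " frees=" << (after.free_count - before.free_count);
}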
Sep 12 2017
Modifying jemalloc's decay time doesn't seem to do anything. Columns below are: total time (ms) | total allocated (KiB) | #allocations | #frees

***** DECAY=1 (default) *****
Browser
869  34418 275931 220139
940  34686 277598 221144
920  34536 275357 219635
887  34570 276952 220939
901  34654 277127 220940
923  34980 278359 221499
876  34295 274765 219541
1064 34615 276824 220681
865  34172 274900 219042
885  34223 274792 219425
Renderer
778  34629 133924 107295
796  34704 134836 108052
825  35104 135630 108774
756  34997 135299 108500
737  34610 134564 107827
742  34614 134618 107913
782  35053 135491 108665
817  33970 132720 106216
759  34588 133746 107140
797  35047 135360 108535

***** DECAY=0 *****
Browser
971  34986 278411 221922
902  34608 275502 219802
982  34839 277009 220741
877  34478 275513 219914
980  34612 276799 220559
937  34273 275253 219371
986  34760 277059 220938
875  34159 274450 219134
917  34893 278699 221958
938  34932 278197 221429
Renderer
725  34692 134913 108127
742  34390 134278 107561
777  34546 134827 108023
748  34667 134736 107981
797  34623 134569 107843
751  34335 133423 106856
769  34230 134516 107894
761  34508 133471 106913
778  35013 135346 108533
762  34672 134702 107942

***** DECAY=1000 *****
Browser
860  34115 274458 219014
859  34516 277054 220780
947  36113 286245 226816
915  34019 274031 218731
936  34730 278310 222043
872  34754 276710 220735
913  34892 278039 221860
861  34576 276569 220587
914  34818 277680 221275
858  34293 276271 220522
Renderer
733  34555 133998 107445
776  34530 133636 107099
832  33559 134685 107884
788  34261 133160 106633
751  34487 134155 107491
800  34529 134297 107625
772  34565 134284 107582
798  34870 134731 107995
802  34678 134739 107974
826  34636 133925 107299
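For context, a sketch of how a jemalloc decay setting can be inspected at runtime. The option names vary by jemalloc version (opt.decay_time in 4.x, opt.dirty_decay_ms in 5.x), and how DECAY was actually set for the experiment above (MALLOC_CONF, a build flag, etc.) may differ; treat the key names here as assumptions:

#include <jemalloc/jemalloc.h>

#include <cstddef>
#include <cstdio>
#include <sys/types.h>

// Sketch only: read back jemalloc's decay setting via mallctl(). "opt.*" keys
// are read-only and reflect what was requested at startup (e.g. via
// MALLOC_CONF=decay_time:0); the exact key name depends on the jemalloc
// version bundled with the platform.
void PrintDecaySetting() {
  ssize_t decay = 0;
  size_t len = sizeof(decay);
  if (mallctl("opt.decay_time", &decay, &len, nullptr, 0) == 0)
    printf("jemalloc decay_time = %zd seconds\n", decay);
  else
    printf("opt.decay_time not available in this jemalloc build\n");
}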
Sep 14 2017
I experimented with memory arenas (cc_enable_arenas = true), but that didn't move the needle. I think the reason is that VariationsSeed mostly consists of strings, and those are not affected by memory arenas (i.e. the strings' data is still allocated from the heap; the google3 version of protobuf has support for this). There is a [ctype = STRING_PIECE] option, but it's not complete (mainly because protobuf has its own copy of StringPiece and doesn't want to expose it). So I think we can't do anything here, unfortunately. We can't cache anything, because the logic that handles VariationsSeed is pretty complex. We can't speed up the allocator (theoretically possible, but hard). And we can't make protobuf not create tons of strings. I guess I can dig deeper to see what those strings are; if most of them are formatted like "45514eb0-1b151ae8", i.e. hex-hex, then we could change the proto format and encode those parts as two ints. But that feels like a big project. Guys, what do you think?
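For reference, this is roughly what the arena experiment looks like with stock open-source protobuf (a sketch, assuming the seed proto has cc_enable_arenas turned on, not the actual Chromium change). String field contents remain separate heap allocations, which is why it doesn't help here:

#include <string>

#include <google/protobuf/arena.h>

#include "components/variations/proto/variations_seed.pb.h"

// Sketch only: arena-allocated parse. Message and submessage objects come from
// the arena, but each string field still gets its own heap-allocated
// std::string (the arena merely arranges for it to be destroyed later), so a
// string-heavy message like VariationsSeed sees little benefit.
void ParseSeedOnArena(const std::string& seed_data) {
  google::protobuf::Arena arena;
  // Requires `option cc_enable_arenas = true;` in the .proto file.
  auto* seed =
      google::protobuf::Arena::CreateMessage<variations::VariationsSeed>(&arena);
  seed->ParseFromString(seed_data);
  // No explicit delete: everything owned by the arena is released when it goes
  // out of scope.
}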
Sep 14 2017
We could do more on the server side to reduce the payload size. Right now there's a bunch of expired trials there that we could probably stop sending to clients. But that wouldn't solve the root problem of why allocs are slow here. It would be good to find a solution to that. For example, perhaps a protobuf API could be added that would use StringPieces for the string data and have those string pieces point into the original protobuf data? (It would require making sure the lifetime of that byte buffer outlives the lifetime of the VariationsSeed, but that should be possible to do in VariationsService.) I think such a change would greatly reduce the number of allocations needed, since the strings would no longer need to be allocated and would just point to the existing data.
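A minimal sketch of the lifetime arrangement such an API would need: the owner keeps the serialized bytes alive for as long as the zero-copy view into them. SeedView and ParseSeedView() are hypothetical; no such protobuf API exists today, which is exactly the gap being discussed:

#include <string>
#include <utility>
#include <vector>

#include "base/strings/string_piece.h"

// Hypothetical zero-copy representation: fields are views into the serialized
// seed rather than owned std::strings.
struct SeedView {
  std::vector<base::StringPiece> study_names;
};

// Sketch only: keeps the backing bytes and the view together so the view can
// never outlive the data it points into.
class ParsedSeed {
 public:
  explicit ParsedSeed(std::string serialized)
      : backing_data_(std::move(serialized)),
        view_(ParseSeedView(backing_data_)) {}

  const SeedView& view() const { return view_; }

 private:
  static SeedView ParseSeedView(const std::string& data);  // Hypothetical.

  // Declared before |view_| and never mutated, so the views stay valid.
  const std::string backing_data_;
  SeedView view_;
};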
Sep 14 2017
Yes, StringPiece + memory arena would be ideal, but [ctype = STRING_PIECE] is not fully supported; see https://cs.chromium.org/chromium/src/third_party/protobuf/src/google/protobuf/compiler/cpp/cpp_string_field.cc?l=115 and https://www.mail-archive.com/protobuf@googlegroups.com/msg11752.html. I guess almost any allocator implementation is slow on 44K allocations/frees; it's just more visible on low-end devices because they're slower. (BTW, Chrome does ~250K allocations during startup, so this is 1/5th of them.) Not sending expired trials sounds good, because we're also spending time checking them. Is it hard to do on the server side?
Sep 14 2017
There are a lot of trials marked as "expired". These specify a group to put users into when they're expired. (For new studies, we prefer that they use end_date so that when they've ended, the client default is used rather than the expired group in the config.) It should be possible to omit the ones that have ended. For expired ones, we need to do an audit to figure out whether there's still any code that relies on their behavior; if there is, we can't remove them. Then, there are studies that specify a min/max version. These are still sent to all clients because we don't know the Chrome version of the client. (We can't use the user agent because the client can override that.) So to properly filter them out, we'd need to add a milestone URL param on the request and get that approved by privacy before we can do the filtering. So it's not a trivial change to handle all those cases. If we just want to handle end_date, it would be easy enough, but I don't believe it would have enough of an effect on its own.
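A rough sketch of the "easy" end_date part, expressed against the client-side proto definitions since the server code isn't visible here. Field names such as filter().end_date() follow study.proto as of this discussion, but the real server-side filtering would look different:

#include <cstdint>

#include "components/variations/proto/study.pb.h"
#include "components/variations/proto/variations_seed.pb.h"

// Sketch only: drop studies whose end_date (seconds since epoch) has already
// passed before serving the seed. Expired studies and min/max-version filters
// need the extra handling described above and are not covered here.
void RemoveEndedStudies(variations::VariationsSeed* seed, int64_t now_seconds) {
  auto* studies = seed->mutable_study();
  // Iterate backwards so deletions don't shift the indices we still need.
  for (int i = studies->size() - 1; i >= 0; --i) {
    const variations::Study& study = studies->Get(i);
    if (study.has_filter() && study.filter().has_end_date() &&
        study.filter().end_date() < now_seconds) {
      studies->DeleteSubrange(i, 1);
    }
  }
}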
Sep 14 2017
By the way, I think the steps I mention in comment 8 are worthwhile and we should do them at some point. If it's something you'd like to help with, since it aligns with the Go effort, I definitely welcome the help and am happy to advise on the relevant parts to change. I also think exploring adding StringPiece support to protobuf would be worthwhile as well.
Sep 14 2017
By the way, here's what ChromeBrowserMainParts::SetupFieldTrials looks like on Windows: https://uma.googleplex.com/callstacks?sid=172d64fb259ca5157d2ad166178a8565 There, we see that most of the bottleneck is IO (e.g. setting up files for persistent histograms).
Sep 15 2017
I sent server-side CL http://CL/168873380 to omit studies with an end date in the past. It reduces the size by about 10%.
Sep 21 2017
Just to follow up, the server-side CL landed. We've tentatively decided to have a Q4 OKR to do the other things I mentioned in comment 8 about server-side cleanup of expired studies. We're not currently planning to pursue the protobuf library optimizations I suggested in earlier comments. I think those would still be worthwhile, orthogonally to reducing the number of studies, and I encourage anyone interested to pursue them.
Oct 3 2017
Assigning to gayane@ who'll be looking at making the server-side changes to omit sending old studies to the client.
Oct 3 2017
BTW, I asked protobuf folks - they don't have a timeline for open-sourcing STRING_PIECE: https://groups.google.com/a/google.com/forum/#!topic/protobuf-discuss/eItJoGwj-pg
Nov 7 2017
Any idea what the latest numbers on this are? Would it ever be feasible to lazily init field trials? I'm sure it's not today, because we have some experiments that rely on them, but an alternative we've employed elsewhere is to leverage Android shared prefs when the config is needed really early. It means an extra browser restart is required for configs to take effect, but it also means we don't have to parse the whole config early in startup.
Nov 7 2017
I don't see lazy-init field trials happening any time soon. It has lots of challenges, mostly related to the fact that Chrome is multi-process and processes need to have a consistent view of field trials. So you need to do the randomization early so that it can be made available to all processes in a consistent fashion. Gayane is away this week, but she should have some numbers when she's back next week. I think it might be in the ballpark of 50% savings from the server-side changes. If we want to do more beyond that, I think the protobuf optimization is the solution to pursue here, because that way we can simply not allocate all those strings during protobuf deserialization and just have StringPieces that point into the original buffer.
Nov 7 2017
Do we know what those strings are, exactly? Are they experiment names, or IDs? (Always wanted to check, but never had time...)
Nov 7 2017
Study and experiment names as well as parameters and feature names. You can see the proto definition here: https://cs.chromium.org/chromium/src/components/variations/proto/study.proto?q=Study.prot&sq=package:chromium&l=1
Nov 7 2017
Also, Chrome max/min versions, countries, locales, hardware classes (all filters that can be specified) - so lots of things.
Nov 7 2017
Hmm - sounds like servicifying the field trials component could solve some of that in terms of isolating deps and providing a consistent view? Some amount of work would still need to be done early, of course. Protobuf optimization sounds nice but also a really long way out :/
Nov 7 2017
We already provide the state via shared memory. Field trial APIs are synchronous, so servicifying doesn't help, since that would provide an async API. (And changing the semantics would add a ton of code complexity to any code that wants to use the field trial API.) I don't necessarily buy that "Protobuf optimization sounds nice but also a really long way out". It is if we just wait for the protobuf team's plans to some day materialize. But we could also work on this more proactively, and if necessary extend the stock protobuf library with the functionality we want (and work with them to upstream it).
Nov 7 2017
I also think that changing protobuf to support StringPiece is the cleanest way to optimize this. But the protobuf folks want to use Abseil, which would complicate our lives, as we'd have to bring Abseil into Chrome too. However, the benefits of bringing in Abseil are not clear (there is a thread somewhere about this). Maybe we need to invent a new protobuf feature for plugging in custom string classes; that way we could use our base::StringPiece.
Nov 7 2017
Yeah, I was also thinking something along those lines.
Nov 7 2017
In theory, could we just concatenate all of the strings into a single string field and replace all of the string fields with a "StringPiece { int start, int end }" message that points into that buffer?
That might be simpler than needing to modify the protobuf library directly.
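To make the idea concrete, here's a sketch of the client-side lookup under that scheme. The StringRef shape and names are purely illustrative (no such proto messages exist today), and the server would need the corresponding encoder:

#include <cstddef>

#include "base/strings/string_piece.h"

// Hypothetical offset pair; in the proposal above this would be a small proto
// message (e.g. `message StringRef { int32 start; int32 end; }`) replacing
// each string field, with the actual characters living in one shared buffer.
struct StringRef {
  int start = 0;
  int end = 0;  // Exclusive.
};

// Sketch only: resolve an offset pair against the shared string table without
// allocating. Returns an empty piece for malformed references.
base::StringPiece Resolve(base::StringPiece string_table, const StringRef& ref) {
  if (ref.start < 0 || ref.end < ref.start ||
      static_cast<size_t>(ref.end) > string_table.size()) {
    return base::StringPiece();
  }
  return string_table.substr(ref.start, ref.end - ref.start);
}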
Nov 7 2017
It would work. Not sure how I feel about it, though. On one hand, it would be a very ugly workaround, adding lots of complexity to the protobuf format we use. On the other hand, it would allow us to re-use some of the strings: when they appear in multiple configs, they could point to the same data. Though this might be moot with gzip compression. I'd be curious to see how this would affect bandwidth (i.e. how big the Finch seed is in such a format vs. now, and the same for the gzipped sizes). In a vacuum, I would lean towards putting the work into the protobuf library; however, if such an approach could get us savings elsewhere (i.e. less bandwidth used), then maybe it's worth it. FWIW, I don't think the protobuf change is necessarily a lot of work. All that's needed is to change the output code it produces to emit StringPiece fields instead of std::string, and then have some way to enable this mode for certain protos. So it might actually be less work than changing our protobuf format.
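To illustrate that generated-code option: today protoc emits owning std::string members with copying setters; in a hypothetical "StringPiece mode" the field would instead alias the input buffer, something like the sketch below (not actual protoc output), with the caller responsible for keeping that buffer alive:

#include "base/strings/string_piece.h"

// Sketch only: hypothetical generated code for one string field in a
// "StringPiece mode". Nothing like this is emitted by protoc today.
class StudySketch {
 public:
  // Accessor returns a view into the serialized input instead of an owned
  // std::string.
  base::StringPiece name() const { return name_; }

  // Would be called by a (hypothetical) zero-copy parser with a slice of the
  // original buffer; no copy, no allocation.
  void set_name_alias(base::StringPiece value) { name_ = value; }

 private:
  base::StringPiece name_;  // Points into the buffer given to the parser.
};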
Dec 14 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/159ff63c16fd55bfdd6732bcfebedd77b9350677

commit 159ff63c16fd55bfdd6732bcfebedd77b9350677
Author: Dmitry Skiba <dskiba@chromium.org>
Date: Thu Dec 14 01:02:18 2017

Make CreateTrialsFromSeed visible in traces.

VariationsFieldTrialCreator::CreateTrialsFromSeed() function takes ~100ms on Android Go devices (see the bug). This CL adds TRACE_EVENT to the function to make it visible in startup traces.

Bug: 761684
Change-Id: I339de72e6e04cfa2ba30e65ce8d5888d467fead2
Reviewed-on: https://chromium-review.googlesource.com/825647
Reviewed-by: Ilya Sherman <isherman@chromium.org>
Commit-Queue: Ilya Sherman <isherman@chromium.org>
Cr-Commit-Position: refs/heads/master@{#523964}

[modify] https://crrev.com/159ff63c16fd55bfdd6732bcfebedd77b9350677/components/variations/service/variations_field_trial_creator.cc
Mar 9 2018
We have made changes to send smaller variation seeds to users by filtering field trials server-side, as well as cleaning up expired field trials. Now that M64 is in stable, we can see a big drop in the full variation seed size: https://uma.googleplex.com/timeline_v2/?sid=90fcf0409da6c71a97f1cda0568cd881 This should make VariationsFieldTrialCreator::CreateTrialsFromSeed faster.
Mar 9 2018
\o/
Mar 9 2018
Very cool! Thanks! This is the first time I realized that Android has more experiments than any other platform.
Mar 12 2018
This is awesome! dskiba: have you observed any impact on startup traces? Wondering whether it's worth monitoring this metric somehow - Chirp or system health for each milestone?
Sep 25
Closing this bug as Fixed per comment 30. We can file follow-up bugs if there's more to do (e.g. ideas around the proto StringPiece optimization), but the main size reduction via reducing the number of studies has been done.
Comment 1 by dskiba@chromium.org, Sep 3 2017 (attachment: 1.3 MB)