Task profiler is enabled by default (kInitialStartupState = ThreadData::PROFILING_ACTIVE in tracked_objects.cc), and causes ~120 KiB of allocations (see base/tracked_objects.cc group in go/fxbus for example).
I don't know how task profiler is used on Android, but disabling it will save some memory.
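For illustration, a minimal sketch of the kind of change being suggested (this assumes ThreadData also has a DEACTIVATED status value that can serve as the default; the exact enum value would need to be confirmed against tracked_objects.h):

  // base/tracked_objects.cc (sketch only, not an actual patch):
  // Default to not collecting task-tracking data; it can still be enabled
  // explicitly at runtime where it is needed.
  const ThreadData::Status kInitialStartupState = ThreadData::DEACTIVATED;
  // Previously:
  // const ThreadData::Status kInitialStartupState = ThreadData::PROFILING_ACTIVE;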
Design doc:
https://docs.google.com/document/d/1cibE5nAG9sdjVMfs-MkvrMtp43i9MLGqM_uNusLqiBY
The task profiler data is included in UMA reports. I thought that there were some folks actively using this data for Android Chrome development, but I might be misremembering.
Filtering by OS is supported via the "Version" filter... though now that I look more closely, it looks like Android is not available in the list. So, I'd guess it's somewhat unlikely that the aggregated data is being used by folks working on Android Chrome =P
My understanding is that the timing code was disabled. At some point there was a trial to enable it for a small slice of users. If we're not enabling the timing code, we could probably also avoid the memory overhead for this.
Yes, I do mean on Android.
Re: Timing code, it was found that collecting task run times has a bigger performance hit on Android than on other platforms. So it was disabled (I think for 99% of users). I think we still report the rest of the data (i.e. the parts that don't include run times, such as counts of tasks, etc.), but not the timing information.
My suggestion is that for the 99% case, we could disable everything, not just the timing information. But for the 1% of sessions where we do enable it, it makes sense to keep all of it. 1% of sessions on stable should still provide useful performance data while having a much smaller impact on users.
I'll try to dig up the relevant code.
Looking at server-side configs, it looks like it was 0.1% rather than 1% for stable. However, it looks like the configs have expired, so we should double-check the client code - it could now be defaulting to actually enabling the timing info.
+tdresser who's also been recently looking at the task profiler stuff for speed metrics
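If we do go the 1% route, a rough sketch of what a client-side gate could look like (the "TaskProfiling" trial name is a placeholder, and this assumes ThreadData::InitializeAndSetTrackingStatus is the activation entry point - both would need to be checked against the actual code):

  #include "base/metrics/field_trial.h"
  #include "base/tracked_objects.h"

  void MaybeStartTaskProfiler() {
    // Only the enabled group pays the memory/CPU cost; everyone else keeps
    // the profiler fully off.
    if (base::FieldTrialList::FindFullName("TaskProfiling") == "Enabled") {
      tracked_objects::ThreadData::InitializeAndSetTrackingStatus(
          tracked_objects::ThreadData::PROFILING_ACTIVE);
    }
  }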
Here's the original CL that disabled it on Android:
https://codereview.chromium.org/99343002
So the part I missed in comment 9 is the piece in content_startup_flags.cc that specifically disables timing on Android by modifying command line flags.
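From memory, that piece looks roughly like the snippet below; the exact switch constants should be verified against that CL and content_switches.h:

  // content/browser/android/content_startup_flags.cc (paraphrased sketch):
  // Disable task-profiler timing on Android unless the switch was already
  // set explicitly on the command line.
  if (!parsed_command_line->HasSwitch(switches::kProfilerTiming)) {
    parsed_command_line->AppendSwitchASCII(
        switches::kProfilerTiming, switches::kProfilerTimingDisabledValue);
  }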
Sorry for the delayed response, I've been OOO.
I'm looking at piggybacking on top of the task profiler to build per-thread CPU usage metrics.
See https://codereview.chromium.org/2973543002/ for example.
That being said, it is much more heavyweight than I require.
Perhaps we could disable some task attribution data without disabling the profiler completely? Based on a cursory glance, it looks like attribution is responsible for the bulk of the memory consumption?
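To make the requirement concrete, here's a minimal sketch of the kind of measurement I need (assumed shape only, not the prototype from the CL above; the histogram name is a placeholder and a real metric would be sliced per thread):

  #include <utility>

  #include "base/callback.h"
  #include "base/metrics/histogram_macros.h"
  #include "base/time/time.h"

  void RunTaskWithCpuAccounting(base::OnceClosure task) {
    // ThreadTicks measures CPU time consumed by the current thread only; it
    // isn't available on every platform, so fall back to plain execution.
    if (!base::ThreadTicks::IsSupported()) {
      std::move(task).Run();
      return;
    }
    const base::ThreadTicks start = base::ThreadTicks::Now();
    std::move(task).Run();
    const base::TimeDelta cpu_time = base::ThreadTicks::Now() - start;
    UMA_HISTOGRAM_TIMES("Scheduler.TaskCpuTime", cpu_time);
  }

Note that none of this needs the birth-location attribution data, only the per-task timing hooks.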
Just to provide some context on the bug, Brett is looking to reduce memory use and some of the task data showed up in traces. So the question is can we eliminate it if it's not useful?
We had a discussion around this and I mentioned that this data is useful on occasion, such as when it helped identify the cause of the go/m51-startup-regression-postmortem.
However, if it's causing significant memory use, I agreed that it doesn't make sense to make all our users pay for it. I suggested we change it to be enabled for just 1% of sessions, so that if we need to debug problems the data is still available, but most users don't have to pay the cost. We can also have a 1% control group and compare the data for the two groups to see how big the runtime savings actually is, using the new memory metrics that Erik added.
However, Brett also mentioned there's a binary size impact from this - due to FROM_HERE macros that hold filenames as strings. Unfortunately, we can't limit that cost to 1% of users since it's a compile-time thing. Brett suggested we could maybe store the PC address instead and symbolize it on the server, but this would require a bunch of work.
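To illustrate the PC idea (names are hypothetical; the real change would touch tracked_objects::Location and the UMA proto), the post site could be captured as an address rather than as file/function/line strings, e.g.:

  #include "base/compiler_specific.h"

  // NOINLINE so the builtin returns an address inside the calling function,
  // i.e. the post site we'd symbolize on the server. (__builtin_return_address
  // is GCC/Clang; MSVC would use _ReturnAddress().)
  NOINLINE const void* GetPostSitePC() {
    return __builtin_return_address(0);
  }

  // Hypothetical string-free replacement for FROM_HERE.
  #define FROM_HERE_PC() GetPostSitePC()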
Another option is to see if we could obtain the same data from the Sampling Profiler - which records samples periodically - which both Brett and I agree would be a good solution, since it would allow us to entirely delete the C++ and JS chrome://profiler code from Chrome. (Though the limitation there is that the Sampling Profiler only supports a subset of platforms right now.)
Brett will write up a doc and more discussion can happen there.
If we are pulling out the task profiler, can we leave enough in place for building CPU time metrics? I don't need any of the location data for CPU time metrics, but my prototype relies on task profiler instrumentation for gathering task durations per thread.
Thinking about this more, I don't think sampling profiler data can replace what task profiler provides.
In particular, the task profiler counts the number of tasks posted as well as their run times - whereas the sampling profiler will just know how much time in aggregate those functions take to run. This means it's impossible to tell whether a function is actually janky (i.e. often takes over 100ms, or whatever threshold) vs. it just executes often and takes a lot of time in aggregate, but never causes jank for users (since all executions are short).
So given this, maybe the other plan we discussed (replacing the location info with a PC) might be better to pursue; otherwise we would be losing jank data - which I think is still important for Chrome not to regress on (and we don't have any other mechanism to track this right now, AFAIK).
Changing the source location to a PC would involve changing the protobuf format sent to UMA and updating our pipeline that consumes it - but since we already do this for the sampling profiler pipeline, it may not be that bad to adapt.
As for chrome://profiler - I agree that it's not very useful in its current form. Originally it was added because we didn't have the data in UMA - but now we do, so local aggregation isn't as useful. What would be useful is for that page to instead show data from only the last X amount of time - let's say the last 5 minutes. At least then it could help figure out, when some jank happened, what it was. I've used the existing UI for that, but it doesn't serve that use case very well because it's hard to tell whether something slow was recent or old. In theory, if we repurpose it for such a use case, the page could be greatly simplified - i.e. we can remove all the JS and HTML and just have a simple text page that the code outputs, like chrome://histograms.
I've used http://go/uma-profiler a long long time ago via the manual instrumentation route to find janks.
I think asvitkine@'s analysis is right with regards to the differences between the sampling profiler and the task profiler in that the sampling profiler is vulnerable to task aliasing.
Now, having said that, there is a legitimate discussion to be had about the task profiler itself.
It seems all first-level triage of issues should go through the sampling profiler first, especially as it doesn't require manual instrumentation of code.
The task profiler should only be used if it cannot be determined from the sampling profiler if an issue exists.
This leads to a few questions (to which I don't have answers):
* From a practical standpoint, are there issues that only the task profiler can identify? Hypothetically the answer is yes, via aliasing, but how often does this actually happen?
* Have we used the task profiler / http://go/uma-profiler recently to find and fix issues? If the answer is no and we generally agree that Chrome is operating reasonably well, perhaps we can get by without them.
Maybe I'm the one who has used http://go/uma-profiler the most in the past 6 months or so, so I think I should comment, whether or not we are going to retire the task profiler. Btw, I am fairly new to Chrome (this is my 11th month), so some of my thoughts below may be wrong. I apologize in advance. A little more about me: I'm on the Chrome Windows team, focusing on improving Chrome performance, especially CPU and power usage.
Quoted from comment #21 by robliao@
"Have we used the task profiler / http://go/uma-profiler recently to find and fix issues? If the answer is no and we generally agree that Chrome is operating reasonably well, perhaps we can get by without them."
1) Have we used the task profiler to find and fix issues? Yes, I have. The task profiler helped me fix the severe bugs in jumplist. See crbug/40407, where more than 10 related bugs are linked. This is a 7+ year old bug and I used 70+ CLs to fix it. See go/jumplist-icon-accum-fix and go/jumplist-boost-efficiency for the whole story if you're interested. I also used the task profiler to find severe issues in the TopSites service. One example is crbug/763103.
2) Is Chrome operating reasonably well? Yes, it is. I love Chrome.
3) Does Chrome still have nasty bugs to fix? Yes, see crbug/103737 and crbug/176727 for example. Similar to jumplist, there are no useful traces and we cannot repro locally. In my experience, the task profiler may be the only weapon that could help. If you have any suggestions, that would be great!
Lastly, I also like the idea of removing the cost of the task profiler where it's unnecessary.