Issue 908893 link

Starred by 1 user

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug
Intent-Implement: NotStarted
Intent-Ship: NotStarted



Compute confidence intervals for histograms

Project Member Reported by behdadb@chromium.org, Nov 27

Issue description

When viewing the historical data of each metric, values are reported as average ± std (example: 92.4416 (± 145.247)).
To convey the precision of the estimated mean, the values should instead be reported as average ± confidence interval.

So for the example above, with a mean of 92.4416 and a std of 145.247, the mean with a 95% confidence interval would be:

92.4416 (± 59.3607)       [knowing that count = 23]
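
The ± 59.3607 figure is consistent with the normal-approximation formula z * std / sqrt(count) with z = 1.96. That is an inference from the numbers, not something stated in the report, and the function name below is made up for illustration. A minimal sketch:

```python
import math

# Assumption: the report used the rounded two-sided 95% critical value
# of the standard normal distribution.
Z_95 = 1.96


def normal_ci_half_width(std, count, z=Z_95):
    """Half-width of the CI for the mean under a normal approximation."""
    return z * std / math.sqrt(count)


half_width = normal_ci_half_width(std=145.247, count=23)
print(f"92.4416 (± {half_width:.4f})")  # prints "92.4416 (± 59.3607)"
```

With only 23 samples, a Student's t critical value (about 2.07 for 22 degrees of freedom) would give a slightly wider interval. Either way the formula assumes the sample mean is approximately normally distributed.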

Comparing charts: https://docs.google.com/spreadsheets/d/1Wrv00fPP9doc1UJ18egMKTu-GBkjTTjheRj5c3DokSg/edit?usp=sharing
Some discussion on the topic: https://www.researchgate.net/post/What_is_the_difference_between_meanSD_and_meanSE


 
Attachment: std.png (146 KB)
Cc: sullivan@chromium.org benjhayden@chromium.org vmi...@chromium.org
Components: Speed>Dashboard
Labels: -Type-Design-Review Pri-2 Type-Bug
Cc: tdres...@chromium.org
Thanks! Flexibility to compute and display alternative statistics is a big part of the reason that we're switching from chartjson to Histograms, and charts are being rewritten from scratch in v2spa.

When researching statistics in this context, it is important to remember that most performance metrics are not normally distributed. Many are multi-modal or quantized or resemble another distribution such as log-normal. (I did some statistics a few years ago and found many, many UMA metrics that fit log-normal very well, often multi-modal.) Also, many benchmarks record relatively few samples (often 1-10), so it is rarely appropriate to apply the central limit theorem. I've considered adding a windowing mode to v2spa charts to merge successive data points; this would effectively increase the number of samples per data point, which could enable using the central limit theorem.

IIRC, +Tim considered using inter-percentile ranges, but I'm not sure if they're enabled by any metrics. They aren't available in v2spa yet, but I'm looking forward to displaying them there. I'm open to suggestions for alternative statistics such as confidence intervals. We'll just need to remember to carefully explain them in the UI and documentation, so we might want to hew towards statistics that are easier to explain to our user base, at least at first.

Please feel free to schedule a VC to discuss statistics further, or we can add this topic to the Speed Services Weekly agenda.

Have you considered using bootstrap resampling to calculate confidence intervals, instead of assuming anything about the distribution shape?

My crude recollection of the technique: draw a new set of the same size from the actual samples you have, with replacement (so some samples repeat and others drop out), recalculate the mean (or median, or whatever parameter is of interest) from the resampled set, and treat that as a new synthetic sample. Repeat however many times you like to generate however many "samples" you want, sort the values calculated from the resamples, and cut off the upper and lower tails to get the % CI you want (for a 95% CI, drop the top and bottom 2.5%).

https://ocw.mit.edu/courses/mathematics/18-05-introduction-to-probability-and-statistics-spring-2014/readings/MIT18_05S14_Reading24.pdf
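
The resampling procedure described above can be sketched as a percentile bootstrap. This is only an illustration: the function name, its parameters, and the sample values are made up here, and nothing like this is claimed to exist in catapult.

```python
import random
import statistics


def bootstrap_ci(samples, confidence=0.95, num_resamples=2000,
                 statistic=statistics.mean, seed=0):
    """Percentile-bootstrap confidence interval for `statistic`."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    n = len(samples)
    # Each resample draws n values from the data with replacement.
    estimates = sorted(
        statistic(rng.choices(samples, k=n)) for _ in range(num_resamples)
    )
    # Trim an equal number of estimates from each tail.
    tail = round((1.0 - confidence) / 2.0 * num_resamples)
    return estimates[tail], estimates[num_resamples - 1 - tail]


# Hypothetical, skewed measurements (not taken from any real benchmark).
samples = [12.0, 15.5, 9.8, 240.0, 11.2, 14.1, 10.3, 480.0, 13.7, 12.9]
low, high = bootstrap_ci(samples)
print(f"mean = {statistics.mean(samples):.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

Passing `statistic=statistics.median` gives an interval for the median instead. No distribution shape is assumed, though with very few samples the interval can still be unreliable.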


Thanks, Sean!
Yep, bootstrap resampling looks to me like a good way to compute CIs here.

behdadb: Is this blocking anything?
If you want, you can go ahead and extend Histogram.getStatisticScalar() to support statNames like 'ci_NN' such as ci_90 or ci_95:
https://github.com/catapult-project/catapult/blob/master/tracing/tracing/value/histogram.html#L672
Patches welcome!
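
As a rough illustration of the 'ci_NN' naming idea, a name like ci_95 could be mapped to a confidence level as below. This is written in Python for brevity rather than in the JavaScript of histogram.html, and `parse_ci_stat_name` is a hypothetical helper, not part of the Histogram API.

```python
import re


def parse_ci_stat_name(stat_name):
    """Return the confidence level for names like 'ci_95', else None."""
    match = re.fullmatch(r"ci_(\d{1,3})", stat_name)
    if match is None:
        return None
    percent = int(match.group(1))
    # Only levels strictly between 0% and 100% make sense.
    if not 0 < percent < 100:
        return None
    return percent / 100.0


print(parse_ci_stat_name("ci_90"))  # prints 0.9
print(parse_ci_stat_name("avg"))    # prints None
```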

For now, you'll need to modify the relevant TBMv2 metrics to enable those statistics using summaryOptions in order to display them in results.html, though I hope to deprecate summaryOptions eventually so users can always get any statistic for any histogram, and I'll also add support for them in v2spa eventually. Just let me know what your timeline looks like!
No, this is not blocking anything; it only came up as part of an effort to take the noise of the data into account in the analysis.

I also think that bootstrap resampling would be a good way to do it, and having CIs would give a better perspective to anyone studying the data.
I would assume that the first step would be adding CIs computed with bootstrap resampling to the statistics we calculate. I have an example of how it would look in this document: https://docs.google.com/document/d/1phwJlLgW1zsoHkBt_cK_tP_EGg0iJeJdbjdOPc4KHP8/edit?usp=sharing

Components: -Speed>Dashboard Speed>TBM2
Changing component: this bug is more about Histograms, which are currently maintained as part of tbm2, than about the dashboard. I'll file a separate bug about displaying statistics in v2spa when we're ready for that work. When tbm3 provides the one true implementation of Histograms, if these ci statistics are not yet implemented, this bug could move to a tbm3 component.
Summary: Compute confidence intervals for histograms (was: Improving value representation in perf dashboard)
