Issue metadata
Sign in to add a comment
|
5.2%-578% regression in media.tough_video_cases_tbmv2 at 479553:479976 |
||||||||||||||||||||||
Issue descriptionSee the link to graphs below.
,
Jun 19 2017
Started bisect job https://chromeperf.appspot.com/buildbucket_job_status/8976339695532750368
,
Jun 19 2017
=== Auto-CCing suspected CL author mmenke@chromium.org === Hi mmenke@chromium.org, the bisect results pointed to your CL, please take a look at the results. === BISECT JOB RESULTS === Perf regression found with culprit Suspected Commit Author : mmenke Commit : 14085ad54d50252bcadd1b1d7dc4a0bca2c9c5b3 Date : Thu Jun 15 21:53:54 2017 Subject: Use URLRequestContextBuilderMojo to create the SystemURLRequestContext. Bisect Details Configuration: mac_10_11_perf_bisect Benchmark : media.tough_video_cases_tbmv2 Metric : memory:chrome:all_processes:reported_by_chrome:net:effective_size_avg/video.html?src_garden2_10s.mp4 Change : 198.16% | 2067.14285714 -> 5408.66666667 Revision Result N chromium@479811 2067.14 +- 8161.65 14 good chromium@479839 3207.43 +- 9839.63 14 good chromium@479851 2449.14 +- 8989.76 14 good chromium@479853 3883.11 +- 7918.67 9 good chromium@479854 5110.67 +- 6630.22 9 bad <-- chromium@479855 4525.33 +- 6133.8 6 bad chromium@479858 2874.57 +- 9530.75 14 bad chromium@479864 2874.57 +- 9530.75 14 bad chromium@479916 5408.67 +- 4838.22 6 bad To Run This Test src/tools/perf/run_benchmark -v --browser=release --output-format=chartjson --upload-results --pageset-repeat=1 --also-run-disabled-tests --story-filter=video.html.src.garden2.10s.mp4 media.tough_video_cases_tbmv2 Debug Info https://chromeperf.appspot.com/buildbucket_job_status/8976339695532750368 Is this bisect wrong? https://chromeperf.appspot.com/bad_bisect?try_job_id=5781280437829632 | O O | Visit http://www.chromium.org/developers/speed-infra/perf-bug-faq | X | for more information addressing perf regression bugs. For feedback, | / \ | file a bug with component Speed>Bisection. Thank you!
,
Jun 19 2017
I can't make any progress here without an english language description of what is being tested, or some deterministic way to get that information.
,
Jun 19 2017
c#3 has some details. I'm not sure the exact specifics of "net:effective_size_avg" but it's essentially saying more memory is being used by net/ during playback of the garden2_10s.mp4 clip. +sullivan who might know more about that specific benchmark.
,
Jun 19 2017
That said, the benchmark says these values are in bytes?? So a regression of 48 - 192 bytes seems silly to alert on unless that's a mistake and this is kb.
,
Jun 19 2017
Given that the numbers are rock stable (i.e., no variance at all), which seems rather surprising when measuring anything, and the label is "reported_by_chrome" (As opposed to something actually measured), I wouldn't be surprised if it's just the way something's being measured was changed. Regardless, the words "effective_size_avg" appear nowhere in the Chrome repository. Well, once in a comment and once in the middle of a quoted URL in a pythong file, never in actual code, I don't think this is actionable without more information from some mysterious source outside the Chrome repo.
,
Jun 20 2017
+xunjieli who might know what to make of the memory numbers here, and please also see the documentation here: https://chromium.googlesource.com/chromium/src/+/master/docs/memory-infra/memory_benchmarks.md It's correct that the numbers are in bytes. crouleau: should we change the alerting config for the memory metrics to have a minimum absolute value? maybe a few hundred kilobytes?
,
Jun 20 2017
That doc doesn't really help. I have no idea what size this is claiming increased. If that's bytes, though, I guess this is just WontFix? Investing significant time on saving two hundred bytes across all of Chrome seems silly.
,
Jun 20 2017
Re #9: The "trace" link on perf dashboard will gives you a trace file. If you click on a memory dump dot, you will see a "net" category in the analysis view. The increase here is caused by an 0.2KiB increase in System Profile's in-memory HTTP Cache. Before: https://00e9e64bacf48734629c89c35cfe159fa1ff5918e5e2d4c5ec-apidata.googleusercontent.com/download/storage/v1/b/chrome-telemetry-output/o/trace-file-id_16-2017-06-11_15-08-16-16934.html?qk=AD5uMEsy2N4CyOH0ww_D3StMxI56jzuRHwInBXAYhqnqRkDSO4VulSbsglbSLtKdPpkr-ay43IUNytO-Jq7xdBDxgdAtPWQ9ywo56ZkfZKlhBXBteCmi62KqyaqhIEQjsmN6k76vwo-dpW8QOgzodOqq-ytdmMc-xF23dbGtGSi5p-ZbkrPnSoSD3m1_ocZgI-I65cgg313Or3j8nQI9AdL8dkIFVw-6ERQLJrHN1jk88i3lkXtQcAnuSLOS6EPufov5WEY-3bHckk6CPE7IesCmxIPO-_SgQuVY8UKpwTHvWyrmoU7ekxSJtmSSZm3UUEtDo4cf1zALs_m6FWcI8zFh4zh3VBV6GjO-kxs2CFJGQ4xRUDUMDYX4abwwZGcMYgg8D7rIR2g0a3XnZEXeW0sqaMdc-_W67VSGn2ln0U5EW5LuCLdU7XDuagkfkDF-UFnjT1UYrDwGPoSHMMWC8BlWkJLyXSMug4GJwTn7vJHd5VgyN1qpP4hrFmvulwANnBwm5CJwtDHmNfg-LGnd5NixqPQ9rCHd_hY4Xm-0uGzgtC7Glrvd-4k4n_pTNuimHowSiX5OEPQ9ZDh6Mg5FrQtWBYf84w3ZF7qpcjpcHYuDpV65MB-u9VSjsNOpqU8y6Bindt5Rob1-jIFbujkoPHCdfBwA-MafmbwzN-BXgVIU02ukonRVRUyueWNWJ0PJfrtJxcn2acYt9qeRHzDAKWJqKMACpb1rvhxZvRrWOIilhc8tF_fjtsYKhy-b9kpv_n2ox_BX_VNf6Khk_v1GFi1gFFHqtJpYzw2GWEq1wvC8YNrJgBEWMp65_egxqVqFgKQsbh7k4qM6lXHsuIX-hRzTEgJgugD_Cw After: https://00e9e64bac2863ccf022a12de61c215d45e8ecd9391b318c35-apidata.googleusercontent.com/download/storage/v1/b/chrome-telemetry-output/o/trace-file-id_16-2017-06-15_22-49-59-51339.html?qk=AD5uMEvM-CsVYfEtnoehpcSO-3ZxpOvDV3CIgxydq5n6goRcGT2QV6qW9EpKJ1o91ffpfjAWXkuHkEGLWBP9wL3VJ99Mkt7_1I-zCRLBCxtDKFlwDuZZ3fdG86vJYkMjb0bm_-sJsMhRywDocoMjGmKwYsSnIYoeuYl8p6S04cLzKZvMDRR15QUxYKvVSg8UkJhjQzAK2tYtdGdmC8yGWqYVVoqQ8Lv0zsxRKVxLac6J_NZ5oparu7gJQs6wCLUMjJxejxvgjPv_cOS0uAdVjHSbAHB-iOj4ztvzd9Z-Ftg9yWFdJfxLcWvRu5yQVfriErDoIsHJCoCxqOsgnGNxaUcwPIA3x3VAHtrUfvw8UmKpVqa8I5magrbDZyRoleCoAIDDpj2-seHB4EklvPkeHSlutC-U3Zsut4wu9fK0p3dV-XmW8JF1fTsbFECgZ86UAS5Y6mKurxhm5f3L7izG4aJytLmZiA_JNV3ITIGMTbGhoJxeKQp34DalG4FUEMacOricVXNtc3WiGZl6YP5BhsHExwxkXNSJhdrbg8oYLMJqZBQFITyk5749C-XTT_peEIlojeMDUOmdxEoRz4FxFHz_X0OM1kFQuh8sJVPrniut8VUg426AJURgPJxAJ4CGxvBdZ1mZuyAxR6zQNJSFvqP2zS6xRrDlO7QVsloLis8MBr6Wy-L6qp1KydwHyO0z-Oo3FIb6E_0hRrb51Wj_2zC6AIdmHE_h4jkCDiQ_Fxyp7xJNHQ0p3UF1kjcb1C6PmtPPQJnS0Gn5ZdfjytmhPwanf_jllmLue07y2waOSmgp0DD5DTRoKHmPDxPL3FlHegQhM8KMNBMYdyGupT_t_tYnCgsdRmArOQ
,
Jun 20 2017
How is someone who does not work full time on memory infrastructure supposed to go from https://chromeperf.appspot.com/group_report?bug_id=734619 and "memory:chrome:all_processes:reported_by_chrome:net:effective_size_avg/video.html?src_garden2_10s.mp4" to realizing this was attributable to the cache? Assigning bugs people have no way to make progress on it, other than to whine to the memory team helplessly, seems like it will cause needless aggravation and hostility towards this sort of bug. So it looks like this is real regression - the CL gave the system request context a memory cache, and it didn't have one before.
,
Jun 20 2017
I planned to write up a doc on //net MemoryDumpProvider and it's on my TODOs. Hopefully that will help. I will make it a priority to do it before the end of this quarter :)
,
Jun 20 2017
And where is the "trace link on the perf dashboard"? I see 4 chromeperf.appspot.com links (One of which doesn't work, due to its length, I assume), and I can find a trace link on none of them, even knowing I should be looking for it.
,
Jun 20 2017
You can right click anywhere on the graph, it will bring up a box which has a blue "Trace" link in it. If you right click on the alert exclamation mark, you can see a blue "Trace" link after "See all performance changes at 479942" (screenshot_trace1.png). When you right click on any point on the flat line (before regression), you will see the same box UI which contains a "Trace" link after "Buildbot stdio" (screenshot_trace2.png).
,
Jun 20 2017
,
Jun 20 2017
Setting to proper status "WontFix" since no changes were made. To respond to #8: "crouleau: should we change the alerting config for the memory metrics to have a minimum absolute value? maybe a few hundred kilobytes?" That sounds like a reasonable change since it would reduce the number of alerts my team has to deal with. I would be happy to work on this if you point me to the code that handles this. To respond to comment #11: I think it is reasonable that someone with the information dalecurtis@ provided in #5 who had recently made a change to add a cache to something could understand that their change caused a slight memory regression, and we are happy to work with you to figure out the problem. It's true that the documentation and UI could use a good deal of work (And adding units to the bug post would be helpful as well), but the current system is pretty workable. We realize that preventing speed regressions is a pain for developers, but it is necessary to keep Chrome from gradually becoming sluggish.
,
Jun 20 2017
mmenke@ landed a fix https://codereview.chromium.org/2945083002/ which is not stapled to this bug for some reason. Maybe there is a lag. Please don't increase the threadhold for this particular metric ( memory:chrome:all_processes:reported_by_chrome:net:effective_size_avg). It has caught a legitimate regression.
,
Jun 20 2017
[crouleau]: I didn't exactly make a cache change. I made a change that completely replaced how the SystemURLRequestContext was created - from using a bunch of one-off code to using shared creation code (i.e., nothing cache specific). So it could have been a change in literally any network stack object that is a component of a SystemURLRequestContext. It could even be a change in what we're reporting, since the reporting pathway is unclear. "net:effective_size_avg" is unclear - it does not make clear what's being measured (binary size? memory size? network service process memory? proxy process memory?) The network stack, even just after initialization, is a heck of a lot larger than the 200 bytes the dashboard claims - heck, an empty, uninitialized URLRequestContext is larger than that, so it's not all clear what it's measuring. Also, #5 was incorrect - this is memory after playback was complete (i.e., no active URLRequests), which makes a pretty significant difference.
,
Jun 20 2017
Re #18: "net::effective_size_avg" is whatever //net MemoryDumpProvider reports. It's the size of memory usage in bytes that are tracked by //net MemoryDumpProvider (design doc: https://docs.google.com/document/d/1zBX27tvkc8ZJHp9yvm6EQJrMDIhDCU92YlahaGFwsyc/edit?usp=sharing) in the browser process. There's no network service process yet as far as I know, otherwise measurement would be more straightforward. It's true that net stack uses more memory than what's reported. Though when I measured locally using real network, the order of magnitude of memory usage reported by MemoryDumpProvider and that of heap profiler are the same. This metric is a good proxy. Tracking everything turns out to be non-trivial and expensive, so I tracked the larger chunks.
,
Jun 20 2017
Even as someone who is at least vaguely aware of the existence of MemoryDumpProvider, it didn't even occur to me that this metric covers the stuff net provides to the memory service via the MemoryDumpProvider interface. I think some sort of natural language explanation, along the lines of what UMA provides, would have been helpful. I'm not arguing that we necessarily have to provide completely coverage, I just think we should have some way to clearly go from an assigned bug report to figuring out that this is from MemoryDumpProvider, and how to find the hidden traces that are a couple clicks away from one of the 4 links on these bugs typically have.
,
Jun 20 2017
mmenke@, feel free to file a bug suggesting improvements to this process/UI to https://github.com/catapult-project/catapult/issues. The problem that Speed Ops is solving is a very difficult one and resources must be sent to the highest priority problems, so sometimes the UI isn't polished.
,
Jun 20 2017
Sure, have to priority. But I was completely lost on how to handle this bug report until comment #10. If I'm not the only one who finds these sorts of bug reports extremely confusing, it seems like either you guys will have to waste time repeatedly explaining what's going on (Which may cumulatively waste more time than making these reports more user friendly), or they may end up being ignored (Which defeats the point of having all this automated testing infrastructure).
,
Jun 23 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/0c0ed3be51e0487b4a210f8a77ff714358d9b95e commit 0c0ed3be51e0487b4a210f8a77ff714358d9b95e Author: xunjieli <xunjieli@chromium.org> Date: Fri Jun 23 14:08:35 2017 Add documentation in memory_benchmarks.md on finding links to trace files This CL: - incorporates the feedback (see linked bug) that memory_benchmarks.md doesn't mention where to find the trace files on perf dashboard. - adds "memory:chrome:all_processes:reported_by_chrome:net:effective_size_avg" string in a comment in url_request_context.h, so it's greppable in code and makes it easier to find where the metric comes from. BUG= 734619 Review-Url: https://codereview.chromium.org/2952203002 Cr-Commit-Position: refs/heads/master@{#481871} [modify] https://crrev.com/0c0ed3be51e0487b4a210f8a77ff714358d9b95e/docs/memory-infra/memory_benchmarks.md [modify] https://crrev.com/0c0ed3be51e0487b4a210f8a77ff714358d9b95e/net/url_request/url_request_context.h
,
Jun 23 2017
|
|||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||
Comment 1 by dalecur...@chromium.org
, Jun 19 2017