Enforce a per-step and/or per-run deadline for performance tests
Issue description
https://bugs.chromium.org/p/chromium/issues/detail?id=669088 tracks an instance where a deadlock in cloud storage resulted in a four-day hang on Win High-DPI (1). I think we should probably make this impossible in the future. At the very least, it seems like we should have a per-step deadline. It could be set high enough that it only kicks in when something has gone seriously wrong: maybe 12, 18, or 24 hours. That would prevent a multi-day hang from ever occurring again; instead, the problem would surface as a more manageable test failure.

I can also imagine a situation where different steps each hang until the deadline for the same underlying reason. If there are 20 steps and 10 of them hang until a 12-hour deadline, that is 10 × 12 = 120 hours, a 5-day hang, which would still be unacceptable. For this reason, it might make sense to also enforce a deadline for the entire run that is longer than the deadline for a single step. eyaich@, nednguyen@, WDYT? Any ideas about how hard this would be to implement?
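A minimal sketch of the two-level deadline proposed above, assuming each step runs as a separate subprocess. Everything here (`run_steps`, `python_step`, `worst_case_hang_hours`, `StepResult`, and the status strings) is hypothetical and not taken from the Chromium perf or Telemetry harness. `worst_case_hang_hours` reproduces the arithmetic from the description, and `run_steps` gives each step the smaller of its own limit and whatever is left of the run budget, so the run deadline holds even when the hang happens mid-step.

```python
import subprocess
import sys
import time
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Hypothetical step runner; not the actual Chromium perf/Telemetry harness.


@dataclass
class StepResult:
    name: str
    status: str  # "passed", "failed", "timed_out", or "skipped_run_deadline"


def worst_case_hang_hours(hanging_steps: int, step_timeout_h: float,
                          run_timeout_h: Optional[float] = None) -> float:
    """Upper bound on hours spent in steps that hang until they are killed."""
    per_step_bound = hanging_steps * step_timeout_h
    if run_timeout_h is None:
        return per_step_bound
    # The per-run deadline caps the total no matter how many steps hang.
    return min(per_step_bound, run_timeout_h)


def python_step(name: str, code: str) -> Tuple[str, List[str]]:
    """Build a hypothetical (name, argv) step that runs `code` in a fresh interpreter."""
    return name, [sys.executable, "-c", code]


def run_steps(steps: Sequence[Tuple[str, List[str]]],
              step_timeout_s: float,
              run_timeout_s: float) -> List[StepResult]:
    """Run (name, argv) steps in order with per-step and per-run deadlines."""
    run_deadline = time.monotonic() + run_timeout_s
    results: List[StepResult] = []
    for name, argv in steps:
        remaining = run_deadline - time.monotonic()
        if remaining <= 0:
            # Run budget exhausted: report the step instead of starting it.
            results.append(StepResult(name, "skipped_run_deadline"))
            continue
        # A step may use its own limit, but never more than the run has left.
        timeout = min(step_timeout_s, remaining)
        try:
            proc = subprocess.run(argv, timeout=timeout)
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before re-raising TimeoutExpired.
            results.append(StepResult(name, "timed_out"))
            continue
        status = "passed" if proc.returncode == 0 else "failed"
        results.append(StepResult(name, status))
    return results
```

With only a 12-hour per-step limit, `worst_case_hang_hours(10, 12)` gives 120 hours (5 days); adding a 24-hour run deadline, `worst_case_hang_hours(10, 12, run_timeout_h=24)`, caps it at 24. A call such as `run_steps(steps, step_timeout_s=12 * 3600, run_timeout_s=24 * 3600)` would apply those two limits to real steps. Steps left over once the run budget is spent are reported as skipped rather than silently dropped, so the failure stays visible.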
Nov 30 2016
So this is already handled on swarming: we have a 5-hour execution timeout per run for our perf tests. Our unit tests are also swarmed, but I'm not sure what their execution timeout is set to. I think the default swarming timeout is 2 hours.
Dec 1 2016
Should our policy for these kinds of "solved by swarming" problems be to take no action, on the assumption that all perf unit tests will be swarmed soon?
Dec 5 2016
After more discussion, I think that "solved by swarming" is okay as long as it's accompanied by a note warning jbudorick@ that he might want to come up with something for Android.
Jan 12 2017
Comment 1 by charliea@chromium.org, Nov 30 2016