Cleanup the "tsmon" Go library. |
||||||||
Issue descriptionThe "tsmon" Go library has some API problems that I would like to clean up. Notably: - To use it, one must generally import 4+ packages (root, metric, target, distribution, etc.) - The implicit global state is at odds with the need to call Shutdown. - Having no explicit State installed in the Context means that, in the typical tsmon use case, all Context operations will result in full traversal that comes up emptyhanded. - Some of the names and layering are odd. Will someone ever want to use "AddFlags" just for Target? This is a mega-bug to track ideas and thoughts related to cleanup of this package for usability and readability.
,
Oct 10 2016
Yeah understood. I do think the implicit need to Shutdown/Flush something that you may not have even initialized at all is bad. I think an initialization requirement followed by a "defer Shutdown" is more idiomatic.
,
Dec 1 2016
,
Dec 11 2017
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue. Sorry for the inconvenience if the bug really should have been left as Available. If you change it back, also remove the "Hotlist-Recharge-Cold" label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Dec 13 2017
Still valid bug.
,
Dec 14
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue. Sorry for the inconvenience if the bug really should have been left as Available. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
,
Dec 17
+ People who may have opinions (I never used it)
,
Dec 17
I think this isn’t a Platform bug. It was in Platform component only because dnj was willing to fix it
,
Dec 17
(we may end up needing to take this back; tsmon needs some serious love, imo. The fact that it can lose a bunch of data on a routine basis from appengine without anyone knowing is pretty bad. Maybe switching to GKE would be the right thing to do, because then we can more predictably control spinup/shutdown of containers in the common case, as well as have background threads to push data 'in between' requests).
,
Dec 17
i believe monitoring team is capable of providing serious love (if not, we have bigger problems)
,
Dec 17
Re #c9: IIUC, the scope of this bug is mostly to refactor the code, and not to change the functionality. For fixing the data loss, part of the reason is captured in issue 622401 (and I'd guess it's a fairly big part). There are also ideas floating around about introducing a custom Target for GAE apps, which may facilitate the above. I'm not sure if there is a bug for that yet.
,
Dec 18
Prod-Tech has an OKR for ts-mon love : https://docs.google.com/document/d/1AUE4K_xNzDnSVIBz9lOScLX0gRVACFbpaRr7dfmSZA0 >> For fixing the data loss, part of the reason is captured in issue 622401 (and I'd guess it's a fairly big part). Yes, I agree that it is very likely a big part of the cause for data lose. The other possible causes are also described in the following tickets * crbug.com/829635 - ts_mon invalid data in Swarming causing monitoring packets to be lost * crbug.com/869274 - MonarchErrorRateTooHigh and ProtoValidationErrorRateTooHigh
,
Dec 18
|
||||||||
►
Sign in to add a comment |
||||||||
Comment 1 by vadimsh@chromium.org
, Oct 10 2016