New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 654510 link

Starred by 4 users

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug



Sign in to add a comment

Cleanup the "tsmon" Go library.

Project Member Reported by d...@chromium.org, Oct 10 2016

Issue description

The "tsmon" Go library has some API problems that I would like to clean up. Notably:
- To use it, one must generally import 4+ packages (root, metric, target, distribution, etc.)
- The implicit global state is at odds with the need to call Shutdown.
- Having no explicit State installed in the Context means that, in the typical tsmon use case, all Context operations will result in full traversal that comes up emptyhanded.
- Some of the names and layering are odd. Will someone ever want to use "AddFlags" just for Target?

This is a mega-bug to track ideas and thoughts related to cleanup of this package for usability and readability.
 
Components: -Infra Infra>Monitoring
> - The implicit global state is at odds with the need to call Shutdown.

This is mostly to do the final Flush. Otherwise some points can be lost.

Comment 2 by d...@chromium.org, Oct 10 2016

Yeah understood. I do think the implicit need to Shutdown/Flush something that you may not have even initialized at all is bad. I think an initialization requirement followed by a "defer Shutdown" is more idiomatic.
Status: Available (was: Untriaged)
Project Member

Comment 4 by sheriffbot@chromium.org, Dec 11 2017

Labels: Hotlist-Recharge-Cold
Status: Untriaged (was: Available)
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue.

Sorry for the inconvenience if the bug really should have been left as Available. If you change it back, also remove the "Hotlist-Recharge-Cold" label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Labels: -Hotlist-Recharge-Cold
Status: Available (was: Untriaged)
Still valid bug.
Project Member

Comment 6 by sheriffbot@chromium.org, Dec 14

Labels: Hotlist-Recharge-Cold
Status: Untriaged (was: Available)
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue.

Sorry for the inconvenience if the bug really should have been left as Available.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Cc: -d...@chromium.org -dsansome@chromium.org vadimsh@chromium.org iannucci@chromium.org no...@chromium.org tandrii@chromium.org
Labels: -Hotlist-Recharge-Cold
Status: Available (was: Untriaged)
+ People who may have opinions (I never used it)
Cc: ddoman@chromium.org packrat@chromium.org
Components: -Infra>Platform
I think this isn’t a Platform bug. It was in Platform component only because dnj was willing to fix it
(we may end up needing to take this back; tsmon needs some serious love, imo. The fact that it can lose a bunch of data on a routine basis from appengine without anyone knowing is pretty bad. Maybe switching to GKE would be the right thing to do, because then we can more predictably control spinup/shutdown of containers in the common case, as well as have background threads to push data 'in between' requests).
i believe monitoring team is capable of providing serious love (if not, we have bigger problems)
Re #c9: IIUC, the scope of this bug is mostly to refactor the code, and not to change the functionality.

For fixing the data loss, part of the reason is captured in issue 622401 (and I'd guess it's a fairly big part). There are also ideas floating around about introducing a custom Target for GAE apps, which may facilitate the above. I'm not sure if there is a bug for that yet.
Prod-Tech has an OKR for ts-mon love
: https://docs.google.com/document/d/1AUE4K_xNzDnSVIBz9lOScLX0gRVACFbpaRr7dfmSZA0

>> For fixing the data loss, part of the reason is captured in issue 622401 (and I'd guess it's a fairly big part).

Yes, I agree that it is very likely a big part of the cause for data lose. The other possible causes are also described in the following tickets
* crbug.com/829635 - ts_mon invalid data in Swarming causing monitoring packets to be lost
* crbug.com/869274 - MonarchErrorRateTooHigh and ProtoValidationErrorRateTooHigh

Cc: abennetts@chromium.org

Sign in to add a comment