Issue 670022

Issue metadata

Status: Untriaged
Owner: ----
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux
Pri: 3
Type: Feature



ts_mon: Add "--ts-mon-fork-flush" option to auto flush metrics from a separate process.

Project Member Reported by pprabhu@chromium.org, Nov 30 2016

Issue description

When a process that uses ts_mon with "--ts-mon-flush auto" forks, the ts_mon flushing thread is duplicated in the subprocesses.

This creates a few problems:
- The subprocesses must be careful to flush their metrics before dying. The main process may sometimes terminate them uncleanly, but metrics that were already enqueued should still be flushed.
- Metrics outstanding in the store when the subprocess is forked may be sent twice, once by the parent's _FlushThread and once by the child's.

For context, see: https://bugs.chromium.org/p/chromium/issues/detail?id=623293#c15

This bug proposes upstreaming into ts_mon the feature added in chromite for flushing metrics from a separate process. In short:
- create a subprocess instead of a thread for flushing metrics;
- use a multiprocessing queue to send metrics across to the flushing process.
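
To make the proposal concrete, here is a minimal sketch of the two points above. It is not the chromite or ts_mon code: the function names, the (step name, value) tuple format, and the second queue that stands in for the real network flush are all hypothetical. It assumes Linux and the "fork" start method, and it uses multiprocessing.SimpleQueue because its put() writes to the pipe synchronously, so a metric survives even if the producer dies right afterwards.

```python
import multiprocessing
import os

# Assumption: "fork" start method, matching the Linux forking scenario here.
_ctx = multiprocessing.get_context("fork")


def _flush_loop(metric_queue, sent_queue):
    """Body of the dedicated flushing process (hypothetical).

    Stands in for the real flush: every metric received is forwarded to
    sent_queue so the caller can observe what would have been sent.
    """
    while True:
        item = metric_queue.get()
        if item is None:  # sentinel: no more metrics
            break
        sent_queue.put(item)


def _build_step(metric_queue, step_name):
    """A forked worker: enqueue one metric, then die without any cleanup."""
    # SimpleQueue.put() has written to the pipe by the time it returns.
    metric_queue.put((step_name, 1))
    os._exit(0)  # simulate an unclean exit: no atexit handlers, no flush


def flush_from_separate_process(step_names):
    """Run one worker per step and return the metrics seen by the flusher."""
    metric_queue = _ctx.SimpleQueue()
    sent_queue = _ctx.SimpleQueue()

    # Only this process ever flushes, so forked workers cannot send duplicates.
    flusher = _ctx.Process(target=_flush_loop, args=(metric_queue, sent_queue))
    flusher.start()

    workers = [
        _ctx.Process(target=_build_step, args=(metric_queue, name))
        for name in step_names
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()

    metric_queue.put(None)  # tell the flusher to stop
    sent = [sent_queue.get() for _ in step_names]  # one metric per worker
    flusher.join()
    return sorted(sent)
```

Calling flush_from_separate_process(["compile", "test"]) returns one entry per worker even though each worker exits abruptly. The caveat raised later in this thread still applies: every metric is pickled and copied through a pipe, which costs more than an in-process flush thread.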

 
Status: Available (was: Untriaged)
I see this as a possible nice-to-have, but a rather esoteric use case. Sending metric data across processes may consume a lot more resources, and ts_mon is supposed to be fairly lightweight so that it can be used in highly constrained environments like AppEngine. I'm therefore keeping it at Pri-3.

Normally, when ts_mon is used in multiple processes, we expect each process to send its own stream independently, with the streams differentiated by some field, e.g. task_num in the Task target. Aggregation can then happen on the backend.
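
As a rough illustration of that model (not the ts_mon API: the dict layout, the function names, and the metric name are made up for this sketch), each process tags its points with its own task_num and the backend combines the streams afterwards:

```python
from collections import defaultdict


def make_point(name, value, task_num):
    """One data point as a single process would send it (hypothetical layout)."""
    return {"name": name, "value": value, "fields": {"task_num": task_num}}


def aggregate(points):
    """Backend-side aggregation: sum values per metric name across all streams."""
    totals = defaultdict(int)
    for point in points:
        totals[point["name"]] += point["value"]
    return dict(totals)


# Two processes report the same metric; task_num keeps the streams distinct.
points = [
    make_point("build/steps_run", 3, task_num=0),
    make_point("build/steps_run", 4, task_num=1),
]
```

Here aggregate(points) yields a single total for "build/steps_run", while the individual points remain distinguishable by task_num.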

If metrics are by nature "global" (not logically belonging to any particular process), custom solutions can be built on top of raw ts_mon. E.g. gae_ts_mon for AppEngine registers special "global" metrics and computes them in a cron job - but it still keeps them local to each instance (uniqueness is guaranteed in other ways).

Fair. I just wanted to see whether upstreaming our addition makes sense.

For a bit of context: the subprocesses here are not long-lived independent processes. Rather, they are short- to medium-lived processes forked for "build steps" that run concurrently. So:
- they are logically in the same metrics namespace as the overall build process;
- this group of processes doesn't really use a global metric namespace, since the namespace is still tied to the "build process".

Feel free to WontFix too.

Project Member Comment 3 by sheriffbot@chromium.org, Dec 11 2017

Labels: Hotlist-Recharge-Cold
Status: Untriaged (was: Available)
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue.

Sorry for the inconvenience if the bug really should have been left as Available. If you change it back, also remove the "Hotlist-Recharge-Cold" label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot