memory-infra service should have a timeout fallback mechanism for robustness to give up if a child process is stuck
Issue description

Background context: go/memory-infra

So far the memory-infra service relies on the fact that either:
- a child process replies to ClientProcess::RequestProcessMemoryDump, or
- a child process dies and we get notified by the mojo connection error handler.

This seems fragile. If there is either a bug in mojo or (more likely) one of the many child processes is stuck, we will keep queueing (and eventually rejecting) global dump requests forever. We never bothered before because the only use case was tracing, but now it feels like we should be more robust.
May 31 2017
Oct 5 2017
Putting this in Lalit's queue. Lalit, don't worry about this right now; you have bigger problems to solve. But once we are done with those, we should fix this one.
Dec 13 2017
Dec 19 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/e1ae19f1f2e5e63aac86c61928e284a4484b892f

commit e1ae19f1f2e5e63aac86c61928e284a4484b892f
Author: Lalit Maganti <lalitm@chromium.org>
Date: Tue Dec 19 11:57:44 2017

memory-infra: make coodinator service more resiliant to stuck processes

If a client process never responds, our service will simply keep queueing requests forever which is both wastes memory and also can cause gaps in data collection elsewhere. Add a hard timeout where we clear all pending requests if not responded. Provisionally set the timeout to 15 seconds with a route for tests to reduce this to prevent blowups in test running times.

Bug: 727785
Change-Id: Ifc36e0e09d08d6f59f8b774c5f71e7fd466a97eb
Reviewed-on: https://chromium-review.googlesource.com/824240
Commit-Queue: Lalit Maganti <lalitm@chromium.org>
Reviewed-by: Hector Dearman <hjd@chromium.org>
Cr-Commit-Position: refs/heads/master@{#525000}

[modify] https://crrev.com/e1ae19f1f2e5e63aac86c61928e284a4484b892f/services/resource_coordinator/memory_instrumentation/coordinator_impl.cc
[modify] https://crrev.com/e1ae19f1f2e5e63aac86c61928e284a4484b892f/services/resource_coordinator/memory_instrumentation/coordinator_impl.h
[modify] https://crrev.com/e1ae19f1f2e5e63aac86c61928e284a4484b892f/services/resource_coordinator/memory_instrumentation/coordinator_impl_unittest.cc
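For readers who want to see the shape of the fix, here is a minimal, language-neutral sketch in Python of the behaviour the commit message describes: pending dump requests are cleared once a hard timeout passes without every client replying. It is not the Chromium implementation. The `Coordinator` and `PendingDump` classes, the method names (including `set_timeout_for_testing`), and the injected `clock` are all hypothetical; only the 15-second provisional value and the "clear all pending requests" behaviour come from the commit message. As a simplification, each request is timed from the moment it is queued.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Set

DEFAULT_TIMEOUT_S = 15.0  # provisional value named in the commit message


@dataclass
class PendingDump:
    """One queued global dump request (hypothetical structure)."""
    dump_id: int
    started_at: float
    waiting_on: Set[str] = field(default_factory=set)  # clients yet to reply


class Coordinator:
    """Toy coordinator that gives up on requests whose clients are stuck."""

    def __init__(self, clients, clock, timeout_s=DEFAULT_TIMEOUT_S):
        self._clients = set(clients)
        self._clock = clock  # injected so tests do not have to sleep
        self._timeout_s = timeout_s
        self._queue = deque()
        self._next_id = 0
        self.results = {}  # dump_id -> "ok" or "timed_out"

    def set_timeout_for_testing(self, timeout_s):
        # Hypothetical hook: lets tests shorten the timeout.
        self._timeout_s = timeout_s

    def request_global_dump(self):
        dump = PendingDump(self._next_id, self._clock(), set(self._clients))
        self._next_id += 1
        self._queue.append(dump)
        return dump.dump_id

    def on_client_reply(self, dump_id, client):
        for dump in self._queue:
            if dump.dump_id == dump_id:
                dump.waiting_on.discard(client)
                if not dump.waiting_on:
                    # Every client replied: the request completes normally.
                    self._queue.remove(dump)
                    self.results[dump_id] = "ok"
                return  # stop iterating right after any mutation

    def check_timeouts(self):
        # Once the oldest request is overdue, clear all pending requests.
        if not self._queue:
            return
        if self._clock() - self._queue[0].started_at >= self._timeout_s:
            while self._queue:
                self.results[self._queue.popleft().dump_id] = "timed_out"

    def pending_count(self):
        return len(self._queue)
```

In this sketch a caller would invoke `check_timeouts()` periodically or from a timer callback. Injecting the clock plays the same role as the commit's "route for tests to reduce this": tests can exercise the timeout path without waiting 15 real seconds.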
Dec 19 2017
This should be fixed with the above change!
Comment 1 by primiano@chromium.org, May 31 2017