New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 727785 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Last visit > 30 days ago
Closed: Dec 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 2
Type: Bug

Blocking:
issue 703184
issue 727963



Sign in to add a comment

memory-infra service should have a timeout fallback mechanism for robustness to give up if a child process is stuck

Project Member Reported by primiano@chromium.org, May 30 2017

Issue description

Background context: go/memory-infra

So far the memory-infra service relies on the fact that either:
 - a child process replies to the ClientProcess::RequestProcessMemoryDump
 - a child process dies and we get notified by the mojo connection error handler

This seems fragile. if there is either a bug in mojo or (more likely) one of the many child processes is stuck, we will keep queueing (and eventualy rejecting) global dump requests forever.

We never bothered before because the only use case was tracing, but not it feels like we should be more robust 
 
Blocking: 727963
Blocking: 703184
Owner: lalitm@chromium.org
Status: Assigned (was: Available)
Putting this in Lalit queue.
Lalit, don't worry about this right now, you have are bigger problems to solve.
But once we are done with that, we should fix this one.

Comment 4 by lalitm@chromium.org, Dec 13 2017

Status: Started (was: Assigned)
Project Member

Comment 5 by bugdroid1@chromium.org, Dec 19 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/e1ae19f1f2e5e63aac86c61928e284a4484b892f

commit e1ae19f1f2e5e63aac86c61928e284a4484b892f
Author: Lalit Maganti <lalitm@chromium.org>
Date: Tue Dec 19 11:57:44 2017

memory-infra: make coodinator service more resiliant to stuck processes

If a client process never responds, our service will simply keep queueing
requests forever which is both wastes memory and also can cause gaps in
data collection elsewhere.

Add a hard timeout where we clear all pending requests if not responded.
Provisionally set the timeout to 15 seconds with a route for tests to
reduce this to prevent blowups in test running times.

Bug:  727785 
Change-Id: Ifc36e0e09d08d6f59f8b774c5f71e7fd466a97eb
Reviewed-on: https://chromium-review.googlesource.com/824240
Commit-Queue: Lalit Maganti <lalitm@chromium.org>
Reviewed-by: Hector Dearman <hjd@chromium.org>
Cr-Commit-Position: refs/heads/master@{#525000}
[modify] https://crrev.com/e1ae19f1f2e5e63aac86c61928e284a4484b892f/services/resource_coordinator/memory_instrumentation/coordinator_impl.cc
[modify] https://crrev.com/e1ae19f1f2e5e63aac86c61928e284a4484b892f/services/resource_coordinator/memory_instrumentation/coordinator_impl.h
[modify] https://crrev.com/e1ae19f1f2e5e63aac86c61928e284a4484b892f/services/resource_coordinator/memory_instrumentation/coordinator_impl_unittest.cc

Comment 6 by lalitm@chromium.org, Dec 19 2017

Status: Fixed (was: Started)
This should be fixed with the above change!

Sign in to add a comment