Add heartbeat trace for CategorizedWorkerPool during waiting for more tasks. |
||
Issue descriptionWhen CategorizedWorkerPoolThread is waiting for more tasks, it looks like it is hung as it starts working only after the related condition variable is signaled. Adding heartbeat trace can help detect whether thread is live or not. Refer http://crbug.com/617010 for more details.
,
Jun 8 2016
Is the heartbeat necessary? In the referenced hang stack, the thread is simply idle. If this were a true hang, you'd expect to see a culprit further up the stack. What are we hoping to accomplish with the heartbeat?
,
Jun 8 2016
Would there come a callstack further up if the thread is hang? I may be wrong. To identify whether thread is really hung or idle and waiting for next signal, heartbeat trace would help.
,
Jun 8 2016
Generally yes. You wouldn't be waiting on a condition variable. Now, if someone was supposed to signal the condition variable but didn't, then you have a problem. However, the heartbeat wouldn't find who was supposed to signal the condition variable. All you would find out is that you timeout. You would also timeout if there was no work to be done. If a thread is waiting on a condition variable, the thread is effectively idle.
,
Jun 8 2016
To add heartbeat trace, there is a timeout added. Heartbeat should not generally tell who has signaled the condition variable, it is meant for showing with trace that thread is idle. If thread is waiting on a condition variable and heartbeat traces are not coming, then thread is hung. Ideally there should be a mechanism which will detect this kind of situation like we do for GPU process crash.
,
Jun 8 2016
The issue with a heartbeat within a thread is you can't distinguish between many tasks running or a hung task. All you know is if a thread is idle. If you're looking to truly monitor hung threads, consider taking a similar approach to https://cs.chromium.org/chromium/src/chrome/browser/metrics/thread_watcher.h ThreadWatcher runs on a watchdog thread and periodically posts a task to the targeted thread. If the thread doesn't respond within a certain time, then it's truly hung.
,
Jun 9 2016
Hmm. Then ThreadWatcher stands to be better here as it uses watchdog kinda mechanism. I'll skip the changes of CategorizedWorkerPool.* from cl https://codereview.chromium.org/2047433004/. Should we patch up rest of the code? (TimedWait returning true when timeout occurs or false when gets signaled.)
,
Jun 9 2016
On CV: Only if there is code that will stand to benefit from it. We've not needed it previously, so by default the current behavior is fine.
,
Jun 10 2016
We'll not fix this. We'll implement something like ThreadWatcher. |
||
►
Sign in to add a comment |
||
Comment 1 by prashan...@samsung.com
, Jun 8 2016Owner: prashan...@samsung.com