New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 660248 link

Starred by 1 user

Issue metadata

Status: Archived
Owner:
Last visit > 30 days ago
Closed: Nov 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Sign in to add a comment

Add monarch metrics for repo/git tool in chromite

Project Member Reported by nxia@chromium.org, Oct 28 2016

Issue description

For the repo/git tool contacting the remote repositories(GOB), add monarch metrics around the retries and failures.

This is to help investigate crbug.com/643452

 
Project Member

Comment 1 by bugdroid1@chromium.org, Nov 3 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/b3e08511493be3d12382e59d147b453f18bcba21

commit b3e08511493be3d12382e59d147b453f18bcba21
Author: Ningning Xia <nxia@chromium.org>
Date: Fri Oct 28 21:23:51 2016

Add Monarch metrics on repo and git.

1.add metrics on repo sync.
2.add metrics on git fetch.
3.fix local_manifest in repo retry.

BUG= chromium:660248 
TEST=run_tests

Change-Id: I72f46aa1826d697b972ecb2e5d93b916c15ec99d
Reviewed-on: https://chromium-review.googlesource.com/404911
Commit-Ready: Ningning Xia <nxia@chromium.org>
Tested-by: Ningning Xia <nxia@chromium.org>
Reviewed-by: Paul Hobbs <phobbs@google.com>

[modify] https://crrev.com/b3e08511493be3d12382e59d147b453f18bcba21/cbuildbot/repository.py
[modify] https://crrev.com/b3e08511493be3d12382e59d147b453f18bcba21/lib/constants.py
[modify] https://crrev.com/b3e08511493be3d12382e59d147b453f18bcba21/lib/patch.py
[modify] https://crrev.com/b3e08511493be3d12382e59d147b453f18bcba21/lib/retry_util.py

Comment 2 by nxia@chromium.org, Nov 4 2016

Status: Fixed (was: Untriaged)
Labels: -Pri-3 Pri-2
Status: Assigned (was: Fixed)
Follow-up: can you point out some monarch queries that use the new data? Which ones are likely to show us when we are having GoB issues?

Comment 4 by nxia@chromium.org, Nov 9 2016

sync retry records count on a different label (it's handled in _CleanUpAndRunCommand). 
fetch retry doesn't record count on a different label, should we change the way retry_util records count to distinguish the first try and the other retries?
Project Member

Comment 5 by bugdroid1@chromium.org, Nov 16 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/chromite/+/0337e23e7805df6dd423114513f15852cc4d2a11

commit 0337e23e7805df6dd423114513f15852cc4d2a11
Author: Ningning Xia <nxia@chromium.org>
Date: Tue Nov 15 19:45:29 2016

Record monarch metrics for git fetch retries.

Recording a seperate monarch name for git fetch retries can give us
better numbers on the git remote operation performances.

BUG= chromium:660248 
TEST=run_tests

Change-Id: If07cd5fb884372586fb3eb794fc7ddb611c35958
Reviewed-on: https://chromium-review.googlesource.com/411476
Commit-Ready: Ningning Xia <nxia@chromium.org>
Tested-by: Ningning Xia <nxia@chromium.org>
Reviewed-by: Aviv Keshet <akeshet@chromium.org>

[modify] https://crrev.com/0337e23e7805df6dd423114513f15852cc4d2a11/lib/constants.py
[modify] https://crrev.com/0337e23e7805df6dd423114513f15852cc4d2a11/lib/patch.py
[modify] https://crrev.com/0337e23e7805df6dd423114513f15852cc4d2a11/lib/retry_util.py

Comment 6 by nxia@chromium.org, Nov 18 2016


Repo sync (excludes retry):
MON_REPO_SYNC_COUNT = 'chromeos/cbuildbot/repo/sync_count'

Repo sync retry:
MON_REPO_SYNC_RETRY_COUNT = 'chromeos/cbuildbot/repo/sync_retry_count'

git fetch (includes retry):
MON_GIT_FETCH_COUNT = 'chromeos/cbuildbot/git/fetch_count'

git fetch retry:
MON_GIT_FETCH_RETRY_COUNT = 'chromeos/cbuildbot/git/fetch_retry_count'

Comment 7 by nxia@chromium.org, Nov 18 2016

Status: Fixed (was: Assigned)
#6 is about the monarch labels to search.
Project Member

Comment 8 by bugdroid1@chromium.org, Dec 10 2016

Labels: merge-merged-chromeos-3.10
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/edfc0de027e03f53e8a6c5177c9b2b6eddc93ce4

commit edfc0de027e03f53e8a6c5177c9b2b6eddc93ce4
Author: Oleg Nesterov <oleg@redhat.com>
Date: Wed Apr 02 15:45:05 2014

UPSTREAM: pid_namespace: pidns_get() should check task_active_pid_ns() != NULL

pidns_get()->get_pid_ns() can hit ns == NULL. This task_struct can't
go away, but task_active_pid_ns(task) is NULL if release_task(task)
was already called. Alternatively we could change get_pid_ns(ns) to
check ns != NULL, but it seems that other callers are fine.

BUG= chromium:660248 
TEST=Build and run

Change-Id: I282b679b2b4b30d4279295a94d561dea6d970238
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Eric W. Biederman ebiederm@xmission.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Guenter Roeck <groeck@chromium.org>
(cherry picked from commit d23082257d83e)
Reviewed-on: https://chromium-review.googlesource.com/418084
Reviewed-by: Kevin Cernekee <cernekee@chromium.org>

[modify] https://crrev.com/edfc0de027e03f53e8a6c5177c9b2b6eddc93ce4/kernel/pid_namespace.c

Project Member

Comment 9 by bugdroid1@chromium.org, Dec 10 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/89a351662f8ee8bed72e5bde3afb984f0511ac53

commit 89a351662f8ee8bed72e5bde3afb984f0511ac53
Author: Oleg Nesterov <oleg@redhat.com>
Date: Wed Dec 10 23:55:25 2014

UPSTREAM: exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting

alloc_pid() does get_pid_ns() beforehand but forgets to put_pid_ns() if it
fails because disable_pid_allocation() was called by the exiting
child_reaper.

We could simply move get_pid_ns() down to successful return, but this fix
tries to be as trivial as possible.

BUG= chromium:660248 
TEST=Build and run

Change-Id: If7a21f2b5132d7b258cb42cbcba4c4c713219689
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
Cc: Sterling Alexander <stalexan@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Guenter Roeck <groeck@chromium.org>
(cherry picked from commit 24c037ebf5723)
Reviewed-on: https://chromium-review.googlesource.com/418086
Reviewed-by: Kevin Cernekee <cernekee@chromium.org>

[modify] https://crrev.com/89a351662f8ee8bed72e5bde3afb984f0511ac53/kernel/pid.c

Project Member

Comment 10 by bugdroid1@chromium.org, Dec 10 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/e8ed738cfacbda9d6898cca3363b0c3971cdb452

commit e8ed738cfacbda9d6898cca3363b0c3971cdb452
Author: Oleg Nesterov <oleg@redhat.com>
Date: Mon Apr 07 22:38:41 2014

UPSTREAM: wait: fix reparent_leader() vs EXIT_DEAD->EXIT_ZOMBIE race

wait_task_zombie() first does EXIT_ZOMBIE->EXIT_DEAD transition and
drops tasklist_lock.  If this task is not the natural child and it is
traced, we change its state back to EXIT_ZOMBIE for ->real_parent.

The last transition is racy, this is even documented in 50b8d257486a
"ptrace: partially fix the do_wait(WEXITED) vs EXIT_DEAD->EXIT_ZOMBIE
race".  wait_consider_task() tries to detect this transition and clear
->notask_error but we can't rely on ptrace_reparented(), debugger can
exit and do ptrace_unlink() before its sub-thread sets EXIT_ZOMBIE.

And there is another problem which were missed before: this transition
can also race with reparent_leader() which doesn't reset >exit_signal if
EXIT_DEAD, assuming that this task must be reaped by someone else.  So
the tracee can be re-parented with ->exit_signal != SIGCHLD, and if
/sbin/init doesn't use __WALL it becomes unreapable.

Change reparent_leader() to update ->exit_signal even if EXIT_DEAD.
Note: this is the simple temporary hack for -stable, it doesn't try to
solve all problems, it will be reverted by the next changes.

BUG= chromium:660248 
TEST=Build and run

Change-Id: I7e53823c1a73024a4bf13900cebdc58999f1606a
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Jan Kratochvil <jan.kratochvil@redhat.com>
Reported-by: Michal Schmidt <mschmidt@redhat.com>
Tested-by: Michal Schmidt <mschmidt@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Lennart Poettering <lpoetter@redhat.com>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Guenter Roeck <groeck@chromium.org>
(cherry picked from commit dfccbb5e49a6)
Reviewed-on: https://chromium-review.googlesource.com/418085
Reviewed-by: Kevin Cernekee <cernekee@chromium.org>

[modify] https://crrev.com/e8ed738cfacbda9d6898cca3363b0c3971cdb452/kernel/exit.c

Project Member

Comment 11 by bugdroid1@chromium.org, Dec 10 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/7c54838c3fe8ce350843ce4d2490f011e3111109

commit 7c54838c3fe8ce350843ce4d2490f011e3111109
Author: Oleg Nesterov <oleg@redhat.com>
Date: Wed Jul 03 22:08:31 2013

UPSTREAM: kernel/fork.c:copy_process(): don't add the uninitialized child to thread/task/pid lists

copy_process() adds the new child to thread_group/init_task.tasks list and
then does attach_pid(child, PIDTYPE_PID).  This means that the lockless
next_thread() or next_task() can see this thread with the wrong pid.  Say,
"ls /proc/pid/task" can list the same inode twice.

We could move attach_pid(child, PIDTYPE_PID) up, but in this case
find_task_by_vpid() can find the new thread before it was fully
initialized.

And this is already true for PIDTYPE_PGID/PIDTYPE_SID, With this patch
copy_process() initializes child->pids[*].pid first, then calls
attach_pid() to insert the task into the pid->tasks list.

attach_pid() no longer need the "struct pid*" argument, it is always
called after pid_link->pid was already set.

BUG= chromium:660248 
TEST=Build and run

Change-Id: Iec3ec38679a4d89d3acffefc55ff2169b91d944d
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Sergey Dyasly <dserrg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Guenter Roeck <groeck@chromium.org>
(cherry picked from commit 8190773985141)
Reviewed-on: https://chromium-review.googlesource.com/418088
Reviewed-by: Kevin Cernekee <cernekee@chromium.org>

[modify] https://crrev.com/7c54838c3fe8ce350843ce4d2490f011e3111109/kernel/pid.c
[modify] https://crrev.com/7c54838c3fe8ce350843ce4d2490f011e3111109/kernel/fork.c
[modify] https://crrev.com/7c54838c3fe8ce350843ce4d2490f011e3111109/include/linux/pid.h

Project Member

Comment 12 by bugdroid1@chromium.org, Dec 10 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/4e34e2497ba9eb96ec3bea8b2aa52cf0cf4f0707

commit 4e34e2497ba9eb96ec3bea8b2aa52cf0cf4f0707
Author: Oleg Nesterov <oleg@redhat.com>
Date: Mon Sep 30 20:45:27 2013

UPSTREAM: pidns: fix free_pid() to handle the first fork failure

"case 0" in free_pid() assumes that disable_pid_allocation() should
clear PIDNS_HASH_ADDING before the last pid goes away.

However this doesn't happen if the first fork() fails to create the
child reaper which should call disable_pid_allocation().

BUG= chromium:660248 
TEST=Build and run

Change-Id: Ibf7caa51e3e6a9c3312bce5a765dfe7f1353b005
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Guenter Roeck <groeck@chromium.org>
(cherry picked from commit 314a8ad0f18ac37)
Reviewed-on: https://chromium-review.googlesource.com/418087
Reviewed-by: Kevin Cernekee <cernekee@chromium.org>

[modify] https://crrev.com/4e34e2497ba9eb96ec3bea8b2aa52cf0cf4f0707/kernel/pid.c

Project Member

Comment 13 by bugdroid1@chromium.org, Dec 10 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/356a5de792d6da3da036deded8b7b22f7d2ec40f

commit 356a5de792d6da3da036deded8b7b22f7d2ec40f
Author: Al Viro <viro@zeniv.linux.org.uk>
Date: Thu Oct 03 17:28:06 2013

UPSTREAM: pid_namespace: make freeing struct pid_namespace rcu-delayed

makes procfs ->premission() instances safety in RCU mode independent
from vfsmount_lock.

BUG= chromium:660248 
TEST=Build and run

Change-Id: I591f7df90b3530c4b1ff3cf7eb660aae208d0d84
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Guenter Roeck <groeck@chromium.org>
(cherry picked from commit 1adfcb03e31b)
Reviewed-on: https://chromium-review.googlesource.com/418089
Reviewed-by: Kevin Cernekee <cernekee@chromium.org>

[modify] https://crrev.com/356a5de792d6da3da036deded8b7b22f7d2ec40f/include/linux/pid_namespace.h
[modify] https://crrev.com/356a5de792d6da3da036deded8b7b22f7d2ec40f/kernel/pid_namespace.c

Project Member

Comment 14 by bugdroid1@chromium.org, Dec 10 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/2b9e051698c6a40008b2a02db277c2e53f3bcc59

commit 2b9e051698c6a40008b2a02db277c2e53f3bcc59
Author: Guenter Roeck <groeck@chromium.org>
Date: Thu Dec 08 22:29:28 2016

CHROMIUM: pid_namespace: print hung tasks after 10 seconds

Crash reporter report a large number of of crashes due to hung tasks
in zap_pid_ns_processes(). Most of those are reported in the 3.8 and
3.10 kernels. A typical traceback looks as follows.

Backtrace:
[<c063f4b8>] (__schedule+0x4b0/0x740) from [<c063f7d8>] (schedule+0x90/0x94)
[<c063f7d8>] (schedule+0x90/0x94) from [<c018720c>] (zap_pid_ns_processes+0x11c/0x148)
[<c018720c>] (zap_pid_ns_processes+0x11c/0x148) from [<c0127454>] (do_exit+0x454/0x8ac)
[<c0127454>] (do_exit+0x454/0x8ac) from [<c0128810>] (do_group_exit+0x5c/0xc0)
[<c0128810>] (do_group_exit+0x5c/0xc0) from [<c0128894>] (__wake_up_parent+0x0/0x30)
[<c0128894>] (__wake_up_parent+0x0/0x30) from [<c01061cc>] (__sys_trace_return+0x0/0x14)
Kernel panic - not syncing: hung_task: blocked tasks

To help tracking down the problem, dump all threads associated with the
pid namespace to the console if tasks are still active 10 seconds after
the request to kill them.

Also, display a warning if the number of threads in the PID namespace is
smaller than expected, since this would result in the observed hang.

BUG= chromium:660248 
TEST=Build and run; ensure that no messages are seen during normal operation

Change-Id: Ifb7b570627d485da1dc1141d8249f922bfe35af6
Signed-off-by: Guenter Roeck <groeck@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/418090
Reviewed-by: Kevin Cernekee <cernekee@chromium.org>

[modify] https://crrev.com/2b9e051698c6a40008b2a02db277c2e53f3bcc59/kernel/pid_namespace.c

Project Member

Comment 15 by bugdroid1@chromium.org, Dec 10 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/6deaa38389568266701a7d8464ad30bf6e909067

commit 6deaa38389568266701a7d8464ad30bf6e909067
Author: Oleg Nesterov <oleg@redhat.com>
Date: Wed Jul 03 22:08:25 2013

UPSTREAM: fs/exec.c:de_thread(): use change_pid() rather than detach_pid/attach_pid

de_thread() can use change_pid() instead of detach + attach.  This looks
better and this ensures that, say, next_thread() can never see a task with
->pid == NULL.

BUG= chromium:660248 
TEST=Build and run

Change-Id: I36c02d06196ac264226ddb406f0323eef4fa8ab3
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Sergey Dyasly <dserrg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Guenter Roeck <groeck@chromium.org>
(cherry picked from commit 3f4185483832c)
Reviewed-on: https://chromium-review.googlesource.com/418689
Reviewed-by: Kevin Cernekee <cernekee@chromium.org>

[modify] https://crrev.com/6deaa38389568266701a7d8464ad30bf6e909067/fs/exec.c

Project Member

Comment 16 by bugdroid1@chromium.org, Dec 10 2016

Labels: merge-merged-chromeos-3.8
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/45f9cecb0cefe7471798429a85a83472b61faeba

commit 45f9cecb0cefe7471798429a85a83472b61faeba
Author: Oleg Nesterov <oleg@redhat.com>
Date: Wed Apr 02 15:45:05 2014

UPSTREAM: pid_namespace: pidns_get() should check task_active_pid_ns() != NULL

pidns_get()->get_pid_ns() can hit ns == NULL. This task_struct can't
go away, but task_active_pid_ns(task) is NULL if release_task(task)
was already called. Alternatively we could change get_pid_ns(ns) to
check ns != NULL, but it seems that other callers are fine.

BUG= chromium:660248 
TEST=Build and run

Change-Id: I282b679b2b4b30d4279295a94d561dea6d970238
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Eric W. Biederman ebiederm@xmission.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Guenter Roeck <groeck@chromium.org>
(cherry picked from commit d23082257d83e)
Reviewed-on: https://chromium-review.googlesource.com/418251
Reviewed-by: Kevin Cernekee <cernekee@chromium.org>

[modify] https://crrev.com/45f9cecb0cefe7471798429a85a83472b61faeba/kernel/pid_namespace.c

Project Member

Comment 17 by bugdroid1@chromium.org, Dec 10 2016

Labels: merge-merged-chromeos-3.8
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/f1d56bd9a2c5e2101c32eca7f7b8684afa909dff

commit f1d56bd9a2c5e2101c32eca7f7b8684afa909dff
Author: Oleg Nesterov <oleg@redhat.com>
Date: Mon Apr 07 22:38:41 2014

UPSTREAM: wait: fix reparent_leader() vs EXIT_DEAD->EXIT_ZOMBIE race

wait_task_zombie() first does EXIT_ZOMBIE->EXIT_DEAD transition and
drops tasklist_lock.  If this task is not the natural child and it is
traced, we change its state back to EXIT_ZOMBIE for ->real_parent.

The last transition is racy, this is even documented in 50b8d257486a
"ptrace: partially fix the do_wait(WEXITED) vs EXIT_DEAD->EXIT_ZOMBIE
race".  wait_consider_task() tries to detect this transition and clear
->notask_error but we can't rely on ptrace_reparented(), debugger can
exit and do ptrace_unlink() before its sub-thread sets EXIT_ZOMBIE.

And there is another problem which were missed before: this transition
can also race with reparent_leader() which doesn't reset >exit_signal if
EXIT_DEAD, assuming that this task must be reaped by someone else.  So
the tracee can be re-parented with ->exit_signal != SIGCHLD, and if
/sbin/init doesn't use __WALL it becomes unreapable.

Change reparent_leader() to update ->exit_signal even if EXIT_DEAD.
Note: this is the simple temporary hack for -stable, it doesn't try to
solve all problems, it will be reverted by the next changes.

BUG= chromium:660248 
TEST=Build and run

Change-Id: I7e53823c1a73024a4bf13900cebdc58999f1606a
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Jan Kratochvil <jan.kratochvil@redhat.com>
Reported-by: Michal Schmidt <mschmidt@redhat.com>
Tested-by: Michal Schmidt <mschmidt@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Lennart Poettering <lpoetter@redhat.com>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Guenter Roeck <groeck@chromium.org>
(cherry picked from commit dfccbb5e49a6)
Reviewed-on: https://chromium-review.googlesource.com/418252
Reviewed-by: Kevin Cernekee <cernekee@chromium.org>

[modify] https://crrev.com/f1d56bd9a2c5e2101c32eca7f7b8684afa909dff/kernel/exit.c

Project Member

Comment 18 by bugdroid1@chromium.org, Dec 10 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/7ec4cfd463e77427474b36db7ab67a335724d718

commit 7ec4cfd463e77427474b36db7ab67a335724d718
Author: Oleg Nesterov <oleg@redhat.com>
Date: Wed Dec 10 23:55:25 2014

UPSTREAM: exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting

alloc_pid() does get_pid_ns() beforehand but forgets to put_pid_ns() if it
fails because disable_pid_allocation() was called by the exiting
child_reaper.

We could simply move get_pid_ns() down to successful return, but this fix
tries to be as trivial as possible.

BUG= chromium:660248 
TEST=Build and run

Change-Id: If7a21f2b5132d7b258cb42cbcba4c4c713219689
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Aaron Tomlin <atomlin@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Serge Hallyn <serge.hallyn@ubuntu.com>
Cc: Sterling Alexander <stalexan@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Guenter Roeck <groeck@chromium.org>
(cherry picked from commit 24c037ebf5723)
Reviewed-on: https://chromium-review.googlesource.com/418253
Reviewed-by: Kevin Cernekee <cernekee@chromium.org>

[modify] https://crrev.com/7ec4cfd463e77427474b36db7ab67a335724d718/kernel/pid.c

Project Member

Comment 19 by bugdroid1@chromium.org, Dec 10 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/fd0cf1809f27d2e88d22d6e68f0ad3240d9d1664

commit fd0cf1809f27d2e88d22d6e68f0ad3240d9d1664
Author: Oleg Nesterov <oleg@redhat.com>
Date: Mon Sep 30 20:45:27 2013

UPSTREAM: pidns: fix free_pid() to handle the first fork failure

"case 0" in free_pid() assumes that disable_pid_allocation() should
clear PIDNS_HASH_ADDING before the last pid goes away.

However this doesn't happen if the first fork() fails to create the
child reaper which should call disable_pid_allocation().

BUG= chromium:660248 
TEST=Build and run

Change-Id: Ibf7caa51e3e6a9c3312bce5a765dfe7f1353b005
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Guenter Roeck <groeck@chromium.org>
(cherry picked from commit 314a8ad0f18ac37)
Reviewed-on: https://chromium-review.googlesource.com/418254
Reviewed-by: Kevin Cernekee <cernekee@chromium.org>

[modify] https://crrev.com/fd0cf1809f27d2e88d22d6e68f0ad3240d9d1664/kernel/pid.c

Project Member

Comment 20 by bugdroid1@chromium.org, Dec 10 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/45cf3d4776d4acbf463c3c1bedf152857d6a2407

commit 45cf3d4776d4acbf463c3c1bedf152857d6a2407
Author: Oleg Nesterov <oleg@redhat.com>
Date: Wed Jul 03 22:08:31 2013

UPSTREAM: kernel/fork.c:copy_process(): don't add the uninitialized child to thread/task/pid lists

copy_process() adds the new child to thread_group/init_task.tasks list and
then does attach_pid(child, PIDTYPE_PID).  This means that the lockless
next_thread() or next_task() can see this thread with the wrong pid.  Say,
"ls /proc/pid/task" can list the same inode twice.

We could move attach_pid(child, PIDTYPE_PID) up, but in this case
find_task_by_vpid() can find the new thread before it was fully
initialized.

And this is already true for PIDTYPE_PGID/PIDTYPE_SID, With this patch
copy_process() initializes child->pids[*].pid first, then calls
attach_pid() to insert the task into the pid->tasks list.

attach_pid() no longer need the "struct pid*" argument, it is always
called after pid_link->pid was already set.

BUG= chromium:660248 
TEST=Build and run

Change-Id: Iec3ec38679a4d89d3acffefc55ff2169b91d944d
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Sergey Dyasly <dserrg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Guenter Roeck <groeck@chromium.org>
(cherry picked from commit 8190773985141)
Reviewed-on: https://chromium-review.googlesource.com/418255
Reviewed-by: Kevin Cernekee <cernekee@chromium.org>

[modify] https://crrev.com/45cf3d4776d4acbf463c3c1bedf152857d6a2407/kernel/pid.c
[modify] https://crrev.com/45cf3d4776d4acbf463c3c1bedf152857d6a2407/kernel/fork.c
[modify] https://crrev.com/45cf3d4776d4acbf463c3c1bedf152857d6a2407/include/linux/pid.h

Project Member

Comment 21 by bugdroid1@chromium.org, Dec 10 2016

Labels: merge-merged-chromeos-3.8
The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/7d453ea59f779a7d93fd04a44fdacce5f299a9c0

commit 7d453ea59f779a7d93fd04a44fdacce5f299a9c0
Author: Al Viro <viro@zeniv.linux.org.uk>
Date: Thu Oct 03 17:28:06 2013

UPSTREAM: pid_namespace: make freeing struct pid_namespace rcu-delayed

makes procfs ->premission() instances safety in RCU mode independent
from vfsmount_lock.

BUG= chromium:660248 
TEST=Build and run

Change-Id: I591f7df90b3530c4b1ff3cf7eb660aae208d0d84
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Guenter Roeck <groeck@chromium.org>
(cherry picked from commit 1adfcb03e31b)
Reviewed-on: https://chromium-review.googlesource.com/418256
Reviewed-by: Kevin Cernekee <cernekee@chromium.org>

[modify] https://crrev.com/7d453ea59f779a7d93fd04a44fdacce5f299a9c0/include/linux/pid_namespace.h
[modify] https://crrev.com/7d453ea59f779a7d93fd04a44fdacce5f299a9c0/kernel/pid_namespace.c

Project Member

Comment 22 by bugdroid1@chromium.org, Dec 10 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/b202bd7e85d35b4a84adcd3606d3b74c946d51f4

commit b202bd7e85d35b4a84adcd3606d3b74c946d51f4
Author: Guenter Roeck <groeck@chromium.org>
Date: Thu Dec 08 22:29:28 2016

CHROMIUM: pid_namespace: print hung tasks after 10 seconds

Crash reporter report a large number of of crashes due to hung tasks
in zap_pid_ns_processes(). Most of those are reported in the 3.8 and
3.10 kernels. A typical traceback looks as follows.

Backtrace:
[<c063f4b8>] (__schedule+0x4b0/0x740) from [<c063f7d8>] (schedule+0x90/0x94)
[<c063f7d8>] (schedule+0x90/0x94) from [<c018720c>] (zap_pid_ns_processes+0x11c/0x148)
[<c018720c>] (zap_pid_ns_processes+0x11c/0x148) from [<c0127454>] (do_exit+0x454/0x8ac)
[<c0127454>] (do_exit+0x454/0x8ac) from [<c0128810>] (do_group_exit+0x5c/0xc0)
[<c0128810>] (do_group_exit+0x5c/0xc0) from [<c0128894>] (__wake_up_parent+0x0/0x30)
[<c0128894>] (__wake_up_parent+0x0/0x30) from [<c01061cc>] (__sys_trace_return+0x0/0x14)
Kernel panic - not syncing: hung_task: blocked tasks

To help tracking down the problem, dump all threads associated with the
pid namespace to the console if tasks are still active 10 seconds after
the request to kill them.

Also, display a warning if the number of threads in the PID namespace is
smaller than expected, since this would result in the observed hang.

BUG= chromium:660248 
TEST=Build and run; ensure that no messages are seen during normal operation

Change-Id: Ifb7b570627d485da1dc1141d8249f922bfe35af6
Signed-off-by: Guenter Roeck <groeck@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/418257
Reviewed-by: Kevin Cernekee <cernekee@chromium.org>

[modify] https://crrev.com/b202bd7e85d35b4a84adcd3606d3b74c946d51f4/kernel/pid_namespace.c

Project Member

Comment 23 by bugdroid1@chromium.org, Dec 10 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/604df9d41b658fbc1e1143e4c266e4ac4b79526d

commit 604df9d41b658fbc1e1143e4c266e4ac4b79526d
Author: Oleg Nesterov <oleg@redhat.com>
Date: Wed Jul 03 22:08:25 2013

UPSTREAM: fs/exec.c:de_thread(): use change_pid() rather than detach_pid/attach_pid

de_thread() can use change_pid() instead of detach + attach.  This looks
better and this ensures that, say, next_thread() can never see a task with
->pid == NULL.

BUG= chromium:660248 
TEST=Build and run

Change-Id: I36c02d06196ac264226ddb406f0323eef4fa8ab3
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Sergey Dyasly <dserrg@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Guenter Roeck <groeck@chromium.org>
(cherry picked from commit 3f4185483832c)
Reviewed-on: https://chromium-review.googlesource.com/418501
Reviewed-by: Kevin Cernekee <cernekee@chromium.org>

[modify] https://crrev.com/604df9d41b658fbc1e1143e4c266e4ac4b79526d/fs/exec.c

Comment 24 by dchan@google.com, Mar 4 2017

Labels: VerifyIn-58

Comment 25 by dchan@google.com, Apr 17 2017

Labels: VerifyIn-59
Project Member

Comment 26 by bugdroid1@chromium.org, May 13 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/kernel/+/f799a51d982e3853dbd13cc16a3c38aa0f25bfc1

commit f799a51d982e3853dbd13cc16a3c38aa0f25bfc1
Author: Guenter Roeck <groeck@chromium.org>
Date: Sat May 13 01:46:15 2017

Revert "CHROMIUM: pid_namespace: print hung tasks after 10 seconds"

This reverts commit 2b9e051698c6a40008b2a02db277c2e53f3bcc59.

Reason for revert: The code does not work, and we have an
at least temporary fix for the underlying problem.

Original change's description:
> CHROMIUM: pid_namespace: print hung tasks after 10 seconds
>
> Crash reporter report a large number of of crashes due to hung tasks
> in zap_pid_ns_processes(). Most of those are reported in the 3.8 and
> 3.10 kernels. A typical traceback looks as follows.
>
> Backtrace:
> [<c063f4b8>] (__schedule+0x4b0/0x740) from [<c063f7d8>] (schedule+0x90/0x94)
> [<c063f7d8>] (schedule+0x90/0x94) from [<c018720c>] (zap_pid_ns_processes+0x11c/0x148)
> [<c018720c>] (zap_pid_ns_processes+0x11c/0x148) from [<c0127454>] (do_exit+0x454/0x8ac)
> [<c0127454>] (do_exit+0x454/0x8ac) from [<c0128810>] (do_group_exit+0x5c/0xc0)
> [<c0128810>] (do_group_exit+0x5c/0xc0) from [<c0128894>] (__wake_up_parent+0x0/0x30)
> [<c0128894>] (__wake_up_parent+0x0/0x30) from [<c01061cc>] (__sys_trace_return+0x0/0x14)
> Kernel panic - not syncing: hung_task: blocked tasks
>
> To help tracking down the problem, dump all threads associated with the
> pid namespace to the console if tasks are still active 10 seconds after
> the request to kill them.
>
> Also, display a warning if the number of threads in the PID namespace is
> smaller than expected, since this would result in the observed hang.
>
> BUG= chromium:660248 
> TEST=Build and run; ensure that no messages are seen during normal operation
>
> Change-Id: Ifb7b570627d485da1dc1141d8249f922bfe35af6
> Signed-off-by: Guenter Roeck <groeck@chromium.org>
> Reviewed-on: https://chromium-review.googlesource.com/418090
> Reviewed-by: Kevin Cernekee <cernekee@chromium.org>
>

BUG= chromium:660248 

Change-Id: Id3c2895c74849cd48ff1d207fff729b9b12d94be
Reviewed-on: https://chromium-review.googlesource.com/505195
Commit-Ready: Guenter Roeck <groeck@chromium.org>
Tested-by: Guenter Roeck <groeck@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>

[modify] https://crrev.com/f799a51d982e3853dbd13cc16a3c38aa0f25bfc1/kernel/pid_namespace.c

Comment 27 by dchan@google.com, May 30 2017

Labels: VerifyIn-60
Labels: VerifyIn-61

Comment 29 by dchan@chromium.org, Oct 14 2017

Status: Archived (was: Fixed)

Sign in to add a comment