Add monarch metrics for repo/git tool in chromite |
|||||||||||||
Issue descriptionFor the repo/git tool contacting the remote repositories(GOB), add monarch metrics around the retries and failures. This is to help investigate crbug.com/643452
,
Nov 4 2016
,
Nov 5 2016
Follow-up: can you point out some monarch queries that use the new data? Which ones are likely to show us when we are having GoB issues?
,
Nov 9 2016
sync retry records count on a different label (it's handled in _CleanUpAndRunCommand). fetch retry doesn't record count on a different label, should we change the way retry_util records count to distinguish the first try and the other retries?
,
Nov 16 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/chromite/+/0337e23e7805df6dd423114513f15852cc4d2a11 commit 0337e23e7805df6dd423114513f15852cc4d2a11 Author: Ningning Xia <nxia@chromium.org> Date: Tue Nov 15 19:45:29 2016 Record monarch metrics for git fetch retries. Recording a seperate monarch name for git fetch retries can give us better numbers on the git remote operation performances. BUG= chromium:660248 TEST=run_tests Change-Id: If07cd5fb884372586fb3eb794fc7ddb611c35958 Reviewed-on: https://chromium-review.googlesource.com/411476 Commit-Ready: Ningning Xia <nxia@chromium.org> Tested-by: Ningning Xia <nxia@chromium.org> Reviewed-by: Aviv Keshet <akeshet@chromium.org> [modify] https://crrev.com/0337e23e7805df6dd423114513f15852cc4d2a11/lib/constants.py [modify] https://crrev.com/0337e23e7805df6dd423114513f15852cc4d2a11/lib/patch.py [modify] https://crrev.com/0337e23e7805df6dd423114513f15852cc4d2a11/lib/retry_util.py
,
Nov 18 2016
Repo sync (excludes retry): MON_REPO_SYNC_COUNT = 'chromeos/cbuildbot/repo/sync_count' Repo sync retry: MON_REPO_SYNC_RETRY_COUNT = 'chromeos/cbuildbot/repo/sync_retry_count' git fetch (includes retry): MON_GIT_FETCH_COUNT = 'chromeos/cbuildbot/git/fetch_count' git fetch retry: MON_GIT_FETCH_RETRY_COUNT = 'chromeos/cbuildbot/git/fetch_retry_count'
,
Nov 18 2016
#6 is about the monarch labels to search.
,
Dec 10 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/edfc0de027e03f53e8a6c5177c9b2b6eddc93ce4 commit edfc0de027e03f53e8a6c5177c9b2b6eddc93ce4 Author: Oleg Nesterov <oleg@redhat.com> Date: Wed Apr 02 15:45:05 2014 UPSTREAM: pid_namespace: pidns_get() should check task_active_pid_ns() != NULL pidns_get()->get_pid_ns() can hit ns == NULL. This task_struct can't go away, but task_active_pid_ns(task) is NULL if release_task(task) was already called. Alternatively we could change get_pid_ns(ns) to check ns != NULL, but it seems that other callers are fine. BUG= chromium:660248 TEST=Build and run Change-Id: I282b679b2b4b30d4279295a94d561dea6d970238 Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Eric W. Biederman ebiederm@xmission.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Guenter Roeck <groeck@chromium.org> (cherry picked from commit d23082257d83e) Reviewed-on: https://chromium-review.googlesource.com/418084 Reviewed-by: Kevin Cernekee <cernekee@chromium.org> [modify] https://crrev.com/edfc0de027e03f53e8a6c5177c9b2b6eddc93ce4/kernel/pid_namespace.c
,
Dec 10 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/89a351662f8ee8bed72e5bde3afb984f0511ac53 commit 89a351662f8ee8bed72e5bde3afb984f0511ac53 Author: Oleg Nesterov <oleg@redhat.com> Date: Wed Dec 10 23:55:25 2014 UPSTREAM: exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting alloc_pid() does get_pid_ns() beforehand but forgets to put_pid_ns() if it fails because disable_pid_allocation() was called by the exiting child_reaper. We could simply move get_pid_ns() down to successful return, but this fix tries to be as trivial as possible. BUG= chromium:660248 TEST=Build and run Change-Id: If7a21f2b5132d7b258cb42cbcba4c4c713219689 Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Serge Hallyn <serge.hallyn@ubuntu.com> Cc: Sterling Alexander <stalexan@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Guenter Roeck <groeck@chromium.org> (cherry picked from commit 24c037ebf5723) Reviewed-on: https://chromium-review.googlesource.com/418086 Reviewed-by: Kevin Cernekee <cernekee@chromium.org> [modify] https://crrev.com/89a351662f8ee8bed72e5bde3afb984f0511ac53/kernel/pid.c
,
Dec 10 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/e8ed738cfacbda9d6898cca3363b0c3971cdb452 commit e8ed738cfacbda9d6898cca3363b0c3971cdb452 Author: Oleg Nesterov <oleg@redhat.com> Date: Mon Apr 07 22:38:41 2014 UPSTREAM: wait: fix reparent_leader() vs EXIT_DEAD->EXIT_ZOMBIE race wait_task_zombie() first does EXIT_ZOMBIE->EXIT_DEAD transition and drops tasklist_lock. If this task is not the natural child and it is traced, we change its state back to EXIT_ZOMBIE for ->real_parent. The last transition is racy, this is even documented in 50b8d257486a "ptrace: partially fix the do_wait(WEXITED) vs EXIT_DEAD->EXIT_ZOMBIE race". wait_consider_task() tries to detect this transition and clear ->notask_error but we can't rely on ptrace_reparented(), debugger can exit and do ptrace_unlink() before its sub-thread sets EXIT_ZOMBIE. And there is another problem which were missed before: this transition can also race with reparent_leader() which doesn't reset >exit_signal if EXIT_DEAD, assuming that this task must be reaped by someone else. So the tracee can be re-parented with ->exit_signal != SIGCHLD, and if /sbin/init doesn't use __WALL it becomes unreapable. Change reparent_leader() to update ->exit_signal even if EXIT_DEAD. Note: this is the simple temporary hack for -stable, it doesn't try to solve all problems, it will be reverted by the next changes. BUG= chromium:660248 TEST=Build and run Change-Id: I7e53823c1a73024a4bf13900cebdc58999f1606a Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Jan Kratochvil <jan.kratochvil@redhat.com> Reported-by: Michal Schmidt <mschmidt@redhat.com> Tested-by: Michal Schmidt <mschmidt@redhat.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Lennart Poettering <lpoetter@redhat.com> Cc: Roland McGrath <roland@hack.frob.com> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Guenter Roeck <groeck@chromium.org> (cherry picked from commit dfccbb5e49a6) Reviewed-on: https://chromium-review.googlesource.com/418085 Reviewed-by: Kevin Cernekee <cernekee@chromium.org> [modify] https://crrev.com/e8ed738cfacbda9d6898cca3363b0c3971cdb452/kernel/exit.c
,
Dec 10 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/7c54838c3fe8ce350843ce4d2490f011e3111109 commit 7c54838c3fe8ce350843ce4d2490f011e3111109 Author: Oleg Nesterov <oleg@redhat.com> Date: Wed Jul 03 22:08:31 2013 UPSTREAM: kernel/fork.c:copy_process(): don't add the uninitialized child to thread/task/pid lists copy_process() adds the new child to thread_group/init_task.tasks list and then does attach_pid(child, PIDTYPE_PID). This means that the lockless next_thread() or next_task() can see this thread with the wrong pid. Say, "ls /proc/pid/task" can list the same inode twice. We could move attach_pid(child, PIDTYPE_PID) up, but in this case find_task_by_vpid() can find the new thread before it was fully initialized. And this is already true for PIDTYPE_PGID/PIDTYPE_SID, With this patch copy_process() initializes child->pids[*].pid first, then calls attach_pid() to insert the task into the pid->tasks list. attach_pid() no longer need the "struct pid*" argument, it is always called after pid_link->pid was already set. BUG= chromium:660248 TEST=Build and run Change-Id: Iec3ec38679a4d89d3acffefc55ff2169b91d944d Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Sergey Dyasly <dserrg@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Guenter Roeck <groeck@chromium.org> (cherry picked from commit 8190773985141) Reviewed-on: https://chromium-review.googlesource.com/418088 Reviewed-by: Kevin Cernekee <cernekee@chromium.org> [modify] https://crrev.com/7c54838c3fe8ce350843ce4d2490f011e3111109/kernel/pid.c [modify] https://crrev.com/7c54838c3fe8ce350843ce4d2490f011e3111109/kernel/fork.c [modify] https://crrev.com/7c54838c3fe8ce350843ce4d2490f011e3111109/include/linux/pid.h
,
Dec 10 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/4e34e2497ba9eb96ec3bea8b2aa52cf0cf4f0707 commit 4e34e2497ba9eb96ec3bea8b2aa52cf0cf4f0707 Author: Oleg Nesterov <oleg@redhat.com> Date: Mon Sep 30 20:45:27 2013 UPSTREAM: pidns: fix free_pid() to handle the first fork failure "case 0" in free_pid() assumes that disable_pid_allocation() should clear PIDNS_HASH_ADDING before the last pid goes away. However this doesn't happen if the first fork() fails to create the child reaper which should call disable_pid_allocation(). BUG= chromium:660248 TEST=Build and run Change-Id: Ibf7caa51e3e6a9c3312bce5a765dfe7f1353b005 Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "Serge E. Hallyn" <serge@hallyn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Guenter Roeck <groeck@chromium.org> (cherry picked from commit 314a8ad0f18ac37) Reviewed-on: https://chromium-review.googlesource.com/418087 Reviewed-by: Kevin Cernekee <cernekee@chromium.org> [modify] https://crrev.com/4e34e2497ba9eb96ec3bea8b2aa52cf0cf4f0707/kernel/pid.c
,
Dec 10 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/356a5de792d6da3da036deded8b7b22f7d2ec40f commit 356a5de792d6da3da036deded8b7b22f7d2ec40f Author: Al Viro <viro@zeniv.linux.org.uk> Date: Thu Oct 03 17:28:06 2013 UPSTREAM: pid_namespace: make freeing struct pid_namespace rcu-delayed makes procfs ->premission() instances safety in RCU mode independent from vfsmount_lock. BUG= chromium:660248 TEST=Build and run Change-Id: I591f7df90b3530c4b1ff3cf7eb660aae208d0d84 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Guenter Roeck <groeck@chromium.org> (cherry picked from commit 1adfcb03e31b) Reviewed-on: https://chromium-review.googlesource.com/418089 Reviewed-by: Kevin Cernekee <cernekee@chromium.org> [modify] https://crrev.com/356a5de792d6da3da036deded8b7b22f7d2ec40f/include/linux/pid_namespace.h [modify] https://crrev.com/356a5de792d6da3da036deded8b7b22f7d2ec40f/kernel/pid_namespace.c
,
Dec 10 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/2b9e051698c6a40008b2a02db277c2e53f3bcc59 commit 2b9e051698c6a40008b2a02db277c2e53f3bcc59 Author: Guenter Roeck <groeck@chromium.org> Date: Thu Dec 08 22:29:28 2016 CHROMIUM: pid_namespace: print hung tasks after 10 seconds Crash reporter report a large number of of crashes due to hung tasks in zap_pid_ns_processes(). Most of those are reported in the 3.8 and 3.10 kernels. A typical traceback looks as follows. Backtrace: [<c063f4b8>] (__schedule+0x4b0/0x740) from [<c063f7d8>] (schedule+0x90/0x94) [<c063f7d8>] (schedule+0x90/0x94) from [<c018720c>] (zap_pid_ns_processes+0x11c/0x148) [<c018720c>] (zap_pid_ns_processes+0x11c/0x148) from [<c0127454>] (do_exit+0x454/0x8ac) [<c0127454>] (do_exit+0x454/0x8ac) from [<c0128810>] (do_group_exit+0x5c/0xc0) [<c0128810>] (do_group_exit+0x5c/0xc0) from [<c0128894>] (__wake_up_parent+0x0/0x30) [<c0128894>] (__wake_up_parent+0x0/0x30) from [<c01061cc>] (__sys_trace_return+0x0/0x14) Kernel panic - not syncing: hung_task: blocked tasks To help tracking down the problem, dump all threads associated with the pid namespace to the console if tasks are still active 10 seconds after the request to kill them. Also, display a warning if the number of threads in the PID namespace is smaller than expected, since this would result in the observed hang. BUG= chromium:660248 TEST=Build and run; ensure that no messages are seen during normal operation Change-Id: Ifb7b570627d485da1dc1141d8249f922bfe35af6 Signed-off-by: Guenter Roeck <groeck@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/418090 Reviewed-by: Kevin Cernekee <cernekee@chromium.org> [modify] https://crrev.com/2b9e051698c6a40008b2a02db277c2e53f3bcc59/kernel/pid_namespace.c
,
Dec 10 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/6deaa38389568266701a7d8464ad30bf6e909067 commit 6deaa38389568266701a7d8464ad30bf6e909067 Author: Oleg Nesterov <oleg@redhat.com> Date: Wed Jul 03 22:08:25 2013 UPSTREAM: fs/exec.c:de_thread(): use change_pid() rather than detach_pid/attach_pid de_thread() can use change_pid() instead of detach + attach. This looks better and this ensures that, say, next_thread() can never see a task with ->pid == NULL. BUG= chromium:660248 TEST=Build and run Change-Id: I36c02d06196ac264226ddb406f0323eef4fa8ab3 Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Sergey Dyasly <dserrg@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Guenter Roeck <groeck@chromium.org> (cherry picked from commit 3f4185483832c) Reviewed-on: https://chromium-review.googlesource.com/418689 Reviewed-by: Kevin Cernekee <cernekee@chromium.org> [modify] https://crrev.com/6deaa38389568266701a7d8464ad30bf6e909067/fs/exec.c
,
Dec 10 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/45f9cecb0cefe7471798429a85a83472b61faeba commit 45f9cecb0cefe7471798429a85a83472b61faeba Author: Oleg Nesterov <oleg@redhat.com> Date: Wed Apr 02 15:45:05 2014 UPSTREAM: pid_namespace: pidns_get() should check task_active_pid_ns() != NULL pidns_get()->get_pid_ns() can hit ns == NULL. This task_struct can't go away, but task_active_pid_ns(task) is NULL if release_task(task) was already called. Alternatively we could change get_pid_ns(ns) to check ns != NULL, but it seems that other callers are fine. BUG= chromium:660248 TEST=Build and run Change-Id: I282b679b2b4b30d4279295a94d561dea6d970238 Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Eric W. Biederman ebiederm@xmission.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Guenter Roeck <groeck@chromium.org> (cherry picked from commit d23082257d83e) Reviewed-on: https://chromium-review.googlesource.com/418251 Reviewed-by: Kevin Cernekee <cernekee@chromium.org> [modify] https://crrev.com/45f9cecb0cefe7471798429a85a83472b61faeba/kernel/pid_namespace.c
,
Dec 10 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/f1d56bd9a2c5e2101c32eca7f7b8684afa909dff commit f1d56bd9a2c5e2101c32eca7f7b8684afa909dff Author: Oleg Nesterov <oleg@redhat.com> Date: Mon Apr 07 22:38:41 2014 UPSTREAM: wait: fix reparent_leader() vs EXIT_DEAD->EXIT_ZOMBIE race wait_task_zombie() first does EXIT_ZOMBIE->EXIT_DEAD transition and drops tasklist_lock. If this task is not the natural child and it is traced, we change its state back to EXIT_ZOMBIE for ->real_parent. The last transition is racy, this is even documented in 50b8d257486a "ptrace: partially fix the do_wait(WEXITED) vs EXIT_DEAD->EXIT_ZOMBIE race". wait_consider_task() tries to detect this transition and clear ->notask_error but we can't rely on ptrace_reparented(), debugger can exit and do ptrace_unlink() before its sub-thread sets EXIT_ZOMBIE. And there is another problem which were missed before: this transition can also race with reparent_leader() which doesn't reset >exit_signal if EXIT_DEAD, assuming that this task must be reaped by someone else. So the tracee can be re-parented with ->exit_signal != SIGCHLD, and if /sbin/init doesn't use __WALL it becomes unreapable. Change reparent_leader() to update ->exit_signal even if EXIT_DEAD. Note: this is the simple temporary hack for -stable, it doesn't try to solve all problems, it will be reverted by the next changes. BUG= chromium:660248 TEST=Build and run Change-Id: I7e53823c1a73024a4bf13900cebdc58999f1606a Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Jan Kratochvil <jan.kratochvil@redhat.com> Reported-by: Michal Schmidt <mschmidt@redhat.com> Tested-by: Michal Schmidt <mschmidt@redhat.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Lennart Poettering <lpoetter@redhat.com> Cc: Roland McGrath <roland@hack.frob.com> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Guenter Roeck <groeck@chromium.org> (cherry picked from commit dfccbb5e49a6) Reviewed-on: https://chromium-review.googlesource.com/418252 Reviewed-by: Kevin Cernekee <cernekee@chromium.org> [modify] https://crrev.com/f1d56bd9a2c5e2101c32eca7f7b8684afa909dff/kernel/exit.c
,
Dec 10 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/7ec4cfd463e77427474b36db7ab67a335724d718 commit 7ec4cfd463e77427474b36db7ab67a335724d718 Author: Oleg Nesterov <oleg@redhat.com> Date: Wed Dec 10 23:55:25 2014 UPSTREAM: exit: pidns: alloc_pid() leaks pid_namespace if child_reaper is exiting alloc_pid() does get_pid_ns() beforehand but forgets to put_pid_ns() if it fails because disable_pid_allocation() was called by the exiting child_reaper. We could simply move get_pid_ns() down to successful return, but this fix tries to be as trivial as possible. BUG= chromium:660248 TEST=Build and run Change-Id: If7a21f2b5132d7b258cb42cbcba4c4c713219689 Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Aaron Tomlin <atomlin@redhat.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Serge Hallyn <serge.hallyn@ubuntu.com> Cc: Sterling Alexander <stalexan@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Guenter Roeck <groeck@chromium.org> (cherry picked from commit 24c037ebf5723) Reviewed-on: https://chromium-review.googlesource.com/418253 Reviewed-by: Kevin Cernekee <cernekee@chromium.org> [modify] https://crrev.com/7ec4cfd463e77427474b36db7ab67a335724d718/kernel/pid.c
,
Dec 10 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/fd0cf1809f27d2e88d22d6e68f0ad3240d9d1664 commit fd0cf1809f27d2e88d22d6e68f0ad3240d9d1664 Author: Oleg Nesterov <oleg@redhat.com> Date: Mon Sep 30 20:45:27 2013 UPSTREAM: pidns: fix free_pid() to handle the first fork failure "case 0" in free_pid() assumes that disable_pid_allocation() should clear PIDNS_HASH_ADDING before the last pid goes away. However this doesn't happen if the first fork() fails to create the child reaper which should call disable_pid_allocation(). BUG= chromium:660248 TEST=Build and run Change-Id: Ibf7caa51e3e6a9c3312bce5a765dfe7f1353b005 Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: "Serge E. Hallyn" <serge@hallyn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Guenter Roeck <groeck@chromium.org> (cherry picked from commit 314a8ad0f18ac37) Reviewed-on: https://chromium-review.googlesource.com/418254 Reviewed-by: Kevin Cernekee <cernekee@chromium.org> [modify] https://crrev.com/fd0cf1809f27d2e88d22d6e68f0ad3240d9d1664/kernel/pid.c
,
Dec 10 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/45cf3d4776d4acbf463c3c1bedf152857d6a2407 commit 45cf3d4776d4acbf463c3c1bedf152857d6a2407 Author: Oleg Nesterov <oleg@redhat.com> Date: Wed Jul 03 22:08:31 2013 UPSTREAM: kernel/fork.c:copy_process(): don't add the uninitialized child to thread/task/pid lists copy_process() adds the new child to thread_group/init_task.tasks list and then does attach_pid(child, PIDTYPE_PID). This means that the lockless next_thread() or next_task() can see this thread with the wrong pid. Say, "ls /proc/pid/task" can list the same inode twice. We could move attach_pid(child, PIDTYPE_PID) up, but in this case find_task_by_vpid() can find the new thread before it was fully initialized. And this is already true for PIDTYPE_PGID/PIDTYPE_SID, With this patch copy_process() initializes child->pids[*].pid first, then calls attach_pid() to insert the task into the pid->tasks list. attach_pid() no longer need the "struct pid*" argument, it is always called after pid_link->pid was already set. BUG= chromium:660248 TEST=Build and run Change-Id: Iec3ec38679a4d89d3acffefc55ff2169b91d944d Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Sergey Dyasly <dserrg@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Guenter Roeck <groeck@chromium.org> (cherry picked from commit 8190773985141) Reviewed-on: https://chromium-review.googlesource.com/418255 Reviewed-by: Kevin Cernekee <cernekee@chromium.org> [modify] https://crrev.com/45cf3d4776d4acbf463c3c1bedf152857d6a2407/kernel/pid.c [modify] https://crrev.com/45cf3d4776d4acbf463c3c1bedf152857d6a2407/kernel/fork.c [modify] https://crrev.com/45cf3d4776d4acbf463c3c1bedf152857d6a2407/include/linux/pid.h
,
Dec 10 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/7d453ea59f779a7d93fd04a44fdacce5f299a9c0 commit 7d453ea59f779a7d93fd04a44fdacce5f299a9c0 Author: Al Viro <viro@zeniv.linux.org.uk> Date: Thu Oct 03 17:28:06 2013 UPSTREAM: pid_namespace: make freeing struct pid_namespace rcu-delayed makes procfs ->premission() instances safety in RCU mode independent from vfsmount_lock. BUG= chromium:660248 TEST=Build and run Change-Id: I591f7df90b3530c4b1ff3cf7eb660aae208d0d84 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Guenter Roeck <groeck@chromium.org> (cherry picked from commit 1adfcb03e31b) Reviewed-on: https://chromium-review.googlesource.com/418256 Reviewed-by: Kevin Cernekee <cernekee@chromium.org> [modify] https://crrev.com/7d453ea59f779a7d93fd04a44fdacce5f299a9c0/include/linux/pid_namespace.h [modify] https://crrev.com/7d453ea59f779a7d93fd04a44fdacce5f299a9c0/kernel/pid_namespace.c
,
Dec 10 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/b202bd7e85d35b4a84adcd3606d3b74c946d51f4 commit b202bd7e85d35b4a84adcd3606d3b74c946d51f4 Author: Guenter Roeck <groeck@chromium.org> Date: Thu Dec 08 22:29:28 2016 CHROMIUM: pid_namespace: print hung tasks after 10 seconds Crash reporter report a large number of of crashes due to hung tasks in zap_pid_ns_processes(). Most of those are reported in the 3.8 and 3.10 kernels. A typical traceback looks as follows. Backtrace: [<c063f4b8>] (__schedule+0x4b0/0x740) from [<c063f7d8>] (schedule+0x90/0x94) [<c063f7d8>] (schedule+0x90/0x94) from [<c018720c>] (zap_pid_ns_processes+0x11c/0x148) [<c018720c>] (zap_pid_ns_processes+0x11c/0x148) from [<c0127454>] (do_exit+0x454/0x8ac) [<c0127454>] (do_exit+0x454/0x8ac) from [<c0128810>] (do_group_exit+0x5c/0xc0) [<c0128810>] (do_group_exit+0x5c/0xc0) from [<c0128894>] (__wake_up_parent+0x0/0x30) [<c0128894>] (__wake_up_parent+0x0/0x30) from [<c01061cc>] (__sys_trace_return+0x0/0x14) Kernel panic - not syncing: hung_task: blocked tasks To help tracking down the problem, dump all threads associated with the pid namespace to the console if tasks are still active 10 seconds after the request to kill them. Also, display a warning if the number of threads in the PID namespace is smaller than expected, since this would result in the observed hang. BUG= chromium:660248 TEST=Build and run; ensure that no messages are seen during normal operation Change-Id: Ifb7b570627d485da1dc1141d8249f922bfe35af6 Signed-off-by: Guenter Roeck <groeck@chromium.org> Reviewed-on: https://chromium-review.googlesource.com/418257 Reviewed-by: Kevin Cernekee <cernekee@chromium.org> [modify] https://crrev.com/b202bd7e85d35b4a84adcd3606d3b74c946d51f4/kernel/pid_namespace.c
,
Dec 10 2016
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/604df9d41b658fbc1e1143e4c266e4ac4b79526d commit 604df9d41b658fbc1e1143e4c266e4ac4b79526d Author: Oleg Nesterov <oleg@redhat.com> Date: Wed Jul 03 22:08:25 2013 UPSTREAM: fs/exec.c:de_thread(): use change_pid() rather than detach_pid/attach_pid de_thread() can use change_pid() instead of detach + attach. This looks better and this ensures that, say, next_thread() can never see a task with ->pid == NULL. BUG= chromium:660248 TEST=Build and run Change-Id: I36c02d06196ac264226ddb406f0323eef4fa8ab3 Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Sergey Dyasly <dserrg@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Guenter Roeck <groeck@chromium.org> (cherry picked from commit 3f4185483832c) Reviewed-on: https://chromium-review.googlesource.com/418501 Reviewed-by: Kevin Cernekee <cernekee@chromium.org> [modify] https://crrev.com/604df9d41b658fbc1e1143e4c266e4ac4b79526d/fs/exec.c
,
Mar 4 2017
,
Apr 17 2017
,
May 13 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/third_party/kernel/+/f799a51d982e3853dbd13cc16a3c38aa0f25bfc1 commit f799a51d982e3853dbd13cc16a3c38aa0f25bfc1 Author: Guenter Roeck <groeck@chromium.org> Date: Sat May 13 01:46:15 2017 Revert "CHROMIUM: pid_namespace: print hung tasks after 10 seconds" This reverts commit 2b9e051698c6a40008b2a02db277c2e53f3bcc59. Reason for revert: The code does not work, and we have an at least temporary fix for the underlying problem. Original change's description: > CHROMIUM: pid_namespace: print hung tasks after 10 seconds > > Crash reporter report a large number of of crashes due to hung tasks > in zap_pid_ns_processes(). Most of those are reported in the 3.8 and > 3.10 kernels. A typical traceback looks as follows. > > Backtrace: > [<c063f4b8>] (__schedule+0x4b0/0x740) from [<c063f7d8>] (schedule+0x90/0x94) > [<c063f7d8>] (schedule+0x90/0x94) from [<c018720c>] (zap_pid_ns_processes+0x11c/0x148) > [<c018720c>] (zap_pid_ns_processes+0x11c/0x148) from [<c0127454>] (do_exit+0x454/0x8ac) > [<c0127454>] (do_exit+0x454/0x8ac) from [<c0128810>] (do_group_exit+0x5c/0xc0) > [<c0128810>] (do_group_exit+0x5c/0xc0) from [<c0128894>] (__wake_up_parent+0x0/0x30) > [<c0128894>] (__wake_up_parent+0x0/0x30) from [<c01061cc>] (__sys_trace_return+0x0/0x14) > Kernel panic - not syncing: hung_task: blocked tasks > > To help tracking down the problem, dump all threads associated with the > pid namespace to the console if tasks are still active 10 seconds after > the request to kill them. > > Also, display a warning if the number of threads in the PID namespace is > smaller than expected, since this would result in the observed hang. > > BUG= chromium:660248 > TEST=Build and run; ensure that no messages are seen during normal operation > > Change-Id: Ifb7b570627d485da1dc1141d8249f922bfe35af6 > Signed-off-by: Guenter Roeck <groeck@chromium.org> > Reviewed-on: https://chromium-review.googlesource.com/418090 > Reviewed-by: Kevin Cernekee <cernekee@chromium.org> > BUG= chromium:660248 Change-Id: Id3c2895c74849cd48ff1d207fff729b9b12d94be Reviewed-on: https://chromium-review.googlesource.com/505195 Commit-Ready: Guenter Roeck <groeck@chromium.org> Tested-by: Guenter Roeck <groeck@chromium.org> Reviewed-by: Douglas Anderson <dianders@chromium.org> [modify] https://crrev.com/f799a51d982e3853dbd13cc16a3c38aa0f25bfc1/kernel/pid_namespace.c
,
May 30 2017
,
Aug 1 2017
,
Oct 14 2017
|
|||||||||||||
►
Sign in to add a comment |
|||||||||||||
Comment 1 by bugdroid1@chromium.org
, Nov 3 2016