New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 845572 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Jun 26
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug



Sign in to add a comment

/sys/kernel/debug/tracing is broken for 3.18 (at least)

Project Member Reported by semenzato@chromium.org, May 22

Issue description

Both on cyan and caroline (both 3.18) I get this panic:

localhost ~ # echo function > /sys/kernel/debug/tracing/current_tracer 

[   22.592228] kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
[   22.592254] BUG: unable to handle kernel paging request at ffff88017a278000
[   22.592303] IP: [<ffffffff9af7725a>] _raw_spin_unlock_irqrestore+0x0/0x2a
[   22.592343] PGD 1b73f067 PUD 1b742067 PMD 800000017a2001e3 
[   22.592406] Oops: 0011 [#1] PREEMPT SMP 
[   22.594578] gsmi: Log Shutdown Reason 0x03
[   22.594601] Modules linked in: cmac rfcomm uinput iwlmvm iwlwifi snd_soc_dmic iwl7000_mac80211 btusb snd_soc_skl_nau88l25_ssm4567 snd_soc_hdac_hdmi snd_soc_skl snd_soc_skl_ipc snd_soc_sst_acpi snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core memconsole_x86_legacy memconsole btrtl snd_hda_core btbcm btintel uvcvideo zram bluetooth snd_soc_nau8825 bridge fuse stp videobuf2_vmalloc videobuf2_memops videobuf2_core llc snd_soc_ssm4567 ipt_MASQUERADE nf_nat_masquerade_ipv4 xt_mark snd_seq_dummy cfg80211 iio_trig_sysfs cros_ec_sensors_ring snd_seq_midi snd_seq_midi_event snd_rawmidi cros_ec_sensors snd_seq cros_ec_sensors_core industrialio_triggered_buffer kfifo_buf industrialio ip6table_filter snd_seq_device r8152 mii joydev
[   22.594746] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W      3.18.0 #1
[   22.594753] Hardware name: Google Caroline/Caroline, BIOS Google_Caroline.7820.349.0 11/17/2017
[   22.594762] task: ffff88017a226460 ti: ffff88017a278000 task.ti: ffff88017a278000
[   22.594770] RIP: 3bc1:[<ffffffff9af771e0>]  [<ffffffff9af771e0>] _raw_spin_lock_irqsave+0x0/0x22
[   22.594782] RSP: ffffffff9af771e0:ffff88017ec8af58  EFLAGS: 00010046
[   22.594789] RAX: 0000000000000005 RBX: ffff88017ec8aec0 RCX: 0000000000000005
[   22.594796] RDX: ffff88017ec8aef8 RSI: ffffffff9a9fb048 RDI: ffff88017ec93a40
[   22.594803] RBP: ffffffff9a9fad1a R08: ffffffff9af771e0 R09: ffffffff9af771e0
[   22.594810] R10: ffff88017a0045a0 R11: ffff88017a226460 R12: 0000000000000ee4
[   22.594818] R13: ffff88017a278000 R14: ffffffff9a8b6feb R15: ffff88017ec8aea8
[   22.594826] FS:  0000000000000000(0000) GS:ffff88017ec80000(0000) knlGS:0000000000000000
[   22.594834] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   22.594840] CR2: ffff88017a278000 CR3: 000000001b418000 CR4: 0000000000360770
[   22.594847] 
[   22.594850] RIP  [<ffffffff9af771e0>] _raw_spin_lock_irqsave+0x0/0x22
[   22.594859]  RSP <ffff88017ec8af58>
[   22.594863] CR2: ffff88017a278000
[   22.594868] ---[ end trace aab15d72dbefc14b ]---
[   22.600809] Kernel panic - not syncing: Fatal exception
[   22.600822] Kernel Offset: 0x19800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[   22.600992] gsmi: Log Shutdown Reason 0x02
[   22.607078] ACPI MEMORY or I/O RESET_REG.

The same panic occurs after applying a set of patches (about 180 of them), at least one of which purports to fix a problem with /sys/kernel/debug/tracing (probably a different problem).

git fetch https://chromium.googlesource.com/chromiumos/third_party/kernel refs/changes/06/1068506/1 && git checkout FETCH_HEAD

My previous image on cyan had a January 11 timestamp, and that worked fine.  

Sonny is taking a look.


 
I tried going back to a January commit
cf4d5fa114d4825eb59f82e48ccd683eb9a4b442 UPSTREAM: crypto: salsa20 - fix blkcipher_walk API usage

and it worked for me on caroline


Owner: semenzato@chromium.org
Status: Started (was: Untriaged)
Thanks.  I'll do a bisect.
I'll bet it's KPTI missing a mapping somewhere...
semenzato@luigi:~/.../third_party/kernel/v3.18$ git bisect log
# bad: [a2b13ac7448907a7cc272f1cb16e8e6e520092bd] UPSTREAM: KVM: X86: Fix load RFLAGS w/o the fixed bit
# good: [cf4d5fa114d4825eb59f82e48ccd683eb9a4b442] UPSTREAM: crypto: salsa20 - fix blkcipher_walk API usage
git bisect start 'a2b13ac7448907a7cc272f1cb16e8e6e520092bd' 'cf4d5fa114d4825eb'
# good: [fc3884617930ff550c3192353e22ff5a1d97111c] CHROMIUM: iwl7000: chromeOS: move genetlink.h and crypto.h below rhashtable.h
git bisect good fc3884617930ff550c3192353e22ff5a1d97111c
# skip: [953ddc2e38dd1219dc9e656af711428bb2c83de4] ANDROID: FROMLIST: binder: add spinlock to protect binder_node
git bisect skip 953ddc2e38dd1219dc9e656af711428bb2c83de4
# good: [e8eae3e19046d9cda7e11d96ba8e42e862c9afcb] ANDROID: FROMLIST: android: binder: Add allocator selftest
git bisect good e8eae3e19046d9cda7e11d96ba8e42e862c9afcb
# bad: [479af323cfd21deb24890cf6c7b3c881aa1e7687] UPSTREAM: x86, tls, ldt: Stop checking lm in LDT_empty
git bisect bad 479af323cfd21deb24890cf6c7b3c881aa1e7687
# bad: [70a45c0200e3436587ed1a698927a387bbff1a7c] UPSTREAM: ALSA: seq: Don't allow resizing pool in use
git bisect bad 70a45c0200e3436587ed1a698927a387bbff1a7c
# good: [1b2110766b1e20761e32454f31f02381a8c0c2d3] BACKPORT: tpm_tis: Move ilb_base_addr to tpm_tis_data
git bisect good 1b2110766b1e20761e32454f31f02381a8c0c2d3
# good: [7b395110babfce6eff2db92067e4923659f8866b] CHROMIUM: iwl7000: mvm: allow unknown crypto RX outside aggregation
git bisect good 7b395110babfce6eff2db92067e4923659f8866b
# skip: [8df5862c8b40b48f54102968f8210749aa211fd4] CHROMIUM: iwl7000: Merge core35 driver updates
git bisect skip 8df5862c8b40b48f54102968f8210749aa211fd4
# bad: [b3f52b891347ee623ec4e5ce6ee637d7005aab7b] CHROMIUM: Revert "net: socket ioctl to reset connections matching local address"
git bisect bad b3f52b891347ee623ec4e5ce6ee637d7005aab7b
# bad: [330699d6c2a0d8ef5c32066c750ab8a4d107c524] BACKPORT: x86/entry/64: Don't use IST entry for #BP stack
git bisect bad 330699d6c2a0d8ef5c32066c750ab8a4d107c524
# good: [841e3be21933ab800961ea9cf7847f0582fe72ed] BACKPORT: tpm: Keep CLKRUN enabled throughout the duration of transmit_cmd()
git bisect good 841e3be21933ab800961ea9cf7847f0582fe72ed


So this seems to be the bad patch.

I know Andy.  Should Guido and I have a chat with him?


commit 330699d6c2a0d8ef5c32066c750ab8a4d107c524 (HEAD, refs/bisect/bad)
Author:     Andy Lutomirski <luto@kernel.org>
AuthorDate: Thu Jul 23 15:37:48 2015 -0700
Commit:     chrome-bot <chrome-bot@chromium.org>
CommitDate: Wed Apr 11 14:55:44 2018 -0700

    BACKPORT: x86/entry/64: Don't use IST entry for #BP stack
    
    There's nothing IST-worthy about #BP/int3.  We don't allow kprobes
    in the small handful of places in the kernel that run at CPL0 with
    an invalid stack, and 32-bit kernels have used normal interrupt
    gates for #BP forever.
    
    Furthermore, we don't allow kprobes in places that have usergs while
    in kernel mode, so "paranoid" is also unnecessary.
    
    Signed-off-by: Andy Lutomirski <luto@kernel.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Cc: stable@vger.kernel.org
    
    BUG=chromium:831269
    TEST=Build and boot
    
    Change-Id: I2bb5aece3c789aa3d845d03c9b57f10daf30fd60
    [backport: Context and file name changes. Changes in trap_init do not apply.]
    Signed-off-by: Guenter Roeck <groeck@chromium.org>
    (cherry picked from commit d8ba61ba58c88d5207c1ba2f7d9a2280e7d03be9)
    Reviewed-on: https://chromium-review.googlesource.com/1005885
    Reviewed-by: Dylan Reid <dgreid@chromium.org>

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 9c24797e8039..a46834d89f9a 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -1345,7 +1345,7 @@ apicinterrupt3 HYPERVISOR_CALLBACK_VECTOR \
 #endif /* CONFIG_HYPERV */
 
 idtentry debug do_debug has_error_code=0 paranoid=1 shift_ist=DEBUG_STACK
-idtentry int3 do_int3 has_error_code=0 paranoid=1 shift_ist=DEBUG_STACK
+idtentry int3 do_int3 has_error_code=0
 idtentry stack_segment do_stack_segment has_error_code=1
 #ifdef CONFIG_XEN
 idtentry xen_debug do_debug has_error_code=0
diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c
index de801f22128a..2097c8efc437 100644
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -334,7 +334,6 @@ exit:
 }
 NOKPROBE_SYMBOL(do_general_protection);
 
-/* May run on IST stack. */
 dotraplinkage void notrace do_int3(struct pt_regs *regs, long error_code)
 {
        enum ctx_state prev_state;
@@ -367,15 +366,9 @@ dotraplinkage void notrace do_int3(struct pt_regs *regs, long error_code)
                        SIGTRAP) == NOTIFY_STOP)
                goto exit;
 
-       /*
-        * Let others (NMI) know that the debug stack is in use
-        * as we may switch to the interrupt stack.
-        */
-       debug_stack_usage_inc();
        preempt_conditional_sti(regs);
        do_trap(X86_TRAP_BP, SIGTRAP, "int3", regs, error_code, NULL);
        preempt_conditional_cli(regs);
-       debug_stack_usage_dec();
 exit:
        exception_exit(prev_state);
 }

Components: OS>Kernel
Labels: OS-Chrome
Actually, I suspect that this patch is interfering with some previous security patch which we created or backported.

At this point I invoke the help of the security folks.
Cc: groeck@chromium.org
It looks like Guenter did the backport and may have done some of the other security backports -- any ideas?
Also, did you try to revert just that one commit and see if it starts working?
Unfortunate but not too surprising. I agree with Kees - probably something essential is missing from the backport. If reverting this patch makes it work, I would suggest to go with it. Otherwise, we'll have to dig into it.

#7 yes, reverting this on ToT is possible and makes the bug go away.  Thanks!

Revert at https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1069421

Cc: kerrnel@chromium.org
Status: Fixed (was: Started)
This was fixed with a revert.  Please see issue 831269 (restricted).

Sign in to add a comment