
Issue 667523

Starred by 1 user

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug




snd_hda_parse_hdmi_codec() deadlock

Project Member Reported by cernekee@chromium.org, Nov 21 2016

Issue description

I am seeing a lot of these in the hung_tasks bucket on crash.corp, all coming from Bay Trail Chromebooks (kip, gnawty, enguarde) running Linux 3.10:

<6>[ 9367.837254] cras D ffff8801625a6250 0 1255 1195 0x00000000
<5>[ 9367.837345] ffff8801524cbbe0 0000000000000082 ffff8801524c5330 ffff8801524cbfd8
<5>[ 9367.837420] ffff8801524cbfd8 0000000000012240 ffff8801625a6250 ffff8801796ac320
<5>[ 9367.837449] 00000000ffffffff ffff8801625a6250 ffff8801796ac328 ffff8801796ac340
<5>[ 9367.837479] Call Trace:
<5>[ 9367.837501] [<ffffffff87522228>] schedule+0x64/0x66
<5>[ 9367.837524] [<ffffffff8752260b>] schedule_preempt_disabled+0xe/0x10
<5>[ 9367.837549] [<ffffffff875213c5>] __mutex_lock_slowpath+0x14d/0x1c7
<5>[ 9367.837573] [<ffffffff8752073a>] mutex_lock+0x1f/0x2f
<5>[ 9367.837594] [<ffffffff8752073a>] ? mutex_lock+0x1f/0x2f
<5>[ 9367.837617] [<ffffffffc023f405>] 0xffffffffc023f404
<5>[ 9367.837639] [<ffffffff87413f34>] snd_ctl_elem_info+0x59/0xf0
<5>[ 9367.837661] [<ffffffff87414042>] snd_ctl_elem_info_user+0x77/0xd6
<5>[ 9367.837686] [<ffffffff874152e7>] snd_ctl_ioctl+0x2dd/0x4da
<5>[ 9367.837709] [<ffffffff87105463>] do_vfs_ioctl+0x35b/0x41c
<5>[ 9367.837731] [<ffffffff870f61d1>] ? fsnotify_access+0x5a/0x61
<5>[ 9367.837778] [<ffffffff87105591>] SyS_ioctl+0x6d/0xa5
<5>[ 9367.837838] [<ffffffff87523d02>] system_call_fastpath+0x16/0x1b
<6>[ 9367.837909] kworker/u4:18 D ffff8801797aa200 0 12707 2 0x00000000
<6>[ 9367.837947] Workqueue: hd-audio0 snd_hda_parse_hdmi_codec [snd_hda_codec_hdmi]
<5>[ 9367.838001] ffff880123f47ae0 0000000000000046 ffff88017810f170 ffff880123f47fd8
<5>[ 9367.838031] ffff880123f47fd8 0000000000012240 ffff8801797a9e40 ffff8801796ac320
<5>[ 9367.838075] 00000000ffffffff ffff8801797a9e40 ffff8801796ac328 ffff8801796ac340
<5>[ 9367.838136] Call Trace:
<5>[ 9367.838161] [<ffffffff87522228>] schedule+0x64/0x66
<5>[ 9367.838199] [<ffffffff8752260b>] schedule_preempt_disabled+0xe/0x10
<5>[ 9367.838223] [<ffffffff875213c5>] __mutex_lock_slowpath+0x14d/0x1c7
<5>[ 9367.838253] [<ffffffff8752073a>] mutex_lock+0x1f/0x2f
<5>[ 9367.838275] [<ffffffff8752073a>] ? mutex_lock+0x1f/0x2f
<5>[ 9367.838308] [<ffffffffc0240dbb>] snd_hda_parse_hdmi_codec+0x5aa/0x13f4 [snd_hda_codec_hdmi]
<5>[ 9367.838343] [<ffffffffc0240ffe>] snd_hda_parse_hdmi_codec+0x7ed/0x13f4 [snd_hda_codec_hdmi]
<5>[ 9367.838383] [<ffffffffc01831c5>] snd_hda_jack_set_dirty_all+0x6a/0xff [snd_hda_codec]
<5>[ 9367.838417] [<ffffffffc018379c>] snd_hda_jack_poll_all+0x6b/0x249 [snd_hda_codec]
<5>[ 9367.838450] [<ffffffffc017d1aa>] snd_hda_unlock_devices+0x5e/0xcc [snd_hda_codec]
<5>[ 9367.838484] [<ffffffffc0182761>] snd_hda_build_controls+0x1e5/0x208 [snd_hda_codec]
<5>[ 9367.838523] [<ffffffffc017f869>] snd_hda_power_save+0x115/0x3b3 [snd_hda_codec]
<5>[ 9367.838556] [<ffffffffc017f996>] snd_hda_power_save+0x242/0x3b3 [snd_hda_codec]
<5>[ 9367.838581] [<ffffffff87521b74>] ? do_wait_for_common+0xc7/0x144
<5>[ 9367.838613] [<ffffffffc017fb45>] snd_hda_codec_read+0x3e/0x65 [snd_hda_codec]
<5>[ 9367.838641] [<ffffffffc0241e29>] snd_hdmi_get_eld_size+0x1e/0x20 [snd_hda_codec_hdmi]
<5>[ 9367.838669] [<ffffffffc0241e54>] snd_hdmi_get_eld+0x29/0x99 [snd_hda_codec_hdmi]
<5>[ 9367.838698] [<ffffffffc0240df5>] snd_hda_parse_hdmi_codec+0x5e4/0x13f4 [snd_hda_codec_hdmi]
<5>[ 9367.838724] [<ffffffff87522f17>] ? _raw_spin_unlock_irq+0xe/0x11
<5>[ 9367.838751] [<ffffffffc02410d6>] snd_hda_parse_hdmi_codec+0x8c5/0x13f4 [snd_hda_codec_hdmi]
<5>[ 9367.838778] [<ffffffff8704b6b9>] process_one_work+0x182/0x2bd
<5>[ 9367.838802] [<ffffffff8704c9a5>] worker_thread+0x143/0x202
<5>[ 9367.838825] [<ffffffff8704c862>] ? rescuer_thread+0x2c3/0x2c3
<5>[ 9367.838848] [<ffffffff8705186c>] kthread+0xc0/0xc8
<5>[ 9367.838871] [<ffffffff870517ac>] ? __kthread_parkme+0x6b/0x6b
<5>[ 9367.838893] [<ffffffff87523c5c>] ret_from_fork+0x7c/0xb0
<5>[ 9367.838915] [<ffffffff870517ac>] ? __kthread_parkme+0x6b/0x6b


This query shows a bunch of them: https://goto.google.com/crefx
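The two traces above show both the cras thread and the hd-audio0 work item parked in mutex_lock(), and the worker's backtrace passes through snd_hda_parse_hdmi_codec twice, which is consistent with the work function re-acquiring a lock that is already held further up its own call chain. As a hedged illustration of that pattern (using Python's non-reentrant Lock as a stand-in for a kernel struct mutex; the function names are hypothetical, not the real driver code):

```python
import threading

# Illustrative sketch only: a non-reentrant lock, like a kernel struct
# mutex, hangs forever if the same thread tries to take it twice. The
# names parse_codec/poll_jacks are hypothetical stand-ins.
codec_lock = threading.Lock()

def poll_jacks():
    # Called back from parse_codec while codec_lock is already held.
    # With a non-reentrant lock this second acquire can never succeed;
    # a timeout is used here so the demo terminates instead of hanging.
    acquired = codec_lock.acquire(timeout=0.5)
    if acquired:
        codec_lock.release()
    return acquired

def parse_codec():
    with codec_lock:           # first acquire: succeeds
        return poll_jacks()    # second acquire: blocks (deadlock)

print(parse_codec())  # False: the inner acquire never succeeds
```

In the kernel there is no timeout, so the worker blocks indefinitely, and any other task (here, cras in the snd_ctl ioctl path) that then waits on the same mutex hangs with it, which is what the hung-task watchdog reports.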


There are a few hda patches upstream which may or may not be related, e.g.

commit 664c715573c2c116c2d8f5de7d59ce85a98a1751
Author: Takashi Iwai <tiwai@suse.de>
Date:   Wed Apr 8 11:43:14 2015 +0200

    ALSA: hda - Work around races of power up/down with runtime PM


I don't see any reports of this on non-Google hardware.  We carry some local CHROMIUM patches in that driver that may cause it to diverge from stock 3.10 behavior:

commit 0cc1b72687c553f9443bd627148f83974164e294
Author: zhuo-hao <zhuo-hao.lee@intel.com>
Date:   Fri Sep 4 11:13:05 2015 +0800

    CHROMIUM: Workaround for race between i915 and HDA resume


commit c2ae349cfb79bb73665f88f49748d0b3f3739389
Author: U. Artie Eoff <ullysses.a.eoff@intel.com>
Date:   Mon Nov 17 11:51:03 2014 -0800

    CHROMIUM: ALSA: hda - skip PM ops if async probe is not finished

 
Attachment: upload_file_kcrash-8afdb44500000000.kcrash (101 KB)

Comment 1 by dgreid@chromium.org, Nov 24 2016

Owner: chinyue@chromium.org
Status: Assigned (was: Untriaged)
From the attached log, there was an error before the system went to S3:

<6>[ 3178.943260] call 0000:00:02.0+ returned 0 after 111063 usecs
<4>[ 3182.880461] snd_hda_intel 0000:00:1b.0: azx_get_response timeout, switching to polling mode: last cmd=0x201f0500
<4>[ 3183.882273] snd_hda_intel 0000:00:1b.0: No response from codec, disabling MSI: last cmd=0x201f0500
<3>[ 3184.884088] hda_intel: azx_get_response timeout, switching to single_cmd mode: last cmd=0x201f0500
<6>[ 3184.884191] snd_hda_intel 0000:00:1b.0: get_response timeout: IRS=0x1
<6>[ 3184.884287] snd_hda_intel 0000:00:1b.0: send_cmd timeout: IRS=0x1, val=0x20270503
<6>[ 3184.884383] snd_hda_intel 0000:00:1b.0: send_cmd timeout: IRS=0x1, val=0x20370503
<6>[ 3184.884478] snd_hda_intel 0000:00:1b.0: send_cmd timeout: IRS=0x1, val=0x20470503
<6>[ 3184.884573] snd_hda_intel 0000:00:1b.0: send_cmd timeout: IRS=0x1, val=0x20570503
<6>[ 3184.884668] snd_hda_intel 0000:00:1b.0: send_cmd timeout: IRS=0x1, val=0x20670503
<6>[ 3184.884763] snd_hda_intel 0000:00:1b.0: send_cmd timeout: IRS=0x1, val=0x201f0500
<6>[ 3184.884859] snd_hda_intel 0000:00:1b.0: send_cmd timeout: IRS=0x1, val=0x20170503
<6>[ 3184.884954] snd_hda_intel 0000:00:1b.0: send_cmd timeout: IRS=0x1, val=0x20270503
<6>[ 3184.885049] snd_hda_intel 0000:00:1b.0: send_cmd timeout: IRS=0x1, val=0x20370503
<6>[ 3184.914115] call 0000:00:1b.0+ returned 0 after 5937332 usecs
<6>[ 3184.914145] PM: suspend of devices complete after 6244.388 msecs

It shouldn't be related to my patch (commit 0cc1b72687c553f9443bd627148f83974164e294), because that patch fixed the race between i915 and HDA on the resume path.
However, I wonder if a similar fix is needed on the suspend path as well. Do you know the steps to reproduce this bug?
> From the attached log, there was an error before system went to S3:

Good catch, I checked a couple of other reports and see the same azx_get_response timeout in those logs.

> Do you know the reproduce steps to trigger this bug?

Unfortunately I only know about it from our crash reporting interface.  We have a program that classifies incoming kernel crash reports as "hung_tasks" if they contain a message of the form "INFO: task <X> blocked for more than <Y> seconds."  A significant percentage of the crashes in the hung_tasks bucket look like the crash in the OP.

Maybe it would be possible to reproduce it by running continuous suspend/resume loops on one of the 3.10/Bay Trail Chromebooks?  I have not tried.
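The classification rule described above can be sketched roughly as follows (a hypothetical helper, not the actual crash-report classifier):

```python
import re

# Hypothetical sketch of the hung_tasks bucketing rule described above:
# a report is classified as hung_tasks if it contains a line of the form
# "INFO: task <X> blocked for more than <Y> seconds."
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<task>\S+) blocked for more than (?P<secs>\d+) seconds"
)

def classify(report_text):
    """Return 'hung_tasks' if the report matches the watchdog message, else None."""
    return "hung_tasks" if HUNG_TASK_RE.search(report_text) else None

sample = "<3>[ 9367.8] INFO: task cras:1255 blocked for more than 120 seconds."
print(classify(sample))  # hung_tasks
```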
Owner: cychiang@chromium.org
