CFM reports freezes |
||||||||||||||
Issue descriptionA CFM customer is reporting freezes and while it could be related to b/31390397 I'd like some help understanding if this crash could be pointing to something else. This is from feedbackid: 18517977534 2016-09-28T12:34:13.279016-04:00 WARNING kernel: [117121.795085] ------------[ cut here ]------------ 2016-09-28T12:34:13.279030-04:00 WARNING kernel: [117121.795102] WARNING: CPU: 0 PID: 29996 at /mnt/host/source/src/third_party/kernel/v3.14/drivers/media/v4l2-core/videobuf2-core.c:2051 __vb2_queue_cancel+0x58/0x13e() 2016-09-28T12:34:13.279036-04:00 NOTICE kernel: [117121.795116] Modules linked in: ctr ccm evdi uinput i2c_dev cmac rfcomm snd_soc_sst_bdw_rt5650_mach memc_x86 x86_pkg_temp_thermal iio_trig_sysfs snd_hda_codec_hdmi zram snd_hda_intel snd_soc_sst_haswell_pcm snd_hda_controller snd_soc_sst_dsp snd_hda_codec snd_soc_rt5645 snd_hwdep snd_soc_sst_acpi snd_soc_rl6231 fuse nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables iwlmvm btusb btbcm btintel bluetooth r8169 mii iwlwifi iwl7000_mac80211 cfg80211 uvcvideo videobuf2_vmalloc joydev cros_ec_accel kfifo_buf industrialio snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device ppp_async ppp_generic slhc tun 2016-09-28T12:34:13.279038-04:00 NOTICE kernel: [117121.795235] CPU: 0 PID: 29996 Comm: V4L2DecoderThre Tainted: G W 3.14.0 #1 2016-09-28T12:34:13.279039-04:00 NOTICE kernel: [117121.795244] Hardware name: GOOGLE Buddy, BIOS Google_Buddy.6301.202.5 01/09/2016 2016-09-28T12:34:13.279041-04:00 NOTICE kernel: [117121.795252] 0000000000000000 00000000a841cb2a ffff880133801c30 ffffffffad797573 2016-09-28T12:34:13.279042-04:00 NOTICE kernel: [117121.795264] 0000000000000000 ffff880133801c68 ffffffffad23df6b ffffffffad6193e9 2016-09-28T12:34:13.279044-04:00 NOTICE kernel: [117121.795276] ffff88004203b9f8 0000000000000000 ffff88004203b9f8 ffff8800784e50f8 2016-09-28T12:34:13.279045-04:00 NOTICE kernel: [117121.795288] Call Trace: 2016-09-28T12:34:13.279048-04:00 NOTICE kernel: [117121.795298] [<ffffffffad797573>] dump_stack+0x4d/0x6f 2016-09-28T12:34:13.279049-04:00 NOTICE kernel: [117121.795307] [<ffffffffad23df6b>] warn_slowpath_common+0x7f/0x98 2016-09-28T12:34:13.279051-04:00 NOTICE kernel: [117121.795317] [<ffffffffad6193e9>] ? __vb2_queue_cancel+0x58/0x13e 2016-09-28T12:34:13.279052-04:00 NOTICE kernel: [117121.795326] [<ffffffffad23e07d>] warn_slowpath_null+0x1a/0x1c 2016-09-28T12:34:13.279054-04:00 NOTICE kernel: [117121.795335] [<ffffffffad6193e9>] __vb2_queue_cancel+0x58/0x13e 2016-09-28T12:34:13.279055-04:00 NOTICE kernel: [117121.795344] [<ffffffffad619563>] vb2_internal_streamoff+0x31/0x56 2016-09-28T12:34:13.279057-04:00 NOTICE kernel: [117121.795353] [<ffffffffad6195c6>] vb2_streamoff+0x3e/0x45 2016-09-28T12:34:13.279058-04:00 NOTICE kernel: [117121.795360] [<ffffffffad622145>] go2001_streamoff+0x68/0x6f 2016-09-28T12:34:13.279060-04:00 NOTICE kernel: [117121.795369] [<ffffffffad60a6ac>] v4l_streamoff+0x1a/0x1c 2016-09-28T12:34:13.279062-04:00 NOTICE kernel: [117121.795378] [<ffffffffad60e223>] __video_do_ioctl+0x1a7/0x28a 2016-09-28T12:34:13.279063-04:00 NOTICE kernel: [117121.795387] [<ffffffffad79b793>] ? _raw_spin_unlock+0xe/0x21 2016-09-28T12:34:13.279065-04:00 NOTICE kernel: [117121.795396] [<ffffffffad60de53>] video_usercopy+0x2ae/0x4c0 2016-09-28T12:34:13.279066-04:00 NOTICE kernel: [117121.795405] [<ffffffffad60e07c>] ? video_ioctl2+0x17/0x17 2016-09-28T12:34:13.279068-04:00 NOTICE kernel: [117121.795414] [<ffffffffad60e07a>] video_ioctl2+0x15/0x17 2016-09-28T12:34:13.279069-04:00 NOTICE kernel: [117121.795422] [<ffffffffad60962d>] v4l2_ioctl+0x79/0x100 2016-09-28T12:34:13.279071-04:00 NOTICE kernel: [117121.795430] [<ffffffffad31c708>] do_vfs_ioctl+0x355/0x416 2016-09-28T12:34:13.279072-04:00 NOTICE kernel: [117121.795438] [<ffffffffad324ec7>] ? __fget+0x6f/0x79 2016-09-28T12:34:13.279074-04:00 NOTICE kernel: [117121.795446] [<ffffffffad31c820>] SyS_ioctl+0x57/0x79 2016-09-28T12:34:13.279075-04:00 NOTICE kernel: [117121.795454] [<ffffffffad79c95d>] tracesys+0xda/0xdf 2016-09-28T12:34:13.279076-04:00 WARNING kernel: [117121.795461] ---[ end trace 2b23523791b2ecb9 ]--- 2016-09-28T12:34:14.037462-04:00 ERR kernel: [117122.553678] go2001 0000:03:00.0: go2001_dec_g_fmt_cap:1605 Frame info not available yet 2016-09-28T12:34:14.037481-04:00 ERR kernel: [117122.554700] go2001 0000:03:00.0: go2001_process_reply:1437 Error -5 for reply [28:9] type=0x107, status=0x2 2016-09-28T12:34:14.037485-04:00 ERR kernel: [117122.554723] go2001 0000:03:00.0: go2001_ctx_error_locked:76 Setting error state for ctx ffff88012a52b800, session_id 28 2016-09-28T12:34:14.071756-04:00 WARNING kernel: [117122.588937] ------------[ cut here ]------------
,
Sep 28 2016
This is from M52 8350.68.0 / 52.0.2743.116 https://feedback.corp.google.com/product/208/neutron?lView=rd&lReport=18517977534
,
Sep 28 2016
,
Sep 28 2016
,
Sep 28 2016
,
Sep 28 2016
There are a lot media related errors in the feedback, looks related to video freeze. I wonder whether we could get some help with webrtc team to take a look. [1:386:0927/090613:WARNING:mediasession.cc(328)] Duplicate id found. Reassigning from 101 to 127 [1:387:0927/090613:WARNING:stunport.cc(393)] Jingle:Port[0x14a88a13000:audio:<IPv6: 4>Net[any:0:0:0:x:x:x:x:x/0:Unknown]]: StunPort: stun host lookup received error 0 ... [1:387:0927/090614:WARNING:remote_estimator_proxy.cc(48)] RemoteEstimatorProxy: Incoming packet is missing the transport sequence number extension! ... [1:393:0927/090614:WARNING:rtp_packet_history.cc(185)] No match for getting seqNum 4527 ... [1:386:0927/090735:WARNING:statscollector.cc(978)] The SSRC 2994740199 is not associated with a receiving track ... [1:387:0927/111645:WARNING:channel.cc(742)] Got EWOULDBLOCK from socket. [1:386:0927/121445:WARNING:statscollector.cc(978)] The SSRC 2270371005 is not associated with a receiving track [1:12:0927/122957:WARNING:generic_decoder.cc(63)] Too many frames backed up in the decoder, dropping this one. ... [1:392:0927/131551:WARNING:rtp_rtcp_impl.cc(169)] Process: Timeout: No RTCP RR received. [1:392:0927/131551:WARNING:rtp_rtcp_impl.cc(169)] Process: Timeout: No RTCP RR received. [1:392:0927/131551:WARNING:rtp_rtcp_impl.cc(171)] Process: Timeout: No increase in RTCP RR extended highest sequence number. [1:392:0927/131551:WARNING:rtp_rtcp_impl.cc(171)] Process: Timeout: No increase in RTCP RR extended highest sequence number. ... [1:392:0928/002100:WARNING:rtp_rtcp_impl.cc(171)] Process: Timeout: No increase in RTCP RR extended highest sequence number. [1:392:0928/002107:WARNING:rtp_rtcp_impl.cc(169)] Process: Timeout: No RTCP RR received. ... [1500:29444:0928/122605:ERROR:v4l2_video_decode_accelerator.cc(1280)] EnqueueOutputRecord(): ioctl() failed: VIDIOC_QBUF: Input/output error [1500:29444:0928/122605:ERROR:v4l2_video_decode_accelerator.cc(1280)] Setting error state:4 [1:12:0928/122605:ERROR:rtc_video_decoder.cc(519)] VDA Error:4 [1:1154:0928/122605:ERROR:rtc_video_decoder.cc(181)] Decoding error occurred. [1:1154:0928/122605:WARNING:generic_decoder.cc(158)] Failed to decode frame with timestamp 2406167750, error code: -1 [1:1154:0928/122606:WARNING:generic_decoder.cc(158)] Failed to decode frame with timestamp 2406170949, error code: -1 ... [1:12:0928/123418:ERROR:rtc_video_decoder.cc(519)] VDA Error:4
,
Sep 28 2016
original comment suggests kepler driver crash? vmr@, posciak@ - can you take a look please?
,
Sep 28 2016
,
Sep 28 2016
I went through the logs. I don't think this issue is related to b/31390397 where there is a memory leak, since memory leak occurs in devices with SW decode as well: Linux, peppy, etc. We see HW decode problem in this one. Yet, HW decode problems might be happening because of First error comes in here: [1500:29444:0928/122605:ERROR:v4l2_video_decode_accelerator.cc(1280)] EnqueueOutputRecord(): ioctl() failed: VIDIOC_QBUF: Input/output error [1500:29444:0928/122605:ERROR:v4l2_video_decode_accelerator.cc(1280)] Setting error state:4 [1:12:0928/122605:ERROR:rtc_video_decoder.cc(519)] VDA Error:4 After that we try to reset decoder, but keep hitting the issue consistently. posciak@. wuchengli@ can you take a look?
,
Sep 29 2016
There were lots of go2001 driver errors. Eventually go2001 watchdog timed out and reset firmware.
[117124.739424] go2001 0000:03:00.0: go2001_dec_g_fmt_cap:1605 Frame info not available yet
[117124.778356] go2001 0000:03:00.0: go2001_process_reply:1414 VM_EVENT_LOG: 0:Debug.c:466: DbgAssert: system is asserted @ Memory.c[452] in MemPartFree: cur->addrHead.prev->next == &cur->addrHead!
[117124.778714] go2001 0000:03:00.0: go2001_process_reply:1406 FW ASSERT at Memory.c:452 in MemPartFree, executing cur->addrHead.prev->next == &cu
[117124.778744] go2001 0000:03:00.0: go2001_process_reply:1437 Error -5 for reply [0:0] type=0x200, status=0xa
[117126.748882] go2001 0000:03:00.0: go2001_watchdog:1459 Watchdog resetting firmware
[117126.748899] go2001 0000:03:00.0: go2001_ctx_error_locked:76 Setting error state for ctx ffff880175745000, session_id 2
[117126.748921] go2001 0000:03:00.0: go2001_ctx_error_locked:76 Setting error state for ctx ffff8801789fc800, session_id 9
[117126.748938] go2001 0000:03:00.0: go2001_ctx_error_locked:76 Setting error state for ctx ffff8801789fd000, session_id 10
[117126.748955] go2001 0000:03:00.0: go2001_ctx_error_locked:76 Setting error state for ctx ffff880077fdd800, session_id 11
[117126.748974] go2001 0000:03:00.0: go2001_ctx_error_locked:76 Setting error state for ctx ffff880076063000, session_id 12
[117126.748986] go2001 0000:03:00.0: go2001_ctx_error_locked:76 Setting error state for ctx ffff8800779c4000, session_id 13
[117126.749007] go2001 0000:03:00.0: go2001_ctx_error_locked:76 Setting error state for ctx ffff880077e2e800, session_id 14
[117126.749027] go2001 0000:03:00.0: go2001_ctx_error_locked:76 Setting error state for ctx ffff880077e88800, session_id 15
[117126.749045] go2001 0000:03:00.0: go2001_ctx_error_locked:76 Setting error state for ctx ffff880055300000, session_id 16
[117126.749065] go2001 0000:03:00.0: go2001_ctx_error_locked:76 Setting error state for ctx ffff880152f04800, session_id 17
[117126.749084] go2001 0000:03:00.0: go2001_ctx_error_locked:76 Setting error state for ctx ffff880152e0b000, session_id 18
[117126.749103] go2001 0000:03:00.0: go2001_ctx_error_locked:76 Setting error state for ctx ffff8800452b6800, session_id 19
[117126.749121] go2001 0000:03:00.0: go2001_ctx_error_locked:76 Setting error state for ctx ffff88014302f800, session_id 20
[117126.749139] go2001 0000:03:00.0: go2001_ctx_error_locked:76 Setting error state for ctx ffff8800452b1800, session_id 21
[117126.749158] go2001 0000:03:00.0: go2001_ctx_error_locked:76 Setting error state for ctx ffff8800452b6000, session_id 22
[117126.749177] go2001 0000:03:00.0: go2001_ctx_error_locked:76 Setting error state for ctx ffff88014bdf0800, session_id 24
[117126.749199] go2001 0000:03:00.0: go2001_ctx_error_locked:76 Setting error state for ctx ffff8800452b7800, session_id 26
[117126.749211] go2001 0000:03:00.0: go2001_ctx_error_locked:76 Setting error state for ctx ffff880045199000, session_id 33
[117127.022986] go2001 0000:03:00.0: GO2001 ver: 1/-1383285373, VP8 decoder: 1731285080/393248 VP8 encoder: 1211189264/1046000, VP9 decoder: 1731330392/0
[117127.364167] go2001 0000:03:00.0: go2001_dec_g_fmt_cap:1605 Frame info not available yet
What were the repro steps and how easy it was to reproduce? I don't think we can fix this for M52.
,
Sep 29 2016
Bumping up in priority. It looks like this might be causing major problems for Hangouts.
,
Sep 29 2016
,
Sep 29 2016
Just had a stumpy unit freeze in a larger call on M53 with ~10 participants
,
Sep 29 2016
,
Sep 29 2016
Re #13: Please file a separate issue with feedback logs because stumpy is an old Intel device and it uses different driver and video stack.
,
Sep 29 2016
RtcVideoDecoder falls back to software decode when hardware decoder reports errors. I wonder why RtcVideoDecoder didn't fallback and video froze.
,
Sep 29 2016
Re 15#: filed http://crbug.com/651474
,
Sep 30 2016
How reproducible is the go2001 error vs the other freeze issues? Would you be able to provide more feedback logs from other reproduced freezes please?
,
Oct 28 2016
I don't think we have seen more of this.
,
Nov 18 2016
for now. I don't think we can map it to an existing issue due to age. |
||||||||||||||
►
Sign in to add a comment |
||||||||||||||
Comment 1 by roy...@google.com
, Sep 28 2016