arc-networkd SIGABRT crash in arc_networkd::Manager::InitialSetup
Issue description
Chrome OS Version: 8530.24.0 -> 8755.0.0
Chrome OS Platform: any cheets-enabled

How frequently does this problem reproduce? (Always, sometimes, hard to reproduce?)
I see 9097 crashes (~200/day) from 2016/07/21 (8530.24.0) to 2016/09/22 (8755.0.0).

What is the impact to the user, and is there a workaround? If so, what is it?
Unknown.

Please provide any additional information below. Attach a screen shot or log if possible.
Example crash: https://crash.corp.google.com/browse?q=ReportID%3D%2767520a9e00000000%27

0xf742e636 (libc-2.19.so + 0x00016636) __libc_do_syscall
0xf743e313 (libc-2.19.so - raise.c:56) raise
0xf743f2ff (libc-2.19.so - abort.c:89) abort
0xf7669ef9 (libbase-core-395517.so - debugger_posix.cc:249) base::debug::BreakDebugger
0xf767da8d (libbase-core-395517.so - logging.cc:755) logging::LogMessage::~LogMessage
0xaac4233d (arc-networkd - manager.cc:52) arc_networkd::Manager::InitialSetup
0xaac42e6f (arc-networkd - bind_internal.h:186) base::internal::Invoker<base::IndexSequence<0u>, base::internal::BindState<base::internal::RunnableAdapter<void (arc_networkd::Manager::*)()>, void(arc_networkd::Manager*), base::WeakPtr<arc_networkd::Manager> >, base::internal::InvokeHelper<true, void, base::internal::RunnableAdapter<void (arc_networkd::Manager::*)()> >, void()>::Run
0xf766addd (libbase-core-395517.so - callback.h:397) base::debug::TaskAnnotator::RunTask
0xf7682d91 (libbase-core-395517.so - message_loop.cc:478) base::MessageLoop::RunTask
0xf768309d (libbase-core-395517.so - message_loop.cc:487) base::MessageLoop::DeferOrRunPendingTask
0xf7684269 (libbase-core-395517.so - message_loop.cc:642) base::MessageLoop::DoDelayedWork
0xf7685ce3 (libbase-core-395517.so - message_pump_libevent.cc:229) base::MessagePumpLibevent::Run
0xf76a0ea5 (libbase-core-395517.so - run_loop.cc:35) base::RunLoop::Run
0xf77380f3 (libbrillo-core-395517.so - base_message_loop.cc:212) brillo::BaseMessageLoop::Run
0xf770ec87 (libbrillo-core-395517.so - daemon.cc:29) brillo::Daemon::Run
0xaac3e313 (arc-networkd - main.cc:34) main
0xf742e307 (libc-2.19.so - libc-start.c:285) __libc_start_main
0xaac3e41b (arc-networkd + 0x0000441b) _start
0xaac5adb3 (arc-networkd - elf-init.c:87) __libc_csu_init
0xf775f9df (ld-2.19.so + 0x0000b9df) _dl_sort_fini
0xaac3e3e7 (arc-networkd - main.cc:35) main
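Read bottom-up, the trace runs from main() through the message loop into Manager::InitialSetup (manager.cc:52), where a fatal log message's destructor calls base::debug::BreakDebugger, which calls abort(), which in turn calls raise(). That chain is what turns a LOG(FATAL) into a SIGABRT crash report. The sketch below reproduces only that chain; FatalLogMessage is a hypothetical stand-in, not libbase's implementation.

```cpp
#include <cassert>
#include <csignal>
#include <cstdlib>
#include <iostream>
#include <sstream>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for a LOG(FATAL) message object. As in the trace,
// the work happens in the destructor: the message is flushed to stderr and
// the process is aborted, which delivers SIGABRT.
class FatalLogMessage {
 public:
  FatalLogMessage() = default;
  FatalLogMessage(const FatalLogMessage&) = delete;
  FatalLogMessage& operator=(const FatalLogMessage&) = delete;

  ~FatalLogMessage() {
    std::cerr << "FATAL: " << stream_.str() << std::endl;
    std::abort();  // Never returns; glibc's abort() calls raise(SIGABRT).
  }

  std::ostream& stream() { return stream_; }

 private:
  std::ostringstream stream_;
};
```

A parent process (or the crash reporter) observing such a child sees it terminated by SIGABRT rather than exiting with an error code.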
Sep 22 2016
Sep 28 2016
Kevin, any update on this one? I'm hitting this crash almost every 5 minutes on my Samus (M54).
Sep 28 2016
When you see this crash, the Android container is not starting, correct?
Sep 29 2016
I can check the next time it crashes. I'm not seeing it as frequently anymore.
Oct 10 2016
Crash numbers are substantially lower in the latest M54 builds, but the crash is still there: https://crash.corp.google.com/browse?q=stable_signature%3D%27raise-105fbe2d%27%20AND%20product.version%20like%20%278743.%25%27&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D#samplereports:5,productversion,magicsignature,stablesignature
The crash does not seem to be new in M54, though (it shows up in M53 too). Kevin, are you still the right owner to evaluate next steps and a fix?
Nov 10 2016
M55 still has the issue: go/crash/08d7c10500000000
Dec 6 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/platform2/+/9f649d56e242c5606ec6ade4605ab5a81d079fb0

commit 9f649d56e242c5606ec6ade4605ab5a81d079fb0
Author: Kevin Cernekee <cernekee@chromium.org>
Date: Sun Dec 04 04:37:07 2016

arc-networkd: Don't crash if rt_tables isn't populated right away

If the user logs in with no active network connection, arc-networkd will check every few seconds to see if:

1) arc0 inside the container is up, and
2) rt_tables has an entry for arc0.

(1) should happen in the absence of a LAN connection, but (2) will not, causing arc-networkd to time out and abort.

Move the check for (2) into the Set() function, which runs a couple of seconds after the external LAN interface is up.

BUG=chromium:649247
TEST=`tail -f /var/log/net.log` with no connection, wifi connection

Change-Id: Ie2680c68810af9ff082a589e136ae98fbf7bd00c
Reviewed-on: https://chromium-review.googlesource.com/416362
Commit-Ready: Kevin Cernekee <cernekee@chromium.org>
Tested-by: Kevin Cernekee <cernekee@chromium.org>
Reviewed-by: Mattias Nissler <mnissler@chromium.org>
Reviewed-by: Kirtika Ruchandani <kirtika@chromium.org>

[modify] https://crrev.com/9f649d56e242c5606ec6ade4605ab5a81d079fb0/arc-networkd/arc_ip_config.cc
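The failure mode this commit describes can be modeled as a poll with a timeout. In the sketch below, all names are hypothetical, and the real daemon waits between attempts using delayed tasks rather than a tight loop. The pre-fix readiness check requires both conditions and therefore never succeeds while offline; the post-fix check waits only for arc0 and leaves the rt_tables lookup to Set().

```cpp
#include <cassert>
#include <functional>

// Hypothetical snapshot of what arc-networkd can observe while polling.
struct ContainerState {
  bool arc0_up;             // Condition (1) from the commit message.
  bool rt_tables_has_arc0;  // Condition (2) from the commit message.
};

// Tries |ready| up to |max_attempts| times. Returns the 1-based attempt on
// which it succeeded, or -1 on timeout (the point where the old code
// aborted). A real daemon would wait a few seconds between attempts.
int PollUntil(const std::function<bool()>& ready, int max_attempts) {
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    if (ready()) return attempt;
  }
  return -1;
}

// Before the fix: initial setup waited for both conditions.
bool ReadyBeforeFix(const ContainerState& s) {
  return s.arc0_up && s.rt_tables_has_arc0;
}

// After the fix: initial setup waits only for arc0; the rt_tables lookup
// is deferred to Set(), which runs after the LAN interface comes up.
bool ReadyAfterFix(const ContainerState& s) { return s.arc0_up; }
```

With no network connection, arc0 comes up but rt_tables stays empty, so only the post-fix check completes.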
Dec 6 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/platform2/+/9e2e4de211e492c4462e7e6cc15fbda4f77932d6

commit 9e2e4de211e492c4462e7e6cc15fbda4f77932d6
Author: Kevin Cernekee <cernekee@chromium.org>
Date: Sun Dec 04 04:49:42 2016

arc-networkd: Log a sane exit code if Minijail::RunSyncAndDestroy fails

|status| is a status code from waitpid(), so it needs to be run through WEXITSTATUS in order to make it human-readable.

BUG=chromium:649247
TEST=watch net.log when a command fails

Change-Id: I545b04062b1b5177c10c7e7d810ad3dface80e7c
Reviewed-on: https://chromium-review.googlesource.com/416363
Commit-Ready: Kevin Cernekee <cernekee@chromium.org>
Tested-by: Kevin Cernekee <cernekee@chromium.org>
Reviewed-by: Mattias Nissler <mnissler@chromium.org>
Reviewed-by: Abhishek Bhardwaj <abhishekbh@google.com>
Reviewed-by: Kirtika Ruchandani <kirtika@chromium.org>

[modify] https://crrev.com/9e2e4de211e492c4462e7e6cc15fbda4f77932d6/arc-networkd/arc_ip_config.cc
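The int that waitpid() stores is an encoded status word, not the exit code. On Linux, a child that calls exit(1) produces a raw status of 256, so logging |status| directly reports a misleading number. A minimal sketch of the decoding; ExitCodeFromStatus is a hypothetical helper name, not the function the commit changed:

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Converts the raw status word filled in by waitpid() into the child's exit
// code (0-255). Returns -1 if the child did not exit normally (for example,
// it was killed by a signal), because WEXITSTATUS() is meaningless then.
int ExitCodeFromStatus(int status) {
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```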
Dec 6 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/platform2/+/60007a42b43484f5d2090556389eaff75324b24e

commit 60007a42b43484f5d2090556389eaff75324b24e
Author: Kevin Cernekee <cernekee@chromium.org>
Date: Sun Dec 04 05:04:59 2016

arc-networkd: Don't log expected `ip` command failures

There are a couple of cases in which arc-networkd may race with the kernel or ARC in tearing down IPv6 configuration parameters. If these fail, do not send a warning to net.log as it isn't necessarily a sign that anything went wrong.

BUG=chromium:649247
TEST=trigger the `ip` failures and watch net.log for results

Change-Id: I6a89972a6c47956014a36b5c903939b35fc6ef3b
Reviewed-on: https://chromium-review.googlesource.com/416364
Commit-Ready: Kevin Cernekee <cernekee@chromium.org>
Tested-by: Kevin Cernekee <cernekee@chromium.org>
Reviewed-by: Mattias Nissler <mnissler@chromium.org>
Reviewed-by: Kirtika Ruchandani <kirtika@chromium.org>

[modify] https://crrev.com/60007a42b43484f5d2090556389eaff75324b24e/arc-networkd/arc_ip_config.cc
[modify] https://crrev.com/60007a42b43484f5d2090556389eaff75324b24e/arc-networkd/arc_ip_config.h
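A sketch of the idea behind this change: let the caller decide whether a command failure is worth logging. RunCommand and its log_failures flag are hypothetical. arc-networkd runs its commands through Minijail, which this sketch replaces with plain fork()/execvp()/waitpid() so that it is self-contained.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Hypothetical helper: runs |args| (args[0] is looked up in PATH) and
// returns its exit code, or -1 if it could not be run or did not exit
// normally. Failures are written to |log| only when |log_failures| is true,
// so callers can silence failures they expect, such as `ip` commands that
// race with the kernel or ARC while tearing down IPv6 configuration.
int RunCommand(const std::vector<std::string>& args, bool log_failures,
               std::ostream& log) {
  if (args.empty()) return -1;
  std::vector<char*> argv;
  for (const std::string& arg : args) {
    argv.push_back(const_cast<char*>(arg.c_str()));
  }
  argv.push_back(nullptr);

  pid_t pid = fork();
  if (pid == 0) {
    execvp(argv[0], argv.data());
    _exit(127);  // Only reached if exec failed.
  }
  int code = -1;
  int status = 0;
  if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status)) {
    code = WEXITSTATUS(status);
  }
  if (code != 0 && log_failures) {
    log << "command '" << args[0] << "' failed (" << code << ")\n";
  }
  return code;
}
```

The exit code is still returned in every case, so a caller that silences the log can still react to the failure.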
Dec 9 2016
The following revision refers to this bug:
https://chromium.googlesource.com/chromiumos/platform2/+/14fff422cf8fc0f5e75faffedb7b888d65d6b5eb

commit 14fff422cf8fc0f5e75faffedb7b888d65d6b5eb
Author: Kevin Cernekee <cernekee@chromium.org>
Date: Tue Dec 06 19:52:50 2016

arc-networkd: Log termination signals

Currently the error handler only prints the exit code, but this will not provide any useful information if the process died due to a fatal signal.

BUG=chromium:649247
TEST=compile-tested

Change-Id: If7794d782ed726e2d7d1b1170fcf18645da69ba6
Signed-off-by: Kevin Cernekee <cernekee@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/417136

[modify] https://crrev.com/14fff422cf8fc0f5e75faffedb7b888d65d6b5eb/arc-networkd/arc_ip_config.cc
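A waitpid() status word can encode either a normal exit or death by a signal, and WEXITSTATUS() is only meaningful in the first case. A sketch of a describer that covers both; DescribeStatus is a hypothetical name, not the function this commit touched:

```cpp
#include <cassert>
#include <csignal>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Builds a human-readable description of a waitpid() status word.
// Without WUNTRACED/WCONTINUED, waitpid() only reports terminated children,
// so "exited" and "killed by signal" cover the normal cases.
std::string DescribeStatus(int status) {
  if (WIFEXITED(status)) {
    return "exited with code " + std::to_string(WEXITSTATUS(status));
  }
  if (WIFSIGNALED(status)) {
    return "killed by signal " + std::to_string(WTERMSIG(status));
  }
  return "unrecognized status " + std::to_string(status);
}
```

For a subprocess that hit LOG(FATAL), this would report the SIGABRT number rather than a meaningless exit code.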
Dec 9 2016
Are we merging this back to M56? M55?
Dec 9 2016
crosreview.com/416362 might be worth merging. I'd try it on M56 first and see if people are still hitting the LOG(FATAL).
Dec 19 2016
Any idea which version has CL:416362? The arc-networkd SIGABRTs are still pretty common in R56 (9000.29.0).
Dec 19 2016
Your change meets the bar and is auto-approved for M56 (branch: 2924)
Dec 19 2016
go/crosland claims that it landed in 9056.0.0. When I search for crash reports for minnie on >= 9056.0.0, I see: https://goto.google.com/peuhx

A bunch of these crashes show OnSubprocessExited in the backtrace. This is the parent process crashing because the subprocess hit a fatal error. Those dumps don't tell us anything useful. But for each OnSubprocessExited crash there is usually a corresponding report from the subprocess that does have a useful backtrace. The ones that I see show a crash in ArcIpConfig::Set(), hitting the LOG(FATAL):

    // At this point, arc0 is up and the LAN interface has been up for several
    // seconds. If the routing table name has not yet been populated,
    // something really bad probably happened on the Android side.
    if (routing_table_id_ == kInvalidTableId) {
      routing_table_id_ = ReadTableId(con_ifname_);
      if (routing_table_id_ == kInvalidTableId) {
        LOG(FATAL) << "Could not look up routing table ID in "
                   << kRoutingTableNames;
      }
    }

Perhaps this means that there is a real problem in the Android container, or perhaps it means that arc-networkd is making a bad assumption about when the rt_tables file should be getting populated. It would be helpful to see the system logs in order to figure out whether this is only happening on the initial Android boot, whether it is correlated with some other issue, or whether it's an intermittent error that randomly affects otherwise-functioning systems.
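The lookup that fails here amounts to finding an interface name in an rt_tables-style file. The sketch below assumes the usual iproute2 rt_tables layout (one `<id> <name>` pair per line, `#` starting a comment) and a sentinel of -1. ParseTableId and the sentinel value are hypothetical, since the thread shows neither ReadTableId()'s body nor kInvalidTableId's value. It reads from a stream so that it can be exercised without the container's file.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical sentinel; the real kInvalidTableId value is not shown here.
constexpr int kInvalidTableId = -1;

// Scans rt_tables-style text for |ifname| and returns its numeric table ID,
// or kInvalidTableId if the name is not (yet) present.
int ParseTableId(std::istream& in, const std::string& ifname) {
  std::string line;
  while (std::getline(in, line)) {
    std::string::size_type hash = line.find('#');
    if (hash != std::string::npos) line.erase(hash);  // Drop comments.
    std::istringstream fields(line);
    int id = 0;
    std::string name;
    if (fields >> id >> name && name == ifname) return id;
  }
  return kInvalidTableId;
}
```

In the crashing case the file exists but has no arc0 entry yet, which this sketch reports as kInvalidTableId; whether that state should be fatal is exactly the open question in the comment.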
Dec 23 2016
This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible! If all merges have been completed, please remove any remaining Merge-Approved labels from this issue. Thanks for your time! To disable nags, add the Disable-Nags label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Dec 26 2016
This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible! If all merges have been completed, please remove any remaining Merge-Approved labels from this issue. Thanks for your time! To disable nags, add the Disable-Nags label. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Dec 28 2016
Reassigning to Abhishek per our offline conversation. Typical crash looks like: https://goto.google.com/uynic
Feb 9 2017
This issue hasn't been updated in the last 6 weeks, so removing its merge approval label. Please re-request a merge if needed. For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Mar 16 2018
Seeing these crashes daily from ChromeOS 65.0.3325.107. Looking at the stack and the code, it seems that |bus_| must be null when InitialSetup() executes. InitialSetup() is PostTask()'d from OnInit() with a comment saying that it must execute only after |bus_| has been initialized, but there is no explanation of why posting the task achieves that. See crashes 6fac68db42ade8a0, 8661a2365a8b475f, 16f100b1dcc835b1.
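One way to see what posting the task does and does not guarantee: a posted task does not run until control returns to the message loop, so it observes everything that happened before the loop picks it up, presumably including the initialization of |bus_|. But posting guarantees only ordering, not success. If |bus_| is never assigned (for example, because bus setup failed), InitialSetup() still runs and sees null. The sketch uses a hypothetical TaskQueue in place of base::MessageLoop and a plain `this` capture in place of the WeakPtr binding visible in the stack trace.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical single-threaded task queue standing in for the message loop.
class TaskQueue {
 public:
  void PostTask(std::function<void()> task) {
    tasks_.push_back(std::move(task));
  }
  void RunUntilIdle() {
    while (!tasks_.empty()) {
      std::function<void()> task = std::move(tasks_.front());
      tasks_.pop_front();
      task();
    }
  }

 private:
  std::deque<std::function<void()>> tasks_;
};

struct Bus {};  // Placeholder for the D-Bus connection.

// Hypothetical model of the Manager: OnInit() posts InitialSetup() and then
// (maybe) initializes bus_. The posted task runs only after OnInit() returns.
class Manager {
 public:
  explicit Manager(TaskQueue* loop) : loop_(loop) {}

  void OnInit(bool bus_init_succeeds) {
    loop_->PostTask([this] { InitialSetup(); });
    if (bus_init_succeeds) bus_ = std::make_shared<Bus>();
  }

  bool setup_ran() const { return setup_ran_; }
  bool bus_was_set() const { return bus_was_set_; }

 private:
  void InitialSetup() {
    setup_ran_ = true;
    // The real code would dereference bus_ here; a null check (or a fatal
    // log, as in the reported crashes) is the only defense if it is unset.
    bus_was_set_ = (bus_ != nullptr);
  }

  TaskQueue* loop_;
  std::shared_ptr<Bus> bus_;
  bool setup_ran_ = false;
  bool bus_was_set_ = false;
};
```

In the success case, InitialSetup() sees a non-null bus even though it was posted before the assignment; in the failure case, the ordering holds but the pointer is still null.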
Mar 16 2018
Could you please file feedback next time you see it, and add @cernekee to the note? Logs would be very helpful.
May 21 2018
Comment 1 by djkurtz@chromium.org, Sep 22 2016