
Issue 649247

Starred by 4 users

Issue metadata

Status: Duplicate
Owner:
Closed: May 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




arc-networkd SIGABRT crash in arc_networkd::Manager::InitialSetup

Project Member Reported by djkurtz@chromium.org, Sep 22 2016

Issue description

Chrome OS Version: 8530.24.0 -> 8755.0.0
Chrome OS Platform: any cheets enabled

How frequently does this problem reproduce? (Always, sometimes, hard to
reproduce?)

I see 9097 crashes (~200 / day) from 2016/07/21 (8530.24.0) to 2016/09/22 (8755.0.0).

What is the impact to the user, and is there a workaround? If so, what is
it?

Unknown.

Please provide any additional information below. Attach a screen shot or
log if possible.

Example crash:
https://crash.corp.google.com/browse?q=ReportID%3D%2767520a9e00000000%27

0xf742e636	(libc-2.19.so + 0x00016636 )	__libc_do_syscall
0xf743e313	(libc-2.19.so -raise.c:56 )	raise
0xf743f2ff	(libc-2.19.so -abort.c:89 )	abort
0xf7669ef9	(libbase-core-395517.so -debugger_posix.cc:249 )	base::debug::BreakDebugger
0xf767da8d	(libbase-core-395517.so -logging.cc:755 )	logging::LogMessage::~LogMessage
0xaac4233d	(arc-networkd -manager.cc:52 )	arc_networkd::Manager::InitialSetup
0xaac42e6f	(arc-networkd -bind_internal.h:186 )	base::internal::Invoker<base::IndexSequence<0u>, base::internal::BindState<base::internal::RunnableAdapter<void (arc_networkd::Manager::*)()>, void(arc_networkd::Manager*), base::WeakPtr<arc_networkd::Manager> >, base::internal::InvokeHelper<true, void, base::internal::RunnableAdapter<void (arc_networkd::Manager::*)()> >, void()>::Run
0xf766addd	(libbase-core-395517.so -callback.h:397 )	base::debug::TaskAnnotator::RunTask
0xf7682d91	(libbase-core-395517.so -message_loop.cc:478 )	base::MessageLoop::RunTask
0xf768309d	(libbase-core-395517.so -message_loop.cc:487 )	base::MessageLoop::DeferOrRunPendingTask
0xf7684269	(libbase-core-395517.so -message_loop.cc:642 )	base::MessageLoop::DoDelayedWork
0xf7685ce3	(libbase-core-395517.so -message_pump_libevent.cc:229 )	base::MessagePumpLibevent::Run
0xf76a0ea5	(libbase-core-395517.so -run_loop.cc:35 )	base::RunLoop::Run
0xf77380f3	(libbrillo-core-395517.so -base_message_loop.cc:212 )	brillo::BaseMessageLoop::Run
0xf770ec87	(libbrillo-core-395517.so -daemon.cc:29 )	brillo::Daemon::Run
0xaac3e313	(arc-networkd -main.cc:34 )	main
0xf742e307	(libc-2.19.so -libc-start.c:285 )	__libc_start_main
0xaac3e41b	(arc-networkd + 0x0000441b )	_start
0xaac5adb3	(arc-networkd -elf-init.c:87 )	__libc_csu_init
0xf775f9df	(ld-2.19.so + 0x0000b9df )	_dl_sort_fini
0xaac3e3e7	(arc-networkd -main.cc:35 )	main
 

Comment 2 by snanda@chromium.org, Sep 22 2016

Labels: -Pri-3 Pri-1
Owner: cernekee@chromium.org

Comment 3 by josa...@google.com, Sep 28 2016

Labels: M-54 ReleaseBlock-Beta
Kevin, any update on this one? 

I'm experiencing this crash almost every 5 min on my Samus (M54)
When you see this crash the Android container is not starting, correct?

Comment 5 by josa...@google.com, Sep 29 2016

Labels: -ReleaseBlock-Beta ReleaseBlock-Stable
I can check the next time it crashes. I'm not seeing this as frequently anymore.
Labels: -ReleaseBlock-Stable
Crash numbers are substantially lower in the latest M54 builds, but the crash is still there:
https://crash.corp.google.com/browse?q=stable_signature%3D%27raise-105fbe2d%27%20AND%20product.version%20like%20%278743.%25%27&ignore_case=false&enable_rewrite=true&omit_field_name=&omit_field_value=&omit_field_opt=%3D#samplereports:5,productversion,magicsignature,stablesignature

The crash does not seem to be new in M54, though; it also occurs in M53.

Kevin, are you still the right owner to evaluate next steps/fix?

M55 still has the issue: go/crash/08d7c10500000000

Comment 8 by bugdroid1@chromium.org, Dec 6 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/9f649d56e242c5606ec6ade4605ab5a81d079fb0

commit 9f649d56e242c5606ec6ade4605ab5a81d079fb0
Author: Kevin Cernekee <cernekee@chromium.org>
Date: Sun Dec 04 04:37:07 2016

arc-networkd: Don't crash if rt_tables isn't populated right away

If the user logs in with no active network connection, arc-networkd
will check every few seconds to see if: 1) arc0 inside the container is
up, and 2) rt_tables has an entry for arc0.  (1) should happen in the
absence of a LAN connection, but (2) will not, causing arc-networkd
to time out and abort.

Move the check for (2) into the Set() function, which runs a couple
of seconds after the external LAN interface is up.

BUG= chromium:649247 
TEST=`tail -f /var/log/net.log` with no connection, wifi connection

Change-Id: Ie2680c68810af9ff082a589e136ae98fbf7bd00c
Reviewed-on: https://chromium-review.googlesource.com/416362
Commit-Ready: Kevin Cernekee <cernekee@chromium.org>
Tested-by: Kevin Cernekee <cernekee@chromium.org>
Reviewed-by: Mattias Nissler <mnissler@chromium.org>
Reviewed-by: Kirtika Ruchandani <kirtika@chromium.org>

[modify] https://crrev.com/9f649d56e242c5606ec6ade4605ab5a81d079fb0/arc-networkd/arc_ip_config.cc


Comment 9 by bugdroid1@chromium.org, Dec 6 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/9e2e4de211e492c4462e7e6cc15fbda4f77932d6

commit 9e2e4de211e492c4462e7e6cc15fbda4f77932d6
Author: Kevin Cernekee <cernekee@chromium.org>
Date: Sun Dec 04 04:49:42 2016

arc-networkd: Log a sane exit code if Minijail::RunSyncAndDestroy fails

|status| is a status code from waitpid(), so it needs to be run through
WEXITSTATUS in order to make it human-readable.

BUG= chromium:649247 
TEST=watch net.log when a command fails

Change-Id: I545b04062b1b5177c10c7e7d810ad3dface80e7c
Reviewed-on: https://chromium-review.googlesource.com/416363
Commit-Ready: Kevin Cernekee <cernekee@chromium.org>
Tested-by: Kevin Cernekee <cernekee@chromium.org>
Reviewed-by: Mattias Nissler <mnissler@chromium.org>
Reviewed-by: Abhishek Bhardwaj <abhishekbh@google.com>
Reviewed-by: Kirtika Ruchandani <kirtika@chromium.org>

[modify] https://crrev.com/9e2e4de211e492c4462e7e6cc15fbda4f77932d6/arc-networkd/arc_ip_config.cc
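For anyone following along, the waitpid() point in the commit above can be sketched in isolation. This is not the arc-networkd code: it does not use Minijail, and the helper name RawStatusOfChildExiting is made up for illustration. It only shows that the raw status from waitpid() has to go through WIFEXITED/WEXITSTATUS before it reads as the child's exit code.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not from arc-networkd): forks a child that exits
// with `code` and returns the raw status word reported by waitpid().
// Returns -1 if fork() or waitpid() fails.
int RawStatusOfChildExiting(int code) {
  pid_t pid = fork();
  if (pid < 0) return -1;
  if (pid == 0) _exit(code);  // Child: exit immediately with the given code.
  int status = 0;
  if (waitpid(pid, &status, 0) != pid) return -1;
  // The raw status encodes more than the exit code; callers should decode
  // it with WIFEXITED()/WEXITSTATUS() before logging it.
  return status;
}
```

Calling RawStatusOfChildExiting(3) and then applying WEXITSTATUS() to the result yields 3. Logging the raw value directly would typically print a different, less readable number.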


Comment 10 by bugdroid1@chromium.org, Dec 6 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/60007a42b43484f5d2090556389eaff75324b24e

commit 60007a42b43484f5d2090556389eaff75324b24e
Author: Kevin Cernekee <cernekee@chromium.org>
Date: Sun Dec 04 05:04:59 2016

arc-networkd: Don't log expected `ip` command failures

There are a couple of cases in which arc-networkd may race with the
kernel or ARC in tearing down IPv6 configuration parameters.  If these
fail, do not send a warning to net.log as it isn't necessarily a sign
that anything went wrong.

BUG= chromium:649247 
TEST=trigger the `ip` failures and watch net.log for results

Change-Id: I6a89972a6c47956014a36b5c903939b35fc6ef3b
Reviewed-on: https://chromium-review.googlesource.com/416364
Commit-Ready: Kevin Cernekee <cernekee@chromium.org>
Tested-by: Kevin Cernekee <cernekee@chromium.org>
Reviewed-by: Mattias Nissler <mnissler@chromium.org>
Reviewed-by: Kirtika Ruchandani <kirtika@chromium.org>

[modify] https://crrev.com/60007a42b43484f5d2090556389eaff75324b24e/arc-networkd/arc_ip_config.cc
[modify] https://crrev.com/60007a42b43484f5d2090556389eaff75324b24e/arc-networkd/arc_ip_config.h


Comment 11 by bugdroid1@chromium.org, Dec 9 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/platform2/+/14fff422cf8fc0f5e75faffedb7b888d65d6b5eb

commit 14fff422cf8fc0f5e75faffedb7b888d65d6b5eb
Author: Kevin Cernekee <cernekee@chromium.org>
Date: Tue Dec 06 19:52:50 2016

arc-networkd: Log termination signals

Currently the error handler only prints the exit code, but this will
not provide any useful information if the process died due to a fatal
signal.

BUG= chromium:649247 
TEST=compile-tested

Change-Id: If7794d782ed726e2d7d1b1170fcf18645da69ba6
Signed-off-by: Kevin Cernekee <cernekee@chromium.org>
Reviewed-on: https://chromium-review.googlesource.com/417136

[modify] https://crrev.com/14fff422cf8fc0f5e75faffedb7b888d65d6b5eb/arc-networkd/arc_ip_config.cc
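A rough sketch of the distinction this commit is about, assuming plain fork()/waitpid() rather than the Minijail path arc-networkd actually uses. The helper names DescribeStatus and RawStatusOfChildKilledBy are hypothetical. An exit code is only meaningful when WIFEXITED() is true; a child that died from a signal has to be reported via WIFSIGNALED()/WTERMSIG().

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <string>

// Hypothetical helper: turns a waitpid() status into a log-friendly string,
// covering both normal exits and deaths by signal.
std::string DescribeStatus(int status) {
  if (WIFEXITED(status))
    return "exited with code " + std::to_string(WEXITSTATUS(status));
  if (WIFSIGNALED(status))
    return "killed by signal " + std::to_string(WTERMSIG(status));
  return "unknown status " + std::to_string(status);
}

// Hypothetical helper: forks a child that sends itself `sig` and returns
// the raw waitpid() status, or -1 if fork() or waitpid() fails.
int RawStatusOfChildKilledBy(int sig) {
  pid_t pid = fork();
  if (pid < 0) return -1;
  if (pid == 0) {
    raise(sig);
    _exit(0);  // Only reached if the signal did not terminate the child.
  }
  int status = 0;
  if (waitpid(pid, &status, 0) != pid) return -1;
  return status;
}
```

With a handler like this, a subprocess that hits LOG(FATAL) and aborts would show up in the log as a signal death rather than as an uninformative exit code.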

Are we merging this back to 56? 55?

crosreview.com/416362 might be worth merging.  I'd try it on M56 first and see if people are still hitting the LOG(FATAL).
Labels: Merge-Request-56
Any idea what version has CL:416362?
The arc_networkd SIGABRTs are still pretty common in R56 (9000.29.0).

Comment 15 by dimu@chromium.org, Dec 19 2016

Labels: -Merge-Request-56 Merge-Approved-56 Hotlist-Merge-Approved
Your change meets the bar and is auto-approved for M56 (branch: 2924)
go/crosland claims that it landed in 9056.0.0.

When I search for crash reports for minnie on >= 9056.0.0 I see:

https://goto.google.com/peuhx

A bunch of these crashes show OnSubprocessExited in the backtrace.  This is the parent process crashing because the subprocess hit a fatal error.  Those dumps don't tell us anything useful.

But for each OnSubprocessExited crash there is usually a corresponding report from the subprocess that does have a useful backtrace.  The ones that I see show a crash in ArcIpConfig::Set(), hitting the LOG(FATAL):

  // At this point, arc0 is up and the LAN interface has been up for several
  // seconds.  If the routing table name has not yet been populated,
  // something really bad probably happened on the Android side.
  if (routing_table_id_ == kInvalidTableId) {
    routing_table_id_ = ReadTableId(con_ifname_);
    if (routing_table_id_ == kInvalidTableId) {
      LOG(FATAL) << "Could not look up routing table ID in "
                 << kRoutingTableNames;
    }
  }

Perhaps this means that there is a real problem in the Android container, or perhaps it means that arc-networkd is making a bad assumption about when the rt_tables file should be getting populated.

It would be helpful to see the system logs in order to figure out whether this is only happening on the initial Android boot, if it is correlated with some other issue, or if it's an intermittent error that randomly affects otherwise-functioning systems.
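To make the failure mode concrete, here is a minimal sketch of what a table-ID lookup could look like. It assumes the rt_tables file holds "<id> <name>" lines, as iproute2-style rt_tables files do. LookUpTableId and kInvalidTableIdSketch are hypothetical names, and the real ReadTableId() in arc_ip_config.cc may well differ. The point is only that a file that is missing the arc0 entry (or is still empty) produces the invalid ID that triggers the LOG(FATAL) quoted above.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical sentinel; the real kInvalidTableId value is not shown in
// this thread.
constexpr int kInvalidTableIdSketch = -1;

// Scans "<id> <name>" lines from `in` and returns the id whose name matches
// `ifname`. Lines that do not start with a number (comments, blanks) are
// skipped. Returns kInvalidTableIdSketch if no entry matches.
int LookUpTableId(std::istream& in, const std::string& ifname) {
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    int id = 0;
    std::string name;
    if (fields >> id >> name && name == ifname) {
      return id;
    }
  }
  return kInvalidTableIdSketch;
}
```

A caller that treats the sentinel as "not populated yet" and retries later, instead of aborting, would tolerate a slow Android boot. Whether that is the right behavior depends on the open question above.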

Comment 17 by sheriffbot@chromium.org, Dec 23 2016

This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible!

If all merges have been completed, please remove any remaining Merge-Approved labels from this issue.

Thanks for your time! To disable nags, add the Disable-Nags label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot

Comment 18 by sheriffbot@chromium.org, Dec 26 2016

This issue has been approved for a merge. Please merge the fix to any appropriate branches as soon as possible!

If all merges have been completed, please remove any remaining Merge-Approved labels from this issue.

Thanks for your time! To disable nags, add the Disable-Nags label.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Owner: abhishekbh@google.com
Status: Assigned (was: Available)
Reassigning to Abhishek per our offline conversation.

Typical crash looks like: https://goto.google.com/uynic

Comment 20 by sheriffbot@chromium.org, Feb 9 2017

Labels: -Merge-Approved-56
This issue hasn't been updated in the last 6 weeks, so removing its merge approval label. Please re-request a merge if needed.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot

Comment 21 by w...@chromium.org, Mar 16 2018

Cc: igo@chromium.org
Labels: M-65
Seeing these crashes daily from ChromeOS 65.0.3325.107. Looking at the stack and code, it seems that |bus_| must be null when InitialSetup() is executed. That is PostTask()'d from OnInit with a comment that it must execute only after |bus_| has been initialized, but with no explanation of why posting the task achieves that.

See crashes 6fac68db42ade8a0, 8661a2365a8b475f, 16f100b1dcc835b1.
Could you please file feedback next time you see it, and add @cernekee to the note?  Logs would be very helpful.
Mergedinto: 825365
Status: Duplicate (was: Assigned)
