New issue
Advanced search Search tips

Issue 764567 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Sep 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Fuchsia
Pri: 3
Type: Bug

Blocking:
issue 738275



Sign in to add a comment

Hang at end of tests on Fuchsia

Project Member Reported by scottmg@chromium.org, Sep 13 2017

Issue description

Since I switched to netruncmd 'dm poweroff', the tests sometimes hang at shutdown (when that command can't connect for whatever reason)

This happened in https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.fyi%2FFuchsia%2F9371%2F%2B%2Frecipes%2Fsteps%2Fmedia_unittests%2F0%2Fstdout

I've reproduced this some times locally, but I'm not sure of the cause yet. We know the test run is done at this point, so if the .json copy succeeds, perhaps we should just force terminate qemu and still call that success.
 
Cc: sergeyu@chromium.org
Oh, yuck, looking back to compare logs, I think it might just fail when the user mode 192.168.3.9 NIC that we use for the testserver comes up before the tap NIC. :( Or it could be when it's added as /dev/class/ethernet/000 rather than /dev/class/ethernet/001 as seems to "normally" happen.

+sergeyu as this could easily be causing net_unittests problems.
Here's a working log:

[00001.704] 02364.02454> netstack: started
[00001.706] 02364.02454> netstack: socket dispatcher started
[00001.707] 02364.02454> netstack: watching for ethernet devices
[00001.726] 01044.01047> devcoord: launch devhost 'devhost:pci#1:1234:1111': pid=2694
[00001.727] 01044.01047> devcoord: launch devhost 'devhost:pci#2:8086:100e': pid=2740
[00001.728] 01044.01047> devcoord: launch devhost 'devhost:pci#3:8086:100e': pid=2792
[00001.729] 02740.02770> eth: using legacy irq mode
[00001.729] 01044.01047> devcoord: launch devhost 'devhost:pci#5:8086:2922': pid=2915
[00001.730] 02740.02770> eth: mac: 52:54:00:63:5e:7a
[00001.731] 02740.02770> eth: iomem @0x7f81ccc54000 (phys 43365000)
[00001.731] 02694.02718> fb: 1024 x 768 (stride=1024 pxlsz=2 format=1): 1572864 bytes @ 0x4f4f873f4000 SW
[00001.731] 02740.02770> eth: intel-ethernet online
[00001.731] 02792.02853> eth: using legacy irq mode
[00001.732] 01243.01297> netifc: ? /dev/class/ethernet/000
[00001.732] 02792.02853> eth: mac: 52:54:00:63:5e:7b
[00001.732] 01243.01297> netifc: create 128 eth buffers
[00001.733] 02792.02853> eth: iomem @0x7bda936eb000 (phys 433ed000)
[00001.734] 02792.02853> eth: intel-ethernet online
[00001.734] 01243.01297> macaddr: 52:54:00:63:5e:7a
[00001.734] 01243.01297> ip6addr: fe80::5054:4d:fe63:5e7a
[00001.734] 01243.01297> snmaddr: ff02::1:ff63:5e7a
[00001.734] 01243.01297> netsvc: nodename='step-atom-yard-juicy'
[00001.734] 01243.01297> netsvc: start
[00001.734] 02364.02656> netstack: linkaddr: 52:54:00:63:5e:7a
[00001.735] 02915.02940> object 0xffffff803d522e48 base 0xb00fa000 size 0x1000 ref 1
[00001.735] 02364.02656> netstack: NIC 2 added using ethernet device "/dev/class/ethernet/000"
[00001.735] 02364.02656> netstack: NIC 2: ipv6addr: fe80::5054:ff:fe63:5e7a
[00001.736] 02364.02656> netstack: linkaddr: 52:54:00:63:5e:7b
[00001.736] 02364.02656> netstack: NIC 3 added using ethernet device "/dev/class/ethernet/001"
[00001.736] 02364.02656> netstack: NIC 3: ipv6addr: fe80::5054:ff:fe63:5e7b
[00001.768] 01044.01154> devmgr: launch /boot/bin/sh (sh:console) OK
$ [00001.774] 01312.01399> vc: new input device /dev/class/input/000
[00001.775] 01312.01399> vc: new input device /dev/class/input/001
[00001.826] 02331.02392> IMPORTANT DEBUGGING NOTE: batches of tests are run inside their
[00001.826] 02331.02392> own process. For debugging a test inside a debugger, use the
[00001.826] 02331.02392> --gtest_filter=<your_test_name> flag along with
[00001.826] 02331.02392> --single-process-tests.
[00001.826] 02331.02392> Using sharding settings from environment. This is shard 0/1
[00001.826] 02331.02392> Using 4 parallel jobs.
[00002.145] 02331.02392> [1/1] CacheStorageManagerTest.GetAllOriginsUsageWithOldIndex (37 ms)
[00002.145] 02331.02392> SUCCESS: all tests passed.
[00002.407] 02331.02392> [1/1] CacheStorageManagerTest.GetAllOriginsUsageWithOldIndex (37 ms)
[00002.407] 02331.02392> SUCCESS: all tests passed.
[00002.737] 02331.02392> [1/1] CacheStorageManagerTest.GetAllOriginsUsageWithOldIndex (40 ms)
[00002.738] 02331.02392> SUCCESS: all tests passed.
[00003.037] 02331.02392> [1/1] CacheStorageManagerTest.GetAllOriginsUsageWithOldIndex (44 ms)
[00003.037] 02331.02392> SUCCESS: all tests passed.
[00003.370] 02331.02392> [1/1] CacheStorageManagerTest.GetAllOriginsUsageWithOldIndex (35 ms)
[00003.370] 02331.02392> SUCCESS: all tests passed.
[00003.659] 02331.02392> [1/1] CacheStorageManagerTest.GetAllOriginsUsageWithOldIndex (50 ms)
[00003.659] 02331.02392> SUCCESS: all tests passed.
[00003.986] 02331.02392> [1/1] CacheStorageManagerTest.GetAllOriginsUsageWithOldIndex (37 ms)
[00003.986] 02331.02392> SUCCESS: all tests passed.
[00004.326] 02331.02392> [1/1] CacheStorageManagerTest.GetAllOriginsUsageWithOldIndex (38 ms)
[00004.326] 02331.02392> SUCCESS: all tests passed.
[00004.619] 02331.02392> [1/1] CacheStorageManagerTest.GetAllOriginsUsageWithOldIndex (39 ms)
[00004.619] 02331.02392> SUCCESS: all tests passed.
[00004.738] 02364.03597> netstack: NIC 3: DHCP acquired IP 192.168.3.9 for 24h0m0s
[00004.738] 02364.03597> netstack: NIC 3: DNS servers: [192.168.3.3]
...
[00011.061] 02331.02392> Summary of all test iterations:
[00011.061] 02331.02392> End of the summary.
[00011.087] 02331.02392> Tests took 9 seconds.
Retrieving results and starting shutdown...
fe80::5054:4d:fe63:5e7a%qemu
read 289399 bytes
Result retrieval complete.



Here's a failure:

[00001.623] 02342.02371> netstack: started
[00001.625] 02342.02371> netstack: socket dispatcher started
[00001.628] 02342.02371> netstack: watching for ethernet devices
[00001.652] 01044.01047> devcoord: launch devhost 'devhost:pci#1:1234:1111': pid=2691
[00001.653] 01044.01047> devcoord: launch devhost 'devhost:pci#2:8086:100e': pid=2751
[00001.654] 01044.01047> devcoord: launch devhost 'devhost:pci#3:8086:100e': pid=2803
[00001.655] 01044.01047> devcoord: launch devhost 'devhost:pci#5:8086:2922': pid=2873
[00001.657] 02691.02714> fb: 1024 x 768 (stride=1024 pxlsz=2 format=1): 1572864 bytes @ 0x4f4f873f4000 SW
[00001.657] 02873.02901> object 0xffffff803d5201d8 base 0xb00fa000 size 0x1000 ref 1
[00001.657] 02803.02856> eth: using legacy irq mode
[00001.658] 02803.02856> eth: mac: 52:54:00:63:5e:7b
[00001.658] 02751.02786> eth: using legacy irq mode
[00001.660] 02803.02856> eth: iomem @0x189a741da000 (phys 437ba000)
[00001.660] 02751.02786> eth: mac: 52:54:00:63:5e:7a
[00001.662] 02751.02786> eth: iomem @0x7f81ccc54000 (phys 437e5000)
[00001.662] 02803.02856> eth: intel-ethernet online
[00001.662] 01249.01334> netifc: ? /dev/class/ethernet/000
[00001.662] 02751.02786> eth: intel-ethernet online
[00001.662] 01249.01334> netifc: create 128 eth buffers
[00001.664] 01249.01334> macaddr: 52:54:00:63:5e:7b
[00001.664] 01249.01334> ip6addr: fe80::5054:4d:fe63:5e7b
[00001.664] 01249.01334> snmaddr: ff02::1:ff63:5e7b
[00001.664] 01249.01334> netsvc: nodename='step-dime-yard-juicy'
[00001.664] 01249.01334> netsvc: start
[00001.665] 02342.02371> netstack: linkaddr: 52:54:00:63:5e:7b
[00001.665] 02342.02371> netstack: NIC 2 added using ethernet device "/dev/class/ethernet/000"
[00001.666] 02342.02371> netstack: NIC 2: ipv6addr: fe80::5054:ff:fe63:5e7b
[00001.666] 02342.03129> netstack: NIC 2: DHCP acquired IP 192.168.3.9 for 24h0m0s
[00001.666] 02342.03129> netstack: NIC 2: DNS servers: [192.168.3.3]
[00001.667] 02342.02371> netstack: linkaddr: 52:54:00:63:5e:7a
[00001.667] 02342.02371> netstack: NIC 3 added using ethernet device "/dev/class/ethernet/001"
[00001.667] 02342.02371> netstack: NIC 3: ipv6addr: fe80::5054:ff:fe63:5e7a
[00001.694] 01044.01154> devmgr: launch /boot/bin/sh (sh:console) OK
$ [00001.702] 01347.01424> vc: new input device /dev/class/input/000
[00001.702] 01347.01424> vc: new input device /dev/class/input/001
[00001.755] 02378.02428> IMPORTANT DEBUGGING NOTE: batches of tests are run inside their
[00001.755] 02378.02428> own process. For debugging a test inside a debugger, use the
[00001.755] 02378.02428> --gtest_filter=<your_test_name> flag along with
[00001.755] 02378.02428> --single-process-tests.
[00001.755] 02378.02428> Using sharding settings from environment. This is shard 0/1
[00001.755] 02378.02428> Using 4 parallel jobs.
[00001.991] 02378.02428> [1/1] CacheStorageManagerTest.GetAllOriginsUsageWithOldIndex (30 ms)
[00001.991] 02378.02428> SUCCESS: all tests passed.
[00002.323] 02378.02428> [1/1] CacheStorageManagerTest.GetAllOriginsUsageWithOldIndex (41 ms)
[00002.324] 02378.02428> SUCCESS: all tests passed.
[00002.663] 02378.02428> [1/1] CacheStorageManagerTest.GetAllOriginsUsageWithOldIndex (56 ms)
[00002.663] 02378.02428> SUCCESS: all tests passed.
[00002.976] 02378.02428> [1/1] CacheStorageManagerTest.GetAllOriginsUsageWithOldIndex (42 ms)
[00002.976] 02378.02428> SUCCESS: all tests passed.
[00003.229] 02378.02428> [1/1] CacheStorageManagerTest.GetAllOriginsUsageWithOldIndex (35 ms)
[00003.229] 02378.02428> SUCCESS: all tests passed.
[00003.483] 02378.02428> [1/1] CacheStorageManagerTest.GetAllOriginsUsageWithOldIndex (39 ms)
[00003.483] 02378.02428> SUCCESS: all tests passed.
[00003.781] 02378.02428> [1/1] CacheStorageManagerTest.GetAllOriginsUsageWithOldIndex (39 ms)
[00003.781] 02378.02428> SUCCESS: all tests passed.
...
[00011.702] 02378.02428> SUCCESS: all tests passed.
[00011.703] 02378.02428> Summary of all test iterations:
[00011.703] 02378.02428> End of the summary.
[00011.737] 02378.02428> Tests took 9 seconds.
Retrieving results and starting shutdown...
Failed to discover 
/work/cr/src/third_party/fuchsia-sdk/tools/netcp: lookup of  timed out
/work/cr/src/third_party/fuchsia-sdk/tools/netruncmd: lookup timed out
Result retrieval complete.
^CTraceback (most recent call last):
  File "/work/cr/src/build/fuchsia/test_runner.py", line 208, in <module>
    sys.exit(main())
  File "/work/cr/src/build/fuchsia/test_runner.py", line 199, in main
    args.test_launcher_summary_output)
  File "/work/cr/src/build/fuchsia/runner_common.py", line 503, in RunFuchsia
    qemu_popen.wait()
  File "/usr/lib/python2.7/subprocess.py", line 1392, in wait
    pid, sts = _eintr_retry_call(os.waitpid, self.pid, 0)
  File "/usr/lib/python2.7/subprocess.py", line 476, in _eintr_retry_call
    return func(*args)
KeyboardInterrupt
qemu-system-x86_64: terminating on signal 2


I think it must be related to the :7b mac being discovered before the :7a one in the fails. Super flaky!
Project Member

Comment 3 by bugdroid1@chromium.org, Sep 14 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/b80bdac0e830ce759d9a04d816b75a40ea85853e

commit b80bdac0e830ce759d9a04d816b75a40ea85853e
Author: Scott Graham <scottmg@chromium.org>
Date: Thu Sep 14 23:32:54 2017

fuchsia: Use minfs disk image for results extraction

SSH, virtio, hostfwd, tun/tap and many other options are either flaky,
require sudo, are excruciatingly slow, or some combination of these.

While writing to a disk and extracting the file on the host is a bit
silly, it's also fast and reliable, and doesn't require host
configuration or elevation.

Bug:  763992 , 761182 , 764567 
Change-Id: Ia66c54ba0879b9edb9725b074d489b58d7415948
Reviewed-on: https://chromium-review.googlesource.com/667867
Reviewed-by: Wez <wez@chromium.org>
Commit-Queue: Scott Graham <scottmg@chromium.org>
Cr-Commit-Position: refs/heads/master@{#502100}
[modify] https://crrev.com/b80bdac0e830ce759d9a04d816b75a40ea85853e/build/fuchsia/exe_runner.py
[modify] https://crrev.com/b80bdac0e830ce759d9a04d816b75a40ea85853e/build/fuchsia/runner_common.py
[modify] https://crrev.com/b80bdac0e830ce759d9a04d816b75a40ea85853e/build/fuchsia/test_runner.py

Status: Fixed (was: Started)
We're back to only the one nic that net_unittests uses, so this shouldn't be a problem any more.

Sign in to add a comment