Hang at end of tests on Fuchsia |
||
Issue descriptionSince I switched to netruncmd 'dm poweroff', the tests sometimes hang at shutdown (when that command can't connect for whatever reason) This happened in https://luci-logdog.appspot.com/v/?s=chromium%2Fbb%2Fchromium.fyi%2FFuchsia%2F9371%2F%2B%2Frecipes%2Fsteps%2Fmedia_unittests%2F0%2Fstdout I've reproduced this some times locally, but I'm not sure of the cause yet. We know the test run is done at this point, so if the .json copy succeeds, perhaps we should just force terminate qemu and still call that success.
,
Sep 13 2017
Here's a working log:
[00001.704] 02364.02454> netstack: started
[00001.706] 02364.02454> netstack: socket dispatcher started
[00001.707] 02364.02454> netstack: watching for ethernet devices
[00001.726] 01044.01047> devcoord: launch devhost 'devhost:pci#1:1234:1111': pid=2694
[00001.727] 01044.01047> devcoord: launch devhost 'devhost:pci#2:8086:100e': pid=2740
[00001.728] 01044.01047> devcoord: launch devhost 'devhost:pci#3:8086:100e': pid=2792
[00001.729] 02740.02770> eth: using legacy irq mode
[00001.729] 01044.01047> devcoord: launch devhost 'devhost:pci#5:8086:2922': pid=2915
[00001.730] 02740.02770> eth: mac: 52:54:00:63:5e:7a
[00001.731] 02740.02770> eth: iomem @0x7f81ccc54000 (phys 43365000)
[00001.731] 02694.02718> fb: 1024 x 768 (stride=1024 pxlsz=2 format=1): 1572864 bytes @ 0x4f4f873f4000 SW
[00001.731] 02740.02770> eth: intel-ethernet online
[00001.731] 02792.02853> eth: using legacy irq mode
[00001.732] 01243.01297> netifc: ? /dev/class/ethernet/000
[00001.732] 02792.02853> eth: mac: 52:54:00:63:5e:7b
[00001.732] 01243.01297> netifc: create 128 eth buffers
[00001.733] 02792.02853> eth: iomem @0x7bda936eb000 (phys 433ed000)
[00001.734] 02792.02853> eth: intel-ethernet online
[00001.734] 01243.01297> macaddr: 52:54:00:63:5e:7a
[00001.734] 01243.01297> ip6addr: fe80::5054:4d:fe63:5e7a
[00001.734] 01243.01297> snmaddr: ff02::1:ff63:5e7a
[00001.734] 01243.01297> netsvc: nodename='step-atom-yard-juicy'
[00001.734] 01243.01297> netsvc: start
[00001.734] 02364.02656> netstack: linkaddr: 52:54:00:63:5e:7a
[00001.735] 02915.02940> object 0xffffff803d522e48 base 0xb00fa000 size 0x1000 ref 1
[00001.735] 02364.02656> netstack: NIC 2 added using ethernet device "/dev/class/ethernet/000"
[00001.735] 02364.02656> netstack: NIC 2: ipv6addr: fe80::5054:ff:fe63:5e7a
[00001.736] 02364.02656> netstack: linkaddr: 52:54:00:63:5e:7b
[00001.736] 02364.02656> netstack: NIC 3 added using ethernet device "/dev/class/ethernet/001"
[00001.736] 02364.02656> netstack: NIC 3: ipv6addr: fe80::5054:ff:fe63:5e7b
[00001.768] 01044.01154> devmgr: launch /boot/bin/sh (sh:console) OK
$ [00001.774] 01312.01399> vc: new input device /dev/class/input/000
[00001.775] 01312.01399> vc: new input device /dev/class/input/001
[00001.826] 02331.02392> IMPORTANT DEBUGGING NOTE: batches of tests are run inside their
[00001.826] 02331.02392> own process. For debugging a test inside a debugger, use the
[00001.826] 02331.02392> --gtest_filter=<your_test_name> flag along with
[00001.826] 02331.02392> --single-process-tests.
[00001.826] 02331.02392> Using sharding settings from environment. This is shard 0/1
[00001.826] 02331.02392> Using 4 parallel jobs.
[00002.145] 02331.02392> [1/1] CacheStorageManagerTest.GetAllOriginsUsageWithOldIndex (37 ms)
[00002.145] 02331.02392> SUCCESS: all tests passed.
[00002.407] 02331.02392> [1/1] CacheStorageManagerTest.GetAllOriginsUsageWithOldIndex (37 ms)
[00002.407] 02331.02392> SUCCESS: all tests passed.
[00002.737] 02331.02392> [1/1] CacheStorageManagerTest.GetAllOriginsUsageWithOldIndex (40 ms)
[00002.738] 02331.02392> SUCCESS: all tests passed.
[00003.037] 02331.02392> [1/1] CacheStorageManagerTest.GetAllOriginsUsageWithOldIndex (44 ms)
[00003.037] 02331.02392> SUCCESS: all tests passed.
[00003.370] 02331.02392> [1/1] CacheStorageManagerTest.GetAllOriginsUsageWithOldIndex (35 ms)
[00003.370] 02331.02392> SUCCESS: all tests passed.
[00003.659] 02331.02392> [1/1] CacheStorageManagerTest.GetAllOriginsUsageWithOldIndex (50 ms)
[00003.659] 02331.02392> SUCCESS: all tests passed.
[00003.986] 02331.02392> [1/1] CacheStorageManagerTest.GetAllOriginsUsageWithOldIndex (37 ms)
[00003.986] 02331.02392> SUCCESS: all tests passed.
[00004.326] 02331.02392> [1/1] CacheStorageManagerTest.GetAllOriginsUsageWithOldIndex (38 ms)
[00004.326] 02331.02392> SUCCESS: all tests passed.
[00004.619] 02331.02392> [1/1] CacheStorageManagerTest.GetAllOriginsUsageWithOldIndex (39 ms)
[00004.619] 02331.02392> SUCCESS: all tests passed.
[00004.738] 02364.03597> netstack: NIC 3: DHCP acquired IP 192.168.3.9 for 24h0m0s
[00004.738] 02364.03597> netstack: NIC 3: DNS servers: [192.168.3.3]
...
[00011.061] 02331.02392> Summary of all test iterations:
[00011.061] 02331.02392> End of the summary.
[00011.087] 02331.02392> Tests took 9 seconds.
Retrieving results and starting shutdown...
fe80::5054:4d:fe63:5e7a%qemu
read 289399 bytes
Result retrieval complete.
Here's a failure:
[00001.623] 02342.02371> netstack: started
[00001.625] 02342.02371> netstack: socket dispatcher started
[00001.628] 02342.02371> netstack: watching for ethernet devices
[00001.652] 01044.01047> devcoord: launch devhost 'devhost:pci#1:1234:1111': pid=2691
[00001.653] 01044.01047> devcoord: launch devhost 'devhost:pci#2:8086:100e': pid=2751
[00001.654] 01044.01047> devcoord: launch devhost 'devhost:pci#3:8086:100e': pid=2803
[00001.655] 01044.01047> devcoord: launch devhost 'devhost:pci#5:8086:2922': pid=2873
[00001.657] 02691.02714> fb: 1024 x 768 (stride=1024 pxlsz=2 format=1): 1572864 bytes @ 0x4f4f873f4000 SW
[00001.657] 02873.02901> object 0xffffff803d5201d8 base 0xb00fa000 size 0x1000 ref 1
[00001.657] 02803.02856> eth: using legacy irq mode
[00001.658] 02803.02856> eth: mac: 52:54:00:63:5e:7b
[00001.658] 02751.02786> eth: using legacy irq mode
[00001.660] 02803.02856> eth: iomem @0x189a741da000 (phys 437ba000)
[00001.660] 02751.02786> eth: mac: 52:54:00:63:5e:7a
[00001.662] 02751.02786> eth: iomem @0x7f81ccc54000 (phys 437e5000)
[00001.662] 02803.02856> eth: intel-ethernet online
[00001.662] 01249.01334> netifc: ? /dev/class/ethernet/000
[00001.662] 02751.02786> eth: intel-ethernet online
[00001.662] 01249.01334> netifc: create 128 eth buffers
[00001.664] 01249.01334> macaddr: 52:54:00:63:5e:7b
[00001.664] 01249.01334> ip6addr: fe80::5054:4d:fe63:5e7b
[00001.664] 01249.01334> snmaddr: ff02::1:ff63:5e7b
[00001.664] 01249.01334> netsvc: nodename='step-dime-yard-juicy'
[00001.664] 01249.01334> netsvc: start
[00001.665] 02342.02371> netstack: linkaddr: 52:54:00:63:5e:7b
[00001.665] 02342.02371> netstack: NIC 2 added using ethernet device "/dev/class/ethernet/000"
[00001.666] 02342.02371> netstack: NIC 2: ipv6addr: fe80::5054:ff:fe63:5e7b
[00001.666] 02342.03129> netstack: NIC 2: DHCP acquired IP 192.168.3.9 for 24h0m0s
[00001.666] 02342.03129> netstack: NIC 2: DNS servers: [192.168.3.3]
[00001.667] 02342.02371> netstack: linkaddr: 52:54:00:63:5e:7a
[00001.667] 02342.02371> netstack: NIC 3 added using ethernet device "/dev/class/ethernet/001"
[00001.667] 02342.02371> netstack: NIC 3: ipv6addr: fe80::5054:ff:fe63:5e7a
[00001.694] 01044.01154> devmgr: launch /boot/bin/sh (sh:console) OK
$ [00001.702] 01347.01424> vc: new input device /dev/class/input/000
[00001.702] 01347.01424> vc: new input device /dev/class/input/001
[00001.755] 02378.02428> IMPORTANT DEBUGGING NOTE: batches of tests are run inside their
[00001.755] 02378.02428> own process. For debugging a test inside a debugger, use the
[00001.755] 02378.02428> --gtest_filter=<your_test_name> flag along with
[00001.755] 02378.02428> --single-process-tests.
[00001.755] 02378.02428> Using sharding settings from environment. This is shard 0/1
[00001.755] 02378.02428> Using 4 parallel jobs.
[00001.991] 02378.02428> [1/1] CacheStorageManagerTest.GetAllOriginsUsageWithOldIndex (30 ms)
[00001.991] 02378.02428> SUCCESS: all tests passed.
[00002.323] 02378.02428> [1/1] CacheStorageManagerTest.GetAllOriginsUsageWithOldIndex (41 ms)
[00002.324] 02378.02428> SUCCESS: all tests passed.
[00002.663] 02378.02428> [1/1] CacheStorageManagerTest.GetAllOriginsUsageWithOldIndex (56 ms)
[00002.663] 02378.02428> SUCCESS: all tests passed.
[00002.976] 02378.02428> [1/1] CacheStorageManagerTest.GetAllOriginsUsageWithOldIndex (42 ms)
[00002.976] 02378.02428> SUCCESS: all tests passed.
[00003.229] 02378.02428> [1/1] CacheStorageManagerTest.GetAllOriginsUsageWithOldIndex (35 ms)
[00003.229] 02378.02428> SUCCESS: all tests passed.
[00003.483] 02378.02428> [1/1] CacheStorageManagerTest.GetAllOriginsUsageWithOldIndex (39 ms)
[00003.483] 02378.02428> SUCCESS: all tests passed.
[00003.781] 02378.02428> [1/1] CacheStorageManagerTest.GetAllOriginsUsageWithOldIndex (39 ms)
[00003.781] 02378.02428> SUCCESS: all tests passed.
...
[00011.702] 02378.02428> SUCCESS: all tests passed.
[00011.703] 02378.02428> Summary of all test iterations:
[00011.703] 02378.02428> End of the summary.
[00011.737] 02378.02428> Tests took 9 seconds.
Retrieving results and starting shutdown...
Failed to discover
/work/cr/src/third_party/fuchsia-sdk/tools/netcp: lookup of timed out
/work/cr/src/third_party/fuchsia-sdk/tools/netruncmd: lookup timed out
Result retrieval complete.
^CTraceback (most recent call last):
File "/work/cr/src/build/fuchsia/test_runner.py", line 208, in <module>
sys.exit(main())
File "/work/cr/src/build/fuchsia/test_runner.py", line 199, in main
args.test_launcher_summary_output)
File "/work/cr/src/build/fuchsia/runner_common.py", line 503, in RunFuchsia
qemu_popen.wait()
File "/usr/lib/python2.7/subprocess.py", line 1392, in wait
pid, sts = _eintr_retry_call(os.waitpid, self.pid, 0)
File "/usr/lib/python2.7/subprocess.py", line 476, in _eintr_retry_call
return func(*args)
KeyboardInterrupt
qemu-system-x86_64: terminating on signal 2
I think it must be related to the :7b mac being discovered before the :7a one in the fails. Super flaky!
,
Sep 14 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/b80bdac0e830ce759d9a04d816b75a40ea85853e commit b80bdac0e830ce759d9a04d816b75a40ea85853e Author: Scott Graham <scottmg@chromium.org> Date: Thu Sep 14 23:32:54 2017 fuchsia: Use minfs disk image for results extraction SSH, virtio, hostfwd, tun/tap and many other options are either flaky, require sudo, are excruciatingly slow, or some combination of these. While writing to a disk and extracting the file on the host is a bit silly, it's also fast and reliable, and doesn't require host configuration or elevation. Bug: 763992 , 761182 , 764567 Change-Id: Ia66c54ba0879b9edb9725b074d489b58d7415948 Reviewed-on: https://chromium-review.googlesource.com/667867 Reviewed-by: Wez <wez@chromium.org> Commit-Queue: Scott Graham <scottmg@chromium.org> Cr-Commit-Position: refs/heads/master@{#502100} [modify] https://crrev.com/b80bdac0e830ce759d9a04d816b75a40ea85853e/build/fuchsia/exe_runner.py [modify] https://crrev.com/b80bdac0e830ce759d9a04d816b75a40ea85853e/build/fuchsia/runner_common.py [modify] https://crrev.com/b80bdac0e830ce759d9a04d816b75a40ea85853e/build/fuchsia/test_runner.py
,
Sep 14 2017
We're back to only the one nic that net_unittests uses, so this shouldn't be a problem any more. |
||
►
Sign in to add a comment |
||
Comment 1 by scottmg@chromium.org
, Sep 13 2017