Fuchsia/x64 FYI bot reports "Success" but shows shard timed-out "exception" for net_unittests
Issue description: Recent builds, e.g. https://ci.chromium.org/buildbot/chromium.fyi/Fuchsia/12344, are mostly reporting "Success", but also show a net_unittests shard as timed out:

    net_unittests
    Run on OS: 'Ubuntu-14.04'
    Shard duration: 0:12:56.199630
    stdout
    some shards did not complete: 0
    swarming.summary
    step_metadata
    missing shard #0
    shard #0
    shard #0 isolated out

Looking at the stdout from the shard, it seems that it did complete successfully. Perhaps the shard took longer than allowed, but was otherwise still regarded as having succeeded?
Dec 19 2017
This is the same as bug 796026. Have a look at https://logs.chromium.org/v/?s=chromium%2Fbb%2Fchromium.fyi%2FFuchsia%2F12344%2F%2B%2Frecipes%2Fsteps%2Fnet_unittests%2F0%2Fstdout. There's no `fsync`, but we need something like it before shutdown so that the output is actually flushed to disk.
Dec 19 2017
That said, I'm not sure why the build is green in this case; that looks like a bug, maybe in the recipe or in swarming.
Dec 19 2017
We're getting full stdio, but truncated JSON - does the recipe gather the pass/fail data from the JSON or from stdio?
Dec 19 2017
From the json. We ignore the stdio.
Dec 19 2017
It's from the JSON (because otherwise we wouldn't have even bothered generating the json in the first place, since exfiltrating it was quite a hassle).
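Because the recipe decides pass/fail from the JSON alone, a truncated summary arguably should fail the shard rather than leave it green. The Python sketch below shows one way to make that check strict. The function name `shard_passed` is hypothetical, and the field names (`per_iteration_data`, `status`, `"SUCCESS"`) are assumptions modeled loosely on the test launcher's summary format; this is not the recipe's actual code.

```python
import json


def shard_passed(summary_text: str) -> bool:
    """Hypothetical recipe-side check: decide pass/fail from a shard's
    summary JSON only, treating a missing or truncated file as a failure
    instead of silently reporting success."""
    if not summary_text:
        return False  # No summary at all: the shard never reported results.
    try:
        summary = json.loads(summary_text)
    except json.JSONDecodeError:
        return False  # Truncated or corrupt JSON, e.g. the file was never synced.
    # Assumed layout: a list of iterations, each mapping test name -> list of runs.
    iterations = summary.get("per_iteration_data")
    if not iterations:
        return False  # Parsed, but contains no results to judge.
    for iteration in iterations:
        for test_runs in iteration.values():
            if any(run.get("status") != "SUCCESS" for run in test_runs):
                return False
    return True
```

With this shape, a half-written output.json makes the shard red instead of green, which is the behavior the thread expected.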
Dec 19 2017
argh.
Dec 19 2017
Oh, there's /system/bin/sync. I'll try that.
Dec 20 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/ef62139da55ec66c148556896ca3bf0b23b886a0

commit ef62139da55ec66c148556896ca3bf0b23b886a0
Author: Sergey Ulanov <sergeyu@chromium.org>
Date: Wed Dec 20 21:50:36 2017

TestLauncher: fsync() summary JSON file

In some instances Fuchsia's test runner script fails to extract test summary file. This change updates TestLauncher to call fsync(), which may fix the problem for Fuchsia.

Bug: 796318
Change-Id: Idb802c5e3047b2205a2606bcaea3d31008bc4935
Reviewed-on: https://chromium-review.googlesource.com/835052
Commit-Queue: Sergey Ulanov <sergeyu@chromium.org>
Reviewed-by: Wez <wez@chromium.org>
Reviewed-by: Dirk Pranke <dpranke@chromium.org>
Reviewed-by: Lei Zhang <thestig@chromium.org>
Reviewed-by: Scott Graham <scottmg@chromium.org>
Cr-Commit-Position: refs/heads/master@{#525476}

[modify] https://crrev.com/ef62139da55ec66c148556896ca3bf0b23b886a0/base/test/launcher/test_results_tracker.cc
[modify] https://crrev.com/ef62139da55ec66c148556896ca3bf0b23b886a0/build/fuchsia/runner_common.py
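The commit above makes the C++ TestLauncher call fsync() on the summary file. For illustration, the same write-then-sync pattern looks like this in Python. The function name `write_summary_durably` is hypothetical, and this is a sketch of the general technique, not the code in test_results_tracker.cc.

```python
import json
import os


def write_summary_durably(path: str, summary: dict) -> None:
    """Write a test summary JSON file and ask the OS to commit it to
    stable storage before returning, so that shutting the VM down right
    afterwards is less likely to leave a truncated file behind."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(summary, f)
        f.flush()             # Hand Python's userspace buffer to the OS.
        os.fsync(f.fileno())  # Ask the filesystem to persist the data.
```

`flush()` alone only moves data into the OS page cache; it is the `fsync()` call that requests durability, and, as the next comment shows, that call can itself fail on a slow backing store.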
Dec 21 2017
The previous fix didn't solve the problem. I see the following error in the log:

    [00699.252] 03759.03785> SUCCESS: all tests passed.
    [00718.775] 03642.03669> VnodeMinfs::Sync Completion wait failure: -21
    [00718.799] 03759.03785> [3:2004251015:1220/224602.520856:718799182:ERROR:test_launcher.cc(1159)] Failed to save test launcher output summary.

fsync() times out if it doesn't complete in 15 seconds. Given that the summary file for net_unittests is big (15MB), it may time out when the underlying filesystem is slow. Possible solutions:
1. Create the minfs disk image under tmpfs on the host. Is /tmp mounted as tmpfs on swarming bots?
2. Call fsync() more than once to force it to sync the FS, even if it takes more than 15 seconds.
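Option 1 depends on whether /tmp on the bots is actually tmpfs. A minimal Python sketch for answering that from a Linux host's /proc/mounts follows; the helper name `mount_fstype` is hypothetical. A result of `None` means /tmp is not a separate mount point and therefore lives on whatever filesystem is mounted at `/`.

```python
from typing import Optional


def mount_fstype(mounts_text: str, mount_point: str) -> Optional[str]:
    """Return the filesystem type mounted at mount_point, given the
    contents of /proc/mounts (fields: device, mount point, type,
    options, dump, pass). The last matching entry wins, because a later
    mount at the same path shadows earlier ones."""
    fstype = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mount_point:
            fstype = fields[2]
    return fstype
```

On a bot, one would read /proc/mounts, pass its contents together with `"/tmp"`, and compare the result with `"tmpfs"`.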
Dec 21 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/d6a88b16c1c35715a0dfba78e5878b8a77c48dd7

commit d6a88b16c1c35715a0dfba78e5878b8a77c48dd7
Author: Sergey Ulanov <sergeyu@chromium.org>
Date: Thu Dec 21 20:18:26 2017

[Fuchsia] Update test runner to use /tmp for output minfs image.

/tmp is expected to perform better on GCE when used as a backing storage for minfs disk image, which should make fsync() less likely to timeout when writing output.json.

Bug: 796318
Change-Id: I6a410d9b7284623ad5ad490225e4f489395a881d
Reviewed-on: https://chromium-review.googlesource.com/838511
Commit-Queue: Sergey Ulanov <sergeyu@chromium.org>
Reviewed-by: Wez <wez@chromium.org>
Cr-Commit-Position: refs/heads/master@{#525784}

[modify] https://crrev.com/d6a88b16c1c35715a0dfba78e5878b8a77c48dd7/build/fuchsia/runner_common.py
Dec 21 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/11400a4b68ae249b919ba918d9d302ff463c9c63

commit 11400a4b68ae249b919ba918d9d302ff463c9c63
Author: Sergey Ulanov <sergeyu@chromium.org>
Date: Thu Dec 21 21:44:13 2017

Revert "[Fuchsia] Update test runner to use /tmp for output minfs image."

This reverts commit d6a88b16c1c35715a0dfba78e5878b8a77c48dd7.

Reason for revert: net_unittests still fails.

Original change's description:
> [Fuchsia] Update test runner to use /tmp for output minfs image.
>
> /tmp is expected to perform better on GCE when used as a backing
> storage for minfs disk image, which should make fsync() less
> likely to timeout when writing output.json.
>
> Bug: 796318
> Change-Id: I6a410d9b7284623ad5ad490225e4f489395a881d
> Reviewed-on: https://chromium-review.googlesource.com/838511
> Commit-Queue: Sergey Ulanov <sergeyu@chromium.org>
> Reviewed-by: Wez <wez@chromium.org>
> Cr-Commit-Position: refs/heads/master@{#525784}

TBR=wez@chromium.org,sergeyu@chromium.org

Change-Id: Ieb709b65af577712862b8538d3e33a2f8aed81ee
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: 796318
Reviewed-on: https://chromium-review.googlesource.com/841242
Reviewed-by: Sergey Ulanov <sergeyu@chromium.org>
Commit-Queue: Sergey Ulanov <sergeyu@chromium.org>
Cr-Commit-Position: refs/heads/master@{#525816}

[modify] https://crrev.com/11400a4b68ae249b919ba918d9d302ff463c9c63/build/fuchsia/runner_common.py
Dec 21 2017
Shall we try adding some limited number of Flush() calls, each with a warning logged to stdio? If we can confirm that that "fixes" things then the timings, and OS-level tracing, may help diagnose the issue.
Dec 22 2017
The following revision refers to this bug:
https://chromium.googlesource.com/chromium/src.git/+/47148862c71e160dc651033322f5869c10039dd8

commit 47148862c71e160dc651033322f5869c10039dd8
Author: Sergey Ulanov <sergeyu@chromium.org>
Date: Fri Dec 22 02:27:17 2017

Retry fsync() when writing test summary file.

On Fuchsia fsync() times out after 15 seconds, which may not be enough. Retry fsync() multiple times - this will allow to verify that the issue is caused by disk IO being slow and not something else.

Bug: 796318
Change-Id: I54f491e5f7b08bc765573a5eda0acee2f3ed31a7
Reviewed-on: https://chromium-review.googlesource.com/841626
Reviewed-by: Lei Zhang <thestig@chromium.org>
Commit-Queue: Sergey Ulanov <sergeyu@chromium.org>
Cr-Commit-Position: refs/heads/master@{#525905}

[modify] https://crrev.com/47148862c71e160dc651033322f5869c10039dd8/base/test/launcher/test_results_tracker.cc
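The retry idea from this commit, combined with the earlier suggestion to log a warning with timings on each attempt, can be sketched in Python as follows. This is not the C++ change itself. The function name `fsync_with_retries`, the attempt count, the warning text, and the injectable `fsync` parameter (there only so the loop can be exercised without a slow disk) are all assumptions, as is the premise that a timed-out fsync() surfaces as an `OSError`.

```python
import logging
import os
import time
from typing import Callable


def fsync_with_retries(fd: int, max_attempts: int = 5,
                       fsync: Callable[[int], None] = os.fsync) -> bool:
    """Call fsync() up to max_attempts times, logging a warning with the
    elapsed time after each failed attempt. Returns True as soon as one
    call succeeds, False if every attempt fails."""
    for attempt in range(1, max_attempts + 1):
        start = time.monotonic()
        try:
            fsync(fd)
            return True
        except OSError as e:
            # The per-attempt timing is what would show whether slow disk
            # IO, rather than something else, is behind the failures.
            logging.warning("fsync() attempt %d/%d failed after %.1f s: %s",
                            attempt, max_attempts,
                            time.monotonic() - start, e)
    return False
```

Bounding the number of attempts keeps a genuinely stuck filesystem from hanging the shard forever, while the logged durations give the diagnostic signal the retry was meant to provide.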
Dec 22 2017
net_unittests is green now with the hack I landed above: https://ci.chromium.org/buildbot/chromium.fyi/Fuchsia/12441. Keeping this bug open to implement a better solution. We could switch to a virtio drive, which may provide better performance under a nested VM. fsync() also shouldn't time out (I opened ZX-1513 to track that issue).
Dec 23 2017
Let's close this out and file a separate bug for improving the output summary gathering implementation?
Jan 3 2018
Comment 1 by w...@chromium.org, Dec 19 2017
Owner: sergeyu@chromium.org