Tast test bundle panicked after writing test error to closed channel |
|
Issue descriptionIn the setzer-release #2291 run at http://stainless/browse/chromeos-autotest-results/208618800-chromeos-test/, I think that a bug in Tast's test-running code resulted in a panic after the arc.IntentForward test timed out that caused the test bundle process to abort without running other tests. Summarizing the log: 03:25:33 arc.IntentForward starts 03:26:34 arc.IntentForward reports "Failed to open a web page" error 03:26:35 arc.IntentForward reports "Failed to open Downloads" error 03:26:35 test times out and Test.Run reports error: "test.go:99: Test timed out: context deadline exceeded" 03:26:36 platform.CheckProcesses starts 03:26:36 platform.CheckProcesses ends 03:26:36 power.CheckStatus starts 03:26:36 bundle crashes with "exit status 2 (panic: send on closed channel [recovered] ..." Here's the actual error: panic: send on closed channel goroutine 21 [running]: chromiumos/tast/testing.(*State).Errorf(0xc4200b0f00, 0x5d4b94753593, 0x9, 0xc42005fb80, 0x1, 0x1) /build/setzer/tmp/portage/chromeos-base/tast-local-tests-cros-0.0.1-r76/work/tast-local-tests-cros-0.0.1/tast-base/src/chromiumos/tast/testing/state.go:125 +0xf3 chromiumos/tast/testing.(*Test).Run.func2.1(0xc4200b0f00, 0xc420158070) /build/setzer/tmp/portage/chromeos-base/tast-local-tests-cros-0.0.1-r76/work/tast-local-tests-cros-0.0.1/tast-base/src/chromiumos/tast/testing/test.go:88 +0xb5 panic(0x5d4b948bb9c0, 0x5d4b94972d20) /usr/lib/go/x86_64-cros-linux-gnu/src/runtime/panic.go:502 +0x22d chromiumos/tast/testing.(*State).Logf(0xc4200b0f00, 0x5d4b9475a373, 0x15, 0xc42005fe40, 0x1, 0x1) /build/setzer/tmp/portage/chromeos-base/tast-local-tests-cros-0.0.1-r76/work/tast-local-tests-cros-0.0.1/tast-base/src/chromiumos/tast/testing/state.go:111 +0xfa chromiumos/tast/testing.ContextLogf(0x5d4b94979720, 0xc42006a2c0, 0x5d4b9475a373, 0x15, 0xc42005fe40, 0x1, 0x1) /build/setzer/tmp/portage/chromeos-base/tast-local-tests-cros-0.0.1-r76/work/tast-local-tests-cros-0.0.1/tast-base/src/chromiumos/tast/testing/state.go:181 +0xce chromiumos/tast/local/bundles/cros/arc.IntentForward.func1(0x5d4b94979720, 0xc42006a2c0, 0xc42013e300, 0x5d4b9476ceb9, 0x3d, 0x0, 0x0) /build/setzer/tmp/portage/chromeos-base/tast-local-tests-cros-0.0.1-r76/work/tast-local-tests-cros-0.0.1/src/chromiumos/tast/local/bundles/cros/arc/intent_forward.go:51 +0x120 chromiumos/tast/local/bundles/cros/arc.IntentForward(0xc4200b0f00) /build/setzer/tmp/portage/chromeos-base/tast-local-tests-cros-0.0.1-r76/work/tast-local-tests-cros-0.0.1/src/chromiumos/tast/local/bundles/cros/arc/intent_forward.go:77 +0x4fc chromiumos/tast/testing.(*Test).Run.func2(0xc4200b0f00, 0xc420158070, 0xc420142870) /build/setzer/tmp/portage/chromeos-base/tast-local-tests-cros-0.0.1-r76/work/tast-local-tests-cros-0.0.1/tast-base/src/chromiumos/tast/testing/test.go:92 +0x66 created by chromiumos/tast/testing.(*Test).Run /build/setzer/tmp/portage/chromeos-base/tast-local-tests-cros-0.0.1-r76/work/tast-local-tests-cros-0.0.1/tast-base/src/chromiumos/tast/testing/test.go:85 +0xb8) (status 5) arc.IntentForward called testing.State.Logf after its output channel had been closed by runTest() in chromiumos/tast/bundle/bundle.go. Writing to a closed channel causes a panic, which crashed the entire bundle here. Test.Run() contains the following to try to recover from panics that occur while a test is running: // Tests call runtime.Goexit() to make the current goroutine exit immediately // (after running defer blocks) on failure. done := make(chan bool, 1) go func() { defer func() { if r := recover(); r != nil { s.Errorf("Panic: %v", r) } done <- true }() tst.Func(s) }() select { case <-done: // The goroutine running the test finished. case <-s.ctx.Done(): s.Errorf("Test timed out: %v", s.ctx.Err()) // TODO(derat): Might need to make sure it's dead somehow... } I think that that worked here, but the recovery code itself calls Errorf(), which results in another write to the channel, which causes another panic, which has nobody to catch it now... :-( I suspect that Test.Run should probably instead be responsible for closing the channel that it's using (I remember wavering on this, not sure if there were cases where I'd want to reuse the same channel across multiple tests). I don't think that that would solve this, though, since the test function is running asynchronously here and hasn't necessarily stopped when Test.Run() returns (as happened here). The deeper problem (which I'm aware of; see the TODO above) is that I don't have a good way to kill tests that ignore their deadlines right now. Tests run in goroutines, and there's no way to kill a goroutine. This is probably mostly harmless in this case, but it could be bad if a test continued running and did things that stepped on later tests' toes. The big hammer would be for the test bundle to fork-exec itself to run each test. It can be sure that a test is dead if it does this, but it'll result in additional overhead -- each remote test will need to reestablish the SSH connection, for instance. That might not be a huge deal. It *is* possible to exit at any point in a goroutine by calling runtime.Goexit() (https://golang.org/pkg/runtime/#Goexit) -- this is what testing.State.Fatal() does, for example. It might be reasonable for State.Context() (and functions in other test support packages?) to make the test abort if it continues using its context after it's hit its deadline. To start out with, I'll probably fix the panic-during-recovery bug in Test.Run() and add some light measures to try to abort rogue tests. Running each test in its own process is still on the table, but I'd rather not focus on the hostile-test use case -- if we end up with lots of runaway tests, we probably need to be more careful in how we write tests. :-)
,
Jun 17 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/tast/+/46ea2fc66ffb17fd23402bf5b82feb304abe5960 commit 46ea2fc66ffb17fd23402bf5b82feb304abe5960 Author: Daniel Erat <derat@chromium.org> Date: Sun Jun 17 00:09:08 2018 tast: Fix panic when logging panic from runaway test. Fix a subtle issue where testing.Test.Run's code to recover from an in-test panic could write to a closed output channel, resulting in an unhandled panic that would bring down the entire test bundle. BUG= chromium:853406 TEST=added a unit test Change-Id: I315b1e404b76b4ca56642ce7b2ef1821db84c068 Reviewed-on: https://chromium-review.googlesource.com/1103672 Commit-Ready: Dan Erat <derat@chromium.org> Tested-by: Dan Erat <derat@chromium.org> Reviewed-by: Jason Clinton <jclinton@chromium.org> [modify] https://crrev.com/46ea2fc66ffb17fd23402bf5b82feb304abe5960/src/chromiumos/tast/testing/test_test.go [modify] https://crrev.com/46ea2fc66ffb17fd23402bf5b82feb304abe5960/src/chromiumos/tast/testing/test.go [modify] https://crrev.com/46ea2fc66ffb17fd23402bf5b82feb304abe5960/src/chromiumos/tast/testing/state.go [modify] https://crrev.com/46ea2fc66ffb17fd23402bf5b82feb304abe5960/src/chromiumos/tast/bundle/bundle.go
,
Jun 22 2018
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/tast/+/2c1ed3911f61f8d1b267c1844bcaf152ac8b4afa commit 2c1ed3911f61f8d1b267c1844bcaf152ac8b4afa Author: Daniel Erat <derat@chromium.org> Date: Fri Jun 22 13:00:17 2018 tast: Fix races in reporting of test timeouts. Make the goroutines that run tests responsible for closing their output channels. Output channels were formerly closed by the main goroutine, which caused races where runaway tests could panic after writing to closed channels. https://tour.golang.org/concurrency/4 warns against doing this: "Only the sender should close a channel, never the receiver. Sending on a closed channel will cause a panic." This change also makes test bundles responsible for reporting test timeouts rather than testing.Test.Run. BUG= chromium:853406 TEST=updated unit tests; also manually verified that a timeout error is reported for a test that ignores its deadline and that the bundle doesn't panic if the test writes to its output channel after the next test has started Change-Id: I04a40480b01f4e1ffab5c9c1f9faf800c43984dc Reviewed-on: https://chromium-review.googlesource.com/1111014 Commit-Ready: Dan Erat <derat@chromium.org> Tested-by: Dan Erat <derat@chromium.org> Reviewed-by: Dan Erat <derat@chromium.org> [modify] https://crrev.com/2c1ed3911f61f8d1b267c1844bcaf152ac8b4afa/src/chromiumos/tast/testing/test_test.go [modify] https://crrev.com/2c1ed3911f61f8d1b267c1844bcaf152ac8b4afa/src/chromiumos/tast/bundle/bundle_test.go [modify] https://crrev.com/2c1ed3911f61f8d1b267c1844bcaf152ac8b4afa/src/chromiumos/tast/testing/state.go [modify] https://crrev.com/2c1ed3911f61f8d1b267c1844bcaf152ac8b4afa/src/chromiumos/tast/bundle/bundle.go [modify] https://crrev.com/2c1ed3911f61f8d1b267c1844bcaf152ac8b4afa/src/chromiumos/tast/testing/test.go
,
Jun 25 2018
|
|
►
Sign in to add a comment |
|
Comment 1 by derat@chromium.org
, Jun 16 2018