Isolate client segfaults when it can't reach remote service. |
||||
Issue descriptionApparently Isolate client attempts to access the remote Isolate service when it tests, and will segfault if it fails. ok github.com/luci/luci-go/client/isolate 0.103s 2016/03/29 09:52:19 POST /_ah/api/isolateservice/v1/preupload 2016/03/29 09:52:19 POST /_ah/api/isolateservice/v1/preupload 2016/03/29 09:52:19 POST /_ah/api/isolateservice/v1/preupload 2016/03/29 09:52:19 Task failed, retrying after a sleep of 0: http request failed: Service Unavailable (HTTP 503) 2016/03/29 09:52:19 POST /_ah/api/isolateservice/v1/preupload 2016/03/29 09:52:19 POST /_ah/api/isolateservice/v1/preupload 2016/03/29 09:52:19 POST /_ah/api/isolateservice/v1/store_inline 2016/03/29 09:52:19 POST /_ah/api/isolateservice/v1/store_inline 2016/03/29 09:52:19 PUT /fake/cloudstorage 2016/03/29 09:52:19 POST /_ah/api/isolateservice/v1/store_inline 2016/03/29 09:52:19 Task failed, retrying after a sleep of 0: http request failed: Service Unavailable (HTTP 503) 2016/03/29 09:52:19 POST /_ah/api/isolateservice/v1/store_inline 2016/03/29 09:52:19 Task failed, retrying after a sleep of 0: http request failed: Service Unavailable (HTTP 503) 2016/03/29 09:52:19 PUT /fake/cloudstorage 2016/03/29 09:52:19 PUT /fake/cloudstorage 2016/03/29 09:52:19 Task failed, retrying after a sleep of 0: Put http://127.0.0.1:50976/fake/cloudstorage?digest=639b8c971d725db4b2eed1f1794c6c8a75ba09fb: read tcp 127.0.0.1:50977->127.0.0.1:50976: wsarecv: An existing connection was forcibly closed by the remote host. panic: runtime error: invalid memory address or nil pointer dereference [signal 0xc0000005 code=0x0 addr=0x28 pc=0x4896cc] goroutine 34 [running]: panic(0x7d12e0, 0xc082012060) E:/b/build/slave/luci-go-win64/build/golang/go/src/runtime/panic.go:464 +0x3f4 github.com/luci/luci-go/client/isolatedclient.(*compressed).Read(0xc082061350, 0xc0821ae000, 0x8000, 0x8000, 0x0, 0x0, 0x0) E:/b/build/slave/luci-go-win64/build/infra/go/src/github.com/luci/luci-go/client/isolatedclient/isolatedclient.go:250 +0x6c io/ioutil.(*nopCloser).Read(0xc082013060, 0xc0821ae000, 0x8000, 0x8000, 0xc0821820b0, 0x0, 0x0) <autogenerated>:4 +0x89 io.(*multiReader).Read(0xc0820a9880, 0xc0821ae000, 0x8000, 0x8000, 0x1, 0x0, 0x0) E:/b/build/slave/luci-go-win64/build/golang/go/src/io/multi.go:13 +0xab io.copyBuffer(0x28b01c0, 0xc082013390, 0x28b00f0, 0xc0820a9880, 0xc0821ae000, 0x8000, 0x8000, 0x4018, 0x0, 0x0) E:/b/build/slave/luci-go-win64/build/golang/go/src/io/io.go:380 +0x24e io.Copy(0x28b01c0, 0xc082013390, 0x28b00f0, 0xc0820a9880, 0xc082013390, 0x0, 0x0) E:/b/build/slave/luci-go-win64/build/golang/go/src/io/io.go:350 +0x6b net/http.(*transferWriter).WriteBody(0xc0821902a0, 0x28b0168, 0xc0821820b0, 0x0, 0x0) E:/b/build/slave/luci-go-win64/build/golang/go/src/net/http/transfer.go:218 +0x2b9 net/http.(*Request).write(0xc08239e620, 0x28b0078, 0xc082100080, 0x28b0000, 0xc082061470, 0x0, 0x0, 0x0) E:/b/build/slave/luci-go-win64/build/golang/go/src/net/http/request.go:497 +0xb9b net/http.(*persistConn).writeLoop(0xc08203bd40) E:/b/build/slave/luci-go-win64/build/golang/go/src/net/http/transport.go:1279 +0x2e3 created by net/http.(*Transport).dialConn E:/b/build/slave/luci-go-win64/build/golang/go/src/net/http/transport.go:854 +0x10d2 FAIL github.com/luci/luci-go/client/isolatedclient 0.157s PTAL. I think the remote service alone is bad for tests, but the failure handling definitely seems like a bug.
,
Mar 29 2016
Yep, this is from the Win64 luci-go tryserver logs. It may be hard to reproduce since it relies on remote service flake. IMO the actual solution here is to mock the remote service during the test. The line in question is: return c.r.Read(p) AFAICT, "c" and "p" are not nil, meaning "c.r" probably is. This only really happens when Close is called, which it is in defer: https://github.com/luci/luci-go/blob/master/client/isolatedclient/isolatedclient.go#L182 Since the only thing that has a "compressed" instance is the HTTP library, and the stack trace is coming from the HTTP library, it seems like the problem is that the HTTP request retains the "compressed" instance and is trying to Read from the "compressed" object after it has been closed. Perhaps the thing to do is have "Read" return io.EOF if "compressed" has already been closed?
,
Apr 1 2016
,
Apr 1 2016
assigning to myself to be looked in mid-april.
,
Apr 1 2016
The following revision refers to this bug: https://chromium.googlesource.com/external/github.com/luci/luci-go.git/+/50fe3c609c97b5effbc76d816a439fd19ab620ec commit 50fe3c609c97b5effbc76d816a439fd19ab620ec Author: dnj <dnj@chromium.org> Date: Fri Apr 01 19:25:47 2016 Isolate: Use generators instead of seekers Have Isolate use generators to obtain io.ReadCloser instances instead of reusing a single io.ReadCloser instance via Seek. This greatly simplifies the code and ensures a clean division between HTTP request data on retry. In the standard case, this will require one additional file open per archival, but will demand far fewer simultaneously-open file handles. BUG= chromium:598754 TEST=None Review URL: https://codereview.chromium.org/1846263002 [modify] https://crrev.com/50fe3c609c97b5effbc76d816a439fd19ab620ec/client/archiver/archiver.go [modify] https://crrev.com/50fe3c609c97b5effbc76d816a439fd19ab620ec/client/archiver/archiver_test.go [modify] https://crrev.com/50fe3c609c97b5effbc76d816a439fd19ab620ec/client/archiver/utils.go [modify] https://crrev.com/50fe3c609c97b5effbc76d816a439fd19ab620ec/client/internal/lhttp/client.go [modify] https://crrev.com/50fe3c609c97b5effbc76d816a439fd19ab620ec/client/internal/lhttp/client_test.go [modify] https://crrev.com/50fe3c609c97b5effbc76d816a439fd19ab620ec/client/isolate/isolate.go [modify] https://crrev.com/50fe3c609c97b5effbc76d816a439fd19ab620ec/client/isolatedclient/isolatedclient.go [modify] https://crrev.com/50fe3c609c97b5effbc76d816a439fd19ab620ec/client/isolatedclient/isolatedclient_test.go
,
Apr 1 2016
I'm optimistically marking this "Fixed" with the CL in #5; please re-open if this sort of failure is observed again.
,
Apr 27 2016
|
||||
►
Sign in to add a comment |
||||
Comment 1 by tandrii@chromium.org
, Mar 29 2016