GWP-ASan overwrites GetLastError() with ERROR_SUCCESS |
||||
Issue descriptionWhile trying to enable GWP-ASan's field trial testing config in https://chromium-review.googlesource.com/c/chromium/src/+/1388938 I was hitting the following unexpected test failures: Parameterized/PolicyUpdateServiceTest.PolicyCorruptedOnStartup/1 Parameterized/PolicyUpdateServiceTest.PolicyCorruptedOnStartup/0 Parameterized/UpdateServiceTest.SuccessfulUpdate/0 Parameterized/UpdateServiceTest.SuccessfulUpdate/1 Parameterized/UpdateServiceTest.PolicyCorrupted/0 Parameterized/UpdateServiceTest.PolicyCorrupted/1 Parameterized/PolicyUpdateServiceTest.FailedUpdateRetries/0 Parameterized/PolicyUpdateServiceTest.FailedUpdateRetries/1 Parameterized/PolicyUpdateServiceTest.Backoff/0 Parameterized/PolicyUpdateServiceTest.Backoff/1 After a little bit of digging I found that the test failures were due to a DCHECK in base file routines that were triggered when an error occured but GetLastError() did not have an error saved. The cause was that an allocation routine was called at some point after the file error and GWP-ASan's TLSGetValue (using TlsGetValue on Windows) overwrites GetLastError with ERROR_SUCCESS as noted in https://docs.microsoft.com/en-us/windows/desktop/api/processthreadsapi/nf-processthreadsapi-tlsgetvalue: Functions that return indications of failure call SetLastError when they fail. They generally do not call SetLastError when they succeed. The TlsGetValue function is an exception to this general rule. The TlsGetValue function calls SetLastError to clear a thread's last error when it succeeds. That allows checking for the error-free retrieval of zero values. To fix I need to save LastError across calls to TlsGetValue and I need to investigate whether I need something similar on POSIX platforms as well for errno. I'd like to refactor base::internal::ScopedClearLastError to base::ScopedClearLastError so that I can do that easily.
,
Jan 10
,
Jan 10
That's an interesting finding! Thanks for sharing. We don't actually enable heap profiler by default. So most of the tests (besides the profiler ones) run without profiler enabled. However the scenario can potentially take place in the wild, when the profiler is enabled. The TlsGetValue documentation you linked is not clear whether last error is updated on every invocation of TlsGetValue or only when the returned value happens to be zero. If former we seem indeed need to store last error across TLS accesses. If latter we can try to workaround it by making sure we never store zero to TL variables.
,
Jan 10
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/f11f6e41c03def6838a6ac32d3976990639413e9 commit f11f6e41c03def6838a6ac32d3976990639413e9 Author: Vlad Tsyrklevich <vtsyrklevich@chromium.org> Date: Thu Jan 10 23:05:22 2019 GWP-ASan: Preserve last error across allocations Calls to TlsGetValue on every allocation can overwrite the value in GetLastError() before error handling code has a chance to handle the error. Preserve its value across calls. Bug: 920440 Change-Id: Iaee843b71505625fc88ac51698b63c41de957c33 Reviewed-on: https://chromium-review.googlesource.com/c/1404476 Commit-Queue: Vlad Tsyrklevich <vtsyrklevich@chromium.org> Reviewed-by: Vitaly Buka <vitalybuka@chromium.org> Cr-Commit-Position: refs/heads/master@{#621790} [modify] https://crrev.com/f11f6e41c03def6838a6ac32d3976990639413e9/components/gwp_asan/client/sampling_allocator_shims_win.h
,
Jan 14
> If former we seem indeed need to store last error across TLS accesses. It's the former, TlsGetValue() sets LastError to 0 on success irrespective of the returned value.
,
Jan 16
We can probably use thread_local storage on Win. At least there's a precedent https://cs.chromium.org/chromium/src/base/threading/platform_thread_posix.cc?type=cs&q=thread_local&g=0&l=149
,
Jan 16
I was going to try to land the Mac thread_local fix in LLVM first but given that there is precedent I would rather do it sooner as it will also help with some of the perf regressions I'm seeing in crbug.com/921237. I'm also working on implementing macOS support for GWP-ASan anyway so I'll try to finish running down the LLVM bug either way. Thanks for pointing that out.
,
Jan 17
(5 days ago)
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/c72ef12f9ff6132ce2368484883fa0a391807f43 commit c72ef12f9ff6132ce2368484883fa0a391807f43 Author: Vlad Tsyrklevich <vtsyrklevich@chromium.org> Date: Thu Jan 17 21:23:01 2019 GWP-ASan: Optimize sampling using thread_local After being assigned some performance regressions after enabling GWP-ASan in the field trial testing config I benchmarked a number of GWP-ASan configurations. I found that by turning down the sampling frequency from the current value I could reduce the GWP-ASan overhead but that it was still noticeable even with sampling values so high that we only noticed the overhead of sampling in allocations (and performing the range checks during frees though this cost is not noticeable.) Currently sampling is performed by holding a counter in thread-local storage and accessing TLS using system APIs. I found that using thread_local to store the sampling counter I could reduce the overhead of a micro-benchmark testing the sampling overhead by approximately 30-40%. ALWAYS_INLINE improves performance even further. A bug on macOS has held up thread_local adoption in the C++11 allowed features; however, I've disabled macOS support in order to get the performance improvement on Windows and I'm working down the LLVM issue blocking macOS adoption in parallel. Bug: 922556, 920440, 896019 Change-Id: Ic35a0e44a25b27bcf3bf8a31234e9704f26b904b Reviewed-on: https://chromium-review.googlesource.com/c/1416093 Auto-Submit: Vlad Tsyrklevich <vtsyrklevich@chromium.org> Commit-Queue: Vlad Tsyrklevich <vtsyrklevich@chromium.org> Reviewed-by: Vitaly Buka <vitalybuka@chromium.org> Cr-Commit-Position: refs/heads/master@{#623848} [modify] https://crrev.com/c72ef12f9ff6132ce2368484883fa0a391807f43/components/gwp_asan/BUILD.gn [modify] https://crrev.com/c72ef12f9ff6132ce2368484883fa0a391807f43/components/gwp_asan/client/BUILD.gn [modify] https://crrev.com/c72ef12f9ff6132ce2368484883fa0a391807f43/components/gwp_asan/client/sampling_allocator_shims.cc [delete] https://crrev.com/5bdea1a4780492ef2ae046f17b3ca9c64795711d/components/gwp_asan/client/sampling_allocator_shims_posix.h [delete] https://crrev.com/5bdea1a4780492ef2ae046f17b3ca9c64795711d/components/gwp_asan/client/sampling_allocator_shims_win.h
,
Jan 18
(4 days ago)
Alexei I've reassigned to you as this is fixed in GWP-ASan. Note that using thread_local in the allocator shims on macOS is tricky. First access to the thread_local in a thread will call malloc, so a re-entry guard not based on thread_local is required. |
||||
►
Sign in to add a comment |
||||
Comment 1 by vtsyrklevich@chromium.org
, Jan 9