
Issue 834269

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: May 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Linux
Pri: 2
Type: Bug

Blocked on:
issue angleproject:2482




Linux FYI GPU TSAN Release failing angle_end2end_tests because of timeout

Project Member Reported by jmad...@chromium.org, Apr 18 2018

Issue description

This test seems to have become extremely slow, and is timing out.

First failing:
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20FYI%20GPU%20TSAN%20Release/5925

Last known good:
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20FYI%20GPU%20TSAN%20Release/5924

ANGLE regression range:
https://chromium.googlesource.com/angle/angle/+/41c43ce7f74906c87c5e1c6c8cc0673f8ab854b1

Test output:
https://chromium-swarm.appspot.com/task?id=3ce997cf0314a710&refresh=10&show_raw=1

The test doesn't seem to have any errors; it just gets slower and slower as the run progresses.

For instance at the start of the tests, things are fast:
ClientArraysTest.ForbidsClientSideElementBuffer/ES3_OPENGL (53 ms)

Near the later part of the test, OpenGL tests are very very slow:
Texture2DTest.ZeroSizedUploads/ES2_OPENGL (61045 ms)

I'm not sure what's happening here. Can anyone offer some suggestions?
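
To put numbers on the slowdown, the per-test duration can be pulled out of each result line in the raw log. This is only a minimal sketch, assuming result lines end with a `(N ms)` suffix as in the two examples above. `ParseDurationMs` is a hypothetical helper, not part of gtest or ANGLE.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the duration in milliseconds from a line
// ending in "(N ms)", or -1 if the line does not have that shape.
long ParseDurationMs(const std::string &line) {
    const std::string suffix = " ms)";
    if (line.size() < suffix.size() ||
        line.compare(line.size() - suffix.size(), suffix.size(), suffix) != 0) {
        return -1;
    }
    const std::size_t open = line.rfind('(');
    if (open == std::string::npos) {
        return -1;
    }
    // Characters between the last '(' and the " ms)" suffix.
    const std::string digits =
        line.substr(open + 1, line.size() - suffix.size() - open - 1);
    if (digits.empty() || digits.size() > 18 ||
        digits.find_first_not_of("0123456789") != std::string::npos) {
        return -1;
    }
    return std::stol(digits);
}
```

Feeding every line of the log through this and printing the values in order should make it visible whether the durations grow steadily or jump at particular tests.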
 
Though towards the end the slowness is more pronounced, things are randomly getting slow throughout the run.
You can see "Still waiting for the following processes to finish" messages in random places in https://chromium-swarm.appspot.com/task?id=3ce997cf0314a710&refresh=10&show_raw=1.
These appear mostly in OPENGL tests, and there are no OpenGL changes in regression range.

Maybe the bot or TSAN instrumentation got slower?
OTOH, only angle_end2end_tests became slower, maybe also angle_white_box_tests a bit. So, it could be that LVL are responsible. It should be possible to reproduce this locally and see what's going on.

Comment 3 by piman@chromium.org, Apr 19 2018

Tried locally, definitely reproducible, running e.g. angle_end2end_tests --gtest_filter='Texture2DTest.TexStorage*' --gtest_repeat=100 --single-process-tests
The OpenGL steps keep getting slower and slower (the Vulkan ones don't seem to).

Attached a debugger, and it's pretty consistently within a stack like this:

(gdb) bt
#0  0x000000000044e100 in FreeRange() ()
    at /b/build/slave/linux_upload_clang/build/src/third_party/llvm/compiler-rt/lib/tsan/rtl/tsan_sync.cc:90
#1  0x000000000044dfcc in FreeBlock() ()
    at /b/build/slave/linux_upload_clang/build/src/third_party/llvm/compiler-rt/lib/tsan/rtl/tsan_sync.cc:79
#2  0x0000000000437ddf in user_free() ()
    at /b/build/slave/linux_upload_clang/build/src/third_party/llvm/compiler-rt/lib/tsan/rtl/tsan_mman.cc:200
#3  0x0000000000437ddf in user_free() ()
    at /b/build/slave/linux_upload_clang/build/src/third_party/llvm/compiler-rt/lib/tsan/rtl/tsan_mman.cc:170
#4  0x0000000000438110 in user_realloc() ()
    at /b/build/slave/linux_upload_clang/build/src/third_party/llvm/compiler-rt/lib/tsan/rtl/tsan_mman.cc:219
#5  0x00000000003e65d9 in __interceptor_realloc() ()
    at /b/build/slave/linux_upload_clang/build/src/third_party/llvm/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:695
#6  0x00007f34612dbfab in  () at /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.0
#7  0x00007f34612f4ede in  () at /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.0
#8  0x00007f34612ef982 in  () at /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.0
#9  0x00007f3461284f19 in  () at /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.0
#10 0x00007f34612b2599 in  () at /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.0
#11 0x00007f34612a815d in glXQueryExtensionsString ()
    at /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.0
#12 0x00007f3469124ddc in initialize() ()
    at ../../third_party/angle/src/libANGLE/renderer/gl/glx/FunctionsGLX.cpp:327
#13 0x00007f3469124ddc in initialize() ()
    at ../../third_party/angle/src/libANGLE/renderer/gl/glx/FunctionsGLX.cpp:206
#14 0x00007f34691203b4 in initialize() ()
    at ../../third_party/angle/src/libANGLE/renderer/gl/glx/DisplayGLX.cpp:108
#15 0x00007f3468ecb364 in initialize() ()
    at ../../third_party/angle/src/libANGLE/Display.cpp:466
#16 0x00007f3468e56273 in Initialize() ()
    at ../../third_party/angle/src/libGLESv2/entry_points_egl.cpp:87
#17 0x00007f3469fe54ea in eglInitialize ()
    at ../../third_party/angle/src/libEGL/libEGL.cpp:87
#18 0x00007f3469ffd418 in initializeDisplayAndSurface() ()
    at ../../third_party/angle/util/EGLWindow.cpp:226
#19 0x0000000000a6046c in ANGLETestSetUp() ()
    at ../../third_party/angle/src/tests/test_utils/ANGLETest.cpp:299
#20 0x0000000000a623fa in ANGLETest::SetUp() ()
    at ../../third_party/angle/src/tests/test_utils/ANGLETest.cpp:977
#21 0x00000000008a0211 in SetUp() ()
    at ../../third_party/angle/src/tests/gl_tests/TextureTest.cpp:78
#22 0x000000000089f571 in SetUp() ()
    at ../../third_party/angle/src/tests/gl_tests/TextureTest.cpp:162
#23 0x00000000008a00a1 in non-virtual thunk to (anonymous namespace)::Texture2DTest::SetUp() ()
    at ../../third_party/googletest/src/googletest/include/gtest/internal/gtest-linked_ptr.h:153
#24 0x0000000000a8449c in Run() ()
    at ../../third_party/googletest/src/googletest/src/gtest-internal-inl.h:920
#25 0x0000000000a854fd in Run() ()
    at ../../third_party/googletest/src/googletest/src/gtest.cc:2667
#26 0x0000000000a85d77 in Run() ()
    at ../../third_party/googletest/src/googletest/src/gtest.cc:2785
#27 0x0000000000a963e7 in RunAllTests() ()
    at ../../third_party/googletest/src/googletest/src/gtest.cc:5047
#28 0x0000000000a95cec in Run() ()
    at ../../third_party/googletest/src/googletest/src/gtest-internal-inl.h:920
#29 0x0000000000a9eb67 in Run() ()
    at ../../third_party/googletest/src/googletest/include/gtest/gtest.h:2327
#30 0x0000000000a9eb67 in Run() () at ../../base/test/test_suite.cc:275
#31 0x0000000000a6b583 in (anonymous namespace)::RunHelper(base::TestSuite*) ()
    at ../../gpu/angle_end2end_tests_main.cc:19
#32 0x0000000000a6b5d5 in Run() () at ../../base/bind_internal.h:402
#33 0x0000000000a6b5d5 in Run() () at ../../base/bind_internal.h:530
#34 0x0000000000a6b5d5 in Run() () at ../../base/bind_internal.h:604
#35 0x0000000000a6b5d5 in Run() () at ../../base/bind_internal.h:586
#36 0x0000000000aa24c4 in LaunchUnitTestsInternal() ()
    at ../../base/callback.h:95
#37 0x0000000000aa24c4 in LaunchUnitTestsInternal() ()
    at ../../base/test/launcher/unit_test_launcher.cc:225
#38 0x0000000000aa2ada in LaunchUnitTestsWithOptions() ()
    at ../../base/test/launcher/unit_test_launcher.cc:597
#39 0x0000000000a6b501 in main() () at ../../gpu/angle_end2end_tests_main.cc:29

Comment 4 by piman@chromium.org, Apr 20 2018

The resident size seems to be growing at every iteration too, suggesting a leak.
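
For anyone wanting to track this per iteration instead of eyeballing it, here is a rough sketch of reading the process's resident set from `/proc/self/statm` on Linux, where the second field is the resident size in pages. `ParseResidentPages` and `ReadResidentPages` are hypothetical helper names, and nothing like this is claimed to exist in the test harness.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: extracts the resident page count (second field)
// from the contents of /proc/<pid>/statm. Returns -1 on malformed input.
long ParseResidentPages(const std::string &statm) {
    std::istringstream in(statm);
    long total = 0;
    long resident = 0;
    if (!(in >> total >> resident)) {
        return -1;
    }
    return resident;
}

// Hypothetical helper: reads the current process's resident page count.
// Returns -1 if /proc/self/statm is unavailable (e.g. not on Linux).
long ReadResidentPages() {
    std::ifstream file("/proc/self/statm");
    std::string line;
    if (!std::getline(file, line)) {
        return -1;
    }
    return ParseResidentPages(line);
}
```

Logging `ReadResidentPages()` at the end of each test iteration would show whether the growth is steady per iteration or tied to particular steps.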

Comment 5 by piman@chromium.org, Apr 20 2018

Owner: cwallez@chromium.org
->cwallez, any idea?
Issue 834563 has been merged into this issue.
Status: Available (was: Untriaged)
I'll take a look if I'm able to repro.

Notes:
 - glXQueryExtensionsString's return value doesn't need to be freed explicitly.
 - The regression range doesn't seem very useful?

Several ideas but I didn't check any yet:
 - Does the resident size grow without TSAN too? If not, then maybe TSAN's tracking data keeps growing during the test?
 - Does this happen on a different driver? If not it could be caused by NVIDIA's driver being multi-threaded.
 - Does this happen without the Vulkan backend? It could be caused by the initialization of the Vulkan validation layers.
It could be related to the LVL. It's unclear, but I think that was in the regression range.

Comment 10 by piman@chromium.org, Apr 20 2018

FWIW I couldn't repro the RSS increase without TSAN, but it's also possible that it's a small leak (i.e. within noise) that gets magnified by TSAN.

I could very well believe that glXQueryExtensionsString allocates and leaks the extension string, since it doesn't have a well-defined lifetime. Creating a new XDisplay (and therefore a new GLX extension client-side structure in the driver) many times (once on every test) is fairly unusual.

Assuming it's a leak in the driver, would it be completely unreasonable to avoid recreating/reinitializing the EGL display for every test in this test harness, and instead cache it, at least for the tests that don't care about the EGL parts of the API?
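
As an illustration of the caching idea, here is a minimal sketch that keeps one display per backend alive across tests instead of re-creating it in every `SetUp`. `FakeDisplay` and `DisplayCache` are hypothetical stand-ins that do not touch EGL or GLX; how this would map onto the real harness is an open question.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for an expensive-to-initialize display object.
struct FakeDisplay {
    explicit FakeDisplay(std::string name) : backend(std::move(name)) {}
    std::string backend;
};

// Keeps one display per backend name alive across tests, so repeated
// lookups for the same backend do not re-run initialization.
class DisplayCache {
  public:
    FakeDisplay &Get(const std::string &backend) {
        auto it = displays_.find(backend);
        if (it == displays_.end()) {
            it = displays_.emplace(backend, std::make_unique<FakeDisplay>(backend)).first;
            ++initializations_;
        }
        assert(it->second != nullptr);
        return *it->second;
    }

    // Number of displays actually created so far.
    int initializations() const { return initializations_; }

    // Tests that exercise display creation/teardown can drop the cache.
    void Clear() { displays_.clear(); }

  private:
    std::map<std::string, std::unique_ptr<FakeDisplay>> displays_;
    int initializations_ = 0;
};
```

Tests that do care about the EGL initialization path would bypass the cache or call `Clear()` first, so they still get a fresh display.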
Cc: cwallez@chromium.org
Owner: jmad...@chromium.org
Status: Assigned (was: Available)
Going to try disabling the LVL in this builder.

Comment 12 by bugdroid1@chromium.org, May 1 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/angle/angle/+/ad3aaeba3e0d7f704ca84b2bac4c23e1242014c9

commit ad3aaeba3e0d7f704ca84b2bac4c23e1242014c9
Author: Jamie Madill <jmadill@chromium.org>
Date: Tue May 01 12:50:52 2018

Disable Vulkan layers in sanitized builds.

This was causing very slow builds/test runs.

Bug: chromium:837166
Bug:  chromium:834269 
Change-Id: If2e5665455d4a8af13cbc732a65a07550ace8304
Reviewed-on: https://chromium-review.googlesource.com/1036220
Reviewed-by: Jamie Madill <jmadill@chromium.org>

[modify] https://crrev.com/ad3aaeba3e0d7f704ca84b2bac4c23e1242014c9/gni/angle.gni


Comment 13 by bugdroid1@chromium.org, May 1 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/a3ffbd46d8a7667f14738fdf42796f344e3ced64

commit a3ffbd46d8a7667f14738fdf42796f344e3ced64
Author: angle-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com <angle-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Date: Tue May 01 17:49:32 2018

Roll src/third_party/angle/ ddd772455..ad3aaeba3 (1 commit)

https://chromium.googlesource.com/angle/angle.git/+log/ddd772455ce7..ad3aaeba3e0d

$ git log ddd772455..ad3aaeba3 --date=short --no-merges --format='%ad %ae %s'
2018-05-01 jmadill Disable Vulkan layers in sanitized builds.

Created with:
  roll-dep src/third_party/angle
BUG=chromium:837166, chromium:834269 


The AutoRoll server is located here: https://angle-chromium-roll.skia.org

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.


CQ_INCLUDE_TRYBOTS=luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
TBR=cwallez@chromium.org
No-Try: True

Change-Id: I469040e3c7b571ae7d469330c2f0a142215ced61
Reviewed-on: https://chromium-review.googlesource.com/1036500
Commit-Queue: Jamie Madill <jmadill@chromium.org>
Commit-Queue: angle-chromium-autoroll <angle-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Reviewed-by: angle-chromium-autoroll <angle-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#555103}
[modify] https://crrev.com/a3ffbd46d8a7667f14738fdf42796f344e3ced64/DEPS

Blockedon: angleproject:2482
Status: Fixed (was: Assigned)
Tentatively marking as fixed.

Comment 15 by ClusterFuzz, May 9 2018

Labels: Needs-Feedback
ClusterFuzz testcase 4614092258803712 is still reproducing on tip-of-tree build (trunk).

Please re-test your fix against this testcase and if the fix was incorrect or incomplete, please re-open the bug. Otherwise, ignore this notification and add ClusterFuzz-Wrong label.

Comment 17 by bugdroid1@chromium.org, May 14 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/angle/angle/+/f345cdf37b81690c6942e64986d0d276531f38bd

commit f345cdf37b81690c6942e64986d0d276531f38bd
Author: Corentin Wallez <cwallez@chromium.org>
Date: Mon May 14 19:54:56 2018

DisplayGLX: Close the X display if we own it.

BUG= chromium:834269 

Change-Id: Ia49f80f4c057ad467428a13e8cd4ca54ad48d5c4
Reviewed-on: https://chromium-review.googlesource.com/1058084
Reviewed-by: Jamie Madill <jmadill@chromium.org>
Reviewed-by: Geoff Lang <geofflang@chromium.org>
Commit-Queue: Corentin Wallez <cwallez@chromium.org>

[modify] https://crrev.com/f345cdf37b81690c6942e64986d0d276531f38bd/src/libANGLE/renderer/gl/glx/DisplayGLX.cpp
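
The commit title describes an "owned vs. borrowed" pattern: close the display on teardown only if this code opened it. The actual change is in the linked diff. What follows is only a generic sketch of the pattern, using a hypothetical `DisplayHandle` wrapper and an injected closer function instead of real Xlib calls.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical opaque handle standing in for an X11 `Display*`.
using NativeDisplay = void *;

// Closes the wrapped display on destruction only when it is owned,
// i.e. when this code opened it rather than receiving it from the caller.
class DisplayHandle {
  public:
    DisplayHandle(NativeDisplay display, bool owned,
                  std::function<void(NativeDisplay)> closer)
        : display_(display), owned_(owned), closer_(std::move(closer)) {
        assert(closer_);
    }

    ~DisplayHandle() {
        if (owned_ && display_ != nullptr) {
            closer_(display_);
        }
    }

    // Copying would risk closing the same display twice.
    DisplayHandle(const DisplayHandle &) = delete;
    DisplayHandle &operator=(const DisplayHandle &) = delete;

    NativeDisplay get() const { return display_; }

  private:
    NativeDisplay display_;
    bool owned_;
    std::function<void(NativeDisplay)> closer_;
};
```

With real Xlib the closer would presumably wrap `XCloseDisplay`, and a display passed in by the application would be wrapped with `owned` set to false so it is left open.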


Comment 18 by bugdroid1@chromium.org, May 14 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/9e7a5bd04dc7b4efc0874abc33986b392b2fe793

commit 9e7a5bd04dc7b4efc0874abc33986b392b2fe793
Author: angle-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com <angle-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Date: Mon May 14 23:45:14 2018

Roll src/third_party/angle/ 5d2ccc534..5730c0bf4 (9 commits)

https://chromium.googlesource.com/angle/angle.git/+log/5d2ccc534d26..5730c0bf431e

$ git log 5d2ccc534..5730c0bf4 --date=short --no-merges --format='%ad %ae %s'
2018-05-14 geofflang Add more dEQP EGL expectations for Linux and Android.
2018-05-14 cwallez DisplayGLX: Close the X display if we own it.
2018-05-14 geofflang Add more dEQP EGL expectations for Linux and Android.
2018-05-14 geofflang DEQP: Print not supported messages from tests.
2018-05-14 geofflang Add dEQP EGL test expectations for Linux and Android.
2018-05-10 jmadill Vulkan: Implement masked color clear with depth.
2018-05-14 jmadill Fix libGLESv2 wrong .def file.
2018-05-14 jmadill Fix warnings from size_t conversions.
2018-04-23 lfy GLES1: Renderer (minimal)

Created with:
  roll-dep src/third_party/angle
BUG= chromium:834269 , chromium:842028 


The AutoRoll server is located here: https://angle-chromium-roll.skia.org

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.


CQ_INCLUDE_TRYBOTS=luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel
TBR=ynovikov@chromium.org

Change-Id: I1ac4625bc1c520a30186f260160dffbdf5787693
Reviewed-on: https://chromium-review.googlesource.com/1058172
Commit-Queue: angle-chromium-autoroll <angle-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Reviewed-by: angle-chromium-autoroll <angle-chromium-autoroll@skia-buildbots.google.com.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#558531}
[modify] https://crrev.com/9e7a5bd04dc7b4efc0874abc33986b392b2fe793/DEPS
