
Issue 917555

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Jan 15
Cc:
Components:
EstimatedDays: ----
NextAction: 2019-01-07
OS: Linux
Pri: 1
Type: Bug

Blocked on:
issue 918993
issue 922237

Blocking:
issue 918942




dawn_end2end_tests failing on some linux bots

Project Member Reported by zmo@chromium.org, Dec 22

Issue description





Linux FYI Debug (NVIDIA):

https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20FYI%20Debug%20%28NVIDIA%29

Linux FYI Release (NVIDIA):

https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20FYI%20Release%20%28NVIDIA%29

The failure is due to an assertion in DeviceVk.cpp:

https://cs.chromium.org/chromium/src/third_party/dawn/src/dawn_native/vulkan/DeviceVk.cpp?rcl=300eec0f82c3c086e2c472d157594dff8fa81feb&l=87

I think when this assertion fails, all Vulkan tests fail.

kainino@ thinks it's due to the Vulkan driver missing on certain bots.

Here is a list of bots where we see this failing:

build651-m4
build652-m4
build653-m4
build655-m4
build656-m4
build81-m4
build825-m4
build827-m4

Since we can only go back 200 runs, there might be other bots that also have this issue.
 
Project Member

Comment 1 by bugdroid1@chromium.org, Dec 22

The following revision refers to this bug:
  https://dawn.googlesource.com/dawn/+/cb71ba7b3a42849b5e15794426cb9fe55cba8b13

commit cb71ba7b3a42849b5e15794426cb9fe55cba8b13
Author: Kai Ninomiya <kainino@chromium.org>
Date: Sat Dec 22 05:40:11 2018

Don't use ConsumedError on device initialization errors

If there's an error during device initialization, it's too early to use
ConsumedError (SetErrorCallback can't possibly have been called).
In this case, manually handle the error from initialization.

This will help us diagnose  issue chromium:917555 , where device
initialization is failing but the error is not printed.

TBR: cwallez@chromium.org
Bug:  chromium:917555 
Change-Id: I63ba3983688f508550afe2815ca1dda36130fed1
Reviewed-on: https://dawn-review.googlesource.com/c/3520
Reviewed-by: Kai Ninomiya <kainino@chromium.org>
Commit-Queue: Kai Ninomiya <kainino@chromium.org>

[modify] https://crrev.com/cb71ba7b3a42849b5e15794426cb9fe55cba8b13/src/dawn_native/vulkan/DeviceVk.cpp
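
For readers unfamiliar with the pattern the commit message describes, here is a minimal sketch of reporting an initialization error directly instead of routing it through a user-installed error callback. It is not Dawn's actual code: `SketchDevice`, `Initialize`, `InitError`, and `SketchUsage` are hypothetical names, and where the real change prints the error and then asserts, this sketch only records the error so it stays testable.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Hypothetical device type for illustration only; not Dawn's Device class.
class SketchDevice {
  public:
    explicit SketchDevice(bool instanceCreationSucceeds) {
        std::string error = Initialize(instanceCreationSucceeds);
        if (!error.empty()) {
            // The constructor is still running, so no caller can have
            // registered an error callback yet. Report the error directly.
            std::cerr << "Device initialization error: " << error << std::endl;
            mInitError = error;
        }
    }

    bool IsInitialized() const { return mInitError.empty(); }
    const std::string& InitError() const { return mInitError; }

  private:
    // Stand-in for backend setup. An empty string means success.
    static std::string Initialize(bool succeeds) {
        return succeeds ? std::string() : std::string("vkCreateInstance failed");
    }

    std::string mInitError;
};

// Usage: a failing initialization is printed to stderr and recorded.
inline void SketchUsage() {
    SketchDevice device(false);
    assert(!device.IsInitialized());
}
```

The point of the design is ordering: anything that depends on a callback the caller installs after construction cannot be used to surface errors raised during construction.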

Project Member

Comment 2 by bugdroid1@chromium.org, Dec 22

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/6babc45011db4dc476e4fc7977e9ab4a7ad6b75d

commit 6babc45011db4dc476e4fc7977e9ab4a7ad6b75d
Author: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Date: Sat Dec 22 07:41:53 2018

Roll src/third_party/dawn 300eec0f82c3..cb71ba7b3a42 (1 commits)

https://dawn.googlesource.com/dawn.git/+log/300eec0f82c3..cb71ba7b3a42


git log 300eec0f82c3..cb71ba7b3a42 --date=short --no-merges --format='%ad %ae %s'
2018-12-22 kainino@chromium.org Don't use ConsumedError on device initialization errors


Created with:
  gclient setdep -r src/third_party/dawn@cb71ba7b3a42

The AutoRoll server is located here: https://autoroll.skia.org/r/dawn-chromium-autoroll

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.

CQ_INCLUDE_TRYBOTS=luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel

BUG= chromium:917555 
TBR=cwallez@chromium.org

Change-Id: Ia39c4e2b6f0f4185ded92a33dc933378c2d39918
Reviewed-on: https://chromium-review.googlesource.com/c/1389660
Reviewed-by: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Commit-Queue: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#618739}
[modify] https://crrev.com/6babc45011db4dc476e4fc7977e9ab4a7ad6b75d/DEPS

Labels: -Pri-1 Pri-2
Owner: kainino@chromium.org
There haven't been any more failures on these bots since Dec 21. Whatever was flaking seems to have stopped, but the debug print is in there now in case it happens again.
NextAction: 2019-01-07
NextAction to check if there are any more flakes and close if not.
Blocking: 918942
btw, Corentin has this patch in flight: https://dawn-review.googlesource.com/c/dawn/+/3623/3/src/dawn_native/vulkan/DeviceVk.cpp#562
to improve the reporting further.
correction: line 494 in that link.

Here's the important log info from  issue 918942 :

Device initialization error: vkCreateInstance failed
More recent flakes have been observed in Issue 918993.

Example:
https://chromium-swarm.appspot.com/task?id=422e870b0072ca10&refresh=10&show_raw=1

[ RUN      ] BasicTests.BufferSetSubData/Vulkan
Device initialization error: vkCreateInstance failed
Assertion failure at ../../third_party/dawn/src/dawn_native/vulkan/DeviceVk.cpp:94 (Device): false
[7004:7007:0103/135409.299646:845181726:ERROR:kill_posix.cc(84)] Unable to terminate process group 7008: No such process (3)
[1/77] BasicTests.BufferSetSubData/Vulkan (CRASHED)

Blockedon: 918993
Project Member

Comment 10 by bugdroid1@chromium.org, Jan 4

The following revision refers to this bug:
  https://dawn.googlesource.com/dawn/+/e9212dfe309c31e3bce7eb95752d6919ebe53917

commit e9212dfe309c31e3bce7eb95752d6919ebe53917
Author: Corentin Wallez <cwallez@chromium.org>
Date: Fri Jan 04 10:18:40 2019

Vulkan: Print the VkResult value on device creation failure.

This adds a CheckVkSuccess utility function that adds the VkResult value
to the context lost error message.

Also adds a small fix to dawn_native/Error.h interoperability between
MaybeError and ResultOrError<NonPointer> with tests.

BUG= chromium:917555 
BUG=dawn:79

Change-Id: Icc01122d62d83693fc0ea3f26b272f2372fd3087
Reviewed-on: https://dawn-review.googlesource.com/c/3623
Commit-Queue: Corentin Wallez <cwallez@chromium.org>
Reviewed-by: Kai Ninomiya <kainino@chromium.org>

[modify] https://crrev.com/e9212dfe309c31e3bce7eb95752d6919ebe53917/src/dawn_native/Error.h
[modify] https://crrev.com/e9212dfe309c31e3bce7eb95752d6919ebe53917/src/dawn_native/vulkan/DeviceVk.cpp
[modify] https://crrev.com/e9212dfe309c31e3bce7eb95752d6919ebe53917/src/tests/unittests/ErrorTests.cpp
[add] https://crrev.com/e9212dfe309c31e3bce7eb95752d6919ebe53917/src/dawn_native/vulkan/VulkanError.h
[modify] https://crrev.com/e9212dfe309c31e3bce7eb95752d6919ebe53917/BUILD.gn
[add] https://crrev.com/e9212dfe309c31e3bce7eb95752d6919ebe53917/src/dawn_native/vulkan/VulkanError.cpp
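
As a rough illustration of the idea behind CheckVkSuccess, the sketch below appends the result code's name to an error message. It is written under stated assumptions and is not the code added in VulkanError.cpp: `SketchVkResult`, `SketchResultName`, and `CheckVkSuccessSketch` are hypothetical names, the enum is a stand-in so the example compiles without the Vulkan headers (the numeric values follow the Vulkan specification), and errors are returned as a plain string rather than Dawn's MaybeError.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Stand-in for VkResult so the sketch builds without <vulkan/vulkan.h>.
enum class SketchVkResult : int32_t {
    Success = 0,
    ErrorInitializationFailed = -3,
    ErrorIncompatibleDriver = -9,
};

// Maps a result code to the name used in the Vulkan specification.
inline const char* SketchResultName(SketchVkResult result) {
    switch (result) {
        case SketchVkResult::Success:
            return "VK_SUCCESS";
        case SketchVkResult::ErrorInitializationFailed:
            return "VK_ERROR_INITIALIZATION_FAILED";
        case SketchVkResult::ErrorIncompatibleDriver:
            return "VK_ERROR_INCOMPATIBLE_DRIVER";
    }
    return "<unknown VkResult>";
}

// Returns an empty string on success, otherwise a message that names
// both the failing call and the result code.
inline std::string CheckVkSuccessSketch(SketchVkResult result, const char* context) {
    assert(context != nullptr);
    if (result == SketchVkResult::Success) {
        return std::string();
    }
    return std::string(context) + " failed with " + SketchResultName(result);
}
```

With a helper of this shape, a failing `vkCreateInstance` call can be reported with its result code rather than as a bare "vkCreateInstance failed".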

Project Member

Comment 11 by bugdroid1@chromium.org, Jan 4

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/fba8e1c09a862ad8ff8e2a021e7de3e1460d2902

commit fba8e1c09a862ad8ff8e2a021e7de3e1460d2902
Author: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Date: Fri Jan 04 17:19:35 2019

Roll src/third_party/dawn 93158ebede89..110bc7918fc2 (6 commits)

https://dawn.googlesource.com/dawn.git/+log/93158ebede89..110bc7918fc2


git log 93158ebede89..110bc7918fc2 --date=short --no-merges --format='%ad %ae %s'
2019-01-04 cwallez@chromium.org Validate EndPass isn't called more than once.
2019-01-04 cwallez@chromium.org dawn_native: Add Instance and Adapters
2019-01-04 cwallez@chromium.org Vulkan: Print the VkResult value on device creation failure.
2019-01-04 cwallez@chromium.org WireServer: check buffer exists before sending the map callback
2019-01-04 cwallez@chromium.org WireCmd: guard against overflows when computing array sizes
2019-01-04 yunchao.he@intel.com Unify the compare function for sampler and depth stencil


Created with:
  gclient setdep -r src/third_party/dawn@110bc7918fc2

The AutoRoll server is located here: https://autoroll.skia.org/r/dawn-chromium-autoroll

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.

CQ_INCLUDE_TRYBOTS=luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel

BUG= chromium:918254 , chromium:917555 , chromium:918254 ,chromium:918094,chromium:918348,chromium:918260
TBR=cwallez@chromium.org

Change-Id: I4fe10af6d77616459112369b6414dc5b18b3cee4
Reviewed-on: https://chromium-review.googlesource.com/c/1396182
Reviewed-by: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Commit-Queue: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#619987}
[modify] https://crrev.com/fba8e1c09a862ad8ff8e2a021e7de3e1460d2902/DEPS

Sheriff here: These failures are still showing intermittently:
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20FYI%20GPU%20TSAN%20Release/35439
https://ci.chromium.org/p/chromium/builders/luci.chromium.ci/Linux%20FYI%20Debug%20%28NVIDIA%29/5183

Example:
[ RUN      ] BindGroupTests.ReusedUBO/Vulkan
Device initialization error: vkCreateInstance failed with VK_ERROR_INCOMPATIBLE_DRIVER
Assertion failure at ../../third_party/dawn/src/dawn_native/vulkan/DeviceVk.cpp:99 (Device): false
Labels: Sheriff-Chromium
Labels: -Pri-2 Pri-1
Raising to P1 because these intermittent failures are causing cognitive load for the Chromium sheriffs as well as the GPU pixel wranglers. Let's try to resolve this next week.

The NextAction date has arrived: 2019-01-07
Labels: -Sheriff-Chromium
Cc: senorblanco@chromium.org cwallez@chromium.org
Issue 919525 has been merged into this issue.
Blockedon: 922237
Status: Fixed (was: Assigned)
