Issue 920304

Issue metadata

Status: Fixed
Owner:
Closed: Jan 16
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug



TransactionFailedError: too much contention on these datastore entities

Project Member Reported by benjhayden@chromium.org, Jan 9

Issue description

This error spiked in the dashboard yesterday, with nearly 8000 occurrences within 24 hours:
TransactionFailedError: too much contention on these datastore entities. please try again. entity group key: (app=s~chromeperf, CachedPickledString, "externally_visible__list_tests_get_tests_v2_redacted")

The stack trace implicated a call to layered_cache.DeleteAsync(). Documentation suggests:
Every attempt to create, update, or delete an entity takes place in the context of a transaction. There is a write throughput limit of about one transaction per second within a single entity group.

The stack trace goes on to point to the DeleteAsync call in TestMetadata.CreateCallback.
So it appears that several new master/bot/suite combinations started uploading, creating several new TestMetadata entities, each of which tried to purge the CachedPickledString and MultipartEntity entities containing the cached list of tests for its master/bot/suite. However, because each master/bot/suite was new, those CachedPickledString/MultipartEntity entities are unlikely to have existed, either at the first purge attempt or at any subsequent one, so many of the delete attempts were unnecessary.

A fix is ready that checks whether the entities exist before attempting to delete them. This should reduce contention.
 
This error was seen for both add_point and add_histograms since they both create TestMetadata entities.
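
To illustrate the existence check described above, here is a minimal sketch. It is not the dashboard's actual code: FakeDatastore, delete_unconditionally, delete_if_exists, purge_cache and CACHE_KEY are all hypothetical names, and the in-memory store only models the assumption that every delete counts as a write transaction while a get does not.

```python
class FakeDatastore:
    """In-memory stand-in for a datastore (hypothetical, for illustration).

    It only counts write transactions; it does not model real contention.
    """

    def __init__(self):
        self._entities = {}
        self.write_transactions = 0

    def get(self, key):
        # Assumption: a read does not count as a write transaction.
        return self._entities.get(key)

    def put(self, key, value):
        self.write_transactions += 1
        self._entities[key] = value

    def delete(self, key):
        # Assumption: a delete counts as a write even if the key is absent.
        self.write_transactions += 1
        self._entities.pop(key, None)


def delete_unconditionally(store, key):
    """Always issues a delete, whether or not the entity exists."""
    store.delete(key)


def delete_if_exists(store, key):
    """Issues a delete only when a read shows the entity is present."""
    if store.get(key) is None:
        return False
    store.delete(key)
    return True


def purge_cache(store, key, attempts, delete_fn):
    """Calls delete_fn `attempts` times; returns the writes that caused."""
    before = store.write_transactions
    for _ in range(attempts):
        delete_fn(store, key)
    return store.write_transactions - before


CACHE_KEY = "list_tests_cache"  # hypothetical key name

# 100 new TestMetadata entities each try to purge a cache that never existed.
naive_writes = purge_cache(
    FakeDatastore(), CACHE_KEY, 100, delete_unconditionally)
guarded_writes = purge_cache(
    FakeDatastore(), CACHE_KEY, 100, delete_if_exists)
print(naive_writes, guarded_writes)  # prints "100 0"
```

In this model the guarded version issues no writes when the cached entity is absent, and a single write when it is present. One caveat of a check-then-delete pattern in general is that another writer could create the entity between the read and the delete; for a cache purge that may be an acceptable trade-off, but that is a judgment call rather than something this sketch demonstrates.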
Comment 2 by bugdroid1@chromium.org, Jan 11

The following revision refers to this bug:
  https://chromium.googlesource.com/catapult/+/96320b515106e029ad6326b81cb6feef5660c6a8

commit 96320b515106e029ad6326b81cb6feef5660c6a8
Author: benshayden <benjhayden@chromium.org>
Date: Fri Jan 11 19:45:24 2019

Prevent contention errors in DeleteAsync using existence checks.

This error spiked in the dashboard yesterday, with nearly 8000 occurrences
within 24 hours:
TransactionFailedError: too much contention on these datastore entities. please
try again. entity group key: (app=s~chromeperf, CachedPickledString,
"externally_visible__list_tests_get_tests_v2_redacted")

The stack trace implicated a call to layered_cache.DeleteAsync().
Documentation suggests:
Every attempt to create, update, or delete an entity takes place in the context
of a transaction. There is a write throughput limit of about one transaction per second
within a single entity group.

The stack trace goes on to point to the DeleteAsync call in TestMetadata.CreateCallback.
So it appears that several new master/bot/suite combinations started uploading,
creating several new TestMetadata entities, all of which wanted to purge the
CachedPickledString and MultipartEntity entities containing the cached list of
tests for their respective master/bot/suite. However, since the master/bot/suite
was new, it is unlikely that the CachedPickledString/MultipartEntity entities
existed, either at first or at any subsequent attempt to purge them.

This fix avoids attempting to delete the entities if they don't exist. Calling
get() does not automatically create a transaction. This fix should reduce calls
to delete entities, and thereby reduce contention on them.

This cached list of tests is only used by the V1 UI. V2spa uses a different set
of cached test suite descriptors, which is not purged when new TestMetadata
entities are created, so test suite descriptors may be stale for up to a day. That
could change if users complain, in which case this bugfix would benefit v2spa as
well as v1 ui.

Bug:  chromium:920304 

Change-Id: I9e826590208370f140819e3d94dbe428812c626b
Reviewed-on: https://chromium-review.googlesource.com/c/1403458
Reviewed-by: Sean McCullough <seanmccullough@chromium.org>
Commit-Queue: Ben Hayden <benjhayden@chromium.org>

[modify] https://crrev.com/96320b515106e029ad6326b81cb6feef5660c6a8/dashboard/dashboard/common/stored_object.py
[modify] https://crrev.com/96320b515106e029ad6326b81cb6feef5660c6a8/dashboard/dashboard/common/layered_cache.py

Comment 3 by bugdroid1@chromium.org, Jan 11

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/660d8d32dc59178b53de1dfd2623e298647f2447

commit 660d8d32dc59178b53de1dfd2623e298647f2447
Author: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Date: Fri Jan 11 23:45:39 2019

Roll src/third_party/catapult 6eeb1d2fc794..96320b515106 (1 commits)

https://chromium.googlesource.com/catapult.git/+log/6eeb1d2fc794..96320b515106


git log 6eeb1d2fc794..96320b515106 --date=short --no-merges --format='%ad %ae %s'
2019-01-11 benjhayden@chromium.org Prevent contention errors in DeleteAsync using existence checks.


Created with:
  gclient setdep -r src/third_party/catapult@96320b515106

The AutoRoll server is located here: https://autoroll.skia.org/r/catapult-autoroll

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+/master/autoroll/README.md

If the roll is causing failures, please contact the current sheriff, who should
be CC'd on the roll, and stop the roller if necessary.

CQ_INCLUDE_TRYBOTS=luci.chromium.try:android_optional_gpu_tests_rel;luci.chromium.try:linux_optional_gpu_tests_rel;luci.chromium.try:mac_optional_gpu_tests_rel;luci.chromium.try:win_optional_gpu_tests_rel

BUG= chromium:920304 
TBR=sullivan@chromium.org

Change-Id: I6fab5daefaf1f04be32e530e95ceb99005c62270
Reviewed-on: https://chromium-review.googlesource.com/c/1407555
Reviewed-by: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Commit-Queue: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Cr-Commit-Position: refs/heads/master@{#622222}
[modify] https://crrev.com/660d8d32dc59178b53de1dfd2623e298647f2447/DEPS

Comment 4 by seanmccullough@chromium.org, Jan 16

Status: Fixed (was: Started)
