New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 661689 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Mar 2017
Cc:
Components:
EstimatedDays: ----
NextAction: 2017-01-08
OS: ----
Pri: 2
Type: Feature

Blocking:
issue 623635



Sign in to add a comment

Buildbucket: consistent search by buildset tag

Project Member Reported by tandrii@chromium.org, Nov 2 2016

Issue description

phajdan.jr@: can we possibly have an entity per patchset, such that we can get strongly consistent, transactional results for builds corresponding to the same patchset?
 

Comment 1 by no...@chromium.org, Nov 2 2016

Blocking: 623635

Comment 2 by no...@chromium.org, Nov 2 2016

Status: Assigned (was: Untriaged)
preliminary discussion in luci chat suggests that if we put builds into a common entity group defined by the buildset, then an ancestor query will always return fresh data.

here current design I am thinking of:
- add first-class citizen "buildset" field into Build API entity (but not datastore entity). Reason: buildset _tag_ can have multiple values, but a field can have only one
- when putting a build, if specified, the build will be put under BuildSet entity. The tag "buildset" will be added automatically. Both buildset field and tag can be specified unless they are different
- add "buildset" filter to build search. If specified, an ancestor query is performed, which is strongly consistent (callers won't see stale data)
- update buildbucket clients (git-cl-try, Rietveld and CQ) to pass "buildset" field instead of tag (2 lines changes in each API call)

Comment 3 by no...@chromium.org, Nov 2 2016

Cc: d...@chromium.org vadimsh@chromium.org
the design above causes the following problem:

earlier we considered to have multiple values for buildset tag to associate same build with different patchsets. If we introduce buildset field, gerrit is "supposed" to use it because we probably don't want to see stale results in Gerrit UI. That means that in practice buildset tag should have only one value => there is no point declaring a special field. Also we won't have to update all clients. 

New design:
- document buildset as a special tag. It was special, but now it is even more special.
- when putting builds, require "buildset" tag to have one value. This precludes from associating one build with many gerrit patchsets
- when putting a build, if buildset tag is specified, put Build under BuildSet root entity
- when searching for builds and buildset is specified, do an ancestor query

Now the big question is, if we switch search like that, it won't return anything for old builds. For old builds we have to search the old way (by tag). We could try checking if BuildSet with that key exists (cheap), if so do an ancestor query, otherwise fall back to search by tag.

+dnj, vadimsh for design review
Didn't fully understand the proposal, but changing entity hierarchy seems complicated. Consider denormalizing the data instead: when adding a build, do XG transaction that puts the build where it is now AND creates a new entity (referencing build ID) under new BuildSet entity group.

When searching by buildset, use this new entity group to grab build ID's.


Comment 5 by no...@chromium.org, Nov 2 2016

using XG transactions for this is probably a good idea because number of entity groups is constant here. And it is simpler.

We will still have to choose between old and new searches, which can be done by checking if BuildSet entity exists.

Comment 6 by no...@chromium.org, Nov 2 2016

actually not constant because put_batch handler accepts many builds and typically they belong the same buildset. Maybe we should limit number of builds in put_batch to 24 to fit XGT... I don't really like it though.
It doesn't have to put them all in single huge transaction.

For N builds, there'll be N transactions touching 2 groups (each).

Another option is to do BuildSet update first, before creating Build entity, and just ignore non-existing Builds when searching BuildSet's.

(I assume here we know Build ID before creating Build entity, so we can put it into BuildSet entity first).

I don't know, it's just a suggestion. It feels simpler to me (an addition to the datamodel, not a restructuring), but I can't say it is significantly better than your proposal.

Comment 8 by no...@chromium.org, Nov 2 2016

> For N builds, there'll be N transactions touching 2 groups (each).

One of these two groups is the same group for all transactions. Will
Datastore serialize all transactions?
Yes, there'll be bad contention there.

Thus my second proposal I outlined above. Basically:

1. Generate build IDs for all builds.
2. Create BuildSet entities referencing these builds IDs (non transactionally). If the request dies after this point, we can have some garbage build IDs hanging in the datastore.
3. Create Build entities (non transactionally).

When searching by build set:
1. Query BuildSet entity group using strong consistency to get a bunch of Build IDs.
2. Fetch the corresponding Builds. If some are missing, they are probably garbage generated in step (2) above.
3. Carry on with filtering.

The BuildSet entity group and strongly consistent ancestor query can actually be replaced with a single entity BuildSet that has a repeated field build_ids in it. Then step the update process becomes "Transactionally add new build IDs to BuildSet entity", and the query process becomes "Do datastore Get to fetch BuildSet".

This is handrolled index essentially, it it has roughly same performance and consistency guarantees as the usual datastore one. The advantage is that we can also put more BuildSet associated data into this single entity. Also single ndb.Get is probably slightly faster (and memcachable) than strongly consistent DB query.

*shrug*
yeah, i think this design is simpler and still work.

build id preallocation is not strictly safe in buildbucket because it generates builds ids on its own [1], but it is as unsafe as now: buildbucket generates an id and puts a build without checking if build existed before. However, with this custom index we can overwrite a build with an id and not update the index...

[1]: https://cs.chromium.org/chromium/infra/appengine/cr-buildbucket/model.py?q=new_build_id&sq=package:chromium&l=176
BuildSet search stuff can also ignore Builds that have different build_set field in them.
yeah ok

btw i realized designs in #2 and #3 won't work because we assume build keys are monotonically decreasing. I am not sure they would be if start putting parent key in them
Great design btw. I think I will implement it tomorrow
Labels: -Restrict-View-Google Pri-1
NextAction: 2016-11-08
Actually nothing prevents us from creating multiple BuildSet entity for each value of buildset tag, so we don't have to sacrifice multiple buildsets feature.

FTR In order to
- avoid having BuildSet entities with incomplete build_ids, e.g. if a build was added to an old buildset
- have ((no BuildSet with key X) => (no builds with buildset =X)) to avoid falling back to the old-style query
I plan to implement a MapReduce job that creates BuildSet entities for old builds. This will take some time to implement and run.
SGTM. Do I get it right that actual api of buildbucket stays the same?
Yes
after discussing with dnj@ offline:

we can avoid the MR job (which is dangerous and i never did before) by doing the following:
1. when a build is being added and a BuildSet entity does not exist, run a query to find all builds with that buildset and include them in the BuildSet.build_ids (this increases latency). Also set to_be_deleted=True BuildSet entity property. This will start creating BuildSet entities that may not include builds that were created in last 24 hours (due to index update latency)
2. deploy this code on Friday evening
3. on Monday deploy a new version of Buildbucket that does not perform that query in (1) and does not set to_be_deleted property. 
4. Delete all BuildSet entities that have to_be_deleted property.
5. Update search to check if BuildSet entity exists. If yes, fetch the builds by ids and filter them on gae frontend. Otherwise, fall back to the datastore query.

This is based on our claim that after doing (4) all BuildSet entities will be complete.

To make it possible to extend this consistent search for other tags in future for:
- Call BuildSet entity TagIndex
- the key of TagIndex is sha512(tag), where tag includes the key ("buildset") // this has to be done anyway
- include the full tag inside the TagIndex
- all properties in TagIndex are not indexed
- if we want to introduce new tags with consistent search in future, they will have to use some new syntax that is currently considered invalid, e.g. "*mytag:value" where *mytag is the new tag that is marked consistent. A consistent search will have to search by "*mtag:value", which is different from search by "mytag:value".
Cc: estaab@chromium.org
Owner: ----
Status: Available (was: Assigned)
Nodir only has a few weeks left before leave - how urgent is this for us? Andrii, I know you're pretty busy with gerrit but you're potentially a good person to implement this. What do you think?
Not urgent. While CQ still detects inconsistent search results every week, now that this is known it's not costing us in extra debugging time. I'm concerned about 

I don't think I have sufficient knowledge of GAE datastore to code things Nodir and Vadim outlined above in effectively short time.
Labels: -Pri-1 Pri-2
NextAction: 2017-01-08
Ok, let's revisit this next quarter unless someone gets spare cycles sooner.
Owner: no...@chromium.org
Status: Started (was: Available)
Project Member

Comment 22 by bugdroid1@chromium.org, Feb 7 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/infra/+/beaa9ea8252e3b0831e5c61b6fcc8d51626006d3

commit beaa9ea8252e3b0831e5c61b6fcc8d51626006d3
Author: Nodir Turakulov <nodir@chromium.org>
Date: Tue Feb 07 01:24:32 2017

buildbucket refactoring: BuildRequest

service.add and service.add_async accept a lot of parameters. Declare
them all in a class instead.

This is preparation for tag indexing.

BUG= 661689 

Change-Id: I8510a5b81b8508da7dc3d5da10b6de9ee05affc7
Reviewed-on: https://chromium-review.googlesource.com/437344
Commit-Queue: Nodir Turakulov <nodir@chromium.org>
Reviewed-by: Vadim Shtayura <vadimsh@chromium.org>

[modify] https://crrev.com/beaa9ea8252e3b0831e5c61b6fcc8d51626006d3/appengine/cr-buildbucket/service.py
[modify] https://crrev.com/beaa9ea8252e3b0831e5c61b6fcc8d51626006d3/appengine/cr-buildbucket/test/service_test.py
[modify] https://crrev.com/beaa9ea8252e3b0831e5c61b6fcc8d51626006d3/appengine/cr-buildbucket/test/api_test.py
[modify] https://crrev.com/beaa9ea8252e3b0831e5c61b6fcc8d51626006d3/appengine/cr-buildbucket/api.py

Project Member

Comment 23 by bugdroid1@chromium.org, Feb 7 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/infra/+/9ca11d0ddbe5e3f5d576d54e1c358b90fb17d233

commit 9ca11d0ddbe5e3f5d576d54e1c358b90fb17d233
Author: Nodir Turakulov <nodir@chromium.org>
Date: Tue Feb 07 04:24:44 2017

buildbucket: mock explicitly

api_test.py mocks all members of service.py. This is coarse.
Instead mock functions individually.

R=vadimsh@chromium.org
BUG= 661689 

Change-Id: I95ffe5004dc3968204a53bd6ebe379f1cf80a5b3
Reviewed-on: https://chromium-review.googlesource.com/437754
Commit-Queue: Nodir Turakulov <nodir@chromium.org>
Reviewed-by: Vadim Shtayura <vadimsh@chromium.org>

[modify] https://crrev.com/9ca11d0ddbe5e3f5d576d54e1c358b90fb17d233/appengine/cr-buildbucket/test/api_test.py

Project Member

Comment 24 by bugdroid1@chromium.org, Feb 16 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/infra/+/793cd1ab709d3534a410bf6400534e67ca40607d

commit 793cd1ab709d3534a410bf6400534e67ca40607d
Author: Nodir Turakulov <nodir@chromium.org>
Date: Thu Feb 16 00:29:19 2017

buildbucket: index by buildset

We need consistent search of builds by buildset. Implement indexing by
tags.

Add TagIndex entity which will act as a custom index of builds by
tags. So far only buildset tag is indexed.

This is phase 1 in implementing consistent search and it indexes only
new builds. Next phase will index old builds.

R=vadimsh@chromium.org, tandrii@chromium.org
BUG= 661689 

Change-Id: I3889d05a4f18b0c9e513e4045eddcd040044ab5d
Reviewed-on: https://chromium-review.googlesource.com/436304
Commit-Queue: Nodir Turakulov <nodir@chromium.org>
Reviewed-by: Vadim Shtayura <vadimsh@chromium.org>

[modify] https://crrev.com/793cd1ab709d3534a410bf6400534e67ca40607d/appengine/cr-buildbucket/service.py
[modify] https://crrev.com/793cd1ab709d3534a410bf6400534e67ca40607d/appengine/cr-buildbucket/test/service_test.py
[modify] https://crrev.com/793cd1ab709d3534a410bf6400534e67ca40607d/appengine/cr-buildbucket/model.py
[modify] https://crrev.com/793cd1ab709d3534a410bf6400534e67ca40607d/appengine/cr-buildbucket/test/api_test.py
[modify] https://crrev.com/793cd1ab709d3534a410bf6400534e67ca40607d/appengine/cr-buildbucket/api.py

Project Member

Comment 25 by bugdroid1@chromium.org, Mar 17 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/infra/+/52113a23396571e573acbe3590e7c964e787ccf3

commit 52113a23396571e573acbe3590e7c964e787ccf3
Author: Nodir Turakulov <nodir@chromium.org>
Date: Fri Mar 17 06:58:41 2017

buildbucket: implement tag index backfilling

Implement tag index backfilling using a push task that:
1. splits entire build space into N segments
2. for each segment, processes a chunk of builds in the segment for up to 50s
   or until we reach a limit of index entries. Then schedule a task to update
   indexes and/or a task to continue processing the segment.

We can observe the progress by looking at percent:XX suffix in URLs of
pending tasks in the backfill-tag-index task queue.

We avoid exhausting quota by being able to pause the backfill-tag-index
task queue and by setting a cap on maximum concurrent requests.

R=vadimsh@chromium.org
BUG= 661689 

Change-Id: I187a7c20edee9211b203e66778c85ef3af88937d
Reviewed-on: https://chromium-review.googlesource.com/450869
Commit-Queue: Nodir Turakulov <nodir@chromium.org>
Reviewed-by: Vadim Shtayura <vadimsh@chromium.org>

[modify] https://crrev.com/52113a23396571e573acbe3590e7c964e787ccf3/appengine/cr-buildbucket/handlers.py
[modify] https://crrev.com/52113a23396571e573acbe3590e7c964e787ccf3/appengine/cr-buildbucket/test/handlers_test.py
[modify] https://crrev.com/52113a23396571e573acbe3590e7c964e787ccf3/appengine/cr-buildbucket/test/service_test.py
[modify] https://crrev.com/52113a23396571e573acbe3590e7c964e787ccf3/appengine/cr-buildbucket/test/api_test.py
[modify] https://crrev.com/52113a23396571e573acbe3590e7c964e787ccf3/appengine/cr-buildbucket/service.py
[modify] https://crrev.com/52113a23396571e573acbe3590e7c964e787ccf3/appengine/cr-buildbucket/api.py
[modify] https://crrev.com/52113a23396571e573acbe3590e7c964e787ccf3/appengine/cr-buildbucket/queue.yaml
[modify] https://crrev.com/52113a23396571e573acbe3590e7c964e787ccf3/appengine/cr-buildbucket/model.py

Comment 26 by no...@chromium.org, Mar 17 2017

TagIndex entities on cr-buildbucket.appspot.com are backfilled

Comment 27 by no...@chromium.org, Mar 30 2017

https://chromium-review.googlesource.com/c/453081/ is ready to land, but we've discovered a problem in gae_ts_mon: graphs display 0s in distribution metrics as >0
Project Member

Comment 28 by bugdroid1@chromium.org, Mar 31 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/infra/+/7b66aaa13211235b1cd2876666a2db4f78a6dd95

commit 7b66aaa13211235b1cd2876666a2db4f78a6dd95
Author: Nodir Turakulov <nodir@google.com>
Date: Fri Mar 31 05:05:37 2017

buildbucket: implement search by tag index

Initially do both datastore query and tag index search and compare
results. This adds ~20ms in an overall 200ms request
Report an error if results are different.

R=vadimsh@chromium.org
BUG= 661689 

Change-Id: I9b4600cd46f7c7f323ab164b222e98fc893d2e64
Reviewed-on: https://chromium-review.googlesource.com/453081
Commit-Queue: Nodir Turakulov <nodir@chromium.org>
Reviewed-by: Vadim Shtayura <vadimsh@chromium.org>

[modify] https://crrev.com/7b66aaa13211235b1cd2876666a2db4f78a6dd95/appengine/cr-buildbucket/service.py
[modify] https://crrev.com/7b66aaa13211235b1cd2876666a2db4f78a6dd95/appengine/cr-buildbucket/metrics.py
[modify] https://crrev.com/7b66aaa13211235b1cd2876666a2db4f78a6dd95/appengine/cr-buildbucket/errors.py
[modify] https://crrev.com/7b66aaa13211235b1cd2876666a2db4f78a6dd95/appengine/cr-buildbucket/test/service_test.py
[modify] https://crrev.com/7b66aaa13211235b1cd2876666a2db4f78a6dd95/appengine/cr-buildbucket/api.py

Project Member

Comment 29 by bugdroid1@chromium.org, Mar 31 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/infra/+/22516057353e1aa58e5a56b7e86b4b8ae94a4136

commit 22516057353e1aa58e5a56b7e86b4b8ae94a4136
Author: Nodir Turakulov <nodir@google.com>
Date: Fri Mar 31 17:49:25 2017

buildbucket: print only ids of inconsistent results

When we discover inconsistent results and print full builds, we often
hit the log entry length limit, so we don't see tag search results.

Print ids only

TBR=vadimsh@chromium.org
BUG= 661689 

Change-Id: I1eec3e753542866273762494501c4f6632fad8f5
Reviewed-on: https://chromium-review.googlesource.com/465087
Reviewed-by: Nodir Turakulov <nodir@chromium.org>
Commit-Queue: Nodir Turakulov <nodir@chromium.org>

[modify] https://crrev.com/22516057353e1aa58e5a56b7e86b4b8ae94a4136/appengine/cr-buildbucket/service.py

Project Member

Comment 30 by bugdroid1@chromium.org, Mar 31 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/infra/+/fca7a1e2f69e3f6409690568996585ac5803c196

commit fca7a1e2f69e3f6409690568996585ac5803c196
Author: Nodir Turakulov <nodir@google.com>
Date: Fri Mar 31 19:50:16 2017

buildbucket: tolerate dup tag index entries

Currently add_many_async may add dup entries to a tag index.
This is a bug.

Fix the bug and tolerate dups.

R=vadimsh@chromium.org
BUG= 661689 

Change-Id: I508e16d267c21a6b6a88aca7742850e9c1fe66da
Reviewed-on: https://chromium-review.googlesource.com/465107
Commit-Queue: Nodir Turakulov <nodir@chromium.org>
Reviewed-by: Vadim Shtayura <vadimsh@chromium.org>

[modify] https://crrev.com/fca7a1e2f69e3f6409690568996585ac5803c196/appengine/cr-buildbucket/service.py
[modify] https://crrev.com/fca7a1e2f69e3f6409690568996585ac5803c196/appengine/cr-buildbucket/test/service_test.py

Project Member

Comment 31 by bugdroid1@chromium.org, Mar 31 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/infra/+/711c566143636946517d9992556f044eb1de443b

commit 711c566143636946517d9992556f044eb1de443b
Author: Nodir Turakulov <nodir@google.com>
Date: Fri Mar 31 20:09:22 2017

buildbucket: more logging in tag index search

Log a message when tag index is absent or empty.
Log build id of last entry

R=vadimsh@chromium.org
BUG= 661689 

Change-Id: I7489d3fea1ab51f35e18fbc145492578b6f5e33b
Reviewed-on: https://chromium-review.googlesource.com/465349
Reviewed-by: Vadim Shtayura <vadimsh@chromium.org>
Commit-Queue: Nodir Turakulov <nodir@chromium.org>

[modify] https://crrev.com/711c566143636946517d9992556f044eb1de443b/appengine/cr-buildbucket/service.py

Project Member

Comment 32 by bugdroid1@chromium.org, Mar 31 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/infra/+/711c566143636946517d9992556f044eb1de443b

commit 711c566143636946517d9992556f044eb1de443b
Author: Nodir Turakulov <nodir@google.com>
Date: Fri Mar 31 20:09:22 2017

buildbucket: more logging in tag index search

Log a message when tag index is absent or empty.
Log build id of last entry

R=vadimsh@chromium.org
BUG= 661689 

Change-Id: I7489d3fea1ab51f35e18fbc145492578b6f5e33b
Reviewed-on: https://chromium-review.googlesource.com/465349
Reviewed-by: Vadim Shtayura <vadimsh@chromium.org>
Commit-Queue: Nodir Turakulov <nodir@chromium.org>

[modify] https://crrev.com/711c566143636946517d9992556f044eb1de443b/appengine/cr-buildbucket/service.py

Project Member

Comment 33 by bugdroid1@chromium.org, Mar 31 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/infra/+/e89410c1eb38bcc10c4947328dab362feb6d1ac5

commit e89410c1eb38bcc10c4947328dab362feb6d1ac5
Author: Nodir Turakulov <nodir@google.com>
Date: Fri Mar 31 20:25:22 2017

buildbucket: do not verify tag search results

Currently tag search results are verified by doing extra datastore
query. This CL removes the verification because we trust tag index
search implementation.

The tag index search implementation CL removed search-by-buildset
optimiztions from datastore query search because it is not needed
anymore. This made query search much slower for the vast majority of
queries in production, so it is not practical to run unoptimized query
search in prod.

R=vadimsh@chromium.org
BUG= 661689 

Change-Id: I6ad565410dd361d742e00ee6d4239486e3a68a6c
Reviewed-on: https://chromium-review.googlesource.com/464388
Commit-Queue: Nodir Turakulov <nodir@chromium.org>
Reviewed-by: Vadim Shtayura <vadimsh@chromium.org>

[modify] https://crrev.com/e89410c1eb38bcc10c4947328dab362feb6d1ac5/appengine/cr-buildbucket/service.py

Comment 34 by no...@chromium.org, Mar 31 2017

Deployed.

Comment 35 by no...@chromium.org, Mar 31 2017

Status: Fixed (was: Started)
Project Member

Comment 36 by bugdroid1@chromium.org, Mar 31 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/infra/+/a2220f74dd4be42b48150b3b315648129a67d686

commit a2220f74dd4be42b48150b3b315648129a67d686
Author: Nodir Turakulov <nodir@google.com>
Date: Fri Mar 31 22:38:37 2017

buildbucket: remove dual search

We have enough tests and prod data to conclude that tag index search is
correct. Remove tag search result verification from the prod code.

R=vadimsh@chromium.org
BUG= 661689 

Change-Id: I9090bb310011c3f82e9d190287795e7c9af95878
Reviewed-on: https://chromium-review.googlesource.com/465510
Reviewed-by: Vadim Shtayura <vadimsh@chromium.org>
Commit-Queue: Nodir Turakulov <nodir@chromium.org>

[modify] https://crrev.com/a2220f74dd4be42b48150b3b315648129a67d686/appengine/cr-buildbucket/service.py
[modify] https://crrev.com/a2220f74dd4be42b48150b3b315648129a67d686/appengine/cr-buildbucket/test/service_test.py

Sign in to add a comment