Issue metadata
Sign in to add a comment
|
Buildbucket: consistent search by buildset tag |
||||||||||||||||||||||
Issue descriptionphajdan.jr@: can we possibly have an entity per patchset, such that we can get strongly consistent, transactional results for builds corresponding to the same patchset?
,
Nov 2 2016
preliminary discussion in luci chat suggests that if we put builds into a common entity group defined by the buildset, then an ancestor query will always return fresh data. here current design I am thinking of: - add first-class citizen "buildset" field into Build API entity (but not datastore entity). Reason: buildset _tag_ can have multiple values, but a field can have only one - when putting a build, if specified, the build will be put under BuildSet entity. The tag "buildset" will be added automatically. Both buildset field and tag can be specified unless they are different - add "buildset" filter to build search. If specified, an ancestor query is performed, which is strongly consistent (callers won't see stale data) - update buildbucket clients (git-cl-try, Rietveld and CQ) to pass "buildset" field instead of tag (2 lines changes in each API call)
,
Nov 2 2016
the design above causes the following problem: earlier we considered to have multiple values for buildset tag to associate same build with different patchsets. If we introduce buildset field, gerrit is "supposed" to use it because we probably don't want to see stale results in Gerrit UI. That means that in practice buildset tag should have only one value => there is no point declaring a special field. Also we won't have to update all clients. New design: - document buildset as a special tag. It was special, but now it is even more special. - when putting builds, require "buildset" tag to have one value. This precludes from associating one build with many gerrit patchsets - when putting a build, if buildset tag is specified, put Build under BuildSet root entity - when searching for builds and buildset is specified, do an ancestor query Now the big question is, if we switch search like that, it won't return anything for old builds. For old builds we have to search the old way (by tag). We could try checking if BuildSet with that key exists (cheap), if so do an ancestor query, otherwise fall back to search by tag. +dnj, vadimsh for design review
,
Nov 2 2016
Didn't fully understand the proposal, but changing entity hierarchy seems complicated. Consider denormalizing the data instead: when adding a build, do XG transaction that puts the build where it is now AND creates a new entity (referencing build ID) under new BuildSet entity group. When searching by buildset, use this new entity group to grab build ID's.
,
Nov 2 2016
using XG transactions for this is probably a good idea because number of entity groups is constant here. And it is simpler. We will still have to choose between old and new searches, which can be done by checking if BuildSet entity exists.
,
Nov 2 2016
actually not constant because put_batch handler accepts many builds and typically they belong the same buildset. Maybe we should limit number of builds in put_batch to 24 to fit XGT... I don't really like it though.
,
Nov 2 2016
It doesn't have to put them all in single huge transaction. For N builds, there'll be N transactions touching 2 groups (each). Another option is to do BuildSet update first, before creating Build entity, and just ignore non-existing Builds when searching BuildSet's. (I assume here we know Build ID before creating Build entity, so we can put it into BuildSet entity first). I don't know, it's just a suggestion. It feels simpler to me (an addition to the datamodel, not a restructuring), but I can't say it is significantly better than your proposal.
,
Nov 2 2016
> For N builds, there'll be N transactions touching 2 groups (each). One of these two groups is the same group for all transactions. Will Datastore serialize all transactions?
,
Nov 2 2016
Yes, there'll be bad contention there. Thus my second proposal I outlined above. Basically: 1. Generate build IDs for all builds. 2. Create BuildSet entities referencing these builds IDs (non transactionally). If the request dies after this point, we can have some garbage build IDs hanging in the datastore. 3. Create Build entities (non transactionally). When searching by build set: 1. Query BuildSet entity group using strong consistency to get a bunch of Build IDs. 2. Fetch the corresponding Builds. If some are missing, they are probably garbage generated in step (2) above. 3. Carry on with filtering. The BuildSet entity group and strongly consistent ancestor query can actually be replaced with a single entity BuildSet that has a repeated field build_ids in it. Then step the update process becomes "Transactionally add new build IDs to BuildSet entity", and the query process becomes "Do datastore Get to fetch BuildSet". This is handrolled index essentially, it it has roughly same performance and consistency guarantees as the usual datastore one. The advantage is that we can also put more BuildSet associated data into this single entity. Also single ndb.Get is probably slightly faster (and memcachable) than strongly consistent DB query. *shrug*
,
Nov 3 2016
yeah, i think this design is simpler and still work. build id preallocation is not strictly safe in buildbucket because it generates builds ids on its own [1], but it is as unsafe as now: buildbucket generates an id and puts a build without checking if build existed before. However, with this custom index we can overwrite a build with an id and not update the index... [1]: https://cs.chromium.org/chromium/infra/appengine/cr-buildbucket/model.py?q=new_build_id&sq=package:chromium&l=176
,
Nov 3 2016
BuildSet search stuff can also ignore Builds that have different build_set field in them.
,
Nov 3 2016
yeah ok btw i realized designs in #2 and #3 won't work because we assume build keys are monotonically decreasing. I am not sure they would be if start putting parent key in them
,
Nov 3 2016
Great design btw. I think I will implement it tomorrow
,
Nov 3 2016
Actually nothing prevents us from creating multiple BuildSet entity for each value of buildset tag, so we don't have to sacrifice multiple buildsets feature. FTR In order to - avoid having BuildSet entities with incomplete build_ids, e.g. if a build was added to an old buildset - have ((no BuildSet with key X) => (no builds with buildset =X)) to avoid falling back to the old-style query I plan to implement a MapReduce job that creates BuildSet entities for old builds. This will take some time to implement and run.
,
Nov 3 2016
SGTM. Do I get it right that actual api of buildbucket stays the same?
,
Nov 3 2016
Yes
,
Nov 3 2016
after discussing with dnj@ offline:
we can avoid the MR job (which is dangerous and i never did before) by doing the following:
1. when a build is being added and a BuildSet entity does not exist, run a query to find all builds with that buildset and include them in the BuildSet.build_ids (this increases latency). Also set to_be_deleted=True BuildSet entity property. This will start creating BuildSet entities that may not include builds that were created in last 24 hours (due to index update latency)
2. deploy this code on Friday evening
3. on Monday deploy a new version of Buildbucket that does not perform that query in (1) and does not set to_be_deleted property.
4. Delete all BuildSet entities that have to_be_deleted property.
5. Update search to check if BuildSet entity exists. If yes, fetch the builds by ids and filter them on gae frontend. Otherwise, fall back to the datastore query.
This is based on our claim that after doing (4) all BuildSet entities will be complete.
To make it possible to extend this consistent search for other tags in future for:
- Call BuildSet entity TagIndex
- the key of TagIndex is sha512(tag), where tag includes the key ("buildset") // this has to be done anyway
- include the full tag inside the TagIndex
- all properties in TagIndex are not indexed
- if we want to introduce new tags with consistent search in future, they will have to use some new syntax that is currently considered invalid, e.g. "*mytag:value" where *mytag is the new tag that is marked consistent. A consistent search will have to search by "*mtag:value", which is different from search by "mytag:value".
,
Nov 12 2016
Nodir only has a few weeks left before leave - how urgent is this for us? Andrii, I know you're pretty busy with gerrit but you're potentially a good person to implement this. What do you think?
,
Nov 14 2016
Not urgent. While CQ still detects inconsistent search results every week, now that this is known it's not costing us in extra debugging time. I'm concerned about I don't think I have sufficient knowledge of GAE datastore to code things Nodir and Vadim outlined above in effectively short time.
,
Nov 14 2016
Ok, let's revisit this next quarter unless someone gets spare cycles sooner.
,
Feb 1 2017
,
Feb 7 2017
The following revision refers to this bug: https://chromium.googlesource.com/infra/infra/+/beaa9ea8252e3b0831e5c61b6fcc8d51626006d3 commit beaa9ea8252e3b0831e5c61b6fcc8d51626006d3 Author: Nodir Turakulov <nodir@chromium.org> Date: Tue Feb 07 01:24:32 2017 buildbucket refactoring: BuildRequest service.add and service.add_async accept a lot of parameters. Declare them all in a class instead. This is preparation for tag indexing. BUG= 661689 Change-Id: I8510a5b81b8508da7dc3d5da10b6de9ee05affc7 Reviewed-on: https://chromium-review.googlesource.com/437344 Commit-Queue: Nodir Turakulov <nodir@chromium.org> Reviewed-by: Vadim Shtayura <vadimsh@chromium.org> [modify] https://crrev.com/beaa9ea8252e3b0831e5c61b6fcc8d51626006d3/appengine/cr-buildbucket/service.py [modify] https://crrev.com/beaa9ea8252e3b0831e5c61b6fcc8d51626006d3/appengine/cr-buildbucket/test/service_test.py [modify] https://crrev.com/beaa9ea8252e3b0831e5c61b6fcc8d51626006d3/appengine/cr-buildbucket/test/api_test.py [modify] https://crrev.com/beaa9ea8252e3b0831e5c61b6fcc8d51626006d3/appengine/cr-buildbucket/api.py
,
Feb 7 2017
The following revision refers to this bug: https://chromium.googlesource.com/infra/infra/+/9ca11d0ddbe5e3f5d576d54e1c358b90fb17d233 commit 9ca11d0ddbe5e3f5d576d54e1c358b90fb17d233 Author: Nodir Turakulov <nodir@chromium.org> Date: Tue Feb 07 04:24:44 2017 buildbucket: mock explicitly api_test.py mocks all members of service.py. This is coarse. Instead mock functions individually. R=vadimsh@chromium.org BUG= 661689 Change-Id: I95ffe5004dc3968204a53bd6ebe379f1cf80a5b3 Reviewed-on: https://chromium-review.googlesource.com/437754 Commit-Queue: Nodir Turakulov <nodir@chromium.org> Reviewed-by: Vadim Shtayura <vadimsh@chromium.org> [modify] https://crrev.com/9ca11d0ddbe5e3f5d576d54e1c358b90fb17d233/appengine/cr-buildbucket/test/api_test.py
,
Feb 16 2017
The following revision refers to this bug: https://chromium.googlesource.com/infra/infra/+/793cd1ab709d3534a410bf6400534e67ca40607d commit 793cd1ab709d3534a410bf6400534e67ca40607d Author: Nodir Turakulov <nodir@chromium.org> Date: Thu Feb 16 00:29:19 2017 buildbucket: index by buildset We need consistent search of builds by buildset. Implement indexing by tags. Add TagIndex entity which will act as a custom index of builds by tags. So far only buildset tag is indexed. This is phase 1 in implementing consistent search and it indexes only new builds. Next phase will index old builds. R=vadimsh@chromium.org, tandrii@chromium.org BUG= 661689 Change-Id: I3889d05a4f18b0c9e513e4045eddcd040044ab5d Reviewed-on: https://chromium-review.googlesource.com/436304 Commit-Queue: Nodir Turakulov <nodir@chromium.org> Reviewed-by: Vadim Shtayura <vadimsh@chromium.org> [modify] https://crrev.com/793cd1ab709d3534a410bf6400534e67ca40607d/appengine/cr-buildbucket/service.py [modify] https://crrev.com/793cd1ab709d3534a410bf6400534e67ca40607d/appengine/cr-buildbucket/test/service_test.py [modify] https://crrev.com/793cd1ab709d3534a410bf6400534e67ca40607d/appengine/cr-buildbucket/model.py [modify] https://crrev.com/793cd1ab709d3534a410bf6400534e67ca40607d/appengine/cr-buildbucket/test/api_test.py [modify] https://crrev.com/793cd1ab709d3534a410bf6400534e67ca40607d/appengine/cr-buildbucket/api.py
,
Mar 17 2017
The following revision refers to this bug: https://chromium.googlesource.com/infra/infra/+/52113a23396571e573acbe3590e7c964e787ccf3 commit 52113a23396571e573acbe3590e7c964e787ccf3 Author: Nodir Turakulov <nodir@chromium.org> Date: Fri Mar 17 06:58:41 2017 buildbucket: implement tag index backfilling Implement tag index backfilling using a push task that: 1. splits entire build space into N segments 2. for each segment, processes a chunk of builds in the segment for up to 50s or until we reach a limit of index entries. Then schedule a task to update indexes and/or a task to continue processing the segment. We can observe the progress by looking at percent:XX suffix in URLs of pending tasks in the backfill-tag-index task queue. We avoid exhausting quota by being able to pause the backfill-tag-index task queue and by setting a cap on maximum concurrent requests. R=vadimsh@chromium.org BUG= 661689 Change-Id: I187a7c20edee9211b203e66778c85ef3af88937d Reviewed-on: https://chromium-review.googlesource.com/450869 Commit-Queue: Nodir Turakulov <nodir@chromium.org> Reviewed-by: Vadim Shtayura <vadimsh@chromium.org> [modify] https://crrev.com/52113a23396571e573acbe3590e7c964e787ccf3/appengine/cr-buildbucket/handlers.py [modify] https://crrev.com/52113a23396571e573acbe3590e7c964e787ccf3/appengine/cr-buildbucket/test/handlers_test.py [modify] https://crrev.com/52113a23396571e573acbe3590e7c964e787ccf3/appengine/cr-buildbucket/test/service_test.py [modify] https://crrev.com/52113a23396571e573acbe3590e7c964e787ccf3/appengine/cr-buildbucket/test/api_test.py [modify] https://crrev.com/52113a23396571e573acbe3590e7c964e787ccf3/appengine/cr-buildbucket/service.py [modify] https://crrev.com/52113a23396571e573acbe3590e7c964e787ccf3/appengine/cr-buildbucket/api.py [modify] https://crrev.com/52113a23396571e573acbe3590e7c964e787ccf3/appengine/cr-buildbucket/queue.yaml [modify] https://crrev.com/52113a23396571e573acbe3590e7c964e787ccf3/appengine/cr-buildbucket/model.py
,
Mar 17 2017
TagIndex entities on cr-buildbucket.appspot.com are backfilled
,
Mar 30 2017
https://chromium-review.googlesource.com/c/453081/ is ready to land, but we've discovered a problem in gae_ts_mon: graphs display 0s in distribution metrics as >0
,
Mar 31 2017
The following revision refers to this bug: https://chromium.googlesource.com/infra/infra/+/7b66aaa13211235b1cd2876666a2db4f78a6dd95 commit 7b66aaa13211235b1cd2876666a2db4f78a6dd95 Author: Nodir Turakulov <nodir@google.com> Date: Fri Mar 31 05:05:37 2017 buildbucket: implement search by tag index Initially do both datastore query and tag index search and compare results. This adds ~20ms in an overall 200ms request Report an error if results are different. R=vadimsh@chromium.org BUG= 661689 Change-Id: I9b4600cd46f7c7f323ab164b222e98fc893d2e64 Reviewed-on: https://chromium-review.googlesource.com/453081 Commit-Queue: Nodir Turakulov <nodir@chromium.org> Reviewed-by: Vadim Shtayura <vadimsh@chromium.org> [modify] https://crrev.com/7b66aaa13211235b1cd2876666a2db4f78a6dd95/appengine/cr-buildbucket/service.py [modify] https://crrev.com/7b66aaa13211235b1cd2876666a2db4f78a6dd95/appengine/cr-buildbucket/metrics.py [modify] https://crrev.com/7b66aaa13211235b1cd2876666a2db4f78a6dd95/appengine/cr-buildbucket/errors.py [modify] https://crrev.com/7b66aaa13211235b1cd2876666a2db4f78a6dd95/appengine/cr-buildbucket/test/service_test.py [modify] https://crrev.com/7b66aaa13211235b1cd2876666a2db4f78a6dd95/appengine/cr-buildbucket/api.py
,
Mar 31 2017
The following revision refers to this bug: https://chromium.googlesource.com/infra/infra/+/22516057353e1aa58e5a56b7e86b4b8ae94a4136 commit 22516057353e1aa58e5a56b7e86b4b8ae94a4136 Author: Nodir Turakulov <nodir@google.com> Date: Fri Mar 31 17:49:25 2017 buildbucket: print only ids of inconsistent results When we discover inconsistent results and print full builds, we often hit the log entry length limit, so we don't see tag search results. Print ids only TBR=vadimsh@chromium.org BUG= 661689 Change-Id: I1eec3e753542866273762494501c4f6632fad8f5 Reviewed-on: https://chromium-review.googlesource.com/465087 Reviewed-by: Nodir Turakulov <nodir@chromium.org> Commit-Queue: Nodir Turakulov <nodir@chromium.org> [modify] https://crrev.com/22516057353e1aa58e5a56b7e86b4b8ae94a4136/appengine/cr-buildbucket/service.py
,
Mar 31 2017
The following revision refers to this bug: https://chromium.googlesource.com/infra/infra/+/fca7a1e2f69e3f6409690568996585ac5803c196 commit fca7a1e2f69e3f6409690568996585ac5803c196 Author: Nodir Turakulov <nodir@google.com> Date: Fri Mar 31 19:50:16 2017 buildbucket: tolerate dup tag index entries Currently add_many_async may add dup entries to a tag index. This is a bug. Fix the bug and tolerate dups. R=vadimsh@chromium.org BUG= 661689 Change-Id: I508e16d267c21a6b6a88aca7742850e9c1fe66da Reviewed-on: https://chromium-review.googlesource.com/465107 Commit-Queue: Nodir Turakulov <nodir@chromium.org> Reviewed-by: Vadim Shtayura <vadimsh@chromium.org> [modify] https://crrev.com/fca7a1e2f69e3f6409690568996585ac5803c196/appengine/cr-buildbucket/service.py [modify] https://crrev.com/fca7a1e2f69e3f6409690568996585ac5803c196/appengine/cr-buildbucket/test/service_test.py
,
Mar 31 2017
The following revision refers to this bug: https://chromium.googlesource.com/infra/infra/+/711c566143636946517d9992556f044eb1de443b commit 711c566143636946517d9992556f044eb1de443b Author: Nodir Turakulov <nodir@google.com> Date: Fri Mar 31 20:09:22 2017 buildbucket: more logging in tag index search Log a message when tag index is absent or empty. Log build id of last entry R=vadimsh@chromium.org BUG= 661689 Change-Id: I7489d3fea1ab51f35e18fbc145492578b6f5e33b Reviewed-on: https://chromium-review.googlesource.com/465349 Reviewed-by: Vadim Shtayura <vadimsh@chromium.org> Commit-Queue: Nodir Turakulov <nodir@chromium.org> [modify] https://crrev.com/711c566143636946517d9992556f044eb1de443b/appengine/cr-buildbucket/service.py
,
Mar 31 2017
The following revision refers to this bug: https://chromium.googlesource.com/infra/infra/+/711c566143636946517d9992556f044eb1de443b commit 711c566143636946517d9992556f044eb1de443b Author: Nodir Turakulov <nodir@google.com> Date: Fri Mar 31 20:09:22 2017 buildbucket: more logging in tag index search Log a message when tag index is absent or empty. Log build id of last entry R=vadimsh@chromium.org BUG= 661689 Change-Id: I7489d3fea1ab51f35e18fbc145492578b6f5e33b Reviewed-on: https://chromium-review.googlesource.com/465349 Reviewed-by: Vadim Shtayura <vadimsh@chromium.org> Commit-Queue: Nodir Turakulov <nodir@chromium.org> [modify] https://crrev.com/711c566143636946517d9992556f044eb1de443b/appengine/cr-buildbucket/service.py
,
Mar 31 2017
The following revision refers to this bug: https://chromium.googlesource.com/infra/infra/+/e89410c1eb38bcc10c4947328dab362feb6d1ac5 commit e89410c1eb38bcc10c4947328dab362feb6d1ac5 Author: Nodir Turakulov <nodir@google.com> Date: Fri Mar 31 20:25:22 2017 buildbucket: do not verify tag search results Currently tag search results are verified by doing extra datastore query. This CL removes the verification because we trust tag index search implementation. The tag index search implementation CL removed search-by-buildset optimiztions from datastore query search because it is not needed anymore. This made query search much slower for the vast majority of queries in production, so it is not practical to run unoptimized query search in prod. R=vadimsh@chromium.org BUG= 661689 Change-Id: I6ad565410dd361d742e00ee6d4239486e3a68a6c Reviewed-on: https://chromium-review.googlesource.com/464388 Commit-Queue: Nodir Turakulov <nodir@chromium.org> Reviewed-by: Vadim Shtayura <vadimsh@chromium.org> [modify] https://crrev.com/e89410c1eb38bcc10c4947328dab362feb6d1ac5/appengine/cr-buildbucket/service.py
,
Mar 31 2017
Deployed.
,
Mar 31 2017
,
Mar 31 2017
The following revision refers to this bug: https://chromium.googlesource.com/infra/infra/+/a2220f74dd4be42b48150b3b315648129a67d686 commit a2220f74dd4be42b48150b3b315648129a67d686 Author: Nodir Turakulov <nodir@google.com> Date: Fri Mar 31 22:38:37 2017 buildbucket: remove dual search We have enough tests and prod data to conclude that tag index search is correct. Remove tag search result verification from the prod code. R=vadimsh@chromium.org BUG= 661689 Change-Id: I9090bb310011c3f82e9d190287795e7c9af95878 Reviewed-on: https://chromium-review.googlesource.com/465510 Reviewed-by: Vadim Shtayura <vadimsh@chromium.org> Commit-Queue: Nodir Turakulov <nodir@chromium.org> [modify] https://crrev.com/a2220f74dd4be42b48150b3b315648129a67d686/appengine/cr-buildbucket/service.py [modify] https://crrev.com/a2220f74dd4be42b48150b3b315648129a67d686/appengine/cr-buildbucket/test/service_test.py |
|||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||
Comment 1 by no...@chromium.org
, Nov 2 2016