Just talk proto to BigQuery, no tabledefs |
||||||||
Issue description
Should be able to define a proto as you normally would, but add some
custom proto options to describe how the events should be logged to
big query.
Example .proto file:
syntax = "proto3";
package som;
// Start stuff that should be in tabledef.proto
import "google/protobuf/descriptor.proto";
extend google.protobuf.MessageOptions {
string bq_dataset_id = 50001;
string bq_table_id = 50002;
string bq_table_name = 50003;
string bq_table_description = 50004;
bool bq_parition_table = 50005;
}
extend google.protobuf.FieldOptions {
string bq_field_description = 50006;
tabledef.Type bq_field_type = 50007;
}
// End stuff that should be in tabledef.proto
message SOMAnnotationEvent {
option(bq_dataset_id) = "events";
option(bq_table_id) = "annotations";
option(bq_table_name) = "User Annotations";
option(bq_table_description) = "Annotations added to Alerts by Users";
int64 timestamp = 1 [
(bq_field_description) = "Timestamp when the annotation was modified.",
(bq_field_type) = TIMESTAMP];
// Other fields that should be the "events/annotations" bq table, etc
}
And modify bqshcemupdater to use the extra information in the generated go source to determine how to set dataset, table name, column types etc in BQ.
,
Oct 16 2017
Remove bq field type option. google.protobuf.Timestamp should be used for timestamps
,
Oct 16 2017
,
Oct 17 2017
Nice, that simplifies things even more. (Speaking of pRPC: say we wanted to log RPC requests verbatim to bq. Could we just import the proto package definition used for the RPC bindings into our table proto message?) and #TIL about google.protobuf.Timestamp!
,
Oct 17 2017
yeah, totally. pRPC's message protos are no different from any other protos there is a lot of interesting protos https://github.com/google/protobuf/tree/master/src/google/protobuf https://github.com/google/protobuf/blob/master/src/google/protobuf/descriptor.proto#L840 is where comments live. cproto generates pb.discovery.go with FileDescriptorSet() which contains comments
,
Oct 17 2017
,
Oct 18 2017
After some discussion, we've concluded that this is probably all we need for protos-as-tabledefs:
// table_def.proto now just contains:
package tabledef;
extend google.protobuf.MessageOptions {
string dataset_id = 50001;
string table_id = 50002;
string table_name = 50003;
string table_description = 50004;
}
// EOF
// In my_logging_proto.proto:
import "infra/libs/bqschema/tabledef/table_def.proto";
// Maybe this comment can be used as tabledef.table_description if it
// isn't specified explicitly below.
message SOMAnnotationEvent {
option(tabledef.dataset_id) = "events";
option(tabledef.table_id) = "annotations";
option(tabledef.table_name) = "User Annotations";
option(tabledef.table_description) = "Annotations added to Alerts by Users";
// Timestamp when the annotation was modified.
// This comment automatically becomes the BQ field description.
google.protobuf.Timestamp = 1;
// Other fields that should be the "events/annotations" bq table, etc
}
bqshcemaupdater should be modified to read proto descriptors (from a temp file) and execute the BQ schema changes targeted at the datasets and tables listed in the proto options.
Logging clients should be modified to check for proto.GetExtension(<loggingEventProto>, tabledef.E_DatasetId) etc to discover the destination dataset, table ID etc.
,
Oct 18 2017
Chopped down even further, we don't really need proto Options/Extensions either! The following plan obviates tabledef concept altogether: - bqschemaupdater gets explicit flags for proto message, project, dataset and table id. - change eventupload to accept project id, dataset and table id instead of tabledef, and modify it to accept proto.Message events. - update eventupload usages
,
Oct 18 2017
and remove bqexport code altogether.
,
Oct 18 2017
Can we enforce documenting those flags in the protos? I think we want that information (dataset id, etc.) in version control.
,
Oct 19 2017
there is a little problem with dataset id and table id in .pb.txt or .proto files. In the past the chrome-infra-events project was the only one that contained events, so (dataset_id, table_id) pair uniquely identified a table. Now that we've lifted the chrome-infra-events requirement, the pair no longer identifies a table. E.g. SOM tables would not be identifiable because the pair is missing the project id. Also in the past the fact that both table schema (fields) and table location (dataset id, table id) were stored in the same place (.pb.txt) enforced 1:1 relationship between schema and location. I think this requirement comes from event_mon where it had to be true. With BQ, we no longer have to require 1:1 relationship between schema and location, and SOM specifically wants one schema and two locations (staging and prod). This makes me think we should separate schema and location, not store them in the same place. We could have .pb.txt files only for location, but i am not sure it is worth it. We'd have to keep bqexport around whose sole purpose would be to generate a struct variable that have the dataset id and table id. Note that bqexport would not longer generate event structs b/c protobuf compiler would do that. The cost of not having dataset id and table id in .pb.txt or .proto files is that we would write them twice: 1) in a script/Makefile that passes them to bqschemaupdater as flags 2) in code that sends events. I think this problem is not severe enough to justify .pb.txt and bqexport. Note that tableid/projectid can still be under version control if the user wants that, e.g. SOM's Makefile would have bqschemaupdater targets and it would be under version control. Also note that some things, e.g. table friendly name and whether table is partitioned or not, is not used by code that sends events. Only bqschemaupdater needs them, so there is no value in storing them in a place common to bqschemaupdater and code.
,
Oct 19 2017
I meant that we could have a comment indicating the location(s) of the schema's usage. They don't have to be parsed by bqschemaupdater or used for anything other than documentation.
,
Oct 19 2017
that is a good idea
,
Oct 19 2017
The following revision refers to this bug: https://chromium.googlesource.com/infra/luci/luci-go.git/+/c284df44f82c84dcf21f20d5d965996e11dcbeda commit c284df44f82c84dcf21f20d5d965996e11dcbeda Author: Nodir Turakulov <nodir@google.com> Date: Thu Oct 19 18:36:30 2017 [descutil] add IndexSourceCodeInfo Add a function that greatly simplifies parsing of comments in a proto descriptor file. Use it in rpc subcommand. Bug: 775262 Change-Id: Ia16e2970b1f5a29e2e329fb85d9e3cb53ea5e849 Reviewed-on: https://chromium-review.googlesource.com/726954 Reviewed-by: Sean McCullough <seanmccullough@chromium.org> Commit-Queue: Nodir Turakulov <nodir@chromium.org> [modify] https://crrev.com/c284df44f82c84dcf21f20d5d965996e11dcbeda/common/proto/google/descutil/util.go [modify] https://crrev.com/c284df44f82c84dcf21f20d5d965996e11dcbeda/grpc/cmd/rpc/printer.go [modify] https://crrev.com/c284df44f82c84dcf21f20d5d965996e11dcbeda/grpc/cmd/rpc/printer_test.go [modify] https://crrev.com/c284df44f82c84dcf21f20d5d965996e11dcbeda/grpc/cmd/rpc/show.go
,
Oct 20 2017
,
Oct 20 2017
The following revision refers to this bug: https://chromium.googlesource.com/infra/infra/+/4a6e6859aed9d1dcef5adbc650b816c87da6157f commit 4a6e6859aed9d1dcef5adbc650b816c87da6157f Author: Nodir Turakulov <nodir@google.com> Date: Fri Oct 20 23:46:29 2017 [bqschemaupdater] read .proto instead of .pb.txt Update bqschemaupdater to read .proto files intead of .pb.txt files. Remove the TableDef dependency. This is a breaking change, but does not cause a breakage immediately. It forces people to switch from .pb.txt to .proto as soon as they need to update schema. See doc.go for how bq schema is derived from a proto message. Bug: 775262 Change-Id: I67a739a740526cb6667b1b1652a118d8e48a60aa Reviewed-on: https://chromium-review.googlesource.com/729053 Commit-Queue: Nodir Turakulov <nodir@chromium.org> Reviewed-by: Katie Thomas <katthomas@google.com> [add] https://crrev.com/4a6e6859aed9d1dcef5adbc650b816c87da6157f/go/src/infra/tools/bqschemaupdater/testdata/event.proto [add] https://crrev.com/4a6e6859aed9d1dcef5adbc650b816c87da6157f/go/src/infra/tools/bqschemaupdater/schema.go [modify] https://crrev.com/4a6e6859aed9d1dcef5adbc650b816c87da6157f/go/src/infra/tools/bqschemaupdater/table_store.go [add] https://crrev.com/4a6e6859aed9d1dcef5adbc650b816c87da6157f/go/src/infra/tools/bqschemaupdater/testdata/generate.go [delete] https://crrev.com/7d3a7ef3c4323ea55d13e369f9511d988a988e10/go/src/infra/tools/bqschemaupdater/bqschemaupdater.go [delete] https://crrev.com/7d3a7ef3c4323ea55d13e369f9511d988a988e10/go/src/infra/tools/bqschemaupdater/bqschemaupdater_test.go [add] https://crrev.com/4a6e6859aed9d1dcef5adbc650b816c87da6157f/go/src/infra/tools/bqschemaupdater/doc.go [add] https://crrev.com/4a6e6859aed9d1dcef5adbc650b816c87da6157f/go/src/infra/tools/bqschemaupdater/main.go [add] https://crrev.com/4a6e6859aed9d1dcef5adbc650b816c87da6157f/go/src/infra/tools/bqschemaupdater/main_test.go [add] https://crrev.com/4a6e6859aed9d1dcef5adbc650b816c87da6157f/go/src/infra/tools/bqschemaupdater/testdata/event.desc [add] https://crrev.com/4a6e6859aed9d1dcef5adbc650b816c87da6157f/go/src/infra/tools/bqschemaupdater/testdata/event.pb.go
,
Oct 23 2017
The following revision refers to this bug: https://chromium.googlesource.com/infra/infra/+/31a9a37096b63341fd5d5adab74f2d19965e8c1a commit 31a9a37096b63341fd5d5adab74f2d19965e8c1a Author: Katie Thomas <katthomas@google.com> Date: Mon Oct 23 15:47:05 2017 [eventupload] remove TableDef references Bug:775262 Change-Id: I2202d666022f5c4c16f57e987616d4efae313a82 Reviewed-on: https://chromium-review.googlesource.com/729153 Reviewed-by: Nodir Turakulov <nodir@chromium.org> Reviewed-by: Sean McCullough <seanmccullough@chromium.org> Commit-Queue: Katie Thomas <katthomas@google.com> [modify] https://crrev.com/31a9a37096b63341fd5d5adab74f2d19965e8c1a/go/src/infra/appengine/sheriff-o-matic/som/model/model.go [modify] https://crrev.com/31a9a37096b63341fd5d5adab74f2d19965e8c1a/go/src/infra/libs/eventupload/eventupload.go [modify] https://crrev.com/31a9a37096b63341fd5d5adab74f2d19965e8c1a/go/src/infra/appengine/sheriff-o-matic/som/handler/analyze.go [modify] https://crrev.com/31a9a37096b63341fd5d5adab74f2d19965e8c1a/go/src/infra/appengine/test-results/frontend/upload.go [modify] https://crrev.com/31a9a37096b63341fd5d5adab74f2d19965e8c1a/go/src/infra/tools/kitchen/monitoring_test.go [modify] https://crrev.com/31a9a37096b63341fd5d5adab74f2d19965e8c1a/go/src/infra/libs/eventupload/eventupload_test.go [modify] https://crrev.com/31a9a37096b63341fd5d5adab74f2d19965e8c1a/go/src/infra/tools/kitchen/monitoring.go
,
Nov 13 2017
,
Nov 13 2017
The following revision refers to this bug: https://chromium.googlesource.com/infra/infra/+/fafa4b02eff0b70cba678b8185f7373855b62e94 commit fafa4b02eff0b70cba678b8185f7373855b62e94 Author: Katie Thomas <katthomas@google.com> Date: Mon Nov 13 20:33:15 2017 [eventupload] modify to expect protos Specifically, modify prepareSrc to expect a struct which implements proto.Message, or a slice of such values. This is simpler, and all clients currently do it anyway. Bug:775262 Change-Id: Id5824bafb8c450944ab12141c0c3e9b032f046db Reviewed-on: https://chromium-review.googlesource.com/748975 Reviewed-by: Nodir Turakulov <nodir@chromium.org> Reviewed-by: Sean McCullough <seanmccullough@chromium.org> Commit-Queue: Katie Thomas <katthomas@google.com> [add] https://crrev.com/fafa4b02eff0b70cba678b8185f7373855b62e94/go/src/infra/libs/eventupload/testmessage.pb.go [add] https://crrev.com/fafa4b02eff0b70cba678b8185f7373855b62e94/go/src/infra/libs/eventupload/testmessage.proto [modify] https://crrev.com/fafa4b02eff0b70cba678b8185f7373855b62e94/go/src/infra/libs/eventupload/eventupload_test.go [modify] https://crrev.com/fafa4b02eff0b70cba678b8185f7373855b62e94/go/src/infra/libs/eventupload/eventupload.go
,
Dec 2 2017
|
||||||||
►
Sign in to add a comment |
||||||||
Comment 1 by no...@chromium.org
, Oct 16 2017