New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 775262 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Dec 2017
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug

Blocked on:
issue 784512



Sign in to add a comment

Just talk proto to BigQuery, no tabledefs

Project Member Reported by seanmccullough@chromium.org, Oct 16 2017

Issue description

Should be able to define a proto as you normally would, but add some
custom proto options to describe how the events should be logged to
big query.

Example .proto file:

syntax = "proto3";

package som;

// Start stuff that should be in tabledef.proto
import "google/protobuf/descriptor.proto";

extend google.protobuf.MessageOptions {
  string bq_dataset_id = 50001;
  string bq_table_id = 50002;
  string bq_table_name = 50003;
  string bq_table_description = 50004;
  bool bq_parition_table = 50005;
}

extend google.protobuf.FieldOptions {
  string bq_field_description = 50006;
  tabledef.Type bq_field_type = 50007;
}

// End stuff that should be in tabledef.proto

message SOMAnnotationEvent {
  option(bq_dataset_id) = "events";
  option(bq_table_id) = "annotations";
  option(bq_table_name) = "User Annotations";
  option(bq_table_description) = "Annotations added to Alerts by Users";

  int64 timestamp = 1 [
      (bq_field_description) = "Timestamp when the annotation was modified.",
      (bq_field_type) = TIMESTAMP];

  // Other fields that should be the "events/annotations" bq table, etc
}

And modify bqshcemupdater to use the extra information in the generated go source to determine how to set dataset, table name, column types etc  in BQ.

 

Comment 1 by no...@chromium.org, Oct 16 2017

Remove descriptions. We can load descriptions from leading comments (that should be written anyway). PRPC does that successfully

Remove partition. False is a bad default. We always want to partition 

Comment 2 by no...@chromium.org, Oct 16 2017

Remove bq field type option. google.protobuf.Timestamp should be used for timestamps 

Comment 3 by no...@chromium.org, Oct 16 2017

Cc: katthomas@chromium.org
Nice, that simplifies things even more.

(Speaking of pRPC: say we wanted to log RPC requests verbatim to bq. Could we just import the proto package definition used for the RPC bindings into our table proto message?)

and #TIL about google.protobuf.Timestamp!

Comment 5 by no...@chromium.org, Oct 17 2017

yeah, totally. pRPC's message protos are no different from any other protos

there is a lot of interesting protos https://github.com/google/protobuf/tree/master/src/google/protobuf

https://github.com/google/protobuf/blob/master/src/google/protobuf/descriptor.proto#L840
is where comments live.
cproto generates pb.discovery.go with FileDescriptorSet() which contains comments
Status: Started (was: Available)
After some discussion, we've concluded that this is probably all we need for protos-as-tabledefs:

// table_def.proto now just contains:
package tabledef;

extend google.protobuf.MessageOptions {
  string dataset_id = 50001;
  string table_id = 50002;
  string table_name = 50003;
  string table_description = 50004;
}
// EOF

// In my_logging_proto.proto:

import "infra/libs/bqschema/tabledef/table_def.proto";

// Maybe this comment can be used as tabledef.table_description if it
// isn't specified explicitly below.
message SOMAnnotationEvent {
  option(tabledef.dataset_id) = "events";
  option(tabledef.table_id) = "annotations";
  option(tabledef.table_name) = "User Annotations";
  option(tabledef.table_description) = "Annotations added to Alerts by Users";

  // Timestamp when the annotation was modified.
  // This comment automatically becomes the BQ field description.
  google.protobuf.Timestamp = 1;
 
  // Other fields that should be the "events/annotations" bq table, etc
}

bqshcemaupdater should be modified to read proto descriptors (from a temp file) and execute the BQ schema changes targeted at the datasets and tables listed in the proto options.

Logging clients should be modified to check for  proto.GetExtension(<loggingEventProto>, tabledef.E_DatasetId) etc to discover the destination dataset, table ID etc.

Labels: -Pri-3 Pri-2
Summary: Just talk proto to BigQuery, no tabledefs (was: generate tabledefs from .proto files + proto Options)
Chopped down even further, we don't really need proto Options/Extensions either!

The following plan obviates tabledef concept altogether:

- bqschemaupdater gets explicit flags for proto message, project, dataset and table id.
- change eventupload to accept project id, dataset and table id instead of tabledef, and modify it to accept proto.Message events.
- update eventupload usages


and remove bqexport code altogether. 
Can we enforce documenting those flags in the protos? I think we want that information (dataset id, etc.) in version control.

Comment 11 by no...@chromium.org, Oct 19 2017

Owner: no...@chromium.org
there is a little problem with dataset id and table id in .pb.txt or .proto files. In the past the chrome-infra-events project was the only one that contained events, so (dataset_id, table_id) pair uniquely identified a table. Now that we've lifted the chrome-infra-events requirement, the pair no longer identifies a table. E.g. SOM tables would not be identifiable because the pair is missing the project id.

Also in the past the fact that both table schema (fields) and table location (dataset id, table id) were stored in the same place (.pb.txt) enforced 1:1 relationship between schema and location. I think this requirement comes from event_mon where it had to be true. With BQ, we no longer have to require 1:1 relationship between schema and location, and SOM specifically wants one schema and two locations (staging and prod). This makes me think we should separate schema and location, not store them in the same place.

We could have .pb.txt files only for location, but i am not sure it is worth it. We'd have to keep bqexport around whose sole purpose would be to generate a struct variable that have the dataset id and table id. Note that bqexport would not longer generate event structs b/c protobuf compiler would do that. The cost of not having dataset id and table id in .pb.txt or .proto files is that we would write them twice: 1) in a script/Makefile that passes them to bqschemaupdater as flags 2) in code that sends events. I think this problem is not severe enough to justify .pb.txt and bqexport. Note that tableid/projectid can still be under version control if the user wants that, e.g. SOM's Makefile would have bqschemaupdater targets and it would be under version control.

Also note that some things, e.g. table friendly name and whether table is partitioned or not, is not used by code that sends events. Only bqschemaupdater needs them, so there is no value in storing them in a place common to bqschemaupdater and code.
I meant that we could have a comment indicating the location(s) of the schema's usage. They don't have to be parsed by bqschemaupdater or used for anything other than documentation. 


Comment 13 by no...@chromium.org, Oct 19 2017

that is a good idea
Project Member

Comment 14 by bugdroid1@chromium.org, Oct 19 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/luci-go.git/+/c284df44f82c84dcf21f20d5d965996e11dcbeda

commit c284df44f82c84dcf21f20d5d965996e11dcbeda
Author: Nodir Turakulov <nodir@google.com>
Date: Thu Oct 19 18:36:30 2017

[descutil] add IndexSourceCodeInfo

Add a function that greatly simplifies parsing of comments in a proto descriptor
file. Use it in rpc subcommand.

Bug:  775262 
Change-Id: Ia16e2970b1f5a29e2e329fb85d9e3cb53ea5e849
Reviewed-on: https://chromium-review.googlesource.com/726954
Reviewed-by: Sean McCullough <seanmccullough@chromium.org>
Commit-Queue: Nodir Turakulov <nodir@chromium.org>

[modify] https://crrev.com/c284df44f82c84dcf21f20d5d965996e11dcbeda/common/proto/google/descutil/util.go
[modify] https://crrev.com/c284df44f82c84dcf21f20d5d965996e11dcbeda/grpc/cmd/rpc/printer.go
[modify] https://crrev.com/c284df44f82c84dcf21f20d5d965996e11dcbeda/grpc/cmd/rpc/printer_test.go
[modify] https://crrev.com/c284df44f82c84dcf21f20d5d965996e11dcbeda/grpc/cmd/rpc/show.go

Cc: yyanagisawa@chromium.org
Project Member

Comment 16 by bugdroid1@chromium.org, Oct 20 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/infra/+/4a6e6859aed9d1dcef5adbc650b816c87da6157f

commit 4a6e6859aed9d1dcef5adbc650b816c87da6157f
Author: Nodir Turakulov <nodir@google.com>
Date: Fri Oct 20 23:46:29 2017

[bqschemaupdater] read .proto instead of .pb.txt

Update bqschemaupdater to read .proto files intead of .pb.txt files.
Remove the TableDef dependency.

This is a breaking change, but does not cause a breakage immediately.
It forces people to switch from .pb.txt to .proto as soon as they need to
update schema.

See doc.go for how bq schema is derived from a proto message.

Bug:  775262 
Change-Id: I67a739a740526cb6667b1b1652a118d8e48a60aa
Reviewed-on: https://chromium-review.googlesource.com/729053
Commit-Queue: Nodir Turakulov <nodir@chromium.org>
Reviewed-by: Katie Thomas <katthomas@google.com>

[add] https://crrev.com/4a6e6859aed9d1dcef5adbc650b816c87da6157f/go/src/infra/tools/bqschemaupdater/testdata/event.proto
[add] https://crrev.com/4a6e6859aed9d1dcef5adbc650b816c87da6157f/go/src/infra/tools/bqschemaupdater/schema.go
[modify] https://crrev.com/4a6e6859aed9d1dcef5adbc650b816c87da6157f/go/src/infra/tools/bqschemaupdater/table_store.go
[add] https://crrev.com/4a6e6859aed9d1dcef5adbc650b816c87da6157f/go/src/infra/tools/bqschemaupdater/testdata/generate.go
[delete] https://crrev.com/7d3a7ef3c4323ea55d13e369f9511d988a988e10/go/src/infra/tools/bqschemaupdater/bqschemaupdater.go
[delete] https://crrev.com/7d3a7ef3c4323ea55d13e369f9511d988a988e10/go/src/infra/tools/bqschemaupdater/bqschemaupdater_test.go
[add] https://crrev.com/4a6e6859aed9d1dcef5adbc650b816c87da6157f/go/src/infra/tools/bqschemaupdater/doc.go
[add] https://crrev.com/4a6e6859aed9d1dcef5adbc650b816c87da6157f/go/src/infra/tools/bqschemaupdater/main.go
[add] https://crrev.com/4a6e6859aed9d1dcef5adbc650b816c87da6157f/go/src/infra/tools/bqschemaupdater/main_test.go
[add] https://crrev.com/4a6e6859aed9d1dcef5adbc650b816c87da6157f/go/src/infra/tools/bqschemaupdater/testdata/event.desc
[add] https://crrev.com/4a6e6859aed9d1dcef5adbc650b816c87da6157f/go/src/infra/tools/bqschemaupdater/testdata/event.pb.go

Blockedon: 784512
Project Member

Comment 19 by bugdroid1@chromium.org, Nov 13 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/infra/+/fafa4b02eff0b70cba678b8185f7373855b62e94

commit fafa4b02eff0b70cba678b8185f7373855b62e94
Author: Katie Thomas <katthomas@google.com>
Date: Mon Nov 13 20:33:15 2017

[eventupload] modify to expect protos

Specifically, modify prepareSrc to expect a struct which implements
proto.Message, or a slice of such values. This is simpler, and all
clients currently do it anyway.

Bug:775262
Change-Id: Id5824bafb8c450944ab12141c0c3e9b032f046db
Reviewed-on: https://chromium-review.googlesource.com/748975
Reviewed-by: Nodir Turakulov <nodir@chromium.org>
Reviewed-by: Sean McCullough <seanmccullough@chromium.org>
Commit-Queue: Katie Thomas <katthomas@google.com>

[add] https://crrev.com/fafa4b02eff0b70cba678b8185f7373855b62e94/go/src/infra/libs/eventupload/testmessage.pb.go
[add] https://crrev.com/fafa4b02eff0b70cba678b8185f7373855b62e94/go/src/infra/libs/eventupload/testmessage.proto
[modify] https://crrev.com/fafa4b02eff0b70cba678b8185f7373855b62e94/go/src/infra/libs/eventupload/eventupload_test.go
[modify] https://crrev.com/fafa4b02eff0b70cba678b8185f7373855b62e94/go/src/infra/libs/eventupload/eventupload.go

Status: Fixed (was: Started)

Sign in to add a comment