Tricium analyzers should be able to easily iterate through only non-binary files |
||||||
Issue descriptionIt's cool that Tricium is commenting on CLs now! But Tricium also adds trailing space comments on PNG files, which doesn't really make sense and can feel kind of spammy, especially since the way Gerrit presents image files means that we don't actually have the ability to mark these comments as not useful. Example of a CL: https://chromium-review.googlesource.com/c/infra/infra/+/1023155/7
,
Apr 25 2018
Thanks for filing the issue Tiffany -- good point, in general we don't want to comment on any binary files. A few thoughts, in no particular order: - Most analyzers won't want to read any non-text files - It's possible that some analyzers may want to read non-text files, but probably won't want to comment on them (maybe file-level comments might be OK?) - For analyzers in Go, I think we should have a package with common utility functions. IsBinaryFile could be one of them. It appears that content type sniffing can be done by https://golang.org/pkg/net/http/#DetectContentType. - Analyzers could be responsible for deciding whether to read files based on sniffing, perhaps? - Gerrit seems to know already if files are binary or not: https://gerrit-review.googlesource.com/Documentation/rest-api-changes.html#file-info Maybe we could make sure that this FileInfo is included as part of the FILES data type?
,
Apr 25 2018
Leading on from the last point: We're *already* fetching for all files for all CLs that Tricium looks at (https://cs.chromium.org/chromium/infra/go/src/infra/tricium/appengine/gerrit/gerrit.go?l=190) The FileInfo does correctly say whether the png files are binary for the xample CL in the bug description above: https://chromium-review.googlesource.com/changes/infra%2finfra~master~I0e51da348f58040c4f707988fb9d43d04c5e8e0b?o=CURRENT_REVISION&o=CURRENT_FILES Proposed changes: - Change the FILES data type (https://cs.chromium.org/chromium/infra/go/src/infra/tricium/api/v1/data.proto?l=60) to include metadata from Gerrit FileInfo including whether the files are binary. - Change most analyzers, like Spacey, to skip binary files when looping through files in the CL.
,
Apr 25 2018
Also note, a simple quick fix for now would be to make it so that tricium just ignores binary files: https://crrev.com/c/1028369
,
Apr 25 2018
The following revision refers to this bug: https://chromium.googlesource.com/infra/infra/+/4b62e9436513f337861a4f2356dba4ef265f7fa3 commit 4b62e9436513f337861a4f2356dba4ef265f7fa3 Author: Quinten Yearsley <qyearsley@chromium.org> Date: Wed Apr 25 21:50:58 2018 Don't include binary files in paths to analyze This is a simple and possibily temporary change which would make it so that Tricium doesn't look at binary files (e.g. images) in CLs. We may want analyzers to be able to look at images and other binary files, but if we are to include both binary and text files, we want to provide an easy way for analyzers to tell which files are text. Bug: 836548 Change-Id: I98b80f0ee1b3bf81dc241884791cebf89350e376 Reviewed-on: https://chromium-review.googlesource.com/1028369 Commit-Queue: Quinten Yearsley <qyearsley@chromium.org> Reviewed-by: Marc-Antoine Ruel <maruel@chromium.org> [modify] https://crrev.com/4b62e9436513f337861a4f2356dba4ef265f7fa3/go/src/infra/tricium/appengine/gerrit/poll.go [modify] https://crrev.com/4b62e9436513f337861a4f2356dba4ef265f7fa3/go/src/infra/tricium/appengine/gerrit/poll_test.go
,
Apr 26 2018
,
Apr 30 2018
Currently the GIT_FILE_DETAILS and FILES formats have a `paths` field that is a list of path strings. In order to pipe the details about whether a file is binary through to the analyzer, I think, it would be best to have a map of paths to file details; file details could potentially include: - whether the file is binary - file size - how the file is being changed (deleted, moved, modified, added) - which lines are added/modified? - how many lines are added/deleted/modified? https://cs.chromium.org/chromium/infra/go/src/infra/tricium/api/v1/data.proto?q=tricium+data&sq=package:chromium&dr=CSs&l=54 In order to change this, we could follow this list of steps: 1. Add a new field (maybe called files? file_info? file_details? file_metadata?) which is a map of paths to metadata. This change could also change the Analyze and Launch requests to pipe this information through from Gerrit through to GitFileDetails. This new field would look something like: map<string, FileMetadata> files = 3; 2. Change existing analyzers to iterate through this instead of the paths field 3. Finally, the paths field would be redundant and could be removed.
,
Apr 30 2018
I'd say
enum Type {
UNKNOWN = 0;
FILE = 1;
SYMLINK = 2;
}
enum Action {
UNKNOWN = 0;
ADDED = 1;
MODIFIED = 1;
DELETED = 2;
TYPE_CHANGED = 3; // e.g. symlink -> file
METAONLY = 4; // e.g. +x
}
message ChangeStat {
...
}
message File {
string path = 1;
Type type = 2;
Action action = 3;
ChangeStat stats = 4;
}
repeated File Files = 3;
In particular, not use a map.
,
Jun 4 2018
,
Jun 5 2018
,
Jun 6 2018
The following revision refers to this bug: https://chromium.googlesource.com/infra/infra/+/1742a752ffa122c9b214bc74073aaa63deb6e837 commit 1742a752ffa122c9b214bc74073aaa63deb6e837 Author: Diego Martinez <diegomtzg@google.com> Date: Wed Jun 06 18:27:56 2018 Use file metadata rather than paths Load a new file struct from gerrit instead of only the file path in order to keep track of more file metadata inside of analyzers Bug: 836548 Change-Id: I2d99e3e8debe0f67f6dbf62e001daeb3e44d9308 Reviewed-on: https://chromium-review.googlesource.com/1086089 Commit-Queue: Diego Martinez <diegomtzg@google.com> Reviewed-by: Quinten Yearsley <qyearsley@chromium.org> [modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/appengine/common/track/track.go [modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/api/admin/v1/pb.discovery.go [modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/functions/copyright/copyright.go [modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/api/admin/v1/driver.pb.go [modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/functions/hello/hello.go [modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/api/v1/pb.discovery.go [modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/api/admin/v1/reporter.pb.go [modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/api/v1/tricium.proto [modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/api/v1/function.pb.go [modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/appengine/common/config/generate_test.go [modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/api/admin/v1/config.proto [modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/appengine/gerrit/poll_test.go [modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/appengine/frontend/rpc_analyze.go [modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/api/admin/v1/launcher.pb.go [modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/appengine/launcher/rpc_launch.go [modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/functions/spacey/spacey.go [modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/api/admin/v1/launcher.proto [modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/appengine/common/config/generate.go [modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/appengine/gerrit/poll.go [modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/api/admin/v1/tracker.pb.go [modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/api/v1/tricium.pb.go [modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/api/v1/config.pb.go [modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/appengine/launcher/rpc_launch_test.go [modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/appengine/config/rpc_generate_workflow.go [modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/api/admin/v1/config.pb.go [modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/appengine/frontend/handlers_test.go [modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/functions/git-file-isolator/isolator.go [modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/api/v1/data.pb.go [modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/api/v1/data.proto [modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/api/admin/v1/workflow.pb.go [modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/appengine/frontend/rpc_analyze_test.go [modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/api/v1/platform.pb.go
,
Jun 13 2018
Looks like the is_binary isn't yet passed on to analyzers in all places I'd expect. Specifically: For this CL which contains a binary file: CL: https://chromium-review.googlesource.com/c/infra/infra/+/1089995/6 Tricium run: https://tricium-dev.appspot.com/run/5764593871749120 Spacey task: https://chromium-swarm.appspot.com/task?id=3e130d89616b5f10 Spacey output: 2018/06/13 17:40:34 Read FILES data: files:<path:"go/src/infra/tricium/functions/pylint/.gitignore" > files:<path:"go/src/infra/tricium/functions/pylint/Makefile" > files:<path:"go/src/infra/tricium/functions/pylint/README.md" > files:<path:"go/src/infra/tricium/functions/pylint/cipd.yaml" > files:<path:"go/src/infra/tricium/functions/pylint/pylint.infra_testing" > files:<path:"go/src/infra/tricium/functions/pylint/pylint_parser" > files:<path:"go/src/infra/tricium/functions/pylint/pylint_parser.go" > files:<path:"go/src/infra/tricium/functions/pylint/pylint_parser_test.go" > files:<path:"go/src/infra/tricium/functions/pylint/pylintrc" > files:<path:"go/src/infra/tricium/functions/pylint/test/python.example" > files:<path:"go/src/infra/tricium/functions/pylint/test/tricium/data/files.json" > 2018/06/13 17:40:34 Not emitting comments for file: go/src/infra/tricium/functions/pylint/Makefile ... the above line repeated many times, even though the file only occurs once ... 2018/06/13 17:40:34 Failed to read file: bufio.Scanner: token too long, path: go/src/infra/tricium/functions/pylint/pylint_parser Expected: ... files: <path:"go/src/infra/tricium/functions/pylint/pylint_parser" is_binary:true > ... As an additional separate bug: When the task fails and has no output, Tricium fails to mark the task as failed.
,
Jun 13 2018
The following revision refers to this bug: https://chromium.googlesource.com/infra/infra/+/4a88a844335fd5027dc413a42d7257cdd2f71cd5 commit 4a88a844335fd5027dc413a42d7257cdd2f71cd5 Author: Quinten Yearsley <qyearsley@chromium.org> Date: Wed Jun 13 21:00:58 2018 [tricium] Use jsonpb marshaler when reading/writing data The jsonpb marshaler makes property names like "isBinary", whereas the standard json marshaler makes property names like "is_binary". These are incompatible so the same marshaler has to be used everywhere. Before this CL, the standard json marshaler library is used in data_helpers whereas jsonpb is used when creating the initial GitFileDetails data, so when the git-file-isolator unmarshaled the input it ignored the isBinary field. I chose jsonpb over json because the package documentation says: "...the standard "encoding/json" package ... does not operate correctly on protocol buffers." The main change is in data_helpers.go; this change also expands the manual test case for git-file-isolator. Bug: 836548 Change-Id: Id8559f03428ac8777dba18b89cf948861f2c36fd Reviewed-on: https://chromium-review.googlesource.com/1099688 Commit-Queue: Quinten Yearsley <qyearsley@chromium.org> Reviewed-by: Marc-Antoine Ruel <maruel@chromium.org> [modify] https://crrev.com/4a88a844335fd5027dc413a42d7257cdd2f71cd5/go/src/infra/tricium/functions/git-file-isolator/isolator.go [modify] https://crrev.com/4a88a844335fd5027dc413a42d7257cdd2f71cd5/go/src/infra/tricium/api/v1/data_helpers_test.go [modify] https://crrev.com/4a88a844335fd5027dc413a42d7257cdd2f71cd5/go/src/infra/tricium/functions/git-file-isolator/test/tricium/data/git_file_details.json [modify] https://crrev.com/4a88a844335fd5027dc413a42d7257cdd2f71cd5/go/src/infra/tricium/appengine/launcher/rpc_launch.go [modify] https://crrev.com/4a88a844335fd5027dc413a42d7257cdd2f71cd5/go/src/infra/tricium/api/v1/data_helpers.go [modify] https://crrev.com/4a88a844335fd5027dc413a42d7257cdd2f71cd5/go/src/infra/tricium/appengine/common/isolate.go
,
Jun 13 2018
I've now deployed new versions of all the functions; we should set up another example CL with a binary file to confirm that it works.
,
Jun 14 2018
Tested, confirmed: https://chromium-review.googlesource.com/c/playground/gerrit-tricium/+/1101311
,
Jun 14 2018
Note, in the above case there's something else weird going on: the Spacey and Hello tasks completed successfully, but Tricium didn't update its state. |
||||||
►
Sign in to add a comment |
||||||
Comment 1 by qyears...@chromium.org
, Apr 25 2018