New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 836548 link

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Last visit > 30 days ago
Closed: Jun 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Bug



Sign in to add a comment

Tricium analyzers should be able to easily iterate through only non-binary files

Project Member Reported by zhangtiff@chromium.org, Apr 25 2018

Issue description

It's cool that Tricium is commenting on CLs now! But Tricium also adds trailing space comments on PNG files, which doesn't really make sense and can feel kind of spammy, especially since the way Gerrit presents image files means that we don't actually have the ability to mark these comments as not useful. 

Example of a CL: https://chromium-review.googlesource.com/c/infra/infra/+/1023155/7 
 
 Issue 820743  has been merged into this issue.
Components: -Infra Infra>CodeAnalysis
Labels: -OS-Linux -Pri-3 Tricium Pri-2
Status: Available (was: Unconfirmed)
Summary: Tricium shouldn't comment on binary files (e.g. PNG files) (was: Tricium shouldn't comment on PNG files)
Thanks for filing the issue Tiffany -- good point, in general we don't want to comment on any binary files.

A few thoughts, in no particular order:

 - Most analyzers won't want to read any non-text files
 - It's possible that some analyzers may want to read non-text files, but probably won't want to comment on them (maybe file-level comments might be OK?)
 - For analyzers in Go, I think we should have a package with common utility functions. IsBinaryFile could be one of them. It appears that content type sniffing can be done by https://golang.org/pkg/net/http/#DetectContentType.
 - Analyzers could be responsible for deciding whether to read files based on sniffing, perhaps?
 - Gerrit seems to know already if files are binary or not:
   https://gerrit-review.googlesource.com/Documentation/rest-api-changes.html#file-info
   Maybe we could make sure that this FileInfo is included as part of the FILES data type?
Leading on from the last point:

We're *already* fetching for all files for all CLs that Tricium looks at (https://cs.chromium.org/chromium/infra/go/src/infra/tricium/appengine/gerrit/gerrit.go?l=190)

The FileInfo does correctly say whether the png files are binary for the xample CL in the bug description above:

https://chromium-review.googlesource.com/changes/infra%2finfra~master~I0e51da348f58040c4f707988fb9d43d04c5e8e0b?o=CURRENT_REVISION&o=CURRENT_FILES

Proposed changes:

 - Change the FILES data type (https://cs.chromium.org/chromium/infra/go/src/infra/tricium/api/v1/data.proto?l=60) to include metadata from Gerrit FileInfo including whether the files are binary.
 - Change most analyzers, like Spacey, to skip binary files when looping through files in the CL.
Also note, a simple quick fix for now would be to make it so that tricium just ignores binary files: https://crrev.com/c/1028369
Project Member

Comment 5 by bugdroid1@chromium.org, Apr 25 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/infra/+/4b62e9436513f337861a4f2356dba4ef265f7fa3

commit 4b62e9436513f337861a4f2356dba4ef265f7fa3
Author: Quinten Yearsley <qyearsley@chromium.org>
Date: Wed Apr 25 21:50:58 2018

Don't include binary files in paths to analyze

This is a simple and possibily temporary change which would make
it so that Tricium doesn't look at binary files (e.g. images) in CLs.

We may want analyzers to be able to look at images and other binary
files, but if we are to include both binary and text files, we want
to provide an easy way for analyzers to tell which files are text.

Bug:  836548 
Change-Id: I98b80f0ee1b3bf81dc241884791cebf89350e376
Reviewed-on: https://chromium-review.googlesource.com/1028369
Commit-Queue: Quinten Yearsley <qyearsley@chromium.org>
Reviewed-by: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/4b62e9436513f337861a4f2356dba4ef265f7fa3/go/src/infra/tricium/appengine/gerrit/poll.go
[modify] https://crrev.com/4b62e9436513f337861a4f2356dba4ef265f7fa3/go/src/infra/tricium/appengine/gerrit/poll_test.go

Summary: Tricium analyzers should be able to easily iterate through only non-binary files (was: Tricium shouldn't comment on binary files (e.g. PNG files))
Currently the GIT_FILE_DETAILS and FILES formats have a `paths` field that is a list of path strings.

In order to pipe the details about whether a file is binary through to the analyzer, I think, it would be best to have a map of paths to file details; file details could potentially include:

 - whether the file is binary
 - file size
 - how the file is being changed (deleted, moved, modified, added)
 - which lines are added/modified?
 - how many lines are added/deleted/modified?

https://cs.chromium.org/chromium/infra/go/src/infra/tricium/api/v1/data.proto?q=tricium+data&sq=package:chromium&dr=CSs&l=54

In order to change this, we could follow this list of steps:

 1. Add a new field (maybe called files? file_info? file_details? file_metadata?) which is a map of paths to metadata. This change could also change the Analyze and Launch requests to pipe this information through from Gerrit through to GitFileDetails. This new field would look something like:

    map<string, FileMetadata> files = 3;

 2. Change existing analyzers to iterate through this instead of the paths field

 3. Finally, the paths field would be redundant and could be removed.

Comment 8 by mar...@chromium.org, Apr 30 2018

I'd say 

enum Type {
  UNKNOWN = 0;
  FILE = 1;
  SYMLINK = 2;
}

enum Action {
  UNKNOWN = 0;
  ADDED = 1;
  MODIFIED = 1;
  DELETED = 2;
  TYPE_CHANGED = 3;  // e.g. symlink -> file
  METAONLY = 4;  // e.g. +x
}

message ChangeStat {
...
}

message File {
  string path = 1;
  Type type = 2;
  Action action = 3;
  ChangeStat stats = 4;
}

repeated File Files = 3;


In particular, not use a map.
Owner: diegomtzg@google.com
Status: Assigned (was: Available)
Status: Started (was: Assigned)
Project Member

Comment 11 by bugdroid1@chromium.org, Jun 6 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/infra/+/1742a752ffa122c9b214bc74073aaa63deb6e837

commit 1742a752ffa122c9b214bc74073aaa63deb6e837
Author: Diego Martinez <diegomtzg@google.com>
Date: Wed Jun 06 18:27:56 2018

Use file metadata rather than paths

Load a new file struct from gerrit instead of only the file path in order to keep track of more file metadata inside of analyzers

Bug:  836548 
Change-Id: I2d99e3e8debe0f67f6dbf62e001daeb3e44d9308
Reviewed-on: https://chromium-review.googlesource.com/1086089
Commit-Queue: Diego Martinez <diegomtzg@google.com>
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>

[modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/appengine/common/track/track.go
[modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/api/admin/v1/pb.discovery.go
[modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/functions/copyright/copyright.go
[modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/api/admin/v1/driver.pb.go
[modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/functions/hello/hello.go
[modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/api/v1/pb.discovery.go
[modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/api/admin/v1/reporter.pb.go
[modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/api/v1/tricium.proto
[modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/api/v1/function.pb.go
[modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/appengine/common/config/generate_test.go
[modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/api/admin/v1/config.proto
[modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/appengine/gerrit/poll_test.go
[modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/appengine/frontend/rpc_analyze.go
[modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/api/admin/v1/launcher.pb.go
[modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/appengine/launcher/rpc_launch.go
[modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/functions/spacey/spacey.go
[modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/api/admin/v1/launcher.proto
[modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/appengine/common/config/generate.go
[modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/appengine/gerrit/poll.go
[modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/api/admin/v1/tracker.pb.go
[modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/api/v1/tricium.pb.go
[modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/api/v1/config.pb.go
[modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/appengine/launcher/rpc_launch_test.go
[modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/appengine/config/rpc_generate_workflow.go
[modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/api/admin/v1/config.pb.go
[modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/appengine/frontend/handlers_test.go
[modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/functions/git-file-isolator/isolator.go
[modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/api/v1/data.pb.go
[modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/api/v1/data.proto
[modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/api/admin/v1/workflow.pb.go
[modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/appengine/frontend/rpc_analyze_test.go
[modify] https://crrev.com/1742a752ffa122c9b214bc74073aaa63deb6e837/go/src/infra/tricium/api/v1/platform.pb.go

Looks like the is_binary isn't yet passed on to analyzers in all places I'd expect. Specifically:

For this CL which contains a binary file:
CL: https://chromium-review.googlesource.com/c/infra/infra/+/1089995/6
Tricium run: https://tricium-dev.appspot.com/run/5764593871749120
Spacey task: https://chromium-swarm.appspot.com/task?id=3e130d89616b5f10

Spacey output:
2018/06/13 17:40:34 Read FILES data: files:<path:"go/src/infra/tricium/functions/pylint/.gitignore" > files:<path:"go/src/infra/tricium/functions/pylint/Makefile" > files:<path:"go/src/infra/tricium/functions/pylint/README.md" > files:<path:"go/src/infra/tricium/functions/pylint/cipd.yaml" > files:<path:"go/src/infra/tricium/functions/pylint/pylint.infra_testing" > files:<path:"go/src/infra/tricium/functions/pylint/pylint_parser" > files:<path:"go/src/infra/tricium/functions/pylint/pylint_parser.go" > files:<path:"go/src/infra/tricium/functions/pylint/pylint_parser_test.go" > files:<path:"go/src/infra/tricium/functions/pylint/pylintrc" > files:<path:"go/src/infra/tricium/functions/pylint/test/python.example" > files:<path:"go/src/infra/tricium/functions/pylint/test/tricium/data/files.json" > 
2018/06/13 17:40:34 Not emitting comments for file: go/src/infra/tricium/functions/pylint/Makefile
 ... the above line repeated many times, even though the file only occurs once ...
2018/06/13 17:40:34 Failed to read file: bufio.Scanner: token too long, path: go/src/infra/tricium/functions/pylint/pylint_parser

Expected:
... files: <path:"go/src/infra/tricium/functions/pylint/pylint_parser" is_binary:true > ...

As an additional separate bug: When the task fails and has no output, Tricium fails to mark the task as failed.
Project Member

Comment 13 by bugdroid1@chromium.org, Jun 13 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/infra/+/4a88a844335fd5027dc413a42d7257cdd2f71cd5

commit 4a88a844335fd5027dc413a42d7257cdd2f71cd5
Author: Quinten Yearsley <qyearsley@chromium.org>
Date: Wed Jun 13 21:00:58 2018

[tricium] Use jsonpb marshaler when reading/writing data

The jsonpb marshaler makes property names like "isBinary",
whereas the standard json marshaler makes property names like
"is_binary". These are incompatible so the same marshaler has to
be used everywhere.

Before this CL, the standard json marshaler library is used in
data_helpers whereas jsonpb is used when creating the initial
GitFileDetails data, so when the git-file-isolator unmarshaled the
input it ignored the isBinary field.

I chose jsonpb over json because the package documentation says:
"...the standard "encoding/json" package ... does not operate
correctly on protocol buffers."

The main change is in data_helpers.go; this change also expands
the manual test case for git-file-isolator.

Bug:  836548 
Change-Id: Id8559f03428ac8777dba18b89cf948861f2c36fd
Reviewed-on: https://chromium-review.googlesource.com/1099688
Commit-Queue: Quinten Yearsley <qyearsley@chromium.org>
Reviewed-by: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/4a88a844335fd5027dc413a42d7257cdd2f71cd5/go/src/infra/tricium/functions/git-file-isolator/isolator.go
[modify] https://crrev.com/4a88a844335fd5027dc413a42d7257cdd2f71cd5/go/src/infra/tricium/api/v1/data_helpers_test.go
[modify] https://crrev.com/4a88a844335fd5027dc413a42d7257cdd2f71cd5/go/src/infra/tricium/functions/git-file-isolator/test/tricium/data/git_file_details.json
[modify] https://crrev.com/4a88a844335fd5027dc413a42d7257cdd2f71cd5/go/src/infra/tricium/appengine/launcher/rpc_launch.go
[modify] https://crrev.com/4a88a844335fd5027dc413a42d7257cdd2f71cd5/go/src/infra/tricium/api/v1/data_helpers.go
[modify] https://crrev.com/4a88a844335fd5027dc413a42d7257cdd2f71cd5/go/src/infra/tricium/appengine/common/isolate.go

I've now deployed new versions of all the functions; we should set up another example CL with a binary file to confirm that it works.
Status: Fixed (was: Started)
Tested, confirmed: https://chromium-review.googlesource.com/c/playground/gerrit-tricium/+/1101311
Note, in the above case there's something else weird going on: the Spacey and Hello tasks completed successfully, but Tricium didn't update its state.

Sign in to add a comment