New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 862770 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Jan 16
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 2
Type: Feature

Blocked on:
issue 904007



Sign in to add a comment

Support skipping tricium for some files (e.g. generated files, expectations)

Project Member Reported by hinoka@chromium.org, Jul 11

Issue description

Since they're generally not written by hand.
 
Labels: -Type-Bug Type-Feature
Status: Available (was: Untriaged)
Summary: Skip running Tricium on generated or expectation files (was: Running tricium on expectation or generated files generally isn't as helpful)
Good point.

Some types of files that come to mind:

 - recipe expectations
 - web test expectations
 - generated protobuf files
 - there are probably lots more types of generated files

There are several things that could be done; one possibility is having a blacklist of file path patterns for which we want to skip all analysis.

Keeping this as a P3 since I'd expect people to just ignore warnings in generated files.
Cc: diegomtzg@google.com
Mentioned by Diego today.

If we want to do this by keeping a list of file path patterns, there are multiple options:

 - Tricium could add a flag to file metadata for each file
 - A helper function could be added for analyzers to filter

Note, we don't want to unconditionally always filter out all possibly generated files for all analyzers, since it's possible that some analyzers would want to check these files.
Cc: -diegomtzg@google.com
Cc: qyears...@chromium.org
 Issue 889834  has been merged into this issue.
It's quite annoying since Tricium keeps posting comments on each patchset. This adds noise and hides useful comments from reviewers.
Cc: -qyears...@chromium.org
Labels: -Pri-3 Pri-2
Owner: qyears...@chromium.org
Status: Assigned (was: Available)
I agree, this will continue be a source of noise in general, and should be done sooner rather than later.

I propose adding a helper function that is available to all analyzers to filter different kinds of generated or expectation files; use it in most or all of the existing analyzers (second option from #2).

Alternative: Add this function in Tricium itself, and always hide these files from all analyzers (first option from #2).

Reason: I think that it makes sense to add this as an optional thing that analyzer can use, because it's similar to the binary file filtering: it's conceivable that some analyzer will want to look at the full actual list of files, and specific decisions about filtering files seems like something that should be explicitly done in the analyzer. Always just giving the analyzer the full list of files makes it easier to reason about what the input of the analyzer is going to be.

Note: whichever way we choose, we can always change our mind later, the cost of changing to the other way doesn't seem very high.

There may be different new types of generated files over time, and there's no sure way to tell all possible generated file types from the content or path of the file, although there is almost always a way to guess a particular type of generated file from the path. For example:

  Recipe expectations: .expected/*.json
  Web test expectatations: LayoutTests/*-expected.* or in the future:  web_tests/*-expected.*
  Generated proto files: *_pb2.py or *.pb.go
  Expectation files for milo frontend tests: expectations/*.html

So the helper function might just be composed of a bunch of checks to see whether it matches any of these patterns.
Summary: Skip running analyzers on files that aren't directly edited (e.g. generated, expectations, third party) (was: Skip running Tricium on generated or expectation files)
We'll also want to make it easy to skip (most) third party files, which may be modified in rolls etc. but not directly edited. One exception is third_party/blink.
Cc: iannucci@chromium.org mar...@chromium.org
Robbie had an idea about a way to do this that wouldn't require that all of these file patterns are kept in a list with the Tricium code: Config could be kept in .gitattributes (or a similar file) and GitFileIsolator could potentially read that file and just not copy files which are to be ignored.

Config about what files to skip could be alongside the files to skip. This way, for example, the config saying to skip different sets of third party code could be put alongside that third party code; config saying to skip layout test expectations could be with the layout tests; it would be easier to skip new sets of files that should be skipped.

The easiest way to do this now would be to only support skipping all analyzers; but potentially, if GitFileIsolator could output separate sets of files for each analyzer then we'd be able to support skipping particular analyzers.
This needs to be addressed as soon as possible. :/
Assuming Quinten is there next week I'd like this to be looked at in priority, since it's currently very noisy for a fair number of kinds of changes.

I'm fine with the .gitattributes idea, or autodetecting on a "generated via" / "autogenerated" word in the first 3 lines in a comment, or anything that gets the desired result.
Status: Started (was: Assigned)
I'm in favor of the .gitattributes idea, modifying GitFileIsolator to filter out files.

This would mean that, at first, all analyzers would filter out the same set of files. Potentially in the future we could support per-analyzer skipping, but I'm not sure whether we need to do that.

We could start out initially skipping based on path patterns. To skip based on the contents of .gitattributes, I believe that GitFileIsolator must additionally check out .gitattributes files (right now it's running `git checkout FETCH_HEAD -- file1 file2 file3 etc` to check out only files in the change).
Cc: qyears...@chromium.org
 Issue 903994  has been merged into this issue.
thestig@ pointed out another category of files we want to skip: 
files ending in _expected.txt, such as
//src/third_party/pdfium/testing/resources/javascript/annot_props_expected.txt

Uploaded CL https://chromium-review.googlesource.com/c/infra/infra/+/1329751
Blockedon: 904007
Project Member

Comment 14 by bugdroid1@chromium.org, Nov 10

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/infra/+/b027de3bd60c5a6920d2eeb78705f48b1c30efc1

commit b027de3bd60c5a6920d2eeb78705f48b1c30efc1
Author: Quinten Yearsley <qyearsley@chromium.org>
Date: Sat Nov 10 00:06:35 2018

[tricium git-file-isolator] Skip known generated/expectation files.

This would be an initial quick change, to be followed up by
actually reading .gitattributes files to decide what to skip.

Bug:  862770 
Change-Id: I965f637af921e609fe5f51f23c56ead5a9670cec
Reviewed-on: https://chromium-review.googlesource.com/c/1329751
Commit-Queue: Quinten Yearsley <qyearsley@chromium.org>
Reviewed-by: Marc-Antoine Ruel <maruel@chromium.org>
Cr-Commit-Position: refs/heads/master@{#18914}
[add] https://crrev.com/b027de3bd60c5a6920d2eeb78705f48b1c30efc1/go/src/infra/tricium/functions/git-file-isolator/isolator_test.go
[modify] https://crrev.com/b027de3bd60c5a6920d2eeb78705f48b1c30efc1/go/src/infra/tricium/functions/git-file-isolator/isolator.go

Summary: Support skipping tricium for some files (e.g. generated files, expectations) (was: Skip running analyzers on files that aren't directly edited (e.g. generated, expectations, third party))
New version of GitFileIsolator deployed, and it seems to now be filtering out generated pb files as expected, e.g. https://chromium-review.googlesource.com/c/infra/infra/+/1332331.

Next steps:
 - check and and use .gitattributes files
 - add .gitattributes files in the various repos
 - remove hard-coded list in git-file-isolator
Status: Fixed (was: Started)
Status: Started (was: Fixed)
The new version is bad, and produces the error:
Failed to run command: exit status 1, cmd: [git checkout FETCH_HEAD -- 100644 blob 9cc142c8ed9c01fc7874eaa4c0018ec890fccafe	.gitattributes 100644 blob 7c091e9e5dfdb1aac2556d4e9b5e99c9e415e101	third_party/blink/renderer/core/.gitattributes]

That is, it's not just using the filenames in the list of .gitattributes files as it should.
Project Member

Comment 18 by bugdroid1@chromium.org, Jan 16 (6 days ago)

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/infra/+/3f776d393633a5e64f9db3d7ba4b47e477f01b75

commit 3f776d393633a5e64f9db3d7ba4b47e477f01b75
Author: Quinten Yearsley <qyearsley@chromium.org>
Date: Wed Jan 16 17:41:47 2019

When checking .gitattributes files, use --name-only.

In the initially committed version of the the .gitattributes code,
--name-only was accidentally left out, so the other fields output
by ls-tree, including file modes etc. were kept, which causes git
checkout to fail.

This wasn't caught in testing because the unit tests don't cover
any of the command invocation, and test that involves running
the analyzer (`make test`) didn't catch it because there were no
.gitattributes files to check out in the sample change.

Bug:  862770 
Change-Id: Ic61d898bf81ff186d4f1b08a091a51ffc167f13c
Reviewed-on: https://chromium-review.googlesource.com/c/1412048
Reviewed-by: Marc-Antoine Ruel <maruel@chromium.org>
Commit-Queue: Quinten Yearsley <qyearsley@chromium.org>
Cr-Commit-Position: refs/heads/master@{#20025}
[modify] https://crrev.com/3f776d393633a5e64f9db3d7ba4b47e477f01b75/go/src/infra/tricium/functions/git-file-isolator/test/tricium/data/git_file_details.json
[modify] https://crrev.com/3f776d393633a5e64f9db3d7ba4b47e477f01b75/go/src/infra/tricium/functions/git-file-isolator/isolator.go

Comment 19 by qyears...@chromium.org, Jan 16 (6 days ago)

Status: Fixed (was: Started)
New version of isolator deployed, seems to be working (e.g. https://chromium-swarm.appspot.com/task?id=427095bcf1c2f810&refresh=10&show_raw=1).

Sign in to add a comment