Only report comments for lines modified by the patch (filter comment to those in diff) |
||||||||
Issue descriptionMake it possible for analyzers or the service to only report comments on changed lines. This can either be done by making analyzers aware of the diff or by making the service aware of the diff. For analyzers the diff could be propagated in the FILES data and there could be an option to analyze all lines (e.g., when there is no diff info). Alternatively, the service could be made aware of the diff info in order to filter the result. This alternative has the down side of the service having to know how to extract diff info.
,
Oct 17 2017
The Gerrit change details does not include any specifics about changed lines: https://gerrit-review.googlesource.com/Documentation/rest-api-changes.html#change-info This means the information would need to be extracted in the GitFileIsolator and added as an extension to the FILES data.
,
Nov 13 2017
,
Mar 2 2018
Issue 808812 has been merged into this issue.
,
Mar 2 2018
As noted in #2, the diff could be extracted by GitFileIsolator and added to FILES data. Either https://gerrit-review.googlesource.com/Documentation/rest-api-changes.html#get-patch or `git diff FETCH_HEAD~ FETCH_HEAD` could be used to get the diff for real files. Requests to https://gerrit-review.googlesource.com/Documentation/rest-api-changes.html#get-diff could potentially be used to get the diff of magic paths like /COMMIT_MSG.
,
Mar 29 2018
,
Mar 29 2018
Issue 827401 has been merged into this issue.
,
Apr 3 2018
,
Apr 11 2018
Just a passing thought -- when filtering, any comments at line 0 col 0 (i.e. file level comments) should be kept regardless of whether line 0 was changed in the diff.
,
Apr 11 2018
Agreed. Also, relevant for the copyright anlayzer: Comments at line 0 with no range are always file comments; any non-file-comments that start at line 0 should must have a non-zero end line number. The comments from the copyright analyzer right now have line 0 and no range, and therefore they would always be posted on all files even if the copyright message isn't modified in the CL. If this isn't intended, then the copyright analyzer should probably be modified to include an ending line number.
,
Jun 15 2018
,
Jun 15 2018
,
Jun 22 2018
Returning to this -- this is the main blocker for rolling out Tricium to the Fuchsia team at large. We talked briefly offline about a few different options for this, and I'm curious if any further thought has been given. As far as I recall, the two options were: a) get the changed line numbers from the revision (likely with the GitFileIsolator? Is that called for both binary- and recipe-based analyzers?) and check that each comment the analyzer returns is attached to a changed/added/deleted line (or the file) or b) run each analyzer twice, once with the revision in question and once without and diff the comment results. Option b) is more thorough, and may allow the service to catch more comments*, but option a) is much more efficient in terms of analyzer time, particularly when we have recipe analyzers doing checkouts and possibly partial builds. Thoughts? *What would be an example of this? I vaguely remember we talked about it, but can't offhand remember a situation where this would happen
,
Jun 22 2018
I was hoping to talk to you about this :-D That's right, those were basically the two options. More thoughts: Option (a): There are a couple ways this could be done. Possibility (1) is getting the patch in GitFileIsolator. GitFileIsolator is not necessarily used by all analyzers though, and in particular it wouldn't be used by recipe-baesd analyzers that want to get the whole checkout. Possibility (2) is fetching the diff using the Gerrit API: https://gerrit-review.googlesource.com/Documentation/rest-api-changes.html#get-patch This incurs some time fetching the patch and one additional request to Gerrit each time results are going to be reported, but can be done the same for all Tricium comments that are reported. It could potentially be done just before reporting results to Gerrit, in rpc_report_results.go: https://cs.chromium.org/chromium/infra/go/src/infra/tricium/appengine/gerrit/rpc_report_results.go Potentially we could use an already-written git diff parser library, e.g. https://github.com/waigani/diffparser to get the list of changed lines. Option (b): For an example of why this would be more thorough and "correct", consider an example with pylint: 43| a = 'foo' ..| ... (no use of a) 65| return a + 'bar' Changing line 65 to `return 'bar'` would might lead to an "unused var" warning. But there's nothing really wrong on line 65, but there's a new warning overall after changing line 65. Similar cases might arise with clang-tidy, like with the "bugprone-use-after-move" warning. As for what would be required to do this: (1) Right now, each for each isolator/analyzer run, there is one "WorkerRun" per platform. There would have to be two (one at ref, and one at ref^: after and before the change). These worker runs would be tracked in datastore: https://cs.chromium.org/chromium/infra/go/src/infra/tricium/appengine/common/track/track.go (2) When a worker is done (see rpc_worker_done.go), we would still check if all workers are done for that function, and then aggregate comments. But now, this would involve getting the difference of comments in the after/before workers. Option (b) means twice as much total compute time, but the actual clock time for an analyzer would be about the same since the after/before tasks should be run in parallel.
,
Jun 25 2018
I talked to supertri@ and she pointed out a couple things: Sometimes option (b), by itself, is not exactly what we want. If there is a pre-existing problem in changed code, that's not fixed in the change itself, it's still helpful to report warnings about the problem, because it's a good opportunity to fix the change. By doing a diff of the problems found after and before the patch, that would make it so we don't report pre-existing problems in changed lines. So, option (a) for now would still be good, and would be a bit simpler. What tricorder did was: Report all comments to the code review site, and do filtering on the code review site. For us, that would mean changing Gerrit to hide robot comments on non-change lines by default (which would also be possible). This would be option (c). For analyzers where specifically want to see warnings for the whole file, this might be overridden somehow (e.g. "unused imports" warnings we want to see even if the imports lines are not changed). I think the easiest change right now might be (a), but I'm not sure.
,
Jun 26 2018
It doesn't sound like it's possible to hide robot comments for unchanged lines in Gerrit as a setting (I asked the Gerrit users list) right now, and so that would require changes to Gerrit itself. What might be easier would be to use the GetDiff (https://gerrit-review.googlesource.com/Documentation/rest-api-changes.html#get-diff) API call, which returns a json representation of the diff, which is fairly trivial to parse and get the changed lines. If at some point, then, we decide to implement the before/after runs and return the merged comments, we could override that without too much effort.
,
Jun 26 2018
That's true. One possible issue with the "get-diff" endpoint is that it appears to be for one file at a time. So, suppose there's ~10 files in a CL that have comments, that would mean 10x as many requests. This may be OK for now, but our QPS to Gerrit will be increasing later on, so it might be too high later on.
,
Jun 27 2018
Put up https://chromium-review.googlesource.com/c/infra/infra/+/1117476 as a first pass at getting this done with GetPatch. I'm also having a discussion on g/gerritcodereview-users about the Gerrit setting and/or a simpler API call that would provide what we need without having to parse the patch, will update with any conclusions reached.
,
Jun 27 2018
Excellent - Discussing on gerritcodereview-users is a great idea, thanks for doing that.
,
Jun 29 2018
Julie has now landed change https://chromium-review.googlesource.com/c/infra/infra/+/1117476
,
Jul 3
Note: After investigating some more and adding some logging, it seems that the changed lines map is empty. For example, with added logging for runs on https://chromium-review.googlesource.com/c/playground/gerrit-tricium/+/1097599, I'm now seeing: Changed lines: map[] .. File "example.txt" is not in changed lines. ... File "example.py" is not in changed lines. ... In this case, these are both newly-added files, so all lines in these files should be considered "changed" for our purposes.
,
Jul 3
Interesting. To eliminate a potential issue with filenames, the map is completely empty, yes? I could see it having contents, but the filenames it extracts from the patch don't map to the filenames in the Tricium data structure. That at least would be a reasonably trivial fix.... If that's not the case, my instinct is that there's some mismatch between the format expected by the diffparser we're using and the one gerrit provides, but I can investigate more.
,
Jul 3
Yep, currently the map is empty and apparently contains no filenames, so I think it's not a matter of filename mismatch.
Adding slightly more logging, I see that the fetched "raw diff" is empty ([]byte{}).
,
Jul 3
Ah, I have a hypothesis as to why it's empty. In the logs: Using URL: https://chromium-review.googlesource.com/a/changes/playground%2Fgerrit-tricium~master~Ide1756bb3ed99986f6eafba3e3d51a94ae261647/revisions/refs/changes/99/1097599/33/patch The revision "refs/changes/99/1097599/33" is the ref for the revision but doesn't count as "revision-id" in the API. But the legacy patchset number (33) does count. I'll upload a CL in a minute to change this.
,
Jul 3
Another thing I just noticed: fetching of the AnalyzeRequest and fetching of the changed lines is currently done in parallel, but the changed lines fetching depends on first having all of the request info. I saw one entry in the log where it tried to fetch changed lines when all of the request fields (host, change, revision) were still all empty.
,
Jul 3
The following revision refers to this bug: https://chromium.googlesource.com/infra/infra/+/51ba075bf1d0305c0586c106961cebb1a14a7a01 commit 51ba075bf1d0305c0586c106961cebb1a14a7a01 Author: Quinten Yearsley <qyearsley@chromium.org> Date: Tue Jul 03 20:27:14 2018 [tricium] Fix request for patch from Gerrit In this CL: - Change the revision ID in the request URL to be just patch set number - Ensure AnalyzeRequest is fetched before fetching patch - Add logging Bug: 767856 Change-Id: Ia30690785b48c05c2c1344f01f4bb0ab2ba9e350 Reviewed-on: https://chromium-review.googlesource.com/1125027 Reviewed-by: Marc-Antoine Ruel <maruel@chromium.org> Commit-Queue: Quinten Yearsley <qyearsley@chromium.org> [modify] https://crrev.com/51ba075bf1d0305c0586c106961cebb1a14a7a01/go/src/infra/tricium/appengine/gerrit/rpc_report_results.go [modify] https://crrev.com/51ba075bf1d0305c0586c106961cebb1a14a7a01/go/src/infra/tricium/appengine/gerrit/gerrit.go
,
Jul 3
Based on a test after deploying to tricium-dev, it now looks like diff filtering may be working, but line numbers in posted gerrit comments are still not working: https://chromium-review.googlesource.com/c/infra/infra/+/1121340/21
,
Jul 6
The following revision refers to this bug: https://chromium.googlesource.com/infra/infra/+/f93a4872e02d7b54da082ff91eb7dc6a4b5cba38 commit f93a4872e02d7b54da082ff91eb7dc6a4b5cba38 Author: Quinten Yearsley <qyearsley@chromium.org> Date: Fri Jul 06 20:46:14 2018 [tricium] Don't try to strip JSONP prefix from patch body Bug: 767856 Change-Id: I72b9de3c102d349596b873756eb58b8b0b7af66f Reviewed-on: https://chromium-review.googlesource.com/1127341 Reviewed-by: Marc-Antoine Ruel <maruel@chromium.org> Commit-Queue: Quinten Yearsley <qyearsley@chromium.org> [modify] https://crrev.com/f93a4872e02d7b54da082ff91eb7dc6a4b5cba38/go/src/infra/tricium/appengine/gerrit/gerrit.go
,
Jul 6
The following revision refers to this bug: https://chromium.googlesource.com/infra/infra/+/334c03da6b4c313f690a1141145362e173a7bfea commit 334c03da6b4c313f690a1141145362e173a7bfea Author: Quinten Yearsley <qyearsley@chromium.org> Date: Fri Jul 06 21:27:44 2018 [tricium] Send no headers when fetching gerrit patch The Gerrit documentation indicates that these headers will be in the response, but these are not needed in the request. This is confirmed by the fact that making a request to get the patch from a logged-in browser session requires no special headers. Bug: 767856 Change-Id: I64bd042bbf2d3512806be7bba930c494786da6bb Reviewed-on: https://chromium-review.googlesource.com/1128263 Reviewed-by: Marc-Antoine Ruel <maruel@chromium.org> Commit-Queue: Quinten Yearsley <qyearsley@chromium.org> [modify] https://crrev.com/334c03da6b4c313f690a1141145362e173a7bfea/go/src/infra/tricium/appengine/gerrit/gerrit.go
,
Jul 6
This appears to be working, although there's one point that is a bit confusing now: In the Progress response, the numComments field is the number of comments before filtering -- so the Tricium view will show a number that is higher than the number of comments actually posted. A couple ideas that might make this less confusing: 1. In the tricium plugin UI, don't show the number of comments. 2. When filtering comments out, set Included = false in the stored CommentSelection entity; then when counting number of comments for Progress, return the number of included comments rather than all comments. Any thoughts about this? I'm slightly in favor of #1, I think... currently the number of comments listed in the Progress response is currently from track.WorkerRunResult.NumComments, which is "number of comments produced, including those not sent in the response".
,
Jul 6
I'd agree that 1) seems like a better option, since it doesn't have much bearing to the user how many comments were generated, just that some or none were. Including the number of comments in the Progress response isn't particularly useful, since they'll be filtered and merged at the end anyways (assuming I understand the pipeline correctly)
,
Aug 20
|
||||||||
►
Sign in to add a comment |
||||||||
Comment 1 by emso@chromium.org
, Sep 25 2017