
Issue 867538


Issue metadata

Status: Fixed
Owner:
Closed: Aug 8
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Feature




Spellchecker: check only in comments (and possibly strings)

Project Member Reported by qyears...@chromium.org, Jul 25

Issue description

Split off from bug 867514.

Even a conservative "dictionary of possible misspellings" approach may create too much noise when run on source code, particularly when highly abbreviated names are used.

Example case:
  https://fuchsia-review.googlesource.com/c/tools/+/175949/2/symbolize/pipeline.go#109

Implementing this may require going through source files of different types and deciding what's "in a comment" and what's "in a string". This will probably involve tokenizing each type of source file to find where multi-line comments (or strings?) start and end in languages such as Python, Go, etc.
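Go sources already have a tokenizer in the standard library, so they need no custom parser. This is an illustrative sketch, not part of the Tricium analyzer. It assumes the checker only needs the text and starting line of each comment, plus each string literal if requested. `Snippet` and `goSnippets` are hypothetical names. Running `go/scanner` in `scanner.ScanComments` mode returns comments as `token.COMMENT` tokens. As a result, identifiers such as abbreviated variable names never reach the dictionary lookup.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// Snippet is a piece of checkable text plus the 1-based line it starts on.
type Snippet struct {
	Line int
	Text string
}

// goSnippets tokenizes Go source and returns every comment token, plus every
// string literal when includeStrings is true. Identifiers and other code
// tokens are never returned, so abbreviated names cannot be flagged.
func goSnippets(src []byte, includeStrings bool) []Snippet {
	fset := token.NewFileSet()
	file := fset.AddFile("input.go", fset.Base(), len(src))

	var s scanner.Scanner
	// ScanComments makes the scanner emit COMMENT tokens instead of skipping them.
	// A nil error handler means lexical errors are ignored.
	s.Init(file, src, nil, scanner.ScanComments)

	var out []Snippet
	for {
		pos, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		if tok == token.COMMENT || (includeStrings && tok == token.STRING) {
			out = append(out, Snippet{Line: fset.Position(pos).Line, Text: lit})
		}
	}
	return out
}

func main() {
	src := []byte("package p\n\n// Helo world\nvar x = \"strng\" /* teh */\n")
	for _, sn := range goSnippets(src, true) {
		fmt.Printf("%d: %s\n", sn.Line, sn.Text)
	}
}
```

Other languages have no built-in tokenizer like this. They would need either their own scanner or a simpler delimiter-based approach.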

(Side note: If we take the tokenizing approach, iterating through all tokens while keeping track of whether we're in a comment, etc., then at that point we might as well load the codespell dictionary and use its words directly, without invoking the codespell Python script.)
 
This was what I originally had in mind when we first considered writing a spellchecker. I believe we can keep a JSON file describing each language's single-line and multi-line comment syntax, so that we can determine when a line (or several lines) is part of a comment.

Yep, that's right. In light of Julie's comment, your original idea may have been better in terms of potential noise (although the current version, which invokes codespell, was a bit faster to implement and let us see some results).
Owner: diegomtzg@google.com
Status: Assigned (was: Available)
Project Member

Comment 4 by bugdroid1@chromium.org, Jul 30

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/infra/+/3c26b80f01caac70776f6bd592035478c418f14d

commit 3c26b80f01caac70776f6bd592035478c418f14d
Author: Diego Martinez <diegomtzg@google.com>
Date: Mon Jul 30 22:54:08 2018

[tricium] Add comment-only spellchecker analyzer

Initial work on spellchecker v2 that only checks comments for spelling errors.
Not relying on CodeSpell anymore (except for its dictionary of common misspellings).

Bug: 867538
Change-Id: Ibd783bc5129d75e3bf26bf4de5a04ab74b9a0232
Reviewed-on: https://chromium-review.googlesource.com/1152297
Commit-Queue: Diego Martinez <diegomtzg@google.com>
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>

[add] https://crrev.com/3c26b80f01caac70776f6bd592035478c418f14d/go/src/infra/tricium/functions/spellchecker/test/tricium/data/files.json
[modify] https://crrev.com/3c26b80f01caac70776f6bd592035478c418f14d/go/src/infra/tricium/functions/spellchecker/cipd.yaml
[modify] https://crrev.com/3c26b80f01caac70776f6bd592035478c418f14d/go/src/infra/tricium/functions/spellchecker/spellchecker_test.go
[modify] https://crrev.com/3c26b80f01caac70776f6bd592035478c418f14d/go/src/infra/tricium/functions/spellchecker/Makefile
[add] https://crrev.com/3c26b80f01caac70776f6bd592035478c418f14d/go/src/infra/tricium/functions/spellchecker/dictionary.txt
[modify] https://crrev.com/3c26b80f01caac70776f6bd592035478c418f14d/go/src/infra/tricium/functions/spellchecker/README.md
[add] https://crrev.com/3c26b80f01caac70776f6bd592035478c418f14d/go/src/infra/tricium/functions/spellchecker/comment_formats.json
[add] https://crrev.com/3c26b80f01caac70776f6bd592035478c418f14d/go/src/infra/tricium/functions/spellchecker/test/example.c
[modify] https://crrev.com/3c26b80f01caac70776f6bd592035478c418f14d/go/src/infra/tricium/functions/spellchecker/.gitignore
[modify] https://crrev.com/3c26b80f01caac70776f6bd592035478c418f14d/go/src/infra/tricium/functions/spellchecker/spellchecker.go

Status: Started (was: Assigned)
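The core of a comment-only check can be sketched as follows. It assumes comments have already been extracted and that the misspelling list is held in a lower-cased `map[string][]string`. `Finding` and `checkComment` are hypothetical names. The sketch splits on every non-letter, which breaks contractions such as "don't" into two words. This is a simplifying assumption, not necessarily what the committed analyzer does.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Finding is one suspected misspelling in a comment.
type Finding struct {
	Line        int
	Word        string
	Suggestions []string
}

// checkComment splits comment text into words (runs of letters) and reports
// each word whose lower-cased form appears in dict.
func checkComment(line int, text string, dict map[string][]string) []Finding {
	notLetter := func(r rune) bool { return !unicode.IsLetter(r) }
	var out []Finding
	for _, w := range strings.FieldsFunc(text, notLetter) {
		if fixes, ok := dict[strings.ToLower(w)]; ok {
			out = append(out, Finding{Line: line, Word: w, Suggestions: fixes})
		}
	}
	return out
}

func main() {
	dict := map[string][]string{"teh": {"the"}, "abbrivated": {"abbreviated"}}
	for _, f := range checkComment(7, "// Teh name is abbrivated here.", dict) {
		fmt.Printf("line %d: %q -> %v\n", f.Line, f.Word, f.Suggestions)
	}
}
```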
Project Member

Comment 6 by bugdroid1@chromium.org, Aug 7

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/infra/+/76cad057e83160353e12955eed0de9c879f26552

commit 76cad057e83160353e12955eed0de9c879f26552
Author: Diego Martinez <diegomtzg@google.com>
Date: Tue Aug 07 19:15:34 2018

Refactor Spellchecker to work as a state machine.

Bug: 867538
Change-Id: I76d8e18d434b940c47eb479894186f4e0f0c465e
Reviewed-on: https://chromium-review.googlesource.com/1161083
Commit-Queue: Diego Martinez <diegomtzg@google.com>
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>
Reviewed-by: Marc-Antoine Ruel <maruel@chromium.org>

[modify] https://crrev.com/76cad057e83160353e12955eed0de9c879f26552/go/src/infra/tricium/functions/spellchecker/test/tricium/data/files.json
[modify] https://crrev.com/76cad057e83160353e12955eed0de9c879f26552/go/src/infra/tricium/functions/spellchecker/cipd.yaml
[modify] https://crrev.com/76cad057e83160353e12955eed0de9c879f26552/go/src/infra/tricium/functions/spellchecker/spellchecker_test.go
[modify] https://crrev.com/76cad057e83160353e12955eed0de9c879f26552/go/src/infra/tricium/functions/spellchecker/Makefile
[modify] https://crrev.com/76cad057e83160353e12955eed0de9c879f26552/go/src/infra/tricium/functions/spellchecker/dictionary.txt
[modify] https://crrev.com/76cad057e83160353e12955eed0de9c879f26552/go/src/infra/tricium/functions/spellchecker/README.md
[add] https://crrev.com/76cad057e83160353e12955eed0de9c879f26552/go/src/infra/tricium/functions/spellchecker/test/example.txt
[modify] https://crrev.com/76cad057e83160353e12955eed0de9c879f26552/go/src/infra/tricium/functions/spellchecker/test/example.c
[modify] https://crrev.com/76cad057e83160353e12955eed0de9c879f26552/go/src/infra/tricium/functions/spellchecker/.gitignore
[modify] https://crrev.com/76cad057e83160353e12955eed0de9c879f26552/go/src/infra/tricium/functions/spellchecker/spellchecker.go

Status: Fixed (was: Started)
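A delimiter-driven state machine like the one this refactor describes might look like the sketch below. It is an illustration, not the committed code. The `Delims` type and `extractComments` are hypothetical, and the four states (code, line comment, block comment, string) are an assumption. It also ignores language-specific forms such as Go raw strings and Python triple-quoted strings. Tracking the string state is what stops a `//` inside a literal from being treated as a comment.

```go
package main

import (
	"fmt"
	"strings"
)

// state records where the scanner currently is in the source text.
type state int

const (
	inCode state = iota
	inLineComment
	inBlockComment
	inString
)

// Delims holds one language's comment delimiters (hypothetical type).
type Delims struct {
	Line, BlockStart, BlockEnd string
}

// Comment is one comment body (delimiters excluded) and the line it starts on.
type Comment struct {
	Line int
	Text string
}

// extractComments walks src once as a small state machine and returns every
// comment. Quoted strings are skipped so their contents never open a comment.
func extractComments(src string, d Delims) []Comment {
	var out []Comment
	var buf strings.Builder
	st, line, start := inCode, 1, 1
	var quote byte
	emit := func() {
		out = append(out, Comment{Line: start, Text: buf.String()})
		buf.Reset()
		st = inCode
	}
	for i := 0; i < len(src); {
		c := src[i]
		switch st {
		case inCode:
			if d.Line != "" && strings.HasPrefix(src[i:], d.Line) {
				st, start = inLineComment, line
				i += len(d.Line)
				continue
			}
			if d.BlockStart != "" && d.BlockEnd != "" && strings.HasPrefix(src[i:], d.BlockStart) {
				st, start = inBlockComment, line
				i += len(d.BlockStart)
				continue
			}
			if c == '"' || c == '\'' {
				st, quote = inString, c
			}
		case inLineComment:
			if c == '\n' {
				emit()
			} else {
				buf.WriteByte(c)
			}
		case inBlockComment:
			if strings.HasPrefix(src[i:], d.BlockEnd) {
				emit()
				i += len(d.BlockEnd)
				continue
			}
			buf.WriteByte(c)
		case inString:
			if c == '\\' && i+1 < len(src) {
				if src[i+1] == '\n' {
					line++ // escaped newline still advances the line count
				}
				i += 2 // skip the backslash and the escaped byte
				continue
			}
			if c == quote || c == '\n' {
				st = inCode // closing quote, or an unterminated literal ends at the newline
			}
		}
		if c == '\n' {
			line++
		}
		i++
	}
	if st == inLineComment || st == inBlockComment {
		emit() // comment runs to the end of the input
	}
	return out
}

func main() {
	src := "int x = 1; // teh count\nchar *s = \"// not a comment\";\n/* multi\nline */\n"
	for _, c := range extractComments(src, Delims{Line: "//", BlockStart: "/*", BlockEnd: "*/"}) {
		fmt.Printf("%d: %q\n", c.Line, c.Text)
	}
}
```

Driving the machine from a delimiter table means new languages need no new Go code, only a new entry. The trade-off is that constructs outside the table's vocabulary are misread.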
