Starred by 1 user

Issue metadata

Status: WontFix
Owner: ----
Closed: Jun 2017
Cc:
Components:
HW: All
NextAction: ----
OS: All
Priority: 3
Type: Bug




RegExp: /[\c%]/ is not yet handled by the spec

Project Member Reported by jgruber@chromium.org, Apr 4 2017

Issue description

/[\c%]/ doesn't match the RegExp grammar in Annex B and should throw a SyntaxError. The same holds for any '\cX' within a character class where X is not in {0-9,_,a-z,A-Z}.

An attempted derivation:

CharacterClass (https://tc39.github.io/ecma262/#prod-CharacterClass)
 -> ClassRanges
 -> NonemptyClassRanges
 -> ClassAtom
 -> ClassAtomNoDash
 -> ClassEscape (Annex B)
 -> CharacterEscape
 -> Nothing matches at this point.

A couple of similar but valid cases:

/[\c0]/, ..., /[\c9]/, /[\c_]/  (ClassControlLetter)
/[\ca]/, ..., /[\cZ]/  (ControlLetter)
/\c%/  (ok outside of character class)
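
As a rough illustration of the valid cases above, the sketch below checks whether a class of the form [\cX] matches the control character whose code point is X's code point modulo 32. The helper names controlCodePoint and classTreatsAsControl are hypothetical and not part of any engine or spec; the results depend on the engine running the snippet, so no output is asserted here.

```javascript
// \cX with an accepted control letter X denotes the character whose
// code point is X's code point modulo 32 (its low 5 bits).
function controlCodePoint(letter) {
  return letter.codePointAt(0) % 32;
}

// Hypothetical helper: true when /^[\cX]$/ matches exactly that control character.
function classTreatsAsControl(letter) {
  const re = new RegExp("^[\\c" + letter + "]$");
  return re.test(String.fromCodePoint(controlCodePoint(letter)));
}

// Digits and underscore (ClassControlLetter), letters (ControlLetter),
// and finally % which is the case this issue is about.
for (const letter of ["0", "9", "_", "a", "Z", "%"]) {
  console.log(JSON.stringify(letter), classTreatsAsControl(letter));
}

// Outside a character class, \c% is read as the literal characters \, c and %.
console.log(/^\c%$/.test("\\c%"));
```

If an engine reads [\c%] as [\\c%], the % row should come out differently from the other rows, because the class then contains no control character at all.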
 
This syntax is supported across browsers. I think the right fix here would be to update Annex B with whatever the real current cross-browser grammar is. In particular, I wonder if the "but not c" clause of IdentityEscape is implemented in browsers.

littledan@littledan-ThinkPad-T460p:~/v8/v8$ eshost -e "/[\c%]/.exec('')"
#### jsc
null

#### chakracore
null

#### d8
null

#### spidermonkey
null
/^[\c%]*$/.test("\\c%")  -->

v8: true
firefox: true
Some related work was started in this patch: https://github.com/tc39/ecma262/commit/fbdfda6f2a613f3c4813d4b34e32f5c5134cf921

However, Andre may have left out this case as its interpretation differs between browsers. In particular, ChakraCore differs from the agreement between SpiderMonkey, V8 and JSC.

SM, V8 and JSC will treat [\c%] as [\\c%], but ChakraCore will treat it as [\x05], taking the lower 5 bits, as for other control escapes.

For a class like [\c], SM, V8 and JSC will treat it as [\\c], whereas ChakraCore will treat it as [c].

IMO it would be reasonable to standardize on the behavior shared by three of the four engines (SM, V8 and JSC) for both of these cases.
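
To make the two readings concrete, here is a minimal sketch that models each one as the set of characters the class would contain. It is only a model of the behavior described in this comment, not code from any engine; majorityClass and chakraClass are hypothetical names.

```javascript
// Model of the SM/V8/JSC reading: the backslash is a literal character,
// so [\cX] is the set { "\\", "c", X } and [\c] is the set { "\\", "c" }.
function majorityClass(x = "") {
  return new Set(["\\", "c", ...x]);
}

// Model of the ChakraCore reading as described above: [\cX] is the single
// character formed from the lower 5 bits of X, and [\c] is just { "c" }.
function chakraClass(x = "") {
  if (x === "") return new Set(["c"]);
  return new Set([String.fromCharCode(x.charCodeAt(0) & 0x1f)]);
}

// "%" is U+0025, and 0x25 & 0x1f is 0x05.
console.log([...majorityClass("%")]);
console.log([...chakraClass("%")].map((ch) => ch.charCodeAt(0)));
console.log([...majorityClass()], [...chakraClass()]);
```

Under this model the two readings of [\c%] share no characters at all, which is why the raw tests disagree on every input tried for that class.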

-----
Boring raw tests:

littledan@littledan-ThinkPad-T460p:~/v8/v8$ eshost -e '/^[\c%]$/.test("\\")'
#### chakracore
false

#### d8
true

#### jsc
true

#### spidermonkey
true


littledan@littledan-ThinkPad-T460p:~/v8/v8$ eshost -e '/[\c%]/.test("c")'
#### chakracore
false

#### d8
true

#### jsc
true

#### spidermonkey
true

littledan@littledan-ThinkPad-T460p:~/v8/v8$ eshost -e '/[\c%]/.test("\x05")'
#### jsc
false

#### chakracore
true

#### d8
false

#### spidermonkey
false


littledan@littledan-ThinkPad-T460p:~/v8/v8$ eshost -e '/^[\c]$/.test("c")'
#### jsc
true

#### d8
true

#### chakracore
true

#### spidermonkey
true

#### v8debug


littledan@littledan-ThinkPad-T460p:~/v8/v8$ eshost -e '/^[\c]$/.test("\\")'
#### jsc
true

#### d8
true

#### spidermonkey
true

#### chakracore
false
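
For anyone without eshost, the five raw tests above can be rerun in a single engine with a small probe like the following sketch. The expected columns are copied from the results above; the probes table and classifyHost are hypothetical names introduced only for this example.

```javascript
// Each probe: [pattern source, input, result reported above for d8/jsc/spidermonkey,
// result reported above for chakracore].
const probes = [
  ["^[\\c%]$", "\\", true, false],
  ["[\\c%]", "c", true, false],
  ["[\\c%]", "\x05", false, true],
  ["^[\\c]$", "c", true, true],
  ["^[\\c]$", "\\", true, false],
];

// Runs every probe on the host engine and reports which column it agrees with.
function classifyHost() {
  const results = probes.map(([source, input]) => new RegExp(source).test(input));
  const agrees = (column) => probes.every((probe, i) => probe[column] === results[i]);
  if (agrees(2)) return "majority";
  if (agrees(3)) return "chakracore-like";
  return "other";
}

console.log(classifyHost());
```

The "other" result is there because an engine could in principle agree with neither column; none of the four engines tested above does.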

Labels: SpecViolation-OpenQuestion
Just so I am understanding this right:

For the case outside [] all browsers agree on the strange appendix-sanctioned interpretation where \c% matches the same as \\c% would match?

For the case inside [] there is disagreement on what it means with non-Chakra matching the same as [\\c%] and Chakra matching only the single code point 5?
If #5 is correct then I think Chakra has surprising behavior.
Upstream bug to cross-reference: https://github.com/tc39/ecma262/issues/863
Labels: Priority-3
Summary: RegExp: /[\c%]/ is not yet handled by the spec (was: RegExp: /[\c%]/ should throw SyntaxError)
See also

https://github.com/tc39/ecma262/pull/864
https://github.com/tc39/ecma262/issues/863
Status: WontFix (was: Available)
The specification fix here got consensus at TC39; you can see a draft at https://github.com/tc39/ecma262/pull/864 (but this spec patch has to be reworded). Irregexp is working as intended!
