Issue 4545 Implement RegExp lookbehind
Reported by yangguo@chromium.org (Project Member), Nov 10 2015
Status: Assigned
Priority: 2
Type: FeatureRequest



One thing I'm currently confused about is the semantics of back references.

Consider this: /(?<=(a)\1)\w/. If we assume the lookbehind submatch is read forward, this would match "aab" (the lookbehind consumes "aa" and \w matches "b").

However, the example Dart implementation reads backwards, so the regexp would have to be written as /(?<=\1(a))\w/ in order to work. Note the slight difference: the capture group now captures the second "a".

In V8, the parser works from left to right and only accepts backreferences to capture groups it has already seen, which may conflict with the backward read order inside the lookbehind.
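For reference, ES2018 eventually standardized the backward reading order discussed here. The following is a small sketch, assuming a modern engine (e.g. a recent Node.js) where lookbehind is enabled by default; it shows the two spellings from this comment side by side. The variable names are illustrative only.

```javascript
// Assumes an engine with ES2018 lookbehind enabled by default.
// Lookbehind contents are matched right to left, so a backreference
// inside a lookbehind must be written to the LEFT of its group.

// "Forward-style": \1 is tried before (a) has captured anything,
// so it matches the empty string and only a single "a" is required.
const forwardStyle = /(?<=(a)\1)\w/;

// "Backward-style": (a) first captures the character directly before
// the match, then \1 must match the same text further to the left.
const backwardStyle = /(?<=\1(a))\w/;

const m1 = forwardStyle.exec("aab");
console.log(m1[0], m1.index); // a 1  (one preceding "a" already suffices)

const m2 = backwardStyle.exec("aab");
console.log(m2[0], m2.index); // b 2

// With the i flag we can see WHICH "a" the group captured: the one
// adjacent to the match (lowercase), not the leftmost "A".
const m3 = /(?<=\1(a))\w/i.exec("Aab");
console.log(m3[1]); // a
```

In other words, under backward reading the forward-style pattern no longer requires "aa" at all, which is the surprise discussed in the next comment.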
Yes, I think backreferences are one case where the backwards implementation is a little surprising: they turn into forward references :-). However, they are a rather obscure feature inside a rather obscure feature.

It's not quite right that V8 parses left to right though:

> var re = /^\1(a)$/
> re.exec("a")
["a", "a"]

So the \1 is recognized as a backreference even though it comes before its capturing parenthesis.  It matches the empty string because the capture has not found anything yet.
Actually, when V8 encounters a backreference, it binds it to an already parsed capture group. If no capture group corresponds to the backreference index yet, it is represented by a RegExpEmpty node. That's also why the "forward references" don't work out of the box.
Ah, OK.

Some cleverness somewhere has to turn the RegExpEmpty into a decimal escape if the capture group never turns up:
> re=/^\2(a)/
/^\2(a)/
> re.exec("\x02a")
["\x02a", "a"]
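This fallback is what ECMAScript's web-compatibility rules (Annex B) specify for non-unicode patterns: when \2 cannot refer to an existing group, it is read as a legacy octal escape for U+0002. A minimal sketch, assuming a modern engine; the `u`-flag contrast reflects the standard behavior where Annex B does not apply.

```javascript
// Only one capturing group exists, so \2 cannot be a backreference.
// Under Annex B, a non-unicode pattern falls back to treating \2 as a
// legacy octal escape, i.e. the character U+0002.
const legacyMatch = /^\2(a)/.exec("\x02a");
console.log(JSON.stringify(legacyMatch)); // ["\u0002a","a"]

// With the u flag, Annex B does not apply: a decimal escape larger than
// the number of capture groups is an early SyntaxError. The RegExp
// constructor is used so the error surfaces at run time, not parse time.
let unicodeError = null;
try {
  new RegExp("^\\2(a)", "u");
} catch (e) {
  unicodeError = e;
}
console.log(unicodeError instanceof SyntaxError); // true
```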
I was wrong in #3. When there is an unknown back reference, we actually invoke a mini-parser to scan ahead to find the capture group.
I had forgotten about that if I knew it :-)

In Miniexp there is a pass over the constructed AST to determine what is a decimal escape and what is a backreference.
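The thread does not show V8's actual scan-ahead code, so the following is only a hypothetical, simplified sketch of the idea described in the last two comments: count the capture groups in the whole pattern once, then decide whether \N is a backreference or a legacy escape. `countCaptureGroups` and `classifyDecimalEscape` are made-up names, and the sketch ignores details such as multi-digit escapes and the special handling of \8 and \9.

```javascript
// Hypothetical helper (not V8's code): count capturing groups in a pattern
// source with a single left-to-right scan. Escapes and character classes
// are skipped; "(" counts unless it starts a "(?..." construct, except
// named groups "(?<name>...)", which also capture.
function countCaptureGroups(source) {
  let count = 0;
  let inClass = false;
  for (let i = 0; i < source.length; i++) {
    const c = source[i];
    if (c === "\\") {
      i++; // skip the escaped character, e.g. "\(" or "\]"
    } else if (inClass) {
      if (c === "]") inClass = false;
    } else if (c === "[") {
      inClass = true;
    } else if (c === "(") {
      if (source[i + 1] !== "?") {
        count++; // plain capturing group
      } else if (source[i + 2] === "<" &&
                 source[i + 3] !== "=" && source[i + 3] !== "!") {
        count++; // named group, as opposed to lookbehind "(?<=" / "(?<!"
      }
    }
  }
  return count;
}

// Simplified classification for a non-unicode pattern: \n is a
// backreference if at least n groups exist ANYWHERE in the pattern
// (even after the escape), otherwise it is a legacy escape.
function classifyDecimalEscape(source, n) {
  return n <= countCaptureGroups(source) ? "backreference" : "legacy escape";
}

console.log(classifyDecimalEscape("^\\1(a)$", 1)); // backreference
console.log(classifyDecimalEscape("^\\2(a)", 2));  // legacy escape
```

Because the count covers the whole pattern, a forward reference such as the \1 in /^\1(a)$/ is still classified as a backreference, matching the REPL results earlier in the thread.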
Project Member Comment 7 by bugdroid1@chromium.org, Nov 17 2015
The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/37632606bbce1418238b13fd90cb6ef6705871cd

commit 37632606bbce1418238b13fd90cb6ef6705871cd
Author: yangguo <yangguo@chromium.org>
Date: Tue Nov 17 11:14:36 2015

Experimental support for RegExp lookbehind.

R=erikcorry@chromium.org, littledan@chromium.org
BUG=v8:4545
LOG=N

Review URL: https://codereview.chromium.org/1418963009

Cr-Commit-Position: refs/heads/master@{#32029}

[modify] http://crrev.com/37632606bbce1418238b13fd90cb6ef6705871cd/src/ast.cc
[modify] http://crrev.com/37632606bbce1418238b13fd90cb6ef6705871cd/src/ast.h
[modify] http://crrev.com/37632606bbce1418238b13fd90cb6ef6705871cd/src/bootstrapper.cc
[modify] http://crrev.com/37632606bbce1418238b13fd90cb6ef6705871cd/src/flag-definitions.h
[modify] http://crrev.com/37632606bbce1418238b13fd90cb6ef6705871cd/src/parser.cc
[modify] http://crrev.com/37632606bbce1418238b13fd90cb6ef6705871cd/src/parser.h
[modify] http://crrev.com/37632606bbce1418238b13fd90cb6ef6705871cd/src/regexp/arm/regexp-macro-assembler-arm.cc
[modify] http://crrev.com/37632606bbce1418238b13fd90cb6ef6705871cd/src/regexp/arm/regexp-macro-assembler-arm.h
[modify] http://crrev.com/37632606bbce1418238b13fd90cb6ef6705871cd/src/regexp/arm64/regexp-macro-assembler-arm64.cc
[modify] http://crrev.com/37632606bbce1418238b13fd90cb6ef6705871cd/src/regexp/arm64/regexp-macro-assembler-arm64.h
[modify] http://crrev.com/37632606bbce1418238b13fd90cb6ef6705871cd/src/regexp/bytecodes-irregexp.h
[modify] http://crrev.com/37632606bbce1418238b13fd90cb6ef6705871cd/src/regexp/ia32/regexp-macro-assembler-ia32.cc
[modify] http://crrev.com/37632606bbce1418238b13fd90cb6ef6705871cd/src/regexp/ia32/regexp-macro-assembler-ia32.h
[modify] http://crrev.com/37632606bbce1418238b13fd90cb6ef6705871cd/src/regexp/interpreter-irregexp.cc
[modify] http://crrev.com/37632606bbce1418238b13fd90cb6ef6705871cd/src/regexp/jsregexp.cc
[modify] http://crrev.com/37632606bbce1418238b13fd90cb6ef6705871cd/src/regexp/jsregexp.h
[modify] http://crrev.com/37632606bbce1418238b13fd90cb6ef6705871cd/src/regexp/mips/regexp-macro-assembler-mips.cc
[modify] http://crrev.com/37632606bbce1418238b13fd90cb6ef6705871cd/src/regexp/mips/regexp-macro-assembler-mips.h
[modify] http://crrev.com/37632606bbce1418238b13fd90cb6ef6705871cd/src/regexp/mips64/regexp-macro-assembler-mips64.cc
[modify] http://crrev.com/37632606bbce1418238b13fd90cb6ef6705871cd/src/regexp/mips64/regexp-macro-assembler-mips64.h
[modify] http://crrev.com/37632606bbce1418238b13fd90cb6ef6705871cd/src/regexp/regexp-macro-assembler-irregexp.cc
[modify] http://crrev.com/37632606bbce1418238b13fd90cb6ef6705871cd/src/regexp/regexp-macro-assembler-irregexp.h
[modify] http://crrev.com/37632606bbce1418238b13fd90cb6ef6705871cd/src/regexp/regexp-macro-assembler-tracer.cc
[modify] http://crrev.com/37632606bbce1418238b13fd90cb6ef6705871cd/src/regexp/regexp-macro-assembler-tracer.h
[modify] http://crrev.com/37632606bbce1418238b13fd90cb6ef6705871cd/src/regexp/regexp-macro-assembler.h
[modify] http://crrev.com/37632606bbce1418238b13fd90cb6ef6705871cd/src/regexp/x64/regexp-macro-assembler-x64.cc
[modify] http://crrev.com/37632606bbce1418238b13fd90cb6ef6705871cd/src/regexp/x64/regexp-macro-assembler-x64.h
[modify] http://crrev.com/37632606bbce1418238b13fd90cb6ef6705871cd/test/cctest/test-regexp.cc
[add] http://crrev.com/37632606bbce1418238b13fd90cb6ef6705871cd/test/mjsunit/harmony/regexp-lookbehind.js

Project Member Comment 8 by bugdroid1@chromium.org, Nov 17 2015
The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/5b2ae9d9088d64a8060f87c315e066b973e7009e

commit 5b2ae9d9088d64a8060f87c315e066b973e7009e
Author: yangguo <yangguo@chromium.org>
Date: Tue Nov 17 11:55:03 2015

Revert of Experimental support for RegExp lookbehind. (patchset #18 id:340001 of https://codereview.chromium.org/1418963009/ )

Reason for revert:
gc stress breaks due to string_start_minus_one not being set correctly.

Original issue's description:
> Experimental support for RegExp lookbehind.
>
> R=erikcorry@chromium.org, littledan@chromium.org
> BUG=v8:4545
> LOG=N
>
> Committed: https://crrev.com/37632606bbce1418238b13fd90cb6ef6705871cd
> Cr-Commit-Position: refs/heads/master@{#32029}

TBR=littledan@chromium.org,erikcorry@chromium.org,erikcorry@google.com
NOPRESUBMIT=true
NOTREECHECKS=true
NOTRY=true
BUG=v8:4545

Review URL: https://codereview.chromium.org/1451373003

Cr-Commit-Position: refs/heads/master@{#32032}

[modify] http://crrev.com/5b2ae9d9088d64a8060f87c315e066b973e7009e/src/ast.cc
[modify] http://crrev.com/5b2ae9d9088d64a8060f87c315e066b973e7009e/src/ast.h
[modify] http://crrev.com/5b2ae9d9088d64a8060f87c315e066b973e7009e/src/bootstrapper.cc
[modify] http://crrev.com/5b2ae9d9088d64a8060f87c315e066b973e7009e/src/flag-definitions.h
[modify] http://crrev.com/5b2ae9d9088d64a8060f87c315e066b973e7009e/src/parser.cc
[modify] http://crrev.com/5b2ae9d9088d64a8060f87c315e066b973e7009e/src/parser.h
[modify] http://crrev.com/5b2ae9d9088d64a8060f87c315e066b973e7009e/src/regexp/arm/regexp-macro-assembler-arm.cc
[modify] http://crrev.com/5b2ae9d9088d64a8060f87c315e066b973e7009e/src/regexp/arm/regexp-macro-assembler-arm.h
[modify] http://crrev.com/5b2ae9d9088d64a8060f87c315e066b973e7009e/src/regexp/arm64/regexp-macro-assembler-arm64.cc
[modify] http://crrev.com/5b2ae9d9088d64a8060f87c315e066b973e7009e/src/regexp/arm64/regexp-macro-assembler-arm64.h
[modify] http://crrev.com/5b2ae9d9088d64a8060f87c315e066b973e7009e/src/regexp/bytecodes-irregexp.h
[modify] http://crrev.com/5b2ae9d9088d64a8060f87c315e066b973e7009e/src/regexp/ia32/regexp-macro-assembler-ia32.cc
[modify] http://crrev.com/5b2ae9d9088d64a8060f87c315e066b973e7009e/src/regexp/ia32/regexp-macro-assembler-ia32.h
[modify] http://crrev.com/5b2ae9d9088d64a8060f87c315e066b973e7009e/src/regexp/interpreter-irregexp.cc
[modify] http://crrev.com/5b2ae9d9088d64a8060f87c315e066b973e7009e/src/regexp/jsregexp.cc
[modify] http://crrev.com/5b2ae9d9088d64a8060f87c315e066b973e7009e/src/regexp/jsregexp.h
[modify] http://crrev.com/5b2ae9d9088d64a8060f87c315e066b973e7009e/src/regexp/mips/regexp-macro-assembler-mips.cc
[modify] http://crrev.com/5b2ae9d9088d64a8060f87c315e066b973e7009e/src/regexp/mips/regexp-macro-assembler-mips.h
[modify] http://crrev.com/5b2ae9d9088d64a8060f87c315e066b973e7009e/src/regexp/mips64/regexp-macro-assembler-mips64.cc
[modify] http://crrev.com/5b2ae9d9088d64a8060f87c315e066b973e7009e/src/regexp/mips64/regexp-macro-assembler-mips64.h
[modify] http://crrev.com/5b2ae9d9088d64a8060f87c315e066b973e7009e/src/regexp/regexp-macro-assembler-irregexp.cc
[modify] http://crrev.com/5b2ae9d9088d64a8060f87c315e066b973e7009e/src/regexp/regexp-macro-assembler-irregexp.h
[modify] http://crrev.com/5b2ae9d9088d64a8060f87c315e066b973e7009e/src/regexp/regexp-macro-assembler-tracer.cc
[modify] http://crrev.com/5b2ae9d9088d64a8060f87c315e066b973e7009e/src/regexp/regexp-macro-assembler-tracer.h
[modify] http://crrev.com/5b2ae9d9088d64a8060f87c315e066b973e7009e/src/regexp/regexp-macro-assembler.h
[modify] http://crrev.com/5b2ae9d9088d64a8060f87c315e066b973e7009e/src/regexp/x64/regexp-macro-assembler-x64.cc
[modify] http://crrev.com/5b2ae9d9088d64a8060f87c315e066b973e7009e/src/regexp/x64/regexp-macro-assembler-x64.h
[modify] http://crrev.com/5b2ae9d9088d64a8060f87c315e066b973e7009e/test/cctest/test-regexp.cc
[delete] http://crrev.com/2f7d6b46d07c91c86fc48a13c5495c60dbbd9ee6/test/mjsunit/harmony/regexp-lookbehind.js

Project Member Comment 9 by bugdroid1@chromium.org, Nov 17 2015
The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/906903acb558723edf5bf87581c0c37183dc6c46

commit 906903acb558723edf5bf87581c0c37183dc6c46
Author: yangguo <yangguo@chromium.org>
Date: Tue Nov 17 13:33:03 2015

Experimental support for RegExp lookbehind.

R=erikcorry@chromium.org, littledan@chromium.org
BUG=v8:4545
LOG=N

Committed: https://crrev.com/37632606bbce1418238b13fd90cb6ef6705871cd
Cr-Commit-Position: refs/heads/master@{#32029}

Review URL: https://codereview.chromium.org/1418963009

Cr-Commit-Position: refs/heads/master@{#32043}

[modify] http://crrev.com/906903acb558723edf5bf87581c0c37183dc6c46/src/ast.cc
[modify] http://crrev.com/906903acb558723edf5bf87581c0c37183dc6c46/src/ast.h
[modify] http://crrev.com/906903acb558723edf5bf87581c0c37183dc6c46/src/bootstrapper.cc
[modify] http://crrev.com/906903acb558723edf5bf87581c0c37183dc6c46/src/flag-definitions.h
[modify] http://crrev.com/906903acb558723edf5bf87581c0c37183dc6c46/src/parser.cc
[modify] http://crrev.com/906903acb558723edf5bf87581c0c37183dc6c46/src/parser.h
[modify] http://crrev.com/906903acb558723edf5bf87581c0c37183dc6c46/src/regexp/arm/regexp-macro-assembler-arm.cc
[modify] http://crrev.com/906903acb558723edf5bf87581c0c37183dc6c46/src/regexp/arm/regexp-macro-assembler-arm.h
[modify] http://crrev.com/906903acb558723edf5bf87581c0c37183dc6c46/src/regexp/arm64/regexp-macro-assembler-arm64.cc
[modify] http://crrev.com/906903acb558723edf5bf87581c0c37183dc6c46/src/regexp/arm64/regexp-macro-assembler-arm64.h
[modify] http://crrev.com/906903acb558723edf5bf87581c0c37183dc6c46/src/regexp/bytecodes-irregexp.h
[modify] http://crrev.com/906903acb558723edf5bf87581c0c37183dc6c46/src/regexp/ia32/regexp-macro-assembler-ia32.cc
[modify] http://crrev.com/906903acb558723edf5bf87581c0c37183dc6c46/src/regexp/ia32/regexp-macro-assembler-ia32.h
[modify] http://crrev.com/906903acb558723edf5bf87581c0c37183dc6c46/src/regexp/interpreter-irregexp.cc
[modify] http://crrev.com/906903acb558723edf5bf87581c0c37183dc6c46/src/regexp/jsregexp.cc
[modify] http://crrev.com/906903acb558723edf5bf87581c0c37183dc6c46/src/regexp/jsregexp.h
[modify] http://crrev.com/906903acb558723edf5bf87581c0c37183dc6c46/src/regexp/mips/regexp-macro-assembler-mips.cc
[modify] http://crrev.com/906903acb558723edf5bf87581c0c37183dc6c46/src/regexp/mips/regexp-macro-assembler-mips.h
[modify] http://crrev.com/906903acb558723edf5bf87581c0c37183dc6c46/src/regexp/mips64/regexp-macro-assembler-mips64.cc
[modify] http://crrev.com/906903acb558723edf5bf87581c0c37183dc6c46/src/regexp/mips64/regexp-macro-assembler-mips64.h
[modify] http://crrev.com/906903acb558723edf5bf87581c0c37183dc6c46/src/regexp/regexp-macro-assembler-irregexp.cc
[modify] http://crrev.com/906903acb558723edf5bf87581c0c37183dc6c46/src/regexp/regexp-macro-assembler-irregexp.h
[modify] http://crrev.com/906903acb558723edf5bf87581c0c37183dc6c46/src/regexp/regexp-macro-assembler-tracer.cc
[modify] http://crrev.com/906903acb558723edf5bf87581c0c37183dc6c46/src/regexp/regexp-macro-assembler-tracer.h
[modify] http://crrev.com/906903acb558723edf5bf87581c0c37183dc6c46/src/regexp/regexp-macro-assembler.h
[modify] http://crrev.com/906903acb558723edf5bf87581c0c37183dc6c46/src/regexp/x64/regexp-macro-assembler-x64.cc
[modify] http://crrev.com/906903acb558723edf5bf87581c0c37183dc6c46/src/regexp/x64/regexp-macro-assembler-x64.h
[modify] http://crrev.com/906903acb558723edf5bf87581c0c37183dc6c46/test/cctest/test-regexp.cc
[add] http://crrev.com/906903acb558723edf5bf87581c0c37183dc6c46/test/mjsunit/harmony/regexp-lookbehind.js

Note that this is currently hidden behind the flag --harmony-regexp-lookbehind.
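While the feature sits behind a flag, scripts that want to use lookbehind need to check whether the running engine accepts the syntax. A minimal sketch; `supportsLookbehind` is a hypothetical helper, and the check relies only on the RegExp constructor throwing a SyntaxError for unsupported syntax.

```javascript
// Returns true if the engine accepts lookbehind syntax. In builds where the
// feature is disabled (e.g. V8 without --harmony-regexp-lookbehind at the
// time of this thread), parsing the pattern throws a SyntaxError.
// The RegExp constructor is used instead of a literal, because an
// unsupported literal would make the whole script fail to parse.
function supportsLookbehind() {
  try {
    new RegExp("(?<=a)b");
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e; // unexpected error: do not hide it
  }
}

console.log(supportsLookbehind());
```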
Project Member Comment 11 by bugdroid1@chromium.org, Nov 17 2015
The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/e852c49eeae9b2140978d68e2cdd7e237feb6c5b

commit e852c49eeae9b2140978d68e2cdd7e237feb6c5b
Author: mbrandy <mbrandy@us.ibm.com>
Date: Tue Nov 17 19:40:48 2015

PPC: Experimental support for RegExp lookbehind.

Port 906903acb558723edf5bf87581c0c37183dc6c46

R=yangguo@chromium.org, joransiu@ca.ibm.com, jyan@ca.ibm.com, michael_dawson@ca.ibm.com
BUG=v8:4545
LOG=N

Review URL: https://codereview.chromium.org/1454783002

Cr-Commit-Position: refs/heads/master@{#32056}

[modify] http://crrev.com/e852c49eeae9b2140978d68e2cdd7e237feb6c5b/src/regexp/ppc/regexp-macro-assembler-ppc.cc
[modify] http://crrev.com/e852c49eeae9b2140978d68e2cdd7e237feb6c5b/src/regexp/ppc/regexp-macro-assembler-ppc.h

This is pretty much done, but still hidden behind a flag. Do we wait till the proposal advances or do we just stage it?
Can we switch it on and send a Canary out with it, then switch it off while we decide what to do?

Do we know what Mozilla are doing?
Not sure what the status of their implementation is. There's not a lot happening: https://bugzilla.mozilla.org/show_bug.cgi?id=1225665

I'm not sure what good a Canary would do. The code path won't be tested if nobody uses the lookbehind syntax.
I'd argue we should leave this flipped off until the proposal reaches at least Stage 2 (or maybe Stage 3) at TC39. We don't know what kinds of subtle semantic changes might come in the future, and we don't want to create a hazard where changing semantics we have already shipped could break websites.
Project Member Comment 16 by 76821325...@developer.gserviceaccount.com, Dec 14 2015
The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/02633ddec1d410da5a5d8f7ea0fcfa572ad8820c

commit 02633ddec1d410da5a5d8f7ea0fcfa572ad8820c
Author: yangguo <yangguo@chromium.org>
Date: Mon Dec 14 11:28:58 2015

[harmony] stage regexp lookbehind assertions.

R=littledan@chromium.org, rossberg@chromium.org
BUG=v8:4545
LOG=Y

Review URL: https://codereview.chromium.org/1512253003

Cr-Commit-Position: refs/heads/master@{#32830}

[modify] http://crrev.com/02633ddec1d410da5a5d8f7ea0fcfa572ad8820c/src/flag-definitions.h

Project Member Comment 17 by 76821325...@developer.gserviceaccount.com, Dec 16 2015
The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/0e8f233cc44e9ebadf562886db77ed6aa4245c2e

commit 0e8f233cc44e9ebadf562886db77ed6aa4245c2e
Author: yangguo <yangguo@chromium.org>
Date: Wed Dec 16 10:52:49 2015

[harmony] unstage regexp lookbehind assertions.

R=hablich@chromium.org
BUG=v8:4545
LOG=Y

Review URL: https://codereview.chromium.org/1524233003

Cr-Commit-Position: refs/heads/master@{#32889}

[modify] http://crrev.com/0e8f233cc44e9ebadf562886db77ed6aa4245c2e/src/flag-definitions.h

Project Member Comment 18 by 76821325...@developer.gserviceaccount.com, Dec 21 2015
The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/cd3054bfa6f413550d6149b6322f37255dc1b110

commit cd3054bfa6f413550d6149b6322f37255dc1b110
Author: yangguo <yangguo@chromium.org>
Date: Mon Dec 21 13:54:04 2015

[harmony] stage regexp lookbehind assertion.

R=hablich@chromium.org
BUG=v8:4545
LOG=N

Review URL: https://codereview.chromium.org/1537273004

Cr-Commit-Position: refs/heads/master@{#32987}

[modify] http://crrev.com/cd3054bfa6f413550d6149b6322f37255dc1b110/src/flag-definitions.h

Labels: Priority-2