New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 739243 link

Starred by 7 users

Issue metadata

Status: Fixed
Owner:
Closed: Jan 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: All
Pri: 1
Type: Bug



Sign in to add a comment

simple_test.mht is being checked out with local changes (due to CRLF line endings)

Project Member Reported by mgiuca@chromium.org, Jul 5 2017

Issue description

Something is causing git to (seemingly at random) decide to change the line endings of simple_test.mht and leave it as a local uncommitted change, when a git checkout jumps backwards over r483988.

Trying to do any subsequent git operations (such as check out a branch or rebase) results in:
error: Your local changes to the following files would be overwritten by checkout:
	third_party/WebKit/Source/web/tests/data/mhtml/simple_test.mht
Please commit your changes or stash them before you switch branches.
Aborting

This is causing sporadic patch failures on bots and failures using tools that repeatedly check out and rebase across the period in question (e.g., git rebase-update). Can be worked around by running git reset --hard, but the problem keeps recurring.

I filed this as https://bugs.chromium.org/p/gerrit/issues/detail?id=6641 on Gerrit but clearly this is not a Gerrit issue any more.

r483988 (d438b5025423053aa58b053c1d3edf4ea58c89a3) (joelhockey@) renames third_party/WebKit/Source/web/tests/data/mhtml/shadow.mht to third_party/WebKit/Source/core/testing/data/mhtml/shadow.mht without changing the file contents at all (blob hash is cf0d82f15e30). That file has DOS / CRLF line endings. It also deletes third_party/WebKit/Source/web/tests/data/mhtml/.gitattributes, which formerly instructed git to change Unix line endings to DOS upon checkout.

When I go from a *new* checkout (after r484160) to an *old* checkout (before r483988), there is an *apparently random chance* that Git will leave the local version of third_party/WebKit/Source/web/tests/data/mhtml/simple_test.mht on f0afef3828d0 (with Unix line endings), which shows up in "git diff" as follows:

$ git diff
diff --git a/third_party/WebKit/Source/web/tests/data/mhtml/simple_test.mht b/third_party/WebKit/Source/web/tests/data/mhtml/simple_test.mht
index cf0d82f15e30..f0afef3828d0 100644
--- a/third_party/WebKit/Source/web/tests/data/mhtml/simple_test.mht
+++ b/third_party/WebKit/Source/web/tests/data/mhtml/simple_test.mht
[full diff follows]

I believe this is a race condition in git due to the fact that r483988 both deletes the .gitattributes file and moves a file with CRLF line endings. I can't explain it but it is definitely a bug in git. It's possible that r484160 is a red herring that neither caused any additional problems, nor fixed the problem.

r484160 (2f82d84f2066dc1323b03b98e97bc34f89804bc6) (jbroman@) attempted to fix the problem by converting the file to Unix line endings, but evidently didn't. That's unsurprising, because the uncommitted local change is present when you have an *old* revision checked out, not a new one (so it doesn't matter what the line endings are of the new version of the file).

There isn't much we can do, I think. But here's a write-up of the problem anyway.
 
I am trying to replicate this in a fresh repo but getting even weirder results:

(From a fresh directory)
git init
mkdir dir1 dir2
echo -en "abc\r\ndef\r\nghi\r\n" > dir1/file.txt
git add dir1/file.txt 
git commit -m "Added file.txt to dir1. Has CRLF baked in."
echo -e "* text=auto\n*.txt text eol=crlf" > dir1/.gitattributes
cp dir1/.gitattributes dir2
git add dir{1,2}/.gitattributes 
git commit -m "Added .gitattributes to both directories."
git rm dir1/.gitattributes 
git mv dir1/file.txt dir2/
git commit -m "Moved file.txt to dir2, and deleted dir1 .gitattributes."

Now this is equivalent to Chrome's r483988, but *much weirder*. Now git is in a super weird state where *no matter what I do*, dir2/file.txt is "modified" (even though it has DOS line endings, git thinks it should have Unix line endings and won't let me do anything until I commit it). git reset --hard has no effect; there are still local changes. OK, so I'll commit it then.

git commit -am "Made dir2/file.txt have Unix line endings inside the repo."

Now this is equivalent to Chrome's r484160 (after the line endings have been fixed in the repo).

git checkout HEAD^^

Now this is equivalent to pre-r483988 (i.e., the way it's been for years), but *much weirder*. We are back to a state where there's a modified file and git won't let me move until I commit the changes. I was hoping to reproduce the very obscure bug found above but I've gotten my test repo into an inexplicably weird state. If Chrome had gotten into this state, nobody would've been able to do anything until for the past few years until it was fixed. I've got no idea how my small repo is exhibiting this behaviour (I've set up the attributes exactly the same as Chrome as far as I can tell).

TL;DR: It's a really bad idea to have a revision control system that pretends files are different to what they actually are.
Status: WontFix (was: ExternalDependency)
OK, I am done with this. I've spent hours on it and have a few theories and a few partial solutions but my testing has shown that these issues can still trigger. I came up with a complicated plan but since I can't be 100% my plan won't trigger the same issues again, I'm going to scrap it and just send an email to chromium-dev to prepare people for pain.

Basically the repo before r484160 was in a dodgy state (having a .gitattributes file with crlf ALONGSIDE a file that has CRLFs in it). Syncing over that point has a chance to cause unmodified changes. Any changes I make to try and mitigate this require recreating some of that dodgy state which has a chance to cause further changes.

WontFix.

=== BELOW THIS LINE IS MY ORIGINAL PLAN WHICH I AM NOT GOING TO DO ===

I tried restoring just web/tests/data/mhtml/.gitattributes but that didn't seem to have an effect. I am pretty sure that if I restore the old .gitattributes *and* simple_test.mht (in its original state) then that will not cause any trouble because it means both of these files will not have changed between r483987 and whenever I land my CL.

But then deleting those two files again will likely cause the same issue.

So here's my plan:

1. Land a CL that restores web/tests/data/mhtml/{.gitattributes,simple_test.mht} from r483987 (i.e., with CRLF line endings).
2. A week later (after the majority of people have synced past that), change web/tests/data/mhtml/simple_test.mht to Unix line endings, to put the repo into a sane state.
3. A week after that, delete both of these files. It should finally be safe to delete them since there is no CRLF mismatch.

It turns out this doesn't work... Step 2 is flawed because syncing *backwards* in time over Step 2 will recreate a dodgy state with a chance to leave unmodified files.

Please never ever use .gitattributes to change line endings. It is a terrible misfeature that has a million edge cases and there is a very simple solution which is to simply check in a file with CRLF line endings into the repo.
Cc: iannucci@chromium.org estaab@chromium.org aga...@chromium.org
Status: Available (was: WontFix)
Re-opening; I want folks in infra to look at this.
Labels: Infra-Troopers
Cc: ehmaldonado@chromium.org
 Issue 739372  has been merged into this issue.
Labels: -Infra-Troopers -Pri-2 Pri-1

Comment 7 by pdr@chromium.org, Jul 5 2017

About two weeks ago I had an issue with this file too (filed https://crbug.com736613#c3). I had a local modification and could not undo it. I ended up fixing it by deleting the file and then undoing my local change using git.

I talked with agable about this and he mentioned that we had tried turning off auto crlf a while back but couldn't at the time. I wonder if we could revisit the decision to have different line endings in the repo? For example, it would be possible to rewrite this test using a server that emits the special endings.
 Issue 738908  has been merged into this issue.
"Simple" solution, as pdr suggests:
1) Fix all our tests to not require CRLF line endings, then make sure every file in the repo only uses LF.

"Complex" solution, all on the infra side:
1) Remove core.autocrlf from all developer machines (this is hard, many people probably have it set in their ~/.gitconfig, and use it for non-chromium projects).
2) Commit all files to the repository with the line endings they're supposed to have: LF in nearly all cases, CRLF for a few weird windows-specific files.
3) Add a PRESUBMIT which looks to see if you're uploading a diff which includes changing line endings, and warning that that's probably not what you want to do.
Can we solve this by requiring core.autocrlf=false in src.git (and another other repo that might be affected)?
No, because then we get the opposite problem: people who use IDEs on Windows accidentally committing CRLF line endings to normal source files.
Hm. Okay, well, you probably need your 2) and 3) then. Is "force core.autocrlf=false" not the right answer to "remove core.autocrlf from all developer machines" then (i.e., is that the same thing)? 
Ah, I see what you mean. Yes, setting "core.autocrlf false" would be a solution to (1), but I wouldn't be willing to do that without (2) and (3) at the same time.


And I'd prefer the other solution anyway.
Owner: qyears...@chromium.org
I'm not sure that "make sure every file in the repo only uses LF" is viable, but it is something we could at least look into.

qyearsley@, I think Blink is mostly likely prone to this sort of problem. Can you investigate whether we can enforce LF-only for the w3c imports and other layout tests?
Owner: aga...@chromium.org
Status: Assigned (was: Available)
Actually, let's make the "make sure every file in the repo only uses LF" thing a separate task, rather than overloading this bug.

@agable - can you explain what is happening such that we're ending up with what look like uncommitted changes that we can't undo? I'm still not following that part, and it seems like your comments in #9 are trying to prevent something weird from happening rather than fixing the weirdness. Is there actually a bug or a race in git here? Or something else in the core service that isn't working correctly? Or is git somehow behaving "correctly" here such that there's nothing we can fix and all we can do is prevent the weirdness?
#15 There is definitely a bug / race in git because I am seeing it randomly (I ran dozens if not hundreds of manual trials yesterday on git while diagnosing this issue).

It might be worth looking at this as an upstream bug, but it's very complex and hard to even reproduce let alone track down a culprit (see #1). I think avoiding automatic line endings changes is a better approach than trying to fix git.
mgiuca@, thanks. I have no problem trying to *also* figure out better mechanisms for consistently handling line endings, but I do think we can and should also try to get git fixed if need be.
So can anyone on this thread explain the discrepancy between the Chromium repo and a fresh one, as described in #1?

(Specifically that if I set up a fresh repo exactly like that directory in Chromium *before* joelhockey's patch, it immediately shows the CRLF version of the file to have uncommitted changes, whereas in Chromium it's been in that state without annoying anyone for ages.)
Pre-r483988, there was a .gitattributes file with eol=crlf in the directory the file moved from, so \r\n wasn't in the git store (i.e. it was normalized), I believe.
#19, no, the bug was caused by the fact that there was *both* a .gitattributes with eol=crlf AND a file with \r\n in the git store. As evidenced by:

# Before joelhockey's change:
$ git ls-tree d438b5025423053aa58b053c1d3edf4ea58c89a3^ -- third_party/WebKit/Source/web/tests/data/mhtml/simple_test.mht
100644 blob cf0d82f15e306b2cd6759e092d5703683f6df1bd	third_party/WebKit/Source/web/tests/data/mhtml/simple_test.mht
# After joelhocket's change:
$ git ls-tree d438b5025423053aa58b053c1d3edf4ea58c89a3 -- third_party/WebKit/Source/core/testing/data/mhtml/simple_test.mht
100644 blob cf0d82f15e306b2cd6759e092d5703683f6df1bd	third_party/WebKit/Source/core/testing/data/mhtml/simple_test.mht
# After jbroman's change:
$ git ls-tree 2f82d84f2066dc1323b03b98e97bc34f89804bc6 -- third_party/WebKit/Source/core/testing/data/mhtml/simple_test.mht
100644 blob f0afef3828d090806408eee335b753f17c8361cf	third_party/WebKit/Source/core/testing/data/mhtml/simple_test.mht

cf0d82f1 is the blob with \r\n while f0afef38 is the blob with \n line endings. I believe the entire issue stems from the fact that *before* joelhockey's change, this file had CRLF line endings AND a .gitattributes intended to create a CRLF view on a LF file.

Comment 21 by wyatta@google.com, Jul 6 2017

 Issue gerrit:6654  has been merged into this issue.
Owner: dpranke@chromium.org
There are still 11 .gitattributes files in the repo. They all seem to exist solely to specify that certain html, svg, and mht files should not have their line endings normalized when checked in to the repo. They all consist of lines like "path/to/file -crlf". Confusingly, "negative crlf" means "don't normalize line endings, therefore letting this file retain crlf, unless it doesn't have crlf already, in which case who cares".

Note that this is *different* from what the .gitattributes file in this particular case was doing. It was setting "eol=crlf", which explicitly replaces \lf with \cr\lf upon checkout.

I think we should do the following:
1) Remove any remaining instances of "eol=crlf", because it causes problems in linux and in files with mixed line-endings (like MHTML files can have)
2) Split all of our .gitattributes files such that none of them have any slashes in the filenames they specify -- this way the .gitattributes files will be adjacent to the files they modify. It will result in many more files, yes, but hopefully also fewer mistakes where a file is moved/renamed but the corresponding gitattributes aren't updated as well.

Dirk, WDYT?
Owner: aga...@chromium.org
Maybe we should aim for removing all of the gitattribute files and just fixing the files (and associated tests) instead?

Comment 24 by sdy@chromium.org, Jul 11 2017

I dealt with something like this a while ago. Does the commit message in r425694 help at all?
#24 It seems like that CL put those files into the correct state (being checked in as LF but checked out as CRLF). That's pretty much the same as jbroman's r484160.

The problem though isn't the state the repo was left in afterwards, but the transition over that point (when all the bots and developers sync). I think you didn't run into this issue because it only happened due to the file being moved while having CRLF endings.

Comment 26 by sdy@chromium.org, Jul 11 2017

#25 Ah, OK. So, the only thing I think it (might?) add is why `git reset --hard` works: it just blows away the index.
Status: Fixed (was: Assigned)
.gitattributes files in the repo have been normalized and now have a sane set of settings.

Sign in to add a comment