New issue
Advanced search Search tips

Issue 856635 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Sep 10
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 3
Type: Bug

Blocking:
issue 598772



Sign in to add a comment

ToTWinCFI bot unhappy

Project Member Reported by thakis@chromium.org, Jun 26 2018

Issue description

Comment 1 by p...@chromium.org, Jun 26 2018

Something weird is going on with ToTWinThinLTO64, too. Looks like the current build:
https://ci.chromium.org/buildbot/chromium.clang/ToTWinThinLTO64/1428

has been running for 12 *days*  (presumably one of the link processes has hung).

I checked on my Windows machine a few days ago which was running builds in a loop with r334630 applied and noticed that it had hung as well. I didn't see the "Can't get a temporary file" error, though.

Comment 2 by thakis@chromium.org, Jun 26 2018

Oh, I hadn't noticed the 64-bit bot was in that state too. I had filed bug 856166 for restarting the 32-bit one since it had been hanging for 12 days too. It hangs less long now that it's been rebooted, so maybe that days-long hang was some since-fixed thing.
The "Can't get a temporary file" error goes together with a "Permission denied" error. One thing that might cause this is trying to open a file that is in the process of being deleted (https://docs.microsoft.com/en-us/windows/desktop/api/fileapi/nf-fileapi-createfilea says you will get ERROR_ACCESS_DENIED in such cases).

The code in llvm/lib/Support/Path.cpp retries on errc::file_exists, but errc::permission_denied is passed right through.

We may relatively easily end up in this situation. The model we use for creating the temporary file name only uses 6 hexadecimal digits, which means we have a greater than 50% chance of generating a duplicate name after 5000 names or so. We easily generate that many in a single build.

Just generating a duplicate temporary name is not enough, of course. We also have to attempt to create the second file after the first one has been scheduled for deletion, but before that deletion has actually completed. I suspect this is a matter of putting enough load on the filesystem.
Blocking: 598772
Determined locally that the permission denied error occurs as soon as an attempt is made to create a temporary file with a name that was already created in the same run.

If all file names that fit the model already exist before the program is run, createUniqueEntity will loop forever.
Owner: inglorion@chromium.org
I'll send one or more patches to improve this.
> If all file names that fit the model already exist before the program is run, createUniqueEntity will loop forever.

Is that really what's happening? That would mean that over 16 million temporary files must already exist.

I wonder whether the issue is related to our use of the Windows crypto APIs in a slightly unusual way (by creating and tearing down a context for every byte): http://llvm-cs.pcc.me.uk/lib/Support/Windows/Process.inc#451

Maybe that's enough to make it more deterministic.
> > If all file names that fit the model already exist before the program is run, createUniqueEntity will loop forever.

> Is that really what's happening?

I don't know if that is what the bot was doing for 12 hours, but it's something I observed locally that seems worth fixing.
So you saw the 16 million files locally? Interesting, I wonder how that happened.
Not 16 million; I tested with a smaller pattern. The problems can be reproduced with just a single %.
This should be fixed by Clang rev 338745.
Actually, I've been looking at this as just the "permission denied" problem, but we also had the 12-hour and 12-day builds problem, which my change does not address. I'll look into that next.
Thanks @pcc for reminding me about the long/stuck builds.
Status: Fixed (was: Untriaged)
I haven't seen any long/stuck builds since at least the end of August. The compile step has been succeeding. Tests are failing, but many of those are flaky tests that also affect the non-LTO bots, and others have been addressed on separate bugs. I think it's time to close this bug.

Sign in to add a comment