ToTWinCFI bot unhappy
Issue description
Last 4 builds:
Had two failures like so: https://logs.chromium.org/v/?s=chromium%2Fbb%2Fchromium.clang%2FToTWinCFI%2F1283%2F%2B%2Frecipes%2Fsteps%2Fcompile%2F0%2Fstdout
LLVM ERROR: ThinLTO: Can't get a temporary file
Had two build timeouts after 12h.
Disk isn't very full: https://viceroy.corp.google.com/chrome_infra/Machines/per_machine?hostname=win109-c1&duration=1d&refresh=-1&utc_end=1529962212.873&utm_source=shortn&utm_medium=inorganic&utm_campaign=shortn#_VG_PDeuFRL9
,
Jun 26 2018
Oh, I hadn't noticed the 64-bit bot was in that state too. I had filed bug 856166 for restarting the 32-bit one, since it had been hanging for 12 days too. Since the reboot its hangs have been shorter, so maybe the days-long hang was caused by something that has since been fixed.
,
Jul 28
The "Can't get a temporary file" error goes together with a "Permission denied" error. One thing that might cause this is trying to open a file that is in the process of being deleted (https://docs.microsoft.com/en-us/windows/desktop/api/fileapi/nf-fileapi-createfilea says you will get ERROR_ACCESS_DENIED in such cases). The code in llvm/lib/Support/Path.cpp retries on errc::file_exists, but errc::permission_denied is passed right through.

We may relatively easily end up in this situation. The model we use for creating the temporary file name only uses 6 hexadecimal digits (16^6, about 16.8 million possible names), which means we have a greater than 50% chance of generating a duplicate name after 5000 names or so. We easily generate that many in a single build.

Just generating a duplicate temporary name is not enough, of course. We also have to attempt to create the second file after the first one has been scheduled for deletion, but before that deletion has actually completed. I suspect this is a matter of putting enough load on the filesystem.
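The 50% figure is a birthday-problem estimate and can be checked with a short calculation. Below is a minimal Python sketch, assuming the names are drawn uniformly and independently from the 16^6 possibilities; the function name is made up for illustration and is not part of LLVM.

```python
import math


def collision_probability(num_names, num_draws):
    """Chance that at least two of num_draws uniform random picks coincide."""
    if num_draws > num_names:
        return 1.0
    # Sum logs of the per-draw "no collision so far" factors for stability.
    log_no_collision = sum(
        math.log1p(-i / num_names) for i in range(num_draws)
    )
    return 1.0 - math.exp(log_no_collision)


# Six hexadecimal digits give 16**6 possible temporary names.
NUM_NAMES = 16 ** 6

if __name__ == "__main__":
    for draws in (1000, 4000, 5000, 10000):
        print(draws, collision_probability(NUM_NAMES, draws))
```

Under that uniformity assumption the probability crosses one half somewhere between 4000 and 5000 names, which matches the estimate above.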
,
Jul 31
,
Jul 31
Determined locally that the permission denied error occurs as soon as an attempt is made to create a temporary file with a name that was already created in the same run. If all file names that fit the model already exist before the program is run, createUniqueEntity will loop forever.
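Both observations point at the retry loop. The following Python sketch shows one possible shape for a more robust loop: it treats "permission denied" like a name collision and gives up after a fixed number of attempts. It is an illustration only, not the LLVM code or the eventual patch; the function names, the `tmp-%.o` model, and the limit of 128 attempts are all assumptions made for the example.

```python
import os
import random
import tempfile

HEX_DIGITS = "0123456789abcdef"
MAX_ATTEMPTS = 128  # arbitrary bound chosen for this sketch


def fill_model(model, rng):
    """Replace each '%' in the model with a random hexadecimal digit."""
    return "".join(rng.choice(HEX_DIGITS) if ch == "%" else ch for ch in model)


def create_unique_file(directory, model, max_attempts=MAX_ATTEMPTS, rng=None):
    """Create a new file whose name fits the model; return (fd, path).

    Name collisions and permission-denied errors are both retried, but only
    up to max_attempts times, so an exhausted model raises instead of looping.
    """
    rng = rng or random.Random()
    last_error = None
    for _ in range(max_attempts):
        path = os.path.join(directory, fill_model(model, rng))
        try:
            # O_EXCL makes creation fail if the name is already taken.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
        except (FileExistsError, PermissionError) as err:
            last_error = err
            continue
        return fd, path
    raise OSError(
        f"no free name for {model!r} after {max_attempts} attempts"
    ) from last_error


def exhausted_model_demo():
    """Pre-create all 16 names for a single-% model, then try once more."""
    with tempfile.TemporaryDirectory() as directory:
        for digit in HEX_DIGITS:
            open(os.path.join(directory, f"tmp-{digit}.o"), "w").close()
        try:
            fd, _ = create_unique_file(directory, "tmp-%.o")
        except OSError:
            return True  # gave up after the bounded number of attempts
        os.close(fd)
        return False
```

Retrying on permission denied is only safe because the loop is bounded: a directory that really is unwritable then fails after a fixed number of attempts instead of spinning.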
,
Jul 31
I'll send one or more patches to improve this.
,
Jul 31
> If all file names that fit the model already exist before the program is run, createUniqueEntity will loop forever.

Is that really what's happening? That would mean that over 16 million temporary files must already exist.

I wonder whether the issue is related to our use of the Windows crypto APIs in a slightly unusual way (by creating and tearing down a context for every byte): http://llvm-cs.pcc.me.uk/lib/Support/Windows/Process.inc#451 Maybe that's enough to make it more deterministic.
,
Aug 1
> > If all file names that fit the model already exist before the program is run, createUniqueEntity will loop forever.
> Is that really what's happening?

I don't know if that is what the bot was doing for 12 hours, but it's something I observed locally that seems worth fixing.
,
Aug 1
So you saw the 16 million files locally? Interesting, I wonder how that happened.
,
Aug 1
Not 16 million; I tested with a smaller pattern. The problems can be reproduced with just a single %.
,
Aug 2
This should be fixed by Clang rev 338745.
,
Aug 2
Actually, I've been looking at this as just the "permission denied" problem, but we also had the 12-hour and 12-day builds problem, which my change does not address. I'll look into that next.
,
Aug 2
Thanks @pcc for reminding me about the long/stuck builds.
,
Sep 10
I haven't seen any long/stuck builds since at least the end of August. The compile step has been succeeding. Tests are failing, but many of those are flaky tests that also affect the non-LTO bots, and others have been addressed on separate bugs. I think it's time to close this bug.
Comment 1 by p...@chromium.org
, Jun 26 2018