
Issue 900121


Issue metadata

Status: Fixed
Owner:
Closed: Nov 5
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Windows
Pri: 1
Type: Bug-Regression




Regression: Unable to install unsigned Canary Build #72.0.3596.0 (64 Bit Clang) for Windows OS

Reported by aiman.an...@etouch.net, Oct 30

Issue description

Chrome Version: 72.0.3596.0 (Official Build) Revision 0b292963c513fc540a19aef5251a8b28a68f6f6f-refs/branch-heads/3596@{#1} (64-bit).

OS: Windows 7, 8, 10.
What steps will reproduce the problem?
1. Download the latest unsigned Canary build #72.0.3596.0 (64-bit Clang) for Windows.
2. Try to install 'mini_installer.exe' and observe.

Actual Result: Unable to install the unsigned Canary build #72.0.3596.0 (64-bit Clang) for Windows.

Expected Result: The unsigned Canary build #72.0.3596.0 (64-bit Clang) for Windows should install successfully.

This is a regression, broken in M-72; other info will be updated soon:

Good Build: 72.0.3595.0
Bad Build: 72.0.3596.0

Note:
1. This issue is only reproducible with the Windows unsigned Clang (64-bit) build.

Actual Result.mp4 (864 KB)
Expected Result.mp4 (776 KB)
Cc: grt@chromium.org
Components: -Internals>Installer Internals>PlatformIntegration
Unable to repro. Based on the error dialog (chrome.exe in the title), it appears that the problem is with launching chrome.exe, not with the installer.

Could you confirm that you see the same error when you double-click on the new "Google Chrome" desktop shortcut?

0xC0000034 is STATUS_OBJECT_NAME_NOT_FOUND. The last time I saw this, it was a result of some interesting caching in Windows. It would be interesting to see if Chrome launches successfully after a system restart. Could you check this?
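
For reference, a minimal Windows-only Python snippet can confirm that mapping via ntdll's RtlNtStatusToDosError:

import ctypes

# Windows-only check that 0xC0000034 really maps to "file not found":
ntdll = ctypes.WinDLL('ntdll')
status = ctypes.c_uint32(0xC0000034)  # STATUS_OBJECT_NAME_NOT_FOUND
print(ntdll.RtlNtStatusToDosError(status))  # prints 2, ERROR_FILE_NOT_FOUND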
Hi,

Yes, I see the same error when double-clicking the new "Google Chrome" desktop shortcut. (Please refer to the screencast for reference.)

Note:
1. The issue reproduces only on the first launch, right after downloading and installing the build.
2. After restarting the machine, the problem resolves and I can install/launch Chrome successfully without any error.

Thank You!

72.0.3596.0_Behaviour.mp4 (982 KB)
Are you still able to reproduce the problem on any machine? If so, what steps are needed? Did you previously install and then uninstall, for example?
Hi,

After restarting the machine, the issue no longer reproduces on our end, and we are now able to launch Canary.
Status: WontFix (was: Unconfirmed)
I'll close this out for now. Please re-open if you're able to reproduce this consistently. Thanks.
Cc: thakis@chromium.org
Components: -Internals>PlatformIntegration Build
Owner: r...@chromium.org
Status: Assigned (was: WontFix)
I was just able to repro this. +thakis and rnk: this is caused by recent advancements in deterministic builds. We are confusing the Windows loader by using the same timestamp on all builds. I resolved this same error back in r550532 by explicitly updating the last-modified time of files in a particular build step. I believe that this practice of using the same timestamp for all builds is at least one cause of the recent flakiness in the test_installer and mini_installer_tests build steps. Please address ASAP, as this is messing with the trybots and the CQ.

Read on for more details.

I see:

---------------------------
chrome.exe - Application Error
---------------------------
The application was unable to start correctly (0xc0000034). Click OK to close the application. 
---------------------------
OK   
---------------------------

and windbg tells me:

SXS: Unable to open assembly directory under storage root "C:\Program Files (x86)\Chromium\Application\72.0.3602.0\"; Status = 0xc0000034
SXS: RtlGetAssemblyStorageRoot() unable to resolve storage map entry.  Status = 0xc0000034

chrome.exe is 3603 in my case. There is no occurrence of 3602 in chrome.exe itself. I've confirmed that the manifest says 3603, so it's unclear why the loader is (presumably) trying to find chrome_elf.dll in the 3602 directory. I had previously installed a build of 3602, which explains why that particular version was picked, but not why the loader is confused. A reboot makes the problem go away until I uninstall and reinstall other versions.

So there's something in the loader that thinks same-sized files are identical even when they aren't. This could be caused by their having the same timestamp, but why would that be the case? These binaries were built on different machines -- 3602 came from https://chromium-swarm.appspot.com/task?id=40f8c689a8124b10&refresh=10&show_raw=1 whereas 3603 was built just now on my local machine. How could they possibly have been built at exactly the same time by coincidence?
We're not using the same timestamp for all builds. For official builds, we use a different timestamp every day. Every canary should have its own timestamp (as long as we don't do more than one canary per day, which as far as I know we aren't doing).

(for non-official builds, we change the timestamp only once per month.)

See the lengthy comment at the top of https://cs.chromium.org/chromium/src/build/compute_build_timestamp.py?q=file:timestamp.*%5C.py+file:build&sq=package:chromium&dr for details.
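
In rough terms, and as a simplified sketch rather than the actual script, the rounding amounts to something like this (using the LASTCHANGE file's mtime as a stand-in for the commit date the real script reads):

import datetime
import os

def rounded_build_timestamp(lastchange_path, official):
    # Round the commit date down to midnight of that day (official
    # builds) or to the first of that month (unofficial builds).
    mtime = os.path.getmtime(lastchange_path)
    dt = datetime.datetime.fromtimestamp(mtime, tz=datetime.timezone.utc)
    dt = dt.replace(hour=0, minute=0, second=0, microsecond=0)
    if not official:
        dt = dt.replace(day=1)
    return int(dt.timestamp())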

Is the revert of https://chromium-review.googlesource.com/c/chromium/src/+/1317569 speculative? Because that CL only changed the timestamps of files in the .7z file, which shouldn't faze the loader much.

But if you're sure that's the CL to blame, we could change mini_installer to update the mtimes of all files it unpacks after unpacking as a workaround.
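
In sketch form (Python for brevity; mini_installer itself is C++, so the real change would look different), the idea is just:

import os
import time

def refresh_mtimes(install_dir):
    # After extraction, stamp every unpacked file with the current time
    # so that two installs can never share a size+timestamp pair.
    now = time.time()
    for root, _dirs, files in os.walk(install_dir):
        for name in files:
            os.utime(os.path.join(root, name), (now, now))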
The files in the mini_installer from two builds on the same day built at two revs will have the same timestamp, correct? If it happens that builds across a bump in the BUILD # in the VERSION file are built on the same day, they will have the same timestamp. If a given swarming bot runs one of the mini_installer tests from a build before the BUILD # bump, tests on that same bot after the BUILD # bump will fail. You'll see this 0xc0000034 error because the loader sees that chrome.exe has the same size and timestamp, so it assumes its dependencies haven't changed and will try to load chrome_elf.dll from the pre-bump version directory.

I'm not certain that every test failure is the result of this timestamp issue, but I'm certain that this particular bug here is.

Ah, if that revert only changes the time of the .7z file, then it's the wrong change. What is it that causes chrome.exe *in* the .7z file to have a fixed timestamp?

I don't think that the mini_installer should touch all files after it extracts them. This will add I/O to every install/update of Chrome on every client machine.
Doesn't that CL set the timestamp of files within the archive (the new switch is documented as "If set, used as timestamp for all files in the archive.")?
Apologies, many newbie questions follow.

First: Is this bug about official builds or non-official builds?

If about official: Do we run officially-built tests on swarming? Do we do more than one official build per day that we try to install? Either would be news to me (which doesn't mean it doesn't happen, only that my assumptions were wrong).

If about non-official: So this bug is "only" about test failures on swarming, not something users see?


Do we bump VERSION more than once per 24h window? If not, can the scenario you describe happen?

The script currently takes the date of build/util/LASTCHANGE (https://cs.chromium.org/search/?q=build/util/LASTCHANGE&sq=package:chromium&type=cs) and rounds that to one day (official) or one month (unofficial). Since that file changes on every commit, that seemed like a better file to use than VERSION.


> You'll see this 0xc0000034 error because the loader sees that chrome.exe has the same size and timestamp, so it assumes its dependencies haven't changed and will try to load chrome_elf.dll from the pre-bump version directory.

Can you say a few more words about this? How does the loader pick a different directory based on size/timestamp? I understand how them not changing means that it keeps caches around, but how does it know to look in a different directory on cache miss?


> I don't think that the mini_installer should touch all files after it extracts them. This will add I/O to every install/update of Chrome on every client machine.

That's minuscule compared to actually unpacking, right? And we're talking about only a handful (on the order of 10) of files; we're not shipping component builds. Updating 10 timestamps is pretty fast.

> Ah, if that revert only changes the time of the .7z file, then it's the wrong change. What is it that causes chrome.exe *in* the .7z file to have a fixed timestamp?

That's several months old. If this is about official builds, the simplest workaround until I'm back is to tweak https://cs.chromium.org/chromium/src/build/compute_build_timestamp.py?q=file:timestamp.*%5C.py+file:build&sq=package:chromium&dr&l=57 to change build times more often than once per day.
I didn't see your comment 9 until now (I was busy writing comment 10).

On second thought, you're probably right. Let's try landing that revert you queued up and see if it helps, and then I can try to come up with something better when I'm back. (I think making mini_installer.exe update mtimes after extracting is something better, but we can discuss that then :-) ). I'd still appreciate answers to my questions in comment 10 at some point, but no rush.
Another question: What does "latest unsigned Canary Build" mean? How is it different from a regular canary build? Just didn't go through code signing on kokoro? How often do we build these unsigned builds? Do we ever ship them to anywhere?
http://crrev.com/550532 suggests that the loader uses the on-disk mtime, not the mtime stored in the executable. If that's the case, then I buy that the revert https://chromium-review.googlesource.com/c/chromium/src/+/1317569 helps. (It changes the mtime in the 7z, which affects the on-disk mtime, but it doesn't affect the timestamp in the PE/COFF headers of the executables -- but it sounds like the loader cares about the actual file mtime, not the PE/COFF timestamp?)
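
A quick way to check which timestamp varies between two builds, as a sketch assuming a well-formed PE file (e_lfanew lives at offset 0x3C, and the COFF TimeDateStamp sits 8 bytes into the PE header):

import datetime
import os
import struct
import sys

def timestamps(path):
    # The on-disk mtime, which the loader's cache apparently keys on.
    mtime = os.path.getmtime(path)
    # The TimeDateStamp baked into the PE/COFF header at link time.
    with open(path, 'rb') as f:
        f.seek(0x3C)                # e_lfanew: offset of the PE header
        pe_offset, = struct.unpack('<I', f.read(4))
        f.seek(pe_offset + 8)       # COFF header TimeDateStamp field
        stamp, = struct.unpack('<I', f.read(4))
    return mtime, stamp

mtime, stamp = timestamps(sys.argv[1])
utc = datetime.timezone.utc
print('on-disk mtime:   ', datetime.datetime.fromtimestamp(mtime, utc))
print('PE TimeDateStamp:', datetime.datetime.fromtimestamp(stamp, utc))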

I'll shut up now, sorry about all the bug comment spam :-)
Hi Nico. I'll answer to the best of my ability:

I believe you can reproduce the issue described in this bug by the following:
- download the 72.0.3592.2 win64-clang build from the unsigned archive
- install it
- run it
- uninstall it
- download the 72.0.3596.0 win64-clang build from the unsigned archive
- install it
- run it
- kablewie

Both of these builds have a chrome.exe that is 1588224 bytes and was modified on Oct 29 at 6:00:00 (on my machine, according to explorer.exe).

This should not be a problem for signed builds, since our signing step explodes the archive, signs the PE files, then packages everything up again. I imagine therefore (but haven't yet confirmed) that the signed builds are okay.

This same problem exists for Chromium waterfall builds as I described earlier -- any change in BUILD # that doesn't also correspond to a change in either chrome.exe's size or its timestamp will trigger the failure to launch when the mini_installer tests from two builds across the BUILD # bump are run on the same machine without an intervening reboot.

So what's the exact nature of the failure? I learned by experimentation that Windows appears to cache a .exe's dependencies, presumably for faster process startup. This cache seems to be keyed on size+timestamp. When Windows loads chrome.exe, it discovers via its manifest that it should look in the version dir to find dependent modules, chrome_elf.dll being the critical one here. After the initial load, it doesn't appear to actually check the manifest; it jumps straight to loading W.X.Y.Z\chrome_elf.dll. I ran into this when mmeade@ was working on updating these tests so that they'd launch chrome.exe to verify that it actually worked. In that case, it was happening because we manufacture a different-versioned Chrome by editing a mini_installer. Without explicitly updating the timestamps of files, the install of a second version would always trigger this problem.
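
To make the collision concrete, the apparent cache key is exactly the pair that a stat call exposes (a sketch; the paths involved are hypothetical):

import os

def apparent_loader_key(path):
    # The cache appears to be keyed on file size plus last-modified time;
    # two different binaries that agree on both look identical to it.
    st = os.stat(path)
    return (st.st_size, int(st.st_mtime))

# Hypothetical usage: in the repro above, both installs' chrome.exe would
# return the same key (1588224 bytes, Oct 29 06:00:00), so the loader
# reuses the stale pre-bump dependency paths.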
I just looked at mini_installer test failures on one of the bots, and they start in the first build that includes a chrome/VERSION change to bump the BUILD and then disappear a few runs later. This convinces me that the timestamp thing was causing the test flakiness.

Could you make the timestamp somehow dependent on the commit hash or commit position listed in LASTCHANGE so that it's deterministic but distinct for different LASTCHANGE values?
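
One possible shape for that, as a sketch only (assuming the commit position can be parsed from the "@{#N}" suffix in LASTCHANGE):

import re

def distinct_timestamp(day_rounded, lastchange_line):
    # Perturb the day-rounded timestamp by the commit position so that
    # same-day builds at different revisions get deterministic but
    # distinct times; keep the offset under a day to preserve the date.
    match = re.search(r'@\{#(\d+)\}', lastchange_line)
    position = int(match.group(1)) if match else 0
    return day_rounded + (position % 86400)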
Status: Fixed (was: Assigned)
Calling this fixed since the revert is in.
Can the timestamp be made a function of the Chrome version?
