New issue
Advanced search Search tips

Issue 907488 link

Starred by 1 user

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug

Blocking:
issue 899438



Sign in to add a comment

isolated files contain the name of the build directory

Project Member Reported by thakis@chromium.org, Nov 21

Issue description

(spun off from https://bugs.chromium.org/p/chromium/issues/detail?id=899438#c12 - c15):

turns out the .isolated files themselves are ironically not build-directory independent.

accessibility_unittests.isolated                                                                                                                                 : DIFFERENT (unexpected): 
  - files:
    - out\Release\VkICD_mock_icd.dll.pdb:  {u'h': u'9eb10e1b4fe15cbcdf3019324bec1b89a62578f7', u's': 12070912, u'm': 438} != None
    - out\Release.2\VkICD_mock_icd.dll.pdb:  None != {u'h': u'9eb10e1b4fe15cbcdf3019324bec1b89a62578f7', u's': 12070912, u'm': 438}
  - relative_cwd:  out\Release != out\Release.2



I suppose we could replace the build dir with some fixed string at isolate generation time? i.e. s:out/whatever:out/fixedbuilddir: ...somewhere (swarming code itself? mb?)


I know that at least the telemetry tests need the build dir to be called "Release" in release builds (https://bugs.chromium.org/p/chromium/issues/detail?id=831145#c26), but that's confusing and a bug we need to fix anyhow. So making isolated files builddir-independent would be a good motivation for fixing things like this.
 
Cc: mar...@chromium.org
Components: Infra>Platform>Swarming
Labels: -OS-Mac
Project Member

Comment 2 by bugdroid1@chromium.org, Nov 21

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/dda84a41a0fde29418aef182419a471ba6c3c7b6

commit dda84a41a0fde29418aef182419a471ba6c3c7b6
Author: Nico Weber <thakis@chromium.org>
Date: Wed Nov 21 14:59:15 2018

compare_build_artifacts.py: Add workaround for .isolated files containing the name of the build dir.

Bug: 899438,907488
Change-Id: Ic958d09e1fd4fc8a34f4f33fc9bfb9e3bc5d6a54
Reviewed-on: https://chromium-review.googlesource.com/c/1346670
Commit-Queue: Nico Weber <thakis@chromium.org>
Reviewed-by: Hans Wennborg <hans@chromium.org>
Cr-Commit-Position: refs/heads/master@{#610051}
[modify] https://crrev.com/dda84a41a0fde29418aef182419a471ba6c3c7b6/tools/determinism/compare_build_artifacts.py

I feel the current behavior is:
- clear out
- build in out\Release
- build in out\Release.2

I thought the script was initially doing:
- clear out
- build in out\Release
- rename out\Release to out\Release.2
- build in out\Release

The later would fix this problem.
I'm changing the behavior to the current behavior in issue 899438.

The original way of doing things didn't verify that build artifacts are independent of the name of the build dir (which is super important for goma cacheability, and it's kind of important for swarming cacheability).
Project Member

Comment 5 by bugdroid1@chromium.org, Nov 22

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build/+/00e5353103a6dca9622e819e48138152fded70fc

commit 00e5353103a6dca9622e819e48138152fded70fc
Author: Nico Weber <thakis@chromium.org>
Date: Thu Nov 22 14:33:14 2018

Deterministic bots: Stop running api.isolate.isolate_tests().

Now that we're on gn, mb writes the .isolate file which lists all files in it,
and isolate_tests() then takes that .isolate file to create a .isolated file
containing hashes of all binaries, and names of hidden .tar archives that
have content-addressed names with all the contents of these files.

Creating the .isolated files is pretty slow (all build outputs are be
zipped up), the .isolated files contain the name of the build dir which
means it must be filtered out, and since the tar file names are content-addressed
those would have to be filtered out at compare time too, and the .isolated
files don't buy us anything we don't already get from the .isolate files
(compare_build_artifacts.py gets its file list from the .isolate files, not
from the .isolated files).

So stop running this step. It will make the bots cycle faster, and until
.isolated files are builddir-name independent it allows removing some
workarounds from compare_build_artifacts.py.

Bug: 907488,899438, 876915 
Change-Id: I416d5966755a1d199842bcdd94b80aed156b5594
Reviewed-on: https://chromium-review.googlesource.com/c/1347346
Commit-Queue: Nico Weber <thakis@chromium.org>
Reviewed-by: Marc-Antoine Ruel <maruel@chromium.org>
Reviewed-by: Hans Wennborg <hans@chromium.org>

[modify] https://crrev.com/00e5353103a6dca9622e819e48138152fded70fc/scripts/slave/recipes/swarming/deterministic_build.expected/full_chromium_swarm_Windows_deterministic_fail.json
[modify] https://crrev.com/00e5353103a6dca9622e819e48138152fded70fc/scripts/slave/recipes/swarming/deterministic_build.expected/full_chromium_swarm_Mac_deterministic__dbg_.json
[modify] https://crrev.com/00e5353103a6dca9622e819e48138152fded70fc/scripts/slave/recipes/swarming/deterministic_build.expected/full_chromium_swarm_Deterministic_Linux_fail.json
[modify] https://crrev.com/00e5353103a6dca9622e819e48138152fded70fc/scripts/slave/recipes/swarming/deterministic_build.expected/full_chromium_swarm_Windows_Clang_deterministic.json
[modify] https://crrev.com/00e5353103a6dca9622e819e48138152fded70fc/scripts/slave/recipes/swarming/deterministic_build.expected/full_chromium_swarm_Deterministic_Linux.json
[modify] https://crrev.com/00e5353103a6dca9622e819e48138152fded70fc/scripts/slave/recipes/swarming/deterministic_build.expected/full_chromium_swarm_Deterministic_Linux__dbg__fail.json
[modify] https://crrev.com/00e5353103a6dca9622e819e48138152fded70fc/scripts/slave/recipes/swarming/deterministic_build.expected/full_chromium_swarm_Windows_Clang_deterministic_fail.json
[modify] https://crrev.com/00e5353103a6dca9622e819e48138152fded70fc/scripts/slave/recipes/swarming/deterministic_build.expected/full_chromium_swarm_Deterministic_Linux__dbg_.json
[modify] https://crrev.com/00e5353103a6dca9622e819e48138152fded70fc/scripts/slave/recipes/swarming/deterministic_build.expected/full_chromium_swarm_Mac_deterministic_fail.json
[modify] https://crrev.com/00e5353103a6dca9622e819e48138152fded70fc/scripts/slave/recipes/swarming/deterministic_build.expected/full_chromium_swarm_Mac_deterministic__dbg__fail.json
[modify] https://crrev.com/00e5353103a6dca9622e819e48138152fded70fc/scripts/slave/recipes/swarming/deterministic_build.expected/full_chromium_swarm_Mac_deterministic.json
[modify] https://crrev.com/00e5353103a6dca9622e819e48138152fded70fc/scripts/slave/recipes/swarming/deterministic_build.expected/full_chromium_swarm_linux_chromium_clobber_deterministic_fail.json
[modify] https://crrev.com/00e5353103a6dca9622e819e48138152fded70fc/scripts/slave/recipes/swarming/deterministic_build.expected/full_chromium_swarm_Windows_deterministic.json
[modify] https://crrev.com/00e5353103a6dca9622e819e48138152fded70fc/scripts/slave/recipes/swarming/deterministic_build.py
[modify] https://crrev.com/00e5353103a6dca9622e819e48138152fded70fc/scripts/slave/recipes/swarming/deterministic_build.expected/full_chromium_swarm_linux_chromium_clobber_deterministic.json

Cc: dpranke@chromium.org
The relative_cwd is actually used by swarming when the task is launched, and it needs to be right, i.e., it needs to point to the actual build directory.

I don't think you can "fix" this in the sense of making two isolates built in two different build directories be the same without essentially implementing the work in bug 907585.
Sure, it needs to be a real directory, but when we zip things up we could take the physical build dir (say) "out/Release" and zip it up as the "canonicalized" name "out/builddir" so that after unzipping the build artifacts are at "out/builddir". Swarming would then run from that path instead.
You would have to adjust the command line as well as the build directory name. For example, run_web_test.py expects the directory to be out/Release by default and out/Debug if you pass the --debug flag. You can make other directories work, but you can't just arbitrarily rename things and expect them to work. I think the telemetry-based harnesses will have similar limitations.

Fundamentally by renaming directories you're trying to do something with the isolates that they're not designed to do. If you really want to do this, I'd suggest the right thing to do is to fix the design. 

Otherwise, I probably wouldn't want to change anything just to make the deterministic builders work using two different build directories, since the problems don't arise in practice elsewhere, and since there are probably other ways you can do equivalent tests on the deterministic builders.
See `I know that at least the telemetry tests need the build dir to be called "Release" in release builds...` in comment 0 -- that's true today, but it's kind of a bug. Only a handful of scripts have requirements on the build dir name, and that's historically been confusing to people. (See the link in comment 0; it cost us a day there.) We've discussed putting a launcher script in the build dir at build time to remove this restriction before; this might be a good forcing function.

> Fundamentally by renaming directories you're trying to do something with the isolates that they're not designed to do. If you really want to do this, I'd suggest the right thing to do is to fix the design. 

I understand the "some scripts make assumption about the name of the build dir" part, but I don't understand the "Fundamentally" part here. What do you have in mind with "fix the design"?

> just to make the deterministic builders work

The deterministic builders are happy by now, I landed some workaround. However, due to this it's still true that a dev sending swarming tests for their local build dir won't share the swarming results cache with all the builders, which seems unfortunate (unless they call their build dir "out/Release" which anecdotally not too many people do).

This isn't an urgent bug, but it's imho a bug :-)
> See `I know that at least the telemetry tests need the build dir to be called "Release" in release builds...` in comment 0
> -- that's true today, but it's kind of a bug.

It's not really a bug, since it was never a requirement (and still isn't a requirement) that that not be the case. A script that launches a binary needs to be told where the binary is *somehow*.

However, I don't want to argue this point too much, since I agree that it would be better if we had a better mechanism for handling different build directories. The way we want to address that is described in bugs 897803 and 816629. Also, debating whether something is a bug or a feature request gets tedious real quick :).

> I understand the "some scripts make assumption about the name of the build dir" part, 
> but I don't understand the "Fundamentally" part here. What do you have in mind with "fix the design"?

"fix" was the wrong word to use, I should've used "change". "Fundamentally" means that by design isolates are intended to capture the list of files needed to run a command *and* the command to run. The command to run includes a command line and a CWD to use (since we don't require that the commands be CWD-independent). This design works very well for what it is supposed to do.

I don't see a way of making these things be build-dir-independent generically without changing that design. maruel's suggestion in bug 907585 is to pull all of the command-to-run logic out of the isolate, which has some upsides (solves your problem, simplifies isolates) but also has some downsides (now you have to specify what command to run some other way).

[ This isn't the place to debate whether the work in bug 907585 is worth doing; bug 907585 is the place for that ]

I don't think doing something in between these two points is likely to be worth it, but it's possible someone will come up with a design proposal I haven't thought of.

I'm not entirely clear why you're pursuing this line of work, or how important it is to you. At least partially, I think it's in the hope that we'll be able to dedup more builds, right? I'm not sure that there'll be much of a big win in practice. Are there other reasons?
To be clear, I'm not trying to nitpick words here! This bug / issue / feature / whatever here isn't super important to me.

I believe what I propose could work with the current design of having a build dir, a cwd, and a command in the isolated file. You'd just have to use some opt-in flag for the isolate writer, and if you use that your command's executable must reside right in the build dir and the cwd must be the build dir (somewhat handwavy).

I think doing that would be relatively easy, assuming we fix the few script tests that make assumptions about the build dir name (which imho is a good thing to fix independently of this issue here).

> I'm not entirely clear why you're pursuing this line of work,

Do you mean build determinism in general, or build-dir-independent isolated files in particular?
> I believe what I propose could work with the current design of having a build dir 

How would the swarming bot know what the build dir was (i.e., which directory to start in)?

>> I'm not entirely clear why you're pursuing this line of work,

> Do you mean build determinism in general, or build-dir-independent 
> isolated files in particular?

The build-dir-independent part :). 
> How would the swarming bot know what the build dir was (i.e., which directory to start in)?

Currently, the isolate writes

  current_pwd = 'out/Release'

At zip time, the isolate step would replace every mention of 'out/Release' with 'out/somefixedstring' and put

  current_pwd = 'out/somefixedstring'

> The build-dir-independent part :). 

For compiles, this is useful to have devs and bots share the goma cache. For links, it's mostly useful for having reproducible builds and maybe we'll do linking on goma one day.

For isolated files, it's mostly completionism. Switching the deterministic bots to use two build dirs found all kinds of small nondeterminisms, this is just one of them. I agree that this one isn't super important; but if it was fixed then devs and bots could share a swarming cache for test results.
I don't see the point of
   current_pwd = 'out/somefixedstring'
vs
   just build the build in the same output directory for all recipes

This is much simpler and doesn't involve black magic.
That doesn't do anything for local builds.
99.99% of local builds do not use isolated output.
My goal here is to have build directories that are deterministic, independent of current time, current directory, etc. I agree that one could say "this is a silly goal, just use a fixed directory and set your system time to something constant". I disagree with this and think having deterministic builds is useful.

(As said above, I agree that this particular issue isn't the most important, but it's part of the set of issues that make our build nondeterministic.)
I agree it's important. What I mean is that from a deterministic PoV, enforcing to use a single build output directory is a reasonable requirement.

This is not something we can force onto users, but they normally do not use the isolated service. From the Goma perspective this is another story, but isolation is done well after compile.
re #c13:
> At zip time, the isolate step would replace every mention of 'out/Release'
> with 'out/somefixedstring' and put
>
>  current_pwd = 'out/somefixedstring'

Oh, I see, the isolate step would move the files into out/somefixedstring? Yeah, I guess that would probably work, but it might be confusing if you then wanted to download the file back to your local machine.

re #c14:
> I don't see the point of
>   current_pwd = 'out/somefixedstring'
> vs
>   just build the build in the same output directory for all recipes

That would work on the bots where there's only one build per checkout, but wouldn't work for local checkouts with multiple builds.

re #c15
> I disagree with this and think having deterministic builds is useful.

I think it's important to have a deterministic, relocatable compile. I think it's much less important to have a deterministic, relocatable isolate file.
It sounds like we mostly agree on the big picture (deterministic builds in general is good; this issue here isn't the most pressing one).

> enforcing to use a single build output directory is a reasonable requirement.

I think there's some stages of build output directory:

1. Requiring a fixed absolute path ("MUST be at /home/maruel/src/chrome/out/gn"). I think we all agree that this is at best inconvenient, and it's not what we currently require.

2. Requiring a fixed relative path from source root to build dir ("MUST be at out/gn"). 

3. Require a fixed relative path from build dir to source root ("MUST be ../..").

4. Source dir and build dir completely independent.

I have no idea how to make 4 happen, so even though that's abstractly nicest, it's probably not feasible.

You're saying 2 is enough, I'd like us to be at 3. (But, again, I agree that for isolated files this is more a coimpletionism thing.) The main reason is that I wanted automated testing that we have something better than world 1, and with the current recipe setup it's easiest to write a test for 3 (which is what issue 899438 did).

I ended up just excluding isolated files from the comparison on the deterministic bots (which also happened to improve cycle times of the det bots by 30 min), so this isn't a blocker or anything.

But if isolated files were in world 3, I think we wouldn't switch from that to world 2, which suggests that world 3 is a better place to be in.
I'd actually like for us to be at 4, but I haven't given much thought to it (and architecturally swarming and isolates treat these as one thing, not two, so maybe it's nonsensical at this point). 

I'm saying that there's a ton of value in being deterministic at 1 and 2, but a lot less value is added by 3 and 4, even if I also think they'd be nice to get to.
Any ETA on a fix? The related bot failures are generating noise for the sheriffs.

Thanks!
This bug has no visible effect and doesn't cause any bot failures.
@rogerm - what failures were you referring to?
i'm guessing  issue 912204 

Sign in to add a comment