New issue
Advanced search Search tips

Issue 633159 link

Starred by 7 users

Issue metadata

Status: WontFix
Owner: ----
Closed: Apr 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug


Show other hotlists

Hotlists containing this issue:
Hotlist-1


Sign in to add a comment

if mksnapshot crashes, we don't rebuild

Project Member Reported by marja@chromium.org, Aug 1 2016

Issue description


What steps will reproduce the problem?
(1) Do a crashy change somewhere so that mksnapshot will crash
(2) Try to build. It fails w/

FAILED: gen/snapshot.cc snapshot_blob.bin 
python ../../tools/run.py ./mksnapshot --startup_src gen/snapshot.cc --random-seed 314159265 --startup_blob snapshot_blob.bin

(3) Try to build again.

What is the expected output?

We rebuild, and mksnapshot fails again.

What do you see instead?

The build succeeds. When run, d8 crashes w/

#
# Fatal error in ../../src/snapshot/snapshot-external.cc, line 30
# Check failed: snapshot_blob->raw_size > 0.
#

==== C stack trace ===============================

 1: V8_Fatal
 2: v8::internal::SetSnapshotFromFile(v8::StartupData*)
 3: v8::base::CallOnceImpl(long*, void (*)(void*), void*)
 4: 0x7fe36912a4ec
 5: v8::internal::InitializeExternalStartupData(char const*)
 6: v8::Shell::Main(int, char**)
 7: __libc_start_main
 8: 0x7fe3687adb51
Aborted (core dumped)

-----

This is a minor disturbance but it's confusing.
 

Comment 1 by marja@chromium.org, Aug 1 2016

Components: Blink>JavaScript
Cc: vogelheim@chromium.org
@Daniel: Maybe we don't depend correctly on the external snapshot files?
Status: Available (was: Untriaged)
I think the problem might be in that when mksnapshot writes a partial snapshot and then crashes, the build system in a rebuild sees the output file and thinks re-running mksnapshot is not needed since the timestamp looks okay.

A simple solution might be to change the gn action to do an equivalent of "mksnapshot foo.tmp && mv foo.tmp foo"
re #2: Maybe, but I'm leaning towards the explanation in #4.

I would have expected ninja to remove build artifacts from failed build steps (like make does), but I couldn't find any description of what ninja actually does in the ninja manual. If ninja indeed leaves the half-built files lying around, this would fully explain the problems observed.

Not sure how to fix this, though: The workaround in #4 should work fine for unix-ish systems, but I'm not so sure for windows. (Also, I'd consider that a bug in ninja, but I'm not at all sure ninja folks would agree.)


Tools should write output atomically (eg write to temp file, then rename to final destination at the end).
Given #6, I guess the run.py script needs to be changed to a mksnapshot specific one that does the tmp-file-then-move-on-success logic for both of the possible output files.

(mksnapshot may generate two output files. I'm not sure how it would be possible to 'atomically' write two files w/ POSIX file system semantics, but for the common case of mksnapshot having a logical error this should work fine.)
If mksnapshot writes to a file, it should do this itself, not the run.py script.

(That's ok: Just list both files as outputs, and write both atomically. If one of them is out-of-date, ninja will rebuild, so if it should crash between the two renames for some reason ninja will rebuild)
Maybe related to  issue 671758 , PTAL.
Status: WontFix (was: Available)
I filed into v8 bug, please use [1] for tracking.

[1] https://bugs.chromium.org/p/v8/issues/detail?id=5720 mksnapshot should write temporary file then move to output file
Cc: thakis@chromium.org yangguo@chromium.org
 Issue v8:5720  has been merged into this issue.
Status: Available (was: WontFix)
Labels: -Pri-3 Pri-1
Raising prio since the duplicates had higher prio.

A log of some offline discussion:
1. As a workaround we could make a very simple python wrapper that does the file moving
2. Optionally we could let the executable do it, but it's a bit more work
3. There are some doubts about the error reasons because we theoretically first create the snapshot and then open the files for writing. If we crash before opening any file, why would we have files lying around? Though this is fishy, a feature like in 1 or 2 wouldn't be a regression.
4. Additionally we could let mksnapshot run as a test in a new test suite (wherever host=target), to see how often we flakily fail. We could also compare the snapshot files for equality.
I have a WIP CL for extra testing of mksnapshot:
https://codereview.chromium.org/2567603002/

I don't intend to land it, rather use it on the tryserver for stress testing.
Project Member

Comment 15 by bugdroid1@chromium.org, Mar 22 2017

The following revision refers to this bug:
  https://chromium.googlesource.com/v8/v8.git/+/0b90e985f736cae797b23969e6e53e303f018236

commit 0b90e985f736cae797b23969e6e53e303f018236
Author: yangguo <yangguo@chromium.org>
Date: Wed Mar 22 12:13:50 2017

[snapshot] only create snapshot files as last step in mksnapshot.

R=leszeks@chromium.org
BUG= chromium:633159 

Review-Url: https://codereview.chromium.org/2767903002
Cr-Commit-Position: refs/heads/master@{#44015}

[modify] https://crrev.com/0b90e985f736cae797b23969e6e53e303f018236/src/snapshot/mksnapshot.cc

Project Member

Comment 16 by sheriffbot@chromium.org, Apr 12 2018

Labels: Hotlist-Recharge-Cold
Status: Untriaged (was: Available)
This issue has been Available for over a year. If it's no longer important or seems unlikely to be fixed, please consider closing it out. If it is important, please re-triage the issue.

Sorry for the inconvenience if the bug really should have been left as Available.

For more details visit https://www.chromium.org/issue-tracking/autotriage - Your friendly Sheriffbot
Status: WontFix (was: Untriaged)
mksnapshot used to crash rarely on windows due to windows kernel bug. That has since been addressed. I don't think there's value in spending time on this.

Comment 18 by marja@chromium.org, Apr 16 2018

It's pretty common (at least for me) to make local chances (during developing) which make mksnapshot crash.

However, I think this is fixed now.
In that particular case #15 should already fix this problem :)

Sign in to add a comment