
Issue 604772

Starred by 2 users

Issue metadata

Status: Verified
Owner:
Closed: Jun 2016
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 0
Type: Bug

Blocking:
issue 602638
issue 605847




veyron_rialto test images exceed size limit

Reported by charliemooney@chromium.org, Apr 19 2016

Issue description

It appears that the Veyron-b builder has been dead for more than a week due to an out-of-space error in the HWTest step:

FAIL: Unhandled AutoservDiskFullHostError: Not enough free space on /usr/local/autotest - 0.610GB free, want 0.700GB

https://uberchromegw.corp.google.com/i/chromeos/builders/veyron-b-release-group/builds/1463

 
Cc: gkihumba@chromium.org

Comment 3 by fdeng@chromium.org, Apr 21 2016

Cc: jrbarnette@chromium.org
+jrbarnette

The problem seems to be that the DUT is running out of space, or at least does not have the free space needed to run the test.

Maybe ssh into the DUT and try to find out what is taking up the space?
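
One way to see what is taking up the space is a small `du`-style summary. The following is only a sketch, assuming Python is available on the test image; the helper names `tree_size`, `dir_sizes`, and `largest` are made up for illustration and are not part of Autotest. It reports apparent file sizes, so the numbers can differ slightly from real on-disk usage.

```python
import os


def tree_size(path):
    """Sum the apparent sizes of regular files under path, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                pass  # file vanished or is unreadable; ignore it
    return total


def dir_sizes(root):
    """Return {child_name: bytes} for each immediate child of root."""
    sizes = {}
    for entry in os.scandir(root):
        if entry.is_dir(follow_symlinks=False):
            sizes[entry.name] = tree_size(entry.path)
        elif entry.is_file(follow_symlinks=False):
            sizes[entry.name] = entry.stat(follow_symlinks=False).st_size
    return sizes


def largest(root, n=10):
    """Return the n largest children of root as (name, bytes), biggest first."""
    ranked = sorted(dir_sizes(root).items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    # /usr/local is the mount point named in the error message above.
    for name, size in largest("/usr/local"):
        print("%10.3f MB  %s" % (size / 1e6, name))
```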
Summary: veyron_rialto builders fail "Not enough free space" (was: Veyron-b builders fail "Not enough free space")
I anticipate that the problem is that the rialto stateful
image size has grown beyond its designated "safe" maximum size.

I'm checking now.

Cc: sosa@chromium.org
Summary: veyron_rialto test images exceed limit (was: veyron_rialto builders fail "Not enough free space")
The problem is basically that the size of "dev_image" has grown,
and is triggering the minimum space threshold check in verify_software().

This isn't an infrastructure problem, per se; it's a product problem
for rialto.  The rialto team should weigh in on how they want this
fixed.
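
For readers unfamiliar with this kind of check, here is a minimal sketch of a minimum-free-space test of the sort described above. It is not the actual Autotest code: `DiskFullError`, `check_free_space`, and `check_path` are hypothetical names, and the use of decimal gigabytes is an assumption. Only the message format is taken from the failure log in the issue description.

```python
import shutil

GB = 10 ** 9  # assumption: the threshold is expressed in decimal gigabytes


class DiskFullError(Exception):
    """Hypothetical stand-in for the AutoservDiskFullHostError in the log."""


def check_free_space(free_bytes, min_free_gb, path):
    """Raise DiskFullError if free_bytes is below the min_free_gb threshold."""
    free_gb = free_bytes / GB
    if free_gb < min_free_gb:
        raise DiskFullError(
            "Not enough free space on %s - %.3fGB free, want %.3fGB"
            % (path, free_gb, min_free_gb))
    return free_gb


def check_path(path, min_free_gb):
    """Run the same check against the filesystem that holds path."""
    return check_free_space(shutil.disk_usage(path).free, min_free_gb, path)
```

With the numbers from the log, `check_free_space(610_000_000, 0.700, "/usr/local/autotest")` raises, because 0.610GB is below the 0.700GB threshold. The check fires whenever dev_image grows enough to push free space under the threshold, regardless of whether the tests themselves would fit.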

Blocking: 602638
Cc: joth@chromium.org
Owner: amstan@chromium.org
Summary: veyron_rialto test images exceed size limit (was: veyron_rialto test images exceed limit)

Comment 8 by benhenry@google.com, Apr 26 2016

Components: Infra>Client>ChromeOS
Labels: -Infra-ChromeOS
Cc: pho...@chromium.org
Issue 605601 has been merged into this issue.
Blocking: 605847
Ping. This bug needs attention. The Rialto canary cannot pass until this is resolved.
Labels: -Pri-3 Pri-0
Aviv, amstan@ is no longer on our team; he probably does not receive his chromium.org email anymore.
I do not really know anything about what's happening here or who might be a good person to talk to. So maybe ping puneetster@ about finding an owner?
Owner: joth@chromium.org
-> to joth@ to find owner
(note: a possible resolution is to disable hwtest on canaries, if we aren't looking at them)

Comment 16 by joth@chromium.org, May 25 2016

Owner: amstan@chromium.org
amstan is on the rialto team. He was planning to look at this.

To clarify though - this Test failure has been happening for over a month, but Rialto canary builds are still being pushed out fine (e.g. 8368.0.0 is currently live). To my understanding that means HWTest already is disabled for canaries.

What changed today to cause the leap in priority?

Comment 17 by joth@chromium.org, May 25 2016

Cc: akes...@chromium.org
> To clarify though - this Test failure has been happening for
> over a month, but Rialto canary builds are still being pushed
> out fine (e.g. 8368.0.0 is currently live). To my
> understanding that means HWTest already is disabled for canaries.

The problem causes testing to fail consistently.  Although there
are images, they don't pass tests.  They can't even _run_ tests.

If builds are being pushed out, they're lacking the basic quality
guarantees that come from running and passing sanity tests.


> What changed today to cause the leap in priority?

We're revisiting nuisance problems that can be a stumbling
block for sheriffs.  We want all of the canaries to be green,
which means we need to do something about this bug.

Comment 19 by joth@chromium.org, May 25 2016

@jrbarnette - Agreed on those points, and if this is causing problems for sheriffs it's good for us to know this so we can prioritise it.
But if nothing new is failing today that wasn't already broken yesterday (and, for the last month), making this issue jump from P3 to P0 is a rather extreme way of communicating this need.


For reference, previous investigations on image size are in https://code.google.com/p/chrome-os-partner/issues/detail?id=50351 

This should not have been a P3 in the first place, that seems to have been an oversight.
Oh, I apologize. I assumed rialto was a ChromeOS device and, since amstan@ left our team, that this was no longer in his area. Never mind my comment then!

Comment 22 by sosa@chromium.org, May 25 2016

Can we move this redness to a warning on the infra side since it's been broken like this for a while?

Or, vice versa, move it to its own builder group and mark it as experimental?
If we fix the redness by applying an "ignore this for now"
filter, how will we guarantee that when the underlying problem is
fixed, we don't forget to remove the "ignore this" filter?

Comment 24 by joth@chromium.org, May 25 2016

Owner: joth@chromium.org
Status: Started (was: Available)
Size limit should be fixed by:
https://chrome-internal-review.googlesource.com/260738
https://chrome-internal-review.googlesource.com/260739

Comment 25 by joth@chromium.org, May 26 2016

Correction: the two CLs listed in #24 only free up space on the Root FS (i.e. for https://code.google.com/p/chrome-os-partner/issues/detail?id=50351).
The problem in the OP here is that /usr/local is running out of space, so those CLs won't help with this issue.

Re comment #25:  Right.

I'll note that I downloaded stateful.tgz for a recent veyron_rialto
build, and checked the size after unpacking it.  It clocks in at
around .785G, which means that the Autotest limit of .700GB is
actually too small.  Rialto probably can't provision for testing
until the size of the test image is reduced.
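
The measurement above can be approximated without unpacking to disk. This is a rough sketch, not the procedure actually used in the comment: `unpacked_size_bytes` and `exceeds_limit` are illustrative names, decimal gigabytes are an assumption, and summing member sizes ignores filesystem block rounding and metadata, so real on-disk usage would be somewhat higher.

```python
import tarfile

GB = 10 ** 9  # assumption: sizes are compared in decimal gigabytes


def unpacked_size_bytes(tarball_path):
    """Sum the sizes of regular files in a tarball without extracting it."""
    total = 0
    with tarfile.open(tarball_path, "r:*") as tar:
        for member in tar:
            if member.isfile():
                total += member.size
    return total


def exceeds_limit(unpacked_bytes, limit_gb):
    """True if the unpacked payload is larger than limit_gb gigabytes."""
    return unpacked_bytes / GB > limit_gb


if __name__ == "__main__":
    # The file name matches the one mentioned above; the path is an assumption.
    size = unpacked_size_bytes("stateful.tgz")
    print("%.3fGB unpacked, over 0.700GB: %s"
          % (size / GB, exceeds_limit(size, 0.700)))
```

With the figures quoted above, `exceeds_limit(785_000_000, 0.700)` is true: a payload of roughly .785G cannot fit in .700GB of free space, which is why passing the threshold check alone would not be enough.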

Comment 27 by joth@chromium.org, May 26 2016

When rialto switched from app_shell to chrome, it dropped the USE=app_shell flag. It looks like this flag is also used in numerous places to strip out tests, most notably telemetry (which adds about 400M to dev_image):
https://cs.corp.google.com/search/?q=app_shell+f:%5C.ebuild+f:test+package:%5Echromeos_public$&m=100&det=mat&type=cs

proposal:
- in all those *-tests*.ebuild files, replace 'app_shell' with 'no_chrome_tests'
- make app_shell's make.defaults set no_chrome_tests in addition to app_shell USE flags
- make rialto's make.conf USE no_chrome_tests 

(Better suggestions for a USE flag name than no_chrome_tests ?)

Comment 28 by joth@chromium.org, May 26 2016

Splitting out a new USE flag for dropping chrome tests (in place of using app_shell) appears to get us more or less back where we were, with 777M free in stateful.


Patches to do this are listed below. A follow-up could have chromeless_tty users also set chromeless_tests, and then the ebuilds can be simplified too.

https://chromium-review.googlesource.com/#/c/347605/
https://chromium-review.googlesource.com/#/c/347513/
https://chrome-internal-review.googlesource.com/#/c/260796/
https://chrome-internal-review.googlesource.com/#/c/260819/

Comment 29 by bugdroid1@chromium.org, May 27 2016

Comment 30 by joth@chromium.org, Jun 9 2016

Status: Fixed (was: Started)
Closing... please feel free to reopen if it's not fixed.
Status: Verified (was: Fixed)
