veyron_rialto test images exceed size limit
Reported by
charliemooney@chromium.org,
Apr 19 2016
Issue description
It appears that the Veyron-b builder has been dead for more than a week due to an out-of-space error in the HWTest step:
FAIL: Unhandled AutoservDiskFullHostError: Not enough free space on /usr/local/autotest - 0.610GB free, want 0.700GB
https://uberchromegw.corp.google.com/i/chromeos/builders/veyron-b-release-group/builds/1463
,
Apr 20 2016
See https://bugs.chromium.org/p/chromium/issues/detail?id=602638 for the autofiled bug
,
Apr 21 2016
+jrbarnette The problem seems to be that the DUT is running out of space, or at least does not have the space needed to run the test. Maybe ssh into the DUT and try to find out what is taking up the space?
,
Apr 21 2016
I anticipate that the problem is that the rialto stateful image size has grown beyond its designated "safe" maximum size. I'm checking now.
,
Apr 21 2016
The problem is basically that the size of "dev_image" has grown, and is triggering the minimum space threshold check in verify_software(). This isn't an infrastructure problem, per se; it's a product problem for rialto. The rialto team should weigh in on how they want this fixed.
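For readers unfamiliar with this kind of check, here is a minimal Python sketch of a free-space threshold test that produces a message shaped like the one in the issue description. It is not the Autotest implementation: `DiskFullError` and `check_free_space` are hypothetical names, and treating 1GB as 10^9 bytes is an assumption.

```python
import shutil


class DiskFullError(Exception):
    """Hypothetical stand-in for AutoservDiskFullHostError."""


def check_free_space(path, want_gb, free_bytes=None):
    """Raise DiskFullError if `path` has less than `want_gb` free.

    `free_bytes` can be passed explicitly (e.g. a value read from a
    remote DUT); otherwise the local filesystem is queried.
    Assumption: 1GB is counted as 10**9 bytes.
    """
    if free_bytes is None:
        free_bytes = shutil.disk_usage(path).free
    free_gb = free_bytes / 1e9
    if free_gb < want_gb:
        raise DiskFullError(
            "Not enough free space on %s - %.3fGB free, want %.3fGB"
            % (path, free_gb, want_gb))
    return free_gb
```

Calling `check_free_space("/usr/local/autotest", 0.700, free_bytes=610_000_000)` raises with the numbers reported in this bug. The point is that a larger dev_image leaves less free space on the stateful partition, so the check fails before any test runs.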
,
Apr 21 2016
,
Apr 21 2016
,
Apr 26 2016
,
May 19 2016
,
May 19 2016
,
May 25 2016
Ping. This bug needs attention. The Rialto canary cannot pass until this is resolved.
,
May 25 2016
,
May 25 2016
Aviv, amstan@ is no longer on our team, and he probably does not receive his chromium.org email anymore. I do not really know anything about what's happening here or who might be a good person to talk to. So maybe ping puneetster@ about finding an owner?
,
May 25 2016
-> to joth@ to find owner
,
May 25 2016
(note: a possible resolution is to disable hwtest on canaries, if we aren't looking at them)
,
May 25 2016
amstan is on the rialto team. He was planning to look at this. To clarify though - this test failure has been happening for over a month, but Rialto canary builds are still being pushed out fine (e.g. 8368.0.0 is currently live). To my understanding that means HWTest is already disabled for canaries. What changed today to cause the leap in priority?
,
May 25 2016
,
May 25 2016
> To clarify though - this Test failure has been happening for
> over a month, but Rialto canary builds are still being pushed
> out fine (e.g. 8368.0.0 is currently live). To my
> understanding that means HWTest already is disabled for canaries.
The problem causes testing to fail consistently. Although there are images, they don't pass tests. They can't even _run_ tests. If builds are being pushed out, they're lacking the basic quality guarantees that come from running and passing sanity tests.
> What changed today to cause the leap in priority?
We're revisiting nuisance problems that can be a stumbling block for sheriffs. We want all of the canaries to be green, which means we need to do something about this bug.
,
May 25 2016
@jrbarnette - Agreed on those points, and if this is causing problems for sheriffs it's good for us to know this so we can prioritise it. But if nothing new is failing today that wasn't already broken yesterday (and, for the last month), making this issue jump from P3 to P0 is a rather extreme way of communicating this need. For reference, previous investigations on image size are in https://code.google.com/p/chrome-os-partner/issues/detail?id=50351
,
May 25 2016
This should not have been a P3 in the first place; that seems to have been an oversight.
,
May 25 2016
Oh, I apologize. I assumed rialto was a ChromeOS device and that, since amstan@ left our team, this was no longer in his area. Never mind my comment then!
,
May 25 2016
Can we move this redness to a warning on the infra side, since it's been broken like this for a while? Or, vice versa, move it to its own builder group and mark it as experimental?
,
May 25 2016
If we fix the redness by applying an "ignore this for now" filter, how will we guarantee that, once the underlying problem is fixed, we don't forget to remove the "ignore this" filter?
,
May 25 2016
Size limit should be fixed by:
https://chrome-internal-review.googlesource.com/260738
https://chrome-internal-review.googlesource.com/260739
,
May 26 2016
Correction: the two CLs listed in #24 only free up space on the Root FS (i.e. for https://code.google.com/p/chrome-os-partner/issues/detail?id=50351). The problem in OP here is /usr/local is running out of space, so those CLs won't help this issue.
,
May 26 2016
Re comment #25: Right. I'll note that I downloaded stateful.tgz for a recent veyron_rialto build, and checked the size after unpacking it. It clocks in at around .785G, which means that the Autotest limit of .700GB is actually too small. Rialto probably can't provision for testing until the size of the test image is reduced.
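The measurement described above can be approximated without unpacking to disk. This is a rough Python sketch using only the standard `tarfile` module; `unpacked_size_gb` and `exceeds_limit` are hypothetical helper names, 1GB is assumed to be 10^9 bytes, and summing file sizes ignores filesystem overhead, so the result is a lower bound on what the unpacked payload occupies.

```python
import tarfile


def unpacked_size_gb(tar_path):
    """Sum the sizes of regular files in a (possibly compressed) tarball.

    Assumption: 1GB = 10**9 bytes. Directory entries, links and
    filesystem block overhead are not counted.
    """
    with tarfile.open(tar_path, "r:*") as tar:
        total = sum(m.size for m in tar.getmembers() if m.isfile())
    return total / 1e9


def exceeds_limit(unpacked_gb, limit_gb=0.700):
    """True if the unpacked payload is larger than the given limit."""
    return unpacked_gb > limit_gb
```

With the figures quoted in this comment, `exceeds_limit(0.785)` is true: the payload is about 0.085GB over the 0.700GB figure.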
,
May 26 2016
When rialto switched from app_shell to chrome it dropped the USE=app_shell flag. It looks like this flag is also used in numerous places to strip out tests, most notably including telemetry (which throws about 400M into dev_image):
https://cs.corp.google.com/search/?q=app_shell+f:%5C.ebuild+f:test+package:%5Echromeos_public$&m=100&det=mat&type=cs
proposal:
- in all those *-tests*.ebuild files, replace 'app_shell' with 'no_chrome_tests'
- make app_shell's make.defaults set no_chrome_tests in addition to app_shell USE flags
- make rialto's make.conf USE no_chrome_tests
(Better suggestions for a USE flag name than no_chrome_tests?)
,
May 26 2016
Splitting out a new USE flag for dropping chrome tests (in place of using app_shell) appears to get us more or less back where we were, with 777M free in stateful. Patches to do this are listed below. A follow-up could have chromeless_tty users also set chromeless_tests, and then the ebuilds can be simplified too.
https://chromium-review.googlesource.com/#/c/347605/
https://chromium-review.googlesource.com/#/c/347513/
https://chrome-internal-review.googlesource.com/#/c/260796/
https://chrome-internal-review.googlesource.com/#/c/260819/
,
May 27 2016
The following revision refers to this bug:
https://chrome-internal.googlesource.com/chromeos/overlays/overlay-variant-veyron-rialto-private/+/5ed864a815fadba866cbee57625bd080670a3985
commit 5ed864a815fadba866cbee57625bd080670a3985
Author: Jonathan Dixon <joth@google.com>
Date: Thu May 26 20:18:09 2016
,
Jun 9 2016
,
Jun 27 2016
Closing... please feel free to reopen if it's not fixed.
,
Jun 27 2016
Comment 1 by gkihumba@chromium.org
, Apr 19 2016