
Issue 833539

Starred by 2 users

Issue metadata

Status: Duplicate
Merged: issue 735306
Owner: ----
Closed: Aug 20
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug



Hotlists containing this issue:
chrome-client-infra-backlog



Recipe engine bug: unexpected failure

Project Member Reported by thakis@chromium.org, Apr 16 2018

Issue description

Cc: jbudorick@chromium.org
Components: Infra>Client>Chrome
Labels: -Restrict-View-Google -Infra-Troopers -Infra -Infra-Area-Recipes
Stacktrace is:

Traceback (most recent call last):
  File "C:\b\rr\tmp7nxq1u\rw\checkout\scripts\slave\.recipe_deps\recipe_engine\recipe_engine\util.py", line 151, in raises
    yield
  File "C:\b\rr\tmp7nxq1u\rw\checkout\scripts\slave\.recipe_deps\recipe_engine\recipe_engine\run.py", line 232, in run_step
    step_result = open_step.run()
  File "C:\b\rr\tmp7nxq1u\rw\checkout\scripts\slave\.recipe_deps\recipe_engine\recipe_engine\step_runner.py", line 243, in run
    return construct_step_result(rendered_step, retcode)
  File "C:\b\rr\tmp7nxq1u\rw\checkout\scripts\slave\.recipe_deps\recipe_engine\recipe_engine\step_runner.py", line 687, in construct_step_result
    result = ph.result(step_result.presentation, td)
  File "C:\b\rr\tmp7nxq1u\rw\checkout\scripts/slave\recipe_modules\test_utils\api.py", line 24, in result
    ret = super(GTestResultsOutputPlaceholder, self).result(presentation, test)
  File "C:\b\rr\tmp7nxq1u\rw\checkout\scripts\slave\.recipe_deps\recipe_engine\recipe_modules\json\api.py", line 55, in result
    raw_data = self.raw.result(presentation, test)
  File "C:\b\rr\tmp7nxq1u\rw\checkout\scripts\slave\.recipe_deps\recipe_engine\recipe_modules\raw_io\api.py", line 108, in result
    ret = self.decode(f.read())
  File "C:\b\rr\tmp7nxq1u\rw\checkout\scripts\slave\.recipe_deps\recipe_engine\recipe_modules\raw_io\api.py", line 136, in decode
    else result.decode('utf-8', 'replace').encode('utf-8'))
MemoryError

Looks like this is the chromium recipe reading large files and running the engine out of memory. We could potentially improve raw_io, but usually this means that the recipe is trying to read 500+MB files.
(in this particular build, it looks like MANY (all?) of the browser_tests test suites crash, so I suspect the test result file has a stack trace for every single one of them)
Labels: -Pri-1 Pri-2 Type-Bug
Status: Available (was: Untriaged)
This seems to be a one-off and is not blocking the builder otherwise. Adding to chrome-client-infra-backlog hotlist for further tracking.

Most likely, we should add a safeguard to the recipe engine and stop reading a large file after a certain threshold (say, 100MB), with an appropriate error. This should provide enough context to the user without blowing up the engine.
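
A minimal sketch of the kind of safeguard suggested above, in Python. This is not the recipe engine's code: `read_bounded`, `OutputTooLargeError` and `MAX_BYTES` are hypothetical names, and the limit is simply the "say, 100MB" figure from the comment. The idea is to read in chunks and fail with a clear error as soon as the threshold is crossed, instead of loading the whole file and hitting MemoryError.

```python
import io

# Hypothetical threshold, taken from the "say, 100MB" suggestion above.
MAX_BYTES = 100 * 1024 * 1024


class OutputTooLargeError(Exception):
    """Hypothetical error raised when a step output file exceeds the limit."""


def read_bounded(fileobj, limit=MAX_BYTES, chunk_size=1024 * 1024):
    """Read a binary file object, failing once more than `limit` bytes are seen."""
    chunks = []
    total = 0
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total > limit:
            # Stop early: at most `limit + chunk_size` bytes were ever held.
            raise OutputTooLargeError(
                'output exceeds %d bytes; refusing to load it into memory' % limit)
        chunks.append(chunk)
    return b''.join(chunks)


if __name__ == '__main__':
    print(len(read_bounded(io.BytesIO(b'x' * 10), limit=16)))
```

Reading in fixed-size chunks keeps the failure cheap: the oversized file is never fully resident, and the error message can tell the user which limit was hit.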
Labels: -Pri-2 Pri-1
This happened again, on what looks like an otherwise passing test run: https://ci.chromium.org/buildbot/chromium.clang/ToTWin%28dll%29/1095
https://logs.chromium.org/v/?s=chromium%2Fbb%2Fchromium.clang%2FToTWin_dll_%2F1095%2F%2B%2Frecipes%2Fsteps%2FRecipe_engine_bug%2F0%2Flogs%2Fexception%2F0

Traceback (most recent call last):
  File "C:\b\rr\tmp9quqho\rw\checkout\scripts\slave\.recipe_deps\recipe_engine\recipe_engine\util.py", line 151, in raises
    yield
  File "C:\b\rr\tmp9quqho\rw\checkout\scripts\slave\.recipe_deps\recipe_engine\recipe_engine\run.py", line 226, in run_step
    open_step = self._step_runner.open_step(step_config)
  File "C:\b\rr\tmp9quqho\rw\checkout\scripts\slave\.recipe_deps\recipe_engine\recipe_engine\step_runner.py", line 178, in open_step
    step_config, recipe_test_api.DisabledTestData()
  File "C:\b\rr\tmp9quqho\rw\checkout\scripts\slave\.recipe_deps\recipe_engine\recipe_engine\step_runner.py", line 612, in render_step
    new_cmd.extend(item.render(tdata))
  File "C:\b\rr\tmp9quqho\rw\checkout\scripts\slave\.recipe_deps\recipe_engine\recipe_modules\raw_io\api.py", line 88, in render
    os.write(input_fd, self.encode(self.data))
  File "C:\b\rr\tmp9quqho\rw\checkout\scripts\slave\.recipe_deps\recipe_engine\recipe_modules\raw_io\api.py", line 118, in encode
    return data.decode('utf-8', 'replace').encode('utf-8')
MemoryError
Hm... IIUC that step is only merging together ~100MB of data (and so at worst would require ~200MB of memory in the recipes process due to dumb unicode stuff).

It's possible that the build is attaching multiple largeish logs to steps, and the cumulative effect causes the recipe process to run out of memory (it's only 32bit python).
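
For illustration, the `decode('utf-8', 'replace').encode('utf-8')` round trip in both tracebacks runs on the whole buffer, so the original bytes, the intermediate unicode string and the re-encoded bytes all have to be alive at once. A rough sketch of doing the same sanitization chunk by chunk follows; `sanitize_utf8_stream` is a hypothetical helper, not something raw_io provides, and it is written for Python 3 rather than the Python 2 the bots ran at the time.

```python
import codecs
import io


def sanitize_utf8_stream(src, dst, chunk_size=64 * 1024):
    """Copy src to dst, replacing invalid UTF-8 with U+FFFD, one chunk at a time."""
    # The incremental decoder buffers a multi-byte sequence that is split
    # across two chunks instead of treating each half as an error.
    decoder = codecs.getincrementaldecoder('utf-8')(errors='replace')
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decoder.decode(chunk).encode('utf-8'))
    # Flush whatever is still buffered from an incomplete trailing sequence.
    dst.write(decoder.decode(b'', final=True).encode('utf-8'))


if __name__ == '__main__':
    out = io.BytesIO()
    sanitize_utf8_stream(io.BytesIO(b'ok \xff done'), out)
    print(repr(out.getvalue()))
```

With this shape, peak memory for the sanitization step is bounded by the chunk size rather than by the size of the data, though it does nothing about logs that are retained elsewhere in the process.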

martiniss@, wdyt about having the recipe engine drop all logs after they're transmitted? It means that recipes won't be able to read them back from the step presentation, but arguably this is a good thing anyway. I'll prep a CL.
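
A toy sketch of the proposal above, assuming a simplified log container. `StepLogs`, `finalize` and `emit` are hypothetical names and do not reflect the recipe engine's actual classes; the point is only that log content is released once it has been transmitted, and that later reads or writes fail loudly instead of silently doing nothing.

```python
class StepLogs(object):
    """Hypothetical stand-in for a step's log storage; not the engine's real class."""

    def __init__(self):
        self._logs = {}
        self._finalized = False

    def _check_open(self):
        if self._finalized:
            raise AssertionError(
                'logs are not accessible after the step is finalized')

    def __setitem__(self, name, lines):
        self._check_open()
        self._logs[name] = lines

    def __getitem__(self, name):
        self._check_open()
        return self._logs[name]

    def finalize(self, emit):
        """Send every log to `emit(name, lines)`, then release the memory."""
        for name, lines in self._logs.items():
            emit(name, lines)
        # Drop the only reference so the log content can be garbage collected.
        self._logs = None
        self._finalized = True


if __name__ == '__main__':
    logs = StepLogs()
    logs['stdout'] = ['line 1', 'line 2']
    logs.finalize(lambda name, lines: None)
```

The trade-off is the one named in the comment: a recipe can no longer read a log back from a closed step, so anything it needs later has to be captured before the step is finalized.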
Any news? This is affecting 7 Windows bots (32-bit python there, I guess) on the clang tot waterfall:
https://ci.chromium.org/p/chromium/g/chromium.clang/console
Should we dupe this against https://crbug.com/735306?
Mergedinto: 735306
Status: Duplicate (was: Available)
Yeah, that looks like the same thing.
Project Member

Comment 9 by bugdroid1@chromium.org, Aug 22

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/tools/build/+/b87f0c19e4b4fb688f387c2665b52e4be67779c1

commit b87f0c19e4b4fb688f387c2665b52e4be67779c1
Author: Robert Iannucci <iannucci@chromium.org>
Date: Wed Aug 22 20:02:23 2018

[bisect] Make debug info step have predictable behavior.

Previously this recipe would run a 'Debug Info' step without any command
which doesn't have well defined behavior (it should probably be an
error, but seems to slip through the cracks). Some of the recipe
simulations mark this as having a retcode of 1, but this should actually
be impossible, leading to simulations which are (I think) out
of line with production behavior.

In addition to this, the recipe would previously modify the
presentation.logs of steps which were already closed which is a no-op
and thus probably not the intended behavior.

Change this to use python.succeeding_step, which has predictable behavior
and has well defined behavior :).

This is necessary for the referenced bug, which is going to make
presentation.logs inaccessible (with an error) after step closure.

R=robertocn@chromium.org

Bug: 833539
Recipe-Nontrivial-Roll: build_limited_scripts_slave
Change-Id: I6810ffcd1e3f95c202eb5322770cfc3bd19a6238
Reviewed-on: https://chromium-review.googlesource.com/1171881
Reviewed-by: Simon Hatch <simonhatch@chromium.org>
Reviewed-by: David Tu <dtu@chromium.org>
Reviewed-by: Roberto Carrillo <robertocn@chromium.org>
Commit-Queue: Robbie Iannucci <iannucci@chromium.org>

[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect_staging/examples/full.expected/basic_bisect_other_direction.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect_staging/examples/full.expected/failed_build_inconclusive_11.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipes/bisection/android_bisect.expected/local_basic_recipe_disconnected_device.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect/examples/full.expected/failed_build.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect_staging/examples/full.expected/no_repro.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect/examples/full.expected/gathering_references_no_values.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect/examples/full.expected/no_repro.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect_staging/examples/full.expected/basic_linux_bisect.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect/examples/full.expected/failed_build_inconclusive_11.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect_staging/bisector.py
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect/examples/full.expected/basic_buildbot_bisect.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/perf_dashboard/api.py
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect/examples/full.expected/failed_buildbucket_get.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipes/bisection/android_bisect_staging.py
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect/bisector.py
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect/examples/full.expected/basic_bisect_other_direction.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect_staging/examples/full.expected/failed_build.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect/examples/full.expected/return_code.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipes/bisection/android_bisect.py
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect_staging/examples/full.expected/return_code_fail.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect/local_bisect.py
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipes/bisection/android_bisect.expected/local_basic_recipe_failed_device.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect_staging/examples/full.expected/v8_roll_bisect.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect_staging/test_api.py
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect/examples/full.expected/return_code_fail.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect/examples/full.expected/multi_depot_recurse_with_uneven_deps_expansion.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect_staging/local_bisect.py
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipes/bisection/android_bisect_staging.expected/local_basic_recipe_failed_device.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipes/bisection/android_bisect.expected/local_basic_recipe_basic_device.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect_staging/examples/full.expected/v8_roll_bisect_bis.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect_staging/examples/full.expected/basic_buildbot_bisect.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipes/bisection/android_bisect_staging.expected/local_basic_recipe_disconnected_device.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect_staging/examples/full.expected/no_values.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect/examples/full.expected/v8_roll_bisect_bis.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect/examples/full.expected/no_values.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect/examples/full.expected/retest_bisect.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect/examples/full.expected/failed_build_inconclusive_1.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect/examples/full.expected/basic_resource_sizes_bisect.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect_staging/examples/full.expected/return_code.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect_staging/examples/full.expected/failed_build_inconclusive_1.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect_staging/examples/full.expected/retest_bisect.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/README.recipes.md
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect/examples/full.expected/basic_linux_bisect.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect/test_api.py
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipes/bisection/android_bisect_staging.expected/local_basic_recipe_basic_device.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect_staging/examples/full.expected/gathering_references_no_values.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect_staging/examples/full.expected/basic_resource_sizes_bisect.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect/examples/full.expected/v8_roll_bisect.json
[modify] https://crrev.com/b87f0c19e4b4fb688f387c2665b52e4be67779c1/scripts/slave/recipe_modules/auto_bisect_staging/examples/full.expected/failed_buildbucket_get.json

Project Member

Comment 10 by bugdroid1@chromium.org, Oct 5

Project Member

Comment 11 by bugdroid1@chromium.org, Oct 5

The following revision refers to this bug:
  https://chromium.googlesource.com/infra/luci/recipes-py/+/acbe9e3142610d958181460254565ef27ad47dc0

commit acbe9e3142610d958181460254565ef27ad47dc0
Author: Robert Iannucci <iannucci@chromium.org>
Date: Fri Oct 05 17:59:10 2018

[StepPresentation] Remove all logs from memory during finalization.

Previously the recipe engine kept all log content in memory for
the duration of the recipe execution. No recipes have been found
that actually need to read this data (and that couldn't read it through
some alternate mechanism). Since the logs can be quite long, this should
reduce the memory consumption of most recipe executions, sometimes
by a lot.

Because the logs are no longer kept in memory, the recipe engine
will now pop an assertion if a recipe attempts to read/write a log
after the step has been finalized.

R=martiniss@chromium.org, nodir@chromium.org

Bug: 833539, 735306
Change-Id: I8450b473eeaaafcdc0a87c7c542f961848327050
Reviewed-on: https://chromium-review.googlesource.com/c/1171206
Commit-Queue: Robbie Iannucci <iannucci@chromium.org>
Reviewed-by: Stephen Martinis <martiniss@chromium.org>
Reviewed-by: Nodir Turakulov <nodir@chromium.org>

[modify] https://crrev.com/acbe9e3142610d958181460254565ef27ad47dc0/recipe_engine/recipe_test_api.py
[modify] https://crrev.com/acbe9e3142610d958181460254565ef27ad47dc0/recipe_engine/types.py
