
Issue 896933

Starred by 2 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 3
Type: Bug




Building dEQP in Crostini fails due to OOM kills

Project Member Reported by davidri...@chromium.org, Oct 18

Issue description

Building dEQP in Crostini on eve with "make -j4" runs out of memory, and the build gets OOM-killed:
[ 50%] Built target glcts-es2
[ 50%] Built target glcts-es3
virtual memory exhausted: Cannot allocate memory
(the line above repeats 26 times in the original log)

Repro steps:
git clone https://android.googlesource.com/platform/external/deqp
cd deqp
python external/fetch_sources.py
curl -L https://android-review.googlesource.com/tools/hooks/commit-msg -o .git/hooks/commit-msg && chmod a+x .git/hooks/commit-msg
sudo apt-get install cmake ant
sudo apt install gcc-multilib g++-multilib
mkdir build-egl
cd build-egl
cmake .. -DCMAKE_BUILD_TYPE=Debug -DDEQP_TARGET=x11_egl
make -j

Output of top on the host at the time the build was killed:
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 1144 crosvm    20   0   11.7g  10.8g  10.8g S 397.4  69.5   5:01.40 crosvm
 1436 chronos   12  -8  801388 247304 114900 S   0.7   1.5 137:10.57 chrome
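The crosvm line can be used to sanity-check how much RAM the host has. top's %MEM column is RES as a percentage of total physical memory, so dividing RES by %MEM/100 recovers the total. A minimal sketch; the function name is hypothetical, and because top rounds RES to 10.8g the result is approximate:

```python
def implied_total_gib(res_gib: float, mem_percent: float) -> float:
    """Total RAM (GiB) implied by a process's RES and %MEM columns in top."""
    if mem_percent <= 0:
        raise ValueError("mem_percent must be positive")
    # %MEM = RES / total physical memory * 100, so total = RES / (%MEM / 100).
    return res_gib / (mem_percent / 100.0)


if __name__ == "__main__":
    # RES and %MEM taken from the crosvm line of the top output.
    print(f"implied host RAM: {implied_total_gib(10.8, 69.5):.1f} GiB")
    # prints: implied host RAM: 15.5 GiB
```

Roughly 15.5 GiB is consistent with a 16 GB device, with crosvm alone holding about 70% of it.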

I'm not sure what our expectations are with regard to low-memory operation and swap, but this definitely doesn't seem ideal if people are trying to use Crostini for development.
 
I ran into failures 2-3 times with make -j4. The most recent time I tried building with make -j2, and memory usage hasn't bloated in the same way. I'm not sure whether specific targets being built at that point caused the issue, or whether it was -j4 rather than -j2.
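Peak build memory grows roughly with the number of compiler processes running at once, which would be consistent with -j2 surviving where -j4 did not. Note also that with GNU make, a bare "make -j" (as in the repro steps) places no limit on the number of simultaneous jobs. A minimal sketch for picking a job count from available memory; the 2 GiB per-job figure is a hypothetical assumption, not a measured dEQP number, and the function names are illustrative:

```python
import os


def available_memory_bytes() -> int:
    """Read MemAvailable from /proc/meminfo (Linux only), in bytes."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) * 1024  # value is reported in kB
    raise RuntimeError("MemAvailable not found in /proc/meminfo")


def safe_job_count(avail_bytes: int, per_job_bytes: int, cpus: int) -> int:
    """Cap parallel jobs so their combined peak memory fits in avail_bytes."""
    by_memory = max(1, avail_bytes // per_job_bytes)
    return max(1, min(cpus, by_memory))


if __name__ == "__main__":
    # Assumption: ~2 GiB peak per compiler process; measure for your own build.
    per_job = 2 * 1024**3
    jobs = safe_job_count(available_memory_bytes(), per_job, os.cpu_count() or 1)
    print(f"make -j{jobs}")
```

The job count is capped by both CPU count and memory, so a machine with plenty of cores but little free memory falls back toward -j1 instead of overcommitting.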
Labels: M-72
Owner: chirantan@chromium.org
Status: Assigned (was: Untriaged)
Chirantan, any ideas why the balloon didn't catch this before the host went OOM?
Cc: sonnyrao@chromium.org
The balloon driver will log to /var/log/messages when it changes the config.  Can you see what it says there?
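Pulling the balloon entries out of /var/log/messages amounts to a case-insensitive filter. A minimal sketch; the log path comes from the comment, but the assumption that the relevant lines contain the word "balloon" is ours, since the exact log wording isn't given here. The script name is hypothetical, and reading the file on the host may require root:

```python
import re
import sys

BALLOON_RE = re.compile(r"balloon", re.IGNORECASE)


def balloon_lines(lines):
    """Yield the lines that mention the balloon (case-insensitive)."""
    for line in lines:
        if BALLOON_RE.search(line):
            yield line.rstrip("\n")


if __name__ == "__main__":
    # Usage (on the host): python3 balloon_lines.py < /var/log/messages
    for entry in balloon_lines(sys.stdin):
        print(entry)
```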

Part of the problem is that VM memory is oversubscribed. On a 16GB device, the VM gets 12GB. However, if Chrome OS is already using 8GB, then the VM can really only use 8GB before we start hitting low memory. The balloon driver will kick in, but it first has to take back the ~4GB the guest isn't using before it actually starts reclaiming used memory. So if the allocations happen fast enough, it's definitely possible to still hit the OOM killer.
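The arithmetic in the previous paragraph can be written down as a small model. This is a simplified sketch that ignores kernel low-memory thresholds and other overheads; the function name is hypothetical:

```python
def oversubscription(host_total_gb: float, vm_size_gb: float, host_used_gb: float):
    """Simplified model of VM memory oversubscription (all values in GB).

    Returns (guest_usable, unused_to_reclaim): guest_usable is how much the
    guest can use before the host runs low; unused_to_reclaim is the part of
    the VM's allocation the balloon must take back before it starts
    reclaiming memory the guest is actually using.
    """
    guest_usable = max(0.0, host_total_gb - host_used_gb)
    unused_to_reclaim = max(0.0, vm_size_gb - guest_usable)
    return guest_usable, unused_to_reclaim


if __name__ == "__main__":
    # Numbers from the comment: 16GB device, 12GB VM, Chrome OS using 8GB.
    usable, reclaim = oversubscription(16, 12, 8)
    print(f"guest can use {usable} GB; balloon must first reclaim {reclaim} GB")
```

With 16, 12 and 8 this gives 8 GB usable and 4 GB of headroom the balloon has to claw back first, which is the window in which a fast allocation burst can outrun it.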

What we really need here is to be more proactive (rather than reactive) about memory that's assigned to the VM. Rather than waiting until there is memory pressure on the host to take back guest memory, we might keep track of how much memory is used in both the host and the guest and ensure that memory is not oversubscribed. At the same time, we don't want to limit VM memory too aggressively either.
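One way to express the proactive idea is a policy that sizes the balloon from current host and guest usage instead of waiting for host pressure. This is a hypothetical sketch, not how the Crostini balloon actually works: MemorySnapshot, host_reserve and guest_slack are illustrative names, and the rule "limit the guest to what the host can back, but never below its current usage plus some slack" is one possible reading of the trade-off described above:

```python
from dataclasses import dataclass


@dataclass
class MemorySnapshot:
    host_total: int  # physical RAM on the host
    host_used: int   # memory used by Chrome OS outside the VM
    guest_used: int  # memory the guest is actually using
    vm_size: int     # memory nominally assigned to the VM


def proactive_balloon_size(s: MemorySnapshot, host_reserve: int, guest_slack: int) -> int:
    """Hypothetical policy: how far to inflate the balloon ahead of host pressure.

    The guest is limited to what the host can actually back
    (host_total - host_used - host_reserve), but never squeezed below its
    current usage plus guest_slack, so the VM isn't limited too aggressively.
    """
    backable = s.host_total - s.host_used - host_reserve
    floor = s.guest_used + guest_slack
    allowed = min(s.vm_size, max(backable, floor))
    return s.vm_size - allowed  # allowed <= vm_size, so this is never negative


if __name__ == "__main__":
    snap = MemorySnapshot(host_total=16, host_used=8, guest_used=3, vm_size=12)
    print(proactive_balloon_size(snap, host_reserve=1, guest_slack=1))
```

When the guest's own usage exceeds what the host can back, the floor wins and the policy accepts some oversubscription rather than starving the guest; that is the "not too aggressive" half of the trade-off.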

+sonnyrao, since this seems like the kind of thing he might be interested in.


In any case, the logs from the balloon device would tell us a lot about what happened here.
Did the host actually do an OOM kill or was it in the guest?
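The "virtual memory exhausted" lines show allocations failing, but not whether a kernel OOM kill happened or on which side. The kernel logs its OOM kills, so one way to answer the question is to scan the guest's and the host's kernel logs separately. A minimal sketch; the regex covers the two message formats we know of, which vary by kernel version, and where to obtain each log (for example dmesg inside the VM versus on the Chrome OS host) is an assumption about the setup. The script name is hypothetical:

```python
import re
import sys

# Kernel OOM-killer lines; the exact wording varies by kernel version, e.g.
#   "Out of memory: Killed process 1234 (cc1plus) total-vm:..."
#   "Out of memory: Kill process 1234 (cc1plus) score 900 or sacrifice child"
OOM_RE = re.compile(r"Out of memory: Kill(?:ed)? process (\d+) \(([^)]+)\)")


def oom_kills(kernel_log: str):
    """Return (pid, command) pairs for every OOM kill found in a kernel log."""
    return [(int(pid), comm) for pid, comm in OOM_RE.findall(kernel_log)]


if __name__ == "__main__":
    # Usage: dmesg | python3 oom_kills.py   (run once in the guest, once on the host)
    for pid, comm in oom_kills(sys.stdin.read()):
        print(f"OOM kill: pid {pid} ({comm})")
```

A kill of a compiler process in the guest log points at a guest OOM; a kill of crosvm in the host log would mean the host killed the whole VM.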

For use cases like this, I wonder if it makes sense to let the guest do some swapping to its own disk. It'll be slower, but it won't OOM anything.
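Guest-side swapping only helps if the guest actually has swap configured, which shows up as a non-zero SwapTotal in the guest's /proc/meminfo. A minimal sketch for checking that; the function names are illustrative, and setting swap up (for example with the standard mkswap and swapon tools) is outside its scope:

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: value} (values mostly in kB)."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[name.strip()] = int(fields[0])
    return info


def has_swap(info: dict) -> bool:
    """True if the kernel reports any swap space."""
    return info.get("SwapTotal", 0) > 0


if __name__ == "__main__":
    # Run inside the guest to see whether it has swap to fall back on.
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    print("swap configured:", has_swap(info), f"({info.get('SwapTotal', 0)} kB)")
```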
I definitely don't have logs from it anymore, but I'll try to re-run it later in the week. I believe it was a guest kill.
