
Issue 914583


Issue metadata

Status: Untriaged
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug
Build-Toolchain




ccache gave me the wrong results

Project Member Reported by diand...@chromium.org, Dec 12

Issue description

I feel insane saying this, but I've done enough testing that I'm pretty confident...

I somehow managed to get my chroot into a state where ccache was causing me problems.  Specifically, I had been playing around with enabling various debug options for the kernel via USE flags.  I had just added in "debugobjects", so I was now doing:

time FEATURES="-buildpkg" USE="debugobjects kasan kmemleak lockdebug kgdb vtconsole" emerge-${BOARD} --nodeps chromeos-kernel-4_19

...and my board didn't boot and was crashing as per b/119071879 comment #10.  If I turned off "debugobjects" it seemed to work.  It also seemed to work if I reverted certain patches in my tree: either 2 patches related to the display driver or a series of patches related to iommus.  I didn't see the connection, but it was very consistent...

...so I set out to debug...

I finally ended up in kgdb and after a bunch of head banging I found that the problem was that two files had a different idea of the size of the same structure:

---

(gdb) frame 3
#3  0xffffff9008bddb00 in dpu_plane_init (dev=0xffffffc0d9343680, pipe=<optimized out>, type=DRM_PLANE_TYPE_PRIMARY, possible_crtcs=<optimized out>, master_plane_id=0)
    at /mnt/host/source/src/third_party/kernel/v4.19/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c:1547
1547            pdpu->pipe_hw = dpu_hw_sspp_init(pipe, kms->mmio, kms->catalog,
(gdb) print sizeof(struct dpu_power_handle)
$24 = 168

(gdb) frame 2
#2  0xffffff9008bbfa6c in dpu_hw_sspp_init (idx=SSPP_VIG0, addr=0x0, catalog=0xffffffc0d8c30080, is_virtual_pipe=<optimized out>)
    at /mnt/host/source/src/third_party/kernel/v4.19/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c:715
715                     kgdb_breakpoint();
(gdb) print sizeof(struct dpu_power_handle)
$25 = 176

---

...so I assumed the kernel incremental build was wrong.  I blew away "/build/cheza/var/cache/portage/sys-kernel/chromeos-kernel-4_19" and re-built.  I confirmed that my new build took up lots of CPU cycles so I'm pretty sure I blew away the right dir.

...and the problem persisted.


I kept poking.  I soon found that I could make even trivial changes to the files involved (even just adding braces to "if" statements or adding a char to an error message) and the problem would go away.  If I then stashed my changes, the problem came back!  Adding or changing comments didn't affect the problem, though.  This really pointed to ccache, which works on files after the C preprocessor strips comments.

...so I temporarily moved my ccache dir away:
  mv /var/cache/chromeos-cache/distfiles/ccache /var/cache/chromeos-cache/distfiles/ccache-x
  mkdir -p /var/cache/chromeos-cache/distfiles/ccache
  sudo chown dianders.portage /var/cache/chromeos-cache/distfiles/ccache

...and then I touched a comment.  The problem was fixed!

...and then I put the original ccache directory back and touched a comment again.  The problem came back.
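The move-away / put-back experiment above can be wrapped in a small helper so the original cache is always restored, even if the build fails. A sketch only: "with_scratch_ccache" is a hypothetical helper, the "-x" backup suffix simply mirrors the commands above, and copying ownership assumes GNU "chown --reference" (which may need sudo, as the original chown did):

```shell
#!/bin/sh
# Hypothetical helper: run a command with an empty ccache directory in place,
# then restore the original cache whether or not the command succeeded.
#
# Example (same paths as above):
#   with_scratch_ccache /var/cache/chromeos-cache/distfiles/ccache \
#       emerge-${BOARD} --nodeps chromeos-kernel-4_19
with_scratch_ccache() {
    cache_dir=$1
    shift
    backup="${cache_dir}-x"
    if [ -e "$backup" ]; then
        echo "refusing to overwrite existing $backup" >&2
        return 1
    fi
    mv "$cache_dir" "$backup" || return 1
    mkdir -p "$cache_dir"
    # Give the scratch dir the same owner/group as the real cache (GNU chown).
    chown --reference="$backup" "$cache_dir" ||
        echo "warning: could not copy ownership to $cache_dir" >&2
    # Run the build; record its status without letting "set -e" skip the restore.
    if "$@"; then
        cmd_status=0
    else
        cmd_status=$?
    fi
    # Throw away the scratch cache and put the original back.
    rm -rf "$cache_dir"
    mv "$backup" "$cache_dir"
    return "$cmd_status"
}
```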

===

I'm not sure I can create the problematic state at will, but I figured I'd document it for now since I spent all day debugging this...  If I see it again I'll add more details here...

===

FYI I've now cleared my ccache:

  CCACHE_DIR=/var/cache/chromeos-cache/distfiles/ccache ccache -C

...but I did tarball the ~10GB ccache first and threw the state of my tree at <https://chromium.googlesource.com/chromiumos/third_party/kernel/+log/refs/sandbox/dianders/181212-ccache-problems> in case someone has the interest to actually dig into this.
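Archiving first and clearing second is worth keeping as a fixed order when the cache might be needed as evidence later. A sketch of that sequence; "archive_then_clear_ccache" is a hypothetical helper, it assumes the "ccache" on PATH is the one the build uses, and "ccache -C" is the same clear command quoted above:

```shell
#!/bin/sh
# Hypothetical helper: tar up a ccache directory, then clear it only if the
# archive was written and can be read back.
#   $1 = ccache directory, $2 = archive file to create
archive_then_clear_ccache() {
    cache_dir=$1
    archive=$2
    parent=$(dirname "$cache_dir")
    name=$(basename "$cache_dir")
    # No compression here, to keep archiving a large cache quick;
    # add -z if disk space matters more.
    tar -cf "$archive" -C "$parent" "$name" || return 1
    # Sanity-check the archive before destroying anything.
    tar -tf "$archive" > /dev/null || return 1
    CCACHE_DIR=$cache_dir ccache -C
}
```

To bring the old state back later, extracting the archive into the cache's parent directory (for example "tar -xf ARCHIVE -C /var/cache/chromeos-cache/distfiles") recreates the directory under its original name.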
 
ccache bugs are pretty nasty when they happen. I have not seen one in a while...

Can you post the preprocessed files and compiler command lines for dpu_plane.c and dpu_hw_sspp.c?


team - what component can we assign to this bug so it does not show up in Chrome OS UI triage?
Components: Tools>ChromeOS-Toolchain
> Can you post the preprocessed files and compiler command lines for dpu_plane.c and dpu_hw_sspp.c?

I can try.  I'm not sure it's going to be terribly easy to inject myself into the kernel's build system to do this, but I can try to futz it...
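For the command lines at least, there may be no need to inject anything: kbuild records the exact command it used for each object in a hidden ".<object>.cmd" file next to the object, on a line starting with "cmd_<path> :=" (that prefix is what 4.19-era kernels use). A sketch that pulls that line out; "kbuild_cmdline" is a hypothetical helper, and the kernel build output directory for this setup (presumably somewhere under the /build/cheza/var/cache/portage/sys-kernel/chromeos-kernel-4_19 directory mentioned earlier) is an assumption. In principle, swapping "-c -o foo.o" for "-E -o foo.i" in the recovered command would then reproduce the preprocessed file:

```shell
#!/bin/sh
# Hypothetical helper: print the compiler command kbuild recorded for an object.
#   $1 = kernel build (output) directory
#   $2 = object path relative to it
#
# Example ($KBUILD_OUT is a placeholder for the build directory):
#   kbuild_cmdline "$KBUILD_OUT" drivers/gpu/drm/msm/disp/dpu1/dpu_plane.o
kbuild_cmdline() {
    build_dir=$1
    obj=$2
    cmd_file="$build_dir/$(dirname "$obj")/.$(basename "$obj").cmd"
    if [ ! -f "$cmd_file" ]; then
        echo "no $cmd_file (object not built yet?)" >&2
        return 1
    fi
    # The relevant line looks like: cmd_<obj> := <full compiler command>
    sed -n 's/^cmd_[^ ]* := //p' "$cmd_file"
}
```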
Components: Infra>Client>ChromeOS>Build
will add "build" as a component. ccache is not part of the toolchain but I don't know if it has a clear owner.
easiest is prob to update our ccache version and see if that addresses things.  otherwise, debugging ccache errors like this is a bit of a pita :/.
To generate the pre-processed files add the following to drivers/gpu/drm/msm/Makefile:

subdir-ccflags-y += -save-temps

there is probably a smaller hammer to only apply the flag to the files you are interested in, however I dunno how to specify that for files in sub-directories.
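On the smaller hammer: kbuild also honours per-object flags named "CFLAGS_<file>.o", where "<file>" is the object's basename without any directory, so in the 4.19 tree those lines should work from drivers/gpu/drm/msm/Makefile even though the sources live in disp/dpu1/. A sketch that generates them for just the two files asked about, adding "-Wno-error" as suggested further down the thread; "per_file_cflags" is a hypothetical helper. Two caveats: any other object built by that Makefile with the same basename gets the flags too, and as far as I know ccache treats "-save-temps" as an unsupported option and compiles such files uncached, which is exactly the worry raised below about the flag hiding the problem:

```shell
#!/bin/sh
# Hypothetical helper: emit kbuild per-object CFLAGS lines for the given sources.
#   $1    = flags to add
#   $2... = C source paths (only the basename matters to kbuild)
#
# Example (run from the kernel source root):
#   per_file_cflags "-save-temps -Wno-error" \
#       disp/dpu1/dpu_plane.c disp/dpu1/dpu_hw_sspp.c >> drivers/gpu/drm/msm/Makefile
per_file_cflags() {
    flags=$1
    shift
    for src in "$@"; do
        printf 'CFLAGS_%s.o += %s\n' "$(basename "$src" .c)" "$flags"
    done
}
```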

I wonder if this is because the msm display driver is weirdly integrated into the kernel build system and doesn't use the normal recursive makefile rules that all other kernel drivers use. Instead it splits up the driver into many sub-directories and then combines them all into one big module. It's been on my todo list to undo this and make either individual modules for each sub-component of the larger graphics "card" driver or flatten the hierarchy out and put things into one directory instead of so many sub-directories.

Either way, does the ccache corruption happen anywhere else in the kernel? Because if it does then I'm totally wrong and this isn't related to the problem at hand. Otherwise, it's all the more reason to fix the way this driver is built in the kernel.
> subdir-ccflags-y += -save-temps

Oddly this causes the compile to fail.  

/mnt/host/source/src/third_party/kernel/v4.19/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c:1739:45: error: address of 'dpu_enc->base' will always evaluate to 'true'
      [-Werror,-Wpointer-bool-conversion]
 trace_dpu_enc_vsync_event_work(((&dpu_enc->base) ? (&dpu_enc->base)->base.id : -1), wakeup_time);
                                   ~~~~~~~~~^~~~  ~

...which seems like a bug, but why did it only get reported when I turned on -save-temps?  If I add "-save-temps" more globally I start getting tons of warnings.  Do you want me to file a bug about this, or is it expected?

In any case I'm slightly worried that adding "-save-temps" will be enough to make ccache stop showing the problem.  

===

I'm also a little worried because, without thinking, I did a "repo sync" today.  That probably means my compiler changed and (hopefully) ccache will ignore all the old entries.  I'll try to repro anyway...  Doh, looks like I can't...
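Why a compiler update should make the old entries unreachable: as I understand the ccache documentation, its default "compiler_check" setting is "mtime", meaning the compiler binary's size and modification time go into every hash, so a new toolchain from "repo sync" yields different keys and the old entries are simply never hit. Running "ccache -z" before a build and "ccache -s" after it shows whether anything was actually served from the cache. A toy illustration of the mtime idea, using a stand-in file rather than a real compiler; "compiler_identity" is a hypothetical name, and GNU "stat -c" and "touch -d" are assumed:

```shell
#!/bin/sh
# Toy model of an mtime-style compiler check: the "identity" of a compiler is
# its size plus modification time, so replacing the binary changes the value.
compiler_identity() {
    stat -c '%s %Y' "$1"
}

work=$(mktemp -d)
fake_cc="$work/clang"
printf 'old toolchain\n' > "$fake_cc"
touch -d '2018-12-01 00:00:00' "$fake_cc"
before=$(compiler_identity "$fake_cc")

# Simulate a toolchain upgrade: new contents, new mtime.
printf 'new toolchain build\n' > "$fake_cc"
touch -d '2018-12-14 00:00:00' "$fake_cc"
after=$(compiler_identity "$fake_cc")

echo "before: $before"
echo "after:  $after"
rm -rf "$work"
```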

===

I'm inclined to wait to see if this type of thing happens again and/or upgrade ccache.  ...then refer back to this.



Can you pass "-Wno-error" in addition to "-save-temps"? I have seen clang produce extra warnings with "-save-temps" that can cause compiles to fail.
I find getting the new warnings only under -save-temps very weird. 
Can you file a separate bug for that? 

The only instance of that I have seen before is documented in https://bugs.chromium.org/p/chromium/issues/detail?id=649740


and there is an update to that in http://peter.eisentraut.org/blog/2014/12/01/ccache-and-clang-part-3/

but I cannot see how what dianders is getting is related to that.


@12: created bug #916736

===

I'll also note that with my chroot upgrade my old ccache doesn't seem to repro the problem anymore.  Sigh.  If I see it again I'll try to find a way to gather more data.  
