ccache gave me the wrong results
Issue description
I feel insane saying this, but I've done enough testing that I'm pretty confident...
I somehow managed to get my chroot into a state where ccache was causing me problems. Specifically, I had been playing around with enabling various debug options for the kernel via USE flags. I had just added in "debugobjects", so I was now doing:
time FEATURES="-buildpkg" USE="debugobjects kasan kmemleak lockdebug kgdb vtconsole" emerge-${BOARD} --nodeps chromeos-kernel-4_19
...and my board didn't boot and was crashing as per b/119071879 comment #10. If I turned off "debugobjects" it seemed to work. Also if I reverted various patches in my tree it also seemed to work. I could either revert 2 patches related to the display driver or a series of patches related to iommus. I didn't see the connection but it was very consistent...
...so I set out to debug...
I finally ended up in kgdb and after a bunch of head banging I found that the problem was that two files had a different idea of the size of the same structure:
---
(gdb) frame 3
#3 0xffffff9008bddb00 in dpu_plane_init (dev=0xffffffc0d9343680, pipe=<optimized out>, type=DRM_PLANE_TYPE_PRIMARY, possible_crtcs=<optimized out>, master_plane_id=0)
at /mnt/host/source/src/third_party/kernel/v4.19/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c:1547
1547 pdpu->pipe_hw = dpu_hw_sspp_init(pipe, kms->mmio, kms->catalog,
(gdb) print sizeof(struct dpu_power_handle)
$24 = 168
(gdb) frame 2
#2 0xffffff9008bbfa6c in dpu_hw_sspp_init (idx=SSPP_VIG0, addr=0x0, catalog=0xffffffc0d8c30080, is_virtual_pipe=<optimized out>)
at /mnt/host/source/src/third_party/kernel/v4.19/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_sspp.c:715
715 kgdb_breakpoint();
(gdb) print sizeof(struct dpu_power_handle)
$25 = 176
---
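To illustrate why a size mismatch like the one in the kgdb session above is dangerous, here is a minimal sketch using Python's ctypes. Only the sizes 168 and 176 come from the session. The container structs, the `mmio` field placed after the handle, and the address value are hypothetical and are not taken from the dpu driver. The point is only that a field following the mismatched struct sits at a different offset in each view, so one side reads bytes the other side never wrote.

```python
import ctypes


class KmsAsWriterSeesIt(ctypes.Structure):
    # Hypothetical container: the embedded handle is 176 bytes in this view.
    _fields_ = [("handle", ctypes.c_uint8 * 176), ("mmio", ctypes.c_uint64)]


class KmsAsReaderSeesIt(ctypes.Structure):
    # Same hypothetical container, but the handle is only 168 bytes here.
    _fields_ = [("handle", ctypes.c_uint8 * 168), ("mmio", ctypes.c_uint64)]


writer_view = KmsAsWriterSeesIt()
writer_view.mmio = 0xFFFFFF8000000000  # made-up address value

# Reinterpret the same bytes using the other layout.
raw = bytes(writer_view)
reader_view = KmsAsReaderSeesIt.from_buffer_copy(raw)

writer_offset = KmsAsWriterSeesIt.mmio.offset
reader_offset = KmsAsReaderSeesIt.mmio.offset
print(writer_offset, reader_offset, hex(reader_view.mmio))
```

In this toy setup the reader looks 8 bytes too early and sees zero instead of the stored value. Whether anything like this is what actually happened in the driver is not established by this report.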
...so I assumed the kernel incremental build was wrong. I blew away "/build/cheza/var/cache/portage/sys-kernel/chromeos-kernel-4_19" and re-built. I confirmed that my new build took up lots of CPU cycles so I'm pretty sure I blew away the right dir.
...and the problem persisted.
I kept poking. I soon found that I could make even trivial changes to the files involved (even just adding braces to "if" statements or adding a char to an error message) and the problem would go away. If I then stashed my changes the problems came back! Adding / changing comments didn't affect the problem though. This really pointed to ccache, which works on files after the C preprocessor strips comments.
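The reasoning above can be sketched with a toy model. This is not ccache's actual hashing (the real key also covers things like the compiler identity), just an assumed simplification: the cache key is a hash of the comment-stripped source plus the flags. The function names and the flags below are hypothetical.

```python
import hashlib
import re

# Toy stand-in for the preprocessor: removes /* ... */ and // ... comments only.
# A real preprocessor also expands macros and includes; this is just enough
# to show the effect described above.
_COMMENT_RE = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)


def toy_preprocess(source: str) -> str:
    return _COMMENT_RE.sub("", source)


def toy_cache_key(source: str, flags: tuple) -> str:
    """Hash the comment-free source together with the compiler flags."""
    h = hashlib.sha256()
    h.update("\0".join(flags).encode())
    h.update(b"\0")
    h.update(toy_preprocess(source).encode())
    return h.hexdigest()


original = "if (err) return err; /* bail out */\n"
comment_edit = "if (err) return err; /* bail out early */\n"
brace_edit = "if (err) { return err; } /* bail out */\n"
flags = ("-O2", "-DDEBUG")  # hypothetical flags

# A comment-only edit maps to the same key, so a stale entry would be reused.
same_key = toy_cache_key(original, flags) == toy_cache_key(comment_edit, flags)
# Adding braces changes the key, so the stale entry is bypassed.
new_key = toy_cache_key(original, flags) != toy_cache_key(brace_edit, flags)
print(same_key, new_key)
```

Under this model a comment edit keeps hitting the same (possibly bad) cache entry, while any change to the code itself forces a fresh compile, which matches the behavior described.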
...so I temporarily moved my ccache dir away:
mv /var/cache/chromeos-cache/distfiles/ccache /var/cache/chromeos-cache/distfiles/ccache-x
mkdir -p /var/cache/chromeos-cache/distfiles/ccache
sudo chown dianders.portage /var/cache/chromeos-cache/distfiles/ccache
...and then I touched a comment. The problem was fixed!
...and then I put the original ccache dir back and touched a comment again. The problem came back.
===
I'm not sure I can create the problematic state at will, but I figured I'd document it for now since I spent all day debugging this... If I see it again I'll add more details here...
===
FYI I've now cleared my ccache:
CCACHE_DIR=/var/cache/chromeos-cache/distfiles/ccache ccache -C
...but I did tarball the ~10GB ccache first and threw the state of my tree at <https://chromium.googlesource.com/chromiumos/third_party/kernel/+log/refs/sandbox/dianders/181212-ccache-problems> in case someone has the interest to actually dig into this.
,
Dec 14
team - what component can we assign to this bug so it does not show up in Chrome OS UI triage?
,
Dec 14
,
Dec 14
> Can you post the preprocessed files and compiler command lines for dpu_plane.c and dpu_hw_sspp.c?
I can try. I'm not sure it's going to be terribly easy to inject myself into the kernel's build system to do this, but I can try to futz it...
,
Dec 14
will add "build" as a component. ccache is not part of the toolchain but I don't know if it has a clear owner.
,
Dec 14
easiest is prob to update our ccache version and see if that addresses things. otherwise, debugging ccache errors like this is a bit of a pita :/.
,
Dec 14
To generate the pre-processed files add the following to drivers/gpu/drm/msm/Makefile:
subdir-ccflags-y += -save-temps
There is probably a smaller hammer to only apply the flag to the files you are interested in, however I dunno how to specify that for files in sub-directories.
,
Dec 14
I wonder if this is because the msm display driver is weirdly integrated into the kernel build system and doesn't use the normal recursive makefile rules that all other kernel drivers use. Instead it splits up the driver into many sub-directories and then combines them all into one big module. It's been on my todo list to undo this and make either individual modules for each sub-component of the larger graphics "card" driver or flatten the hierarchy out and put things into one directory instead of so many sub-directories. Either way, does the ccache corruption happen anywhere else in the kernel? Because if it does then I'm totally wrong and this isn't related to the problem at hand. Otherwise, it's all the more reason to fix the way this driver is built in the kernel.
,
Dec 19
> subdir-ccflags-y += -save-temps
Oddly this causes the compile to fail.
/mnt/host/source/src/third_party/kernel/v4.19/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c:1739:45: error: address of 'dpu_enc->base' will always evaluate to 'true'
[-Werror,-Wpointer-bool-conversion]
trace_dpu_enc_vsync_event_work(((&dpu_enc->base) ? (&dpu_enc->base)->base.id : -1), wakeup_time);
~~~~~~~~~^~~~ ~
...which seems like a bug, but why did it only get reported when I turned on -save-temps? If I add "-save-temps" more globally I start getting tons of warnings. Do you want me to file a bug about this, or is it expected?
In any case I'm slightly worried that adding "-save-temps" will be enough to make ccache stop showing the problem.
===
I'm also a little worried because I didn't think and I did a "repo sync" today. That probably means my compiler changed and (hopefully) ccache will ignore all the old stuff. I'll try to repro anyway... Doh, looks like I can't...
===
I'm inclined to wait to see if this type of thing happens again and/or upgrade ccache. ...then refer back to this.
,
Dec 19
Can you pass "-Wno-error" in addition to "-save-temps"? I have seen that clang produces extra warnings with "-save-temps" that can cause compiles to fail.
,
Dec 19
I find getting the new warnings only under -save-temps very weird. Can you file a separate bug for that? The only instance of that I have seen before is documented in https://bugs.chromium.org/p/chromium/issues/detail?id=649740
,
Dec 19
and there is an update to that in http://peter.eisentraut.org/blog/2014/12/01/ccache-and-clang-part-3/ but I cannot see how what dianders is getting is related to that.
,
Dec 19
@12: created bug #916736
===
I'll also note that with my chroot upgrade my old ccache doesn't seem to repro the problem anymore. Sigh. If I see it again I'll try to find a way to gather more data.
Comment 1 by lloz...@google.com
, Dec 13