platform_AddPrinter fails with with ERROR: Server never terminated |
|||||||||
Issue descriptionhttps://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8934887945423558336 https://stainless.corp.google.com/browse/chromeos-autotest-results/239876830-chromeos-test/ ... 09/20 02:20:59.086 ERROR| logging_manager:0626| Testname: platform_AddPrinter 09/20 02:20:59.086 ERROR| logging_manager:0626| ERROR: Server never terminated 09/20 02:20:59.086 ERROR| logging_manager:0626| tko parser: parsing test None CLIENT_JOB.0 ... ... 09/20 02:20:41.904 INFO |platform_AddPrinte:0113| scheduler is running no system default destination device for FakePrinterID: socket://127.0.0.1/ FakePrinterID accepting requests since Thu Sep 20 02:20:41 2018 printer FakePrinterID is idle. enabled since Thu Sep 20 02:20:41 2018 09/20 02:20:41.905 DEBUG| utils:0219| Running 'lp -d FakePrinterID /usr/local/autotest/tests/platform_AddPrinter/to_print.pdf' 09/20 02:20:51.931 DEBUG| test:0381| Test failed due to ERROR: Server never terminated. Exception log follows the after_iteration_hooks. 09/20 02:20:51.932 DEBUG| test:0386| Starting after_iteration_hooks for platform_AddPrinter 09/20 02:20:51.933 DEBUG| base_sysinfo:0124| Loggable saves logs to /usr/local/autotest/results/default/platform_AddPrinter/sysinfo/iteration.1/interrupts.after 09/20 02:20:51.933 DEBUG| utils:0219| Running 'mkdir -p /usr/local/autotest/results/default/platform_AddPrinter/sysinfo/iteration.1/var/spool' 09/20 02:20:51.940 DEBUG| utils:0219| Running 'rsync --no-perms --chmod=ugo+r -a --safe-links --exclude=/crash/**autoserv* --exclude=/crash/*.core /var/spool/crash /usr/local/autotest/results/default/platform_AddPrinter/sysinfo/iteration.1/var/spool' 09/20 02:20:51.948 DEBUG| utils:0219| Running 'rm -rf /var/spool/crash/*' 09/20 02:20:51.955 DEBUG| base_sysinfo:0124| Loggable saves logs to /usr/local/autotest/results/default/platform_AddPrinter/sysinfo/iteration.1/meminfo.after 09/20 02:20:51.957 DEBUG| base_sysinfo:0124| Loggable saves logs to /usr/local/autotest/results/default/platform_AddPrinter/sysinfo/iteration.1/slabinfo.after 09/20 02:20:51.957 DEBUG| base_sysinfo:0124| Loggable saves logs to /usr/local/autotest/results/default/platform_AddPrinter/sysinfo/iteration.1/schedstat.after 09/20 02:20:51.957 DEBUG| utils:0219| Running 'logger "autotest finished iteration /usr/local/autotest/results/default/platform_AddPrinter/sysinfo/iteration.1"' 09/20 02:20:51.963 DEBUG| test:0391| after_iteration_hooks completed 09/20 02:20:51.963 WARNI| test:0606| The test failed with the following exception Traceback (most recent call last): File "/usr/local/autotest/common_lib/test.py", line 600, in _exec _call_test_function(self.execute, *p_args, **p_dargs) File "/usr/local/autotest/common_lib/test.py", line 800, in _call_test_function return func(*args, **dargs) File "/usr/local/autotest/common_lib/test.py", line 464, in execute postprocess_profiled_run, args, dargs) File "/usr/local/autotest/common_lib/test.py", line 371, in _call_run_once self.run_once(*args, **dargs) File "/usr/local/autotest/tests/platform_AddPrinter/platform_AddPrinter.py", line 189, in run_once self.print_a_page(os.path.join(current_dir, golden_file)); File "/usr/local/autotest/tests/platform_AddPrinter/platform_AddPrinter.py", line 123, in print_a_page raise error.TestFail('ERROR: Server never terminated') TestFail: ERROR: Server never terminated 09/20 02:20:51.969 DEBUG| test:0611| Running cleanup for test. 09/20 02:20:51.971 DEBUG| logging_manager:0627| Logging subprocess finished ...
,
Sep 20
I don't see anything obvious here https://chromium.googlesource.com/chromium/src/+log/71.0.3556.0..71.0.3557.0?n=10000 at least not related to CUPS, there are a few print preview related changes. Actually it looks like something broke on Chrome OS. https://stainless.corp.google.com/search?view=matrix&row=build&col=test&first_date=2018-09-13&last_date=2018-09-27&suite=%5CQbvt-cq%5CE&test=platform_AddPrinter&exclude_cts=false&exclude_not_run=false&exclude_non_release=true&exclude_au=false&exclude_acts=true&exclude_retried=false&exclude_non_production=false starting in 11082. So something in https://crosland.corp.google.com/log/11081.0.0..11082.0.0 the only changes that mention cups are from vapier@.
,
Sep 20
We appear to have a hole in our commit queue, I am not able to find any runs of the platform_AddPrinter test in the CQ run that committed the CLs that mention CUPS, this does not appear to exist in bvt-tast-cq and I suspect we mistakenly only run that and not bvt-cq in the CQ :-/. We probably need to revert the changes that look suspect, for now I will focus on sealing off the CQ hole.
,
Sep 20
,
Sep 20
platform_AddPrinter is also failing in all PFQ informational builders running HWTest: peach_pit-tot-chrome-pfq-informational tricky-tot-chrome-pfq-informationa eve-tot-chrome-pfq-informational veyron_minnie-tot-chrome-pfq-informational
,
Sep 20
Looking for a narrow list of suspect changes (chrome and cros) in the pfq-informational builders.
,
Sep 20
I am fairly confident this is a CrOS side change, between 11081 and 11082 we have the same Chrome version, but we see the test fail.
,
Sep 20
I don't have a deep understanding of which suites we run where, but from http://stainless/search?view=matrix&row=builder_name&col=build&first_date=2018-08-01&last_date=2018-09-01&builder_name=-paladin%24&suite=%5Ebvt-cq%24&exclude_cts=false&exclude_not_run=false&exclude_non_release=false&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false, I think that bvt-cq stopped running on -paladin builders after R70-10932.0.0-rc1 on Aug 3 UTC. This lines up with my https://crrev.com/c/1146146, which introduced some bvt-tast suites and updated where we run them. I'll need some help debugging this from people who understand chromeos_config.py. I don't know that we can turn off the Tast suites without causing problems (e.g. Tast tests run on the Chrome CQ, so we should run them on the Chrome OS CQ to protect that).
,
Sep 20
We still need to fix the printer failure, we can do that here, and fork the CQ coverage problem to https://bugs.chromium.org/p/chromium/issues/detail?id=887536
,
Sep 20
bhthompson@ - 11081 also failed in addPrinter. I'm looking at pfq-informational diffs and it seems like there are no CrOS changes between failures, even though this doesn't seem like a failure that could eb caused by a chrome change. I'm trying to verify that.
,
Sep 20
platform_AddPrinter.generic is supposed to be run on CQ while platform_AddPrinter.epson is not since it downloads Epson driver from production Omaha. If the problem is platform_AddPrinter.generic, then I think most likely it's the CrOS side.
,
Sep 20
The actual test failed is platform_AddPrinter.generic
,
Sep 20
It looks like pdftopdf failed? /var/log/message: 2018-09-20T09:20:45.948080+00:00 DEBUG cupsd[3316]: [Job 1] envp[29]="AUTH_I****" 2018-09-20T09:20:45.948093+00:00 DEBUG cupsd[3316]: [Job 1] STATE: +connecting-to-device 2018-09-20T09:20:45.948106+00:00 DEBUG cupsd[3316]: [Job 1] Looking up \"127.0.0.1\"... 2018-09-20T09:20:45.948118+00:00 DEBUG cupsd[3316]: [Job 1] WARN: not compiled with DBus support 2018-09-20T09:20:45.948131+00:00 DEBUG cupsd[3316]: [Job 1] \'CM Color Calibration\' Mode in SPOOLER-LESS: Off 2018-09-20T09:20:45.948143+00:00 DEBUG cupsd[3316]: [Job 1] Getting input from file 2018-09-20T09:20:45.948156+00:00 DEBUG cupsd[3316]: [Job 1] foomatic-rip version 1.17.8 running... 2018-09-20T09:20:45.948274+00:00 DEBUG cupsd[3316]: [Job 1] Process is dying with \"Unable to open PPD file /var/spool/cups/tmp/5ba36669e648d 2018-09-20T09:20:45.948292+00:00 DEBUG cupsd[3316]: [Job 1] \", exit stat 9 2018-09-20T09:20:45.948304+00:00 DEBUG cupsd[3316]: [Job 1] Cleaning up... 2018-09-20T09:20:45.948316+00:00 DEBUG cupsd[3316]: [Job 1] pdftopdf: Last filter determined by the PPD: foomatic-rip; FINAL_CONTENT_TYPE: application/vnd.cups-pdf => pdftopdf will log pages in page_log. 2018-09-20T09:20:45.948328+00:00 DEBUG cupsd[3316]: [Job 1] PAGE: 1 1 2018-09-20T09:20:45.948341+00:00 DEBUG cupsd[3316]: [Job 1] PID 2 (/usr/libexec/cups/filter/pdftopdf) did not catch or ignore signal 13. 2018-09-20T09:20:45.948580+00:00 DEBUG cupsd[3316]: [Job 1] hrDeviceDesc=\"Unknown\" 2018-09-20T09:20:45.948604+00:00 DEBUG cupsd[3316]: [Job 1] prtGeneralCurrentLocalization type is 0, expected 2! 2018-09-20T09:20:45.948617+00:00 DEBUG cupsd[3316]: [Job 1] backendWaitLoop(snmp_fd=8, addr=0x7f08b9bf5c08, side_cb=0x7f08b98abb00) 2018-09-20T09:20:45.948631+00:00 DEBUG cupsd[3316]: [Job 1] End of messages 2018-09-20T09:20:45.948645+00:00 DEBUG cupsd[3316]: [Job 1] printer-state=3(idle) 2018-09-20T09:20:45.948659+00:00 DEBUG cupsd[3316]: [Job 1] printer-state-message="Filter failed" 2018-09-20T09:20:45.948674+00:00 DEBUG cupsd[3316]: [Job 1] printer-state-reasons=none
,
Sep 20
xiaochu@ - Somehow it doesn't *seem* like there were any CrOS changes between successful and failed runs. I narrowed down the potential chrome change failures to: #523533 - #592641 I'm attempting a bisect on tricky: https://cros-goldeneye.corp.google.com/chromeos/legoland/builderHistory?buildConfig=tricky-tot-chrome-pfq-informational-tryjob&buildBranch=master cros tryjob commands: 10042 [2018-09-20 10:02:20 -0700] cros tryjob tricky-tot-chrome-pfq-informational-tryjob --chrome_version 4f1097d5a1eacbc383307664fca37365410abfa6 10043 [2018-09-20 10:02:57 -0700] cros tryjob tricky-tot-chrome-pfq-informational-tryjob --chrome_version 6d9de2f10b8588204101b30085b7e9c62bafba57 10044 [2018-09-20 10:03:37 -0700] cros tryjob tricky-tot-chrome-pfq-informational-tryjob --chrome_version 7f6fac7d516212457e6c093b8db016423ba2f85f 10045 [2018-09-20 10:04:09 -0700] cros tryjob tricky-tot-chrome-pfq-informational-tryjob --chrome_version 2457438e33bea6b89c3638226e6ac9824b61a80f 10046 [2018-09-20 10:04:38 -0700] cros tryjob tricky-tot-chrome-pfq-informational-tryjob --chrome_version 406eb8d071e631e9bbbed0149e8d8a94b1aad033
,
Sep 20
I'm suspecting this change but not exactly sure: https://chromium-review.googlesource.com/c/chromiumos/overlays/chromiumos-overlay/+/1232833 Reason: Process is dying with \"Unable to open PPD file /var/spool/cups/tmp/5ba36669e648d
,
Sep 20
If we think that is it, let us revert first, we can always reland. https://chromium-review.googlesource.com/c/chromiumos/overlays/chromiumos-overlay/+/1236994
,
Sep 20
This is still failing even after the revert from comment #16 :( https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8934844191606918272
,
Sep 20
stevenjb@ is pfq running CrOS 11084 now? It looks like starting from 11084, platform_AddPrinter is green again: https://stainless.corp.google.com/search?test=%5Eplatform%5C_AddPrinter%5C.generic%24&exclude_non_release=true&exclude_cts=true&col=build&row=board_model&view=matrix&first_date=2018-09-14&last_date=2018-09-20
,
Sep 20
I'm not sure what builders those are for exactly, but I am looking at the pfq-informational builders, which use "tot" Chrome OS + tot Chrome: https://cros-goldeneye.corp.google.com/chromeos/legoland/builderSummary?builderGroups=chrome_pfq%2Cchrome_informational&limit=4&buildBranch=master
,
Sep 20
We're in the process of updating our ghostscript package. It shouldn't have slipped through CQ though.
,
Sep 20
In the failed logs, looks like there's a denied symlink traversal at /var/spool/cups/tmp/...
,
Sep 20
,
Sep 20
it's likely the reverted CL should have fixed any symlink failures under /var/spool/cups
,
Sep 20
Latest run in stainless is clean. Closing this, will get back to patching CUPS.
,
Sep 20
vapier@ - This failed after the revert in comment #16. Did you mean a different revert? (Failing build https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8934844191606918272 inludes chromiumos-overlay @ 25eadd23 which includes "25eadd2 Revert "cups: tighten symlink exception to /var/spool/cups" by Bernie Thompson ยท" : https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+log/25eadd23 skau@: What "latest run in stainless" ?
,
Sep 21
R71-11084.0.0 indicates passing. go/xlmnh
,
Sep 21
I'm not familiar with stainless, and I can't find a way to get back to the builder... are those results from the 'release' builders? I didn't realize that the failure was showing up there also. It looks like the pfq-informational builders just picked up post 11084 builds, so hopefully they will start passing (or at least stop failing with this symptom). |
|||||||||
►
Sign in to add a comment |
|||||||||
Comment 1 by bhthompson@google.com
, Sep 20