New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 887500 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner:
Closed: Sep 20
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



Sign in to add a comment

platform_AddPrinter fails with with ERROR: Server never terminated

Project Member Reported by bhthompson@google.com, Sep 20

Issue description

https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8934887945423558336
https://stainless.corp.google.com/browse/chromeos-autotest-results/239876830-chromeos-test/

...
09/20 02:20:59.086 ERROR|   logging_manager:0626| Testname: platform_AddPrinter
09/20 02:20:59.086 ERROR|   logging_manager:0626| ERROR: Server never terminated
09/20 02:20:59.086 ERROR|   logging_manager:0626| tko parser: parsing test None CLIENT_JOB.0
...

...
09/20 02:20:41.904 INFO |platform_AddPrinte:0113| scheduler is running
no system default destination
device for FakePrinterID: socket://127.0.0.1/
FakePrinterID accepting requests since Thu Sep 20 02:20:41 2018
printer FakePrinterID is idle.  enabled since Thu Sep 20 02:20:41 2018
09/20 02:20:41.905 DEBUG|             utils:0219| Running 'lp -d FakePrinterID /usr/local/autotest/tests/platform_AddPrinter/to_print.pdf'
09/20 02:20:51.931 DEBUG|              test:0381| Test failed due to ERROR: Server never terminated. Exception log follows the after_iteration_hooks.
09/20 02:20:51.932 DEBUG|              test:0386| Starting after_iteration_hooks for platform_AddPrinter
09/20 02:20:51.933 DEBUG|      base_sysinfo:0124| Loggable saves logs to /usr/local/autotest/results/default/platform_AddPrinter/sysinfo/iteration.1/interrupts.after
09/20 02:20:51.933 DEBUG|             utils:0219| Running 'mkdir -p /usr/local/autotest/results/default/platform_AddPrinter/sysinfo/iteration.1/var/spool'
09/20 02:20:51.940 DEBUG|             utils:0219| Running 'rsync --no-perms --chmod=ugo+r -a --safe-links --exclude=/crash/**autoserv* --exclude=/crash/*.core /var/spool/crash /usr/local/autotest/results/default/platform_AddPrinter/sysinfo/iteration.1/var/spool'
09/20 02:20:51.948 DEBUG|             utils:0219| Running 'rm -rf /var/spool/crash/*'
09/20 02:20:51.955 DEBUG|      base_sysinfo:0124| Loggable saves logs to /usr/local/autotest/results/default/platform_AddPrinter/sysinfo/iteration.1/meminfo.after
09/20 02:20:51.957 DEBUG|      base_sysinfo:0124| Loggable saves logs to /usr/local/autotest/results/default/platform_AddPrinter/sysinfo/iteration.1/slabinfo.after
09/20 02:20:51.957 DEBUG|      base_sysinfo:0124| Loggable saves logs to /usr/local/autotest/results/default/platform_AddPrinter/sysinfo/iteration.1/schedstat.after
09/20 02:20:51.957 DEBUG|             utils:0219| Running 'logger "autotest finished iteration /usr/local/autotest/results/default/platform_AddPrinter/sysinfo/iteration.1"'
09/20 02:20:51.963 DEBUG|              test:0391| after_iteration_hooks completed
09/20 02:20:51.963 WARNI|              test:0606| The test failed with the following exception
Traceback (most recent call last):
  File "/usr/local/autotest/common_lib/test.py", line 600, in _exec
    _call_test_function(self.execute, *p_args, **p_dargs)
  File "/usr/local/autotest/common_lib/test.py", line 800, in _call_test_function
    return func(*args, **dargs)
  File "/usr/local/autotest/common_lib/test.py", line 464, in execute
    postprocess_profiled_run, args, dargs)
  File "/usr/local/autotest/common_lib/test.py", line 371, in _call_run_once
    self.run_once(*args, **dargs)
  File "/usr/local/autotest/tests/platform_AddPrinter/platform_AddPrinter.py", line 189, in run_once
    self.print_a_page(os.path.join(current_dir, golden_file));
  File "/usr/local/autotest/tests/platform_AddPrinter/platform_AddPrinter.py", line 123, in print_a_page
    raise error.TestFail('ERROR: Server never terminated')
TestFail: ERROR: Server never terminated
09/20 02:20:51.969 DEBUG|              test:0611| Running cleanup for test.
09/20 02:20:51.971 DEBUG|   logging_manager:0627| Logging subprocess finished
...
 
Owner: vapier@chromium.org
We appear to have a hole in our commit queue, I am not able to find any runs of the platform_AddPrinter test in the CQ run that committed the CLs that mention CUPS, this does not appear to exist in bvt-tast-cq and I suspect we mistakenly only run that and not bvt-cq in the CQ :-/. 

We probably need to revert the changes that look suspect, for now I will focus on sealing off the CQ hole. 
Cc: derat@chromium.org
platform_AddPrinter is also failing in all PFQ informational builders running HWTest:

peach_pit-tot-chrome-pfq-informational
tricky-tot-chrome-pfq-informationa
eve-tot-chrome-pfq-informational
veyron_minnie-tot-chrome-pfq-informational

Looking for a narrow list of suspect changes (chrome and cros) in the pfq-informational builders.

I am fairly confident this is a CrOS side change, between 11081 and 11082 we have the same Chrome version, but we see the test fail.
Cc: akes...@chromium.org dgarr...@chromium.org jrbarnette@chromium.org
Components: -Internals>Printing>CUPS Infra>Client>ChromeOS>CI
Labels: -Pri-2 Pri-1
Owner: derat@chromium.org
Status: Assigned (was: Untriaged)
Summary: bvt-cq not running on -paladin builders since Aug 3 (was: Peppy Chrome PFQ builder failed platform_AddPrinter with ERROR: Server never terminated)
I don't have a deep understanding of which suites we run where, but from http://stainless/search?view=matrix&row=builder_name&col=build&first_date=2018-08-01&last_date=2018-09-01&builder_name=-paladin%24&suite=%5Ebvt-cq%24&exclude_cts=false&exclude_not_run=false&exclude_non_release=false&exclude_au=true&exclude_acts=true&exclude_retried=true&exclude_non_production=false, I think that bvt-cq stopped running on -paladin builders after R70-10932.0.0-rc1 on Aug 3 UTC.

This lines up with my https://crrev.com/c/1146146, which introduced some bvt-tast suites and updated where we run them.

I'll need some help debugging this from people who understand chromeos_config.py. I don't know that we can turn off the Tast suites without causing problems (e.g. Tast tests run on the Chrome CQ, so we should run them on the Chrome OS CQ to protect that).
Owner: vapier@chromium.org
Summary: platform_AddPrinter fails with with ERROR: Server never terminated (was: bvt-cq not running on -paladin builders since Aug 3)
We still need to fix the printer failure, we can do that here, and fork the CQ coverage problem to https://bugs.chromium.org/p/chromium/issues/detail?id=887536
Cc: rbpotter@chromium.org
Owner: steve...@chromium.org
Status: Started (was: Assigned)
bhthompson@ - 11081 also failed in addPrinter. I'm looking at pfq-informational diffs and it seems like there are no CrOS changes between failures, even though this doesn't seem like a failure that could eb caused by a chrome change. I'm trying to verify that.

platform_AddPrinter.generic is supposed to be run on CQ while platform_AddPrinter.epson is not since it downloads Epson driver from production Omaha.

If the problem is platform_AddPrinter.generic, then I think most likely it's the CrOS side.
The actual test failed is platform_AddPrinter.generic
It looks like pdftopdf failed? 

/var/log/message:

2018-09-20T09:20:45.948080+00:00 DEBUG cupsd[3316]: [Job 1] envp[29]="AUTH_I****"
2018-09-20T09:20:45.948093+00:00 DEBUG cupsd[3316]: [Job 1] STATE: +connecting-to-device
2018-09-20T09:20:45.948106+00:00 DEBUG cupsd[3316]: [Job 1] Looking up \"127.0.0.1\"...
2018-09-20T09:20:45.948118+00:00 DEBUG cupsd[3316]: [Job 1] WARN: not compiled with DBus support
2018-09-20T09:20:45.948131+00:00 DEBUG cupsd[3316]: [Job 1] \'CM Color Calibration\' Mode in SPOOLER-LESS: Off
2018-09-20T09:20:45.948143+00:00 DEBUG cupsd[3316]: [Job 1] Getting input from file 
2018-09-20T09:20:45.948156+00:00 DEBUG cupsd[3316]: [Job 1] foomatic-rip version 1.17.8 running...
2018-09-20T09:20:45.948274+00:00 DEBUG cupsd[3316]: [Job 1] Process is dying with \"Unable to open PPD file /var/spool/cups/tmp/5ba36669e648d
2018-09-20T09:20:45.948292+00:00 DEBUG cupsd[3316]: [Job 1] \", exit stat 9
2018-09-20T09:20:45.948304+00:00 DEBUG cupsd[3316]: [Job 1] Cleaning up...
2018-09-20T09:20:45.948316+00:00 DEBUG cupsd[3316]: [Job 1] pdftopdf: Last filter determined by the PPD: foomatic-rip; FINAL_CONTENT_TYPE: application/vnd.cups-pdf => pdftopdf will log pages in page_log.
2018-09-20T09:20:45.948328+00:00 DEBUG cupsd[3316]: [Job 1] PAGE: 1 1
2018-09-20T09:20:45.948341+00:00 DEBUG cupsd[3316]: [Job 1] PID 2 (/usr/libexec/cups/filter/pdftopdf) did not catch or ignore signal 13.
2018-09-20T09:20:45.948580+00:00 DEBUG cupsd[3316]: [Job 1] hrDeviceDesc=\"Unknown\"
2018-09-20T09:20:45.948604+00:00 DEBUG cupsd[3316]: [Job 1] prtGeneralCurrentLocalization type is 0, expected 2!
2018-09-20T09:20:45.948617+00:00 DEBUG cupsd[3316]: [Job 1] backendWaitLoop(snmp_fd=8, addr=0x7f08b9bf5c08, side_cb=0x7f08b98abb00)
2018-09-20T09:20:45.948631+00:00 DEBUG cupsd[3316]: [Job 1] End of messages
2018-09-20T09:20:45.948645+00:00 DEBUG cupsd[3316]: [Job 1] printer-state=3(idle)
2018-09-20T09:20:45.948659+00:00 DEBUG cupsd[3316]: [Job 1] printer-state-message="Filter failed"
2018-09-20T09:20:45.948674+00:00 DEBUG cupsd[3316]: [Job 1] printer-state-reasons=none


xiaochu@ - Somehow it doesn't *seem* like there were any CrOS changes between successful and failed runs.

I narrowed down the potential chrome change failures to:
#523533 - #592641

I'm attempting a bisect on tricky:

https://cros-goldeneye.corp.google.com/chromeos/legoland/builderHistory?buildConfig=tricky-tot-chrome-pfq-informational-tryjob&buildBranch=master

cros tryjob commands:

10042  [2018-09-20 10:02:20 -0700] cros tryjob tricky-tot-chrome-pfq-informational-tryjob --chrome_version 4f1097d5a1eacbc383307664fca37365410abfa6
10043  [2018-09-20 10:02:57 -0700] cros tryjob tricky-tot-chrome-pfq-informational-tryjob --chrome_version 6d9de2f10b8588204101b30085b7e9c62bafba57
10044  [2018-09-20 10:03:37 -0700] cros tryjob tricky-tot-chrome-pfq-informational-tryjob --chrome_version 7f6fac7d516212457e6c093b8db016423ba2f85f
10045  [2018-09-20 10:04:09 -0700] cros tryjob tricky-tot-chrome-pfq-informational-tryjob --chrome_version 2457438e33bea6b89c3638226e6ac9824b61a80f
10046  [2018-09-20 10:04:38 -0700] cros tryjob tricky-tot-chrome-pfq-informational-tryjob --chrome_version 406eb8d071e631e9bbbed0149e8d8a94b1aad033


I'm suspecting this change but not exactly sure: https://chromium-review.googlesource.com/c/chromiumos/overlays/chromiumos-overlay/+/1232833

Reason:

Process is dying with \"Unable to open PPD file /var/spool/cups/tmp/5ba36669e648d


If we think that is it, let us revert first, we can always reland. 

https://chromium-review.googlesource.com/c/chromiumos/overlays/chromiumos-overlay/+/1236994
I'm not sure what builders those are for exactly, but I am looking at the pfq-informational builders, which use "tot" Chrome OS + tot Chrome:

https://cros-goldeneye.corp.google.com/chromeos/legoland/builderSummary?builderGroups=chrome_pfq%2Cchrome_informational&limit=4&buildBranch=master

We're in the process of updating our ghostscript package.  It shouldn't have slipped through CQ though.
In the failed logs, looks like there's a denied symlink traversal at /var/spool/cups/tmp/...
Owner: skau@chromium.org
it's likely the reverted CL should have fixed any symlink failures under /var/spool/cups
Status: Fixed (was: Started)
Latest run in stainless is clean.  Closing this, will get back to patching CUPS.
vapier@ - This failed after the revert in comment #16. Did you mean a different revert? (Failing build https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8934844191606918272 inludes chromiumos-overlay @ 25eadd23 which includes "25eadd2 Revert "cups: tighten symlink exception to /var/spool/cups" by Bernie Thompson ยท" :
https://chromium.googlesource.com/chromiumos/overlays/chromiumos-overlay/+log/25eadd23

skau@: What "latest run in stainless" ?


R71-11084.0.0 indicates passing.

go/xlmnh
I'm not familiar with stainless, and I can't find a way to get back to the builder... are those results from the 'release' builders? I didn't realize that the failure was showing up there also.

It looks like the pfq-informational builders just picked up post 11084 builds, so hopefully they will start passing (or at least stop failing with this symptom).

Sign in to add a comment