Issue 841892 link

Starred by 1 user

Issue metadata

Status: Available
Owner: ----
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug




"ABORT: Job aborted unexpectedly" appears on multiple tests for informational bots

Project Member Reported by sammiequon@chromium.org, May 10 2018

Issue description

Example:

https://cros-goldeneye.corp.google.com/chromeos/healthmonitoring/buildDetails?buildbucketId=8946929227055236320


security_ProfilePermissions.login         ABORT: Job aborted unexpectedly
login_OwnershipApi                      [ PASSED ]
security_SandboxStatus                  [ PASSED ]
security_RootfsStatefulSymlinks         [ PASSED ]
platform_CrosDisksDBus                  [ PASSED ]
login_OwnershipTaken                    [ PASSED ]
security_SysLogPermissions              [ PASSED ]
security_ChromiumOSLSM                  [ PASSED ]
security_RootCA                         [ PASSED ]
security_OpenSSLBlacklist               [ PASSED ]
platform_FilePerms                      [ PASSED ]
security_SandboxLinuxUnittests          [ PASSED ]
security_Minijail_seccomp               [ PASSED ]
login_LogoutProcessCleanup              [ FAILED ]
login_LogoutProcessCleanup                ABORT: Job aborted unexpectedly
logging_CrashSender                     [ PASSED ]
security_mprotect                       [ PASSED ]
login_MultiUserPolicy                   [ PASSED ]
security_DbusOwners                     [ PASSED ]
security_ProtocolFamilies               [ PASSED ]
security_Minijail0                      [ PASSED ]
login_CryptohomeIncognito               [ PASSED ]
login_Cryptohome                        [ FAILED ]
login_Cryptohome                          ABORT: Job aborted unexpectedly
security_StatefulPermissions            [ PASSED ]
login_LoginSuccess                      [ FAILED ]
login_LoginSuccess                        ABORT: Job aborted unexpectedly
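To pull the aborted tests out of a summary like the one above, a minimal sketch follows. The line layout is assumed from the pasted output only, and `aborted_tests` is a hypothetical helper, not part of autotest.

```python
import re

# Matches "name   [ STATUS ]" or "name   ABORT: reason", as in the summary above.
LINE_RE = re.compile(
    r"^(?P<test>\S+)\s+(?:\[\s*(?P<status>\w+)\s*\]|(?P<abort>ABORT): (?P<reason>.*))\s*$"
)

def aborted_tests(summary):
    """Return the names of tests that have an ABORT line, in first-seen order."""
    seen = []
    for line in summary.splitlines():
        m = LINE_RE.match(line.strip())
        if m and m.group("abort") and m.group("test") not in seen:
            seen.append(m.group("test"))
    return seen

# A few lines copied from the summary in this report.
SAMPLE = """\
security_ProfilePermissions.login         ABORT: Job aborted unexpectedly
login_OwnershipApi                      [ PASSED ]
login_LogoutProcessCleanup              [ FAILED ]
login_LogoutProcessCleanup                ABORT: Job aborted unexpectedly
login_Cryptohome                        [ FAILED ]
login_Cryptohome                          ABORT: Job aborted unexpectedly
"""

print(aborted_tests(SAMPLE))
```

Tests that show both FAILED and ABORT are reported once, since the two lines refer to the same job.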

 
INFO	----	----	kernel=3.8.11	localtime=May 10 05:46:56	timestamp=1525956416	
START	----	----	timestamp=1525956440	localtime=May 10 05:47:20	
	START	login_Cryptohome	login_Cryptohome	timestamp=1525956440	localtime=May 10 05:47:20	
INFO	----	----	Job aborted by autotest_system on 1525958492
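For reference, the time between the START line and the abort can be worked out from the epoch timestamps in the status log. This is a rough sketch that assumes the tab-separated `key=value` layout shown above; `parse_fields` and `seconds_until_abort` are hypothetical helpers, not autotest APIs.

```python
import re

def parse_fields(line):
    """Collect key=value fields from one tab-separated status line."""
    fields = {}
    for part in line.strip().split("\t"):
        key, sep, value = part.partition("=")
        if sep:
            fields[key] = value
    return fields

def seconds_until_abort(log):
    """Seconds from the first START line to the 'Job aborted' line, or None."""
    start = abort = None
    for line in log.splitlines():
        stripped = line.strip()
        if start is None and stripped.startswith("START"):
            start = int(parse_fields(stripped)["timestamp"])
        m = re.search(r"Job aborted by \S+ on (\d+)", stripped)
        if m:
            abort = int(m.group(1))
    if start is None or abort is None:
        return None
    return abort - start

# The status log lines quoted in this report.
LOG = (
    "INFO\t----\t----\tkernel=3.8.11\tlocaltime=May 10 05:46:56\ttimestamp=1525956416\t\n"
    "START\t----\t----\ttimestamp=1525956440\tlocaltime=May 10 05:47:20\t\n"
    "\tSTART\tlogin_Cryptohome\tlogin_Cryptohome\ttimestamp=1525956440\tlocaltime=May 10 05:47:20\t\n"
    "INFO\t----\t----\tJob aborted by autotest_system on 1525958492\n"
)

elapsed = seconds_until_abort(LOG)
print(divmod(elapsed, 60))  # (minutes, seconds)
```

With these lines the difference is 1525958492 - 1525956440 = 2052 seconds, i.e. about 34 minutes between the job starting and autotest_system aborting it.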
Owner: nxia@chromium.org
Status: Assigned (was: Untriaged)
+nxia
Components: Infra>Client>ChromeOS
Another example:

I see "  @@@STEP_LINK@[Test-History]: Suite job@https://stainless.corp.google.com/search?test=^Suite\ job$&first_date=2018-04-12&last_date=2018-05-10&row=model&col=build&view=matrix@@@
  Will return from run_suite with status: INFRA_FAILURE"

in https://luci-logdog.appspot.com/v/?s=chromeos/buildbucket/cr-buildbucket.appspot.com/8946927370710783472/+/steps/HWTest__chrome-informational_/0/stdout

Problems seemed to start yesterday (Wed) around 9 AM Pacific.

Comment 4 by nxia@chromium.org, May 10 2018

Following the aborted test login_Cryptohome, I see errors saying the browser failed to start.

https://pantheon.corp.google.com/storage/browser/chromeos-autotest-results/198853814-chromeos-test/chromeos6-row2-rack11-host19/debug/



06:17:55 DEBUG|  > stdout=[], stderr=[Traceback (most recent call last):
  File "/usr/local/autotest/bin/screenshot.py", line 20, in <module>
    image = crtcScreenshot(args.crtc)
  File "/usr/local/autotest/cros/graphics/gbm.py", line 236, in crtcScreenshot
    "Unable to take screenshot. There may not be anything on the screen.")
RuntimeError: Unable to take screenshot. There may not be anything on the screen.
]
06:17:55 ERROR| Failed with LoginException while starting the browser backend.

...

06:17:58 INFO | Browser is closed.
06:17:58 ERROR| Timed out logging in, tries=1, error=LoginException("Timed out going through login screen. Browser didn't launch. ",)






It kept retrying "sh -c cryptohome-path user 'test@test.test'" until it was aborted, so I would say this is a product issue rather than a lab infra issue.


Comment 5 by nxia@chromium.org, May 10 2018

Cc: ayatane@chromium.org
Following #3, the caroline HWTests look strange. 


chromeos6-row1-rack23-host13 was occupied by a test (http://cautotest-prod/afe/#tab_id=view_job&object_id=198775403) for almost 8 hours.

    2018-05-10 08:24:40  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row1-rack23-host13/861307-cleanup/
    2018-05-10 01:18:03  -- http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/198775403-chromeos-test/
    2018-05-10 01:17:04  OK http://cautotest.corp.google.com/tko/retrieve_logs.cgi?job=/results/hosts/chromeos6-row1-rack23-host13/858992-reset/



The last related log I found is from lucifer, but the job was never run.

05/10 01:18:02.310 INFO |        luciferlib:0342| Spawning '/usr/bin/env', ['/usr/bin/env', 'GOOGLE_APPLICATION_CREDENTIALS=/creds/service_accounts/lucifer-drone.json', '/usr/local/autotest/bin/job_reporter', '--jobdir', '/usr/local/autotest/leases', '--run-job-path', '/opt/infra-tools/usr/bin/lucifer_run_job', '--lucifer-level', 'STARTING', '--job-id', '198775403', '--results-dir', u'/usr/local/autotest/results/198775403-chromeos-test/chromeos6-row1-rack23-host13', '--execution-tag', u'198775403-chromeos-test/chromeos6-row1-rack23-host13'], u'/usr/local/autotest/results/198775403-chromeos-test/chromeos6-row1-rack23-host13/lucifer/job_reporter_output.log'



 + ayatane@, any thoughts?



Looking at the logs, autoserv hung:

05/10 01:18:32.154 DEBUG|          autotest:1281| AUTOTEST_STATUS::START	----	----	timestamp=1525940311	localtime=May 10 01:18:31	
05/10 01:18:32.155 INFO |        server_job:0218| START	----	----	timestamp=1525940311	localtime=May 10 01:18:31	
05/10 01:18:32.215 DEBUG|          autotest:1281| AUTOTEST_STATUS::	START	cheets_PerformanceAppTest	cheets_PerformanceAppTest	timestamp=1525940311	localtime=May 10 01:18:31	
05/10 01:18:32.216 INFO |        server_job:0218| 	START	cheets_PerformanceAppTest	cheets_PerformanceAppTest	timestamp=1525940311	localtime=May 10 01:18:31	
05/10 08:24:12.406 ERROR|   logging_manager:0626| Current thread 0x00007fdb95ff1740:
05/10 08:24:12.407 DEBUG|          autoserv:0394| Received SIGTERM
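The hang shows up as a jump between consecutive timestamps. A small sketch for finding the largest such gap in an autoserv log follows; it assumes every timestamped line begins with `MM/DD HH:MM:SS.mmm` as in the excerpt above, takes the year as a parameter (2018 here, from the date of this report), and `largest_gap` is a hypothetical helper.

```python
from datetime import datetime

def largest_gap(lines, year=2018):
    """Return (gap, line_before, line_after) for the biggest timestamp jump."""
    best = None
    prev_time = prev_line = None
    for line in lines:
        try:
            # The first 18 characters look like "05/10 01:18:32.154".
            stamp = datetime.strptime(
                "%d/%s" % (year, line[:18]), "%Y/%m/%d %H:%M:%S.%f"
            )
        except ValueError:
            continue  # not a timestamped line
        if prev_time is not None:
            gap = stamp - prev_time
            if best is None or gap > best[0]:
                best = (gap, prev_line, line)
        prev_time, prev_line = stamp, line
    return best

# The last three lines of the autoserv excerpt above.
LINES = [
    "05/10 01:18:32.216 INFO |        server_job:0218| \tSTART\tcheets_PerformanceAppTest\tcheets_PerformanceAppTest\ttimestamp=1525940311\tlocaltime=May 10 01:18:31\t",
    "05/10 08:24:12.406 ERROR|   logging_manager:0626| Current thread 0x00007fdb95ff1740:",
    "05/10 08:24:12.407 DEBUG|          autoserv:0394| Received SIGTERM",
]

gap, before, after = largest_gap(LINES)
print(gap)
```

On these lines the largest gap sits between the START of cheets_PerformanceAppTest and the next logged line, just before the SIGTERM.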


The bottom of the client logs is

01:18:54 DEBUG| 'adb get-state'
01:18:54 DEBUG| adb get-state: device
01:18:54 DEBUG| Running 'adb shell 'logcat -c''
01:18:54 DEBUG| "adb shell 'logcat -c'"
01:18:54 INFO | Start PerformanceAppTest
01:18:54 DEBUG| Running 'android-sh -c 'getprop ro.data_mounted''
01:18:54 DEBUG| "android-sh -c 'getprop ro.data_mounted'"
01:18:54 DEBUG| Running 'adb get-state'
01:18:54 DEBUG| 'adb get-state'
01:18:54 DEBUG| adb get-state: device
01:18:54 DEBUG| Running 'adb shell 'am instrument -w -e targetpackage com.android.performanceLaunch -e launchcount 15 -e recordtrace false -r com.android.performanceapp.tests/android.support.test.runner.AndroidJUnitRunner''
01:18:54 DEBUG| "adb shell 'am instrument -w -e targetpackage com.android.performanceLaunch -e launchcount 15 -e recordtrace false -r com.android.performanceapp.tests/android.support.test.runner.AndroidJUnitRunner'"

Looks like adb hung.
The timeout on the test is 480 minutes = 8 hours, which is why the host was held for so long.
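One way to keep a single stuck command from consuming the whole 480-minute test timeout is a much shorter per-command timeout. This is only a sketch using Python's standard `subprocess` module, not how autotest runs adb; `run_with_timeout` is a hypothetical helper and the sleeping child process is a stand-in for a command that never returns.

```python
import subprocess
import sys

def run_with_timeout(cmd, timeout_s):
    """Run cmd; return (returncode, output), or (None, "") if it timed out."""
    try:
        done = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout_s,          # the child is killed when this expires
            universal_newlines=True,
        )
    except subprocess.TimeoutExpired:
        return None, ""
    return done.returncode, done.stdout

# Stand-in for a hung command: a child process that sleeps for 30 seconds.
hung = [sys.executable, "-c", "import time; time.sleep(30)"]
quick = [sys.executable, "-c", "print('device')"]

print(run_with_timeout(quick, timeout_s=10))
print(run_with_timeout(hung, timeout_s=1))
```

A timeout like this only helps if the caller itself is still running; it would not help if the device under test has stopped responding altogether.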

Comment 8 by nxia@chromium.org, May 10 2018

Cc: philipchen@chromium.org nxia@chromium.org adurbin@chromium.org
Owner: khmel@chromium.org
+ sheriffs & ARC Constables

khmel@, any input on the hung cheets_PerformanceAppTest?

Comment 9 by khmel@chromium.org, May 10 2018

I don't think this is adb. The device itself hangs and stops responding to ssh and ping.
This is similar to b/79404722 and several other reports from caroline.

Comment 10 by nxia@chromium.org, Jun 8 2018

Cc: -nxia@chromium.org
Owner: ----
Status: Available (was: Assigned)
