Issue 780108

Starred by 1 user

Issue metadata

Status: Archived
Owner: ----
Closed: May 2018
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 3
Type: Bug




backup scheduler server piling up a hilarious number of logs

Project Member Reported by akes...@chromium.org, Oct 31 2017

Issue description

chromeos-test@chromeos-server8:/usr/local/autotest/logs$ ls | wc
1127919 1127919 38351206

Presumably because we've just been crashlooping the scheduler on it for months and months.

I've deleted the logs for now.
 
10/31 10:59:03.448 INFO |        monitor_db:0150| os.environ: {'USERNAME': 'chromeos-test', 'SUDO_COMMAND': '/usr/local/autotest/scheduler/monitor_db.py /usr/local/autotest/results --production', 'TERM': 'linux', 'SHELL': '/bin/bash', 'TZ': 'America/Los_Angeles', 'DJANGO_SETTINGS_MODULE': 'autotest_lib.frontend.settings', 'SUDO_UID': '0', 'SUDO_GID': '0', 'LOGNAME': 'chromeos-test', 'USER': 'chromeos-test', 'NO_GCE_CHECK': 'False', 'MAIL': '/var/mail/chromeos-test', 'PATH': '/usr/sbin:/usr/bin:/sbin:/bin', 'SUDO_USER': 'root', 'HOME': '/usr/local/google/home/chromeos-test'}
10/31 10:59:03.448 WARNI| metadata_reporter:0142| Elasticsearch db deprecated, no metadata will be reported.
10/31 10:59:03.450 INFO | metadata_reporter:0150| Metadata reporting thread is started.
10/31 10:59:03.459 INFO |    connectionpool:0207| Starting new HTTP connection (1): metadata.google.internal
10/31 10:59:03.539 INFO |        monitor_db:0200| 10:59:03 10/31/17> dispatcher starting
10/31 10:59:03.539 INFO |        monitor_db:0201| My PID is 28004
10/31 10:59:03.614 NOTIC|      cros_logging:0038| ts_mon was set up.
10/31 10:59:03.712 ERROR|        monitor_db:0181| Server chromeos-server8.mtv.corp.google.com does not have role of scheduler running in status primary.
Traceback (most recent call last):
  File "/usr/local/autotest/scheduler/monitor_db.py", line 164, in main_without_exception_handling
    initialize()
  File "/usr/local/autotest/scheduler/monitor_db.py", line 218, in initialize
    role='scheduler')
  File "/usr/local/autotest/site_utils/server_manager_utils.py", line 374, in confirm_server_has_role
    'status primary.' % (hostname, role))
ServerActionError: Server chromeos-server8.mtv.corp.google.com does not have role of scheduler running in status primary.
10/31 10:59:03.716 INFO |     ts_mon_config:0207| Waiting for ts_mon flushing process to finish...
10/31 10:59:03.732 INFO |     ts_mon_config:0213| Finished waiting for ts_mon process.
10/31 10:59:03.750 INFO | metadata_reporter:0164| Waiting up to 5 seconds for metadata reporting thread to complete.
10/31 10:59:03.751 ERROR|        monitor_db:0100| Exception escaping in monitor_db
Traceback (most recent call last):
  File "/usr/local/autotest/scheduler/monitor_db.py", line 96, in main
    main_without_exception_handling()
  File "/usr/local/autotest/scheduler/monitor_db.py", line 189, in main_without_exception_handling
    _drone_manager.shutdown()
AttributeError: 'NoneType' object has no attribute 'shutdown'
We could also add a 10-minute sleep prior to shutdown for services that see themselves as non-primary, so the respawn loop doesn't open a fresh log file every few seconds; see the sketch below.
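A minimal sketch of both that and a guard for the shutdown AttributeError, assuming the monitor_db.py structure implied by the traceback above; NotPrimaryError, the stub initialize(), and the delay constant are illustrative stand-ins, not the real autotest code:

import logging
import sys
import time

# Hypothetical back-off; the real value would live in config somewhere.
NON_PRIMARY_EXIT_DELAY_SECS = 10 * 60

_drone_manager = None  # created by initialize() on the success path


class NotPrimaryError(Exception):
    """Stand-in for server_manager_utils.ServerActionError."""


def initialize():
    # Stand-in for monitor_db.initialize(): on a non-primary host it raises
    # before _drone_manager is ever created.
    raise NotPrimaryError('server does not have role of scheduler running '
                          'in status primary')


def main_without_exception_handling():
    try:
        initialize()
        # ... dispatcher loop would run here ...
    finally:
        # Guard for the AttributeError above: initialize() can fail before
        # _drone_manager exists, so only shut it down if it was created.
        if _drone_manager is not None:
            _drone_manager.shutdown()


def main():
    try:
        main_without_exception_handling()
    except NotPrimaryError:
        # Back off before exiting so the respawn loop doesn't open a fresh
        # log file every few seconds on a non-primary server.
        logging.exception('Not primary; sleeping %d seconds before exit.',
                          NON_PRIMARY_EXIT_DELAY_SECS)
        time.sleep(NON_PRIMARY_EXIT_DELAY_SECS)
        sys.exit(1)


if __name__ == '__main__':
    main()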
Cc: pprabhu@chromium.org dshi@chromium.org shuqianz@chromium.org ayatane@chromium.org
I honestly think that the 'backup' status hasn't been useful except as a step during provisioning (the server is ready but not in prod yet, etc.). I don't remember a single instance where we brought a backup into primary status with confidence that it'll work.

Tell me if I'm wrong. If not, perhaps we can stop having these backup servers completely?
server8 should have a cron job to clean up logs

chromeos-test@chromeos-server8:~$ crontab -l
# HEADER: This file was autogenerated at 2017-09-15 16:45:12 -0700 by puppet.
# HEADER: While it can still be managed manually, it is definitely not recommended.
# HEADER: Note particularly that the comments starting with 'Puppet Name' should
# HEADER: not be deleted, as doing so could cause duplicate cron jobs.
# Puppet Name: update_test_control_files
0 17 * * 1,2,3,4,5 /usr/local/autotest/utils/test_importer.py >> /var/log/test_importer.log 2>&1
# Puppet Name: clean autotest service logs
0 12 * * * /usr/bin/find /usr/local/autotest/logs/ -type f -name '*.log.*' -o -name '*_log_*' -mtime +14 -delete
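Worth noting, and possibly part of why the logs piled up anyway: in that find expression, -o binds more loosely than the implied -a, so it parses as ( -type f -a -name '*.log.*' ) -o ( -name '*_log_*' -a -mtime +14 -a -delete ), meaning files matching '*.log.*' are matched but never deleted, and the age/type tests only apply to the second branch. Grouping the name tests would fix it, e.g.:

0 12 * * * /usr/bin/find /usr/local/autotest/logs/ -type f \( -name '*.log.*' -o -name '*_log_*' \) -mtime +14 -delete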
Status: Archived (was: Untriaged)
No more backup server
