backup scheduler server piling up a hilarious number of logs
Issue description

chromeos-test@chromeos-server8:/usr/local/autotest/logs$ ls | wc
1127919 1127919 38351206

Presumably because we've just been crashlooping the scheduler on it for months and months. I've deleted the logs for now.
Oct 31 2017
We could also add a 10-minute sleep prior to shutdown for services that see themselves as non-primary.
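The sleep suggested here could be sketched roughly as follows. This is a minimal illustration, not the autotest code: `exit_if_not_primary`, `max_restarts_per_day`, their parameters, and the constant are hypothetical names, and the primary check itself is assumed to be done by the caller. With a 10-minute delay, a service that is restarted as soon as it exits can start at most 144 times a day (24 × 60 / 10).

```python
import sys
import time

# Hypothetical constant: how long a non-primary service waits before exiting.
NON_PRIMARY_EXIT_DELAY_SECS = 10 * 60


def exit_if_not_primary(is_primary, delay_secs=NON_PRIMARY_EXIT_DELAY_SECS,
                        sleep=time.sleep, exit_fn=sys.exit):
    """Sleep, then exit, when this server is not primary for its role.

    Returns True right away if the server is primary, so startup can continue.
    sleep and exit_fn are injectable so the behaviour can be tested
    without actually waiting or exiting.
    """
    if is_primary:
        return True
    sleep(delay_secs)
    exit_fn(1)
    return False  # Only reached when exit_fn does not really exit.


def max_restarts_per_day(delay_secs=NON_PRIMARY_EXIT_DELAY_SECS):
    """Upper bound on restarts per day if every run lasts at least delay_secs."""
    return (24 * 60 * 60) // delay_secs
```

If each run writes one log file (an assumption), that also caps log growth at roughly 144 files per day on a non-primary server.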
Oct 31 2017
I honestly think that the 'backup' status hasn't been useful except as a step during provision (it's ready but not in prod yet etc). I don't remember a single instance where we brought a backup into primary status with confidence that it'll work. Tell me if I'm wrong. If not, perhaps we can stop having these backup servers completely?
Oct 31 2017
server8 should have a cron job to clean up logs

chromeos-test@chromeos-server8:~$ crontab -l
# HEADER: This file was autogenerated at 2017-09-15 16:45:12 -0700 by puppet.
# HEADER: While it can still be managed manually, it is definitely not recommended.
# HEADER: Note particularly that the comments starting with 'Puppet Name' should
# HEADER: not be deleted, as doing so could cause duplicate cron jobs.
# Puppet Name: update_test_control_files
0 17 * * 1,2,3,4,5 /usr/local/autotest/utils/test_importer.py >> /var/log/test_importer.log 2>&1
# Puppet Name: clean autotest service logs
0 12 * * * /usr/bin/find /usr/local/autotest/logs/ -type f -name '*.log.*' -o -name '*_log_*' -mtime +14 -delete
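One thing worth checking in the `find` line above: `-o` binds more loosely than the implicit AND between tests, so as written the `-mtime +14 -delete` part applies only to names matching `'*_log_*'`. Files matching `'*.log.*'` satisfy the left-hand branch, which has no action attached. If the scheduler logs fall under the first pattern (an assumption, since the file names are not shown here), the cron job would never remove them. A minimal Python sketch of the presumably intended behaviour, with hypothetical function names and only an approximation of `-mtime +14`:

```python
import fnmatch
import os
import time

# Patterns and age limit taken from the cron entry.
PATTERNS = ('*.log.*', '*_log_*')
MAX_AGE_DAYS = 14


def find_stale_logs(log_dir, now=None, patterns=PATTERNS,
                    max_age_days=MAX_AGE_DAYS):
    """Return regular files under log_dir that match ANY pattern AND are
    older than max_age_days, i.e. (pattern1 OR pattern2) AND old."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            if not any(fnmatch.fnmatch(name, p) for p in patterns):
                continue
            path = os.path.join(root, name)
            if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
                stale.append(path)
    return sorted(stale)


def delete_stale_logs(log_dir, **kwargs):
    """Delete the files found by find_stale_logs and return their paths."""
    paths = find_stale_logs(log_dir, **kwargs)
    for path in paths:
        os.remove(path)
    return paths
```

In `find` itself, the same grouping would be expressed by putting escaped parentheses around the two `-name` tests, so that `-type f`, `-mtime +14`, and `-delete` apply to both patterns.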
May 11 2018
No more backup server
Comment 1 by akes...@chromium.org, Oct 31 2017

10/31 10:59:03.448 INFO | monitor_db:0150| os.environ: {'USERNAME': 'chromeos-test', 'SUDO_COMMAND': '/usr/local/autotest/scheduler/monitor_db.py /usr/local/autotest/results --production', 'TERM': 'linux', 'SHELL': '/bin/bash', 'TZ': 'America/Los_Angeles', 'DJANGO_SETTINGS_MODULE': 'autotest_lib.frontend.settings', 'SUDO_UID': '0', 'SUDO_GID': '0', 'LOGNAME': 'chromeos-test', 'USER': 'chromeos-test', 'NO_GCE_CHECK': 'False', 'MAIL': '/var/mail/chromeos-test', 'PATH': '/usr/sbin:/usr/bin:/sbin:/bin', 'SUDO_USER': 'root', 'HOME': '/usr/local/google/home/chromeos-test'}
10/31 10:59:03.448 WARNI| metadata_reporter:0142| Elasticsearch db deprecated, no metadata will be reported.
10/31 10:59:03.450 INFO | metadata_reporter:0150| Metadata reporting thread is started.
10/31 10:59:03.459 INFO | connectionpool:0207| Starting new HTTP connection (1): metadata.google.internal
10/31 10:59:03.539 INFO | monitor_db:0200| 10:59:03 10/31/17> dispatcher starting
10/31 10:59:03.539 INFO | monitor_db:0201| My PID is 28004
10/31 10:59:03.614 NOTIC| cros_logging:0038| ts_mon was set up.
10/31 10:59:03.712 ERROR| monitor_db:0181| Server chromeos-server8.mtv.corp.google.com does not have role of scheduler running in status primary.
Traceback (most recent call last):
  File "/usr/local/autotest/scheduler/monitor_db.py", line 164, in main_without_exception_handling
    initialize()
  File "/usr/local/autotest/scheduler/monitor_db.py", line 218, in initialize
    role='scheduler')
  File "/usr/local/autotest/site_utils/server_manager_utils.py", line 374, in confirm_server_has_role
    'status primary.' % (hostname, role))
ServerActionError: Server chromeos-server8.mtv.corp.google.com does not have role of scheduler running in status primary.
10/31 10:59:03.716 INFO | ts_mon_config:0207| Waiting for ts_mon flushing process to finish...
10/31 10:59:03.732 INFO | ts_mon_config:0213| Finished waiting for ts_mon process.
10/31 10:59:03.750 INFO | metadata_reporter:0164| Waiting up to 5 seconds for metadata reporting thread to complete.
10/31 10:59:03.751 ERROR| monitor_db:0100| Exception escaping in monitor_db
Traceback (most recent call last):
  File "/usr/local/autotest/scheduler/monitor_db.py", line 96, in main
    main_without_exception_handling()
  File "/usr/local/autotest/scheduler/monitor_db.py", line 189, in main_without_exception_handling
    _drone_manager.shutdown()
AttributeError: 'NoneType' object has no attribute 'shutdown'
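
The second traceback looks like a secondary failure: `initialize()` apparently raised before `_drone_manager` was assigned, so the cleanup path called `shutdown()` on `None`. A guard along these lines would leave the `ServerActionError` as the only error in the log. This is a sketch with hypothetical names (`shutdown_if_initialized`, `FakeDroneManager`), not the actual `monitor_db.py` code.

```python
class FakeDroneManager(object):
    """Stand-in for the real drone manager; only records that shutdown ran."""

    def __init__(self):
        self.shut_down = False

    def shutdown(self):
        self.shut_down = True


def shutdown_if_initialized(drone_manager):
    """Call shutdown() only if the manager was actually created.

    Returns True if shutdown() was called, False if there was nothing to do
    (for example because initialization failed before the manager existed).
    """
    if drone_manager is None:
        return False
    drone_manager.shutdown()
    return True
```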