
Issue 852087


Issue metadata

Status: Fixed
Owner:
Closed: Aug 6
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug

Blocked on:
issue 845337




sysmon is broken on staging

Project Member Reported by pprabhu@chromium.org, Jun 12 2018

Issue description

Broke some time yesterday. Look at all but the top graphs.

https://viceroy.corp.google.com/chromeos/lab_staging?duration=8d


 
Traceback (most recent call last):
  File "/usr/local/google/home/chromeos-test/chromiumos/chromite/venv/chromite/scripts/sysmon/loop.py", line 34, in loop_once
    self._callback()
  File "/usr/local/google/home/chromeos-test/chromiumos/chromite/venv/chromite/scripts/sysmon/mainlib.py", line 57, in __call__
    self._collect_prod_hosts()
  File "/usr/local/google/home/chromeos-test/chromiumos/chromite/venv/chromite/scripts/sysmon/mainlib.py", line 79, in __call__
    self._callback()
  File "/usr/local/google/home/chromeos-test/chromiumos/chromite/venv/chromite/scripts/sysmon/prod_metrics.py", line 27, in collect_prod_hosts
    servers = list(_get_servers())
  File "/usr/local/google/home/chromeos-test/chromiumos/chromite/venv/chromite/scripts/sysmon/prod_metrics.py", line 42, in _get_servers
    hostname=_get_hostname(server),
  File "/usr/local/google/home/chromeos-test/chromiumos/chromite/venv/chromite/scripts/sysmon/prod_metrics.py", line 58, in _get_hostname
    return server['hostname'].partition('.')[0]
TypeError: string indices must be integers
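A minimal sketch of how this kind of TypeError can arise, assuming the shape of the server data changed underneath `_get_hostname`. The payloads below are hypothetical (the actual `atest server` output is not shown in this report); the point is only that indexing a `str` with `'hostname'` fails the same way as in the traceback, for example when the code iterates over a dict and receives its keys instead of server records.

```python
import json


def get_hostname(server):
    """Return the short hostname for a server record (a dict with a 'hostname' key)."""
    return server['hostname'].partition('.')[0]


# Hypothetical payloads, for illustration only.
# A flat list of server records: each element is a dict.
list_payload = json.loads('[{"hostname": "server1.example.com"}]')
# A wrapped payload: iterating over it yields the key "servers", a str.
wrapped_payload = json.loads('{"servers": [{"hostname": "server1.example.com"}]}')

print([get_hostname(s) for s in list_payload])

try:
    [get_hostname(s) for s in wrapped_payload]
except TypeError as e:
    # Same failure class as in the traceback: a str was indexed with a str key.
    print(type(e).__name__)
```

Whether the real breakage came from a wrapped payload or some other format change is not established here; the sketch only shows one way the same exception type is produced.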

This is due to atest using Skylab now. When did we change the default to Skylab? Last I heard, we added the --skylab flag for testing first.
Blockedon: 845337
This is another fallout of https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/1087506


After CL:1087506, 'atest server' is backed by skylab_inventory data directly. This means that servers from both the prod and staging environments are available everywhere.

I suspect that we want to let just the sentinel server report all servers at this point, and turn down the reporting of prod_hosts from the staging master entirely.
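As a rough sketch of what single-instance reporting could involve: if one reporter sees servers from every environment, it can tally them per environment itself. The `environment` field name and the sample records are assumptions made for illustration; they are not taken from skylab_inventory or sysmon.

```python
from collections import Counter


def count_by_environment(servers):
    """Count server records per environment; records without the field count as 'unknown'."""
    # 'environment' is a hypothetical field name, not confirmed by this report.
    return Counter(s.get("environment", "unknown") for s in servers)


# Hypothetical records covering both environments, as a single reporter would see them.
servers = [
    {"hostname": "prod-a.example.com", "environment": "prod"},
    {"hostname": "staging-a.example.com", "environment": "staging"},
    {"hostname": "prod-b.example.com", "environment": "prod"},
]

counts = count_by_environment(servers)
print(dict(counts))
```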
Re #2: nxia@ landed it some time last week :(
Labels: -Pri-2 Pri-0
Owner: ayatane@chromium.org
I notice that issue 845337 is blocked.
If this bug really means that alerts and dashboards are broken more widely, we need to fix this pronto.

Note that --json is supported with --skylab at this point.
The only change needed for sysmon should be to run just one instance of it, and possibly report the environment correctly (if we do report prod vs staging).
You could also revert nxia@'s CL instead, but you would need to wait for the prod push for that to take effect.


P0 unless someone confirms alerts still work.
Status: Assigned (was: Untriaged)
Labels: -Pri-0 Pri-1
nxia's change is not pushed to prod yet, because it's broken staging, so WAI.
Status: Started (was: Assigned)
Revert the change and reland later.
Project Member

Comment 10 by bugdroid1@chromium.org, Jul 18

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/chromeos/chromeos-admin/+/b1c452bc692bc95387267366b40dc55ddab019fc

commit b1c452bc692bc95387267366b40dc55ddab019fc
Author: Allen Li <ayatane@chromium.org>
Date: Wed Jul 18 20:56:56 2018

Status: Fixed (was: Started)
Status: Assigned (was: Fixed)
