Issue 903974

Issue metadata

Status: Verified
Owner:
Closed: Nov 13
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug



sentinel service is crashing

Project Member Reported by pprabhu@chromium.org, Nov 9

Issue description

2018-11-09 14:14:56,101 INFO| Syncing cros-full-0018.mtv.corp.google.com ..
2018-11-09 14:14:56,481 INFO| Fetched 1 shard and 40 master Shardinfos. Syncing...
2018-11-09 14:14:56,482 INFO| ...Done Shardinfos.
2018-11-09 14:14:57,146 INFO| Fetched 155 shard and 28688 master Labels. Syncing...
2018-11-09 14:14:57,146 ERRO| Label 910520 (chromeos2-row10-rack4-host7) does not exist in master DB
2018-11-09 14:14:57,146 INFO| ...Done Labels.
2018-11-09 14:14:59,084 INFO| Fetched 606 shard and 9451 master Hostinfos. Syncing...
2018-11-09 14:14:59,084 WARN| Host validity mismatched: chromeos6-row4-rack18-host20 [self.invalid: 1, other.invalid: 0]
2018-11-09 14:14:59,084 WARN| other_host.shard_id = 212 is not in valid_shard_ids set([240L]). Forced to None.
2018-11-09 14:14:59,084 WARN| Inconsistent HostLabels: {'host_id': 8193L, 'label_ids': set([])} != {'host_id': 8193L, 'label_ids': set([784131L, 784133L, 999817L, 215L, 741520L, 741521L, 741522L, 187672L, 91677L, 399646L, 230048L, 12066L, 70566L, 70567L, 730920L, 859695L, 811954L, 811955L, 811956L, 1114421L, 785466L, 398523L, 398524L, 81094L, 152520L, 357321L, 252106L, 362199L, 27352L, 25948L, 10974L, 9952L, 97251L, 102628L, 402917L, 1082727L, 104683L])}
2018-11-09 14:14:59,395 INFO| Shard has 1342 labels, and 2 necessary labels are missing.
2018-11-09 14:14:59,395 WARN| Sentinel stopped outside one-shot run. Probable crash.
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/local/google/home/chromeos-test/chromiumos/chromeos-admin/venv/sentinel/service.py", line 879, in <module>
    sys.exit(main())
  File "/usr/local/google/home/chromeos-test/chromiumos/chromeos-admin/venv/sentinel/service.py", line 862, in main
    sync_db_loop()
  File "/usr/local/google/home/chromeos-test/chromiumos/chromeos-admin/venv/sentinel/service.py", line 754, in sync_db_loop
    _sync_once()
  File "/usr/local/google/home/chromeos-test/chromiumos/chromeos-admin/venv/sentinel/service.py", line 726, in _sync_once
    shard_db.sync_to_master(master_db)
  File "/usr/local/google/home/chromeos-test/chromiumos/chromeos-admin/venv/sentinel/service.py", line 648, in sync_to_master
    unrecovered_errors
  File "/usr/local/google/home/chromeos-test/chromiumos/chromeos-admin/venv/sentinel/service.py", line 416, in sync
    shard_db, master_db, unrecovered_errors)
  File "/usr/local/google/home/chromeos-test/chromiumos/chromeos-admin/venv/sentinel/service.py", line 270, in sync
    (label.id, label.name,
AttributeError: 'long' object has no attribute 'id'

 
Owner: xianuowang@chromium.org
Status: Assigned (was: Untriaged)
Almost surely the culprit is https://chrome-internal-review.googlesource.com/c/chromeos/chromeos-admin/+/698114, which added the failing code.

Since Garry is out today, I'll chump in a revert so that it has the opportunity to go through staging over the weekend.
Feel free to fix it correctly and reland before next push.

Project Member

Comment 2 by bugdroid1@chromium.org, Nov 9

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/chromeos/chromeos-admin/+/cb3cca6b1a86f0093da48cb7373fcce1a025043d

commit cb3cca6b1a86f0093da48cb7373fcce1a025043d
Author: Prathmesh Prabhu <pprabhu@google.com>
Date: Fri Nov 09 22:32:06 2018

Cc: xianuowang@chromium.org
Owner: pprabhu@chromium.org
Status: Started (was: Assigned)
Reverted blamed CL: https://chrome-internal-review.googlesource.com/c/chromeos/chromeos-admin/+/714658

Will let Garry deal with the original bug (reland) etc.

Keeping bug open until next push to make sure I got the right CL.
Ahh, I just looked at it and it seems I made a stupid mistake here. On line 270 of my CL it's:
"for label in labels_from_master:"
which should be
"for label in labels_from_master.values():"
because labels_from_master is a dict whose keys are label ids and whose values are Label objects, and I was treating it as if it were a set of Label objects.
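
For anyone reading later, here is a minimal standalone sketch of that failure mode. It is not the actual sentinel code: the Label namedtuple, the two helper functions, and the sample ids and names are all made up for illustration, and only the dict shape (label id -> Label object) comes from the description above.

```python
import collections

# Hypothetical stand-in for the real Label model; only the two attributes
# that appear in the traceback (id, name) are modeled.
Label = collections.namedtuple('Label', ['id', 'name'])

# Assumed shape: label id -> Label object. Ids and names are invented.
labels_from_master = {
    1: Label(id=1, name='label-a'),
    2: Label(id=2, name='label-b'),
}


def describe_labels_buggy(labels):
    # Iterating a dict yields its keys, so `label` is an integer id here
    # and `label.id` raises AttributeError.
    return [(label.id, label.name) for label in labels]


def describe_labels_fixed(labels):
    # .values() yields the Label objects themselves.
    return [(label.id, label.name) for label in labels.values()]


try:
    describe_labels_buggy(labels_from_master)
except AttributeError as e:
    print('buggy version failed: %s' % e)

print(sorted(describe_labels_fixed(labels_from_master)))
```

Under Python 2.7 the ids read from the DB are `long`, which is why the traceback says 'long' object; under Python 3 the same mistake reports 'int' instead.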

BTW: what reland process do we follow? Should I submit the fix to the original review or create a new review?


Still pending push to prod for this bug.


For reland: I advise first reuploading the same CL, then fixing it in follow-up patchsets. This allows the reviewer to see just the diff for the fix.

The "RELAND" button on Gerrit re-uploads the same CL unchanged; you can then iterate on it.

Status: Verified (was: Started)
