Issue 888046

Issue metadata

Status: Duplicate
Owner:
Closed: Nov 1
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 1
Type: Bug

Stable version update failures aren't detected

Reported by jrbarnette@chromium.org, Sep 21

Issue description

Bug 888039 has revealed that we have no monitoring for automated stable
version updates. If the script stops running, bad things will happen.
Compounding the problem, the gestation time for "bad things" is 6 months
or so.

We need liveness monitoring for the script that ensures that it runs and
that the updates happen.

 
Labels: -Chase-Pending Chase
Owner: pprabhu@chromium.org
Run more often and alert after a week.
Status: Started (was: Untriaged)
While I'm here, I'll also handle issue 697141

Comment 4 by bugdroid1@chromium.org, Sep 25

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/chromeos/chromeos-admin/+/313e3f4aecb1c714800df671afb63f504c592162

commit 313e3f4aecb1c714800df671afb63f504c592162
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Tue Sep 25 20:24:45 2018

This stack adds metrics to track success of the cron job:

https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/1244740

Comment 6 by bugdroid1@chromium.org, Sep 26

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/c9cf13b05c46e67c2b34488dd5c2fe32b41e1b21

commit c9cf13b05c46e67c2b34488dd5c2fe32b41e1b21
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Wed Sep 26 17:31:51 2018

stable_images: Drop unused commandline flag

BUG=chromium:888046
TEST=unittests

Change-Id: I43bff08e42b8d72cf393400bcdb92e7917e08a26
Reviewed-on: https://chromium-review.googlesource.com/1244585
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Alex Zamorzaev <zamorzaev@chromium.org>

[modify] https://crrev.com/c9cf13b05c46e67c2b34488dd5c2fe32b41e1b21/site_utils/stable_images/assign_stable_images.py

Comment 7 by bugdroid1@chromium.org, Sep 26

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/4e2532f8d65012d6aa6a5698cc68b74776d1e769

commit 4e2532f8d65012d6aa6a5698cc68b74776d1e769
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Wed Sep 26 17:31:52 2018

stable_images: Add commandline flag to specify AFE.

BUG=chromium:888046
TEST=Run with --dry-run

Change-Id: I2a77100cd93aa5697b9b00c6fa8086606791e1c1
Reviewed-on: https://chromium-review.googlesource.com/1244586
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Alex Zamorzaev <zamorzaev@chromium.org>

[modify] https://crrev.com/4e2532f8d65012d6aa6a5698cc68b74776d1e769/site_utils/stable_images/assign_stable_images.py
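For illustration, a minimal `argparse` sketch of a flag that selects the AFE, alongside the existing `--dry-run` flag. The flag name `--afe`, its default value, and the help text are assumptions; this issue does not show the actual option names in assign_stable_images.py.

```python
import argparse

# Hypothetical default; the real script's default AFE is not stated here.
DEFAULT_AFE = 'localhost'


def parse_args(argv):
    """Parse the commandline for a stable-version update run (sketch)."""
    parser = argparse.ArgumentParser(
        description='Assign stable images to boards (illustrative sketch).')
    # Hypothetical flag name: the commit only says "a commandline flag to
    # specify AFE".
    parser.add_argument('--afe', default=DEFAULT_AFE,
                        help='Hostname of the AFE to update.')
    parser.add_argument('--dry-run', action='store_true',
                        help='Report the changes without applying them.')
    return parser.parse_args(argv)
```

Usage: `parse_args(['--afe', 'afe.example.com', '--dry-run'])` yields a namespace with `afe='afe.example.com'` and `dry_run=True`, which makes it easy to point a dry run at a non-production server.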

Comment 8 by bugdroid1@chromium.org, Sep 26

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/b61abe1a3db3153a549178c763302e683921e88b

commit b61abe1a3db3153a549178c763302e683921e88b
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Wed Sep 26 17:31:53 2018

stable_images: Use logging module instead of raw print()

This will allow us to do log rotation internal to the script, dropping
the intermediate bash script with hand-rolled log rotation.

BUG=chromium:888046
TEST=unittests; run with --dry-run, with and without --log-dir

Change-Id: I6f98c094167d94a1e15670fd4fd5d3c28f24f315
Reviewed-on: https://chromium-review.googlesource.com/1244587
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Alex Zamorzaev <zamorzaev@chromium.org>

[modify] https://crrev.com/b61abe1a3db3153a549178c763302e683921e88b/site_utils/stable_images/assign_stable_images.py
[add] https://crrev.com/b61abe1a3db3153a549178c763302e683921e88b/site_utils/loglib.py
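A minimal sketch of moving log rotation into the script with the standard library's `logging.handlers.RotatingFileHandler`, logging to stderr when no `--log-dir` is given. This is not the contents of site_utils/loglib.py; the log file name, size limit, and backup count are hypothetical.

```python
import logging
import logging.handlers
import os


def setup_logging(log_dir=None, max_bytes=1024 * 1024, backup_count=5):
    """Configure the root logger; rotate log files when log_dir is given.

    Without log_dir, logs go to stderr, e.g. for interactive --dry-run use.
    Returns the installed handler so callers (or tests) can flush/close it.
    """
    root = logging.getLogger()
    root.setLevel(logging.DEBUG)
    formatter = logging.Formatter('%(asctime)s %(levelname)s %(message)s')
    if log_dir is None:
        handler = logging.StreamHandler()
    else:
        os.makedirs(log_dir, exist_ok=True)
        # Once the file exceeds max_bytes, RotatingFileHandler renames it to
        # .1, .2, ... and keeps at most backup_count old files, replacing
        # hand-rolled rotation in a wrapper shell script.
        handler = logging.handlers.RotatingFileHandler(
            os.path.join(log_dir, 'assign_stable_images.log'),  # hypothetical name
            maxBytes=max_bytes, backupCount=backup_count)
    handler.setFormatter(formatter)
    root.addHandler(handler)
    return handler
```

If rotation by age fits a cron job better than rotation by size, `logging.handlers.TimedRotatingFileHandler` is the standard-library alternative.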

Comment 9 by bugdroid1@chromium.org, Sep 26

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/a80e3c4a86fe27988458f1d52797a801c90f7825

commit a80e3c4a86fe27988458f1d52797a801c90f7825
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Wed Sep 26 17:31:53 2018

stable_images: Simplify support for --dry-run

Sharing implementation via class hierarchies is bad. In particular,
subclasses for --dry-run are overkill.

BUG=chromium:888046
TEST=Run with --dry-run; unittests

Change-Id: Ic96471a95e019645b3a314c2b3d07a7d08640878
Reviewed-on: https://chromium-review.googlesource.com/1244588
Commit-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Alex Zamorzaev <zamorzaev@chromium.org>

[modify] https://crrev.com/a80e3c4a86fe27988458f1d52797a801c90f7825/site_utils/stable_images/assign_stable_images.py
[modify] https://crrev.com/a80e3c4a86fe27988458f1d52797a801c90f7825/site_utils/stable_images/assign_stable_images_unittest.py
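The commit replaces dry-run subclasses with simpler plumbing. One possible shape is a single boolean passed down to the code that performs writes, sketched below. `FakeAFE` and its `set_stable_version(board, version)` method are hypothetical stand-ins; the real AFE client's API is not shown in this issue.

```python
import logging


class FakeAFE(object):
    """Hypothetical stand-in for an AFE client that records writes."""

    def __init__(self):
        self.versions = {}

    def set_stable_version(self, board, version):
        self.versions[board] = version


def apply_updates(afe, updates, dry_run=False):
    """Apply board -> version updates unless dry_run is set.

    The planning logic is shared by both modes; only the final write is
    skipped, so there is no parallel "dry-run" class to keep in sync.
    Returns the list of boards actually written.
    """
    applied = []
    for board, version in sorted(updates.items()):
        if dry_run:
            logging.info('[dry-run] would set %s to %s', board, version)
            continue
        afe.set_stable_version(board, version)
        applied.append(board)
    return applied
```

With this shape, tests can exercise both modes against the same fake object.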

Comment 10 by bugdroid1@chromium.org, Sep 26

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/3182136f86a68c1d36c03ffc9823cc0071ca2ee7

commit 3182136f86a68c1d36c03ffc9823cc0071ca2ee7
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Wed Sep 26 17:31:54 2018

stable_images: log to different logging levels

Instead of spewing everything to a single level.
Also, drop the unnecessary redirection through report()

BUG=chromium:888046
TEST=unittests; --dry-run

Change-Id: I32f7062a586138171970c9a3c303defc35276e81
Reviewed-on: https://chromium-review.googlesource.com/1244589
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Alex Zamorzaev <zamorzaev@chromium.org>

[modify] https://crrev.com/3182136f86a68c1d36c03ffc9823cc0071ca2ee7/site_utils/stable_images/assign_stable_images.py
[modify] https://crrev.com/3182136f86a68c1d36c03ffc9823cc0071ca2ee7/site_utils/stable_images/assign_stable_images_unittest.py
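A sketch of what logging at different levels can look like, assuming hypothetical per-board outcomes (unchanged, updated, or no new version found). The logger name and `ListHandler`, a small helper that captures records, are illustrative only.

```python
import logging

logger = logging.getLogger('stable_images_sketch')  # hypothetical name


def report_board(board, old_version, new_version):
    """Log one board's outcome at a level matching its importance."""
    if new_version is None:
        # Something an on-call engineer should notice.
        logger.warning('%s: no new stable version found', board)
    elif old_version == new_version:
        # Routine detail; hidden unless DEBUG logging is enabled.
        logger.debug('%s: unchanged at %s', board, old_version)
    else:
        logger.info('%s: %s -> %s', board, old_version, new_version)


class ListHandler(logging.Handler):
    """Collects (levelname, message) pairs, e.g. for unit tests."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append((record.levelname, record.getMessage()))
```

Separating levels this way lets a cron run log only INFO and above, while a `--dry-run` investigation can turn on DEBUG.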

Comment 11 by bugdroid1@chromium.org, Sep 27

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/27da99d4bbd88bb9a54f4ba74d66b1bf90e2193e

commit 27da99d4bbd88bb9a54f4ba74d66b1bf90e2193e
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Thu Sep 27 00:15:26 2018

stable_images: Send success metrics from assign_stable_images

BUG=chromium:888046
TEST=Run with --dry-run, see beautiful metrics

Change-Id: Idd911035a133938bebacca1ca92fc4c5d8142ee4
Reviewed-on: https://chromium-review.googlesource.com/1244740
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Commit-Queue: Prathmesh Prabhu <pprabhu@chromium.org>
Trybot-Ready: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Alex Zamorzaev <zamorzaev@chromium.org>

[modify] https://crrev.com/27da99d4bbd88bb9a54f4ba74d66b1bf90e2193e/site_utils/stable_images/assign_stable_images.py
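One way to guarantee that a run reports its outcome on both success and failure is to wrap the entry point in `try`/`finally`. `SuccessCounter` below is a hypothetical in-memory stand-in; the actual change reports through the project's monitoring library, whose API is not shown in this issue.

```python
import time


class SuccessCounter(object):
    """Hypothetical in-memory stand-in for a monitoring counter metric.

    Each event is (unix_time, succeeded); a real deployment would send
    these to a metrics backend instead of keeping them in a list.
    """

    def __init__(self):
        self.events = []

    def record(self, succeeded, now=None):
        self.events.append((time.time() if now is None else now, succeeded))


def run_and_report(main, counter):
    """Run main(), always record whether it succeeded, and re-raise errors."""
    succeeded = False
    try:
        result = main()
        succeeded = True
        return result
    finally:
        counter.record(succeeded)
```

A counter like this cannot report runs that never happen, which is why an alert should key on the absence of successes rather than on the presence of failures.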

Need a push to prod; then add precomputation and alert.

The plan is to add a precomputation that sums the success metrics over the past week, and to alert when that sum is 0. The script only runs once per week, and missing runs only affect us after months.

I think it's good to wait and watch for a week and only alert if all runs in the past week fail (this ensures we don't alert just because the script happened to run on a day when we knew the lab was broken, etc.).
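The proposed rule (sum the success metric over the past week, alert when the sum is 0) can be sketched as follows, assuming each run leaves a `(unix_time, succeeded)` event. In practice this precomputation would live in the monitoring backend rather than in Python; the function names are hypothetical.

```python
WEEK_SECONDS = 7 * 24 * 60 * 60


def successes_in_window(events, now, window=WEEK_SECONDS):
    """Count successful runs with timestamps in (now - window, now]."""
    return sum(1 for t, ok in events if ok and now - window < t <= now)


def should_alert(events, now, window=WEEK_SECONDS):
    """Alert only when no run succeeded anywhere in the window.

    A single failed run (e.g. on a day the lab was known to be broken) does
    not fire the alert as long as another run in the window succeeded.
    """
    return successes_in_window(events, now, window) == 0
```

An empty event list also triggers the alert, so the same rule covers the case where the script silently stops running altogether.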

Comment 13 by bugdroid1@chromium.org, Sep 28

The following revision refers to this bug:
  https://chromium.googlesource.com/chromiumos/third_party/autotest/+/3cf026f02e698cb35ac3b8cf46f40cd00da71c39

commit 3cf026f02e698cb35ac3b8cf46f40cd00da71c39
Author: Prathmesh Prabhu <pprabhu@chromium.org>
Date: Fri Sep 28 02:44:37 2018

stable_images: Drop noop function

BUG=chromium:888046
TEST=unittests

Change-Id: If2137c119ae6a62300aa1d5806058c6a4941d3ad
Reviewed-on: https://chromium-review.googlesource.com/1244751
Commit-Ready: ChromeOS CL Exonerator Bot <chromiumos-cl-exonerator@appspot.gserviceaccount.com>
Tested-by: Prathmesh Prabhu <pprabhu@chromium.org>
Reviewed-by: Alex Zamorzaev <zamorzaev@chromium.org>

[modify] https://crrev.com/3cf026f02e698cb35ac3b8cf46f40cd00da71c39/site_utils/stable_images/assign_stable_images.py

No metrics in pcon/ yet, although I believe the necessary CL in #11 has been pushed to prod.

Checking...
> No metrics in pcon/ yet, although I believe the necessary CL in #11 has been pushed to prod.

It seems possible that bug 888039 is preventing the metrics from being
reported.

Labels: -Chase Chase-Pending
Moving back to Chase-Pending to reassess when we have bandwidth; it looks like the code landed but the metrics are missing?
Owner: akes...@chromium.org
Status: Assigned (was: Started)
Mergedinto: 898934
Status: Duplicate (was: Assigned)