LUCI Scheduler Triggers Erroring out.
Issue description

This LUCI Scheduler trigger is failing repeatedly:
https://luci-scheduler.appspot.com/jobs/chromeos/trigger_3

Sample failure:
https://luci-scheduler.appspot.com/jobs/chromeos/trigger_3/9093452591556548144

[00:53:58.198] New invocation is queued and will start shortly
[00:53:59.415] Starting the invocation (attempt 1)
[00:54:01.995] Starting the invocation (attempt 2)
[00:54:04.943] Starting the invocation (attempt 3)
[00:54:08.335] Starting the invocation (attempt 4)
[00:54:12.629] Starting the invocation (attempt 5)
[00:54:18.008] Too many attempts, giving up

This one appears to have similar issues, but not all the time:
https://luci-scheduler.appspot.com/jobs/chromeos/trigger_4
,
Dec 5
,
Dec 5
Looking...
,
Dec 5
Logs are suspiciously empty, except for OOMs reported directly by GAE from various random handlers:

"While handling this request, the process that handled this request was found to be using too much memory and was terminated."

My theory is that the gitiles poller uses more memory than is available to a process by default, so it can never finish. I'm bumping the process memory limit from the default 128MB to 1G. Hopefully that helps.
,
Dec 5
The following revision refers to this bug:
https://chromium.googlesource.com/infra/luci/luci-go.git/+/5f9096fe721d0f6a1be065f66c91a13beb3a6f36

commit 5f9096fe721d0f6a1be065f66c91a13beb3a6f36
Author: Vadim Shtayura <vadimsh@chromium.org>
Date: Wed Dec 05 01:39:26 2018

[luci-scheduler] Bump instance class from default F1 to F4_G1.

There's quite a lot of OOM errors in the GAE logs.

R=tandrii@chromium.org
BUG= 911881

Change-Id: I770e79018ee511a162de97362038e0dae2c7f33d
Reviewed-on: https://chromium-review.googlesource.com/c/1362400
Reviewed-by: Andrii Shyshkalov <tandrii@chromium.org>
Commit-Queue: Andrii Shyshkalov <tandrii@chromium.org>

[modify] https://crrev.com/5f9096fe721d0f6a1be065f66c91a13beb3a6f36/scheduler/appengine/frontend/app.yaml
,
Dec 5
Thanks! Is there anything we could do to reduce memory usage?
,
Dec 5
Implement batching on the gitiles server for the refs endpoint :)
,
Dec 5
So, Vadim's change helped. It appears that you've recently changed from watching all refs/tags/* to a narrower regex, right?

Assuming so, my theory is:
- For each refs/tags/xyz no longer matched by the regex, the scheduler emitted a line to its internal per-task log, which is kept in RAM.
- These lines aren't truncated at the time of addition [1], and so they ate too much RAM.
- However, these lines are truncated right before being stored in the datastore.

Thus, we can see these lines in the log [2] now that we've increased the max RAM usage:

[01:53:01.980] Ref refs/tags/57.0.2951.4 is no longer watched
--- the log has been cut here ---
[01:53:04.469] Ref refs/tags/64.0.3280.0 is no longer watched

[1] https://cs.chromium.org/chromium/infra/go/src/go.chromium.org/luci/scheduler/appengine/engine/utils.go?type=cs&q=%22debugLog(c+context.Context%22&sq=package:chromium&g=0&l=53
[2] https://luci-scheduler.appspot.com/jobs/chromeos/trigger_3/9093448877323132480
,
Dec 5
The following revision refers to this bug:
https://chromium.googlesource.com/infra/luci/luci-go.git/+/6558993c36ba7d6f96cee2f888820173d4d8f2c6

commit 6558993c36ba7d6f96cee2f888820173d4d8f2c6
Author: Andrii Shyshkalov <tandrii@chromium.org>
Date: Wed Dec 05 02:05:16 2018

scheduler: log counts of refs in play during gitiles task execution.

R=vadimsh

Bug: 911881
Change-Id: I2ed8ffd05d6d5ead923cf893138b17ca32307d9c
Reviewed-on: https://chromium-review.googlesource.com/c/1362406
Reviewed-by: Vadim Shtayura <vadimsh@chromium.org>
Commit-Queue: Andrii Shyshkalov <tandrii@chromium.org>

[modify] https://crrev.com/6558993c36ba7d6f96cee2f888820173d4d8f2c6/scheduler/appengine/task/gitiles/gitiles.go
,
Dec 5
Yeah, looks like we'll need a ring buffer for logs at the very least. But the amount of RAM needed is still O(<total number of refs>), we'll just reduce the coefficient (I wonder how much).
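For illustration, a ring buffer for log lines could look roughly like the Go sketch below. The `ringLog` type and its methods are hypothetical names, not anything from luci-go. It bounds only the memory held by the log; the list of refs fetched from gitiles is still held in full, which is why the overall usage stays O(<total number of refs>).

```go
package main

import "fmt"

// ringLog is a hypothetical fixed-capacity log that keeps only the most
// recent lines and counts how many older lines were overwritten.
type ringLog struct {
	lines   []string
	next    int // index of the oldest line once the buffer is full
	dropped int // number of lines overwritten so far
}

// newRingLog allocates a log holding at most capacity lines (minimum 1).
func newRingLog(capacity int) *ringLog {
	if capacity < 1 {
		capacity = 1
	}
	return &ringLog{lines: make([]string, 0, capacity)}
}

// Add stores a line, overwriting the oldest one when the buffer is full.
func (r *ringLog) Add(line string) {
	if len(r.lines) < cap(r.lines) {
		r.lines = append(r.lines, line)
		return
	}
	r.lines[r.next] = line
	r.next = (r.next + 1) % len(r.lines)
	r.dropped++
}

// Lines returns the retained lines, oldest first.
func (r *ringLog) Lines() []string {
	out := make([]string, 0, len(r.lines))
	out = append(out, r.lines[r.next:]...)
	out = append(out, r.lines[:r.next]...)
	return out
}

func main() {
	log := newRingLog(3)
	for _, tag := range []string{"a", "b", "c", "d", "e"} {
		log.Add("Ref refs/tags/" + tag + " is no longer watched")
	}
	fmt.Println(log.Lines(), "dropped:", log.dropped)
}
```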
,
Dec 5
I filed 911906 to fix the logging. Closing this issue as "mitigated".
,
Dec 5
The problematic trigger is using:

gitiles: <
  repo: "https://chromium.googlesource.com/chromium/src"
  refs: "regexp:refs/tags/72\\..*"
>

We also have seen issues here (which can probably be made more specific):

gitiles: <
  repo: "https://chromium.googlesource.com/chromium/src"
  refs: "regexp:refs/tags/[^/]+"
>
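To see how much the two patterns differ in breadth, here is a rough Go sketch that counts matching and non-matching refs instead of logging each one. The `countWatched` helper is hypothetical, and anchoring the pattern to the whole ref name is an assumption about how the scheduler treats `regexp:` values. The ref list is a small made-up sample; only the first two tags appear in the logs above.

```go
package main

import (
	"fmt"
	"regexp"
)

// countWatched reports how many refs match the pattern and how many do not.
// Assumption: the pattern must match the whole ref name, so it is wrapped
// in ^(?:...)$ before compiling.
func countWatched(pattern string, refs []string) (watched, unwatched int, err error) {
	re, err := regexp.Compile("^(?:" + pattern + ")$")
	if err != nil {
		return 0, 0, err
	}
	for _, ref := range refs {
		if re.MatchString(ref) {
			watched++
		} else {
			unwatched++
		}
	}
	return watched, unwatched, nil
}

func main() {
	// Illustrative sample only; the real repo has far more tags.
	refs := []string{
		"refs/tags/57.0.2951.4",
		"refs/tags/64.0.3280.0",
		"refs/tags/72.0.3626.0", // hypothetical tag
		"refs/heads/master",
	}
	// The config value "regexp:refs/tags/72\\..*" unescapes to the first pattern.
	for _, p := range []string{`refs/tags/72\..*`, `refs/tags/[^/]+`} {
		w, u, err := countWatched(p, refs)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-20s watched=%d unwatched=%d\n", p, w, u)
	}
}
```

Reporting two counters per poll keeps the log size constant regardless of how many tags fall outside the pattern.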
Comment 1 by dgarr...@chromium.org
, Dec 5