luci-py proto comparison failure only in recipe run
Issue description
We're in a weird state where appengine/swarming/handlers_prpc_test.py succeeds on macOS, on google linux, and when run locally on a bot, but fails in the recipe run by the CQ. I've seen that reverting the last 3 CLs resolves the problem, but I don't know why: https://chromium-review.googlesource.com/c/1388106 https://chromium-review.googlesource.com/c/1388104 https://chromium-review.googlesource.com/c/1385589 There's a backlog of CLs to commit, so I'll revert these 3 for now, but we need to investigate and figure out why this happened.
,
Dec 21
Do you have a link to a failure?
,
Dec 21
https://logs.chromium.org/logs/infra/buildbucket/cr-buildbucket.appspot.com/8926502226027139040/+/steps/presubmit/0/stdout or https://logs.chromium.org/logs/infra/buildbucket/cr-buildbucket.appspot.com/8926500022234909600/+/steps/presubmit/0/stdout I tried two workarounds, one to stop encoding as textpb, one to ensure time is more strictly set.
,
Dec 21
Huh... Are you absolutely sure it's not just a flake? Reverted CLs look completely irrelevant to handlers_prpc_test.py...
,
Dec 21
I know. I spent the day trying to reproduce this. This did unblock the other CLs.
,
Dec 21
Nothing makes sense:
- The changes I make to Swarming protos all pass fine without any problem.
- This change always fails: https://chromium-review.googlesource.com/c/1387805
To not take any chances, I've re-created the CL fresh, with a new local branch, and it still fails the same way: https://chromium-review.googlesource.com/c/1389195
This is annoying because it blocks the fix for issue 915406, but I really don't understand what's happening here.
,
Dec 21
This doesn't make any sense; this CL passes the CQ: https://chromium-review.googlesource.com/c/infra/luci/luci-py/+/1389196 I don't understand why the swarming bot one doesn't. Maybe the presubmit checks in swarming_bot mutate the presubmit check's sys.path state?
,
Dec 21
I can reproduce the failure locally with a virtual env on Linux. Here's the diff (seen by modifying the test to set self.maxDiff = None):
- version: "bd5fadf7768b3f4965cf1fe8b58c6c4018d1039e40eb253215dac1193372ffeb"
+ version: "8a90698034b4b49f82ec83c82937464937be542c3b435977ead9d7ad35f4ff71"
,
Dec 22
recipe_engine is using protobuf 3.6.0, as specified in https://chromium.googlesource.com/infra/luci/recipes-py/+/master/.vpython
This only applies if you are importing the _pb2 files in the recipe_engine process. Otherwise you'll get whatever version of protobuf is specified in the .vpython environment above your test script. Nothing on the bots uses the `-vpython-spec` option.
The version of the proto compiler you use should probably match the version of the protobuf library that you use in your scripts. If these are in appengine land, they are probably using the dev_appserver copy of protobuf, and I have no idea which version that is.
You can see where protobuf comes from (and its version) with:
    import google.protobuf
    print google.protobuf, google.protobuf.__version__
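For anyone repeating that check across environments (CQ recipe, local virtual env, dev_appserver), here is a minimal sketch that runs under both Python 2 and Python 3 and does not crash when protobuf is absent. The helper name `describe_module` is made up for illustration; it is not part of luci-py or recipe_engine.

```python
from __future__ import print_function

import importlib


def describe_module(name):
    """Return (path, version) for an importable module, or None if it is missing.

    Hypothetical helper for illustration only. Either element of the tuple may
    be None if the module does not define __file__ or __version__.
    """
    try:
        mod = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(mod, "__file__", None), getattr(mod, "__version__", None)


# Prints None if protobuf is not importable in this environment.
print(describe_module("google.protobuf"))
```

Running this in each environment and comparing the reported path and version would show whether they resolve to different protobuf copies.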
,
Dec 26
I relanded all 3 of my changes, making sure to trigger swarming/handlers_prpc_test.py each time in CQ, so I don't know what else I can do about this. Assigning back to maruel. Let me know if there's something I can do to help though.
,
Jan 10
Sorry for the trouble, I found the bug. Indeed not related at all.
,
Jan 10
Please do share. I'm curious.
,
Jan 10
The following revision refers to this bug: https://chromium.googlesource.com/infra/luci/luci-py.git/+/10138a3917660926a866a972e2a964e3359a48bf
commit 10138a3917660926a866a972e2a964e3359a48bf
Author: Marc-Antoine Ruel <maruel@chromium.org>
Date: Thu Jan 10 19:05:31 2019
swarming: fix flaky test
Use the calculated bot version.
Bug: 917474
Change-Id: I3cb4f4a4794a29516d93a17f09c54858ebdb986f
Reviewed-on: https://chromium-review.googlesource.com/c/1403941
Reviewed-by: Quinten Yearsley <qyearsley@chromium.org>
Commit-Queue: Marc-Antoine Ruel <maruel@chromium.org>
[modify] https://crrev.com/10138a3917660926a866a972e2a964e3359a48bf/appengine/swarming/handlers_prpc_test.py
,
Jan 10
The CL above fixed the problem, which was an incorrect expectation. I have no clue why reverting your CL had any effect on that. Please accept my apology for the disruption.
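To illustrate the kind of failure a pinned expectation can cause, here is a rough sketch. It assumes, purely for illustration, that a version string is a SHA-256 digest over a bundle of files; the thread does not say how the real bot version is calculated, and `compute_version` and `bot_files` are hypothetical names, not the actual Swarming code.

```python
import hashlib


def compute_version(files):
    """Hypothetical: SHA-256 over sorted (name, content) pairs of a file bundle."""
    h = hashlib.sha256()
    for name in sorted(files):
        h.update(name.encode("utf-8"))
        h.update(files[name])
    return h.hexdigest()


# Assumed stand-in for the bundled bot code.
bot_files = {"bot_main.py": b"print('v1')\n"}

# A test that copies this value in as a literal pins it to today's contents.
pinned = compute_version(bot_files)

# Any later change to a bundled file changes the digest...
bot_files["bot_main.py"] = b"print('v2')\n"
assert compute_version(bot_files) != pinned

# ...whereas an expectation computed at test time always matches.
expected = compute_version(bot_files)
assert compute_version(dict(bot_files)) == expected
```

Under that assumption, a test comparing against a literal would break whenever any hashed input differs, which is consistent with the diff in the `version` field seen earlier in the thread.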
Comment 1 by bugdroid1@chromium.org
, Dec 21