Issue metadata
Sign in to add a comment
|
fetch_telemetry_binary_dependencies takes 90-120+ seconds on the bots for what should be a no-op |
||||||||||||||||||||||
Issue descriptionIt looks like the fetch_telemetry_binary_dependencies hook in `runhooks` takes 90+ seconds whenever it is run on a bot. I would assume that this is re-running the same downloads over and over again, even though the files should already exist on the machine. This is a fairly serious cycle time regression and is hopefully an easy fix. martiniss@, does it make sense for you to look at this?
,
Dec 9 2017
90% of time is spent computing file hash. Telemetry just has a quite inefficient filehash function:
def CalculateHash(file_path):
"""Calculates and returns the hash of the file at file_path."""
sha1 = hashlib.sha1()
with open(file_path, 'rb') as f:
while True:
# Read in 1mb chunks, so it doesn't all have to be loaded into memory.
chunk = f.read(1024 * 1024)
if not chunk:
break
sha1.update(chunk)
return sha1.hexdigest()
(https://cs.chromium.org/chromium/src/third_party/catapult/common/py_utils/py_utils/cloud_storage.py?rcl=d624b3ced2c81d4fb4ea98a8dbb4532272cc1e0a&l=446)
What does swarming use to compute file hash? I guess switching Telemetry to reuse that may fix the problem.
,
Dec 9 2017
+Maruel/Erick for question about file hashing
,
Dec 11 2017
Interestingly, the file hashing implementation in #4 seems almost the same as Telemetry (except I don't know what you use for 'algo' in production). If that's true, the best we can do to improve the speed is by multiprocessing the code to check & fetch file. If the builder machines have 4 cores, that probably fetch_telemetry_binary_dependencies's runtime to 30+ s. Dirk: is that good enough for you?
,
Dec 11 2017
I think we need to figure out how to get this down to a couple seconds or less when it should be a no-op, just like the other hooks are, and I feel like we must be doing something less intelligently than we could be doing here, because none of the other download hooks take this long. How much data are we downloading, and in how many files, again? Is this something we could solve by versioning something, maybe? Or adding a 'got' sha1 that we save when we do actually download the file, so all we have to do is compare the checked-in sha1 against the got sha1, rather than recomputing it every time?
,
Dec 11 2017
#6: checking in the computed sha for the file is essentially the same as just ignoring the binary content of the file & check if it exists at all. I think the best solution moving forward is modifying this to use cipd package system, rather than trying to invent yet-another-method of managing large number of binary files. Do you have any doc about how to use cipd? I would be happy to modify telemetry to use that instead.
,
Dec 11 2017
Actually the short term fix for this is fetch_telemetry_binary_dependencies also produce precomputed hash files on the builder (which seems like the builder would keep these around). Upon detecting version mismatch between checked-in .sha1 files & .computed-sha1, it will delete binary & computed-sha1 & try to refetch things again.
,
Dec 11 2017
Right, that's what I was suggesting (not checking in the computed sha for the file). That said, I still feel like we're missing something.
,
Dec 11 2017
#9: I can prototype & see how it goes.
,
Dec 11 2017
The following revision refers to this bug: https://chromium.googlesource.com/catapult/+/e188f9648ada03c31681a688c4fa07407d43aa7a commit e188f9648ada03c31681a688c4fa07407d43aa7a Author: Nghia Nguyen <nednguyen@google.com> Date: Mon Dec 11 23:24:10 2017 Make sure cloud_storage.GetIfHashChanged also save the binary's hash upon first time fetching it Bug: chromium:793609 Change-Id: Ide32946e5c83d141978efc8d0b180136213616e0 TBR=perezju@chromium.org Change-Id: Ide32946e5c83d141978efc8d0b180136213616e0 Reviewed-on: https://chromium-review.googlesource.com/820131 Reviewed-by: Ned Nguyen <nednguyen@google.com> Commit-Queue: Ned Nguyen <nednguyen@google.com> [modify] https://crrev.com/e188f9648ada03c31681a688c4fa07407d43aa7a/common/py_utils/py_utils/cloud_storage_unittest.py [modify] https://crrev.com/e188f9648ada03c31681a688c4fa07407d43aa7a/common/py_utils/py_utils/cloud_storage.py [modify] https://crrev.com/e188f9648ada03c31681a688c4fa07407d43aa7a/.gitignore
,
Dec 11 2017
I will check the runtime of fetch_telemetry_binary_dependencies again tomorrow. Should be few seconds.
,
Dec 11 2017
,
Dec 11 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromium/src.git/+/98335bb2216f982670ac80bc969e1d7123ef4501 commit 98335bb2216f982670ac80bc969e1d7123ef4501 Author: Ned Nguyen <nednguyen@google.com> Date: Mon Dec 11 23:32:32 2017 Ignore .fetchts Bug:793609 NOTRY=true TBR=dpranke@chromium.org Change-Id: I770559876a6dd6b7de0a3575e02a1b854858a414 Reviewed-on: https://chromium-review.googlesource.com/820939 Reviewed-by: Ned Nguyen <nednguyen@google.com> Commit-Queue: Ned Nguyen <nednguyen@google.com> Cr-Commit-Position: refs/heads/master@{#523248} [modify] https://crrev.com/98335bb2216f982670ac80bc969e1d7123ef4501/tools/perf/page_sets/.gitignore
,
Dec 12 2017
The NextAction date has arrived: 2017-12-12
,
Dec 12 2017
"gclient runhooks" takes 38 sec now: https://ci.chromium.org/buildbot/chromium.win/Win7%20Tests%20%281%29/74488 So I think this is fixed
,
Dec 12 2017
Nice! Thank you for fixing this so quickly :). |
|||||||||||||||||||||||
►
Sign in to add a comment |
|||||||||||||||||||||||
Comment 1 by nedngu...@google.com
, Dec 9 2017Owner: nedngu...@google.com