factory: Logs clean up |
|||||
Issue descriptionCurrently factory.log contains lots of system information, and sometimes makes it bloated and hard to debug. I wonder if we can clean up current logs to make the log looking better, for example: - Most plugins track system status but we probably don't need to see their logs. Should we move plugin messages to their own file? - Calculation of "tail of /var/log/messages" after unexpected shutdown was usually painful. Should we utilize pstore/console-ramoops? Or find a way to compress them, and only open when needed? For example, if testlog will replace factory.log in future, maybe we can make a "log reader program" that will reconstruct & combine views of log data so people can see some logs only when they press ENTER on that item. - test_list_iterator.py usually prints lots of messages -maybe can be packed into less lines. - duplicated messages. For example when terminating a test I'll see [INFO] goofy process_utils.py:385 2017-01-04 17:01:49.417 Sending signal 15 to 2643 [INFO] goofy invocation.py:834 2017-01-04 17:01:49.494 Test SMT.TypeCLeftTest.BarrierTypeCTest FAILED: Aborted: Operator requested run/re-run of certain tests [ERROR] goofy invocation.py:844 2017-01-04 17:01:49.496 Test SMT.TypeCLeftTest.BarrierTypeCTest FAILED: Aborted: Operator requested run/re-run of certain tests [INFO] goofy process_utils.py:385 2017-01-04 17:01:49.643 Sending signal 15 to 2643 [INFO] goofy goofy:852 2017-01-04 17:01:49.770 Killed SMT.TypeCLeftTest.BarrierTypeCTest That info and error were redundant, and the sending signal msg was logged twice
,
Jan 17 2017
For Goofy plugins, most plugins now we have are code snippets moved from Goofy. We should fix them so a plugin would terminate itself if there are critical errors or repeating errors instead of keeping producing endless logs. In normal situation, we can reduce the log by avoiding printing duplicated information. For example, battery monitor should only print battery percentage if it is different enough from last logging.
,
Jan 25 2017
Temporarily assign to Nooger. yllin, you can try to see if there's something can be improved here.
,
Aug 8 2017
For the duplicated messages:
How about adding a filter to filtering the duplicated messages from the very last output contents?
For example:
logging.info('a')
logging.info('b')
logging.info('a')
logging.info('a')
We should have the log message:
[INFO] a
[INFO] b
[INFO] a
,
Aug 8 2017
The trick of "removing last output contents" may not work, since as you can find in the example above, it was info a info b error b info a So last 'a' won't be removed since last info was b. Second 'b' should not be removed since it's a message for different level.
,
Aug 8 2017
For this kind of the outputs, we should introduce the timestamp to the filter. Usually, we don't need the same kinds of the output contents printing in a short time, for example, maybe 3 seconds. If the message shows up again within 3 seconds, then we should filter it.
,
Aug 8 2017
Hmmm. I think we may first try figuring out why there would be multiple repeated message and if they are really necessary. Filtering has its advantages (easy to apply) and disadvantages (low chance to lost real messages that should not be filtered, for example real repeated error in a short time). But if we can provide more info indicating how many messages were filtered and when, then I'm fine with that.
,
Aug 9 2017
For the pluging messages: We can gather the system monitor related logs into a file like system_monitor.log, and also pipes the needed information to factory.log. As Earl's exampls, Only when battery is 20% gap from the last logging then we pipes the information to factory.log. Also, currently, there are some monitors only pipes information to factory.log, and some monitors also logs the information to /var/log/messages via syslog, for example thermal_monitor. I think we should also unify these logs.
,
Aug 9 2017
sgtm
,
Aug 9 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/factory/+/117543cf4b73e2e67fa55ea6e3891ff0affb8ed8 commit 117543cf4b73e2e67fa55ea6e3891ff0affb8ed8 Author: Yilun Lin <yllin@google.com> Date: Wed Aug 09 12:18:04 2017 test_list_iterator: Packs log messages into less lines. test_list_iterator.py usually prints lots of messages, we would like to reduce the efforts while we are paging the log messages. TEST=manual check BUG=chromium:681624 Change-Id: If6593fb9ee49dcaa8ec90647fc2e8da0498c3198 Reviewed-on: https://chromium-review.googlesource.com/607727 Commit-Ready: Yilun Lin <yllin@chromium.org> Tested-by: Yilun Lin <yllin@chromium.org> Reviewed-by: Hung-Te Lin <hungte@chromium.org> [modify] https://crrev.com/117543cf4b73e2e67fa55ea6e3891ff0affb8ed8/py/goofy/test_list_iterator.py
,
Aug 9 2017
The following revision refers to this bug: https://chromium.googlesource.com/chromiumos/platform/factory/+/c37b78f7a8202c4a0c674c3232d711c7c2f83c47 commit c37b78f7a8202c4a0c674c3232d711c7c2f83c47 Author: Yilun Lin <yllin@google.com> Date: Wed Aug 09 12:18:04 2017 process_utils: Reduces excessive messages that killing process. There are excessive messages while we are killing a pytest. The /var/log/factory.log is flooded with the messages like: Sending signal 15 to 2643 Sending signal 15 to 2643 Sending signal 15 to 2643 We want to reduce the overwhelming messages to make the factory.log clearly logging the useful information. TEST=make test TEST=manual kill a pytest and check log messages. BUG=chromium:681624 Change-Id: If8981405e76172d548a7b8a36e3ba507678a56f1 Reviewed-on: https://chromium-review.googlesource.com/607734 Commit-Ready: Yilun Lin <yllin@chromium.org> Tested-by: Yilun Lin <yllin@chromium.org> Reviewed-by: Hung-Te Lin <hungte@chromium.org> [modify] https://crrev.com/c37b78f7a8202c4a0c674c3232d711c7c2f83c47/py/utils/process_utils.py
,
Aug 15
do we have anything left?
,
Aug 20
We can still do more here.
,
Sep 7
@yllin can you list what else can be done, and see if we should assign it as good first bug for nooglers (unless if you still want to work on this)?
,
Sep 7
I haven't touched this for a long time. Need time to refresh my memory... In my rough impression: we can reduce the log messages from system monitor plugins, like thermal, battery. Feel free to assign this to fshao@
,
Nov 27
Re-assign to jamesqaq, assuming fshao@ hasn't started yet (let us know if you've done something). |
|||||
►
Sign in to add a comment |
|||||
Comment 1 by shunhsingou@chromium.org
, Jan 17 2017