New issue
Advanced search Search tips

Issue 902514 link

Starred by 1 user

Issue metadata

Status: Fixed
Owner: ----
Closed: Jan 16
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Chrome
Pri: 2
Type: Bug

Blocked on:
issue 901471



Sign in to add a comment

Weird telemetry_unittests CrOS+Amd64 only failure due to missing SUID bit on chrome-sandbox

Project Member Reported by gab@google.com, Nov 6

Issue description

Example failed try job : https://ci.chromium.org/p/chromium/builders/luci.chromium.try/chromeos-amd64-generic-rel/125113

https://chromium-review.googlesource.com/c/chromium/src/+/1305891/14 makes only this very specific bot timeout reproducibly on telemetry.core.tracing_controller_unittest.StartupTracingTest.testCloseBrowserBeforeTracingIsStopped only...

I've been trying to debug this for a few days now and added reentrancy detection (could have been an explanation) but no luck. 

Not sure if it's an issue with the platform or my CL but I now have to move on and will sadly be forced to disable this feature on CrOS+Arm64...


Example failure log :

[95/96] telemetry.core.tracing_controller_unittest.StartupTracingTest.testCloseBrowserBeforeTracingIsStopped failed unexpectedly 254.1829s:
  Chose browser: PossibleCrOSBrowser(browser_type=cros-chrome)
  Created trace config file in /tmp/chrome-trace-config.json
  (Re)starting the ui (logs the user out)
  Restarting Chrome (pid=17665) with remote port
  dbus-send --system --type=method_call --dest=org.chromium.SessionManager /org/chromium/SessionManager org.chromium.SessionManagerInterface.EnableChromeTesting boolean:true array:string:"--enable-net-benchmarking,--metrics-recording-only,--no-default-browser-check,--no-first-run,--ignore-background-tasks,--enable-gpu-benchmarking,--deny-permission-prompts,--autoplay-policy=no-user-gesture-required,--disable-background-networking,--disable-component-extensions-with-background-pages,--disable-default-apps,--disable-search-geolocation-disclosure,--enable-smooth-scrolling,--enable-threaded-compositing,--remote-debugging-port=0,--start-maximized,--ash-disable-system-sounds,--mute-audio,--oobe-skip-postlogin,--disable-logging-redirect,--vmodule=*/chromeos/net/*=2\,*/chromeos/login/*=2\,chrome_browser_main_posix=2,--allow-failed-policy-fetch-for-test,--disable-gaia-services,--trace-config-file=/tmp/chrome-trace-config.json" array:string:
  Failed with BrowserConnectionGoneException while starting the browser backend.
  *************** BROWSER STANDARD OUTPUT ***************
  Cannot get standard output on CrOS
  *********** END OF BROWSER STANDARD OUTPUT ************
  ********************* BROWSER LOG *********************
  No log file
  ***************** END OF BROWSER LOG ******************
  ********************* SYMBOLIZED MINIDUMP *********************
  ***************** END OF SYMBOLIZED MINIDUMP ******************
  Closing browser (pid=17823) ...
  No devtools clients for issuing clock sync.
  Remove trace config file in /tmp/chrome-trace-config.json
  Created trace config file in /tmp/chrome-trace-config.json
  (Re)starting the ui (logs the user out)
  Browser is closed.
  No devtools clients for issuing clock sync.
  Remove trace config file in /tmp/chrome-trace-config.json
  Traceback (most recent call last):
    File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/decorators.py", line 219, in wrapper
      func(*args, **kwargs)
    File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/core/tracing_controller_unittest.py", line 174, in testCloseBrowserBeforeTracingIsStopped
      with self.possible_browser.BrowserSession(self.browser_options) as browser:
    File "/b/s/w/ir/.swarming_module/lib/python2.7/contextlib.py", line 17, in __enter__
      return self.gen.next()
    File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/browser/possible_browser.py", line 92, in BrowserSession
      browser = self.Create()
    File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome/cros_browser_finder.py", line 105, in Create
      browser_backend, self._platform_backend, startup_args)
    File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/browser/browser.py", line 48, in __init__
      self._browser_backend.Start(startup_args, startup_url=startup_url)
    File "/b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 75, in traced_function
      return func(*args, **kwargs)
    File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome/cros_browser_backend.py", line 80, in Start
      self.BindDevToolsClient()
    File "/b/s/w/ir/third_party/catapult/common/py_trace_event/py_trace_event/trace_event_impl/decorators.py", line 75, in traced_function
      return func(*args, **kwargs)
    File "/b/s/w/ir/third_party/catapult/telemetry/telemetry/internal/backends/chrome/chrome_browser_backend.py", line 118, in BindDevToolsClient
      raise exceptions.BrowserConnectionGoneException(self.browser, e)
  BrowserConnectionGoneException: Timed out while waiting 240s for _GetDevToolsClient.
  Found Minidump: False
  Stack Trace:
  ********************************************************************************
  	Cannot get stack trace on CrOS
  ********************************************************************************
  Standard output:
  ********************************************************************************
  	Cannot get standard output on CrOS
  ********************************************************************************
  System log:
  (Not implemented)
 
Do you mean arm64 or amd64? FWIW, the bot is amd.

And did you see the comment in the other bug w/ the stack trace? (crbug.com/901471#c4) It appears the test is failing because the change you're trying to submit causes Chrome to crash when under that test's specific use of the browser. Something to do w/ file permissions on the chrome-sandbox binary? Granted, it's a strange error, and IDK what could be causing it.
Summary: Weird telemetry_unittests CrOS+Amd64 only failure (was: Weird telemetry_unittests CrOS+Arm64 only failure)
Sorry I did mean amd64. I saw the comment but it makes no sense to me... seems like a platform/bot configuration issue and I've run out of cycles to spend trying to debug it. I'll be fine without this feature on CrOS64 but filled this since I figured platform owners might want to investigate.

Feel free to dupe into the other one if it's the same thing; in my mind the other one was about the lack of appropriate logging for this error. The error itself then remains a mystery...
Owner: bpastene@chromium.org
Status: Assigned (was: Untriaged)
I'm inclined to agree that it's test/bot issue. The more I look at it... the more I think that specific file permissions check has never passed.

I'll disable the specific telemetry test for now, and I'll try to find time to figure out why it's failing. Apologies for the disruption.

And I'll keep the two bugs separate since they're covering different things (this weird telemetry failure vs debugging of telemetry tests on that bot)
Project Member

Comment 4 by bugdroid1@chromium.org, Nov 7

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/5db1ab8dc0279dc7d77b731e4815a57114dd8642

commit 5db1ab8dc0279dc7d77b731e4815a57114dd8642
Author: Gabriel Charette <gab@chromium.org>
Date: Wed Nov 07 05:14:06 2018

Add a trace event for the duration of ScopedBlockingCalls

This helps identify sections of a trace event that were blocked on external
dependencies and shed some light on the "descheduled" portion of an event.

This CL is non-trivial for 3 reasons:

 1) This new TRACE_EVENT kicks off tracing earlier than before in some
    configurations. This means that main() of basic utilities all of a sudden
    needed AtExitManagers (first patch set of this CL). But that then spread out
    to many main()'s and was painful. Since the Singleton is only required in
    tracing to know whether it's enabled : this CL specializes
    Singleton/TraceEventETWExport to avoid instantiating the instance only to
    notice it's not enabled (guaranteed default state).

 2) We do not want tracing events while a thread is idle (i.e. when it's
    sleeping on its own WaitableEvent/ConditionVariable while waiting for more
    tasks). Ideally we simply wouldn't record the trace event when it's not
    nested within another event but tracing doesn't have a notion of "current
    event depth". As such this CL opts for a declarative model where owners of
    the few WaitableEvents/ConditionVariables used to sleep-while-idle can flag
    those as irrelevant for tracing.

 3) Possibility of reentrancy if the trace event macros perform a blocking
    call themselves. Added DCHECKs to confirm this doesn't occur (was
    initial suspected cause of CrOS+amd64 only failures but it kept
    failing despite that and this feature had to be disabled there..)

Bonus:
  * Singleton::GetIfExists() avoids instantiating TraceEventETWExport
    when switches::kTraceExportEventsToETW isn't present.
  * The declarative model brought forth by (2) also enables not instantiating
    debug::ScopedEventWaitActivity objects while a thread is merely sleeping.
    This will avoid polluting crash metadata with wait-acitvity for sleeping
    threads.
  * In a follow-up CL we will be able to remove
    base::internal::ScopedClearBlockingObserverForTesting which was previously
    necessary specifically to avoid a situation in TaskScheduler unit tests
    where WaitableEvents used for the test logic would instantiate
    ScopedBlockingCalls and interfere with the very logic under test.
    (not done in this one as it's large enough on its own)

Bug: 899897,  902514 
Change-Id: I93b63e83ac6bd9e5e9d4e29a9b86c62c73738835
Reviewed-on: https://chromium-review.googlesource.com/c/1305891
Commit-Queue: Gabriel Charette <gab@chromium.org>
Reviewed-by: danakj <danakj@chromium.org>
Reviewed-by: François Doray <fdoray@chromium.org>
Reviewed-by: Peter Kasting <pkasting@chromium.org>
Reviewed-by: Charlie Harrison <csharrison@chromium.org>
Reviewed-by: Avi Drissman <avi@chromium.org>
Reviewed-by: Misha Efimov <mef@chromium.org>
Cr-Commit-Position: refs/heads/master@{#605966}
[modify] https://crrev.com/5db1ab8dc0279dc7d77b731e4815a57114dd8642/base/memory/singleton.h
[modify] https://crrev.com/5db1ab8dc0279dc7d77b731e4815a57114dd8642/base/message_loop/message_pump_default.cc
[modify] https://crrev.com/5db1ab8dc0279dc7d77b731e4815a57114dd8642/base/synchronization/condition_variable.h
[modify] https://crrev.com/5db1ab8dc0279dc7d77b731e4815a57114dd8642/base/synchronization/condition_variable_posix.cc
[modify] https://crrev.com/5db1ab8dc0279dc7d77b731e4815a57114dd8642/base/synchronization/condition_variable_win.cc
[modify] https://crrev.com/5db1ab8dc0279dc7d77b731e4815a57114dd8642/base/synchronization/waitable_event.h
[modify] https://crrev.com/5db1ab8dc0279dc7d77b731e4815a57114dd8642/base/synchronization/waitable_event_posix.cc
[modify] https://crrev.com/5db1ab8dc0279dc7d77b731e4815a57114dd8642/base/synchronization/waitable_event_win.cc
[modify] https://crrev.com/5db1ab8dc0279dc7d77b731e4815a57114dd8642/base/task/task_scheduler/scheduler_worker.cc
[modify] https://crrev.com/5db1ab8dc0279dc7d77b731e4815a57114dd8642/base/threading/scoped_blocking_call.cc
[modify] https://crrev.com/5db1ab8dc0279dc7d77b731e4815a57114dd8642/base/threading/scoped_blocking_call.h
[modify] https://crrev.com/5db1ab8dc0279dc7d77b731e4815a57114dd8642/base/trace_event/trace_event_etw_export_win.cc
[modify] https://crrev.com/5db1ab8dc0279dc7d77b731e4815a57114dd8642/base/trace_event/trace_event_etw_export_win.h
[modify] https://crrev.com/5db1ab8dc0279dc7d77b731e4815a57114dd8642/cc/raster/single_thread_task_graph_runner.cc
[modify] https://crrev.com/5db1ab8dc0279dc7d77b731e4815a57114dd8642/content/public/test/nested_message_pump_android.cc
[modify] https://crrev.com/5db1ab8dc0279dc7d77b731e4815a57114dd8642/content/renderer/categorized_worker_pool.cc

Cc: palmer@chromium.org vapier@chromium.org kerrnel@chromium.org
Summary: Weird telemetry_unittests CrOS+Amd64 only failure due to missing SUID bit on chrome-sandbox (was: Weird telemetry_unittests CrOS+Amd64 only failure)
Hrmm, it looks like this failure would've been avoided prior to https://chromium-review.googlesource.com/c/chromiumos/chromite/+/1112816/

The change (and the one linked in its description) implies we'd don't need the suid bit on the sandbox binary in CrOS, but the failures you were seeing were caused by its absence. I don't have much context here, so hopefully +some others do.

Should we revert https://chromium-review.googlesource.com/c/chromiumos/chromite/+/1112816/?
Cc: -palmer@chromium.org mnissler@chromium.org allenwebb@chromium.org jorgelo@chromium.org
no, we do not want to reenable a set*id sandbox.  supporting a namespace-only/set*id-less approach is a general Linux thing ( issue 312380 ), and one we were able to more actively leverage in CrOS.

you'll want to find a way to do what you need w/out set*id access.
Project Member

Comment 8 by bugdroid1@chromium.org, Jan 16 (6 days ago)

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/40c2f1577996de8510142d8868a65621baaba1e2

commit 40c2f1577996de8510142d8868a65621baaba1e2
Author: Andrey Malets <malets@yandex-team.ru>
Date: Wed Jan 16 14:29:40 2019

Fix thread cache after clone() in CanCreateProcessInNewUserNS

Since CanCreateProcessInNewUserNS uses base::ForkWithFlags() to create new
process, which in turn uses raw clone(), no function registered through
pthread_atfork() gets called in child.

This leads to invalid state when g_thread_id in platform_thread_posix.cc is
stale and PlatformThread::CurrentId() returns wrong thread id on Linux.

Despite the fact that the child process in CanCreateProcessInNewUserNS has very
short lifetime, it may use PlatformThread APIs when tracing was enabled at the
time of the call (this is the case when valid --trace-config-file= argument was
passed to Chrome at startup).

Also add a DCHECK to ensure we do not treat child crash and normal exit with
code 1 the same (this would have been very helpful to debug the error
originally).

Bug: 899897,  902514 
Change-Id: I8e2c82c8030537f5a62b725970d4ea5ec5948659
Reviewed-on: https://chromium-review.googlesource.com/c/1407830
Commit-Queue: Robert Sesek <rsesek@chromium.org>
Reviewed-by: Robert Sesek <rsesek@chromium.org>
Reviewed-by: Gabriel Charette <gab@chromium.org>
Cr-Commit-Position: refs/heads/master@{#623228}
[modify] https://crrev.com/40c2f1577996de8510142d8868a65621baaba1e2/base/process/launch_posix.cc
[modify] https://crrev.com/40c2f1577996de8510142d8868a65621baaba1e2/base/threading/platform_thread_internal_posix.h
[modify] https://crrev.com/40c2f1577996de8510142d8868a65621baaba1e2/base/threading/platform_thread_posix.cc
[modify] https://crrev.com/40c2f1577996de8510142d8868a65621baaba1e2/base/threading/scoped_blocking_call.cc
[modify] https://crrev.com/40c2f1577996de8510142d8868a65621baaba1e2/sandbox/linux/services/credentials.cc

Comment 9 by gab@chromium.org, Jan 16 (6 days ago)

Cc: mal...@yandex-team.ru
Owner: ----
Status: Fixed (was: Assigned)
Fixed by malex@yandex; woohoo, thanks!

Comment 10 by vapier@chromium.org, Jan 16 (6 days ago)

that must have been fun to debug.  nice job!

Sign in to add a comment