
Issue 665691

Starred by 3 users

Issue metadata

Status: Fixed
Owner:
Closed: Jan 2017
Cc:
EstimatedDays: ----
NextAction: ----
OS: Mac
Pri: 2
Type: Bug

Blocked on:
issue 599484
issue 629712
issue 650898
issue 667512
issue 667549
issue 669356

Blocking:
issue 607545
issue 624049
issue 673921




Investigate test errors on 10.12 image.

Project Member Reported by erikc...@chromium.org, Nov 16 2016

Issue description

Labels: OS-Mac

Comment 2 by d...@chromium.org, Nov 17 2016

Blocking: 607545
Blockedon: 667549

Comment 4 by bugdroid1@chromium.org, Nov 22 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/f2a388826d809c7c0491d360d8da5512e4f97879

commit f2a388826d809c7c0491d360d8da5512e4f97879
Author: erikchen <erikchen@chromium.org>
Date: Tue Nov 22 07:05:14 2016

Don't run nacl_integration tests on mac on fyi waterfall.

The tests are no longer run on mac anywhere.

BUG= 665691 

Review-Url: https://codereview.chromium.org/2519953003
Cr-Commit-Position: refs/heads/master@{#433804}

[modify] https://crrev.com/f2a388826d809c7c0491d360d8da5512e4f97879/testing/buildbot/chromium.fyi.json

Failures as of 11/28/2016:

browser_tests
ConstrainedWindowMacTest.BrowserWindowFullscreen
ClipboardApiTest.Extension
PluginPowerSaverBrowserTest.PosterTests
BrowserWindowControllerTest.FullscreenResizeFlags
ExtensionApiTest.BookmarkManager
SSLClientCertificateSelectorCocoaTest.HideShow
WebstoreInlineInstallerTest.BlockInlineInstallFromFullscreenForBrowser
OmniboxViewMacBrowserTest.CopyToPasteboard
OutOfProcessPPAPITest.FlashClipboard
PluginPowerSaverBrowserTest.SmallCrossOrigin
ServiceProcessControlBrowserTest.LaunchAndIPC
SpellCheckMessageFilterPlatformMacBrowserTest.SpellCheckReturnMessage
FindBarBrowserTest.EscapeKey
RenderViewContextMenuMacBrowserTest.ServicesFiltering
DesktopCaptureApiTest.ChooseDesktopMedia
ServiceProcessControlBrowserTest.LaunchAndReconnect
BrowserWindowControllerTest.FullscreenToolbarExposedForTabstripChanges

components_unittests
BookmarkNodeDataTest.WriteToClipboardURL
BookmarkUtilsTest.CopyPaste
BookmarkUtilsTest.PasteNonEditableNodes
BookmarkNodeDataTest.WriteToClipboardFolderAndURL
BookmarkNodeDataTest.JustURL
SpellcheckPlatformMacTest.IgnoreWords_EN_US
BookmarkNodeDataTest.MetaInfo
BookmarkNodeDataTest.Folder
BookmarkNodeDataTest.WriteToClipboardMultipleURLs
BookmarkNodeDataTest.FolderWithChild
BookmarkNodeDataTest.WriteToClipboardEmptyFolder
BookmarkNodeDataTest.MultipleNodes
BookmarkUtilsTest.PasteBookmarkFromURL
BookmarkNodeDataTest.URL
BookmarkUtilsTest.MakeTitleUnique
BookmarkUtilsTest.CopyPasteMetaInfo
BookmarkNodeDataTest.WriteToClipboardFolderWithChildren
SpellcheckPlatformMacTest.SpellCheckIgnoresOrthography
SpellcheckPlatformMacTest.SpellCheckSuggestions_EN_US

content_browsertests 
CaptureScreenshotTest.CaptureScreenshotArea
CaptureScreenshotTest.CaptureScreenshot

content_unittests
WebDragDestTest.Data
WebDragDestTest.URL
MacSandboxTest.ClipboardAccess

interactive_ui_tests
ClipboardTest/0.GetSequenceNumber
OmniboxViewTest.CutTextToClipboard
ClipboardTest/0.TextTest
ClipboardTest/0.TrickyHTMLTest
OmniboxViewTest.CutURLToClipboard
SitePerProcessInteractiveBrowserTest.FullscreenElementInSubframe
SitePerProcessInteractiveBrowserTest.FullscreenElementInABAAndExitViaJS
OmniboxViewTest.CopyURLToClipboard
SitePerProcessInteractiveBrowserTest.FullscreenElementInMultipleSubframes
ClipboardTest/0.WebSmartPasteTest
ClipboardTest/0.UnicodeHTMLTest
SitePerProcessInteractiveBrowserTest.FullscreenElementInABAAndExitViaEscapeKey
OmniboxViewTest.CopyTextToClipboard
ClipboardTest/0.RTFTest
ClipboardTest/0.DataTest
ClipboardTest/0.MultipleDataTest
ClipboardTest/0.HTMLTest
ClipboardTest/0.BookmarkTest
ClipboardTest/0.SharedBitmapTest
ClipboardTest/0.MultiFormatTest
OmniboxViewTest.Paste
ExtensionApiTest.FocusWindowDoesNotExitFullscreen
ClipboardTest/0.URLTest

mojo_system_unittests
WaiterTest.Basic
WaiterTest.TimeOut
 
net_unittests
CertVerifyProcTest.LargeKey
KeygenHandlerTest.SmokeTest
VerifyMixed/CertVerifyProcWeakDigestTest.VerifyDetectsAlgorithm/0
KeygenHandlerTest.ConcurrencyTest
CertVerifyProcTest.MacCRLIntermediate
VerifyEndEntity/CertVerifyProcWeakDigestTest.VerifyDetectsAlgorithm/1
VerifyEndEntity/CertVerifyProcWeakDigestTest.VerifyDetectsAlgorithm/0
VerifyMixed/CertVerifyProcWeakDigestTest.VerifyDetectsAlgorithm/1
VerifyIncompleteEndEntity/CertVerifyProcWeakDigestTest.VerifyDetectsAlgorithm/1
VerifyIncompleteEndEntity/CertVerifyProcWeakDigestTest.VerifyDetectsAlgorithm/0
CertVerifyProcTest.RejectWeakKeys

ui_base_unittests
ClipboardMacTest.ReadImageNonRetina
ClipboardUtilMacTest.CheckForLeak
ClipboardUtilMacTest.PasteboardItemWithTitle
OSExchangeDataTest.TestFileToURLConversion
OSExchangeDataTest.URLAndString
ClipboardUtilMacTest.PasteboardItemFromUrl
OSExchangeDataTest.TestPickledData
OSExchangeDataTest.StringDataGetAndSet
OSExchangeDataTest.TestURLExchangeFormats
ClipboardMacTest.ReadImageRetina
ClipboardUtilMacTest.PasteboardItemWithFilePath

unit_tests
UrlDropControllerTest.DragAndDropText
ServiceProcessControlMac.TestGTMSMJobSubmitRemove
UrlDropControllerTest.DragAndDropURL
FindPasteboardTest.ReadingFromPboardUpdatesFindText
DownloadUtilMacTest.AddFileToPasteboardTest
BookmarkContextMenuControllerTest.CutCopyPasteNode
UrlDropControllerTest.DragAndDropTextParsableAsURL
FindPasteboardTest.SendsNotificationWhenTextChanges
ClipboardUtilsTest.GetClipboardText

views_unittests
DragDropClientMacTest.PasteboardToOSExchangeTest
TextfieldTest.DragAndDrop_InitiateDrag
TextfieldTest.DragAndDrop_ToTheRight
TextfieldTest.DragAndDrop_ToTheLeft
TextfieldTest.DragAndDrop_Canceled
DragDropClientMacTest.BasicDragDrop

Comment 6 by kbr@chromium.org, Nov 29 2016

Blockedon: 667512

Comment 7 by tapted@chromium.org, Nov 29 2016

Blockedon: 669356
Cc: sdy@chromium.org
I've been trying to repro these errors with no success. I've tried:

Running the tests on a 10.12.1 machine.
Running the tests on a 10.12.0 VM [sdy tried this].
ssh-ing into build9-m1 and running the tests there.
vnc-ing into build9-m1 and running the tests there.

The fact that these failures are deterministic on build9-m1, but that I can't repro them when ssh/vnc-ing in is very suspicious.
sdy reports that this also passes on a 10.12.1 VM.
I can reproduce the clipboard errors locally.

Modify the test ClipboardMacTest.ReadImageRetina to leak 100000 UniquePasteboards. Now the test fails with the same symptoms.

Reset ClipboardMacTest.ReadImageRetina. It still fails! There's a PB database somewhere that needs to be cleared.
Cc: tapted@chromium.org
More observations:
Restarting the machine seems to clear the pasteboard state [everything works]
After causing ui_base_unittests to fail, I tried running views_unittests - the exact same set of tests fail.

Comment 12 by bugdroid1@chromium.org, Nov 30 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/b3048cc1a9f9feb6a3f38d364cee1a79bd3fcb23

commit b3048cc1a9f9feb6a3f38d364cee1a79bd3fcb23
Author: erikchen <erikchen@chromium.org>
Date: Wed Nov 30 09:50:45 2016

Fix a leak in a MacViews pasteboard test.

BUG= 665691 

Review-Url: https://codereview.chromium.org/2537953002
Cr-Commit-Position: refs/heads/master@{#435201}

[modify] https://crrev.com/b3048cc1a9f9feb6a3f38d364cee1a79bd3fcb23/ui/views/cocoa/drag_drop_client_mac_unittest.mm

"""
sudo log show --info --debug --predicate 'subsystem == "com.apple.CFPasteboard"'
"""

"""
2016-11-29 22:20:49.895567-0800 0x52b02    Default     0x0                  30327  browser_tests: (CoreFoundation) [com.apple.CFPasteboard.general] failed to create global data
2016-11-29 22:20:49.895580-0800 0x52b07    Error       0x0                  30327  browser_tests: (CoreFoundation) [com.apple.CFPasteboard.general] Connection to 'pboard' server had an error: <error: 0x7fffe4991ca0> { count = 1, transaction: 0, voucher = 0x0, contents =
	"XPCErrorDescription" => <string: 0x7fffe4991f18> { length = 18, contents = "Connection invalid" }
}
...
...
...
2016-11-30 02:33:34.840786-0800 0x1043d7   Default     0x0                  68736  sync_integration_tests: (CoreFoundation) [com.apple.CFPasteboard.general] failed to create global data
2016-11-30 02:33:34.840794-0800 0x104423   Error       0x0                  68736  sync_integration_tests: (CoreFoundation) [com.apple.CFPasteboard.general] Connection to 'pboard' server had an error: <error: 0x7fff9ced6ca0> { count = 1, transaction: 0, voucher = 0x0, contents =
	"XPCErrorDescription" => <string: 0x7fff9ced6f18> { length = 18, contents = "Connection invalid" }
}
2016-11-30 02:33:37.896625-0800 0x10447e   Error       0x0                  68768  ui_base_unittests: (CoreFoundation) [com.apple.CFPasteboard.general] Failed to obtain 'pboard' service port: <error: 0x7fff9ced6ca0> { count = 1, transaction: 0, voucher = 0x0, contents =
	"XPCErrorDescription" => <string: 0x7fff9ced6f18> { length = 18, contents = "Connection invalid" }
}
"""

For some reason, the test binaries can't connect to the pboard service. 
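To see at a glance which test binaries are hitting the pboard failure, the `log show` output can be tallied per process. The following is only a sketch: the regular expression is an assumption based on the column layout of the lines quoted in this thread, `errors_by_process` is a hypothetical helper, and the sample messages are abbreviated.

```python
import re
from collections import Counter

# First line of a record as printed by `log show` (layout assumed from the
# lines quoted in this thread). Continuation lines, such as the XPC error
# dictionary, do not match and are skipped.
RECORD = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+[+-]\d{4})\s+"
    r"(?P<thread>0x[0-9a-f]+)\s+(?P<level>\w+)\s+(?P<activity>0x[0-9a-f]+)\s+"
    r"(?P<pid>\d+)\s+(?P<process>[^:]+): \((?P<library>[^)]+)\) "
    r"\[(?P<category>[^\]]+)\] (?P<message>.*)$"
)


def errors_by_process(lines):
    """Count Error-level records per process name."""
    counts = Counter()
    for line in lines:
        match = RECORD.match(line.strip())
        if match and match.group("level") == "Error":
            counts[match.group("process")] += 1
    return counts


PREFIX = "(CoreFoundation) [com.apple.CFPasteboard.general] "
# Abbreviated copies of lines from this thread.
sample = [
    "2016-11-30 14:17:41.971602-0800 0x62da5    Error       0x0                  48425  browser_tests: "
    + PREFIX + "Connection to 'pboard' server had an error:",
    '\t"XPCErrorDescription" => <string: 0x7fffd05fdf18> { length = 18, contents = "Connection invalid" }',
    "}",
    "2016-11-30 14:17:41.971629-0800 0x62d97    Default     0x0                  48425  browser_tests: "
    + PREFIX + "failed to create global data",
    "2016-11-30 02:33:37.896625-0800 0x10447e   Error       0x0                  68768  ui_base_unittests: "
    + PREFIX + "Failed to obtain 'pboard' service port:",
]

for process, count in errors_by_process(sample).items():
    print(process, count)
```

Only the first line of each record carries the process name, so the continuation lines are deliberately ignored rather than parsed.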

In contrast, here's what it looks like when I manually run a test from my ssh session:
"""
2016-11-30 13:34:24.186193-0800 0x1aa19    Debug       0x0                  27725  pboard: (CoreFoundation) [com.apple.CFPasteboard.sudden-termination] sudden termination disabled
2016-11-30 13:34:24.186208-0800 0x1aa19    Debug       0x0                  27725  pboard: (CoreFoundation) [com.apple.CFPasteboard.sudden-termination] sudden termination enabled
2016-11-30 13:34:24.186964-0800 0x1aa19    Info        0x0                  27725  pboard: (CoreFoundation) [com.apple.CFPasteboard.general] Sucessfuly started pboard: 'CFPBS:186A5:'
2016-11-30 13:34:24.186997-0800 0x1aa19    Info        0x0                  27725  pboard: (CoreFoundation) [com.apple.CFPasteboard.general] Setting up the 'com.apple.pasteboard.1' connection (for pboard)
2016-11-30 13:34:24.187057-0800 0x1aa19    Info        0x0                  27725  pboard: (CoreFoundation) [com.apple.CFPasteboard.general] Setting up the 'com.apple.coreservices.uauseractivitypasteboardclient.xpc' connection
"""
Cc: rsesek@chromium.org
Based on pseudocode for ___CFPasteboardSetup from CoreFoundation, it looks like the following xpc message is failing:
"""
    rax = xpc_connection_create_mach_service("com.apple.pasteboard.1", rax, 0x0);
    *___CFPasteboardServerConnection = rax;
    ...
    xpc_connection_resume(*___CFPasteboardServerConnection);
    rbx = xpc_dictionary_create(0x0, 0x0, 0x0);
    xpc_dictionary_set_string(rbx, "com.apple.pboard.message", "com.apple.pboard.check-in");
    r15 = xpc_connection_send_message_with_reply_sync(*___CFPasteboardServerConnection, rbx);
    xpc_release(rbx);
    if (xpc_get_type(r15) == __xpc_type_error) goto loc_14469d;
"""

I bet our sandbox is getting in the way, although I don't know why it affects ui_base_unittests, which I would expect to not spin up the sandbox.
Actually, this *should* be called from the browser process.
Pseudocode for __CFHandlePasteboardXPCEvent:

    r14 = xpc_dictionary_get_string(rbx, "com.apple.pboard.message");
    if (strcmp(r14, "com.apple.pboard.check-in") == 0x0) goto loc_156a2c;
...
loc_156a2c:
    r14 = xpc_dictionary_create_reply(rbx);
    xpc_dictionary_set_mach_send(r14, "com.apple.pboard.port", *(int32_t *)_mach_task_self_);
    if (getaudit_addr(var_50, 0x30) == 0x0) {
            xpc_dictionary_set_int64(r14, "com.apple.pboard.token", sign_extend_64(var_2C));
    }
    rax = xpc_dictionary_get_remote_connection(rbx);
    xpc_connection_send_message(rax, r14);
    xpc_release(r14);
    goto loc_156a8f;
The pboard process is never receiving the XPC message [confirmed with lldb], although we could have known this from 
"""
"XPCErrorDescription" => <string: 0x7fff9ced6f18> { length = 18, contents = "Connection invalid" }
"""
The logs in c#13 are slightly deceptive, as I accidentally dropped one of the lines for browser_tests and interactive_ui_tests. The logs always come in sets of 3:

"""
2016-11-30 14:17:41.971602-0800 0x62da5    Error       0x0                  48425  browser_tests: (CoreFoundation) [com.apple.CFPasteboard.general] Connection to 'pboard' server had an error: <error: 0x7fffd05fdca0> { count = 1, transaction: 0, voucher = 0x0, contents =
	"XPCErrorDescription" => <string: 0x7fffd05fdf18> { length = 18, contents = "Connection invalid" }
}
2016-11-30 14:17:41.971623-0800 0x62d97    Error       0x0                  48425  browser_tests: (CoreFoundation) [com.apple.CFPasteboard.general] Failed to obtain 'pboard' service port: <error: 0x7fffd05fdca0> { count = 1, transaction: 0, voucher = 0x0, contents =
	"XPCErrorDescription" => <string: 0x7fffd05fdf18> { length = 18, contents = "Connection invalid" }
}
2016-11-30 14:17:41.971629-0800 0x62d97    Default     0x0                  48425  browser_tests: (CoreFoundation) [com.apple.CFPasteboard.general] failed to create global data
"""

"Failed to create mach service connection" is never emitted, so we know that the problem lies somewhere very close to sending the mach msg.
We know that the relevant range is
"""
    rax = qos_class_main();
    rax = dispatch_get_global_queue(rax, 0x0);
    rax = xpc_connection_create_mach_service("com.apple.pasteboard.1", rax, 0x0);
    *___CFPasteboardServerConnection = rax;
    if (rax == 0x0) goto loc_144642;

loc_1444c7:
    xpc_connection_set_event_handler(rax, void ^(void * _block, void * arg1) {
        rbx = arg1;
        var_20 = *___stack_chk_guard;
        if (xpc_get_type(rbx) == __xpc_type_error) {
                rbx = xpc_copy_description(rbx);
                if (os_log_type_enabled(*__CFPasteboardLog, 0x10) != 0x0) {
                        r15 = rsp;
                        rax = rsp;
                        rsi = *__CFPasteboardLog;
                        *(int8_t *)(rax + 0xfffffffffffffff0) = 0x2;
                        *(int8_t *)(rax + 0xfffffffffffffff1) = 0x1;
                        *(int8_t *)(rax + 0xfffffffffffffff2) = 0x22;
                        *(int8_t *)(rax + 0xfffffffffffffff3) = 0x8;
                        *(rax + 0xfffffffffffffff4) = rbx;
                        _os_log_impl(0xffffffffffeb20c5, rsi, 0x10, "Connection to 'pboard' server had an error: %{public}s", rax + 0xfffffffffffffff0, 0xc);
                }
                free(rbx);
        }
        if (*___stack_chk_guard != var_20) {
                __stack_chk_fail();
        }
        return;
    });
    xpc_connection_resume(*___CFPasteboardServerConnection);
    rbx = xpc_dictionary_create(0x0, 0x0, 0x0);
    xpc_dictionary_set_string(rbx, "com.apple.pboard.message", "com.apple.pboard.check-in");
    r15 = xpc_connection_send_message_with_reply_sync(*___CFPasteboardServerConnection, rbx);
"""
I think that the tests are not being run on the VM in the right session. First, some background:

When the machine reboots, it reads and starts a service from:
/Library/LaunchDaemons/org.chromium.infra.service_manager.plist

This service, after many layers of indirection, will eventually run the recipe, and in turn the tests. Using "launchctl procinfo" to examine the test [actually "ps aux | grep browser_tests | awk -v N=2 '{print $N}' | xargs sudo launchctl procinfo"] shows:

"""
audit info
	session id = 100000
	uid = 4294967295
	success mask = 0x0
	failure mask = 0x0
	flags = is_initial
"""

[full text attached].

I created a LaunchAgent on a 10.12 device to directly run ui_base_unittests.
"""
audit info
        session id = 100007
...
        flags = has_graphic_access,has_tty,has_console_access,has_authenticated
"""

Finally, I noticed that I had created a LaunchAgent, whereas the vm was using a LaunchDaemon! I tried creating both a LaunchAgent and a LaunchDaemon on my local 10.12 machine and voila! using a Daemon has problems and an agent does not.
browser_test_proc_info.txt
Nice find! That conceptually makes sense, since LaunchAgents are generally meant to be associated with user sessions, whereas LaunchDaemons are system-level background jobs that don't have a GUI session.

https://developer.apple.com/library/content/documentation/MacOSX/Conceptual/BPSystemStartup/Chapters/CreatingLaunchdJobs.html
I ssh-ed into another random bot to see if this was a configuration mistake.

build179-m1 from Builder: Mac10.11 Tests
https://build.chromium.org/p/chromium.mac/builders/Mac10.11%20Tests

Also has plists in /Library/LaunchDaemons. 

Next steps: I'm going to modify build9-m1 to see if moving the plist into /Library/LaunchAgents fixes the problem. 
This article has a nice description of LaunchAgents vs LaunchDaemons: http://www.grivet-tools.com/blog/2014/launchdaemons-vs-launchagents/

LaunchDaemons are run on system start, but don't have access to the GUI. I'm guessing that in Sierra, macOS no longer allows processes without the "has_graphic_access" audit info flag to access the clipboard. 

LaunchAgents are run when the user logs in but have access to the GUI. I'm kind of surprised that our browser/interactive tests have ever worked when triggered from LaunchDaemons.
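The daemon-versus-agent difference can be checked mechanically from the "audit info" block of `launchctl procinfo`. A minimal sketch, assuming the output keeps the indented `key = value` layout quoted in this thread; `audit_flags` and `has_gui_access` are hypothetical helper names, and the agent sample is abbreviated to the lines shown above.

```python
def audit_flags(procinfo_text):
    """Return the set of flags listed in the 'audit info' block."""
    in_audit = False
    for line in procinfo_text.splitlines():
        if line.strip() == "audit info":
            in_audit = True
            continue
        if in_audit:
            if not line[:1].isspace():
                break  # left the indented audit block
            key, sep, value = line.partition("=")
            if sep and key.strip() == "flags":
                return {flag.strip() for flag in value.split(",") if flag.strip()}
    return set()


def has_gui_access(procinfo_text):
    """True if the process reports the has_graphic_access audit flag."""
    return "has_graphic_access" in audit_flags(procinfo_text)


# Output for the test launched through the LaunchDaemon, as quoted above.
daemon_output = """audit info
    session id = 100000
    uid = 4294967295
    success mask = 0x0
    failure mask = 0x0
    flags = is_initial
"""

# Output for the test launched through a LaunchAgent (abbreviated).
agent_output = """audit info
    session id = 100007
    flags = has_graphic_access,has_tty,has_console_access,has_authenticated
"""

print("daemon:", has_gui_access(daemon_output))
print("agent:", has_gui_access(agent_output))
```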
Cc: mark@chromium.org
+mark, who in conversation said "it should definitely be an agent. I wonder what changed".

Comment 24 by mark@chromium.org, Dec 1 2016

Bots that run tests and expect a UI session definitely need to have their bot stuff kicked off via a LaunchAgent that specifies LimitLoadToSessionType Aqua, and have auto-login set up. UI stuff won’t work from a LaunchDaemon.
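For reference, a LaunchAgent definition restricted to the GUI session could be generated roughly as follows. This is a sketch rather than the bots' actual configuration: the label `org.example.test_runner` and the program path are placeholders, and only `LimitLoadToSessionType` with the value `Aqua` comes from the comment above.

```python
import plistlib


def make_agent_plist(label, program_arguments):
    """Serialize a launchd job definition limited to Aqua (GUI) sessions."""
    job = {
        "Label": label,
        "ProgramArguments": list(program_arguments),
        "RunAtLoad": True,
        # Load the job only inside a logged-in GUI session.
        "LimitLoadToSessionType": "Aqua",
    }
    return plistlib.dumps(job)  # XML plist, returned as bytes


# Placeholder label and command, for illustration only.
agent_plist = make_agent_plist("org.example.test_runner", ["/usr/bin/true"])
print(agent_plist.decode("utf-8"))
```

The resulting file would go in /Library/LaunchAgents or a user's ~/Library/LaunchAgents, not in /Library/LaunchDaemons.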

Comment 25 by d...@chromium.org, Dec 1 2016

Cc: dsansome@chromium.org
+dsansome who apparently owns service_manager

Dave - Looks like we need to make sure buildbot's process on Mac is launched from /Library/LaunchAgents (or ~/Library/LaunchAgents) with service_manager to ensure it has full access to the UI in 10.12+ (see comment #22).


Owner: ddoman@chromium.org
ddoman is already moving service_manager from a LaunchDaemon to a LaunchAgent.  He's started the rollout in https://chrome-internal-review.googlesource.com/c/306975/.
Owner: erikc...@chromium.org
Now, it runs as an agent within an aqua session.
: https://chrome-internal.googlesource.com/infra/puppet/+/master/puppetm/etc/puppet/modules/chrome_infra/files/service_manager/org.chromium.infra.service_manager.agent.plist

However, browser_tests still fails, although fewer tests are failing: https://build.chromium.org/p/chromium.fyi/builders/Chromium%20Mac%2010.11%20Force%20Mac%20Toolchain/builds/14869

erikchen@, would you be able to find out whether those failures still have the same cause?

```
audit info
        session id = 100006
        uid = 500
        success mask = 0x3000
        failure mask = 0x3000
        flags = has_graphic_access,has_tty,has_console_access
sandboxed = no
container = (no container)
```
browser_test_proc_info_with_agent.txt
Blockedon: 629712
Cc: erikc...@chromium.org
Owner: ddoman@chromium.org
Tests that fail:

browser_tests:
PluginPowerSaverBrowserTest.SmallCrossOrigin
PluginPowerSaverBrowserTest.PosterTests
QUnitBrowserTestRunner.Remoting_Webapp_Js_Unittest
SSLClientCertificateSelectorCocoaTest.HideShow
 
mojo_system_unittests:
WaiterTest.Basic
WaiterTest.TimeOut

net_unittests:
CertVerifyProcTest.LargeKey
CertVerifyProcTest.RejectWeakKeys
VerifyMixed/CertVerifyProcWeakDigestTest.VerifyDetectsAlgorithm/0
VerifyMixed/CertVerifyProcWeakDigestTest.VerifyDetectsAlgorithm/1
CertVerifyProcTest.MacCRLIntermediate
VerifyEndEntity/CertVerifyProcWeakDigestTest.VerifyDetectsAlgorithm/1
VerifyEndEntity/CertVerifyProcWeakDigestTest.VerifyDetectsAlgorithm/0
VerifyIncompleteEndEntity/CertVerifyProcWeakDigestTest.VerifyDetectsAlgorithm/1
VerifyIncompleteEndEntity/CertVerifyProcWeakDigestTest.VerifyDetectsAlgorithm/0

net_unittests failures are  issue 629712 . 

mojo_system_unittests do not fail when sshed/vnc-ed into the machine. The errors suggest that the process is running in a low-priority mode, which is why all the timings are off.
"""
../../mojo/edk/system/waiter_unittest.cc:130: Failure
Expected: (elapsed) < ((2 + 1) * test::EpsilonDeadline()), actual: 106979 vs 60000
../../mojo/edk/system/waiter_unittest.cc:154: Failure
Expected: (elapsed) < ((2 + 1) * test::EpsilonDeadline()), actual: 89956 vs 60000
../../mojo/edk/system/waiter_unittest.cc:204: Failure
Expected: (elapsed) < ((5 + 1) * test::EpsilonDeadline()), actual: 152557 vs 120000
"""
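The three assertions quoted above are consistent with one epsilon value, which can be recovered from the reported limits. A small worked check, using only the numbers in the failure output (the time unit is not stated there, so none is assumed here):

```python
# (multiplier, observed elapsed, reported limit) from the three failures above.
failures = [
    (2 + 1, 106979, 60000),
    (2 + 1, 89956, 60000),
    (5 + 1, 152557, 120000),
]


def implied_epsilon(multiplier, limit):
    """Epsilon implied by limit == multiplier * epsilon."""
    return limit // multiplier


for multiplier, elapsed, limit in failures:
    epsilon = implied_epsilon(multiplier, limit)
    overshoot = elapsed / limit
    print(f"epsilon={epsilon} elapsed={elapsed} limit={limit} overshoot={overshoot:.2f}x")
```

Each failure implies the same epsilon of 20000, and every wait overshoots its limit, which fits a process that is being scheduled late rather than a single unlucky run.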

QUnitBrowserTestRunner.Remoting_Webapp_Js_Unittest does not fail when sshed/vnc-ed into the machine. Logs don't seem to provide very useful information. The remaining browser_tests do reproduce when ssh-ed in.

ddoman: There are still 3 failures [2 from mojo_system_unittests, 1 from browser_tests] that do not reproduce when sshed/vnced into the machine. This suggests some type of test harness difference. Note that the audit you posted is different from the one I posted in c#19 when creating a launch agent. Can you dig into those issues further?

How are you triggering auto-login for the bots? Theoretically, LaunchAgents don't trigger unless someone logs into the machine.

Comment 30 by d...@chromium.org, Dec 6 2016

There's only one way to trigger auto login, and it's baked into the image (System Preferences -> Users and Groups -> Login Options. Automatic login drop down on the right is set to chrome-bot).

Side note: If you're getting double prompted for credentials after the initial VNC password prompt, then that usually is a sign the window server has crashed and the second prompt is actually logging the user back in.
How do you automatically log in if there's a password? mark@ said that when we initially set this up, he actually had to bake the password into the scripts he used.

[The Window Server is not crashing on this bot].

Comment 32 by d...@chromium.org, Dec 6 2016

It's done automatically with the package that creates the chrome-bot (https://github.com/MagerValp/CreateUserPkg) user during image creation.
Blockedon: 650898
Blockedon: 599484
ddoman: Ping? The mojo_system_unittests fail on the bot deterministically, but don't fail when I ssh in and run them [or use VNC]. This implies there's a difference between the environment in which the tests are being run. Note that the tests that are failing are highly suggestive of the tests being at a lowered priority level. 

Are we doing anything that would change the scheduling/prioritization of processes?

Comment 36 by bugdroid1@chromium.org, Dec 12 2016

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infra/puppet/+/cb0b54d0d753676469d20df38ec31bb5290a0df5

commit cb0b54d0d753676469d20df38ec31bb5290a0df5
Author: Scott Lee <ddoman@chromium.org>
Date: Mon Dec 12 06:25:23 2016

erikchen,

I found that the buildbot slave was being given lower values for NumberOfFiles and NumberOfProcesses than it used to be given.
Thus, I increased NumberOfFiles from 10240 to 20000 and NumberOfProcesses (the number of child processes) from 1064 to 2000.

If this doesn't change the results, I will change things so that the buildbot slave process is launched with its own plist instead of with service_manager.
The limit changes don't seem to have made any difference.
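One way to confirm what limits a process actually received is to read them from inside the process. A minimal sketch, assuming that the launchd `NumberOfFiles` and `NumberOfProcesses` keys correspond to `RLIMIT_NOFILE` and `RLIMIT_NPROC` (the mapping is an assumption, and `current_limits` is a hypothetical helper):

```python
import resource


def current_limits():
    """Return (soft, hard) limits for open files and processes."""
    return {
        # Assumed counterpart of launchd's NumberOfFiles.
        "NumberOfFiles": resource.getrlimit(resource.RLIMIT_NOFILE),
        # Assumed counterpart of launchd's NumberOfProcesses.
        "NumberOfProcesses": resource.getrlimit(resource.RLIMIT_NPROC),
    }


for name, (soft, hard) in current_limits().items():
    print(f"{name}: soft={soft} hard={hard}")
```

Running this from a job started by the bot's plist and again from an ssh session would show whether the two environments differ in these limits.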

>> Are we doing anything that would change the scheduling/prioritization of processes?

Yes. To give you some background: the buildbot slave, which takes a build request, compiles Chromium code, and runs the tests, used to run as a launchd agent.

However, we are migrating it to our own service startup program, called service_manager, so that service_manager runs it.

* Before
#1. launchd launches buildbot_slave, whose plist is located in ${HOME}/Library/LaunchAgents

* Now
#1. launchd launches service_manager, whose plist is located in /Library/LaunchAgents
#2. service_manager runs buildbot slave.

I will make a change to run buildbot slave with its own plist, and investigate further to find out more info.

Comment 39 by bugdroid1@chromium.org, Dec 13 2016

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infra/puppet/+/7e5465d0cd4fb05b57098ddd849186a2f7c836f6

commit 7e5465d0cd4fb05b57098ddd849186a2f7c836f6
Author: Scott Lee <ddoman@chromium.org>
Date: Mon Dec 12 23:58:08 2016

Comment 40 Deleted

Labels: Restrict-View-Google
Labels: -Restrict-View-Google
Hi erikchen,

I made a change to run buildbot slave with its own plist, but the same failures are still occurring.
For example, in the following build, I can find the same tests with failures
: https://build.chromium.org/p/chromium.fyi/builders/Chromium%20Mac%2010.11%20Force%20Mac%20Toolchain/builds/14927


Here is my summary.
#1. originally, buildbot slave daemon was running as an agent with its own plist placed under ${HOME}/Library/LaunchAgents.

#2. A change was made to run buildbot slave daemon with service_manager so that the daemon was started with service_manager.
However, this caused many failures in the build because service_manager was running as a daemon, and, therefore, its child processes,
such as browser_tests and interactive_ui_tests, didn't have graphic access. This is when this bug was reported.

This is an example build that failed when service_manager was running as a daemon.
https://build.chromium.org/p/chromium.fyi/builders/Chromium%20Mac%2010.11%20Force%20Mac%20Toolchain/builds/14842

#3. I made a change to run service_manager as an agent.
As a result, the buildbot slave daemon ran within an aqua session and "launchctl procinfo" command showed
that the buildbot daemon processes and its child processes have graphic access.

Although many test failures were gone, there were still a small number of failures in browser_tests and mojo.

#4. I made another change to run buildbot with its own plist placed under ${HOME}/Library/LaunchAgents
: i.e., start the buildbot slave daemon with the same plist that was used in #1.
However, the same test failures still occurred in browser_tests and mojo.
https://build.chromium.org/p/chromium.fyi/builders/Chromium%20Mac%2010.11%20Force%20Mac%20Toolchain/builds/14927

-----------------
erikchen@, AFAIK, no change has been made to the buildbot slave daemon that would alter its scheduling/prioritization, and
it is now running with the same plist file it originally ran with.
https://chrome-internal.googlesource.com/infra/puppet/+/master/puppetm/etc/puppet/modules/chrome_infra/templates/setup/darwin/org.chromium.buildbot.slave.plist.erb


> There are still 3 failures [2 from mojo_system_unittests, 1 from browser_tests] that do not reproduce when sshed/vnced into the machine. This suggests some type of test harness difference. Note that the audit you posted is different from the one I posted in c#19 when creating a launch agent. Can you dig into those issues further?

Just for your information, the audit info for #3 and #4 was the same.
In my understanding, the session ID is a login session ID. The LaunchAgent that ran ui_base_unittests directly
probably showed a different session ID because you created and loaded the agent in your ssh/vnc session.

I don't know what else I can try in order to resolve those test failures, but in my opinion
it is unlikely that the remaining test failures are caused by the changes made for the service_manager migration.

When you sshed/vnced into the machine and browser_tests ran successfully, what was the nice value of the process?
I just sshed into one and found that all browser_tests processes had an NI value of 0.
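The nice value can also be read programmatically, which makes it easy to log from inside a test run. A minimal sketch using the POSIX priority interface (`nice_value` is a hypothetical helper; a pid of 0 means the calling process):

```python
import os


def nice_value(pid=0):
    """Return the nice value (NI) of a process; pid 0 is the caller."""
    return os.getpriority(os.PRIO_PROCESS, pid)


print("NI =", nice_value())
```

A nice value of 0 in both environments would not rule out other scheduling differences, since nice is only one input to process prioritization.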
Cc: ddoman@chromium.org
Owner: erikc...@chromium.org
ddoman: Thanks for the detailed update. I will continue to investigate.
Was able to reproduce the mojo_system_unittests error on a local 10.12 device by launching it as a LaunchAgent. Does not reproduce on 10.11 machines, as expected. Adding the ProcessType "Interactive" key fixes the issue.

https://chrome-internal-review.googlesource.com/#/c/311798/
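The ProcessType fix described above amounts to adding one key to the job definition. A sketch of that edit on a placeholder plist (the label and command are hypothetical, and `with_process_type` is an illustrative helper, not the code in the CL):

```python
import plistlib


def with_process_type(plist_bytes, process_type="Interactive"):
    """Return a copy of a launchd plist with the ProcessType key set."""
    job = plistlib.loads(plist_bytes)
    job["ProcessType"] = process_type
    return plistlib.dumps(job)


# Placeholder job definition, for illustration only.
original = plistlib.dumps({
    "Label": "org.example.test_runner",
    "ProgramArguments": ["/usr/bin/true"],
})

patched = plistlib.loads(with_process_type(original))
print(patched["ProcessType"])
```

Existing keys are preserved, so the same helper could be applied to whichever plist actually launches the test processes.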

Comment 45 by d...@chromium.org, Dec 14 2016

Blocking: 673921

Comment 46 by bugdroid1@chromium.org, Dec 15 2016

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infra/puppet/+/e1a549b682b4ff1f03453f566656b2b5160fcf33

commit e1a549b682b4ff1f03453f566656b2b5160fcf33
Author: erikchen <erikchen@google.com>
Date: Wed Dec 14 02:56:05 2016


Comment 47 by bugdroid1@chromium.org, Dec 15 2016

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infra/puppet/+/ff08cf35f259726d16f65ae7928f6d44df548da9

commit ff08cf35f259726d16f65ae7928f6d44df548da9
Author: Scott Lee <ddoman@chromium.org>
Date: Thu Dec 15 06:29:38 2016

+erikchen@,

I am sorry, I didn't realize that the following CL added <ProcessType> to service_manager.plist.
: https://chrome-internal-review.googlesource.com/#/c/311798/

As I mentioned above, I made a change to start the buildbot slave daemon with its own plist. As a result,
your CL had no effect on the buildbot slave daemon, since it was launched with a different plist.
: #1 in https://bugs.chromium.org/p/chromium/issues/detail?id=665691#c42

That's why the mojo tests were still failing in the following builds, which were triggered after your CL landed.
- https://build.chromium.org/p/chromium.fyi/builders/Chromium%20Mac%2010.11%20Force%20Mac%20Toolchain/builds/14939
- https://build.chromium.org/p/chromium.fyi/builders/Chromium%20Mac%2010.11%20Force%20Mac%20Toolchain/builds/14940

I landed the following CL to start the buildbot slave daemon with service_manager again, and verified that service_manager is running in interactive mode.
```
$ launchctl procinfo the_pid_of_service_daemon
    ...
    spawn type = interactive
```

I will check the result of mojo_tests tomorrow again.
: https://build.chromium.org/p/chromium.fyi/builders/Chromium%20Mac%2010.11%20Force%20Mac%20Toolchain/builds/14942


ddoman: It looks like every 5th build or so is going purple. The latest one is 14947. Note that while we lose contact at 15:55:27, the next build doesn't start until 16:44:42. Looking at the machine logs, it looks like it's still happily chugging away performing browser_tests. This suggests that there's something wrong at the infra layer?
* #14947
It ended at 15:55:27, and I found that master.chromium.fyi was started at 15:57:21.
: http://shortn/_REfDmWKBGG
"2016-12-15T15:57:21	master1	master.chromium.fyi	_make_start	success"

I believe that the master process was stopped (killed) at 15:55, and started with "make start" at 15:57:21.

As a result, you can find a purple build in other builders under master.chromium.fyi.
- CrWin7Goma  #37119 went purple and ended at 15:55:10
- Blimp Linux Engine #3129 went purple and ended at 15:55:31

* #14941
This was interrupted due to my CL that killed the running buildbot slave process and restarted it with service_manager.

* #14938
It was interrupted and ended at Dec 14, 13:50:02, and I could find that
the master was started at 13:51:17.

https://build.chromium.org/p/chromium.fyi/builders/Chromium%20Mac%2010.11%20Force%20Mac%20Toolchain/builds/14947

* #13933
It was interrupted and ended at Dec 13, 17:48:06, and I found that the master was restarted at the following time:

2016-12-13T17:49:14	master1	master.chromium.fyi	_make_start	success
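The pattern in the builds above can be checked by comparing each build's end time with the following master start. A small sketch using the timestamps quoted in this comment; the five-minute window is an arbitrary assumption, the year for the Dec 14 entry is assumed to be 2016, and the build numbers are copied as written:

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# build number -> (build ended, master started), from the entries above.
events = {
    "14947": ("2016-12-15 15:55:27", "2016-12-15 15:57:21"),
    "14938": ("2016-12-14 13:50:02", "2016-12-14 13:51:17"),
    "13933": ("2016-12-13 17:48:06", "2016-12-13 17:49:14"),
}


def gap_seconds(build_end, master_start):
    """Seconds between a build ending and the master starting."""
    delta = datetime.strptime(master_start, FMT) - datetime.strptime(build_end, FMT)
    return int(delta.total_seconds())


def explained_by_restart(build_end, master_start, window=timedelta(minutes=5)):
    """True if the master started shortly after the build was interrupted."""
    gap = timedelta(seconds=gap_seconds(build_end, master_start))
    return timedelta(0) <= gap <= window


for build, (ended, started) in events.items():
    print(build, gap_seconds(ended, started), explained_by_restart(ended, started))
```

All three gaps are under two minutes, which matches the explanation that the builds were interrupted by master restarts rather than by anything on the bot.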



> This suggests that there's something wrong at the infra layer?
It was just a coincidence that chromium.fyi needed to be restarted several times for various reasons.

You can find which CLs were landed to schedule a master restart by looking at the history of the following git repo.
: https://chrome-internal.googlesource.com/infradata/master-manager.git




erikchen:

I can see that mojo_tests no longer fails in recent builds, but the following browser tests still fail.
- PluginPowerSaverBrowserTest.SmallCrossOrigin
- PluginPowerSaverBrowserTest.PosterTests

I am not sure whether those fail only on 10.12.
If those tests succeed on 10.11, then feel free to keep this open, continue the investigation, and ask for help if necessary.

Feel free to close this ticket and file a new one if necessary.

Comment 52 by sdy@chromium.org, Dec 16 2016

Cc: -sdy@chromium.org
Un-cc'ing myself for now, feel free to re-add me if I can help :).

Comment 53 by bugdroid1@chromium.org, Dec 16 2016

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/fde5cb085de250b5ac8b96ff8f95e1d54038ae2d

commit fde5cb085de250b5ac8b96ff8f95e1d54038ae2d
Author: erikchen <erikchen@chromium.org>
Date: Fri Dec 16 22:29:30 2016

Disable two plugin power saver tests on macOS.

The tests fail on macOS 10.12 and need to be investigated by the PPS team.

BUG=599484,  665691 

Review-Url: https://codereview.chromium.org/2585433002
Cr-Commit-Position: refs/heads/master@{#439218}

[modify] https://crrev.com/fde5cb085de250b5ac8b96ff8f95e1d54038ae2d/chrome/browser/plugins/plugin_power_saver_browsertest.cc


Comment 54 by bugdroid1@chromium.org, Dec 18 2016

The following revision refers to this bug:
  https://chrome-internal.googlesource.com/infra/puppet/+/1c78b4650e34bb1fd8f3defcd865844e0b33f897

commit 1c78b4650e34bb1fd8f3defcd865844e0b33f897
Author: Scott Lee <ddoman@chromium.org>
Date: Thu Dec 15 07:15:01 2016

Comment 55 by kbr@chromium.org, Dec 19 2016

Cc: kbr@chromium.org
Labels: -Pri-3 Pri-2
Thanks to everyone here for fixing these test harness problems on 10.12. How close do you think we are to being able to use 10.12 on some of the test bots? I'd like to deploy it on the GPU bots in Issue 673921. Thanks.

Status: Fixed (was: Assigned)
10.12 toolchain is now green: https://build.chromium.org/p/chromium.fyi/builders/Chromium%20Mac%2010.11%20Force%20Mac%20Toolchain

Currently waiting on Infra-Labs to start rolling out 10.12 [and to determine whether we want to roll out 10.12.1 or 10.12.2].

https://bugs.chromium.org/p/chromium/issues/detail?id=659213#c7

Comment 57 by kbr@chromium.org, Dec 19 2016

Awesome! Thank you!

Personally I'd vote for 10.12.2 -- Apple fixed a lot of graphics driver bugs in that release.

Comment 58 by d...@chromium.org, Dec 19 2016

Owner: d...@chromium.org
Status: Assigned (was: Fixed)
I'd rather roll 10.12.2. No sense in running old point releases when people are usually forced to upgrade to the latest.

I'm reopening this and assigning it to me, because the first thing I'd reinstall is this force toolchain bot, to ensure it still rolls green with 10.12.2.

Comment 59 by kbr@chromium.org, Dec 19 2016

Sweet.

Comment 60 by d...@chromium.org, Dec 20 2016

build9-m1 is 10.12.2 starting with https://build.chromium.org/p/chromium.fyi/builders/Chromium%20Mac%2010.11%20Force%20Mac%20Toolchain/builds/14981

I'll check up on it in a few hours.

Comment 62 by d...@chromium.org, Jan 3 2017

Status: Fixed (was: Assigned)
10.12.2 image seems fine.
