New issue
Advanced search Search tips
Note: Color blocks (like or ) mean that a user may not be available. Tooltip shows the reason.

Issue 840127 link

Starred by 2 users

Issue metadata

Status: Fixed
Owner:
Closed: Nov 15
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: ----
Pri: 1
Type: Bug
Proj-Servicification

Blocking:
issue 904240
issue 818593



Sign in to add a comment

ServiceManager should pass GUID per service instance to ServiceManagerListener for disambiguation

Project Member Reported by chromium...@appspot.gserviceaccount.com, May 5 2018

Issue description

"NetworkServiceRestartBrowserTest.BrowserIOFactoryInfo" is flaky.

This issue was created automatically by the chromium-try-flakes app. Please find the right owner to fix the respective test/step and assign this issue to them. If the step/test is infrastructure-related, please add Infra-Troopers label and change issue status to Untriaged. When done, please remove the issue from Sheriff Bug Queue by removing the Sheriff-Chromium label.

We have detected 5 recent flakes. List of all flakes can be found at https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyQAsSBUZsYWtlIjVOZXR3b3JrU2VydmljZVJlc3RhcnRCcm93c2VyVGVzdC5Ccm93c2VySU9GYWN0b3J5SW5mbww.

Flaky tests should be disabled within 30 minutes unless culprit CL is found and reverted. Please see more details here: https://sites.google.com/a/chromium.org/dev/developers/tree-sheriffs/sheriffing-bug-queues#triaging-auto-filed-flakiness-bugs
 
Labels: -Sheriff-Chromium
Owner: chongz@chromium.org
Status: Assigned (was: Untriaged)
This test was recently added at https://chromium-review.googlesource.com/c/chromium/src/+/1033806.

chongz@, could you take a look?

Failure log from https://luci-milo.appspot.com/buildbot/tryserver.chromium.win/win7_chromium_rel_ng/161088:

[ RUN      ] NetworkServiceRestartBrowserTest.BrowserIOFactoryInfo
DevTools listening on ws://127.0.0.1:51984/devtools/browser/db97a1e2-ddcb-4894-81ee-e30fbcd072e8
[4188:4408:0505/072355.503:FATAL:process_map.cc(60)] Check failed: it_and_inserted.second.
Backtrace:
	base::debug::StackTrace::StackTrace [0x02D2F970+32]
	base::debug::StackTrace::StackTrace [0x02D29A2D+13]
	logging::LogMessage::~LogMessage [0x02CBB513+83]
	memory_instrumentation::ProcessMap::OnServicePIDReceived [0x01F9A806+102]
	service_manager::mojom::ServiceManagerListenerStubDispatch::Accept [0x0314458D+1201]
	service_manager::mojom::ServiceManagerListenerStub<mojo::RawPtrImplRefTraits<service_manager::mojom::ServiceManagerListener> >::Accept [0x00FC8843+19]
	mojo::InterfaceEndpointClient::HandleValidatedMessage [0x02AC0C85+541]
	mojo::FilterChain::Accept [0x02AC7743+131]
	mojo::InterfaceEndpointClient::HandleIncomingMessage [0x02AC1B32+106]
	mojo::internal::MultiplexRouter::ProcessIncomingMessage [0x02ABE370+698]
	mojo::internal::MultiplexRouter::Accept [0x02ABDEF7+295]
	mojo::FilterChain::Accept [0x02AC7743+131]
	mojo::Connector::ReadSingleMessage [0x02AC581A+364]
	mojo::Connector::ReadAllAvailableMessages [0x02AC5F19+87]
	mojo::Connector::OnHandleReadyInternal [0x02AC5DE4+126]
	base::internal::Invoker<base::internal::BindState<void (__thiscall content::RedirectToFileResourceHandler::Writer::*)(int),base::internal::UnretainedWrapper<content::RedirectToFileResourceHandler::Writer> >,void __cdecl(int)>::RunOnce [0x01E1F97F+15]
	mojo::SimpleWatcher::DiscardReadyState [0x012230D8+24]
	base::internal::Invoker<base::internal::BindState<void (__cdecl*)(std::basic_string<char,std::char_traits<char>,std::allocator<char> > const &,std::vector<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::allocator<std::basic_strin [0x01DB1A53+19]
	mojo::SimpleWatcher::OnHandleReady [0x02D90270+224]
	base::internal::Invoker<base::internal::BindState<void (__thiscall mojo::SimpleWatcher::*)(int,unsigned int,mojo::HandleSignalsState const &),base::WeakPtr<mojo::SimpleWatcher>,int,unsigned int,mojo::HandleSignalsState>,void __cdecl(void)>::Run [0x02D904EA+58]
	base::debug::TaskAnnotator::RunTask [0x02D4D654+308]
	base::internal::IncomingTaskQueue::RunTask [0x02D716B9+105]
	base::MessageLoop::RunTask [0x02CEEA77+519]
	base::MessageLoop::DeferOrRunPendingTask [0x02CEEDCD+157]
	base::MessageLoop::DoWork [0x02CEEFFA+506]
	base::MessagePumpForIO::DoRunLoop [0x02D2BDC5+133]
	base::MessagePumpWin::Run [0x02D2B0BE+110]
	base::MessageLoop::Run [0x02CEE594+116]
	base::RunLoop::Run [0x02CD668C+204]
	base::Thread::Run [0x02D1C884+164]
	content::BrowserProcessSubThread::IOThreadRun [0x01AE9844+36]
	content::BrowserProcessSubThread::Run [0x01AE97C8+168]
	base::Thread::ThreadMain [0x02D1CB17+631]
	base::PlatformThread::SetCurrentThreadPriority [0x02CDCEE5+533]
	BaseThreadInitThunk [0x7714336A+18]
	RtlInitializeExceptionChain [0x77859902+99]
	RtlInitializeExceptionChain [0x778598D5+54]

Project Member

Comment 2 by chromium...@appspot.gserviceaccount.com, May 7 2018

Labels: Sheriff-Chromium
Detected 5 new flakes for test/step "NetworkServiceRestartBrowserTest.BrowserIOFactoryInfo". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyQAsSBUZsYWtlIjVOZXR3b3JrU2VydmljZVJlc3RhcnRCcm93c2VyVGVzdC5Ccm93c2VySU9GYWN0b3J5SW5mbww. This message was posted automatically by the chromium-try-flakes app. Since flakiness is ongoing, the issue was moved back into Sheriff Bug Queue (unless already there).
Labels: -Sheriff-Chromium
Disabling test on Windows and removing from Sheriff queue.
Will let chongz@ decide if/when to re-enable the test.
Project Member

Comment 4 by bugdroid1@chromium.org, May 7 2018

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/9cbbe3833f83d4206f92ae7cca1d2e42acea3dc8

commit 9cbbe3833f83d4206f92ae7cca1d2e42acea3dc8
Author: Guido Urdaneta <guidou@chromium.org>
Date: Mon May 07 14:51:06 2018

Disable flaky NetworkServiceRestartBrowserTest.BrowserIOFactoryInfo on Windows

TBR=chongz@chromium.org

Bug:  840127 
Change-Id: Idb63b21d4b1ea0c6b0eca07f6bcec0f7871dffbd
Reviewed-on: https://chromium-review.googlesource.com/1046659
Reviewed-by: Guido Urdaneta <guidou@chromium.org>
Commit-Queue: Guido Urdaneta <guidou@chromium.org>
Cr-Commit-Position: refs/heads/master@{#556436}
[modify] https://crrev.com/9cbbe3833f83d4206f92ae7cca1d2e42acea3dc8/content/browser/network_service_restart_browsertest.cc

Status: Started (was: Assigned)
Blockedon: 818593
Cc: primiano@chromium.org hjd@chromium.org
Labels: -Pri-1 OS-Windows Pri-2
Lowering priority since it's test only and has been disabled (potential 'process_map.cc' issue is long standing  issue 818593 ).

+hjd@, primiano@ - Owners of '//src/services/resource_coordinator/memory_instrumentation/process_map.cc':
My test intentionally crashes Network Service process and restarts it, and it seems that |ProcessMap| could receive new PID before the old one was removed.
Can I have some help on what might be happening? Thanks!

Comment 7 by hjd@chromium.org, May 8 2018

I'm afraid I'm OOO for the next two weeks so I might be slow to respond.

This sounds like it is to do with the semantics of the ServiceManager. The ServiceManager (apparently) doesn't guarantee you see OnServiceStopped for the crashing service before you see OnServicePIDReceived for the restarting service and so we end up trying to emplace into instances_ before we've erased the old entry.

Given this there's two paths forward:

a) Change ServiceManager to have the guarantee.
b) Change ProcessMap to cope without the guarantee.

Does that seem right?

If so:
For a): I don't know much about the ServiceManager code so I can't comment on how difficult/easy a) would be.
For b): the ProcessMap code is much simpler but it's going to be pretty hard to write something that does the right thing in all cases (in most cases OnServiceStopped should clear the map but when OnServiceStopped comes after the OnServicePIDReceived for the restarted service you have to *not* clear the map and so you have to be able to tell the restarted service identity from the old service identity etc).

Sorry to not be much help! People familiar with ServiceManager have better information.

Cc: roc...@chromium.org
Re #c7: Thanks for the detailed analysis, that sounds reasonable!

+rockot@:
My feeling is that we should do a), but do you have any other comments before I dig into it?
Thanks!


Comment 9 by roc...@chromium.org, May 14 2018

(a) would be the wrong thing to do. It's important to note that a service instance can become unreachable (i.e. new connections won't be routed to it) before it's stopped, so what you're observing is an entirely legitimate order of events.

There are two good options I can think of to make ServiceManagerListener more useful in light of this ambiguity:

1. Add an OnServiceUnreachable event which ProcessMap can use instead. This is simple, but has the drawback that it wouldn't accurately track process lifetime. i.e. an unreachable service process may continue running for an arbitrary period of time (or even indefinitely, if the service is behaving badly).

2. Generate a GUID for each created service instance which can be passed to all events on ServiceManagerListener for disambiguation, since Identity alone cannot be sufficient.

I think the second option is the right way to go.
Components: Internals>Services>ServiceManager
Labels: -OS-Windows
Summary: ServiceManager should pass GUID per service instance to ServiceManagerListener for disambiguation (was: "NetworkServiceRestartBrowserTest.BrowserIOFactoryInfo" is flaky)
Thanks for the detailed explanation! That makes sense.

I'll implement the second option probably in a week or two.

(Changing OS to ALL since it could potentially affect other OS as well)
Status: Assigned (was: Started)
[chongz]:   Issue 876179  is possible a duplicate of this.  Any progress in the last two months?
Cc: chongz@chromium.org
Owner: ----
Status: Available (was: Assigned)
Apologize for the delay - I was working on other stuff and probably won't have time for this in the near future.

Marked as available so others could pick up.

Components: Internals>Services>Network
Adding the services>network label - I think we want this fixed before we launch the network service on beta or stable, at least.
Cc: mmenke@chromium.org ssid@chromium.org erikc...@chromium.org
 Issue 876179  has been merged into this issue.
Labels: Hotlist-KnownIssue
Labels: Sheriff-Chromium
Detected 3 new flakes for test/step "InMemoryApp/NetworkContextConfigurationHttpsStrippingPacBrowserTest.PacHttpsUrlStripping/0". To see the actual flakes, please visit https://chromium-try-flakes.appspot.com/all_flake_occurrences?key=ahVzfmNocm9taXVtLXRyeS1mbGFrZXNyZQsSBUZsYWtlIlpJbk1lbW9yeUFwcC9OZXR3b3JrQ29udGV4dENvbmZpZ3VyYXRpb25IdHRwc1N0cmlwcGluZ1BhY0Jyb3dzZXJUZXN0LlBhY0h0dHBzVXJsU3RyaXBwaW5nLzAM. This message was posted automatically by the chromium-try-flakes app. Since flakiness is ongoing, the issue was moved back into Sheriff Bug Queue (unless already there).
 Issue 878076  has been merged into this issue.
Labels: -Pri-2 Pri-1
Owner: hjd@chromium.org
Status: Assigned (was: Available)
Some flaky tests are being root-caused into this issue.
 - InMemoryApp/NetworkContextConfigurationHttpsStrippingPacBrowserTest.PacHttpsUrlStripping/0
 - OnDiskApp/NetworkContextConfigurationHttpsStrippingPacBrowserTest.PacHttpsUrlStripping/0

They don't appear to be disabled yet, but I'm disabling them to meet the "Flaky tests should be disabled within 30 minutes" SLA.

assigning an owner based on  Issue 818593 
Labels: -Sheriff-Chromium
Project Member

Comment 21 by bugdroid1@chromium.org, Aug 28

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/c4bed4374c9793fadf2b6b9686668cb0fe4d3b28

commit c4bed4374c9793fadf2b6b9686668cb0fe4d3b28
Author: Trent Apted <tapted@chromium.org>
Date: Tue Aug 28 04:31:41 2018

Disable flaky */NetworkContextConfigurationHttpsStrippingPacBrowserTest.PacHttpsUrlStripping

TBR=hjd@chromium.org

Bug:  840127 
Change-Id: Iaa027caeac28a2989ea382bde5a6a44756294bec
Reviewed-on: https://chromium-review.googlesource.com/1192369
Reviewed-by: Trent Apted <tapted@chromium.org>
Commit-Queue: Trent Apted <tapted@chromium.org>
Cr-Commit-Position: refs/heads/master@{#586580}
[modify] https://crrev.com/c4bed4374c9793fadf2b6b9686668cb0fe4d3b28/chrome/browser/net/network_context_configuration_browsertest.cc

Right now Identiy has name, user_id, and instance_id fields. The instance_id may be used by clients to select different instances, e.g. for partitioning service requests into separate processes. So if you want a Blink renderer instance but you want to ensure you aren't reusing the one used for some other origin, you might generate a unique instance_id and use that when connecting.

Because all this instance and process management stuff is inherently racy to some extent, that Identity structure alone is not enough for global disambiguation, as discussed earlier in the thread.

Rather than modifying ServiceManagerListener to hack around this, my proposal now would be to revamp Identity as such:

- name and user_id stay the same
- what we call "instance_id" becomes "partition", and remains a way for sufficiently privileged clients to select isolated service instances
- a 4th field (repurposing the name "instance_id") is added to Identity. It is an error to set this field in any Identity used to call BindInterface and is effectively a "read-only" field, always generated and provided by the Service Manager. It's basically just a GUID generated when the instance is created.

Then ServiceManagerListener has sufficient information to disambiguate one instance from another even over time.

There would still be a problem of the reverse mapping, i.e. the OS may re-use a PID and we have no real way to guarantee that notification about a new process using $PID isn't received before notification about the death of an old process that was using $PID. But I don't think this is a concern in practice?

Blocking: 818593
Blockedon: -818593
Cc: -roc...@chromium.org
Owner: roc...@chromium.org
I'm gonna take this.
Owner: rockot@google.com
Project Member

Comment 27 by bugdroid1@chromium.org, Nov 9

The following revision refers to this bug:
  https://chromium.googlesource.com/chromium/src.git/+/1060490ecae01406391a04a8109e03f5c2ddfb00

commit 1060490ecae01406391a04a8109e03f5c2ddfb00
Author: Ken Rockot <rockot@google.com>
Date: Fri Nov 09 20:52:08 2018

[service-manager] Add globally unique ID to Identity

Adds a globally_unique_id field to each service's Identity, generated
and assigned by the Service Manager when starting any new service
instance.

This ensures that every running service instance has an Identity that
is globally unique across space and time.

TBR=jam@chromium.org

Bug:  840127 , 895591 
Change-Id: If32863b7d8562e3a2cd9625d8eed2db8e83c67cb
Reviewed-on: https://chromium-review.googlesource.com/c/1318806
Commit-Queue: Ken Rockot <rockot@google.com>
Reviewed-by: John Abd-El-Malek <jam@chromium.org>
Reviewed-by: Tom Sepez <tsepez@chromium.org>
Reviewed-by: Oksana Zhuravlova <oksamyt@chromium.org>
Cr-Commit-Position: refs/heads/master@{#606960}
[modify] https://crrev.com/1060490ecae01406391a04a8109e03f5c2ddfb00/content/browser/renderer_host/render_process_host_impl.cc
[modify] https://crrev.com/1060490ecae01406391a04a8109e03f5c2ddfb00/content/public/common/services.md
[modify] https://crrev.com/1060490ecae01406391a04a8109e03f5c2ddfb00/services/service_manager/README.md
[modify] https://crrev.com/1060490ecae01406391a04a8109e03f5c2ddfb00/services/service_manager/public/cpp/identity.cc
[modify] https://crrev.com/1060490ecae01406391a04a8109e03f5c2ddfb00/services/service_manager/public/cpp/identity.h
[modify] https://crrev.com/1060490ecae01406391a04a8109e03f5c2ddfb00/services/service_manager/public/cpp/identity_mojom_traits.h
[modify] https://crrev.com/1060490ecae01406391a04a8109e03f5c2ddfb00/services/service_manager/public/cpp/service_binding.h
[modify] https://crrev.com/1060490ecae01406391a04a8109e03f5c2ddfb00/services/service_manager/public/mojom/BUILD.gn
[modify] https://crrev.com/1060490ecae01406391a04a8109e03f5c2ddfb00/services/service_manager/public/mojom/connector.mojom
[modify] https://crrev.com/1060490ecae01406391a04a8109e03f5c2ddfb00/services/service_manager/service_manager.cc
[modify] https://crrev.com/1060490ecae01406391a04a8109e03f5c2ddfb00/services/service_manager/service_manifests.md
[modify] https://crrev.com/1060490ecae01406391a04a8109e03f5c2ddfb00/services/service_manager/tests/connect/BUILD.gn
[modify] https://crrev.com/1060490ecae01406391a04a8109e03f5c2ddfb00/services/service_manager/tests/connect/connect_unittest.cc
[modify] https://crrev.com/1060490ecae01406391a04a8109e03f5c2ddfb00/services/service_manager/tests/service_manager/BUILD.gn
[add] https://crrev.com/1060490ecae01406391a04a8109e03f5c2ddfb00/services/service_manager/tests/service_manager/service_manager_listener_unittest.cc

Status: Fixed (was: Assigned)

Comment 29 by rockot@google.com, Jan 17 (5 days ago)

Blocking: 904240

Sign in to add a comment