[remoting host] First P2P relay connection fails
Issue description

What steps will reproduce the problem?
(1) Start the host process.
(2) Connect (Me2Me) using the web site with ?mods=force_relay_connection.

What is the expected result?
The first connection attempt should succeed.

What happens instead?
The first connection attempt fails very soon after PIN authentication (usually in less than 1 s, well within the 15 s ICE timeout). Subsequent connection attempts succeed.

Seen on Linux, connecting to a ToT host build from the same machine. It's consistently reproducible.
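With the force_relay_connection mod, the client restricts ICE to TURN relay candidates by setting iceTransportPolicy to 'relay'. The sketch below is hypothetical: the helper names (`parseMods`, `buildIceConfig`), the comma-separated format of `mods`, and the TURN URL are assumptions, not the actual web-app code. Only the field names mirror the standard RTCConfiguration dictionary.

```typescript
// Hypothetical sketch: map the web client's `mods` URL parameter to ICE settings.
// Field names mirror RTCConfiguration (iceServers, iceTransportPolicy);
// the parsing of `mods` is an assumption.

type IceTransportPolicy = "all" | "relay";

interface IceServerConfig {
  urls: string[];
  username?: string;
  credential?: string;
}

interface IceConfig {
  iceServers: IceServerConfig[];
  iceTransportPolicy: IceTransportPolicy;
}

// Assumption: `mods` is a comma-separated list, e.g. "?mods=force_relay_connection,other".
function parseMods(query: string): string[] {
  const mods: string[] = [];
  for (const pair of query.replace(/^\?/, "").split("&")) {
    const [key, value] = pair.split("=");
    if (key === "mods" && value) {
      for (const mod of decodeURIComponent(value).split(",")) {
        if (mod) mods.push(mod);
      }
    }
  }
  return mods;
}

function buildIceConfig(query: string, iceServers: IceServerConfig[]): IceConfig {
  const forceRelay = parseMods(query).indexOf("force_relay_connection") !== -1;
  return {
    iceServers,
    // "relay": the ICE agent uses only media relay (TURN) candidates locally.
    iceTransportPolicy: forceRelay ? "relay" : "all",
  };
}

// Usage with a placeholder TURN server URL.
const config = buildIceConfig("?mods=force_relay_connection", [
  { urls: ["turn:turn.example.com:3478?transport=udp"] },
]);
console.log(config.iceTransportPolicy); // prints: relay
```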
Sep 11
It looks like the problem is caused by this error response from the TURN server, logged on the client (Chrome) side:
ERROR:turnport.cc(1754)] Received TURN CreatePermission error response, code=403; pruned connection.
Sep 18
The problem is not reproducible when doing local development of the web site (maybe TestGAIA is the reason?).

In the host, I tried adding a 10-second delay before signaling relay candidates, so all the other gathered candidates were signaled immediately but the relay candidates were sent only after a 10-second timer. This had no effect:
(1) The failure case still failed immediately; the PeerConnection failed within the 10 s delay.
(2) The successful case connected immediately. A relayed connection was established within the 10 s window, before the host had sent its relay candidates to the client. On the client, the selected candidate pair was peerreflexive/relay, but after 10 s it changed to relay/relay once the remote relay candidate was received (it became the selected remote candidate on the client).
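The host-side experiments here and in the next comment differ only in which candidate types are held back before signaling. A minimal simulation, assuming each held-back candidate is sent a fixed delay after it was gathered (a simplification of "sent after a 10-second timer"). The candidate lines, addresses, and gather times are made up for illustration, and `scheduleSignaling` is a hypothetical helper.

```typescript
// Hypothetical sketch: when does the host signal each candidate if some
// candidate types are held back? Times are simulated ms since gathering began.

type CandidateType = "host" | "srflx" | "prflx" | "relay";

interface GatheredCandidate {
  line: string; // SDP "candidate:..." attribute value
  gatheredAtMs: number;
}

interface SignaledCandidate {
  type: CandidateType;
  sentAtMs: number;
}

// Returns the token that follows "typ", as in RFC 5245 candidate lines.
function candidateType(line: string): CandidateType {
  const tokens = line.trim().split(/\s+/);
  const i = tokens.indexOf("typ");
  if (i === -1 || i + 1 >= tokens.length) {
    throw new Error(`no typ in candidate: ${line}`);
  }
  const t = tokens[i + 1];
  if (t === "host" || t === "srflx" || t === "prflx" || t === "relay") return t;
  throw new Error(`unknown candidate type: ${t}`);
}

// Signals each candidate as soon as it is gathered, except that types listed
// in `delayedTypes` are held back for an extra `delayMs`.
function scheduleSignaling(
  candidates: GatheredCandidate[],
  delayedTypes: CandidateType[],
  delayMs: number,
): SignaledCandidate[] {
  return candidates
    .map((c) => {
      const type = candidateType(c.line);
      const delayed = delayedTypes.indexOf(type) !== -1;
      return { type, sentAtMs: c.gatheredAtMs + (delayed ? delayMs : 0) };
    })
    .sort((a, b) => a.sentAtMs - b.sentAtMs);
}

// Made-up candidates using documentation address ranges.
const gathered: GatheredCandidate[] = [
  { line: "candidate:1 1 udp 2122260223 192.168.1.5 50000 typ host", gatheredAtMs: 0 },
  { line: "candidate:2 1 udp 1686052607 203.0.113.7 50001 typ srflx raddr 192.168.1.5 rport 50000", gatheredAtMs: 30 },
  { line: "candidate:3 1 udp 41885439 198.51.100.9 3478 typ relay raddr 203.0.113.7 rport 50001", gatheredAtMs: 120 },
];

// Sep 18 experiment: hold back only relay candidates.
console.log(scheduleSignaling(gathered, ["relay"], 10000));
// Sep 20 experiment: hold back every non-host candidate.
console.log(scheduleSignaling(gathered, ["srflx", "relay"], 10000));
```

In the second schedule the client sees only typ=host remote candidates for the first 10 s, which matches the condition under which the failure became consistent.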
Sep 20
Could there be a timing issue in the ICE stack in Chrome?

In the host process, I tried delaying the signaling of all non-"host" candidates, so the srflx and relay candidates were signaled to the client after a 10 s delay. This experiment made the failure happen consistently. Recall that the host and client are on the same machine, and the client (web site) was applying config['iceTransportPolicy'] = 'relay'. So:
(1) The client receives typ=host candidates from the host.
(2) The client gathers only relay candidates (no host/srflx).
(3) The client goes into the ICE "checking" state.
(4) At this point, the ICE stack applies a short timeout (< 100 ms) before failing, while the client is still waiting for the host to signal its srflx/relay candidates, which are delayed by 10 s.

This short timeout on waiting for more remote candidates seems to cause the failure. It's not clear what the actual timeout is; the time between ICE "checking" and ICE failure varies between 10 ms and 80 ms.

So the trigger for the failure appears to be too much time between the host signaling its typ=host and typ=srflx candidates. If I tweak the host to batch the typ=host and typ=srflx candidates into a single message, the connection always succeeds. This can be done by increasing kTransportInfoSendDelayMs from 20 ms to 100 ms, but it's a crude hack: https://cs.chromium.org/search/?q=kTransportInfoSendDelayMs+file:webrtc_transport.cc

Also, I haven't found a way to trigger the problem without the force_relay_connection mod. It's not clear whether this problem could account for increased P2P failures in the wild.
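A rough model of why raising the send delay batches the candidates. It assumes the first queued candidate starts a timer of length kTransportInfoSendDelayMs and that everything queued before the timer fires is sent in one message; this is an assumption about the host's behavior, not a copy of webrtc_transport.cc. `batchTransportInfo` is hypothetical, and the 50 ms host-to-srflx gap is a made-up value.

```typescript
// Hypothetical model of delay-based batching of ICE candidates into messages.

interface Gathered {
  type: string;
  atMs: number; // time the candidate was gathered
}

// Groups candidates into messages: the first pending candidate starts a timer
// of `sendDelayMs`; candidates gathered before the timer fires join that batch.
function batchTransportInfo(gathered: Gathered[], sendDelayMs: number): string[][] {
  const sorted = gathered.slice().sort((a, b) => a.atMs - b.atMs);
  const messages: string[][] = [];
  let current: string[] = [];
  let flushAtMs = 0;
  for (const c of sorted) {
    if (current.length > 0 && c.atMs >= flushAtMs) {
      // The timer already fired, so the pending batch went out as one message.
      messages.push(current);
      current = [];
    }
    if (current.length === 0) {
      flushAtMs = c.atMs + sendDelayMs; // first pending candidate starts the timer
    }
    current.push(c.type);
  }
  if (current.length > 0) messages.push(current);
  return messages;
}

const timeline: Gathered[] = [
  { type: "host", atMs: 0 },
  { type: "srflx", atMs: 50 },
];
console.log(JSON.stringify(batchTransportInfo(timeline, 20)));  // [["host"],["srflx"]]
console.log(JSON.stringify(batchTransportInfo(timeline, 100))); // [["host","srflx"]]
```

Under this model the fix only works while the srflx candidate arrives within the delay window, which is why a larger constant is a crude hack rather than a guarantee.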
Jan 17
This issue only affected force-relay connections, because of a timing issue that arose when the host generated the full set of ICE candidates but the client-side list was pruned to relay-only candidates. Also, the issue disappeared after we pushed a server-side fix for the ICE config.
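The mismatch can be illustrated by the candidate pairs the client is able to form. This sketch assumes, as in the standard 'relay' policy, that the policy restricts only the client's local candidates while remote candidates of any type are still paired. `localTypesFor` and `candidatePairs` are hypothetical helpers.

```typescript
// Hypothetical illustration: the host signals every candidate type it gathered,
// while a client with iceTransportPolicy "relay" keeps only local relay candidates.

type Policy = "all" | "relay";

function localTypesFor(policy: Policy, gathered: string[]): string[] {
  return policy === "relay" ? gathered.filter((t) => t === "relay") : gathered.slice();
}

// Assumption: remote candidates are not filtered by the local policy, so each
// local candidate is paired with every remote candidate received so far.
function candidatePairs(localTypes: string[], remoteTypes: string[]): string[] {
  const pairs: string[] = [];
  for (const l of localTypes) {
    for (const r of remoteTypes) pairs.push(`local:${l} remote:${r}`);
  }
  return pairs;
}

// Client gathered three types but keeps only relay; host has sent only typ=host so far.
const clientLocal = localTypesFor("relay", ["host", "srflx", "relay"]);
console.log(candidatePairs(clientLocal, ["host"])); // [ 'local:relay remote:host' ]
```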
Comment 1 by lambroslambrou@chromium.org, Aug 25