
Issue 672952

Starred by 3 users

Issue metadata

Status: Assigned
Owner:
Cc:
Components:
EstimatedDays: ----
NextAction: ----
OS: Android
Pri: 2
Type: Bug




Error pages are confusing and not very helpful

Project Member Reported by mdw@chromium.org, Dec 9 2016

Issue description

Application Version (from "Chrome Settings > About Chrome"): 56.0.2924.18
Android Build Number (from "Android Settings > About Phone/Tablet"): LXI22.50-62
Device: Moto E

Steps to reproduce:

1) Put the device on a slow network (e.g., GIN-2gpoor)
2) Try to load any page.
3) Wait for 30-60 seconds.
4) See a network error page (e.g., DNS_PROBE_FINISHED_NXDOMAIN or ERR_CONNECTION_TIMED_OUT).

Observed behavior: 

The typical network error page is shown (screenshot of one example attached).

Overall, the network error pages are quite unfriendly to the user:
- They use overly technical language ("aws1.mdw.la's server DNS address could not be found"). 
- They don't provide the user with any hints about what might be causing the problem.
- They don't suggest how the user could fix the problem.
- And, in this case, the error is wrong -- while technically this is a DNS "error", it's really caused by the device being on a very slow network.

Expected behavior:

On the assumption that many users in emerging markets see these error pages often, we need to spend some time making them more user friendly. Some suggestions:

- Redesign the page with a more user-friendly look and feel.

- Provide more helpful context or advice rather than using technical jargon.

- Provide more information on what might fix the issue (not a link to a Help Center website, though, which is unlikely to load).

- Recognize when an error is likely because the network connection speed is slow, and provide guidance accordingly. A user on a slow network doesn't care if it's a DNS timeout or a connection timeout or anything else; they need to be given choices (e.g., load page later, show saved offline content, try again, etc.)


See internal doc for more details on experiment:
https://docs.google.com/a/google.com/document/d/19ctmz25Dd4FsCUeFJeaIhWGBTENcVo5_KD626fRE4DM/edit?usp=sharing
 
Screenshot_2016-12-08-11-01-35.png
41.5 KB
Cc: juliatut...@chromium.org rachelis@chromium.org
julia: Is this the expected error message? We did work for DNS errors with probe jobs to disambiguate whether the problem is an invalid host name (with suggestions for alternates in that case), bad DNS server, or local network connection but weak internet connection. But this looks a bit rough.
This is the expected behavior of the probe system -- that is, it thinks it is NXDOMAIN, so it shows the error text for a nonexistent domain.

The error text is janky (there's no such thing as a "DNS address", and many users wouldn't know it anyway), but that's not the probe system misbehaving.
Cc: edwardjung@chromium.org
Rachel and I are all for reducing jargon and making network errors work better for users. 

re #2: Julia, what would you suggest for the better wording for 'DNS address', just 'address'? 
Can we determine this was caused by a slow network? If it was, what would really be useful suggestions beyond reloading? Moving to a place with a better signal, connecting to wifi? If a user is on 2G then it will be slow and the user might be able to control their signal strength if for example they are on a bus. 



Comment 5 by mdw@chromium.org, Dec 13 2016

Status: Assigned (was: Untriaged)
Hi, I'd like to get this bug assigned to someone and on a target milestone. Chris, for sake of making sure there's an owner, you get the hot potato for now, but please feel free to delegate.

I think there are two issues here, which may need to be assigned to different people.

A) the probe seems to think that the domain name does not exist but I believe it actually does. There may be some edge cases there where we need to be able to separate out the root cause. This would go more to Julia.

B) the error page could be improved from a jargon perspective. That feels more like a UX issue.

The NXDOMAIN case is intended for cases like navigating to a name that doesn't exist (say HTTPS://noname.bentzel.net). In that case we _should_ show link doctor suggestions for alternate URLs.

Cc: -juliatut...@chromium.org cbentzel@chromium.org
Labels: M-58
Owner: juliatut...@chromium.org
Here's a screenshot I expect when the domain name itself is wrong - see that there are suggestions for fixed domain as well as what to look at.

So here I want to start not so much with the text of the error page, but why the probe is ending up with the result it does.
Screenshot_2016-12-14-13-10-03 (1).png
104 KB

Comment 8 by rachelis@google.com, Jan 11 2017

Cc: srahim@chromium.org
Thanks guys!

For context, Ed and I did a pass on all net error strings over a year ago, and had conversations with engineers about what was causing them and what actions users could take to resolve the problem. :)  We continue to be very supportive of making net error pages easier to understand for EM users, since they see many more of them! 


The biggest opportunity I see here is that we are showing a DNS error when in reality we should be showing a connection timed out error. I'm unsure as to whether that's the problem being incorrectly identified, or if the DNS error is a more specific version of the connection timed out error. It seems like the former, since this isn't a DNS timed out error. I've attached an image of how the connection timed out error looks for reference.

Are there potentially other net errors which should be showing a timed-out error but are actually showing something else? 

Are there actions other than those we've listed in the timed out error which the user can take?

Also, +Shimi just FYI. She may not have time to help out with this now, but it seems like a good one for her to keep in mind as she starts thinking more broadly about strings in Chrome.
Screen Shot 2017-01-11 at 2.34.19 PM.png
43.3 KB

Comment 9 by rachelis@google.com, Jan 11 2017

Oh, and this is the error shown when the internet is disconnected. Also for reference.
Screen Shot 2017-01-11 at 2.50.11 PM.png
38.3 KB

Comment 10 by mdw@chromium.org, Jan 11 2017

Rachel - apart from just massaging the strings, how do you feel about the overall look and feel of the error pages?

Given the number of errors that our users see in EM, I am wondering if redesigning these pages to use less technical language, richer (and more intuitive) iconography, and generally be more visually appealing wouldn't be worthwhile.

On the technical side, it seems to me that Chrome should know if you have a particularly slow or lossy network connection and tune the error messages accordingly. My understanding is that this error is shown due to an error in a single main-frame request, but does not necessarily take into account the overall network conditions the user is operating under. If Chrome believes the user is on a slow network, how it conveys this error to the user may be different (not "site cannot be reached" but maybe "poor network quality" with options to try again later, etc.).
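As a rough illustration of the idea above, here is a minimal Python sketch of choosing the headline and actions from both the error code and the overall network conditions. Everything in it is hypothetical: the connection labels (`"2g"`, `"slow-2g"`), the `ErrorPage` type, and `build_error_page` are made up for illustration and are not Chrome APIs. Only the error codes and the suggested wording come from this thread.

```python
from dataclasses import dataclass, field

# Error codes named in this thread that a slow network can plausibly trigger.
SLOW_NETWORK_SUSPECTS = {
    "DNS_PROBE_FINISHED_NXDOMAIN",
    "ERR_NAME_NOT_RESOLVED",
    "ERR_CONNECTION_TIMED_OUT",
}

# Hypothetical connection-quality labels; not an actual Chrome API.
SLOW_CONNECTIONS = {"slow-2g", "2g"}


@dataclass
class ErrorPage:
    headline: str
    actions: list = field(default_factory=list)
    error_code: str = ""  # kept on the page for troubleshooting


def build_error_page(error_code: str, connection: str) -> ErrorPage:
    """Pick headline and actions from the error code plus network quality."""
    if connection in SLOW_CONNECTIONS and error_code in SLOW_NETWORK_SUSPECTS:
        return ErrorPage(
            headline="Poor network quality",
            actions=["Try again", "Load page later", "Show saved offline content"],
            error_code=error_code,
        )
    # Default: keep the generic message, with reload as the only action.
    return ErrorPage(
        headline="This site can't be reached",
        actions=["Try again"],
        error_code=error_code,
    )


page = build_error_page("DNS_PROBE_FINISHED_NXDOMAIN", "2g")
print(page.headline, page.actions, page.error_code)
```

The sketch keeps the raw error code on the page in both branches, so the friendlier headline does not cost anyone the technical detail.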

//Rachel - apart from just massaging the strings, how do you feel about the overall look and feel of the error pages?

I'd be open to changing them. :) I am conscious of course that the dino has a fan following of its own. Here's an exploration from a while back -
 https://docs.google.com/presentation/d/1jrXKzfOfQmNVG4PaHq6NXULfkALo-ANRNBcDsM--JV8/edit#slide=id.g1353cc3b16_0_47

The other caveat is that more images = larger binary. :)


//Given the number of errors that our users see in EM, I am wondering if redesigning these pages to use less technical language, richer (and more intuitive) iconography, and generally be more visually appealing wouldn't be worthwhile.

This was exactly the goal of the audit (minus richer iconography). I'd be happy to do another pass and iterate. If we did, it would be valuable to identify principles (and get your / high-level buy-in for them). In doing this work, it was challenging to balance the need for technical correctness with the users' needs. Would we ever consider a balance which weights user needs significantly more heavily?

The other challenge was that we heard disagreement between engineers about what the errors actually were and what their implications were for users. 

// On the technical side, it seems to me that Chrome should know if you have a particularly slow or lossy network connection and tune the error messages accordingly. My understanding is that this error is shown due to an error in a single main-frame request, but does not necessarily take into account the overall network conditions the user is operating under. If Chrome believes the user is on a slow network, how it conveys this error to the user may be different (not "site cannot be reached" but maybe "poor network quality" with options to try again later, etc.).

Big +1. :)

I assume that there are multiple errors that are likely to be caused by a slow network and not just this one, correct?
Hey folks, 
Pinging this piece up. I think the open question is about whether we're able to identify when a slow connection is the higher-level cause of the error (in more than just this case). Is that already clear? 

Comment 14 by mdw@chromium.org, Feb 15 2017

Hey Rachel, thanks for the ping. We can certainly look into surfacing signals on network quality / speed / loss to help with the error messages, though that may be a fair bit of engineering work to do. In the meantime how do you feel about taking a pass on the overall look and feel of the error messages we are presenting to users?

As far as balancing ease-of-use with technical details, my thinking would be that the overall error message should be pretty user-friendly, and if we need to convey additional technical details (I am not sure that "DNS_PROBE_FINISHED_NXDOMAIN" is at quite the right level, though) we could do that either in smaller text or below a "More details" expando or something. I'm not a UI expert by any stretch, but I do think that throwing this amount of confusing technical jargon at a user is not that useful.

Chris - I think we should probably fork this bug (which is about the UI itself) and have a separate bug tracking the DNS resolution error problem (comments #6 and #7) - what do you think?

thanks.

Happy to fork the underlying DNS issue to a separate bug and keep this UI focused.

Note that if there really is a bad domain name, the screenshot should look like #7, with a suggested domain name to use. It's possible that could get some UI love to more clearly show the suggestion. But I'm not an expert on that.

Comment 16 by mdw@chromium.org, Feb 15 2017

Cc: juliatut...@chromium.org
Owner: rachelis@chromium.org
OK, forked off crbug.com/692786 for the DNS probe issue.

Reassigning this one to Rachel as it pertains to the UI.
Cc: aposner@chromium.org
Thanks Matt!

//In the meantime how do you feel about taking a pass on the overall look and feel of the error messages we are presenting to users?

I'm definitely open to this. The stack is pretty.. fully stacked this quarter, so it's something we can add to the backlog. 

Some context for making this UX work a reality:
- Edward and I did a pass on the 60+ net errors about a year ago to make the language simpler and more actionable. There are a lot of different possible errors with different possible user actions, and understanding each case was necessary to write the strings. It was a pretty substantial UX project, and all the strings are the product of negotiations with developers to balance technical and user needs. :) However, one exciting change might be if our balance of values has changed, allowing us to focus more on the user. If so, we may be able to make more substantial improvements.
https://docs.google.com/presentation/d/18bmk0A1BS3GEo6iRXophmpquUzrpMgA8EgFTo8SS__I/edit#slide=id.p

- I have explored the idea of simplifying the UX pretty lightly, here: https://docs.google.com/presentation/d/1jrXKzfOfQmNVG4PaHq6NXULfkALo-ANRNBcDsM--JV8/edit#slide=id.g1353cc3b16_0_47

The question / complexity comes from figuring out how/if this change to dramatically simplify things applies to all of the net errors. Would we remove net error codes (which developers rely on for troubleshooting)? How much context would we remove for users who may be able to solve some problems themselves? 

Or - are we just looking for a more visual tweak? We could make changes to colors, typography, and illustration style - though I'm sure that the t-rex (and game) would be controversial to change. :) 

One more small question - is there someone available to implement these changes if we make them? Edward was responsible for implementing the initial pass, though it may be more kind to his schedule to find someone who can support him this time around.

+ariel who is thinking about net errors :)

Comment 18 by mdw@chromium.org, Feb 16 2017

Thanks Rachel. Someone from my team can do the actual implementation once we have a design.

I guess we need to decide how to prioritize this work. I'll get some data on how often users see these error pages which will help.

What I had in mind was something very much like your proposed refresh of the visuals, without necessarily going through the strings again. I think we want to separate out the high level information that most users will understand from the low-level details that developers/network admins/etc. care about.

We should keep the dino game of course!!
I believe Ariel and Edward may have already dug into this data recently (?) :)

//What I had in mind was something very much like your proposed refresh of the visuals, without necessarily going through the strings again. I think we want to separate out the high level information that most users will understand from the low-level details that developers/network admins/etc. care about.

Shall we find some time to meet so we can contextualize and brainstorm a bit?

// We should keep the dino game of course!!

Much relieved. :D
My two cents here...

Let's please not hide the error codes in an effort to simplify things.

There is a very limited class of errors which are actionable and meaningful to less-technical users. Perhaps the only meaningful errors are:

 * "Your internet is not working/working poorly"
 * "This site is broken / or you are being p0wnd"
 * "This site doesn't exist"

Beyond that, the myriad of possible problems and possible remedies is often not something that users are empowered to fix on their own without a deeper technical understanding of their environment.

* Do they need to change their DNS resolver?
* Do they need to switch to a different WiFi network?
* Do they need to configure a proxy for this environment? If so what is the address or PAC script?
* Is their router acting up? Maybe it needs to be restarted?
* Is there a third party LSP messing up the connection?
* Is someone really trying to MITM their bank ... or did the server just deploy a bad config?
* Is their antivirus product buggy and messing up pages? Are they even using one? If so, how do they disable its DPI?
* Is the connection being reset because the server is misbehaving or because they are not using the right proxy?
* Is one of their Chrome extensions tampering with headers and messing things up?

etc. These are just some examples, and not necessarily ones that can be translated into connectivity/probe tests.

As usage of Chrome has shifted from being a secondary browser on the system to the primary one, users are increasingly experiencing network error pages not in response to something bugged in Chrome, but something bugged in their networking setup/environment.

Chrome's error pages then are also fulfilling a role as diagnostics for system networking. This is a hard and unsolved problem, which even operating systems haven't figured out. There are network diagnostic tools on all the major platforms, and they all suck. Fixing network issues remains a dark art, where regular users generally need assistance from an admin with knowledge of the network setup. The information at the layer of abstraction where we encounter the error is generally quite poor, and more steps are needed to make additional inferences.

In fact Chrome used to have a more general network diagnostic tool in the past (net-internals#tests), but it didn't prove any more useful than the crappy system ones. Maybe with a substantial investment one could build out a more useful diagnostic tool.

Given all of this, if something went wrong outside of the limited class of errors users can understand and resolve on their own, then either the user will need to reach out for help, or they will need to put on a technical hat and try to resolve it themselves.

Simplifying the detailed view by obscuring error codes or making the technical description less accurate, doesn't serve either of these two constituencies.

My experience has been that the single most valuable piece of information is the (non-internationalized) error code. This serves as a launching point to search or ask others for help. A web search by error code will often yield forum postings or bug reports that help in solving (maybe listing correlations between other software or setups that we didn't list in the error page). The compact format of an error code is also handy for manually typing on another computer when you have no connectivity from the borked computer.

I would argue that making the error code more prominent on the page is a better change we could make. Maybe even auto-linking it to a search in case they have some connectivity.
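A minimal sketch of what auto-linking the error code to a search might look like, in Python for brevity. The search base URL and the helper name `search_link_for` are assumptions for illustration only; any search engine would do, and this is not how Chrome builds its error pages.

```python
from urllib.parse import urlencode

# Assumption: a generic web search endpoint; swap in whichever engine applies.
SEARCH_BASE = "https://www.google.com/search"


def search_link_for(error_code: str) -> str:
    """Build a search URL whose query is the non-internationalized error code."""
    return f"{SEARCH_BASE}?{urlencode({'q': error_code})}"


print(search_link_for("ERR_NAME_NOT_RESOLVED"))
```

Because the code is not translated, the same link works regardless of the UI language, which is the property that makes it a good search key.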

On the technical side, doing more special casing for the "you have a bad/broken connection to the internet" case sounds great and I am all for that. Calling out poor quality network connections, if we have the ability to do that, certainly may be helpful. Special casing checks for problems that we could identify clearly were useful in the past (for instance, calling out specific misbehaving AV that were breaking TLS led to actual resolutions).

....But for the general case of ambiguous network problems I am very skeptical about the impact of simply re-wording things and giving it a fresher face (without expending more effort to do probes or additional classification from our code).

Ultimately we aren't solving the root problem that users have no idea what went wrong, and the best we can offer them is a checklist of things that *might* be wrong.

Comment 21 by mdw@chromium.org, Feb 16 2017

Thanks Eric - that is useful feedback. As we iterate on this we need to ensure there's a way for users to reach out and get help if they aren't able to diagnose on their own. My claim is that the vast majority of errors seen by mobile users in emerging markets are not any of the class of "hard to diagnose" issues though; they come down to slow or flaky networks. So, I think we should strive to make the common case more user friendly while not degrading the value of the errors for less common cases.
Re: comment #18
> I guess we need to decide how to prioritize this work. I'll get some data on how often users see these error pages which will help.

The timeline I used for Android in India - https://uma.googleplex.com/p/chrome/timeline_v2/?sid=f1a2923bd0fae146edff28f14b9c584f

You may want to use 7 day aggregation. 

Internet disconnected and name not resolved are by far the most commonly shown errors, appearing on ~2% of page loads and accounting for ~50% of all the net errors shown. Internet disconnected has been trending upwards in the last couple of months. These two errors are the top two in every country.

Aside, but related, the errors are dwarfed by the number of aborts and could be related to the connectivity issues.

I believe I shouldn't quote exact stats in an open bug, so I put the metrics in a sheet with the UMA opt-in multiplier applied:

https://docs.google.com/a/google.com/spreadsheets/d/1zy9EOSvTRfA1TIjsaEbBqxpaM70MiiVbRXctdLUvavM/edit?usp=sharing

We show a lot of errors each day!

@eroman, what you say makes a lot of sense. 


Thanks for considering my comments and the extra data!

I wonder if the high occurrence of "name not resolved" isn't just a misclassification of "Internet disconnected"  (or a more general "your internet is broken") ?
> I wonder if the high occurrence of "name not resolved" isn't just a misclassification of "Internet disconnected"  (or a more general "your internet is broken") ?

Would we misclassify these errors? 'Name not resolved' seems more specific, and a connection was made to the ISP but the DNS lookup failed. If the connection was suddenly disconnected, that should return 'Internet disconnected', right? 
Name resolution is often (possibly "usually") done through OS syscalls--it needs to include (e.g.) host file lookup and similar things.  Even if we're connecting to a name server, it could easily be a local one or one running on a local router that doesn't have connectivity.  So I think we can often misclassify "Internet disconnected" into other errors. 

It might be worthwhile (though scary and possibly flaky) to include logic at the error handling stage to figure out if various errors "really" mean internet disconnected.  But I'd be worried about flaky connections confusing things, as well as having a reliable signal for whether the internet is connected or not (e.g. enterprises might block google connections).

#25: We actually do some of those. Julia added probing to Chrome at ERR_NAME_NOT_RESOLVED to help disambiguate if the root cause is:

  - Have a local network connection but no internet connection
  - Have an internet connection but a broken DNS server
  - The actual domain name is invalid

We then swap out the error page to something more specific.
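To make the three-way disambiguation above concrete, here is a rough Python sketch of the decision logic. It is not the actual probe implementation: the inputs (whether the system resolver and a public resolver answer, and whether the public resolver knows the name), the `RootCause` labels, and the function name are all hypothetical.

```python
from enum import Enum


class RootCause(Enum):
    NO_INTERNET = "local network connection but no internet connection"
    BAD_DNS_SERVER = "internet connection but a broken DNS server"
    INVALID_DOMAIN = "the domain name itself is invalid"
    INCONCLUSIVE = "probe could not decide"


def classify_name_failure(system_dns_answers: bool,
                          public_dns_answers: bool,
                          public_dns_knows_name: bool) -> RootCause:
    """Guess the root cause after a name resolution failure (illustrative only)."""
    if not public_dns_answers:
        # Nothing beyond the local network responds: likely no internet.
        # If the local resolver still answers, the evidence is mixed.
        return RootCause.INCONCLUSIVE if system_dns_answers else RootCause.NO_INTERNET
    if public_dns_knows_name:
        # The name exists, so the configured resolver is the likely culprit.
        return RootCause.BAD_DNS_SERVER
    # A reachable resolver says the name does not exist.
    return RootCause.INVALID_DOMAIN


print(classify_name_failure(False, False, False).value)
```

On a very slow network a timed-out query can look the same as "no answer", so the inputs to a classifier like this may themselves be unreliable.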

---

So the concern here is that the probe logic may not be able to diagnose well on certain networks.

We did this probing for this specific error code because it is one of the most common ones and has a number of root causes, some of which the user may be able to fix (such as clicking on a suggested domain name when there is a typo).
My intuition is that legitimate ERR_NAME_NOT_RESOLVED should be rare on top level frames.

A legitimate ERR_NAME_NOT_RESOLVED would be the result of navigating to an incorrect URL (either a user error in pasting a URL, or a blatant website error) -- which from my own browsing patterns practically never happens.

Whereas connectivity issues like not using the right proxies, or flaky DNS, may contribute to ERR_NAME_NOT_RESOLVED.

Which is why I am thinking-out-loud whether some of these may be symptomatic of connectivity problems that are not caught by our probes.
I have legitimate ones happen occasionally - just today I clicked on a link that led to a site which no longer existed (I assume - NXDOMAIN came back). But I agree that if we had to choose based on no probes, max likelihood would be flaky network.
Labels: Hotlist-UX-Backlog-rachelis
Cc: petewil@chromium.org
+petewil to cc - I'm interested in this bug for offline_pages.  This is by far the most common net error our users see while trying to offline pages.  Also, we should maybe offer to make an offline copy if the user ends up on this error page.
