I went to my bank website the other day and entered my account number with a trailing space. An error message popped that said, "Account number must consist of numeric values only." I thought to myself, "Seriously?! You couldn't have just stripped the space for me?". If I were any less of a computer geek, I may even have thought, "What? There are only numbers in there!" (not being able to see space).
The Calculator that comes with Ubuntu on the other hand merrily accepts spaces and commas, but oddly doesn't like trailing dots (without any ensuing digits).
So, that begs the question. Exactly how forgiving should web forms be? I don't think trimming whitespace is too much to ask, but what about other integer fields?
- Should they allow +/- signs?
- How many spaces should be allowed between the sign and the number?
- What about commas 开发者_运维技巧for thousands separators?
- What about in other parts of the world where use dots instead?
- What if they're in between every 4 digits instead of every 3?
- What about hexidecimal and octal representations?
- Scientific notation?
- What if I accidentally hit the quote button when I'm trying to hit enter, should that be stripped too?
It would be very easy for me to strip out all non-digit characters, and that would be extremely forgiving, but what if the user made an actual mistake that affects the input and should have been caught, but now I've just stripped it out?
What about things like phone numbers (which have a huge variety of formats), postal codes, zip codes, credit card numbers, usernames, emails, URLs (should I assume http? What about .com while I'm at it?)?
Where do you draw the line?
For something as important as banking, I don't mind it complaining about my input, especially if the other option is mistakenly transferring a bucketload of money into some stranger's account instead of my wife's (because of a missing or incorrect digit for example).
A classic example is one of my banks which disallows monetary values unless they have ".99" at the end (where 9 can be any digit of course). The vast majority of things I do are for exact dollar amounts and it's a tad annoying to have to always enter 500.00 instead of just 500.
But I'll be happier about that the first time I avoid accidentally paying somebody $5072 instead of $50.72 just because I forgot the decimal point. Actually, that's pretty unlikely since it also asks for confirmation and I'm pretty anal in controlling my money :-)
Having said that, the general rule I try to follow is "be liberal in what you accept, be strict in what you produce".
This allows other software using my output to expect a limited range of possibilities (making their lives easier). But it makes my software more useful if it can handle simple misteaks.
You draw the line at the point where the computer is guessing at what the correct input should be.
For example, a license key input box I wrote once accepts spaces and dashes and both upper and lower case, even though internally the keys were without said spaces, dashes and were all upper case. I could do that, since I knew that none of the keys actually had spaces or dashes.
Your example about URLs is another good one. I've noticed that modern browsers (I'm using Chrome), when something like 'flowers' is typed into the address bar, it knows it should search for it since it's not a valid URL. If instead, I type 'st' it auto corrects (or auto-suggests) 'stackoverflow.com' since it's a bookmark.
A well-written input system will complain when it would otherwise be forced to guess what the correct input should be.
Numeric input:
Stripping non-digits seems reasonable to me, but the problem is conflicting decimal notation. Some regions expect ,
(comma) to denote the decimal separator, while others use .
(period). Unless the input would likely be in other bases, I would only assume base 10. If it's reasonable to assume non-base 10 input (base-16 for color input, for example), I would go with standard conventions for denoting the bases: leading 0 means base 8, leading 0x means base 16.
String input:
This gets a lot more complicated. It mostly depends on what the input is actually meant to represent. A username should exclude characters that will cause trouble, but the meaning of 'cause trouble' will vary depending on the use of the application and the system itself. URLs have a concrete definition of what qualifies, but that definition is rather broad. Fortunately, many languages come with tools to discern URLs, without you having to code your own parsing (whether the language does it perfectly or not is another question).
In the end, it's really a case-by-case basis. I do like paxadiablo's general rule, though: Accept as much as you can, output only what you must.
It totally depends on how the data is going to be used.
If the input is a monetary amount, for a transaction for example, then the inputted variable should be normalised to a set of standards for sure.
If it's simply a case of a phone number, then it is unlikely the stored data will provide any functional sort of use so you can be more forgiving.
There is nothing wrong with forcing correct format to make displayed look nicer, but you have to balance user irritation with micro benefits.
Once you start collecting data you can scan through it and see what sort of patterns emerge, and you can auto strip off inputted format.
Where do you draw the line?
When the consequences of accepting "invalid" data outweigh the irritation of not accepting it.
Should they allow +/- signs?
If negative values are valid, then of course they should.
If not, then don't just silently strip minus signs, as it totally changes the meaning of the data. Stripping pluses is less of a problem.
What if [thousands separators are] in between every 4 digits instead of every 3?
In countries that use three-digit grouping, "1,0000" can be assumed to be a typo. But is it a typo for "10,000" or for "1,000"? I wouldn't dare guess, as a wrong guess could cost the user $9,000.
What about hexidecimal and octal representations?
Unless you're running the search feature for unicode.org, I can't imagine why anyone would use hexidecimal in web form.
And "01234" is almost certainly intended to be 1234 instead of 668.
What about things like...credit card numbers
Please allow spaces or hyphens in credit card numbers. It's really annoying when I have to type an undelimited 16-digit number.
I think you're over reacting a little bit. If there's anything in the field that shouldn't be there, strip it. otherwise try to force the input into whatever format you want, and if it doesn't fit, reject it.
I would say "Accept anything but process only valid data".
Expect your users to behave like a computer noob. Validate the input data using regular expressions and other validators.
Search for standard regular expressions for urls, emails and stuff.
Throw in a regular exp like this "/(?:([a-zA-Z0-9][\s,]+))([a-zA-Z0-9]+)$/" for comma or space separated values. With minor tweaking this exp will work for any number of comma separated values.
The one that irritates me as a user is credit card numbers, conventionally these appear as groups of 4 digits with spaces separating them but the odd webform will only accept a single string of digits with no spaces and no indication that this is the format it's seeking. Similarly telephone numbers, humans often use spaces to improve clarity, webforms sometimes accept the spaces and sometimes don't.
精彩评论