Is FILTER_SANITIZE_EMAIL pointless if already using FILTER_VALIDATE_EMAIL?

I am just creating a registration form, and I am looking only to insert valid and safe emails into the database.

Several sites (including w3schools) recommend running FILTER_SANITIZE_EMAIL before running FILTER_VALIDATE_EMAIL to be safe; however, this can turn an invalid address into a valid one, which may not be what the user wanted. For example:

The user has the email address jeff!@gmail.com, but accidentally inserts jeff"@gmail.com.

FILTER_SANITIZE_EMAIL would remove the ", making the email jeff@gmail.com, which FILTER_VALIDATE_EMAIL would accept as valid even though it's not the user's actual email address.

To avoid this problem, I plan to run only FILTER_VALIDATE_EMAIL (assuming I don't intend to output or process any emails declared invalid).

This will tell me whether or not the email is valid. If it is, then there should be no need to pass it through FILTER_SANITIZE_EMAIL, because any illegal/unsafe characters would already have caused the email to be rejected as invalid, correct?

I also don't know of any email accepted by FILTER_VALIDATE_EMAIL that could be used for injection/XSS, because whitespace, parentheses () and semicolons would invalidate the email. Or am I wrong?

(note: I will be using prepared statements to insert the data in addition to this, I just wanted to clear this up)

Here's how to insert only valid emails.

<?php
$original_email = 'jeff"@gmail.com';

$clean_email = filter_var($original_email,FILTER_SANITIZE_EMAIL);

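// Strict comparison: sanitizing must not have changed the input, and
// validation must not have returned false.
if ($original_email === $clean_email && filter_var($original_email, FILTER_VALIDATE_EMAIL) !== false) {
   // Now you know the original email passed both checks.
   // Database insert code goes here.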
}

FILTER_VALIDATE_EMAIL and FILTER_SANITIZE_EMAIL are both valuable functions and have different uses.

Validation tests whether the email has a valid format. Sanitizing strips disallowed characters out of the email.

<?php
$email = "test@hostname.com"; 
$clean_email = "";

if (filter_var($email,FILTER_VALIDATE_EMAIL)){
    $clean_email =  filter_var($email,FILTER_SANITIZE_EMAIL);
} 

// Another implementation, added by request. This is the way I would
// suggest using the filters: clean the content, then make sure it is
// valid before you use it.

$email = "test@hostname.com"; 
$clean_email = filter_var($email,FILTER_SANITIZE_EMAIL);

if (filter_var($clean_email,FILTER_VALIDATE_EMAIL)){
    // email is valid and ready for use
} else {
    // email is invalid and should be rejected
}

PHP is open source, so questions like this are easily answered by reading the source.

Source for FILTER_SANITIZE_EMAIL:

/* {{{ php_filter_email */
#define SAFE        "$-_.+"
#define EXTRA       "!*'(),"
#define NATIONAL    "{}|\\^~[]`"
#define PUNCTUATION "<>#%\""
#define RESERVED    ";/?:@&="

void php_filter_email(PHP_INPUT_FILTER_PARAM_DECL)
{
    /* Check section 6 of rfc 822 http://www.faqs.org/rfcs/rfc822.html */
    const unsigned char allowed_list[] = LOWALPHA HIALPHA DIGIT "!#$%&'*+-=?^_`{|}~@.[]";
    filter_map     map;

    filter_map_init(&map);
    filter_map_update(&map, 1, allowed_list);
    filter_map_apply(value, &map);
}
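
To see what that allowed list means in practice, here is a rough userland sketch. The helper name sanitize_email_sketch is made up for illustration, and its character class is transcribed by hand from the allowed_list above, so treat it as an approximation rather than PHP's actual implementation.

```php
<?php
// Hypothetical helper: strips every character outside the allowed_list
// from the C source above (ASCII letters, digits, and !#$%&'*+-=?^_`{|}~@.[]).
function sanitize_email_sketch(string $input): string
{
    return (string) preg_replace('/[^a-zA-Z0-9!#$%&\'*+\-=?^_`{|}~@.\[\]]/', '', $input);
}

$input = 'jeff"@gmail.com';

// The double quote is not in the allowed list, so it is silently dropped.
var_dump(sanitize_email_sketch($input));             // "jeff@gmail.com"
var_dump(filter_var($input, FILTER_SANITIZE_EMAIL)); // "jeff@gmail.com"

// A single quote IS in the allowed list, so it survives sanitizing.
var_dump(filter_var("o'brien@example.com", FILTER_SANITIZE_EMAIL)); // "o'brien@example.com"
```

Note that the sanitizer only deletes characters; it never checks that what remains is a well-formed address.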

Source for FILTER_VALIDATE_EMAIL:

void php_filter_validate_email(PHP_INPUT_FILTER_PARAM_DECL) /* {{{ */
{
const char regexp[] = "/^(?!(?:(?:\\x22?\\x5C[\\x00-\\x7E]\\x22?)|(?:\\x22?[^\\x5C\\x22]\\x22?)){255,})(?!(?:(?:\\x22?\\x5C[\\x00-\\x7E]\\x22?)|(?:\\x22?[^\\x5C\\x22]\\x22?)){65,}@)(?:(?:[\\x21\\x23-\\x27\\x2A\\x2B\\x2D\\x2F-\\x39\\x3D\\x3F\\x5E-\\x7E]+)|(?:\\x22(?:[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F\\x21\\x23-\\x5B\\x5D-\\x7F]|(?:\\x5C[\\x00-\\x7F]))*\\x22))(?:\\.(?:(?:[\\x21\\x23-\\x27\\x2A\\x2B\\x2D\\x2F-\\x39\\x3D\\x3F\\x5E-\\x7E]+)|(?:\\x22(?:[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F\\x21\\x23-\\x5B\\x5D-\\x7F]|(?:\\x5C[\\x00-\\x7F]))*\\x22)))*@(?:(?:(?!.*[^.]{64,})(?:(?:(?:xn--)?[a-z0-9]+(?:-+[a-z0-9]+)*\\.){1,126}){1,}(?:(?:[a-z][a-z0-9]*)|(?:(?:xn--)[a-z0-9]+))(?:-+[a-z0-9]+)*)|(?:\\[(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){7})|(?:(?!(?:.*[a-f0-9][:\\]]){7,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?)))|(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){5}:)|(?:(?!(?:.*[a-f0-9]:){5,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3}:)?)))?(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))(?:\\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))){3}))\\]))$/iD";

pcre       *re = NULL;
pcre_extra *pcre_extra = NULL;
int preg_options = 0;
int         ovector[150]; /* Needs to be a multiple of 3 */
int         matches;


/* The maximum length of an e-mail address is 320 octets, per RFC 2821. */
if (Z_STRLEN_P(value) > 320) {
    RETURN_VALIDATION_FAILED
}

re = pcre_get_compiled_regex((char *)regexp, &pcre_extra, &preg_options TSRMLS_CC);
if (!re) {
    RETURN_VALIDATION_FAILED
}
matches = pcre_exec(re, NULL, Z_STRVAL_P(value), Z_STRLEN_P(value), 0, 0, ovector, 3);

/* 0 means that the vector is too small to hold all the captured substring offsets */
if (matches < 0) {
    RETURN_VALIDATION_FAILED
}

}
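
One practical consequence when calling this from PHP: filter_var() with FILTER_VALIDATE_EMAIL hands back the input string on success and false on failure, so a strict comparison against false is the safest check. A small sketch using the addresses from the question:

```php
<?php
$candidates = ['jeff!@gmail.com', 'jeff"@gmail.com', 'jeff@gmail.com'];

foreach ($candidates as $candidate) {
    // Returns the string itself when valid, false otherwise.
    $result = filter_var($candidate, FILTER_VALIDATE_EMAIL);

    if ($result === false) {
        echo $candidate, ": rejected\n";
    } else {
        echo $candidate, ": accepted\n";
    }
}
// jeff!@gmail.com: accepted
// jeff"@gmail.com: rejected
// jeff@gmail.com: accepted
```

So the typo from the question is caught by validation alone, as long as the input is not sanitized first.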

I read the same article and thought the same thing: simply changing an invalid value is not good enough. We need to actually tell the user that there was a problem, instead of just ignoring it. The solution, I think, is to compare the original to the sanitized version. For instance, to extend the w3schools example, just add:

$cleanfield = filter_var($field, FILTER_SANITIZE_EMAIL);

if ($cleanfield !== $field) {
    return FALSE;
}
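
Wrapped into a self-contained function, that check might look like the sketch below. The function name check_email and the error messages are placeholders, not part of the w3schools example.

```php
<?php
// Hypothetical helper: returns null when the address is acceptable,
// or a message explaining why it was rejected.
function check_email(string $field): ?string
{
    $cleanfield = filter_var($field, FILTER_SANITIZE_EMAIL);

    // If sanitizing changed anything, the input contained disallowed
    // characters; report that instead of silently "fixing" the address.
    if ($cleanfield !== $field) {
        return 'The email address contains characters that are not allowed.';
    }

    if (filter_var($field, FILTER_VALIDATE_EMAIL) === false) {
        return 'The email address is not in a valid format.';
    }

    return null;
}

var_dump(check_email('jeff"@gmail.com')); // message about disallowed characters
var_dump(check_email('jeff@gmail'));      // message about the format
var_dump(check_email('jeff@gmail.com'));  // NULL
```

Returning a message rather than a bare FALSE makes it easy to show the user what went wrong.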

The "proper" way of doing this is asking for the user's email two times (which is common/good practice). But to answer your question, FILTER_SANITIZE_EMAIL is not pointless. It's a filter that sanitizes emails and it does its job well.

You need to understand that a validation filter returns either the value it was given (when valid) or false, whereas a sanitization filter returns a cleaned-up copy of the value with the disallowed characters removed. The two do not serve the same purpose.

Always use validation filters early, at the moment of input; sanitization is better used late, at output, as it cleans the value before it reaches the user.
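
As a rough illustration of that split, the sketch below validates at input time and escapes at output time. Note that it swaps in htmlspecialchars() for the output step instead of FILTER_SANITIZE_EMAIL, on the assumption that the address is being printed into HTML; the sample address is made up.

```php
<?php
// Input stage: reject anything that is not a well-formed address.
$submitted = "o'brien@example.com"; // made-up sample input
$email = filter_var($submitted, FILTER_VALIDATE_EMAIL);

if ($email === false) {
    exit("Please enter a valid email address.\n");
}

// The storage stage would go here, using a prepared statement as the
// question already plans to do.

// Output stage: a valid address can still contain characters such as a
// single quote, so escape it for the context it is printed into.
echo htmlspecialchars($email, ENT_QUOTES, 'UTF-8'), "\n";
// o&#039;brien@example.com
```

The single quote passes validation because it falls in the \x23-\x27 range of the regex quoted earlier, which is why validation alone should not be treated as output escaping.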