Saturday, December 16, 2006

A better RegEx pattern for matching e-mail addresses

Posted in Tiffany B Brown Blog.

Below is a more refined version.

^[-+.\w]{1,64}@[-.\w]{1,64}\.[-.\w]{2,6}$

Just as with the previous pattern, this one will match most valid e-mail addresses including:

  • Addresses with periods and plus signs (e.g. ‘tiffany.brown’ or ‘hotc0derch1ck+todolist’)
  • British and Australian second-level domain names such as ‘.co.uk’ and ‘.com.au’
  • New top-level domains such as ‘.museum’ and ‘.travel’
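As a rough sanity check, here is a minimal Python sketch that runs the pattern against the kinds of addresses listed above. The `example.*` domains, the sample addresses, and the `looks_like_email` helper are made up for illustration. The `re.ASCII` flag is my assumption too: it keeps \w limited to a-z, A-Z, 0-9 and the underscore.

```python
import re

# Pattern copied from the post. re.ASCII (an assumption for this sketch)
# restricts \w to [a-zA-Z0-9_].
EMAIL_RE = re.compile(r"^[-+.\w]{1,64}@[-.\w]{1,64}\.[-.\w]{2,6}$", re.ASCII)


def looks_like_email(address: str) -> bool:
    """Return True if the address matches the pattern from the post."""
    return EMAIL_RE.match(address) is not None


# Hypothetical sample addresses, not taken from the original post.
samples = [
    "tiffany.brown@example.com",
    "hotc0derch1ck+todolist@example.co.uk",
    "curator@example.museum",
    "agent@example.travel",
    "no-at-sign.example.com",
    "user@localhost",
]

for address in samples:
    print(address, looks_like_email(address))
```

The last two samples are there to show what the pattern rejects: one has no @ sign, and the other has no dot after the domain name.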

This pattern takes advantage of the \w character type. It’s a simpler way of saying “a - z (both upper and lower case), 0 - 9 and the underscore character” (though in many regex engines, \w also matches non-ASCII letters and digits).
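To see how the meaning of \w can shift, here is a small sketch using Python's `re` module as one example engine. In Python 3, \w matches Unicode word characters by default for string patterns, and the `re.ASCII` flag narrows it to a-z, A-Z, 0-9 and the underscore. The sample strings and helper names are made up for illustration.

```python
import re

# Default behaviour in Python 3: \w is Unicode-aware for str patterns.
UNICODE_WORD = re.compile(r"^\w+$")

# With re.ASCII, \w only means [a-zA-Z0-9_].
ASCII_WORD = re.compile(r"^\w+$", re.ASCII)


def is_unicode_word(text: str) -> bool:
    return UNICODE_WORD.match(text) is not None


def is_ascii_word(text: str) -> bool:
    return ASCII_WORD.match(text) is not None


# Hypothetical sample strings.
for text in ["tiffany_b_2006", "josé", "tiffany.brown"]:
    print(text, is_unicode_word(text), is_ascii_word(text))
```

The period in the last sample is not a \w character in either mode, which is why the e-mail pattern lists it separately inside the square brackets.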

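It also checks that the user name and the domain name each contain at least one, but no more than 64, of the permitted characters. Sixty-four is the maximum length SMTP allows for the user name (the local part). The SMTP limit for a domain name is higher, at 255 characters, so the pattern is stricter than the standard on that side.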
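The {1,64} quantifiers can be exercised with a short sketch like the one below. The addresses are generated and purely hypothetical, as is the `looks_like_email` helper. The `re.ASCII` flag is again my assumption, not part of the original pattern.

```python
import re

# Pattern copied from the post; re.ASCII is an assumption for this sketch.
EMAIL_RE = re.compile(r"^[-+.\w]{1,64}@[-.\w]{1,64}\.[-.\w]{2,6}$", re.ASCII)


def looks_like_email(address: str) -> bool:
    """Return True if the address matches the pattern from the post."""
    return EMAIL_RE.match(address) is not None


# User names of exactly 64 and 65 characters.
local_64 = "a" * 64 + "@example.com"
local_65 = "a" * 65 + "@example.com"

# Domain names (the part before the final dot) of 64 and 65 characters.
domain_64 = "user@" + "d" * 64 + ".com"
domain_65 = "user@" + "d" * 65 + ".com"

# An empty user name fails the "at least one" half of the check.
empty_local = "@example.com"

for address in [local_64, local_65, domain_64, domain_65, empty_local]:
    print(len(address), looks_like_email(address))
```

Only the two 64-character cases should pass. The 65-character cases and the empty user name fall outside the {1,64} range.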

This pattern should work with most regular expression engines.
