Validating an e-mail address

Question: How do I verify if a string of characters is a valid e-mail address?

First, you can only be sure about the second half of the e-mail address (the domain). To protect the anonymity of their users, many e-mail servers don’t respond immediately when asked whether the first part of the address (the user name) is valid. Some will send a bounce notification later, once delivery of an e-mail has been attempted.

Second, you can only TRULY verify that the domain exists if the testing application has internet access.

So, without making a DNS call (or before making one), you can’t be absolutely sure that the domain actually exists, and you can never be sure that the user (and therefore the e-mail address) is real.
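As a rough sketch of that DNS step, the plain C below (which also compiles as Objective-C) takes everything after the last @ as the domain and asks the system resolver whether it resolves. This is an approximation under stated assumptions: `getaddrinfo` looks up address (A/AAAA) records, while mail is normally routed via MX records, so a fuller check might query MX records instead (for example with `res_query` from libresolv). The helper names `email_domain` and `email_domain_resolves` are hypothetical.

```c
#define _POSIX_C_SOURCE 200112L /* expose getaddrinfo() in strict C modes */
#include <sys/types.h>
#include <assert.h>
#include <netdb.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: returns a pointer to the domain part of an address
   (everything after the LAST '@'), or NULL if there is no non-empty domain. */
const char *email_domain(const char *address)
{
    assert(address != NULL);
    const char *at = strrchr(address, '@');
    if (at == NULL || at[1] == '\0')
        return NULL;
    return at + 1;
}

/* Hypothetical helper: true if the domain resolves via the system resolver.
   Assumption: getaddrinfo() checks A/AAAA records, not MX records, so this
   only approximates "can this domain receive mail?". Needs network access. */
bool email_domain_resolves(const char *address)
{
    const char *domain = email_domain(address);
    if (domain == NULL)
        return false; /* malformed address: no lookup attempted */

    struct addrinfo hints;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* accept IPv4 or IPv6 results */
    hints.ai_socktype = SOCK_STREAM; /* one result per address, not per socket type */

    struct addrinfo *result = NULL;
    if (getaddrinfo(domain, NULL, &hints, &result) != 0)
        return false;
    freeaddrinfo(result);
    return true;
}
```

Using the last @ means an address like kevin@kevin@logichigh.com is looked up as logichigh.com.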

But you can check whether the format is valid using a regular expression. This is where things get REALLY tricky: how restrictive your filter should be is up to you, and while there is a defined standard, simply adhering to it may exclude e-mail addresses that are actually in use.

The most common regular expression suggested on the web seems to be the following:

^[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$

This will cover ALMOST all e-mail addresses you will run into, and will exclude obnoxious addresses like badpirate@gmail.com.nospam.

However, it will exclude a few e-mail addresses that are in use (though these are probably being rejected, and running into trouble, in lots of other places too):

  • kevin@yesthistldexists.museum – Yes, .museum is a valid top-level domain, but it is longer than the four letters the pattern allows
  • kevin@kevin@logichigh.com – Yes, the current e-mail RFC doesn’t allow this type of address, HOWEVER older RFCs did, so some folks may still be using this format (and be unable to use it in lots of other places)
  • ??@??web.jp – International characters in domains and user names are already being normalized to ASCII-friendly code by browsers and e-mail clients, so they are in regular use; however, if you check before that normalization occurs, these sorts of addresses will get tossed
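To see which addresses the common pattern accepts, here is a minimal sketch in plain C (which also compiles as Objective-C) using the POSIX `regcomp`/`regexec` functions in place of Cocoa’s regex support. This is a swapped-in engine: POSIX extended regular expressions are not the ICU engine Cocoa uses, but this particular pattern only relies on features both support. The name `is_valid_email_strict` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* The common "strict" pattern, anchored with ^ and $ so the whole string must match.
   The doubled backslash is C string escaping; the regex engine sees "\." (a literal dot). */
static const char *const kStrictPattern =
    "^[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,4}$";

/* Hypothetical helper: true if `address` matches the strict pattern,
   compiled as a POSIX extended regular expression (C locale, ASCII ranges). */
bool is_valid_email_strict(const char *address)
{
    assert(address != NULL);
    regex_t re;
    if (regcomp(&re, kStrictPattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false; /* not expected: the pattern is a fixed constant */
    bool matched = regexec(&re, address, 0, NULL, 0) == 0;
    regfree(&re);
    return matched;
}
```

With this, kevin@logichigh.com passes, while kevin@yesthistldexists.museum (six-letter TLD), kevin@kevin@logichigh.com (second @) and badpirate@gmail.com.nospam (six-letter suffix) are all rejected. Compiling the pattern on every call keeps the sketch short; real code would compile it once and reuse it.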

Therefore, I’ve also written a super lax e-mail format checker that accepts all of these scenarios. This regex is probably best used if you plan on checking whether the domain exists after confirming that the address loosely matches SOME format that COULD be in use :)

^.+@.+\.[A-Za-z]{2}[A-Za-z]*$
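A minimal sketch of the lax check in plain C (which also compiles as Objective-C), using the POSIX `regcomp`/`regexec` functions as a stand-in for Cocoa’s regex support; the pattern only uses features that POSIX extended regular expressions share with ICU. The name `is_valid_email_lax` is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* The "super lax" pattern: one or more characters, an @, one or more characters,
   a dot, then a final run of at least two ASCII letters. Anchored with ^ and $. */
static const char *const kLaxPattern = "^.+@.+\\.[A-Za-z]{2}[A-Za-z]*$";

/* Hypothetical helper: true if `address` loosely looks like an e-mail address. */
bool is_valid_email_lax(const char *address)
{
    assert(address != NULL);
    regex_t re;
    if (regcomp(&re, kLaxPattern, REG_EXTENDED | REG_NOSUB) != 0)
        return false; /* not expected: the pattern is a fixed constant */
    bool matched = regexec(&re, address, 0, NULL, 0) == 0;
    regfree(&re);
    return matched;
}
```

It accepts kevin@yesthistldexists.museum and kevin@kevin@logichigh.com from the list above (and also badpirate@gmail.com.nospam), but still rejects kevin@localhost (no dot after the @), kevin@logichigh.c (one-letter suffix) and @logichigh.com (nothing before the @). That is why a domain check afterwards is worthwhile.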

Finally, if you’d like to implement this in Cocoa code:

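#import <Foundation/Foundation.h>

BOOL NSStringIsValidEmail(NSString *checkString)
{
	NSString *stricterFilterString = @"[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,4}";
	NSString *laxString = @".+@.+\\.[A-Za-z-]{2}[A-Za-z]*";
	BOOL stricterFilter = NO; // YES = stricter (common) pattern, NO = lax pattern
	// Backslashes are doubled above because the patterns live in Objective-C string literals.
	NSString *emailRegex = stricterFilter ? stricterFilterString : laxString;
	// MATCHES must match the whole string, so no ^ and $ anchors are needed.
	NSPredicate *emailTest = [NSPredicate predicateWithFormat:@"SELF MATCHES %@", emailRegex];
	return [emailTest evaluateWithObject:checkString];
}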

12 Comments

  • By Eric Hayes, December 17, 2010 @ 6:10 pm

    Hey, I followed this from StackOverflow… Gave you a +1, but it turns out this doesn’t work. There are typos in it that keep it from compiling… Would you mind fixing and re-posting it? (line 3, NSString not NString) (line 4, +\\. not +\.) I don’t speak regex, so I assume the latter was the correct change, as the \ needed to be escaped. :-) thanks, -eric

  • By T. Berg, August 8, 2012 @ 2:29 pm

    Got to this page through StackOverflow; it does the job perfectly, thanks a million!

  • By Julian, October 4, 2012 @ 9:36 am

    The ‘lax’ checker could be written as

    ^.+@.+\.[A-Za-z]{2,}$

  • By Danny, February 6, 2013 @ 11:46 am

    Consider this my +1 to you since you don’t have a G+ button with which I could otherwise commend you.

  • By Akhil, June 6, 2013 @ 7:44 pm

    Thanks a lot. It works fine. I have a question about validating a website: how can I check whether a website is valid or not? I want to validate whether it is in the form http://www.name.com or http://www.name.co.te

  • By BadPirate, June 20, 2013 @ 11:47 am

    Websites are hard. I used the very open validation because there are so many possible valid website domains (and they add more every day). If website validation is important to you, I would suggest attempting a DNS lookup to validate sites (though I understand that some people like using their IP address as their domain, yuck).

  • By Jeff, June 19, 2014 @ 4:17 am

    Works great for most cases. Does not allow hyphens or underscores in domain names for the lax filter.

  • By mahboud, July 12, 2014 @ 10:56 am

    New TLDs like .museum and .travel will not pass this test.

  • By BadPirate, July 13, 2014 @ 2:22 am

    I mention that in the discussion of lax vs strict. I use lax, but you get more false positives.

  • By Bartosz, July 24, 2014 @ 4:14 am

    Hey, the lax string is badly written because it is missing "-" in the domain.
    It should be:
    NSString *laxString = @".+@.+\\.[A-Za-z-]{2}[A-Za-z]*";

    Please correct it; unfortunately my app is public with this bug.

  • By BadPirate, September 18, 2014 @ 11:14 am

    Corrected.
