I recently needed Scribblar (more on that site in another post) to verify whether a domain name or page URL entered by a user is valid. Luckily, ActionScript 3 supports regular expressions; unfortunately, my RegExp skills are nonexistent. So I reached out via Twitter to see if anyone could help. It took all of 10 minutes and a quick session on pastebin for Robert 'Da Man' Hall to sort the problem out for me. To preserve this nugget of knowledge for future generations, here it is.
var regex:RegExp = /^http(s)?:\/\/((\d+\.\d+\.\d+\.\d+)|(([\w-]+\.)+([a-z,A-Z][\w-]*)))(:[1-9][0-9]*)?(\/([\w-.\/:%+@&=]+[\w- .\/?:%+@&=]*)?)?(#(.*))?$/i;
Usage
var url:String = "http://www.google.com";
var regex:RegExp = /^http(s)?:\/\/((\d+\.\d+\.\d+\.\d+)|(([\w-]+\.)+([a-z,A-Z][\w-]*)))(:[1-9][0-9]*)?(\/([\w-.\/:%+@&=]+[\w- .\/?:%+@&=]*)?)?(#(.*))?$/i;
trace(regex.test(url)); // traces true if the URL matches the pattern
Thanks Robert!


Looks like alien gibberish, but neat stuff.
Now he had two problems.
:)
http://www.google
http://w.google
http://1.g
http://wwwwwwwwwwwwwwwwwwwwwww.google
var pattern:String = "http://www.google.com";
var string:String = "http://www.google.com"; // the URL to check
if (string == pattern) {
    trace('true');
} else {
    trace('false');
}
http://www.google.com/?a=b
but hopefully :
http://www.google.com/c?a=b
is OK.
If I knew regex well enough, I would have tried to fix it, but it's not the case.
var regex:RegExp = /^http(s)?:\/\/((\d+\.\d+\.\d+\.\d+)|(([\w-]+\.)+([a-z,A-Z][\w-]*)))(:[1-9][0-9]*)?(\/([\w-.\/\?:%+@&=]+[\w- .\/\?:%+@&=]*)?)?(#(.*))?$/i;
I've added a "\?" in front of ":%+@&=" in the first character class, which I think fixes a bug, though I'm not sure. I've also added a "\" in front of "?:%+@&=" in the second one, because the question mark should be escaped, no?
If my "\" isn't correct, it should probably also be removed from my first correction...
I tested it with a few urls and it worked well for me.
Shouldn't the "+" be escaped too?
I also hope someone will correct me if I'm making mistakes.
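On the escaping questions: inside a character class (the square brackets), "?" and "+" are literal characters, so the backslash is optional there and harmless. A quick check, as a sketch (the variable names are only for illustration):

```actionscript
// Both classes match the literal characters "?" and "+".
var unescaped:RegExp = /^[?+]+$/;
var escaped:RegExp = /^[\?\+]+$/;

trace(unescaped.test("?+")); // true
trace(escaped.test("?+"));   // true
trace(unescaped.test("a"));  // false: "a" is not in the class
```

Outside a character class the story is different: there "?" and "+" are quantifiers and must be escaped to match literally.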
The IP address part should be replaced with: "(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
It will allow IPs from 0.0.0.0 up to 999.999.999.999 (which isn't valid, but it's still better than 9999999.9999.99999999.99).
As for the port part, ports only go up to 65535, not infinity, so this one is better: "(:[1-9][0-9]{0,4})". It will still allow ports from "1" up to "99999", though. I've no idea how to add tighter restrictions.
According to http://www.ietf.org/rfc/rfc1738.txt, some valid characters are still missing, so I've added these to the regex: "'\(\)$,\*!"
The complete regex looks like this now :
var regex:RegExp = /^http(s)?:\/\/((\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})|(([\w-]+\.)+([a-z,A-Z][\w-]*)))(:[1-9][0-9]{0,4})?(\/([\w-.\/\?:%\+@&=]+[\w- .\/\?:%\+@&='\(\)$,\*!]*)?)?(#(.*))?$/i;
I still hope somebody will be able to tell us if it's correct or if I'm posting stupidities :-/
Remember: I'm quite new to regex, so I'm not sure everything I write is OK. It passes my tests, but you never know.
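Exact numeric limits (octets up to 255, ports up to 65535) are awkward to express in a pattern. One option is to let a looser regex capture the digits and range-check them in code. The following is only a sketch under that assumption: hasValidHostNumbers and its pattern are hypothetical and not from the post, and the function is meant to be combined with the URL pattern above, not to replace it.

```actionscript
// Hypothetical helper (not from the original post): range-checks the IPv4
// octets and the port after a loose pattern has captured them.
function hasValidHostNumbers(url:String):Boolean
{
    // Group 1: dotted-quad host, if the host has that shape.
    // Group 2: port digits, if a port is present.
    var parts:RegExp = /^https?:\/\/(?:(\d{1,3}(?:\.\d{1,3}){3})|[^\/:?#]+)(?::(\d+))?(?:[\/?#]|$)/i;
    var m:Array = url.match(parts);
    if (m == null) return false;

    if (m[1]) {
        // Each octet must be in the range 0..255.
        for each (var octet:String in m[1].split(".")) {
            if (parseInt(octet, 10) > 255) return false;
        }
    }
    if (m[2]) {
        // Ports must be in the range 1..65535.
        var port:Number = parseInt(m[2], 10);
        if (port < 1 || port > 65535) return false;
    }
    return true;
}

trace(hasValidHostNumbers("http://10.0.0.1:8080/"));        // true
trace(hasValidHostNumbers("http://10.0.0.256"));            // false: octet above 255
trace(hasValidHostNumbers("http://www.google.com:99999"));  // false: port above 65535
```

A URL would then count as valid only if both regex.test(url) and this numeric check succeed.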
I have a couple of comments though:
1. These expressions will not accept http://test/file or http://test because there is no extension, although I should be able to pass them; for the same reason they will not accept http://localhost.
2. The http:// should be optional because I may need to pass relative paths.
3. It doesn't accept ftp://test.com either; I assume this is because of the http part.
4. It also does not accept e-mail addresses, while it should, because an e-mail address is also a URL, right? Actually, it should accept http://tes@t.com
^(?:(?:http|https|ftp|telnet|gopher|ms\-help|file|notes)://)?(?:(?:[a-z][\w~%!&',;=\-\.$\(\)\*\+]*):.*@)?(?:(?:[a-z0-9][\w\-]*[a-z0-9]*\.)*(?:(?:(?:(?:[a-z0-9][\w\-]*[a-z0-9]*)(?:\.[a-z0-9]+)?)|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))(?::[0-9]+)?))?(?:(?:(?:/(?:[\w`~!$=;\-\+\.\^\(\)\|\{\}\[\]]|(?:%\d\d))+)*/(?:[\w`~!$=;\-\+\.\^\(\)\|\{\}\[\]]|(?:%\d\d))*)(?:\?[\^#]+)?(?:#[a-z0-9]\w*)?)?$
which means you will have at least a domain name
^(http://)?(?:(?:[a-z0-9][\w\-]*[a-z0-9]*\.)*(?:(?:(?:(?:[a-z0-9][\w\-]*[a-z0-9]*)(?:\.[a-z0-9]+)?)|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))(?::[0-9]+)?))(?:(?:(?:/(?:[\w`~!$=;\-\+\.\^\(\)\|\{\}\[\]]|(?:%\d\d))+)*/(?:[\w`~!$=;\-\+\.\^\(\)\|\{\}\[\]]|(?:%\d\d))*)(?:\?[\^#]+)?(?:#[a-z0-9]\w*)?)?$
I hope this helps
function IsValidRegEx($value:String, $regEx:Object):Boolean
{
    // Reject null or empty input and a missing pattern.
    if (!$value || !$regEx) return false;
    return ($value.match($regEx) != null);
}
function IsValidUri($value:String):Boolean
{
    return IsValidRegEx($value, /^(?:(?:http|https|ftp|telnet|gopher|ms\-help|file|notes):\/\/)?(?:(?:[a-z][\w~%!&',;=\-\.$\(\)\*\+]*):.*@)?(?:(?:[a-z0-9][\w\-]*[a-z0-9]*\.)*(?:(?:(?:(?:[a-z0-9][\w\-]*[a-z0-9]*)(?:\.[a-z0-9]+)?)|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))(?::[0-9]+)?))?(?:(?:(?:\/(?:[\w`~!$=;\-\+\.\^\(\)\|\{\}\[\]]|(?:%\d\d))+)*\/(?:[\w`~!$=;\-\+\.\^\(\)\|\{\}\[\]]|(?:%\d\d))*)(?:\?[\^#]+)?(?:#[a-z0-9]\w*)?)?$/);
}
If you'd like to check only for http, without a user name and password in the URL, you can replace the expression with this one:
/^(?:http:\/\/)?(?:(?:[a-z0-9][\w\-]*[a-z0-9]*\.)*(?:(?:(?:(?:[a-z0-9][\w\-]*[a-z0-9]*)(?:\.[a-z0-9]+)?)|(?:(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)))(?::[0-9]+)?))?(?:(?:(?:\/(?:[\w`~!$=;\-\+\.\^\(\)\|\{\}\[\]]|(?:%\d\d))+)*\/(?:[\w`~!$=;\-\+\.\^\(\)\|\{\}\[\]]|(?:%\d\d))*)(?:\?[\^#]+)?(?:#[a-z0-9]\w*)?)?$/
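To compare the variants in this thread against the URLs people raised, a small harness along these lines may help. It is only a sketch: reportMatches is a hypothetical helper name, and the pattern shown is the original one from the post, so swap in whichever variant you want to try.

```actionscript
// Hypothetical helper (not from the original post): traces whether each
// sample string matches the given pattern. The pattern should not use the
// g flag, so that test() starts from the beginning of every sample.
function reportMatches(pattern:RegExp, samples:Array):void
{
    for each (var sample:String in samples) {
        trace(sample + " -> " + pattern.test(sample));
    }
}

// The original pattern from the post; replace it with any variant to compare.
var original:RegExp = /^http(s)?:\/\/((\d+\.\d+\.\d+\.\d+)|(([\w-]+\.)+([a-z,A-Z][\w-]*)))(:[1-9][0-9]*)?(\/([\w-.\/:%+@&=]+[\w- .\/?:%+@&=]*)?)?(#(.*))?$/i;

// Sample URLs taken from the comments above.
reportMatches(original, [
    "http://www.google.com",
    "http://www.google.com/?a=b",
    "http://www.google.com/c?a=b",
    "http://localhost",
    "ftp://test.com"
]);
```

Running the same sample list against each pattern makes it easy to see which cases a given correction actually changes.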