/^([A-Fa-f0-9]{1,4}:){7}[A-Fa-f0-9]{1,4}$|^::$|^::([A-Fa-f0-9]{1,4}:){0,6}[A-Fa-f0-9]{1,4}$|^([A-Fa-f0-9]{1,4}:){1,7}:$|^[A-Fa-f0-9]{1,4}::([A-Fa-f0-9]{1,4}:){0,5}[A-Fa-f0-9]{1,4}$|^([A-Fa-f0-9]{1,4}:){2}:([A-Fa-f0-9]{1,4}:){0,4}[A-Fa-f0-9]{1,4}$|^([A-Fa-f0-9]{1,4}:){3}:([A-Fa-f0-9]{1,4}:){0,3}[A-Fa-f0-9]{1,4}$|^([A-Fa-f0-9]{1,4}:){4}:([A-Fa-f0-9]{1,4}:){0,2}[A-Fa-f0-9]{1,4}$|^([A-Fa-f0-9]{1,4}:){5}:([A-Fa-f0-9]{1,4}:){0,1}[A-Fa-f0-9]{1,4}$|^([A-Fa-f0-9]{1,4}:){6}:[A-Fa-f0-9]{1,4}$/
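PHP can apply a pattern like this with preg_match(). Below is a sketch (PHP 7 or later; the helper name is_ipv6_by_regex is hypothetical) that assembles the alternatives from one shared group fragment. It includes branches for a leading "::", a trailing "::" and the bare "::" address, so forms such as ::1 are accepted. It does not try to handle IPv4-suffixed forms or zone IDs.

```php
<?php
// Hypothetical helper: regex-only IPv6 syntax check.
// Accepts the full 8-group form and a single "::" in any position.
// Not covered: IPv4-suffixed forms (::ffff:192.0.2.1) and zone IDs (fe80::1%eth0).
function is_ipv6_by_regex(string $ip): bool
{
    $h = '[A-Fa-f0-9]{1,4}';   // one group: 1 to 4 hex digits
    $branches = [
        "({$h}:){7}{$h}",                 // no "::" at all
        '::',                             // the unspecified address
        "::({$h}:){0,6}{$h}",             // leading "::"
        "({$h}:){1,7}:",                  // trailing "::"
        "{$h}::({$h}:){0,5}{$h}",         // "::" after 1 group
        "({$h}:){2}:({$h}:){0,4}{$h}",    // "::" after 2 groups
        "({$h}:){3}:({$h}:){0,3}{$h}",    // "::" after 3 groups
        "({$h}:){4}:({$h}:){0,2}{$h}",    // "::" after 4 groups
        "({$h}:){5}:({$h}:){0,1}{$h}",    // "::" after 5 groups
        "({$h}:){6}:{$h}",                // "::" after 6 groups
    ];
    // /D makes $ match only at the very end, not before a trailing newline.
    $pattern = '/^(' . implode('|', $branches) . ')$/D';
    return preg_match($pattern, $ip) === 1;
}
```

The /D modifier matters when the value comes straight from a form field or a file line, because a plain $ in PCRE also matches just before a final newline.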
$ cat ips.txt
2001:0db8:0000:0000:0000:0000:1428:57ab
2001:0db8:0000:0000:0000::1428:57ab
2001:0db8:0:0:0:0:1428:57ab
2001:0db8:0:0::1428:57ab
2001:0db8::1428:57ab
2001:db8::1428:57ab
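$ cat shortener.pl
#!/usr/bin/perl
use strict;
use warnings;

# Shorten IPv6 addresses read one per line from STDIN.
# Assumes each line is a valid IPv6 address. Limitation: if a line already
# contains "::" and also a separate run of zero groups, step 2 yields a second "::".
while (<STDIN>)
{
    chomp;
    my $ip = $_;    # keep the original for display
    # step 1: remove leading zeroes in every group ("0000" becomes "0")
    s/^0*([A-Fa-f0-9])/$1/;
    s/:0*([A-Fa-f0-9])/:$1/g;
    # step 2: collapse the first run of two or more zero groups into "::"
    # (no /g modifier, because "::" may appear only once in an address)
    s/:+0+(:0+)*:0+:+/::/;
    # all done.
    print "$ip -> $_\n";
}
$ cat ips.txt | ./shortener.pl
2001:0db8:0000:0000:0000:0000:1428:57ab -> 2001:db8::1428:57ab
2001:0db8:0000:0000:0000::1428:57ab -> 2001:db8::1428:57ab
2001:0db8:0:0:0:0:1428:57ab -> 2001:db8::1428:57ab
2001:0db8:0:0::1428:57ab -> 2001:db8::1428:57ab
2001:0db8::1428:57ab -> 2001:db8::1428:57ab
2001:db8::1428:57ab -> 2001:db8::1428:57ab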
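In PHP the same shortening can be done without regexes. inet_pton() parses any valid spelling into the 16-byte binary address, and inet_ntop() prints that back in compressed form. Here is a minimal sketch, assuming PHP 7.1 or later. The helper name shorten_ipv6 is hypothetical, and it returns null for invalid input.

```php
<?php
// Hypothetical helper: return the compressed form of an IPv6 address,
// or null if the input is not a valid IPv6 address.
function shorten_ipv6(string $ip): ?string
{
    // Validate first so inet_pton() is only ever given well-formed input.
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV6) === false) {
        return null;
    }
    // Text to 16 raw bytes, then back to compressed text.
    return inet_ntop(inet_pton($ip));
}

foreach (['2001:0db8:0000:0000:0000:0000:1428:57ab', '2001:0db8:0:0::1428:57ab'] as $ip) {
    echo $ip, ' -> ', shorten_ipv6($ip), PHP_EOL;
}
```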
[php]return( $ip == @inet_ntop(@inet_pton($ip))) ? true : false;[/php]
This validates both IPv4 and IPv6 addresses. So far so good, right?
I ran into a problem validating something like this:
2001:0db8:0000:0000:0000:0000:1428:57ab
If the user types the address with all of its zeros, validation fails. inet_ntop() returns 2001:db8::1428:57ab, which is another valid way to write the same address. However, it is not the string the user typed, so the comparison fails and the user is told the IP is invalid.
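One way around this is to avoid comparing strings at all and let PHP's filter extension answer the question directly with filter_var() and FILTER_VALIDATE_IP. A minimal sketch, assuming PHP 7 or later; the function name is_valid_ip is hypothetical:

```php
<?php
// Hypothetical helper: accept any valid IPv4 or IPv6 spelling,
// including the full, zero-padded IPv6 form.
function is_valid_ip(string $ip): bool
{
    // FILTER_VALIDATE_IP returns the input on success and false on failure;
    // add FILTER_FLAG_IPV4 or FILTER_FLAG_IPV6 to restrict the family.
    return filter_var($ip, FILTER_VALIDATE_IP) !== false;
}

var_dump(is_valid_ip('2001:0db8:0000:0000:0000:0000:1428:57ab')); // bool(true)
var_dump(is_valid_ip('2001:db8::1428:57ab'));                      // bool(true)
var_dump(is_valid_ip('2001:db8::1428::57ab'));                     // bool(false)
```

Because nothing here compares the user's text with a regenerated string, the zero-padded spelling passes just like the compressed one.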
Apparently, IPv6 addresses can be written according to the following rules:
- A series of "0"s in a 16-bit block can be represented by a single "0".
- A run of consecutive blocks containing only "0"s can be suppressed and represented by "::" (this can be done only once per address).
- Leading zeros in a group can also be omitted (as in db8 for 0db8, or the 1 in ::1, the loopback address).
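These rules are reversible: every shortened spelling expands back to the same eight four-digit groups. The sketch below shows that expansion in PHP by going through the 16-byte binary form rather than reimplementing the rules by hand. It assumes PHP 7.1 or later, and expand_ipv6 is a hypothetical name.

```php
<?php
// Hypothetical helper: undo all three shortening rules, returning the
// full 8-group, zero-padded form, or null for invalid input.
function expand_ipv6(string $ip): ?string
{
    if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV6) === false) {
        return null;
    }
    $hex = bin2hex(inet_pton($ip));          // 32 lowercase hex digits
    return implode(':', str_split($hex, 4)); // 8 groups of 4 digits
}

echo expand_ipv6('::1'), PHP_EOL; // 0000:0000:0000:0000:0000:0000:0000:0001
```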
This means the following are all valid ways of writing the same address:
2001:0db8:0000:0000:0000:0000:1428:57ab
2001:0db8:0000:0000:0000::1428:57ab
2001:0db8:0:0:0:0:1428:57ab
2001:0db8:0:0::1428:57ab
2001:0db8::1428:57ab
2001:db8::1428:57ab
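Since all of these spellings denote the same 128-bit value, two addresses can be compared by their binary form instead of their text. Here is a sketch (PHP 7 or later) using a hypothetical same_ipv6() helper:

```php
<?php
// Hypothetical helper: true if both strings are valid IPv6 addresses
// that denote the same 128-bit value, however they are written.
function same_ipv6(string $a, string $b): bool
{
    foreach ([$a, $b] as $ip) {
        if (filter_var($ip, FILTER_VALIDATE_IP, FILTER_FLAG_IPV6) === false) {
            return false;
        }
    }
    // inet_pton() yields the same 16 bytes for every valid spelling.
    return inet_pton($a) === inet_pton($b);
}

$forms = [
    '2001:0db8:0000:0000:0000:0000:1428:57ab',
    '2001:0db8:0000:0000:0000::1428:57ab',
    '2001:0db8:0:0:0:0:1428:57ab',
    '2001:0db8:0:0::1428:57ab',
    '2001:0db8::1428:57ab',
    '2001:db8::1428:57ab',
];
foreach ($forms as $form) {
    var_dump(same_ipv6($form, '2001:db8::1428:57ab')); // bool(true) each time
}
```

Comparing binaries also makes the check case-insensitive, since 2001:DB8:: and 2001:db8:: parse to the same bytes.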
I'm no regex guru, so I'm sort of stuck on what to do with this :(