
Named Regexp Groups

Regular expressions are often cryptic, so cryptic that they're "write only." Once written, they're not easily understood and are difficult to maintain. A new feature in Ruby 1.9 aims to fix this.

Ruby 1.9 introduces named capture groups. These groups can be defined at the beginning of a regexp using a trick that tells the regexp engine to look for exactly 0 instances of the group.

Later, the group can be recalled using the \g element, followed by the name of the group. You can think of named capture groups as subroutines inside of a regular expression.
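To make the subroutine idea concrete, here is a minimal sketch. The group names and sample strings are invented for illustration. It also contrasts \g with the \k backreference, which the article does not cover: \g<name> runs the group's pattern again, while \k<name> only matches the exact text the group captured earlier.

```ruby
# Define a named group once, then reuse its pattern with \g<name>.
# The group names and sample strings here are made up for illustration.
time_regexp = /(?<two_digits>[0-9]{2}):\g<two_digits>:\g<two_digits>/

m = time_regexp.match("backup finished at 23:59:07")
puts m[0]   # the whole matched time

# \g<pair> re-runs the pattern, so the second number may differ from the first.
any_pair  = /(?<pair>[0-9]{2}):\g<pair>/
# \k<pair> is a backreference: it must repeat the exact text captured earlier.
same_pair = /(?<pair>[0-9]{2}):\k<pair>/

puts any_pair.match("12:34").inspect    # matches
puts same_pair.match("12:34").inspect   # no match, so this is nil
puts same_pair.match("12:12").inspect   # matches
```

This is why "subroutine" is a fair description: the call reuses the pattern, not the previously matched text.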

Combined with the /x option, regular expressions can now be written in a very pleasing, readable and maintainable way. The following example parses strings that look like user:ip_address:admin_flag.

#!/usr/bin/env ruby19

users = %w{
  alice:10.23.52.112:true
  bob:192.168.10.34:false
}

user_regexp = %r{
  (?<username> [a-z]+ ){0}
  (?<ip_number> [0-9]{1,3} ){0}
  (?<ip_address> (\g<ip_number>\.){3}\g<ip_number> ){0}
  (?<admin> true | false ){0}
  \g<username>:\g<ip_address>:\g<admin>
}x

users.each do |u|
  r = user_regexp.match(u)
  puts "User #{r[:username]} is from #{r[:ip_address]}"
end

Note that when each of the groups is defined, the group is followed by a {0} quantifier. This trick tells the regexp engine to look for exactly zero of the group, essentially ignoring it for now.

These groups are recalled later with the \g element.
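The effect of the {0} trick can be seen in isolation with a small sketch. The group name and input are illustrative only. A definition followed by {0} consumes no input on its own, and the group only does real work once it is called with \g.

```ruby
# A group definition followed by {0} matches zero repetitions,
# so by itself it consumes nothing and captures nothing.
definition_only = /(?<word>[a-z]+){0}/
d = definition_only.match("hello")
puts d[0].inspect       # the overall match is the empty string
puts d[:word].inspect   # the group never ran, so there is no capture

# Once the group is called with \g, its pattern is applied
# and the capture becomes available under the group's name.
with_call = /(?<word>[a-z]+){0}\g<word>/
c = with_call.match("hello")
puts c[:word]
```

Keeping the definitions at the top and the calls on the last line is what lets the final line of the regexp read like a description of the input format.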

Finally, perhaps the most exciting part of this feature: when the regexp is used with Regexp#match, the named captures can be looked up by name on the returned MatchData object, much like keys in a hash. In the puts statement at the end of the example, portions of the string are retrieved by group name from the match data. Very slick indeed!
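A short sketch of what the match data offers may help here. The regexp below is a simplified, hypothetical stand-in for the one in the article, and the input strings are made up. Note that Regexp#match returns nil when nothing matches, so real code should check the result before indexing into it.

```ruby
# Simplified, illustrative pattern in the same user:ip:flag shape.
entry_regexp = /(?<username>[a-z]+):(?<ip_address>[0-9.]+):(?<admin>true|false)/

m = entry_regexp.match("alice:10.23.52.112:true")

puts m.names.inspect   # the group names, in order of definition
puts m[:username]      # lookup by symbol
puts m["admin"]        # lookup by string also works

# Build a real Hash from the names and captures if one is needed.
fields = Hash[m.names.zip(m.captures)]
puts fields.inspect

# A line that does not fit the pattern yields nil rather than match data.
miss = entry_regexp.match("not a user line")
puts miss.inspect
```

Building the Hash by hand from names and captures keeps the sketch usable on Ruby 1.9, the version the article targets.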
