PCRE stands for Perl Compatible Regular Expressions and is mainly used for pattern matching. If you want to learn more about PCRE, read through its manual pages -
shell>man pcre
shell>man pcrematching
shell>man pcrepartial
shell>man pcrepattern
shell>man pcreperform
So why do you need to learn regular expressions (regex)? Here's the answer -
http://geek00l.blogspot.com/2006/12/regex-magic-for-netsexcanalyst.html
Next, look at the tool that comes with pcre, pcretest. As the name implies, you can use pcretest to test your regex. Let's go -
shell>pcretest -help
Usage: pcretest [options] [input file [output file]]
Input and output default to stdin and stdout.
This version of pcretest is not linked with readline().
Options:
-b show compiled code (bytecode)
-C show PCRE compile-time options and exit
-d debug: show compiled code and information (-b and -i)
-dfa force DFA matching for all subjects
-help show usage information
-i show information about compiled patterns
-m output memory used information
-o
-p use POSIX interface
-q quiet: do not output PCRE version number at start
-S
-s output store (memory) used information
-t time compilation and execution
-t
-tm time execution (matching) only
-tm
If you have already read the man pages above, you should be able to understand some of the options. I normally use the -C option first to check the compile-time options -
shell>pcretest -C
PCRE version 7.7 2008-05-07
Compiled with
UTF-8 support
Unicode properties support
Newline sequence is LF
\R matches all Unicode newlines
Internal link size = 2
POSIX malloc threshold = 10
Default match limit = 10000000
Default recursion depth limit = 10000000
Match recursion uses stack
The other option I usually use is -t, which times the compilation and execution of a particular regex I write.
shell>pcretest -t
PCRE version 7.7 2008-05-07
re>
The prompt changes to re>, which means pcretest is in interactive mode and waiting for you to define your regex. Bear in mind that the regex must be enclosed in delimiters, normally forward slashes, for example -
re>/[a-z0-9]+/
This means your regex is [a-z0-9]+. Once you press Enter, you will see this -
Compile time 0.0028 milliseconds
data>
The compile time for this regex is 0.0028 milliseconds. Now type some data to see whether it matches the regex -
data>ABC
Once you hit Enter, you will see this -
Execute time 0.0008 milliseconds
No match
The execution time is 0.0008 milliseconds and there's no match, because the character class only covers lowercase letters and digits. Let's change the data -
data> abc
Execute time 0.0004 milliseconds
0: abc
Now the execution time is 0.0004 milliseconds and the data matches the regex.
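The same session can be replayed without typing at the prompts. This is only a sketch: it relies on the "input file" argument shown in the usage text above, and the file name session.txt is a made-up name for illustration. The first line of the file is the pattern and the lines after it are the data to test against it.

```shell
# Write the pattern followed by two data lines to a file.
# session.txt is a hypothetical name used only for this example.
printf '%s\n' '/[a-z0-9]+/' 'ABC' 'abc' > session.txt

# Run pcretest on the file if it is installed:
#   -q suppresses the version banner, -t reports compile and execute times.
if command -v pcretest >/dev/null 2>&1; then
  pcretest -q -t session.txt
else
  echo "pcretest not found, skipping run"
fi
```

The timings will differ from the figures shown above, since they depend on the machine and the pcre build.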
You can also time the compilation of multiple regexes in one go by defining them in a file instead of using interactive mode. For example, I write the lines below to a file, pcre-testing.txt -
/\d{,10000}/

/([a-z0-9]+)?/i
Remember that if you want to test multiple regexes at once, you have to separate them with a blank line. You can't write them like this, because pcretest reads the lines that follow a pattern as data for that pattern -
/\d{,10000}/
/([a-z0-9]+)?/i
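One way to make sure the blank line really ends up in the file is to write it with a shell here-document. This is just a sketch of one approach, any text editor works equally well. The file name pcre-testing.txt is the one used in this post.

```shell
# Write two patterns to pcre-testing.txt, separated by a blank line.
# Quoting 'EOF' stops the shell from interpreting backslashes or $ in the regex.
cat > pcre-testing.txt <<'EOF'
/\d{,10000}/

/([a-z0-9]+)?/i
EOF

# Print the file with line numbers to confirm that line 2 is empty.
cat -n pcre-testing.txt
```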
Now we can run this -
shell>pcretest -t pcre-testing.txt
PCRE version 7.7 2008-05-07
/\d{,10000}/
Compile time 0.0032 milliseconds
/([a-z0-9]+)?/i
Compile time 0.0054 milliseconds
There are other options you may want to try out, but I think this is enough of a guide to get you started. You may also be interested in some of my related posts -
http://geek00l.blogspot.com/2007/11/regex-learning-tool-kregexpeditor.html
http://geek00l.blogspot.com/2007/07/visualregexp-nice-regex-learning-tool.html
I advocate pcretest because it comes with pcre, is available in HeX, and lets you evaluate the performance of a regex quickly.
Enjoy (;])