08-07-2012, 05:29 PM
Peng Yu
How to debug a regex with (?DEFINE)?

Hi,

I'm trying to extract the nested namespace in the following code. But
the code can only extract the inner namespace. It is very hard for me
to see what is wrong. Does anybody know any tricks for debugging a
regex like the following? Thanks!

~/linux/test/perl/man/perlre/(?/(/DEFINE$ cat main_namespace_multiple.pl
#!/usr/bin/env perl

use strict;
use warnings;

my $text=<<'EOF';
namespace A {
  namespace B {
  }
}
EOF

# Build pattern that matches only namespaces...
my $namespace_pattern = qr{
    ((?&namespace)) # Match and capture (possibly nested) namespace

    # Define each component...
    (?(DEFINE)
        (?<namespace_token>
            \b [A-Za-z_]\w* \b
        )

        (?<namespace_keyword>
            \b namespace \b
        )

        # Namespace is keyword + name + block...
        (?<namespace>
            (?&namespace_keyword) \s+ (?&namespace_token) \s*
            \{
                (?&namespace_body)
            \}
        )

        (?<namespace_body>
            (?:
                \s*
                (?&namespace)
                \s*
            )
            |
            (?&block)
        )

        (?<block>
            \{
                (?: (?&block) | . )*?
            \}
        )
    )
}xs;

my ($extracted) = $text =~ $namespace_pattern;

print "text = $text\n";
print "extracted = $extracted\n";


Regards,
Peng

08-12-2012, 05:33 PM
Ben Morrow
Re: How to debug a regex with (?DEFINE)?


Quoth Peng Yu <pengyu.ut@gmail.com>:
>
> I'm trying to extract the nested namespace in the following code. But
> the code can only extract the inner namespace. It is very hard for me
> to see what is wrong. Does anybody know any tricks for debugging a
> regex like the following? Thanks!


The best way to debug something like this is to run it under 'use re
"debug"'. The output is rather verbose and a little arcane, so it takes
a bit of patience to pick through it, but it tells you *exactly* what
the regex engine is doing and where it fails.
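For anyone who has not used the pragma before, here is a minimal sketch of switching it on for a single match. The sample text and the small pattern are made up for illustration; they are not the pattern from the question. In Perl 5.10 and later the pragma is lexically scoped, so wrapping the match in a bare block keeps the trace limited to that one pattern.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Illustrative input only, not the text from the original post.
my $text = "namespace B {\n}";
my $matched;

{
    # Only patterns compiled inside this block are traced.
    # The compile and execution traces are written to STDERR.
    use re 'debug';
    $matched = $text =~ / \b namespace \b \s+ \w+ \s* \{ \s* \} /x ? 1 : 0;
}

# This match happens outside the block, so it produces no trace.
my $has_brace = $text =~ /\{/ ? 1 : 0;

print "matched = $matched, has_brace = $has_brace\n";
```

To trace a whole script without editing it, the same pragma can be loaded from the command line with perl -Mre=debug script.pl; redirecting STDERR to a file makes the output easier to search.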

In this case the important bit (reformatted slightly) is

29 <e B {> <%n }%n}%n> | 62: OPEN5 'namespace_body'(64)
29 <e B {> <%n }%n}%n> | 64: BRANCH(72)
29 <e B {> <%n }%n}%n> | 65: STAR(67)
SPACE can match 3 times out of 2147483647...
32 <e B {%n > <}%n}%n> | 67: GOSUB4[-26](70)
32 <e B {%n > <}%n}%n> | 41: OPEN4 'namespace'(43)
32 <e B {%n > <}%n}%n> | 43: GOSUB3[-12](46)
32 <e B {%n > <}%n}%n> | 31: OPEN3 'namespace_keyword'(33)
32 <e B {%n > <}%n}%n> | 33: BOUND(34)
failed...

[snip some attempts to claw back spaces from the \s*; using (?>) or \s*+
might be a good idea...]

failed...
29 <e B {> <%n }%n}%n> | 72: BRANCH(76)
29 <e B {> <%n }%n}%n> | 73: GOSUB6[+5](76)
29 <e B {> <%n }%n}%n> | 78: OPEN6 'block'(80)
29 <e B {> <%n }%n}%n> | 80: EXACT <{>(82)
failed...
BRANCH failed...
failed...

Here it's matched as far as 'namespace B {' and it's trying to match
(?&namespace_body), but that requires either a sub-namespace or a
(?<block>) (with explicit braces) so it doesn't match. You need to add
an empty case here:

> (?<namespace_body>
> (?:
> \s*
> (?&namespace)
> \s*
> )
> |
> (?&block)

    | \s*

> )

though I suspect that you actually want to allow whitespace around
blocks, so what you want is

    (?<namespace_body>
        \s*
        (?: (?&namespace)
          | (?&block)
          | # empty
        )
        \s*
    )

or perhaps

    (?<namespace>
        (?&namespace_keyword) \s+ (?&namespace_token) \s*
        \{ \s* (?: (?&namespace_body) \s* )* \}
    )

    (?<namespace_body>
        (?&namespace) | (?&block)
    )

to allow multiple blocks-or-namespaces within one namespace.
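Putting the last variant back into a complete script gives something like the sketch below. The sample text with two sibling namespaces is an assumption added to exercise the "multiple" case, and the block rule uses the negated character class discussed further down in this reply rather than the original lazy dot.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Sample input is an assumption: two sibling namespaces inside A.
my $text = <<'EOF';
namespace A {
  namespace B {
  }
  namespace C {
  }
}
EOF

my $namespace_pattern = qr{
    ( (?&namespace) )    # capture the outermost namespace

    (?(DEFINE)
        (?<namespace_token>   \b [A-Za-z_]\w* \b )
        (?<namespace_keyword> \b namespace \b )

        # keyword, name, then zero or more bodies between the braces
        (?<namespace>
            (?&namespace_keyword) \s+ (?&namespace_token) \s*
            \{ \s* (?: (?&namespace_body) \s* )* \}
        )

        (?<namespace_body> (?&namespace) | (?&block) )

        # an opening brace inside a block may only start a nested block
        (?<block>
            \{ (?: (?&block) | [^\{] )* \}
        )
    )
}xs;

my ($extracted) = $text =~ $namespace_pattern;
print "extracted = $extracted\n";
```

With this input the capture should run from "namespace A" through the final closing brace, covering both B and C.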

> (?<block>
> \{
> (?: (?&block) | . )*?
> \}


You want to be careful about using .*? to mean 'match anything
until...'. This will correctly match good input, but it is too
permissive; for instance, this

    namespace A { { { } }

will match, despite not having balanced braces. You want

    (?: (?&block) | [^\{] )*

to prevent that, and if you also want to forbid namespaces within blocks
you need

    (?: (?&block)
      | (?! (?&namespace_keyword) | \{ ) .
    )*
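The difference between the lazy dot and the negated class can be checked with a small side-by-side test. This is only a sketch: build_pattern is a hypothetical helper invented here, and the surrounding grammar is the whitespace-tolerant namespace rule from earlier in this reply, simplified by inlining the keyword and name.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# build_pattern is a hypothetical helper for this sketch: it plugs a given
# "contents of a block" sub-pattern into an otherwise identical grammar.
sub build_pattern {
    my ($block_contents) = @_;
    return qr{
        ( (?&namespace) )

        (?(DEFINE)
            (?<namespace>
                \b namespace \b \s+ [A-Za-z_]\w* \s*
                \{ \s* (?: (?&namespace_body) \s* )* \}
            )
            (?<namespace_body> (?&namespace) | (?&block) )
            (?<block> \{ $block_contents \} )
        )
    }xs;
}

my $lazy   = build_pattern('(?: (?&block) | . )*?');
my $strict = build_pattern('(?: (?&block) | [^\{] )*');

# Three opening braces but only two closing ones.
my $unbalanced = 'namespace A { { { } }';

my $lazy_matches   = $unbalanced =~ $lazy   ? 1 : 0;
my $strict_matches = $unbalanced =~ $strict ? 1 : 0;

print "lazy dot:      ", ($lazy_matches   ? "match" : "no match"), "\n";
print "negated class: ", ($strict_matches ? "match" : "no match"), "\n";
```

The lazy version matches because the dot is free to swallow one of the opening braces as ordinary content; the negated class removes that option, so the unbalanced input is rejected.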

Ben
