Hi,
I'm trying to extract the nested namespace in the following code. But the code can only extract the inner namespace. It is very hard for me to see what is wrong. Does anybody know some tricks how to debug a regex like the following? Thanks!

~/linux/test/perl/man/perlre/(?/(/DEFINE$ cat main_namespace_multiple.pl
#!/usr/bin/env perl

use strict;
use warnings;

my $text=<<'EOF';
namespace A {
  namespace B {
  }
}
EOF

# Build pattern that matches only namespaces...
my $namespace_pattern = qr{
  ((?&namespace)) # Match and capture (possibly nested) namespace

  # Define each component...
  (?(DEFINE)
    (?<namespace_token>
      \b [A-Za-z_]\w* \b
    )
    (?<namespace_keyword>
      \b namespace \b
    )
    # Namespace is keyword + name + block...
    (?<namespace>
      (?&namespace_keyword)
      \s+
      (?&namespace_token)
      \s*
      \{
        (?&namespace_body)
      \}
    )
    (?<namespace_body>
      (?:
        \s*
        (?&namespace)
        \s*
      )
      |
      (?&block)
    )
    (?<block>
      \{
        (?: (?&block) | . )*?
      \}
    )
  )
}xs;

my ($extracted) = $text =~ $namespace_pattern;

print "text = $text\n";
print "extracted = $extracted\n";

Regards,
Peng
Quoth Peng Yu <pengyu.ut@gmail.com>:
>
> I'm trying to extract the nested namespace in the following code. But
> the code can only extract the inner namespace. It is very hard for me
> to see what is wrong. Does anybody know some tricks how to debug a
> regex like the following? Thanks!

The best way to debug something like this is to run it under 'use re "debug"'. The output is rather verbose and a little arcane, so it takes a bit of patience to pick through it, but it tells you *exactly* what the regex engine is doing and where it fails. In this case the important bit (reformatted slightly) is

    29 <e B {> <%n }%n}%n> | 62: OPEN5 'namespace_body'(64)
    29 <e B {> <%n }%n}%n> | 64: BRANCH(72)
    29 <e B {> <%n }%n}%n> | 65: STAR(67)
                               SPACE can match 3 times out of 2147483647...
    32 <e B {%n > <}%n}%n> | 67: GOSUB4[-26](70)
    32 <e B {%n > <}%n}%n> | 41: OPEN4 'namespace'(43)
    32 <e B {%n > <}%n}%n> | 43: GOSUB3[-12](46)
    32 <e B {%n > <}%n}%n> | 31: OPEN3 'namespace_keyword'(33)
    32 <e B {%n > <}%n}%n> | 33: BOUND(34)
                               failed...
    [snip some attempts to claw back spaces from the \s*; using (?>) or
     \s*+ might be a good idea...]
                               failed...
    29 <e B {> <%n }%n}%n> | 72: BRANCH(76)
    29 <e B {> <%n }%n}%n> | 73: GOSUB6[+5](76)
    29 <e B {> <%n }%n}%n> | 78: OPEN6 'block'(80)
    29 <e B {> <%n }%n}%n> | 80: EXACT <{>(82)
                               failed...
                             BRANCH failed...
                           failed...

Here it's matched as far as 'namespace B {' and it's trying to match (?&namespace_body), but that requires either a sub-namespace or a (?<block>) (with explicit braces) so it doesn't match.
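For anyone who has not used the pragma before, here is a minimal sketch of how the tracing can be confined to one pattern. It relies on 'use re "debug"' being lexically scoped, as the re documentation describes; the small keyword pattern and the variable names are made up for illustration and are not from the original script.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $text = "namespace A {\n  namespace B {\n  }\n}\n";

my $matched;
{
    # The pragma is lexically scoped, so (per the re docs) only regexes
    # compiled inside this block should be traced. The trace is written
    # to STDERR, not STDOUT.
    use re 'debug';

    # Hypothetical cut-down pattern: the keyword followed by a name.
    my $keyword_and_name = qr{ \b namespace \b \s+ ([A-Za-z_]\w*) }x;
    ($matched) = $text =~ $keyword_and_name;
}

# Outside the block, regexes run without tracing.
print "first namespace name = $matched\n";
```

Since the trace goes to STDERR, something like 'perl script.pl 2> trace.log' keeps it out of the way of the normal output. The re documentation also describes a 'Debug' form (for example 'use re qw(Debug EXECUTE)') for narrowing the output; check 'perldoc re' for what your Perl version supports.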
You need to add an empty case here:

>     (?<namespace_body>
>       (?:
>         \s*
>         (?&namespace)
>         \s*
>       )
>       |
>       (?&block)
        | \s*
>     )

though I suspect that you actually want to allow whitespace around blocks, so what you want is

    (?<namespace_body>
      \s*
      (?:
          (?&namespace)
        | (?&block)
        | # empty
      )
      \s*
    )

or perhaps

    (?<namespace>
      (?&namespace_keyword)
      \s+
      (?&namespace_token)
      \s*
      \{
        \s*
        (?: (?&namespace_body) \s* )*
      \}
    )
    (?<namespace_body>
      (?&namespace) | (?&block)
    )

to allow multiple blocks-or-namespaces within one namespace.

>     (?<block>
>       \{
>         (?: (?&block) | . )*?
>       \}

You want to be careful about using .*? to mean 'match anything until...'. This will correctly match good input, but it is too permissive; for instance, this

    namespace A { { { } }

will match, despite not having balanced braces. You want

    (?: (?&block) | [^\{] )*

to prevent that, and if you also want to forbid namespaces within blocks you need

    (?: (?&block) | (?! (?&namespace_keyword) | \{ ) . )*

Ben
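Putting those pieces together, the following is a sketch of the original script with the 'multiple blocks-or-namespaces' version of (?<namespace>) and the [^\{] version of (?<block>) folded in. The sub name extract_namespace and the $unbalanced test string are only there for illustration; treat this as one possible assembly of the suggestions above rather than the definitive fix.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Pattern assembled from the suggestions in this thread.
my $namespace_pattern = qr{
  ( (?&namespace) )   # Match and capture (possibly nested) namespace

  (?(DEFINE)
    (?<namespace_token>   \b [A-Za-z_]\w* \b )
    (?<namespace_keyword> \b namespace \b )

    # Namespace is keyword + name + braces around zero or more items,
    # with optional whitespace between the items.
    (?<namespace>
      (?&namespace_keyword) \s+ (?&namespace_token) \s*
      \{
        \s*
        (?: (?&namespace_body) \s* )*
      \}
    )
    (?<namespace_body> (?&namespace) | (?&block) )

    # A block may not contain a bare opening brace, so every '{'
    # has to be consumed by a nested block.
    (?<block> \{ (?: (?&block) | [^\{] )* \} )
  )
}xs;

# Hypothetical helper: returns the first namespace found, or undef.
sub extract_namespace {
    my ($text) = @_;
    my ($extracted) = $text =~ $namespace_pattern;
    return $extracted;
}

my $nested = <<'EOF';
namespace A {
  namespace B {
  }
}
EOF

my $unbalanced = "namespace A { { { } }\n";

my $from_nested     = extract_namespace($nested);
my $from_unbalanced = extract_namespace($unbalanced);

print "extracted = $from_nested\n";
print "unbalanced input: ",
      (defined $from_unbalanced ? "matched" : "no match"), "\n";
```

With the nested input this should capture the outer namespace A, including the inner namespace B, and the unbalanced input should fail to match because the second '{' can never be closed.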