#1
07-18-2012, 04:01 AM
Jason C
Stupid regex problem, s/// catching extra letter

I know better than to work late at night, but sometimes it just can't be helped :-)

I'm doing a simple s///, converting "www." to "http://www." when "www." occurs without a preceding "http://". Here's what I'm doing:

$text = "www.example.com";
$text =~ s#[^(http://)]www\.#http://www\.#gi;
print $text;

If $text is this, though:

$text = "<div>www.example.com</div>";

the regex is catching the > in <div>, printing:

<divhttp://www.example.com</div>

Where am I screwing up?

#2
07-18-2012, 04:57 AM
Christian Winter
Re: Stupid regex problem, s/// catching extra letter

On 18.07.2012 06:01, Jason C wrote:
> I know better than to work late at night, but sometimes it just can't be helped :-)
>
> I'm doing a simple s///, converting "www." to "http://www."
> when "www." occurs without a preceding "http://". Here's what I'm doing:
>
> $text = "www.example.com";
> $text =~ s#[^(http://)]www\.#http://www\.#gi;
> print $text;
>
> If $text is this, though:
>
> $text = "<div>www.example.com</div>";
>
> the regex is catching the > in <div>, printing:
>
> <divhttp://www.example.com</div>
>
> Where am I screwing up?


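You don't want a character class (square brackets) here.
[^(http://)] tells perl to match any single character that is not
listed inside the square brackets after the negation (^), so it
might as well read [^)(/:hpt]. That one character is part of the
match and gets replaced, which is why the > of <div> disappears.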
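
To make that concrete, here is a small sketch that runs the substitution from the original post over three inputs. The helper name broken_fix and the third input are made up for illustration; only the s### line comes from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the substitution from the original post.
sub broken_fix {
    my ($text) = @_;
    # [^(http://)] matches ONE character that is none of ( ) h t p : /
    # (and, because of /i, none of H T P either).
    $text =~ s#[^(http://)]www\.#http://www.#gi;
    return $text;
}

for my $input ('www.example.com',
               '<div>www.example.com</div>',
               'http://www.example.com') {
    printf "%-28s => %s\n", $input, broken_fix($input);
}
```

The first input comes back unchanged because the class needs some character in front of "www." and there is none. The second loses its ">" because that character is matched and replaced along with "www.". The third is left alone only because "/" happens to be in the class.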

What you're looking for is a zero-width negative look-behind
assertion.
s#(?<!http://)www\.#http://www.#gi should do the trick.
The "(?<!...)" tells the regex engine to match the following
pattern only if it is not preceded by the pattern in the look-behind,
without consuming any of the preceding text.
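
A quick way to check this is to wrap the substitution in a sub and feed it a few strings. The sub names add_scheme and add_scheme_any are invented for this sketch, and the second sub is an assumed variant (not something suggested in the thread) that also leaves "https://" links alone by stacking a second look-behind.

```perl
use strict;
use warnings;

# The substitution suggested above, wrapped in a hypothetical helper.
sub add_scheme {
    my ($text) = @_;
    # (?<!http://) is zero-width: it inspects the text before "www."
    # but does not become part of the match.
    $text =~ s#(?<!http://)www\.#http://www.#gi;
    return $text;
}

# Assumed variant: two fixed-length look-behinds, one per scheme.
sub add_scheme_any {
    my ($text) = @_;
    $text =~ s#(?<!http://)(?<!https://)www\.#http://www.#gi;
    return $text;
}

for my $input ('www.example.com',
               '<div>www.example.com</div>',
               'http://www.example.com') {
    printf "%-28s => %s\n", $input, add_scheme($input);
}
```

With the single look-behind, a string such as "https://www.example.com" would still be rewritten, since the seven characters before "www." are "ttps://" rather than "http://". If such input can occur, something along the lines of add_scheme_any may be needed.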

"perldoc perlre" has good explanations for character classes
and look-around assertions.

-Chris
#3
07-18-2012, 05:05 AM
Jason C
Re: Stupid regex problem, s/// catching extra letter

On Wednesday, July 18, 2012 12:57:00 AM UTC-4, thepoet wrote:
> What you're trying to do is a zero width negative look-behind
> assertion.
> s#(?<!http://)www\.#http://www.#gi should do the trick.
> The "(?<!...)" tells the regex engine to only match the following
> pattern if it is not preceded by the pattern in the look-behind,
> without capturing anything.
>
> "perldoc perlre" has good explanations for character classes
> and look-around assertions.
>
> -Chris


Thanks for the help, Chris. Character classes aren't exactly intuitive when a symbol changes definition completely based on context, so I'm still struggling with that a little.

The modification you suggested was perfect, though! Thanks again :-)
#4
07-18-2012, 12:30 PM
Rainer Weikusat
Re: Stupid regex problem, s/// catching extra letter

Jason C <jwcarlton@gmail.com> writes:
> On Wednesday, July 18, 2012 12:57:00 AM UTC-4, thepoet wrote:
>> What you're trying to do is a zero width negative look-behind
>> assertion.
>> s#(?<!http://)www\.#http://www.#gi should do the trick.
>> The "(?<!...)" tells the regex engine to only match the following
>> pattern if it is not preceded by the pattern in the look-behind,
>> without capturing anything.
>>
>> "perldoc perlre" has good explanations for character classes
>> and look-around assertions.
>>
>> -Chris

>
> Thanks for the help, Chris. Character classes aren't exactly
> intuitive when a symbol changes definition completely based on
> context, so I'm still struggling with that a little.


A character class denotes an unordered set of characters, meaning

[^http://]
[^htp:/]
[^ppppth:/]
[^:/hpt]
[^h:t/p]

all represent identical sets, and each matches a single character
that is not in the set. But you wanted to match the string http://,
and a regex matching a string is just the string itself, in other
words, THIS sequence of characters in this order.
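
One way to convince yourself of this is to test every printable ASCII character against several spellings of the class and compare the results. The sketch below does that; the helper name matched_chars is made up, and the spellings are chosen so that each contains exactly the characters h, t, p, colon and slash.

```perl
use strict;
use warnings;

# Several spellings of the same negated class; order and repeats don't matter.
my @classes = (
    qr{[^http://]},
    qr{[^htp:/]},
    qr{[^:/hpt]},
    qr{[^h:t/p]},
);

# Hypothetical helper: all printable ASCII characters a pattern matches.
sub matched_chars {
    my ($re) = @_;
    return join '', grep { $_ =~ $re } map { chr($_) } 32 .. 126;
}

# Collect the distinct results; identical sets collapse into one key.
my %distinct;
$distinct{ matched_chars($_) } = 1 for @classes;
print "distinct sets: ", scalar(keys %distinct), "\n";

# Matching the string itself needs no brackets at all.
print "literal match\n" if 'see http://www.example.com' =~ m{http://};
```

If the classes really are the same set, the first line of output reports a single distinct set.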