  #1 (permalink)  
Old 06-12-2012, 10:15 PM
Cal Dershowitz
Guest
 
Posts: n/a
Default an effective script for grabbing and putting images from or to a website

I'm trying to figure out what the better methods are for dealing with images
that go from a server to a user and then on to a server somewhere else. I've
kludged together this script for this purpose, and I think it could
use some beautifying:

$ perl lh2.pl
img: WWW::Mechanize::Image=HASH(0xa7bbc5c)
https://sites.google.com/site/luther...=278&width=420

ext: jpg?height=278&width=420
ext: jpg
img: WWW::Mechanize::Image=HASH(0xa7c5784)
https://sites.google.com/site/luther...=279&width=420

ext: jpg?height=279&width=420
ext: jpg
img: WWW::Mechanize::Image=HASH(0xa7c58ec)
https://sites.google.com/site/luther...=280&width=420

ext: jpg?height=280&width=420
ext: jpg
img: WWW::Mechanize::Image=HASH(0xa7c54dc)
https://sites.google.com/site/luther...=315&width=420

ext: JPG?height=315&width=420
ext: JPG
downloaded 4 images from https://sites.google.com/site/lutherhavennm/mission
to folder site_20
$ cat lh2.pl
#!/usr/bin/perl -w
use strict;
use feature ':5.10';
use WWW::Mechanize;
use LWP::Simple;
use Errno qw[ EEXIST ];

# get information about images
my $domain = 'https://sites.google.com/site/lutherhavennm/mission';
my $m = WWW::Mechanize->new();
$m->get($domain);
my @list = $m->images();

# create new folder and download images to it.
my $counter = 0;
my $dir = mk_new_dir();
for my $img (@list) {
    print "img: $img\n";
    my $url = $img->url_abs();
    print "$url \n";

    my $ext = ($url =~ m/([^.]+)$/)[0];
    print "ext: $ext\n";
    $ext =~ s/\?.+//;
    print "ext: $ext\n";
    $counter++;
    my $filename = $dir . "/image_" . $counter . '.' . $ext;
    # getstore returns the HTTP status code, which is always true, so test it with is_success
    my $status = getstore( $url, $filename );
    is_success($status) or die "Can't download '$url': status $status\n";
}

# output
print "downloaded ", $counter, " images from ", $domain, "\n";
print "to folder ", $dir, "\n";

sub mk_new_dir {
    my $counter2 = 1;
    while (1) {
        my $word = "site";
        my $name = $word . '_' . $counter2++;
        if ( mkdir $name, 0755 ) {
            return $name;    # success, return new dir name
        }
        else {
            next if $!{EEXIST};               # mkdir failed because file exists
            die sprintf "(%d) %s", $!, $!;    # other failure; bail out!
        }
    }
}
$

Is this what jpg's look like on the internet, with the question mark
after what is the traditional extension? If so, then I think I want to
make a more-sophisticated capture.

Is what I do with regex better done with a module? (which one?) a split?

Thanks for your comment,
--
Cal
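
On the module question: the CPAN module URI can split a URL into its parts (its path method returns the path without the query string), so the "?height=...&width=..." tail never reaches the extension logic. The sketch below sticks to core modules instead: it drops the query string by hand and lets File::Basename pick out the suffix. The helper name url_extension and the example.invalid URLs are made up for illustration, and lower-casing the result is a choice of this sketch, not something the original script does.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: return the lower-cased extension of the file a URL
# points at, or undef when the last path segment has no extension.
sub url_extension {
    my ($url) = @_;
    (my $path = $url) =~ s/[?#].*//s;    # drop query string and fragment
    my (undef, undef, $suffix) = fileparse($path, qr/\.[^.\/]+/);
    return length($suffix) ? lc(substr($suffix, 1)) : undef;
}

# prints "jpg"
print url_extension('https://example.invalid/pics/photo.JPG?height=315&width=420'), "\n";
```

A URL whose last path segment has no dot makes the helper return undef, so the caller can decide whether to skip the image or fall back to a default extension.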
  #2 (permalink)  
Old 06-15-2012, 09:56 PM
Martijn Lievaart
Guest
 
Posts: n/a
Default Re: an effective script for grabbing and putting images from or to a website

On Wed, 13 Jun 2012 14:45:11 +0100, Ben Morrow wrote:

> This provides the additional advantage of following the proper
> write-close-rename idiom for making a file available atomically
> (probably not important in this case, but it never does any harm).


Maybe not important in this case, but a good habit to get into.

M4
  #3 (permalink)  
Old 06-15-2012, 11:02 PM
Cal Dershowitz
Guest
 
Posts: n/a
Default Re: an effective script for grabbing and putting images from or to a website

On 06/15/2012 03:56 PM, Martijn Lievaart wrote:
> On Wed, 13 Jun 2012 14:45:11 +0100, Ben Morrow wrote:
>
>> This provides the additional advantage of following the proper
>> write-close-rename idiom for making a file available atomically
>> (probably not important in this case, but it never does any harm).

>
> Maybe not important in this case, but a good habit to get into.


There's a second part of the script that I can't quite seem to get
wrangled. I'll excerpt as opposed to making a full listing:


my @files = <$path*>;

my $big_int = 1;
for my $name (@files) {
print "name is $name\n";
$name =~ m/.*\.(w+)/;
my $ext = $1;
print "ext is $ext\n";

}

output:

name is /home/dan/Desktop/upload_luther/lh1.jpg
Use of uninitialized value $ext in concatenation (.) or string at
upload9.pl line 31.
ext is
name is /home/dan/Desktop/upload_luther/lh2.jpg
Use of uninitialized value $ext in concatenation (.) or string at
upload9.pl line 31.
ext is
name is /home/dan/Desktop/upload_luther/lh3.jpg
Use of uninitialized value $ext in concatenation (.) or string at
upload9.pl line 31.
ext is
$

Why does this regex and capture not store 'jpg' in $ext?
--
Cal
  #4 (permalink)  
Old 06-16-2012, 12:10 AM
Jim Gibson
Guest
 
Posts: n/a
Default Re: an effective script for grabbing and putting images from or to a website

In article <5c2dnd3ipqJmI0bSnZ2dnUVZ_s6dnZ2d@supernews.com> , Cal
Dershowitz <cal@example.invalid> wrote:

> On 06/15/2012 03:56 PM, Martijn Lievaart wrote:
> > On Wed, 13 Jun 2012 14:45:11 +0100, Ben Morrow wrote:
> >
> >> This provides the additional advantage of following the proper
> >> write-close-rename idiom for making a file available atomically
> >> (probably not important in this case, but it never does any harm).

> >
> > Maybe not important in this case, but a good habit to get into.

>
> There's a second part of the script that I can't quite seem to get
> wrangled. I'll excerpt as opposed to making a full listing:
>
>
> my @files = <$path*>;
>
> my $big_int = 1;
> for my $name (@files) {
> print "name is $name\n";
> $name =~ m/.*\.(w+)/;


You need \w instead of w.

The '.*' would be unnecessary if you anchored the pattern, or if only
one period ever occurred in your file names, and the pattern might be faster without it:

$name =~ m/\.(\w+)$/;

> my $ext = $1;
> print "ext is $ext\n";
>
> }
>


--
Jim Gibson
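
To see the difference between the variants side by side, here is a small sketch. The second file name, backup.tar.gz, is invented to show what happens when a name contains more than one period; the %got hash exists only to collect the results.

```perl
use strict;
use warnings;

my %got;
for my $name ('/home/dan/Desktop/upload_luther/lh1.jpg', 'backup.tar.gz') {
    my ($literal_w) = $name =~ m/.*\.(w+)/;     # "w+" means one or more letters w
    my ($greedy)    = $name =~ m/.*\.(\w+)/;    # greedy .* runs to the last dot
    my ($anchored)  = $name =~ m/\.(\w+)$/;     # must finish at the end of the string
    my ($first_dot) = $name =~ m/\.(\w+)/;      # no .* and no anchor: first dot wins
    $got{$name} = [ $literal_w, $greedy, $anchored, $first_dot ];
    printf "%-45s %s\n", $name,
        join ', ', map { defined $_ ? $_ : '(no match)' } @{ $got{$name} };
}
```

For lh1.jpg the last three patterns agree on "jpg". For backup.tar.gz the unanchored pattern without '.*' picks up "tar", which is the case the remark about only one period refers to.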
  #5 (permalink)  
Old 06-16-2012, 01:20 AM
Ben Morrow
Guest
 
Posts: n/a
Default Re: an effective script for grabbing and putting images from or to a website


Quoth Jim Gibson <jimsgibson@gmail.com>:
> In article <5c2dnd3ipqJmI0bSnZ2dnUVZ_s6dnZ2d@supernews.com> , Cal
> Dershowitz <cal@example.invalid> wrote:
>
> > my @files = <$path*>;
> >
> > my $big_int = 1;
> > for my $name (@files) {
> > print "name is $name\n";
> > $name =~ m/.*\.(w+)/;

>
> You need \w instead of w.


...and, you *also* need to check the match succeeded before you look at
$1:

$name =~ m/.*\.(\w+)/ or die "'$name' has no extension";

otherwise you run the risk of picking up $1 from some entirely other
pattern match. The $N variables are slightly strangely scoped, so this
is a little less likely than it might be, but it can and does happen and
causes *very* strange bugs when it does. Some other action may be more
appropriate than 'die'; perhaps printing a warning and using 'next' to
skip to the next file.

As a general rule it's easier to avoid the $N variables where possible,
and instead use the return value of the match:

my ($ext) = $name =~ m/.*\.(\w+)/;

That way even if the match fails $ext will be undef rather than some
random other value. The brackets around $ext are important: they force
the assignment to occur in list context, so that the match returns a
list of captures rather than a boolean for success or failure.

Ben
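
A minimal sketch of the failure mode described above, with made-up file names: a successful match sets $1, a later match in the same block fails, and $1 still holds the old capture. The list-context assignment yields undef instead.

```perl
use strict;
use warnings;

my $name = 'README';                 # hypothetical file name without an extension

'lh1.jpg' =~ m/\.(\w+)$/;            # an earlier, successful match sets $1 to "jpg"
$name     =~ m/\.(\w+)$/;            # this match fails and leaves $1 alone
my $from_dollar_one = $1;            # still "jpg", which is wrong for README

my ($from_list) = $name =~ m/\.(\w+)$/;    # list context: a failed match gives undef

print "via \$1:   $from_dollar_one\n";
print "via list: ", ( defined($from_list) ? $from_list : 'undef' ), "\n";
```

Without the brackets around $from_list the assignment would be in scalar context and store the success flag of the match rather than the capture.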

  #6 (permalink)  
Old 06-16-2012, 01:36 AM
Cal Dershowitz
Guest
 
Posts: n/a
Default Re: an effective script for grabbing and putting images from or to a website

On 06/15/2012 05:02 PM, Cal Dershowitz wrote:
> On 06/15/2012 03:56 PM, Martijn Lievaart wrote:


> There's a second part of the script that I can't quite seem to get
> wrangled. I'll excerpt as opposed to making a full listing:
>
>
> my @files = <$path*>;
>
> my $big_int = 1;
> for my $name (@files) {
> print "name is $name\n";
> $name =~ m/.*\.(w+)/;
> my $ext = $1;
> print "ext is $ext\n";
>
> }
>
> output:
>
> name is /home/dan/Desktop/upload_luther/lh1.jpg
> Use of uninitialized value $ext in concatenation (.) or string at
> upload9.pl line 31.
> ext is
> name is /home/dan/Desktop/upload_luther/lh2.jpg
> Use of uninitialized value $ext in concatenation (.) or string at
> upload9.pl line 31.
> ext is
> name is /home/dan/Desktop/upload_luther/lh3.jpg
> Use of uninitialized value $ext in concatenation (.) or string at
> upload9.pl line 31.
> ext is
> $
>
> Why does this regex and capture not store 'jpg' in $ext?


$ perl upload9.pl
syntax error at upload9.pl line 23, near "/$ext/;"
Execution of upload9.pl aborted due to compilation errors.
$ cat upload9.pl
...

my @list = $ftp->dir();
my $big_int = 1;
for my $name (@files) {
print "name is $name\n";
my ($ext) = $name =~ /([^.]*)$/;
for my $image (@list){
if ( $image =~ m/$ext/
print "image is $image\n";
}
print "ext is $ext\n";
}




$

... or how do you match to a variable you already have?
--
Cal
  #7 (permalink)  
Old 06-16-2012, 12:49 PM
Rainer Weikusat
Guest
 
Posts: n/a
Default Re: an effective script for grabbing and putting images from or to a website

Ben Morrow <ben@morrow.me.uk> writes:
> Quoth Jim Gibson <jimsgibson@gmail.com>:
>> In article <5c2dnd3ipqJmI0bSnZ2dnUVZ_s6dnZ2d@supernews.com> , Cal
>> Dershowitz <cal@example.invalid> wrote:
>>
>> > my @files = <$path*>;
>> >
>> > my $big_int = 1;
>> > for my $name (@files) {
>> > print "name is $name\n";
>> > $name =~ m/.*\.(w+)/;

>>
>> You need \w instead of w.

>
> ...and, you *also* need to check the match succeeded before you look at
> $1:
>
> $name =~ m/.*\.(\w+)/ or die "'$name' has no extension";
>
> otherwise you run the risk of picking up $1 from some entirely other
> pattern match. The $N variables are slightly strangely scoped, so this
> is a little less likely than it might be, but it can and does happen and
> causes *very* strange bugs when it does.


[...]

> As a general rule it's easier to avoid the $N variables where possible,
> and instead use the return value of the match:


What's so difficult with providing information instead of
scaremongering?

According to perlre(1),

The numbered match variables ($1, $2, $3, etc.)

[...]

are all dynamically scoped until the end of the
enclosing block or until the next successful match,

Practically, this means the simple way to use $1 etc correctly is to
avoid using them except if the match supposed to set them was
successful, in this case (assuming that $ext is a hitherto untouched
my variable)

$name =~ m/.*\.(\w+)/ and $ext = $1;

In more complicated cases,

if (<some re match>) {
# $1 etc valid here
}

or

<some re match> && do {
# also here
};

can be used.
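
Both forms can be tried out with a short sketch. The file names and the 'none' default are made up for illustration; @report only collects the lines so they can be printed at the end.

```perl
use strict;
use warnings;

my @report;
for my $name ('lh1.jpg', 'README') {
    my $ext = 'none';                       # default survives a failed match
    $name =~ m/\.(\w+)$/ and $ext = $1;     # assign only when the match succeeded

    if ( $name =~ m/^(\w+)\.(\w+)$/ ) {
        push @report, "$name: base=$1 ext=$2";    # $1 and $2 are valid in here
    }
    else {
        push @report, "$name: ext=$ext";          # no match, default still in place
    }
}
print "$_\n" for @report;
```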


  #8 (permalink)  
Old 06-16-2012, 01:27 PM
Rainer Weikusat
Guest
 
Posts: n/a
Default Re: an effective script for grabbing and putting images from or to a website

Rainer Weikusat <rweikusat@mssgmbh.com> writes:

[...]

> According to perlre(1),
>
> The numbered match variables ($1, $2, $3, etc.)
>
> [...]
>
> are all dynamically scoped until the end of the
> enclosing block or until the next successful match,


This text could do with an additional explanation: The way 'scoped' is
used here doesn't really make sense. What this means is that $1,
... are always local (as with local) to the enclosing block and that
their values always correspond to whatever was or wasn't captured by
the last successful match inside this block. Example demonstrating
most of this:

----------------
$a = 'Hallo';

{
    $a =~ /(H)/;

    {
        $a =~ /(l)(l)/;
        print "$1, $2\n";

        $a =~ /(x)/;
        print "$1, $2\n";

        $a =~ /(a)/;
        print "$1, $2\n";
    }

    print "$1, $2\n";
}

print "$1, $2\n";
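
The post as archived stops before showing any output. The following is a trimmed variant of the same idea, not the poster's code: it uses a lexical $s instead of $a, runs under strict, and records $1 at three points so the block-local behaviour is visible.

```perl
use strict;
use warnings;

my $s = 'Hallo';
my @seen;

{
    $s =~ /(H)/;                                 # outer block: $1 is "H"
    {
        $s =~ /(l)(l)/;
        push @seen, "inner, after (l)(l): $1";   # "l"
        $s =~ /(x)/;                             # fails, $1 is unchanged
        push @seen, "inner, after failed (x): $1";
    }
    push @seen, "outer, after inner block: $1";  # back to "H"
}
print "$_\n" for @seen;
```

The failed match inside the inner block leaves the previous capture in place, and leaving the inner block restores the value that the outer block's match had set.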
  #9 (permalink)  
Old 06-16-2012, 03:52 PM
Ben Morrow
Guest
 
Posts: n/a
Default Re: an effective script for grabbing and putting images from or to a website


Quoth Rainer Weikusat <rweikusat@mssgmbh.com>:
> Ben Morrow <ben@morrow.me.uk> writes:
> >
> > ...and, you *also* need to check the match succeeded before you look at
> > $1:
> >
> > $name =~ m/.*\.(\w+)/ or die "'$name' has no extension";
> >
> > otherwise you run the risk of picking up $1 from some entirely other
> > pattern match. The $N variables are slightly strangely scoped, so this
> > is a little less likely than it might be, but it can and does happen and
> > causes *very* strange bugs when it does.

>
> > As a general rule it's easier to avoid the $N variables where possible,
> > and instead use the return value of the match:

>
> What's so difficult with providing information instead of
> scaremongering?


I did. I showed the OP how to use $N correctly, and also provided an
alternative I (at least) find easier to use. Where is the
scaremongering?

> According to perlre(1),
>
> The numbered match variables ($1, $2, $3, etc.)
> [...]
> are all dynamically scoped until the end of the
> enclosing block or until the next successful match,


I omitted that detail, since it was not relevant at that point.

> Practically, this means the simple way to use $1 etc correctly is to
> avoid using them except if the match supposed to set them was
> successful,


...as I said...

> in this case (assuming that $ext is a hitherto untouched
> my variable)
>
> $name =~ m/.*\.(\w+)/ and $ext = $1;


Ugly and crude. Assigning the return value of the match is much cleaner.

> In more complicated cases,
>
> if (<some re match>) {
> # $1 etc valid here
> }


That is a useful construction sometimes, but it's usually clearer for
exceptional flow control ('match failed') to be the branch which diverts
from the normal flow. Hence my suggestion to 'match or die', or
equivalently something like

unless (/.../) {
    warn "...";
    next;
}

That way the entire block can be ignored as 'error handling' when
skimming through the code.

I find this sub useful for that purpose:

sub wnext {
    no warnings "exiting";
    warn $_[0];
    next;
}

since it allows

/.../ or wnext "no extension on '$file'";

analogously to the standard 'or die' idiom.

> <some re match> && do {
> # also here
> };


God, that's ugly. If you mean 'if', write 'if', and if you mean 'and',
write 'and'.

Ben
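
For anyone who wants to try the wnext helper from this post, here is a small self-contained sketch. The file names and the @exts array are made up; the helper itself is as posted. The list assignment returns the number of captures when used as a condition, so a failed match (zero captures) triggers the 'or' branch.

```perl
use strict;
use warnings;

# Helper as posted: warn, then jump to the next iteration of the caller's loop.
sub wnext {
    no warnings "exiting";    # silences "Exiting subroutine via next"
    warn $_[0];
    next;
}

my @exts;
for my $file ('lh1.jpg', 'README', 'lh2.png') {    # made-up file names
    my ($ext) = $file =~ m/\.(\w+)$/
        or wnext "no extension on '$file'\n";
    push @exts, $ext;
}
print "extensions: @exts\n";
```

The warning for README goes to STDERR and the loop carries on with the next file, so only the two real extensions end up in @exts.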

  #10 (permalink)  
Old 06-16-2012, 03:54 PM
Ben Morrow
Guest
 
Posts: n/a
Default Re: an effective script for grabbing and putting images from or to a website


Quoth Cal Dershowitz <cal@example.invalid>:
>
> $ perl upload9.pl
> syntax error at upload9.pl line 23, near "/$ext/;"
> Execution of upload9.pl aborted due to compilation errors.
> $ cat upload9.pl
> ...
>
> my @list = $ftp->dir();
> my $big_int = 1;
> for my $name (@files) {
> print "name is $name\n";


You will (or, at least, we will) find your code much easier to read if
you indent it properly.

> my ($ext) = $name =~ /([^.]*)$/;
> for my $image (@list){
> if ( $image =~ m/$ext/


There are two syntax errors in that line, neither of which has anything
to do with the pattern match.

> print "image is $image\n";
> }


Ben
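
Presumably the two errors meant here are the missing closing parenthesis and the missing opening brace on the 'if' line. A sketch of the loop with both restored follows; the two arrays are hard-coded stand-ins for the glob and the FTP listing, and @hits is added only to collect the result. Interpolating $ext into the pattern works as written in the original; the \Q...\E around it and the anchoring are additions of this sketch so that regex metacharacters in the variable are taken literally and only the end of the name is compared.

```perl
use strict;
use warnings;

my @files = ('/home/dan/Desktop/upload_luther/lh1.jpg');    # stand-in for the glob
my @list  = ('lh1.jpg', 'lh2.jpg', 'index.html');           # stand-in for the FTP listing
my @hits;

for my $name (@files) {
    print "name is $name\n";
    my ($ext) = $name =~ /([^.]*)$/;
    for my $image (@list) {
        if ( $image =~ m/\.\Q$ext\E$/ ) {    # closing paren and opening brace restored
            print "image is $image\n";
            push @hits, $image;
        }
    }
    print "ext is $ext\n";
}
```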

  #11 (permalink)  
Old 06-16-2012, 05:05 PM
Rainer Weikusat
Guest
 
Posts: n/a
Default Re: an effective script for grabbing and putting images from or to a website

Ben Morrow <ben@morrow.me.uk> writes:
> Quoth Rainer Weikusat <rweikusat@mssgmbh.com>:
>> Ben Morrow <ben@morrow.me.uk> writes:
>> >
>> > ...and, you *also* need to check the match succeeded before you look at
>> > $1:
>> >
>> > $name =~ m/.*\.(\w+)/ or die "'$name' has no extension";
>> >
>> > otherwise you run the risk of picking up $1 from some entirely other
>> > pattern match. The $N variables are slightly strangely scoped, so this
>> > is a little less likely than it might be, but it can and does happen and
>> > causes *very* strange bugs when it does.

>>
>> > As a general rule it's easier to avoid the $N variables where possible,
>> > and instead use the return value of the match:

>>
>> What's so difficult with providing information instead of
>> scaremongering?

>
> I did. I showed the OP how to use $N correctly, and also provided an
> alternative I (at least) find easier to use. Where is the
> scaremongering?


You didn't write anything regarding how the $n actually behave, just
asserted they would be 'strangely scoped' and that this could cause
'very strange bugs' in rarely occurring situations. That's about as
scary as it can get and nothing except scary.

[...]

>> Practically, this means the simple way to use $1 etc correctly is to
>> avoid using them except if the match supposed to set them was
>> successful,

>
> ...as I said...


You didn't. You wrote that it would be necessary to 'check the
success of the match', suggested to use die for that, and that -
subject to the nameless but surely grave dangers - this feature
shouldn't be used at all.

>> in this case (assuming that $ext is a hitherto untouched
>> my variable)
>>
>> $name =~ m/.*\.(\w+)/ and $ext = $1;

>
> Ugly and crude. Assigning the return value of the match is much
> cleaner.


Chances are that our aesthetic preferences also differ in other
respects. In this case, however, a nice side effect of this construct
is that nothing is assigned if there wasn't anything to assign. And it
isn't necessary to hack around the fact that the match only returns
the intended value in list context. Actually, I would write this as

$name =~ /.*\.(\w+)/ and $ext = $1;

but I purposely kept the m. If someone's preferred style of writing is
different from mine, I can live with that without seeing a need to
bash him into 'my idea of it'.

>> In more complicated cases,
>>
>> if (<some re match>) {
>> # $1 etc valid here
>> }

>
> That is a useful construction sometimes, but it's usually clearer for
> exceptional flow control ('match failed') to be the branch which diverts
> from the normal flow.


A failed match may well be not 'exceptional' at all. This will usually
be the case for parsing something where multiple kinds of 'input
strings' need to be recognized and analyzed.
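
A sketch of that situation, where a failed match just means "try the next pattern": the line formats are loosely modelled on the output of the first script in this thread, and the sub name classify is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical classifier: each branch uses $1 only after its own match succeeded.
sub classify {
    my ($line) = @_;
    if    ( $line =~ /^img:\s+(\S+)/ )             { return "image object $1" }
    elsif ( $line =~ /^ext:\s+(\w+)/ )             { return "extension $1" }
    elsif ( $line =~ /^downloaded (\d+) images/ )  { return "count $1" }
    return "unrecognized";
}

print classify($_), "\n"
    for 'ext: jpg', 'downloaded 4 images from somewhere', 'hello';
```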

[...]

>> <some re match> && do {
>> # also here
>> };

>
> God, that's ugly.


Is this supposed to be some prayer for divine punishment of those
dirty heathens who don't dress like we do, whose accent differs from
our accent, whose haircut differs from ours, whose skin color is
somehow different, who come from different villages and who are
generally foreign scum, easily recognizable by their outlandish and
filthy habits?
  #12 (permalink)  
Old 06-16-2012, 10:03 PM
Ben Morrow
Guest
 
Posts: n/a
Default Re: an effective script for grabbing and putting images from or to a website


Quoth Rainer Weikusat <rweikusat@mssgmbh.com>:
> Ben Morrow <ben@morrow.me.uk> writes:
> > Quoth Rainer Weikusat <rweikusat@mssgmbh.com>:
> >> Ben Morrow <ben@morrow.me.uk> writes:
> >> >
> >> > ...and, you *also* need to check the match succeeded before you look at
> >> > $1:
> >> >
> >> > $name =~ m/.*\.(\w+)/ or die "'$name' has no extension";
> >> >
> >> > otherwise you run the risk of picking up $1 from some entirely other
> >> > pattern match. The $N variables are slightly strangely scoped, so this
> >> > is a little less likely than it might be, but it can and does happen and
> >> > causes *very* strange bugs when it does.

<snip>
>
> You didn't write anything regarding how the $n actually behave, just
> asserted they would be 'strangely scoped' and that this could cause
> 'very strange bugs' in rarely occurring situations. That's about as
> scary as it can get and nothing except scary.


You appear to be extremely easily scared. BOO!

I suggest you reread the paragraph quoted above, carefully. The strange
scoping of $N is a mitigating condition, not an aggravating one (that
is, it makes it less likely, rather than more likely, that a programmer
will fall into a bug as a result of failing to check if a match
succeeded or failed).

> >> Practically, this means the simple way to use $1 etc correctly is to
> >> avoid using them except if the match supposed to set them was
> >> successful,

> >
> > ...as I said...

>
> You didn't. You wrote that it would be necessary to 'check the
> success of the match', suggested to use die for that,


No, I suggested to use 'or' for that. The 'die' was the means of
diverting control away from the use of $1 if the match failed; I was
careful to mention that in this situation it may not be the most
appropriate way of doing so.

> and that -
> subject to the nameless but surely grave dangers - this feature
> shouldn't be used at all.


The dangers were named ('otherwise you run the risk of picking up $1
from some entirely other pattern match'), and I did not say the feature
should not be used but that there exists another feature which is
usually more convenient.

> >> in this case (assuming that $ext is a hitherto untouched
> >> my variable)
> >>
> >> $name =~ m/.*\.(\w+)/ and $ext = $1;

> >
> > Ugly and crude. Assigning the return value of the match is much
> > cleaner.

>
> Chances are that our aesthetic preferences also differ in other
> respects. In this case, however, a nice side effect of this construct
> is that nothing is assigned if there wasn't anything to assign.


How is that useful, except under rather rare circumstances?

> And it
> isn't necessary to hack around the fact that the match only returns
> the intended value in list context.


Context is an integral feature of Perl. If you don't like it, don't use
Perl.

> Actually, I would write this as
>
> $name =~ /.*\.(\w+)/ and $ext = $1;
>
> but I purposely kept the m.


You will notice that I did the same.

> >> In more complicated cases,
> >>
> >> if (<some re match>) {
> >> # $1 etc valid here
> >> }

> >
> > That is a useful construction sometimes, but it's usually clearer for
> > exceptional flow control ('match failed') to be the branch which diverts
> > from the normal flow.

>
> A failed match may well be not 'exceptional' at all. This will usually
> be the case for parsing something where multiple kinds of 'input
> strings' need to be recognized and analyzed.


That was not the case in the piece of code being discussed. If it had
been, a different choice of construction would have been appropriate.

<snip vaguely offensive nonsense>

Ben

  #13 (permalink)  
Old 06-16-2012, 11:02 PM
Cal Dershowitz
Guest
 
Posts: n/a
Default Re: an effective script for grabbing and putting images from or to a website

On 06/16/2012 09:54 AM, Ben Morrow wrote:
>
> Quoth Cal Dershowitz<cal@example.invalid>:
>>
>> $ perl upload9.pl
>> syntax error at upload9.pl line 23, near "/$ext/;"
>> Execution of upload9.pl aborted due to compilation errors.
>> $ cat upload9.pl
>> ...
>>
>> my @list = $ftp->dir();
>> my $big_int = 1;
>> for my $name (@files) {
>> print "name is $name\n";

>
> You will (or, at least, we will) find your code much easier to read if
> you indent it properly.


I knocked the rust off of perltidy.
>
>> my ($ext) = $name =~ /([^.]*)$/;
>> for my $image (@list){
>> if ( $image =~ m/$ext/

>
> There are two syntax errors in that line, neither of which has anything
> to do with the pattern match.
>
>> print "image is $image\n";
>> }


This process is laden with the types of errors that beset the
less-experienced, and sometimes not all related to what I'm struggling
with centrally, which, I think, is best addressed by using grep. But
all of a sudden, I can't use syntax for printing an array that I've used
a hundred times before.

So, I'm stuck, because I can't see whether I'm actually grepping
something or not. While it is true that I'm posting a script with a
syntax error, it's far and away not the first one I hit today, and I've
tried to work through most of them by googling "perl ...."

Anyways:
$ perltidy -b upload10.pl
$ perl upload10.pl
Possible unintended interpolation of @array in string at upload10.pl
line 26.
Global symbol "@array" requires explicit package name at upload10.pl
line 26.
Execution of upload10.pl aborted due to compilation errors.
$ cat upload10.pl
#!/usr/bin/perl -w
use strict;
use feature ':5.10';
use Net::FTP;
my $domain = '';
my $username = '';
my $password = '';
my $ftp = Net::FTP->new( $domain, Debug => 1, Passive => 1 )
or die "Can't connect: $@\n";
$ftp->login( $username, $password ) or die "Couldn't login\n";
$ftp->binary();
$ftp->cwd('/images/') or die "cwd failed $@\n";
my $path = '/home/dan/Desktop/upload_luther/';
my @files = <$path*>;

my @list = $ftp->dir();
for my $name (@files) {
print "name is $name\n";
my ($ext) = $name =~ /([^.]*)$/;
for my $image (@list) {
print "image is $image\n";
my ($ext2) = $image =~ /([^.]*)$/;
my @array = grep ( /$ext2/, @list );
}

print "sub_list is @array\n";
}

$

It's a good time for me to take out the router and complete a task that
I won't fail at. Wood is much-more forgiving.
--
Cal
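
The two messages about @array in the script above come from scope: 'my @array' is declared inside the inner 'for' block, so the name no longer exists when the print outside that block is compiled. A minimal sketch of one way to restructure it, assuming the aim is to list the remote files that share the local file's extension. The hard-coded arrays stand in for the glob and the FTP listing, and %sub_list_for is added only to keep the results around.

```perl
use strict;
use warnings;

my @files = ('/home/dan/Desktop/upload_luther/lh1.jpg',
             '/home/dan/Desktop/upload_luther/notes.txt');    # stand-in for <$path*>
my @list  = ('lh1.jpg', 'lh2.jpg', 'index.html');             # stand-in for the FTP listing

my %sub_list_for;
for my $name (@files) {
    print "name is $name\n";
    my ($ext) = $name =~ /\.([^.\/]+)$/ or next;     # skip names without an extension
    my @array = grep { /\.\Q$ext\E$/ } @list;        # declared in the block that prints it
    print "sub_list is @array\n";
    $sub_list_for{$ext} = [@array];
}
```

grep already walks the whole listing, so the inner loop over @list is not needed for this.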

  #14 (permalink)  
Old 06-17-2012, 04:26 AM
Jürgen Exner
Guest
 
Posts: n/a
Default Regular Expression (WAS: an effective script for grabbing and putting images from or to a website)

Cal Dershowitz <cal@example.invalid> wrote:
> my ($ext) = $name =~ /([^.]*)$/;
>
>Can (anyone) talk me through why this captures an extension? The caret
>anchors the regex at the beginning.


No, it doesn't here; see below.

> $ at the end. parens return the
>match. The asterisk is to quantify what's in brackets, but what's going
>on with the brackets?


The square brackets define a character class, and the leading caret
negates this class. In other words this class matches any character that
is not a literal dot. Together with the asterisk and the dollar anchor this
becomes: as many characters as possible, counting back from the end of the
string, until the first dot appears. Which pretty much describes what some
people call a file name extension.

jue
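
A quick way to check this reading is to run the pattern over a few names. The names below are made up; %captured only stores the results.

```perl
use strict;
use warnings;

my %captured;
for my $name ('lh1.jpg', 'backup.tar.gz', 'README', 'notes.') {
    my ($ext) = $name =~ /([^.]*)$/;
    $captured{$name} = $ext;
    print "'$name' -> '$ext'\n";
}
```

Because the asterisk also allows zero characters, the pattern always matches: a name without any dot is captured whole, and a name ending in a dot gives an empty string. A script that needs to tell "no extension" apart has to test for the dot explicitly.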
  #15 (permalink)  
Old 06-17-2012, 03:11 PM
Rainer Weikusat
Guest
 
Posts: n/a
Default Re: an effective script for grabbing and putting images from or to a website

Ben Morrow <ben@morrow.me.uk> writes:
> Quoth Rainer Weikusat <rweikusat@mssgmbh.com>:
>> Ben Morrow <ben@morrow.me.uk> writes:
>> > Quoth Rainer Weikusat <rweikusat@mssgmbh.com>:
>> >> Ben Morrow <ben@morrow.me.uk> writes:
>> >> >
>> >> > ...and, you *also* need to check the match succeeded before you look at
>> >> > $1:
>> >> >
>> >> > $name =~ m/.*\.(\w+)/ or die "'$name' has no extension";
>> >> >
>> >> > otherwise you run the risk of picking up $1 from some entirely other
>> >> > pattern match. The $N variables are slightly strangely scoped, so this
>> >> > is a little less likely than it might be, but it can and does happen and
>> >> > causes *very* strange bugs when it does.

> <snip>
>>
>> You didn't write anything regarding how the $n actually behave, just
>> asserted they would be 'strangely scoped' and that this could cause
>> 'very strange bugs' in rarely occurring situations. That's about as
>> scary as it can get and nothing except scary.

>
> You appear to be extremely easily scared. BOO!


Writing about 'strange scoping' which may cause or prevent 'very
strange bugs' makes the matter appear by far more serious and arcane
than it actually is. That's why I referred to it as
'scaremongering'. I didn't write that my assessment of the situation
would be similar to your assessment and especially not that it was
based on your text.

[...]

>> >> Practically, this means the simple way to use $1 etc correctly is to
>> >> avoid using them except if the match supposed to set them was
>> >> successful,
>> >
>> > ...as I said...

>>
>> You didn't. You wrote that it would be necessary to 'check the
>> success of the match', suggested to use die for that,

>
> No, I suggested to use 'or' for that. The 'die' was the means of
> diverting control away from the use of $1 if the match failed; I was
> careful to mention that in this situation it may not be the most
> appropriate way of doing so.


This is a completely pointless attempt at confusing the issue by
playing with words and abusing semantic ambiguities inherent in the
way humans use language. It, however, enables me to ask a rhetorical
question:

Assuming that

$name =~ /deepfried (whole) elephant roll/ or die('Salatschrecke!')

is 'sensible language use', according to your opinion, how come that
the almost identical

$name =~ /deepfried (whole) elephant roll/ and $quantity = $1;

is 'crude and ugly'?

>> and that - subject to the nameless but surely grave dangers - this
>> feature shouldn't be used at all.

>
> The dangers were named


Indeed. Their names were 'strange scopes' and 'very strange bugs'.
But I'm tired of this weasel-wording exercise.

[...]

> <snip vaguely offensive nonsense>


You could have snipped the 'vaguely offensive nonsense' from your
original text and in this case, you wouldn't have provoked a reply which
pointed out that you're condemning something without any reasons given
because it was different from what you're accustomed to.

In this case, I was actually thinking of something like

test -n "$parameter" && {
# do something with it