I'm trying to figure out the better methods for dealing with images that go from a server to the user and then on to a server somewhere else again. I've kludged together this script for this purpose, and I think it could use some beautifying:

    $ perl lh2.pl
    img: WWW::Mechanize::Image=HASH(0xa7bbc5c)
    https://sites.google.com/site/luther...=278&width=420
    ext: jpg?height=278&width=420
    ext: jpg
    img: WWW::Mechanize::Image=HASH(0xa7c5784)
    https://sites.google.com/site/luther...=279&width=420
    ext: jpg?height=279&width=420
    ext: jpg
    img: WWW::Mechanize::Image=HASH(0xa7c58ec)
    https://sites.google.com/site/luther...=280&width=420
    ext: jpg?height=280&width=420
    ext: jpg
    img: WWW::Mechanize::Image=HASH(0xa7c54dc)
    https://sites.google.com/site/luther...=315&width=420
    ext: JPG?height=315&width=420
    ext: JPG
    downloaded 4 images from https://sites.google.com/site/lutherhavennm/mission
    to folder site_20
    $ cat lh2.pl
    #!/usr/bin/perl -w
    use strict;
    use feature ':5.10';
    use WWW::Mechanize;
    use LWP::Simple;
    use Errno qw[ EEXIST ];

    # get information about images
    my $domain = 'https://sites.google.com/site/lutherhavennm/mission';
    my $m      = WWW::Mechanize->new();
    $m->get($domain);
    my @list = $m->images();

    # create new folder and download images to it.
    my $counter = 0;
    my $dir     = &mk_new_dir;
    for my $img (@list) {
        print "img: $img\n";
        my $url = $img->url_abs();
        print "$url \n";
        my $ext = ( $url =~ m/([^.]+)$/ )[0];
        print "ext: $ext\n";
        $ext =~ s/\?.+//;
        print "ext: $ext\n";
        $counter++;
        my $filename = $dir . "/image_" . $counter . '.' . $ext;

        # getstore() returns the HTTP status code (e.g. 404), which is always
        # true, so check it with is_success() instead of a bare 'or die'.
        is_success( getstore( $url, $filename ) )
          or die "Can't download '$url'\n";
    }

    # output
    print "downloaded ", $counter, " images from ", $domain, "\n";
    print "to folder ", $dir, "\n";

    sub mk_new_dir {
        my $counter2 = 1;
        while (1) {
            my $word = "site";
            my $name = $word . '_' . $counter2++;
            if ( mkdir $name, 0755 ) {
                return $name;    # success, return new dir name
            }
            else {
                next if $!{EEXIST};    # mkdir failed because file exists
                die sprintf "(%d) %s", $!, $!;    # other failure; bail out!
            }
        }
    }
    $

Is this what JPGs look like on the internet, with the question mark after what is the traditional extension? If so, then I think I want a more sophisticated capture. Is what I do with the regex better done with a module? (Which one?) A split?

Thanks for your comment,
--
Cal
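In URLs like these, the `?height=...&width=...` part is the query string rather than part of the file name, so one approach is to drop it before looking for an extension. A minimal sketch using only core Perl (File::Basename), assuming plain `scheme://host/path?query` URLs; `url_extension` and the example URLs are hypothetical. The CPAN URI module, whose `path` method returns the path without the query, is a commonly used alternative to the `split`.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: return the extension of the file named by a URL,
# ignoring any query string (?...) or fragment (#...).
# Returns '' when the last path segment has no extension.
sub url_extension {
    my ($url) = @_;
    my ($path) = split /[?#]/, $url, 2;    # keep only what precedes ? or #
    my ( undef, undef, $suffix ) = fileparse( $path, qr/\.[^.]*/ );
    $suffix =~ s/^\.//;                    # '.jpg' -> 'jpg'
    return $suffix;
}

# A made-up URL shaped like the ones in the output above
print url_extension('https://example.com/site/image.JPG?height=315&width=420'), "\n";
```

The case of the extension is kept as found; wrapping the result in `lc` would make `JPG` and `jpg` file names uniform.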
On Wed, 13 Jun 2012 14:45:11 +0100, Ben Morrow wrote:
> This provides the additional advantage of following the proper
> write-close-rename idiom for making a file available atomically
> (probably not important in this case, but it never does any harm).

Maybe not important in this case, but a good habit to get into.

M4
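The write-close-rename idiom mentioned above can be sketched as follows, assuming the temporary file is created in the same directory (and therefore on the same filesystem) as the final name, which is what lets `rename` replace the target in a single step on POSIX systems. `write_atomically` and the `.tmp.$$` naming scheme are hypothetical; the temporary directory is only there to make the example self-contained.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: write $data to a temporary file next to $final,
# close it, then rename it into place. Readers see either no file (or the
# old one) or the complete new file, never a partially written one.
sub write_atomically {
    my ( $final, $data ) = @_;
    my $tmp = "$final.tmp.$$";    # same directory => same filesystem
    open my $fh, '>:raw', $tmp or die "Can't open '$tmp': $!";
    print {$fh} $data or die "Can't write '$tmp': $!";
    close $fh or die "Can't close '$tmp': $!";    # flush before renaming
    rename $tmp, $final or die "Can't rename '$tmp' to '$final': $!";
    return $final;
}

my $dir  = tempdir( CLEANUP => 1 );
my $file = File::Spec->catfile( $dir, 'image_1.jpg' );
write_atomically( $file, "not really a JPEG\n" );
```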
On 06/15/2012 03:56 PM, Martijn Lievaart wrote:
> On Wed, 13 Jun 2012 14:45:11 +0100, Ben Morrow wrote:
>
>> This provides the additional advantage of following the proper
>> write-close-rename idiom for making a file available atomically
>> (probably not important in this case, but it never does any harm).
>
> Maybe not important in this case, but a good habit to get into.

There's a second part of the script that I can't quite seem to get wrangled. I'll excerpt as opposed to making a full listing:

    my @files = <$path*>;

    my $big_int = 1;
    for my $name (@files) {
    print "name is $name\n";
    $name =~ m/.*\.(w+)/;
    my $ext = $1;
    print "ext is $ext\n";

    }

output:

    name is /home/dan/Desktop/upload_luther/lh1.jpg
    Use of uninitialized value $ext in concatenation (.) or string at upload9.pl line 31.
    ext is
    name is /home/dan/Desktop/upload_luther/lh2.jpg
    Use of uninitialized value $ext in concatenation (.) or string at upload9.pl line 31.
    ext is
    name is /home/dan/Desktop/upload_luther/lh3.jpg
    Use of uninitialized value $ext in concatenation (.) or string at upload9.pl line 31.
    ext is
    $

Why does this regex and capture not store 'jpg' in $ext?

--
Cal
In article <5c2dnd3ipqJmI0bSnZ2dnUVZ_s6dnZ2d@supernews.com>, Cal
Dershowitz <cal@example.invalid> wrote:

> On 06/15/2012 03:56 PM, Martijn Lievaart wrote:
> > On Wed, 13 Jun 2012 14:45:11 +0100, Ben Morrow wrote:
> >
> >> This provides the additional advantage of following the proper
> >> write-close-rename idiom for making a file available atomically
> >> (probably not important in this case, but it never does any harm).
> >
> > Maybe not important in this case, but a good habit to get into.
>
> There's a second part of the script that I can't quite seem to get
> wrangled. I'll excerpt as opposed to making a full listing:
>
>
> my @files = <$path*>;
>
> my $big_int = 1;
> for my $name (@files) {
> print "name is $name\n";
> $name =~ m/.*\.(w+)/;

You need \w instead of w.

The '.*' would be unnecessary if you anchored the pattern, or if only one period ever occurred in your file names, and the anchored version might be faster:

    $name =~ m/\.(\w+)$/;

> my $ext = $1;
> print "ext is $ext\n";
>
> }
>

--
Jim Gibson
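To see why the original pattern captured nothing: without the backslash, `w+` matches one or more literal letter w's, so `m/.*\.(w+)/` only matches names whose extension starts with "w". A small side-by-side comparison on two made-up paths:

```perl
use strict;
use warnings;

# Hypothetical paths: one ordinary extension, one that happens to start with "w"
my @names = ( '/home/user/upload/lh1.jpg', '/home/user/upload/clip.wav' );
my %result;

for my $name (@names) {
    my ($literal_w) = $name =~ m/.*\.(w+)/;    # "w+"  = one or more literal w's
    my ($word)      = $name =~ m/\.(\w+)$/;    # "\w+" = word characters, anchored at the end
    $result{$name} = [ $literal_w, $word ];
    print "$name (w+) => ", $literal_w // 'no match',
      ', (\w+) => ', $word // 'no match', "\n";
}
```

With `w+` the `.jpg` name does not match at all (hence the undefined `$ext` in the output above), and the `.wav` name yields just `w`.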
Quoth Jim Gibson <jimsgibson@gmail.com>:
> In article <5c2dnd3ipqJmI0bSnZ2dnUVZ_s6dnZ2d@supernews.com>, Cal
> Dershowitz <cal@example.invalid> wrote:
> >
> > my @files = <$path*>;
> >
> > my $big_int = 1;
> > for my $name (@files) {
> > print "name is $name\n";
> > $name =~ m/.*\.(w+)/;
>
> You need \w instead of w.

...and, you *also* need to check the match succeeded before you look at $1:

    $name =~ m/.*\.(\w+)/ or die "'$name' has no extension";

otherwise you run the risk of picking up $1 from some entirely other pattern match. The $N variables are slightly strangely scoped, so this is a little less likely than it might be, but it can and does happen and causes *very* strange bugs when it does. Some other action may be more appropriate than 'die'; perhaps printing a warning and using 'next' to skip to the next file.

As a general rule it's easier to avoid the $N variables where possible, and instead use the return value of the match:

    my ($ext) = $name =~ m/.*\.(\w+)/;

That way even if the match fails $ext will be undef rather than some random other value. The brackets around $ext are important: they force the assignment to occur in list context, so that the match returns a list of captures rather than a boolean for success or failure.

Ben
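Both points above, the stale $1 after a failed match and the list-context assignment, can be seen in a short sketch. The file names are made up, and the first match only stands in for "some entirely other pattern match" earlier in the same block:

```perl
use strict;
use warnings;

'banner.png' =~ /\.(\w+)$/;    # an unrelated, successful match: $1 is now 'png'

my $name = 'README';           # hypothetical file name with no extension

# Unchecked: this match fails, so $1 still holds 'png' from the earlier match
$name =~ m/\.(\w+)$/;
my $stale = $1;

# List context: the match returns its captures, or an empty list on failure
my ($ext) = $name =~ m/\.(\w+)$/;

# Scalar context: the match returns only true/false, not the capture
my $flag = 'lh1.jpg' =~ m/\.(\w+)$/;

print "stale: $stale\n";
print 'ext: ', ( defined $ext ? $ext : 'undef' ), "\n";
print "flag: $flag\n";
```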
On 06/15/2012 05:02 PM, Cal Dershowitz wrote:
> On 06/15/2012 03:56 PM, Martijn Lievaart wrote:
> There's a second part of the script that I can't quite seem to get
> wrangled. I'll excerpt as opposed to making a full listing:
>
>
> my @files = <$path*>;
>
> my $big_int = 1;
> for my $name (@files) {
> print "name is $name\n";
> $name =~ m/.*\.(w+)/;
> my $ext = $1;
> print "ext is $ext\n";
>
> }
>
> output:
>
> name is /home/dan/Desktop/upload_luther/lh1.jpg
> Use of uninitialized value $ext in concatenation (.) or string at
> upload9.pl line 31.
> ext is
> name is /home/dan/Desktop/upload_luther/lh2.jpg
> Use of uninitialized value $ext in concatenation (.) or string at
> upload9.pl line 31.
> ext is
> name is /home/dan/Desktop/upload_luther/lh3.jpg
> Use of uninitialized value $ext in concatenation (.) or string at
> upload9.pl line 31.
> ext is
> $
>
> Why does this regex and capture not store 'jpg' in $ext?

    $ perl upload9.pl
    syntax error at upload9.pl line 23, near "/$ext/;"
    Execution of upload9.pl aborted due to compilation errors.
    $ cat upload9.pl
    ...

    my @list = $ftp->dir();
    my $big_int = 1;
    for my $name (@files) {
    print "name is $name\n";
    my ($ext) = $name =~ /([^.]*)$/;
    for my $image (@list){
    if ( $image =~ m/$ext/;)
    print "image is $image\n";
    }
    print "ext is $ext\n";
    }
    $
    ...

or how do you match to a variable you already have?

--
Cal
Ben Morrow <ben@morrow.me.uk> writes:
> Quoth Jim Gibson <jimsgibson@gmail.com>:
>> In article <5c2dnd3ipqJmI0bSnZ2dnUVZ_s6dnZ2d@supernews.com>, Cal
>> Dershowitz <cal@example.invalid> wrote:
>>
>> > my @files = <$path*>;
>> >
>> > my $big_int = 1;
>> > for my $name (@files) {
>> > print "name is $name\n";
>> > $name =~ m/.*\.(w+)/;
>>
>> You need \w instead of w.
>
> ...and, you *also* need to check the match succeeded before you look at
> $1:
>
> $name =~ m/.*\.(\w+)/ or die "'$name' has no extension";
>
> otherwise you run the risk of picking up $1 from some entirely other
> pattern match. The $N variables are slightly strangely scoped, so this
> is a little less likely than it might be, but it can and does happen and
> causes *very* strange bugs when it does.

[...]

> As a general rule it's easier to avoid the $N variables where possible,
> and instead use the return value of the match:

What's so difficult with providing information instead of scaremongering? According to perlre(1),

    The numbered match variables ($1, $2, $3, etc.)
    [...]
    are all dynamically scoped until the end of the
    enclosing block or until the next successful match,

Practically, this means the simple way to use $1 etc. correctly is to avoid using them except if the match supposed to set them was successful, in this case (assuming that $ext is a hitherto untouched my variable)

    $name =~ m/.*\.(\w+)/ and $ext = $1;

In more complicated cases,

    if (<some re match>) {
        # $1 etc valid here
    }

or

    <some re match> && do {
        # also here
    };

can be used.
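Rainer's two guarded forms can be sketched with a hypothetical helper that applies both to the same name. In each one, $1 is only read after the match is known to have succeeded, so neither can pick up a stale value:

```perl
use strict;
use warnings;

# Hypothetical helper: get the extension via the 'match and assign' form and
# via an 'if' block; both leave their result undef when the match fails.
sub extensions {
    my ($name) = @_;

    my $via_and;
    $name =~ m/.*\.(\w+)/ and $via_and = $1;

    my $via_if;
    if ( $name =~ m/.*\.(\w+)/ ) {
        $via_if = $1;    # $1 is valid here: the match just succeeded
    }

    return ( $via_and, $via_if );
}

my @jpg  = extensions('lh1.jpg');
my @none = extensions('README');
```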
Rainer Weikusat <rweikusat@mssgmbh.com> writes:
[...]
> According to perlre(1),
>
> The numbered match variables ($1, $2, $3, etc.)
>
> [...]
>
> are all dynamically scoped until the end of the
> enclosing block or until the next successful match,

This text could do with an additional explanation: the way 'scoped' is used here doesn't really make sense. What this means is that $1, ... are always local (like 'local') to the enclosing block and that their values always correspond to whatever was or wasn't captured by the last successful match inside this block. An example demonstrating most of this:

----------------
    $a = 'Hallo';

    {
        $a =~ /(H)/;

        {
            $a =~ /(l)(l)/;
            print "$1, $2\n";

            $a =~ /(x)/;
            print "$1, $2\n";

            $a =~ /(a)/;
            print "$1, $2\n";
        }

        print "$1, $2\n";
    }

    print "$1, $2\n";
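A variant of the example above that records the values instead of printing them, so the effect of each step can be checked. It uses a lexical `$s` instead of `$a` (which is special to `sort`), and a small hypothetical `snap` sub that shows an undefined variable as an empty string; the comments follow the perlre scoping rule quoted above:

```perl
use strict;
use warnings;

# Format the current $1 and $2 as "x,y", showing an undefined one as ''
sub snap { my ( $one, $two ) = @_; return ( $one // '' ) . ',' . ( $two // '' ) }

my $s = 'Hallo';
my @seen;

{
    $s =~ /(H)/;               # last successful match of the outer block
    {
        $s =~ /(l)(l)/;        # inner block: both groups set
        push @seen, snap( $1, $2 );

        $s =~ /(x)/;           # fails: $1 and $2 keep their values
        push @seen, snap( $1, $2 );

        $s =~ /(a)/;           # succeeds with one group, so $2 is now undef
        push @seen, snap( $1, $2 );
    }
    push @seen, snap( $1, $2 );    # inner block left: back to the /(H)/ match
}
push @seen, snap( $1, $2 );        # no successful match at file level

print "$_\n" for @seen;
```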
Quoth Rainer Weikusat <rweikusat@mssgmbh.com>:
> Ben Morrow <ben@morrow.me.uk> writes:
> >
> > ...and, you *also* need to check the match succeeded before you look at
> > $1:
> >
> > $name =~ m/.*\.(\w+)/ or die "'$name' has no extension";
> >
> > otherwise you run the risk of picking up $1 from some entirely other
> > pattern match. The $N variables are slightly strangely scoped, so this
> > is a little less likely than it might be, but it can and does happen and
> > causes *very* strange bugs when it does.
>
> > As a general rule it's easier to avoid the $N variables where possible,
> > and instead use the return value of the match:
>
> What's so difficult with providing information instead of
> scaremongering?

I did. I showed the OP how to use $N correctly, and also provided an alternative I (at least) find easier to use. Where is the scaremongering?

> According to perlre(1),
>
> The numbered match variables ($1, $2, $3, etc.)
> [...]
> are all dynamically scoped until the end of the
> enclosing block or until the next successful match,

I omitted that detail, since it was not relevant at that point.

> Practically, this means the simple way to use $1 etc correctly is to
> avoid using them except if the match supposed to set them was
> successful,

...as I said...

> in this case (assuming that $ext is a hitherto untouched
> my variable)
>
> $name =~ m/.*\.(\w+)/ and $ext = $1;

Ugly and crude. Assigning the return value of the match is much cleaner.

> In more complicated cases,
>
> if (<some re match>) {
> # $1 etc valid here
> }

That is a useful construction sometimes, but it's usually clearer for exceptional flow control ('match failed') to be the branch which diverts from the normal flow. Hence my suggestion to 'match or die', or equivalently something like

    unless (/.../) {
        warn "...";
        next;
    }

That way the entire block can be ignored as 'error handling' when skimming through the code.

I find this sub useful for that purpose:

    sub wnext {
        no warnings "exiting";
        warn $_[0];
        next;
    }

since it allows

    /.../ or wnext "no extension on '$file'";

analogously to the standard 'or die' idiom.

> <some re match> && do {
> # also here
> };

God, that's ugly. If you mean 'if', write 'if', and if you mean 'and', write 'and'.

Ben
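A sketch of the "warn and skip" flow described above, written with a plain `or do { ... }` block instead of the `wnext` sub, so no 'exiting' warning has to be switched off. It relies on a list assignment in boolean context yielding the number of values on its right-hand side; the file names are hypothetical:

```perl
use strict;
use warnings;

my @files = ( '/tmp/upload/lh1.jpg', '/tmp/upload/README', '/tmp/upload/lh2.png' );    # hypothetical
my @exts;

for my $file (@files) {
    # A failed match returns an empty list, so the assignment is false (0)
    # and the do block warns and skips to the next file.
    my ($ext) = $file =~ m/\.(\w+)$/
      or do { warn "no extension on '$file', skipping\n"; next };

    push @exts, $ext;    # normal flow continues only for files with an extension
}

print "extensions: @exts\n";
```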
Quoth Cal Dershowitz <cal@example.invalid>:
>
> $ perl upload9.pl
> syntax error at upload9.pl line 23, near "/$ext/;"
> Execution of upload9.pl aborted due to compilation errors.
> $ cat upload9.pl
> ...
>
> my @list = $ftp->dir();
> my $big_int = 1;
> for my $name (@files) {
> print "name is $name\n";

You will (or, at least, we will) find your code much easier to read if you indent it properly.

> my ($ext) = $name =~ /([^.]*)$/;
> for my $image (@list){
> if ( $image =~ m/$ext/;)

There are two syntax errors in that line, neither of which has anything to do with the pattern match.

> print "image is $image\n";
> }

Ben
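On the question of how to match against a variable you already have: a variable interpolates into a pattern directly, and wrapping it in `\Q...\E` (quotemeta) keeps any regex metacharacters in its value from being interpreted. A sketch with a well-formed `if` (condition in parentheses, no semicolon inside it, braces around the body) that also anchors the extension to the end of the name; the listing is made up:

```perl
use strict;
use warnings;

my $ext  = 'jpg';
my @list = ( 'lh1.jpg', 'notes.txt', 'jpgfiles.tar', 'lh2.jpg' );    # hypothetical names
my @matches;

for my $image (@list) {
    # \Q...\E quotes metacharacters in $ext; \. and $ pin it to the end of the name
    if ( $image =~ m/\.\Q$ext\E$/ ) {
        print "image is $image\n";
        push @matches, $image;
    }
}
```

An unanchored `m/$ext/` would also accept `jpgfiles.tar`, since "jpg" appears somewhere in that name.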
Ben Morrow <ben@morrow.me.uk> writes:
> Quoth Rainer Weikusat <rweikusat@mssgmbh.com>:
>> Ben Morrow <ben@morrow.me.uk> writes:
>> >
>> > ...and, you *also* need to check the match succeeded before you look at
>> > $1:
>> >
>> > $name =~ m/.*\.(\w+)/ or die "'$name' has no extension";
>> >
>> > otherwise you run the risk of picking up $1 from some entirely other
>> > pattern match. The $N variables are slightly strangely scoped, so this
>> > is a little less likely than it might be, but it can and does happen and
>> > causes *very* strange bugs when it does.
>>
>> > As a general rule it's easier to avoid the $N variables where possible,
>> > and instead use the return value of the match:
>>
>> What's so difficult with providing information instead of
>> scaremongering?
>
> I did. I showed the OP how to use $N correctly, and also provided an
> alternative I (at least) find easier to use. Where is the
> scaremongering?

You didn't write anything regarding how the $n actually behave, just asserted they would be 'strangely scoped' and that this could cause 'very strange bugs' in rarely occurring situations. That's about as scary as it can get and nothing except scary.

[...]

>> Practically, this means the simple way to use $1 etc correctly is to
>> avoid using them except if the match supposed to set them was
>> successful,
>
> ...as I said...

You didn't. You wrote that it would be necessary to 'check the success of the match', suggested using die for that, and that - subject to the nameless but surely grave dangers - this feature shouldn't be used at all.

>> in this case (assuming that $ext is a hitherto untouched
>> my variable)
>>
>> $name =~ m/.*\.(\w+)/ and $ext = $1;
>
> Ugly and crude. Assigning the return value of the match is much
> cleaner.

Chances are that our aesthetic preferences also differ in other respects. In this case, however, a nice side effect of this construct is that nothing is assigned if there wasn't anything to assign. And it isn't necessary to hack around the fact that the match only returns the intended value in list context. Actually, I would write this as

    $name =~ /.*\.(\w+)/ and $ext = $1;

but I purposely kept the m. If someone's preferred style of writing is different from mine, I can live with that without seeing a need to bash him into 'my idea of it'.

>> In more complicated cases,
>>
>> if (<some re match>) {
>> # $1 etc valid here
>> }
>
> That is a useful construction sometimes, but it's usually clearer for
> exceptional flow control ('match failed') to be the branch which diverts
> from the normal flow.

A failed match may well not be 'exceptional' at all. This will usually be the case when parsing something where multiple kinds of 'input strings' need to be recognized and analyzed.

[...]

>> <some re match> && do {
>> # also here
>> };
>
> God, that's ugly.

Is this supposed to be some prayer for divine punishment of those dirty heathens who don't dress like we do, whose accent differs from our accent, whose haircut differs from ours, whose skin color is somehow different, who come from different villages and who are generally foreign scum, easily recognizable by their outlandish and filthy habits?
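The case of input with several recognized shapes, where a failed match is an ordinary outcome, can be sketched as an `if`/`elsif` chain in which $1 and $2 are only read inside the branch whose match succeeded. The line formats and the `classify` sub are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical classifier: each branch reads $1/$2 only after its own match succeeded.
sub classify {
    my ($line) = @_;
    if ( $line =~ /^image (\S+)\.(\w+)$/ ) {
        return "image '$1' of type $2";
    }
    elsif ( $line =~ /^dir (\S+)$/ ) {
        return "directory '$1'";
    }
    else {
        return 'unrecognized';    # a normal outcome here, not an error
    }
}

print classify($_), "\n" for 'image lh1.jpg', 'dir /images', 'total 12';
```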
Quoth Rainer Weikusat <rweikusat@mssgmbh.com>:
> Ben Morrow <ben@morrow.me.uk> writes:
> > Quoth Rainer Weikusat <rweikusat@mssgmbh.com>:
> >> Ben Morrow <ben@morrow.me.uk> writes:
> >> >
> >> > ...and, you *also* need to check the match succeeded before you look at
> >> > $1:
> >> >
> >> > $name =~ m/.*\.(\w+)/ or die "'$name' has no extension";
> >> >
> >> > otherwise you run the risk of picking up $1 from some entirely other
> >> > pattern match. The $N variables are slightly strangely scoped, so this
> >> > is a little less likely than it might be, but it can and does happen and
> >> > causes *very* strange bugs when it does.
<snip>
>
> You didn't write anything regarding how the $n actually behave, just
> asserted they would be 'strangely scoped' and that this could cause
> 'very strange bugs' in rarely occurring situations. That's about as
> scary as it can get and nothing except scary.

You appear to be extremely easily scared. BOO!

I suggest you reread the paragraph quoted above, carefully. The strange scoping of $N is a mitigating condition, not an aggravating one (that is, it makes it less likely, rather than more likely, that a programmer will fall into a bug as a result of failing to check if a match succeeded or failed).

> >> Practically, this means the simple way to use $1 etc correctly is to
> >> avoid using them except if the match supposed to set them was
> >> successful,
> >
> > ...as I said...
>
> You didn't. You wrote that it would be necessary to 'check the
> success of the match', suggested using die for that,

No, I suggested using 'or' for that. The 'die' was the means of diverting control away from the use of $1 if the match failed; I was careful to mention that in this situation it may not be the most appropriate way of doing so.

> and that -
> subject to the nameless but surely grave dangers - this feature
> shouldn't be used at all.

The dangers were named ('otherwise you run the risk of picking up $1 from some entirely other pattern match'), and I did not say the feature should not be used but that there exists another feature which is usually more convenient.

> >> in this case (assuming that $ext is a hitherto untouched
> >> my variable)
> >>
> >> $name =~ m/.*\.(\w+)/ and $ext = $1;
> >
> > Ugly and crude. Assigning the return value of the match is much
> > cleaner.
>
> Chances are that our aesthetic preferences also differ in other
> respects. In this case, however, a nice side effect of this construct
> is that nothing is assigned if there wasn't anything to assign.

How is that useful, except under rather rare circumstances?

> And it
> isn't necessary to hack around the fact that the match only returns
> the intended value in list context.

Context is an integral feature of Perl. If you don't like it, don't use Perl.

> Actually, I would write this as
>
> $name =~ /.*\.(\w+)/ and $ext = $1;
>
> but I purposely kept the m.

You will notice that I did the same.

> >> In more complicated cases,
> >>
> >> if (<some re match>) {
> >> # $1 etc valid here
> >> }
> >
> > That is a useful construction sometimes, but it's usually clearer for
> > exceptional flow control ('match failed') to be the branch which diverts
> > from the normal flow.
>
> A failed match may well not be 'exceptional' at all. This will usually
> be the case when parsing something where multiple kinds of 'input
> strings' need to be recognized and analyzed.

That was not the case in the piece of code being discussed. If it had been, a different choice of construction would have been appropriate.

<snip vaguely offensive nonsense>

Ben
On 06/16/2012 09:54 AM, Ben Morrow wrote:
>
> Quoth Cal Dershowitz <cal@example.invalid>:
>>
>> $ perl upload9.pl
>> syntax error at upload9.pl line 23, near "/$ext/;"
>> Execution of upload9.pl aborted due to compilation errors.
>> $ cat upload9.pl
>> ...
>>
>> my @list = $ftp->dir();
>> my $big_int = 1;
>> for my $name (@files) {
>> print "name is $name\n";
>
> You will (or, at least, we will) find your code much easier to read if
> you indent it properly.

I knocked the rust off of perltidy.

>
>> my ($ext) = $name =~ /([^.]*)$/;
>> for my $image (@list){
>> if ( $image =~ m/$ext/;)
>
> There are two syntax errors in that line, neither of which has anything
> to do with the pattern match.
>
>> print "image is $image\n";
>> }

This process is laden with the types of errors that beset the less experienced, and they are sometimes not related to what I'm struggling with centrally, which, I think, is best addressed by using grep. But all of a sudden, I can't use syntax for printing an array that I've used a hundred times before. So I'm stuck, because I can't see whether I'm actually grepping something or not.

While it is true that I'm posting a script with a syntax error, it's far and away not the first one I hit today, and I've tried to work through most of them by googling "perl ....". Anyway:

    $ perltidy -b upload10.pl
    $ perl upload10.pl
    Possible unintended interpolation of @array in string at upload10.pl line 26.
    Global symbol "@array" requires explicit package name at upload10.pl line 26.
    Execution of upload10.pl aborted due to compilation errors.
    $ cat upload10.pl
    #!/usr/bin/perl -w
    use strict;
    use feature ':5.10';
    use Net::FTP;
    my $domain   = '';
    my $username = '';
    my $password = '';

    my $ftp = Net::FTP->new( $domain, Debug => 1, Passive => 1 )
      or die "Can't connect: $@\n";
    $ftp->login( $username, $password ) or die "Couldn't login\n";
    $ftp->binary();
    $ftp->cwd('/images/') or die "cwd failed $@\n";
    my $path  = '/home/dan/Desktop/upload_luther/';
    my @files = <$path*>;
    my @list  = $ftp->dir();

    for my $name (@files) {
        print "name is $name\n";
        my ($ext) = $name =~ /([^.]*)$/;
        for my $image (@list) {
            print "image is $image\n";
            my ($ext2) = $image =~ /([^.]*)$/;
            my @array = grep ( /$ext2/, @list );
        }
        print "sub_list is @array\n";
    }
    $

It's a good time for me to take out the router and complete a task that I won't fail at. Wood is much more forgiving.

--
Cal
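The 'Global symbol "@array" requires explicit package name' error is a scoping issue: `my @array` is declared inside the inner `for` block, so it no longer exists at the `print` after that block closes. A sketch of one way to restructure the loop without the FTP connection, where the `grep` runs once per local file and its result is used in the block that declares it. The two lists are hypothetical stand-ins for `<$path*>` and the remote listing; note that Net::FTP's `dir` returns long-format listing lines, while its `ls` method returns bare names, which are simpler to match against:

```perl
use strict;
use warnings;

# Hypothetical stand-ins for <$path*> and for a remote listing from $ftp->ls()
my @files = ( '/home/user/upload_luther/lh1.jpg', '/home/user/upload_luther/lh2.png' );
my @list  = ( 'lh1.jpg', 'old.jpg', 'index.html' );

my %sub_lists;    # local extension => remote names with that extension
for my $name (@files) {
    my ($ext) = $name =~ /\.([^.\/]+)$/ or next;    # skip names without an extension

    # Declared here, in the same block as the print that uses it
    my @sub_list = grep { /\.\Q$ext\E$/ } @list;
    print "sub_list for $name is @sub_list\n";
    $sub_lists{$ext} = \@sub_list;
}
```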
Cal Dershowitz <cal@example.invalid> wrote:
> my ($ext) = $name =~ /([^.]*)$/;
>
>Can (anyone) talk me through why this captures an extension? The caret
>anchors the regex at the beginning.

No, see below!

> $ at the end. parens return the
>match. The asterisk is to quantify what's in brackets, but what's going
>on with the brackets?

The square brackets define a character class, and the leading caret negates this class. In other words, this class matches any character that is not a literal dot. Together with the asterisk and the dollar anchor this becomes: as many characters as possible at the end of the string, up to the first dot (counting from the end). Which pretty much describes what some people call a file name extension.

jue
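The behaviour described above is easy to check on a few names, including two edge cases: because `*` also accepts zero characters and nothing requires a dot to be present, a name with no dot comes back whole, and a dot in a directory name can make the "extension" include a slash. The names are made up:

```perl
use strict;
use warnings;

my %ext_of;
for my $name ( 'lh1.jpg', 'archive.tar.gz', 'README', 'site.d/file' ) {
    my ($ext) = $name =~ /([^.]*)$/;    # longest run of non-dots ending at the end of the string
    $ext_of{$name} = $ext;
    print "$name => '$ext'\n";
}
```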
Ben Morrow <ben@morrow.me.uk> writes:
> Quoth Rainer Weikusat <rweikusat@mssgmbh.com>:
>> Ben Morrow <ben@morrow.me.uk> writes:
>> > Quoth Rainer Weikusat <rweikusat@mssgmbh.com>:
>> >> Ben Morrow <ben@morrow.me.uk> writes:
>> >> >
>> >> > ...and, you *also* need to check the match succeeded before you look at
>> >> > $1:
>> >> >
>> >> > $name =~ m/.*\.(\w+)/ or die "'$name' has no extension";
>> >> >
>> >> > otherwise you run the risk of picking up $1 from some entirely other
>> >> > pattern match. The $N variables are slightly strangely scoped, so this
>> >> > is a little less likely than it might be, but it can and does happen and
>> >> > causes *very* strange bugs when it does.
> <snip>
>>
>> You didn't write anything regarding how the $n actually behave, just
>> asserted they would be 'strangely scoped' and that this could cause
>> 'very strange bugs' in rarely occurring situations. That's about as
>> scary as it can get and nothing except scary.
>
> You appear to be extremely easily scared. BOO!

Writing about 'strange scoping' which may cause or prevent 'very strange bugs' makes the matter appear far more serious and arcane than it actually is. That's why I referred to it as 'scaremongering'. I didn't write that my assessment of the situation would be similar to your assessment, and especially not that it was based on your text.

[...]

>> >> Practically, this means the simple way to use $1 etc correctly is to
>> >> avoid using them except if the match supposed to set them was
>> >> successful,
>> >
>> > ...as I said...
>>
>> You didn't. You wrote that it would be necessary to 'check the
>> success of the match', suggested using die for that,
>
> No, I suggested using 'or' for that. The 'die' was the means of
> diverting control away from the use of $1 if the match failed; I was
> careful to mention that in this situation it may not be the most
> appropriate way of doing so.

This is a completely pointless attempt at confusing the issue by playing with words and abusing semantic ambiguities inherent in the way humans use language. It does, however, enable me to ask a rhetorical question: assuming that

    $name =~ /deepfried (whole) elephant roll/ or die('Salatschrecke!')

is 'sensible language use', according to your opinion, how come the almost identical

    $name =~ /deepfried (whole) elephant roll/ and $quantity = $1;

is 'crude and ugly'?

>> and that - subject to the nameless but surely grave dangers - this
>> feature shouldn't be used at all.
>
> The dangers were named

Indeed. Their names were 'strange scopes' and 'very strange bugs'. But I'm tired of this weasel-wording exercise.

[...]

> <snip vaguely offensive nonsense>

You could have snipped the 'vaguely offensive nonsense' from your original text, and in that case you wouldn't have provoked a reply which pointed out that you're condemning something without giving any reasons, because it was different from what you're accustomed to. In this case, I was actually thinking of something like

    test -n "$parameter" && {
        # do something with it
|