hi guys,
sorry am feeling a bit prolifit lately.

today's show, is: 〈Fuck Python〉
http://xahlee.org/comp/fuck_python.html

------------------------------------
Fuck Python
By Xah Lee, 2012-04-08

fuck Python.

just fucking spend 2 hours and still going.

here's the short story.

so recently i switched to a Windows version of python. Now, Windows version takes path using win backslash, instead of cygwin slash. This fucking broke my find/replace scripts that takes a dir level as input. Because i was counting slashes.

Ok no problem. My sloppiness. After all, my implementation wasn't portable. So, let's fix it. After a while, discovered there's the 「os.sep」. Ok, replace 「"/"」 to 「os.sep」, done. Then, bang, all hell went lose. Because, the backslash is used as escape in string, so any regex that manipulate path got fucked majorly. So, now you need to find a quoting mechanism. Then, fuck python doc incomprehensible scattered comp-sci-r-us BNF shit. Then, fuck python for “os.path” and “os” modules then string object and string functions inconsistent ball. And FUCK Guido who wants to fuck change python for his idiotic OOP concept of “elegance” so that some of these are deprecated.

So after several exploration of “repr()”, “format()”, “‹str›.count()”, “os.path.normpath()”, “re.split()”, “len(re.search().group())” etc, after a long time, let's use “re.escape()”. 2 hours has passed.

Also, discovered that “os.path.walk” is now deprecated, and one is supposed to use the sparkling “os.walk”. In the process of refreshing my python, the “os.path.walk” semantics is really one fucked up fuck. Meanwhile, the “os.walk” went into incomprehensible OOP object and iterators fuck.

now, it's close to 3 hours. This fix is supposed to be done in 10 min. I'd have done it in elisp in just 10 minutes if not for my waywardness.

This is Before

def process_file(dummy, current_dir, file_list):
    current_dir_level = len(re.split("/", current_dir)) - len(re.split("/", input_dir))
    cur_file_level = current_dir_level+1
    if min_level <= cur_file_level <= max_level:
        for a_file in file_list:
            if re.search(r"\.html$", a_file, re.U) and os.path.isfile(current_dir + "/" + a_file):
                replace_string_in_file(current_dir + "/" + a_file)

This is After

def process_file(dummy, current_dir, file_list):
    current_dir = os.path.normpath(current_dir)
    cur_dir_level = re.sub( "^" + re.escape(input_dir), "", current_dir).count( os.sep)
    cur_file_level = cur_dir_level + 1
    if min_level <= cur_file_level <= max_level:
        for a_file in file_list:
            if re.search(r"\.html$", a_file, re.U) and os.path.isfile(current_dir + re.escape(os.sep) + a_file):
                replace_string_in_file(current_dir + os.sep + a_file)
                # print "%d %s" % (cur_file_level, (current_dir + os.sep + a_file))

Complete File

# -*- coding: utf-8 -*-
# Python

# find & replace strings in a dir

import os, sys, shutil, re

# if this this is not empty, then only these files will be processed
my_files = []

input_dir = "c:/Users/h3/web/xahlee_org/lojban/hrefgram2/"
input_dir = "/cygdrive/c/Users/h3/web/zz"
input_dir = "c:/Users/h3/web/xahlee_org/"

min_level = 2; # files and dirs inside input_dir are level 1.
max_level = 2; # inclusive

print_no_change = False

find_replace_list = [
(
u"""<iframe style="width:100%;border:none" src="http://xahlee.org/footer.html"></iframe>""",
u"""<iframe style="width:100%;border:none" src="../footer.html"></iframe>""",
),
]

def replace_string_in_file(file_path):
    "Replaces all findStr by repStr in file file_path"
    temp_fname = file_path + "~lc~"
    backup_fname = file_path + "~bk~"

    # print "reading:", file_path
    input_file = open(file_path, "rb")
    file_content = unicode(input_file.read(), "utf-8")
    input_file.close()

    num_replaced = 0
    for a_pair in find_replace_list:
        num_replaced += file_content.count(a_pair[0])
        output_text = file_content.replace(a_pair[0], a_pair[1])
        file_content = output_text

    if num_replaced > 0:
        print "◆ ", num_replaced, " ", file_path.replace("\\", "/")
        shutil.copy2(file_path, backup_fname)
        output_file = open(file_path, "r+b")
        output_file.read() # we do this way instead of “os.rename” to preserve file creation date
        output_file.seek(0)
        output_file.write(output_text.encode("utf-8"))
        output_file.truncate()
        output_file.close()
    else:
        if print_no_change == True:
            print "no change:", file_path

    # os.remove(file_path)
    # os.rename(temp_fname, file_path)

def process_file(dummy, current_dir, file_list):
    current_dir = os.path.normpath(current_dir)
    cur_dir_level = re.sub( "^" + re.escape(input_dir), "", current_dir).count( os.sep)
    cur_file_level = cur_dir_level + 1
    if min_level <= cur_file_level <= max_level:
        for a_file in file_list:
            if re.search(r"\.html$", a_file, re.U) and os.path.isfile(current_dir + re.escape(os.sep) + a_file):
                replace_string_in_file(current_dir + os.sep + a_file)
                # print "%d %s" % (cur_file_level, (current_dir + os.sep + a_file))

input_dir = os.path.normpath(input_dir)

if (len(my_files) != 0):
    for my_file in my_files:
        replace_string_in_file(os.path.normpath(my_file) )
else:
    os.path.walk(input_dir, process_file, "dummy")

print "Done."
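The level counting described in the post can also be done without any regex or manual separator handling. What follows is only a minimal sketch in Python 3, not the poster's script: it assumes `os.walk` in place of the deprecated `os.path.walk`, and the helper names `dir_level` and `html_files_at_levels` are invented here for illustration.

```python
import os


def dir_level(root, current_dir):
    """Return how many directory levels current_dir lies below root (root is 0)."""
    # relpath normalizes both arguments, so mixed "/" and "\\" input is fine on Windows.
    rel = os.path.relpath(current_dir, root)
    if rel == os.curdir:
        return 0
    return len(rel.split(os.sep))


def html_files_at_levels(root, min_level, max_level):
    """Yield .html files whose level is within [min_level, max_level].

    As in the post, files directly inside root count as level 1.
    """
    root = os.path.normpath(root)
    for current_dir, _subdirs, file_names in os.walk(root):
        file_level = dir_level(root, current_dir) + 1
        if min_level <= file_level <= max_level:
            for name in file_names:
                if name.endswith(".html"):
                    yield os.path.join(current_dir, name)
```

Something like `for path in html_files_at_levels(input_dir, 2, 2): ...` would then visit the same files the original `process_file` callback was meant to reach, assuming the rest of the script is ported to Python 3 separately.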
On 08/04/2012 12:11, Xah Lee wrote:
<cut all>

Hi Xah,

You clearly didn't want help on this subject, as you really know how to do it anyway.

But having read your posts over the years, I'd like to give you an observation on your persona, free of charge! :-)

You are actually a talented writer; some may find your occasional profanity offensive, but at least it highlights your frustration.
You are undoubtedly, and provably, a good mathematician and, more important than that, self-taught.
You have a natural feel for design (otherwise you would not clash with others' view of programming).
You know a mixture of programming languages.

Whether you like it or not, you are in the perfect position to create a new programming language and design a new programming paradigm, unhindered by all the legacy crap that keeps people like me behind (I actually like BNF, for example).

It is likely I am wrong, but if that is your destiny there is no point fighting it.

Cheers and good luck,

Martin
Xah Lee <xahlee@gmail.com> wrote:
>hi guys,
>
>sorry am feeling a bit prolifit lately.
>
>today's show, is: 'Fuck Python'
>http://xahlee.org/comp/fuck_python.html
>
>------------------------------------
>Fuck Python
> By Xah Lee, 2012-04-08
>
>fuck Python.
>
>just fucking spend 2 hours and still going.
>
>here's the short story.
>
>so recently i switched to a Windows version of python. Now, Windows
>version takes path using win backslash, instead of cygwin slash. This
>fucking broke my find/replace scripts that takes a dir level as input.
>Because i was counting slashes.
>
>Ok no problem. My sloppiness. After all, my implementation wasn't
>portable. So, let's fix it. After a while, discovered there's the
>'os.sep'. Ok, replace "/" to 'os.sep', done. Then, bang, all hell
>went lose. Because, the backslash is used as escape in string, so any
>regex that manipulate path got fucked majorly.

When Microsoft created MS-DOS, they decided to use '\' as
the separator in file names. This was at a time when several
previously existing interactive operating systems were using
'/' as the file name separator and at least one was using '\'
as an escape character. As a result of Microsoft's decision
to use '\' as the separator, people have had to do extra work
to adapt programs written for Windows to run in non-Windows
environments, and vice versa. People have had to do extra work
to write software that is portable between these environments.
People have done extra work while creating tools to make writing
portable software easier. And people have to do extra work when
they use these tools, because using them is still harder than
writing portable code for operating systems that all used '/'
as their separator would have been.

If you added up the cost of all the extra work that people have
done as a result of Microsoft's decision to use '\' as the file
name separator, it would probably be enough money to launch the
Burj Khalifa into geosynchronous orbit.

So, when you say fuck Python, are you sure you're shooting at the
right target?

-- 
David Canzi | TIMTOWWTDI (tim-toe-woe-dee): There Is More Than One
            | Wrong Way To Do It
["Followup-To:" header set to comp.lang.lisp.]
On 2012-04-08, David Canzi <dmcanzi@uwaterloo.ca> wrote:
> Xah Lee <xahlee@gmail.com> wrote:
>>hi guys,
>>
>>sorry am feeling a bit prolifit lately.
>>
>>today's show, is: 'Fuck Python'
>>http://xahlee.org/comp/fuck_python.html
>>
>>------------------------------------
>>Fuck Python
>> By Xah Lee, 2012-04-08
>>
>>fuck Python.
>>
>>just fucking spend 2 hours and still going.
>>
>>here's the short story.
>>
>>so recently i switched to a Windows version of python. Now, Windows
>>version takes path using win backslash, instead of cygwin slash. This
>>fucking broke my find/replace scripts that takes a dir level as input.
>>Because i was counting slashes.
>>
>>Ok no problem. My sloppiness. After all, my implementation wasn't
>>portable. So, let's fix it. After a while, discovered there's the
>>'os.sep'. Ok, replace "/" to 'os.sep', done. Then, bang, all hell
>>went lose. Because, the backslash is used as escape in string, so any
>>regex that manipulate path got fucked majorly.
>
> When Microsoft created MS-DOS, they decided to use '\' as
> the separator in file names.

This is false. The MS-DOS (dare I say it) "kernel" accepts both forward and
backslashes as separators.

The application-level choice was once configurable through a variable in
COMMAND.COM. Then they hard-coded it to backslash.

However, Microsoft operating systems continued to (and until this day)
recognize slash as a path separator.

Only, there are broken userland programs on Windows which don't know this.

> So, when you say fuck Python, are you sure you're shooting at the
> right target?

I would have to say, probably yes.
On 2012-04-08 17:03, David Canzi <dmcanzi@uwaterloo.ca> wrote:
> If you added up the cost of all the extra work that people have
> done as a result of Microsoft's decision to use '\' as the file
> name separator, it would probably be enough money to launch the
> Burj Khalifa into geosynchronous orbit.

So we have another contender for the Most Expensive One-byte Mistake?

Poul-Henning Kamp nominated the C/Unix guys:

http://queue.acm.org/detail.cfm?id=2010365

        hp

-- 
   _  | Peter J. Holzer    | Deprecating human carelessness and
|_|_) | Sysadmin WSR       | ignorance has no successful track record.
| |   | hjp@hjp.at         |
__/   | http://www.hjp.at/ |  -- Bill Code on asrg@irtf.org
On 2012-04-08, Peter J. Holzer <hjp-usenet2@hjp.at> wrote:
> On 2012-04-08 17:03, David Canzi <dmcanzi@uwaterloo.ca> wrote:
>> If you added up the cost of all the extra work that people have
>> done as a result of Microsoft's decision to use '\' as the file
>> name separator, it would probably be enough money to launch the
>> Burj Khalifa into geosynchronous orbit.
>
> So we have another contender for the Most Expensive One-byte Mistake?

The one byte mistake in DOS and Windows is recognizing two characters as path
separators. All code that correctly handles paths is complicated by having to
look for a set of characters instead of just scanning for a byte.

> http://queue.acm.org/detail.cfm?id=2010365

DOS backslashes are already mentioned in that page, but alas it perpetuates the
clueless myth that DOS and Windows do not recognize any other path separator.

Worse, the one byte Unix mistake being covered is, disappointingly, just a
clueless rant against null-terminated strings.

Null-terminated strings are infinitely better than the ridiculous encapsulation
of length + data.

For one thing, if s is a non-empty null terminated string then cdr(s) is also
a string representing the rest of that string without the first character,
where cdr(s) is conveniently defined as s + 1.

Not only can compilers compress storage by recognizing that string literals are
the suffixes of other string literals, but a lot of string manipulation code is
simplified, because you can treat a pointer to the interior of any string as a
string.

Because they are recursively defined, you can do elegant tail recursion on null
terminated strings:

  const char *rec_strchr(const char *in, int ch)
  {
    if (*in == 0)
      return 0;
    else if (*in == ch)
      return in;
    else
      return rec_strchr(in + 1, ch);
  }

length + data also raises the question: what type is the length field? One
byte? Two bytes? Four? And then you have issues of byte order. Null terminated
C strings can be written straight to a binary file or network socket and be
instantly understood on the other end.

Null terminated strings have simplified all kinds of text manipulation, lexical
scanning, and data storage/communication code resulting in immeasurable
savings over the years.
> so recently i switched to a Windows version of python. Now, Windows
> version takes path using win backslash, instead of cygwin slash. This
> fucking broke my find/replace scripts that takes a dir level as input.
> Because i was counting slashes.
>
> Ok no problem. My sloppiness. After all, my implementation wasn't
> portable. So, let's fix it. After a while, discovered there's the
> 「os.sep」. Ok, replace 「"/"」 to 「os.sep」, done. Then, bang, all hell
> went lose. Because, the backslash is used as escape in string, so any
> regex that manipulate path got fucked majorly.

....

> sorry am feeling a bit prolifit lately.

Well, you should learn programming instead of being 'prolifit'.

You could replace all backslashes with forward slashes after you get paths from the OS:

>>> "c:\\windpws\\sux".replace('\\', '/')
'c:/windpws/sux'

After this your code will work as is. Note that Windows accepts forward slashes as path separators, so you don't need to convert them back.

> This fix is supposed to be done in 10 min.

Well, I could fix it in 10 seconds. Maybe because I'm a programmer rather than a 'prolifit'.

P.S. This was just a "Fuck Xah Lee" essay, nothing personal. It is a part of a series.
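The "convert to forward slashes once, then count" idea can also be written with the standard `pathlib` module. This is a hedged sketch rather than anything posted in the thread: `to_forward_slashes` and `depth_below` are hypothetical helper names, and `PureWindowsPath` is used so that the example behaves the same on any operating system.

```python
from pathlib import PureWindowsPath


def to_forward_slashes(windows_path):
    """Return the path with every separator written as '/'."""
    return PureWindowsPath(windows_path).as_posix()


def depth_below(root, path):
    """Count path components of `path` below `root`.

    PureWindowsPath treats '/' and '\\' as equivalent, so the two
    arguments may use different separator styles.
    """
    relative = PureWindowsPath(path).relative_to(PureWindowsPath(root))
    return len(relative.parts)


print(to_forward_slashes("c:\\windpws\\sux"))
print(depth_below("c:/Users/h3/web", "c:\\Users\\h3\\web\\a\\b"))
```

Working on path components instead of counting separator characters sidesteps the regex escaping problem entirely, at the cost of relying on `pathlib` (Python 3.4 or later).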
"Kaz Kylheku" <kaz@kylheku.com> wrote in message
news:20120408114313.85@kylheku.com...

> Worse, the one byte Unix mistake being covered is, disappointingly, just a
> clueless rant against null-terminated strings.
>
> Null-terminated strings are infinitely better than the ridiculous
> encapsulation of length + data.
>
> For one thing, if s is a non-empty null terminated string then, cdr(s) is
> also
> a string representing the rest of that string without the first character,
> where cdr(s) is conveniently defined as s + 1.

If strings are represented as (ptr,length), then a cdr(s) would have to return (ptr+1,length-1), or (nil,0) if s was one character. No big deal.

(Note I saw your post in comp.lang.python; I don't know about any implications of that for Lisp.)

And if, instead, you want to represent all but the last character of the string, then it's just (ptr,length-1). (Some checking is needed around empty strings, but similar checks are needed around s+1.)

In addition, if you want to represent the middle of a string, then it's also very easy: (ptr+a,b).

> Not only can compilers compress storage by recognizing that string
> literals are
> the suffixes of other string literals, but a lot of string manipulation
> code is
> simplified, because you can treat a pointer to interior of any string as a
> string.

Yes, the string "bart" also contains "art", "rt" and "t". But with counted strings, it can also contain "bar", "ba", "b", etc....

There are a few advantages to counted strings too...

> length + data also raises the question: what type is the length field? One
> byte? Two bytes? Four?

Depends on the architecture. But 4+4 bytes for 32-bits, and 8+8 bytes for 64-bits, I would guess, for general flex strings of any length.

There are other ways of encoding a length.

(For example I use one short string type of maximum M characters, but the current length N is encoded into the string, without needing any extra count byte (by fiddling about with the last couple of bytes). If you're trying to store a short string in an 8-byte field in a struct, then this will let you use all 8 bytes; a zero-terminated one, only 7.)

> And then you have issues of byte order.

Which also affects every single value of more than one byte.

> Null terminated
> C strings can be written straight to a binary file or network socket and
> be
> instantly understood on the other end.

But they can't contain nulls!

> Null terminated strings have simplified all kids of text manipulation,
> lexical
> scanning, and data storage/communication code resulting in immeasurable
> savings over the years.

They both have their uses.

-- 
Bartc
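The (ptr,length) operations in this post can be modelled in a few lines. The sketch below is only an illustration in Python, not the poster's implementation: the `Counted` type, its field names, and the choice to return an empty view (rather than a literal nil) for a one-character string are all assumptions made here.

```python
from typing import NamedTuple


class Counted(NamedTuple):
    """A counted string view: shared bytes plus (start, length)."""
    buf: bytes    # backing storage, shared between views and never copied
    start: int    # plays the role of ptr
    length: int

    def text(self):
        return self.buf[self.start:self.start + self.length].decode("ascii")


def cdr(s):
    """Everything but the first character: (ptr+1, length-1)."""
    if s.length == 0:
        return s  # the empty string has no rest; assumption: return it unchanged
    return Counted(s.buf, s.start + 1, s.length - 1)


def butlast(s):
    """Everything but the last character: (ptr, length-1)."""
    return Counted(s.buf, s.start, max(s.length - 1, 0))


def middle(s, a, b):
    """b characters starting a characters in: (ptr+a, b)."""
    if a < 0 or b < 0 or a + b > s.length:
        raise ValueError("slice falls outside the string")
    return Counted(s.buf, s.start + a, b)


bart = Counted(b"bart", 0, 4)
print(cdr(bart).text(), butlast(bart).text(), middle(bart, 1, 2).text())
```

Every view shares the same `buf`, which is the point being argued: prefixes and interior slices cost nothing extra with counted strings, whereas a null-terminated representation only gets suffixes for free.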
On Sun, 08 Apr 2012 19:14:45 +0000, Kaz Kylheku wrote:
> The one byte mistake in DOS and Windows is recognizing two characters as path
> separators. All code that correctly handles paths is complicated by having to
> look for a set of characters instead of just scanning for a byte.

It's worse when you consider that the "standard" Windows encoding for
Japanese is Shift-JIS, which allows a \x5c character (normally backslash,
but which also doubles as the Yen character) to occur as the second byte
of a multi-byte sequence. Which means that you can't write
"encoding-agnostic" pathname-handling functions.

> Null-terminated strings are infinitely better than the ridiculous
> encapsulation of length + data.

Windows provides the worst of both worlds. The Windows API uses
null-terminated strings, but the NT API on which the Windows subsystem
runs uses length+data. So you can use the NT API to e.g. create registry
keys containing an embedded null, so the Windows API can't read them (or
rename them, or delete them).
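The Shift-JIS hazard is easy to reproduce. The following is a small sketch, assuming Python 3 and its built-in `shift_jis` codec; the function names and the sample file name are made up for the demonstration. The character 表 encodes to the two bytes 0x95 0x5C, so its second byte is indistinguishable from a backslash at the byte level.

```python
def naive_split(raw_path):
    """Split an encoded path on the backslash byte, ignoring the encoding."""
    return raw_path.split(b"\\")


def decoded_split(raw_path, encoding="shift_jis"):
    """Decode first, then split on the backslash character."""
    return raw_path.decode(encoding).split("\\")


raw = "dir\\表.txt".encode("shift_jis")

print(naive_split(raw))    # the file name is cut in two at the 0x5C trail byte
print(decoded_split(raw))  # two components, as intended
```

This is why byte-oriented path code cannot be encoding-agnostic here: the split is only correct once the bytes have been decoded with the right codec.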
On Sun, 08 Apr 2012 04:11:20 -0700, Xah Lee wrote:
> Ok no problem. My sloppiness. After all, my implementation wasn't
> portable. So, let's fix it. After a while, discovered there's the
> os.sep. Ok, replace "/" to os.sep, done. Then, bang, all hell
> went lose. Because, the backslash is used as escape in string, so any
> regex that manipulate path got fucked majorly. So, now you need to
> find a quoting mechanism.

    if os.altsep is not None:
        sep_re = '[%s%s]' % (os.sep, os.altsep)
    else:
        sep_re = '[%s]' % os.sep

But really, you should be ranting about regexps rather than Python.
They're convenient if you know exactly what you want to match, but a
nuisance if you need to generate the expression based upon data which is
only available at run-time (and re.escape() only solves one very specific
problem).
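One caveat with building the character class by plain string interpolation: on Windows the backslash in `os.sep` appears to end up acting as a regex escape for the character after it, so the class would match only `/`. A variant that escapes each separator first is sketched below. It is an illustration under stated assumptions, not the poster's code: `separator_pattern` and `split_path` are names invented here, and `ntpath` is used so the Windows separators are available on any platform.

```python
import ntpath
import re


def separator_pattern(sep, altsep=None):
    """Build a regex character class matching any of the given separators."""
    chars = sep if altsep is None else sep + altsep
    # Escape each separator so a backslash is matched literally.
    return "[" + "".join(re.escape(c) for c in chars) + "]"


# ntpath.sep is "\\" and ntpath.altsep is "/" regardless of the host OS.
WINDOWS_SEP_RE = re.compile(separator_pattern(ntpath.sep, ntpath.altsep))


def split_path(path):
    """Split a Windows-style path on either separator."""
    return WINDOWS_SEP_RE.split(path)


print(split_path("c:\\Users/h3\\web"))
```

In a script that only targets the host platform, passing `os.sep` and `os.altsep` instead of the `ntpath` values should work the same way.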
On Sunday, April 8, 2012 at 14:03:28 UTC-3, David Canzi wrote:
> Xah Lee <xahlee@gmail.com> wrote:
> >hi guys,
> >
> >sorry am feeling a bit prolifit lately.
> >
> >today's show, is: 'Fuck Python'
> >http://xahlee.org/comp/fuck_python.html
> >
> >------------------------------------
> >Fuck Python
> > By Xah Lee, 2012-04-08
> >
> >fuck Python.
> >
> >just fucking spend 2 hours and still going.
> >
> >here's the short story.
> >
> >so recently i switched to a Windows version of python. Now, Windows
> >version takes path using win backslash, instead of cygwin slash. This
> >fucking broke my find/replace scripts that takes a dir level as input.
> >Because i was counting slashes.
> >
> >Ok no problem. My sloppiness. After all, my implementation wasn't
> >portable. So, let's fix it. After a while, discovered there's the
> >'os.sep'. Ok, replace "/" to 'os.sep', done. Then, bang, all hell
> >went lose. Because, the backslash is used as escape in string, so any
> >regex that manipulate path got fucked majorly.
>
> When Microsoft created MS-DOS, they decided to use '\' as
> the separator in file names. This was at a time when several
> previously existing interactive operating systems were using
> '/' as the file name separator and at least one was using '\'
> as an escape character. As a result of Microsoft's decision
> to use '\' as the separator, people have had to do extra work
> to adapt programs written for Windows to run in non-Windows
> environments, and vice versa. People have had to do extra work
> to write software that is portable between these environments.
> People have done extra work while creating tools to make writing
> portable software easier. And people have to do extra work when
> they use these tools, because using them is still harder than
> writing portable code for operating systems that all used '/'
> as their separator would have been.

yes, absolutely. But you got 2 inaccuracies there: 1) Microsoft didn't create DOS; 2) fucking DOS was written in C, and guess what, it uses \ as escape character. Fucking microsoft.

> So, when you say fuck Python, are you sure you're shooting at the
> right target?

I agree. Fuck winDOS and fucking microsoft.
namekuseijin <namekuseijin@gmail.com> writes:
>> When Microsoft created MS-DOS, they decided to use '\' as
>> the separator in file names. This was at a time when several
>> previously existing interactive operating systems were using
>> '/' as the file name separator and at least one was using '\'
>> as an escape character. As a result of Microsoft's decision
>> to use '\' as the separator, people have had to do extra work
>> to adapt programs written for Windows to run in non-Windows
>> environments, and vice versa. People have had to do extra work
>> to write software that is portable between these environments.
>> People have done extra work while creating tools to make writing
>> portable software easier. And people have to do extra work when
>> they use these tools, because using them is still harder than
>> writing portable code for operating systems that all used '/'
>> as their separator would have been.
>
> yes, absolutely. But you got 2 inaccuracies there: 1) Microsoft
> didn't create DOS; 2) fucking DOS was written in C, and guess what, it
> uses \ as escape character. Fucking microsoft.

Actually, it's all due to a lamentable transcription error.

You have to understand that at the time, computers didn't have big
disk space (if they had disks at all), and didn't have the IP
connectivity we have nowadays. They didn't even have a lot of printers,
printers being as costly as computers, if not more. (At least compared
to small cheap computers).

Therefore documentation wasn't on-line, and certainly not in-line.
People had to write it down by hand, with a pen or pencil, making
little marks on sheets of paper. (Happily it was already ballpens, not
quill pens).

So what happened was that some student learned about unix, and its
convention of using dashes to prefix options (so that they're not
confused with file names, which presumably don't start with a dash, or if
they do, can be written as ./-dash-file instead).

Unfortunately, that student wasn't too good at calligraphy, and wrote
his dashes slightly slanted upward.

When they wanted to copy the convention on the early versions of DOS,
they had a look at those HAND-WRITTEN notes, and misread the dashes for
slashes: what was ls -l was read as ls /l.

That's how DOS commands got the convention of using slashes to start
options. Then of course, confusion was possible with pathname
separators, so they had to use another character, and backslash was an
obvious, if fateful, choice.

-- 
__Pascal Bourguignon__ http://www.informatimago.com/
A bad day in () is better than a good day in {}.
Xah Lee wrote:
« http://xahlee.org/comp/fuck_python.html »

David Canzi wrote

«When Microsoft created MS-DOS, they decided to use '\' as the separator in file names. This was at a time when several previously existing interactive operating systems were using '/' as the file name separator and at least one was using '\' as an escape character. As a result of Microsoft's decision to use '\' as the separator, people have had to do extra work to adapt programs written for Windows to run in non-Windows environments, and vice versa. People have had to do extra work to write software that is portable between these environments. People have done extra work while creating tools to make writing portable software easier. And people have to do extra work when they use these tools, because using them is still harder than writing portable code for operating systems that all used '/' as their separator would have been.»

namekuseijin wrote:
> yes, absolutely. But you got 2 inaccuracies there: 1) Microsoft didn't create DOS; 2) fucking DOS was written in C, and guess what, it uses \ as escape character. Fucking microsoft.
>
> > So, when you say fuck Python, are you sure you're shooting at the
> > right target?
>
> I agree. Fuck winDOS and fucking microsoft.

No. The choice to use backslash rather than slash is actually a good one, because slash is one of the useful chars, far more so than backslash. Users should be able to use it in file names.

i don't know the detailed history of the path separator, but if i were to blame, it's fuck unix. The entirety of unix, unix geek, unixers, unix fuckheads. Fuck unix.

〈On Unix Filename Characters Problem〉
http://xahlee.org/UnixResource_dir/w...ame_chars.html

〈On Unix File System's Case Sensitivity〉
http://xahlee.org/UnixResource_dir/_/fileCaseSens.html

〈UNIX Tar Problem: File Length Truncation, Unicode Name Support〉
http://xahlee.org/comp/unix_tar_problem.html

〈What Characters Are Not Allowed in File Names?〉
http://xahlee.org/mswin/allowed_char...ile_names.html

〈Unicode Support in File Names: Windows, Mac, Emacs, Unison, Rsync, USB, Zip〉
http://xahlee.org/mswin/unicode_support_file_names.html

〈The Nature of the Unix Philosophy〉
http://xahlee.org/UnixResource_dir/writ/unix_phil.html

Xah
>> Ok no problem. My sloppiness. After all, my implementation wasn't
>> portable. So, let's fix it. After a while, discovered there's the
>> os.sep. Ok, replace "/" to os.sep, done. Then, bang, all hell
>> went lose. Because, the backslash is used as escape in string, so any
>> regex that manipulate path got fucked majorly. So, now you need to
>> find a quoting mechanism.
>
> if os.altsep is not None:
> sep_re = '[%s%s]' % (os.sep, os.altsep)
> else:
> sep_re = '[%s]' % os.sep
>
> But really, you should be ranting about regexps rather than Python.
> They're convenient if you know exactly what you want to match, but a
> nuisance if you need to generate the expression based upon data which is
> only available at run-time (and re.escape() only solves one very specific
> problem).

It isn't a problem of regular expressions, but a problem of the syntax for specifying regular expressions (i.e. them being specified as a string).

The Common Lisp regex library cl-ppcre allows a regex to be specified via a parse tree.
E.g. "(foo[/\\]bar)" becomes

(:REGISTER (:SEQUENCE "foo" (:CHAR-CLASS #\/ #\\) "bar"))

This is more verbose, but totally unambiguous and requires no escaping.

So this definitely is a problem of Python's regex library, and a problem of lack of support for a nice parse tree representation in code.

cl-ppcre supports both textual perl-compatible regex specification and parse trees. I would start with a simple string specification, then when shit hits the fan I can call cl-ppcre:parse-string to get those parse trees and replace forward slash with back slash.

Moreover, I can automatically convert regexes:

(defun scan-auto/ (regex target-string)
  (let ((fixed-parse-tree (subst '(:char-class #\/ #\\)
                                 '(:char-class #\/)
                                 (cl-ppcre:parse-string regex)
                                 :test 'equal)))
    (cl-ppcre:scan-to-strings fixed-parse-tree target-string)))

CL-USER> (scan-auto/ "foo[/]bar" "foo\\bar")
"foo\\bar"
#()