On Thu, 12 Apr 2012 09:01:40 -0700 (PDT), John Reye
<jononanon@googlemail.com> wrote:

>Hi,
>
>recently I learned that various "standard C library" functions have
>deficiencies in terms of high performance.

Have you actually measured the performance impact of calling strlen
after each fgets?

>Examples include the return values of fgets(), strcpy(), strcat()
>(thanks Eric Sosman for mentioning the last 2)
>
>Example: char * strcat ( char * destination, const char *
>source );
>Return Value: destination is returned.
>
>How useless is that! I already know the destination, in the first
>place.

Useless is in the eye of the beholder.

    char buffer[100] = "The input is ";
    puts(strcat(buffer, flag ? "valid" : "invalid"));

or

    strcat(strcpy(new_large_buffer, original_text_in_small_buffer),
           additional_text);

>Why not return a pointer to the end of the concatenated string, or the
>size of the string. This would cost the library no extra performance
>cost whatsoever!

Was that true 25+ years ago when the language was being designed?

>Are these deficiencies _only_ string-related?
>Is there no good library which I can use for optimal computing?

Any library where fgets returns something other than the destination
pointer would be non-portable in the extreme. Some might be tempted
to say it is not even C.

The approach taken by some compiler writers when they need to tweak a
standard function (such as add an extra parameter to make it safer) is
to create a new function with a similar name that is reserved for the
implementation (such as prepending an _ or appending a _s). Then the
user has the choice of using the standard function or the system
specific extension.

>Because if I use the "standard C library" I'll end up _not_ using many
>of its routines.

So you would rather loop through fgetc yourself? You might want to
take one of your programs which is suffering from this fgets deficiency
and replace the calls to fgets and strlen with the fgetc loop and
determine the real benefit.

Or are you planning to rewrite all the standard functions you don't
like? For how many different systems will you do this? And in what
newsgroup will you discuss any problems you run into using these
functions?

>Where is a library, that is done right??

Not for any C system.

>Would this library only be a string-handling library, or buffer-
>handling library? Or are there other parts of the "standard C library"
>that are also deficient.

Only you can tell what is deficient.

>Surely there must be a good library somewhere, or do all C programmers
>really carry their own routines with them?

Most suffer these horrendous performance problems in silence by using
the standard library. Others have been known to write their own
wrapper functions to encapsulate the standard functions. But this is
usually done to add functionality (such as common error checking)
rather than deal with undocumented performance issues.

--
Remove del for email
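
The fgetc loop suggested above can be sketched roughly as follows. This is only an illustration: read_line_len is a hypothetical name, not a standard function, and the sketch does not distinguish end of file from a read error (a caller would check feof() or ferror() after a zero return). Whether it actually beats fgets followed by strlen is exactly what would have to be measured.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: reads one line much like fgets(), but returns the
 * number of characters stored, so the caller need not call strlen().
 * Returns 0 when nothing could be read or when size is too small. */
size_t read_line_len(char *buf, size_t size, FILE *fp)
{
    size_t n = 0;
    int c;

    if (size == 0)
        return 0;
    /* Leave room for the terminating null character. */
    while (n + 1 < size && (c = fgetc(fp)) != EOF) {
        buf[n++] = (char)c;
        if (c == '\n')
            break;          /* keep the newline, as fgets() does */
    }
    buf[n] = '\0';
    return n;
}
```

A caller would use it as `len = read_line_len(line, sizeof line, stdin);` and treat a return of 0 as "no more input".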
|
|||
|
On 04/12/2012 02:32 PM, John Reye wrote:
> On Apr 12, 6:38 pm, ImpalerCore <jadil...@gmail.com> wrote:
>> 1. Make it work.
>> 2. Make it pretty. (well designed API, documentation)
>> 3. Make it robust. (handle errors gracefully)
>> 4. Make it fast. (if you really need to)
>
> 1. Consider what you need: does it need to be fast? is it a throw-away
> program? is it part of a large project?
> You only need one rule. You cannot generalize, cause it always...
> depends.

"Make it work" is always first - if you don't need the program to work
correctly, you don't need to write a new program; there's plenty of
existing programs that already don't do whatever it is that the new
program was supposed to do.
|
|||
|
James Kuyper wrote:
) On 04/12/2012 02:32 PM, John Reye wrote:
)> On Apr 12, 6:38 pm, ImpalerCore <jadil...@gmail.com> wrote:
)>> 1. Make it work.
)>> 2. Make it pretty. (well designed API, documentation)
)>> 3. Make it robust. (handle errors gracefully)
)>> 4. Make it fast. (if you really need to)
)>
)> 1. Consider what you need: does it need to be fast? is it a throw-away
)> program? is it part of a large project?
)> You only need one rule. You cannot generalize, cause it always...
)> depends.
)
) "Make it work" is always first - if you don't need the program to work
) correctly, you don't need to write a new program; there's plenty of
) existing programs that already don't do whatever it is that the new
) program was supposed to do.

That's the kind of thinking that brought us 2-megabyte XML blobs
as "database entities". Some stuff you just can't refactor when
it turns out it's a performance killer, so you *have* to account
for that from the beginning. Not to mention 'if it works, ship'

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
|
|||
|
Barry Schwarz wrote:
) On Thu, 12 Apr 2012 09:01:40 -0700 (PDT), John Reye
)<jononanon@googlemail.com> wrote:
)
)>Hi,
)>
)>recently I learned that various "standard C library" functions have
)>deficiencies in terms of high performance.
)
) Have you actually measured the performance impact of calling strlen
) after each fgets?
)
)>Examples include the return values of fgets(), strcpy(), strcat()
)>(thanks Eric Sosman for mentioning the last 2)
)>
)>Example: char * strcat ( char * destination, const char *
)>source );
)>Return Value: destination is returned.
)>
)>How useless is that! I already know the destination, in the first
)>place.
)
) Useless is in the eye of the beholder.
)
)     char buffer[100] = "The input is ";
)     puts(strcat(buffer, flag ? "valid" : "invalid"));

Not much difference there:

    char buffer[100] = "The input is ";
    strcat(buffer, flag ? "valid" : "invalid");
    puts(buffer);

You only have to type the variable name twice. That's a very minor
inconvenience. So it's marginally useful. Returning a pointer to the
nul terminator would be a lot more useful.

) or
)
)     strcat(strcpy(new_large_buffer,
)                   original_text_in_small_buffer),
)            additional_text);

If strcpy and strcat returned a pointer to the nul character at the end
of the string, that would still work, *and* it would be a lot more
efficient. I've seen code where they build a giant string by repeatedly
strcat()ing a few words on the end. That's needless O(n^2) performance
right there.

)>Why not return a pointer to the end of the concatenated string, or the
)>size of the string. This would cost the library no extra performance
)>cost whatsoever!
)
) Was that true 25+ years ago when the language was being designed?

Yes. Duh. It's *NO* performance loss. Not negligible, but zero. Nil.
The library just copied a number of characters to somewhere else.
And it added a trailing zero to boot. So it knows where that zero went.

)>Are these deficiencies _only_ string-related?
)>Is there no good library which I can use for optimal computing?
)
) Any library where fgets returns something other than the destination
) pointer would be non-portable in the extreme. Some might be tempted
) to say it is not even C.
)
) The approach taken by some compiler writers when they need to tweak a
) standard function (such as add an extra parameter to make it safer) is
) to create a new function with a similar name that is reserved for the
) implementation (such as prepending an _ or appending a _s). Then the
) user has the choice of using the standard function or the system
) specific extension.

I think GNU libc has a few of those. It would have been an easy fix to
accept those extra functions into the C standard interface, and then
everybody would have been happy.

)>Surely there must be a good library somewhere, or do all C programmers
)>really carry their own routines with them?
)
) Most suffer these horrendous performance problems in silence by using
) the standard library. Others have been known to write their own
) wrapper functions to encapsulate the standard functions. But this is
) usually done to add functionality (such as common error checking)
) rather than deal with undocumented performance issues.

The best approach I've seen is to have the build script check for
platform-specific extensions (such as strlcpy and strlcat), and if
those don't exist, include wrappers which emulate their behaviour.

I believe that these functions return the original pointer because the
language designers had some half-assed idea about viewing strings as
some kind of opaque type that you could manipulate through functions.
(I.e. conceptually, strcat takes two strings and returns one string,
if you conveniently forget that the first string is changed, and also
needs to have enough memory backing it to hold the whole string.)

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
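
The build-script approach mentioned above might look something like the sketch below. It is an assumption-laden illustration, not code from any particular project: HAVE_STRLCPY stands for whatever macro the build script would define when the platform supplies strlcpy, and compat_strlcpy is a hypothetical name chosen to avoid clashing with a system declaration. The behaviour follows the usual strlcpy contract: copy at most size-1 characters, always terminate when size is nonzero, and return the length of the source so truncation can be detected.

```c
#include <stddef.h>
#include <string.h>

/* HAVE_STRLCPY is a hypothetical macro that a configure-style build
 * script would define when the platform already provides strlcpy(). */
#ifndef HAVE_STRLCPY
/* Hypothetical fallback emulating strlcpy(): copies at most size-1
 * characters, terminates the result when size > 0, and returns
 * strlen(src); a return value >= size means the copy was truncated. */
size_t compat_strlcpy(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);

    if (size > 0) {
        size_t n = (len < size - 1) ? len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}
#endif
```

Callers would then reach the function through a single project-wide name (for example a macro that expands to either strlcpy or compat_strlcpy), so the rest of the code does not care which one was selected.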
|
|||
|
On 04/12/2012 02:30 PM, John Reye wrote:
> On Apr 12, 6:50 pm, James Kuyper <jameskuy...@verizon.net> wrote:
[>> John Reye wrote:]
>>> Is there no good library which I can use for optimal computing?
>>
>> I think that you can safely assume that no library you've ever heard of,
>> and no library that you ever will hear of, is perfectly optimal.
> What about a library that has better interfaces.

I guarantee you that the library with better interfaces will still be
sub-optimal.

>> I almost never write my own routines for any purpose for which a C
>> standard library routine can be made to serve, even if the library
>> routine is a less-than perfect fit to my desires. The main exception is
>> that I will often manually inline code that is roughly equivalent to
>> some of the standard str*() and mem*() functions, especially strtok().
>> For the other str*() and mem*() functions, I'll usually inline the code
>> only if I need a pointer to the end of the memory that was processed,
>> something that the standard library functions don't provide. The
>> in-lining is generally trivial for most of those functions.
>
> Could you perhaps give a brief example of what you mean with inlining,
> here.

Here's an example of code to concatenate some strings:

    outstring[0] = '\0';
    for(str=0; str < num_strings; str++)
        strcat(outstring, in_strings[str]);

My inline equivalent would be as follows:

    char *p = outstring;
    for(str=0; str < num_strings; str++)
    {
        const char *q = in_strings[str];
        while(*p++ = *q++);
        p--; // move back to terminating null character
    }

It's slightly more wordy, but keeps track of the current end of the
string, avoiding the time wasted by strcat() searching for the end of
the string. It's a minor issue, and I wouldn't bother with it if it were
significantly more difficult to do. Most of the str*() and mem*()
functions are easy to inline.

Here's some code in a utility library that I inherited responsibility
for several years ago. It caused problems when called inside a strtok()
loop in the calling routine:

    if((p=strtok(local_string, ","))!=NULL){
        do{
            /* Byte_sum = MFAIL if datatype_to_DFNT fails. Get
               HDF_num_type for p */
            if((HDF_num_type=datatype_to_DFNT(p))==MFAIL)byte_sum=MFAIL;
            /* If not MFAIL, increment byte_sum by the number of bytes
               returned from DFKNTsize */
            else byte_sum=byte_sum+(long int)DFKNTsize(HDF_num_type);
        }while((p=strtok(NULL,","))!=NULL && byte_sum!=MFAIL);
    }

To resolve this problem, a subordinate re-wrote the code, following my
advice, as follows:

    while (local_string != NULL && byte_sum != MFAIL)
    {
        next_string = strchr(local_string, ',');
        if (next_string)
            *next_string++ = '\0';
        HDF_num_type = datatype_to_DFNT(local_string);
        if (HDF_num_type==MFAIL)
            byte_sum = MFAIL;
        else
            byte_sum += (long int)DFKNTsize(HDF_num_type);
        local_string = next_string;
    }
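
The strchr() technique in the rewrite above can be isolated into a small self-contained sketch. The function name split_commas and its interface are hypothetical, introduced only for illustration. Because it relies on strchr() alone, it keeps no hidden state and so cannot interfere with a strtok() loop in a calling routine. One behavioural difference is worth noting: unlike strtok(), it preserves empty fields.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits s in place at each comma and stores up to
 * max field pointers in out.  Returns the number of fields stored.
 * The input string is modified: each comma found is overwritten with a
 * null character. */
size_t split_commas(char *s, char **out, size_t max)
{
    size_t n = 0;

    while (s != NULL && n < max) {
        char *next = strchr(s, ',');
        if (next != NULL)
            *next++ = '\0';   /* terminate this field, step past comma */
        out[n++] = s;
        s = next;             /* NULL once the last field is consumed */
    }
    return n;
}
```

All of the state lives in the local variables s and next, which is why nesting it inside some other tokenizing loop is safe.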
|
|||
|
"John Reye" <jononanon@googlemail.com> wrote in message
news:d965f1e6-f918-4c5b-88f8-b5c09644faef@f5g2000vby.googlegroups.com...
> Hi,
>
> recently I learned that various "standard C library" functions have
> deficiencies in terms of high performance.
> Examples include the return values of fgets(), strcpy(), strcat()
> (thanks Eric Sosman for mentioning the last 2)
>
> Example: char * strcat ( char * destination, const char *
> source );
> Return Value: destination is returned.
>
> How useless is that! I already know the destination, in the first
> place.
> Why not return a pointer to the end of the concatenated string, or the
> size of the string. This would cost the library no extra performance
> cost whatsoever!
>
>
> Are these deficiencies _only_ string-related?

It seems that way. But no-one is bothered about it, because writing
alternative versions (thin wrappers around standard functions) is trivial:

    int strcat_l(char *s,char *t,int slen,int tlen){
        if (slen<0) slen=strlen(s);
        if (tlen<0) tlen=strlen(t);
        memcpy(s+slen,t,tlen+1);
        return slen+tlen;
    }

    int strcpy_l(char *s,char *t,int tlen){
        if (tlen<0) tlen=strlen(t);
        memcpy(s,t,tlen+1);
        return tlen;
    }

Etc.

(In this style that I sometimes use, you can optionally supply a string
length if you know it, otherwise -1 is passed. It's not hard to make them
faster than the standard functions. (Except when the lengths aren't known;
then it would probably be better for these to just call the standard
versions, instead of messing with strlen(). But I tend to use this form for
other kinds of string processing where a standard version doesn't exist.))

--
Bartc
|
|||
|
On Thursday, 12 April 2012 19:24:30 UTC+1, John Reye wrote:
>
> Well implemented interfaces, will "almost automatically" make you
> write good code.
> But badly designed interfaces, will make you call strlen and all kinds
> of other functions that you wouldn't have to call, if the functions-
> interfaces had been designed in a better manner.
>

There's often a tension between performance and ease of use, however.

For instance my options parser allows the programmer to take in a string,
typically a file name, from the command line. Command line arguments are
modifiable, so the OS has almost certainly copied the data from the console
input into a temporary buffer. Then the options parser takes a local copy
of the command line arguments. Then it makes another copy when the caller
requests the program. So we end up, probably in main, with a char pointer
we could have had with a simple

    ptr = argv[index];

However I made the judgement that it's easier that way. Then you don't have
to document that argv must not be modified, or that if the programmer
requests two copies of the filename (unlikely) that they are aliases. And
how much is all this shuffling about of data in memory likely to impact
performance?

--
The options parser and much more, on my website
http://www.malcolmmclean.site11.com/www
|
|||
|
On 04/12/2012 04:07 PM, BartC wrote:
> "John Reye" <jononanon@googlemail.com> wrote in message
> news:d965f1e6-f918-4c5b-88f8-b5c09644faef@f5g2000vby.googlegroups.com...
....
>> Example: char * strcat ( char * destination, const char *
>> source );
>> Return Value: destination is returned.
>>
>> How useless is that! I already know the destination, in the first
>> place.
>> Why not return a pointer to the end of the concatenated string, or the
>> size of the string. This would cost the library no extra performance
>> cost whatsoever!
>>
>>
>> Are these deficiencies _only_ string-related?
>
> It seems that way. But no-one is bothered about it, because writing
> alternative versions (thin wrappers around standard functions) is trivial:
,,,
> int strcpy_l(char *s,char *t,int tlen){
> if (tlen<0) tlen=strlen(t);
> memcpy(s,t,tlen+1);
> return tlen;
> }

Your wrapper version passes through the input array twice: once for
strlen() and once for memcpy(). That's precisely the deficiency he wants
to avoid. A minor change to the implementation of strcpy() would allow
it to return the desired pointer at no extra cost - the cost for your
wrapper is fairly high.

--
James Kuyper
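
The "minor change" described above can be sketched as a single-pass copy that returns a pointer to the terminator it wrote. The name copy_end is hypothetical; the behaviour matches what POSIX specifies for stpcpy(), which is not part of standard C. This is an illustration of the idea, not a claim about how any real library implements strcpy().

```c
#include <stddef.h>

/* Hypothetical name.  Copies src, including its terminating null
 * character, into dst in one pass and returns a pointer to the
 * terminator written in dst, i.e. the new end of the string. */
char *copy_end(char *dst, const char *src)
{
    while ((*dst = *src++) != '\0')
        dst++;
    return dst;
}
```

Because each call returns the new end of the string, repeated appends can be chained as `end = copy_end(end, next_piece);` without rescanning what has already been written, which avoids the quadratic cost of repeated strcat() calls.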
|
|||
|
On Apr 12, 2:13 pm, John Reye <jonona...@googlemail.com> wrote:
> > It's so you can use it in subsequent calls e.g.
>
> > puts(strcat(foo, bar));
>
> I'd rather use
> strcat(foo, bar);
> puts(foo);
>
> and have a well-designed interface of strcat, to return something
> useful.

Nothing stopping you from writing portable replacement functions. No
need to bitch on USENET about it.

Tom
|
|||
|
On 4/12/2012 12:01 PM, John Reye wrote:
> Hi,
>
> recently I learned that various "standard C library" functions have
> deficiencies in terms of high performance.

Recently I learned that you have a bee in your bonnet. No, hold it,
calm down, rewind: I'm not saying it's your fault the bee is there;
it may have flown in of its own accord. But your best bet is to
shake it out of your hat and settle down before your blood pressure
gets any higher.

Despite its infelicities, I see little in the C library design
that forces implementations or callers to perform poorly. Certainly,
there are some interfaces that could, if revised, admit of higher-performing
implementations. For example, malloc() is a fairly "narrow" interface,
and one can imagine a super_malloc() to which one could pass information
about the expected lifetime of the allocation, its affinity or
disaffinity to other allocations, whether it's more likely to be used
by the GPU or the DMA chip, and so on. But one can also see that such
a super_malloc() could be rather messy to use:

    char *ptr = malloc(42);

vs.

    super_malloc_options opt = { 0 };
    opt.request_size = 42;
    opt.growth_allowance = 42 * 6;
    opt.lifetime_hint = SUPER_MALLOC_LIFETIME_BRIEF;
    opt.cache_choice = SUPER_MALLOC_CACHE_AVOID(&blivet);
    #if __STDC_VERSION__ > 201306L
    opt.address_randomization =
    #ifdef __WINDOWS__
        __MS_RANDOMIZE_IF_ABLE__
    #else
        SUPER_MALLOC_RANDOMIZE_IF_ABLE
    #endif
        ;
    #endif
    char *ptr = super_malloc(&opt);

There is no doubt that an interface along the lines of the latter
could outperform the familiar malloc() in at least some situations.
But which interface would you, as a code writer, prefer to use? Which
would you expect to require more debugging time?

I reiterate my earlier question about your keyboard: Are you still
using nineteenth-century QWERTY? I'm not saying that you are wrong
(or right) to do so, just asking you to examine the reasons for your
choice, and to reflect on those same reasons as they may apply to
other standardized interfaces.

> Examples include the return values of fgets(), strcpy(), strcat()
> (thanks Eric Sosman for mentioning the last 2)

You're welcome, I am sure. But don't read so much into it:
In hindsight we may wish that some choices had been made differently,
but that's no reason to go ballistic.

> Surely there must be a good library somewhere, or do all C programmers
> really carry their own routines with them?

That's a complicated question I'm too weary to address just
now. Ponder it for yourself, and you may discover some useful
answers, or if not answers, insights. And f'gawd'sake relax!

--
Eric Sosman
esosman@ieee-dot-org.invalid
|
|||
|
"James Kuyper" <jameskuyper@verizon.net> wrote in message
news:jm7sc1$18n$1@dont-email.me...
> On 04/12/2012 04:07 PM, BartC wrote:
>> "John Reye" <jononanon@googlemail.com> wrote in message
>> news:d965f1e6-f918-4c5b-88f8-b5c09644faef@f5g2000vby.googlegroups.com...
> ...
>>> Example: char * strcat ( char * destination, const char *
>>> source );
>>> Return Value: destination is returned.
>>>
>>> How useless is that! I already know the destination, in the first
>>> place.
>>> Why not return a pointer to the end of the concatenated string, or the
>>> size of the string. This would cost the library no extra performance
>>> cost whatsoever!
>>>
>>>
>>> Are these deficiencies _only_ string-related?
>>
>> It seems that way. But no-one is bothered about it, because writing
>> alternative versions (thin wrappers around standard functions) is
>> trivial:
> ,,,
>> int strcpy_l(char *s,char *t,int tlen){
>> if (tlen<0) tlen=strlen(t);
>> memcpy(s,t,tlen+1);
>> return tlen;
>> }
>
> Your wrapper version passes through the input array twice: once for
> strlen() and once for memcpy().

Yes, I mentioned my versions work better when the caller knows the lengths.

> That's precisely the deficiency he wants
> to avoid. A minor change to the implementation of strcpy() would allow
> it to return the desired pointer at no extra cost - the cost for your
> wrapper is fairly high.

My point was that wrappers like this can be trivial to write: that
length-based strcpy_l() function took only a minute or two. And instantly it
can already be faster than the standard function.

Being able to deal also with cases where the lengths are not known, is an
extra feature provided for convenience; it can be a bit slower, but it also
*returns* the length of the string, so it might save the caller having to
call strlen!

The wrapper functions can have other interfaces too.

--
Bartc
|
|||
|
On Thursday, April 12, 2012 7:51:36 PM UTC+1, Willem wrote:
> James Kuyper wrote:
> ) On 04/12/2012 02:32 PM, John Reye wrote:
> )> On Apr 12, 6:38 pm, ImpalerCore <jadil...@gmail.com> wrote:
> )>> 1. Make it work.
> )>> 2. Make it pretty. (well designed API, documentation)
> )>> 3. Make it robust. (handle errors gracefully)
> )>> 4. Make it fast. (if you really need to)
> )
> ) "Make it work" is always first -

tho' there are degrees of "work". A fast but inaccurate answer may be
better than a "perfectly correct" answer. Sometimes "in good time" is an
essential part of the requirement. Your car's anti-lock brakes can't spend
hours or minutes deciding the best time to put the brakes on (or take them
off).

> ) if you don't need the program to work
> ) correctly, you don't need to write a new program; there's plenty of
> ) existing programs that already don't do whatever it is that the new
> ) program was supposed to do.
>
> That's the kind of thinking that brought us 2-megabyte XML blobs
> as "database entities". Some stuff you just can't refactor when
> it turns out it's a performance killer, so you *have* to account
> for that from the beginning. Not to mention 'if it works, ship'

"don't prematurely optimise" is not an excuse to "prematurely pessimize".
Besides if it's been properly designed (see Rule 2) the XML blob will be
hidden behind an abstraction and a real database be shoved in with little
trouble (I'm aware swopping databases isn't quite as easy as this - but it
should be!).

There are other things that can't be added on afterwards, e.g. security.
|
|||
|
On Thursday, April 12, 2012 7:00:19 PM UTC+1, Kaz Kylheku wrote:
> On 2012-04-12, Noob <root@127.0.0.1> wrote:
> > John Reye wrote:

<snip>

> > "Premature optimization is the root of all evil"
>
> If you're designing a programming language, and working with an interpreted
> implementation for the time being, is it a case of premature optimization to be
> concerned with that the language features will be suitable for compilation?

The Extreme Programmers would say so. Their form of Future Blindness
precludes thinking of anything beyond passing the current test case. Doing
so is Big Design Up Front (BDUF).
|
|||
|
On 04/13/2012 06:38 AM, BartC wrote:
> "James Kuyper" <jameskuyper@verizon.net> wrote in message
> news:jm7sc1$18n$1@dont-email.me...
>> On 04/12/2012 04:07 PM, BartC wrote:
>>> "John Reye" <jononanon@googlemail.com> wrote in message
>>> news:d965f1e6-f918-4c5b-88f8-b5c09644faef@f5g2000vby.googlegroups.com...
>> ...
>>>> Example: char * strcat ( char * destination, const char *
>>>> source );
>>>> Return Value: destination is returned.
>>>>
>>>> How useless is that! I already know the destination, in the first
>>>> place.
>>>> Why not return a pointer to the end of the concatenated string, or the
>>>> size of the string. This would cost the library no extra performance
>>>> cost whatsoever!
>>>>
>>>>
>>>> Are these deficiencies _only_ string-related?
>>>
>>> It seems that way. But no-one is bothered about it, because writing
>>> alternative versions (thin wrappers around standard functions) is
>>> trivial:
>> ,,,
>>> int strcpy_l(char *s,char *t,int tlen){
>>> if (tlen<0) tlen=strlen(t);
>>> memcpy(s,t,tlen+1);
>>> return tlen;
>>> }
>>
>> Your wrapper version passes through the input array twice: once for
>> strlen() and once for memcpy().
>
> Yes, I mentioned my versions work better when the caller knows the lengths.

Which I consider a very minor advantage; if the length is known, just
call memcpy() directly. If it's not known, your version is slower than
strcpy(), and no faster than strcpy() followed by strlen().

>> That's precisely the deficiency he wants
>> to avoid. A minor change to the implementation of strcpy() would allow
>> it to return the desired pointer at no extra cost - the cost for your
>> wrapper is fairly high.
>
> My point was that wrappers like this can be trivial to write: that
> length-based strcpy_l() function took only a minute or two. And instantly it
> can already be faster than the standard function.

With only a little extra work, you can write your own strcpy_l() that
doesn't call standard library functions, passes through the array only
once, and is likely (with a decent compiler) to compile into code with
speed comparable to that of strcpy() itself, but with the added
advantage of returning a value which contains more useful information
than that returned by strcpy().

I wouldn't bother writing such a routine, because it's trivial to inline
the code manually; but it would have been slightly more convenient if
strcpy() had been defined to work that way in the first place. It's not
a sufficiently big issue to justify John Reye's overreaction, but it is
a real one.
|
|||
|
nick_keighley_nospam@hotmail.com wrote:
) On Thursday, April 12, 2012 7:51:36 PM UTC+1, Willem wrote:
)> That's the kind of thinking that brought us 2-megabyte XML blobs
)> as "database entities". Some stuff you just can't refactor when
)> it turns out it's a performance killer, so you *have* to account
)> for that from the beginning. Not to mention 'if it works, ship'
)
) "don't prematurely optimise" is not an excuse to "prematurely pessimize".

I understand 'pessimize' to be to deliberately make something pessimal.

What I guess you're trying to say is something like:
"Do prematurely try to get in the right ballpark for optimality."
But that's just a guess.

) Besides if it's been properly designed (see Rule 2) the XML blob will be
) hidden behind an abstraction and a real database be shoved in with little
) trouble (I'm aware swopping databases isn't quite as easy as this - but it
) should be!).

Uh yeah, no. You see, the 'abstraction' is the source of the problem.
Especially if it's an object oriented abstraction. OO and databases
really do not mix.

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT