  #16 (permalink)  
Old 04-11-2012, 11:47 PM
William Ahern
Guest
 
Posts: n/a
Default Re: fgets - design deficiency: no efficient way of finding last character read

James Kuyper <jameskuyper@verizon.net> wrote:
> On 04/11/2012 07:07 PM, John Reye wrote:
> > Even though James Kuyper showed a nice way of determining if the
> > string contains '\n', I still feel that fgets has a RETURN VALUE that
> > simply shouts "deficiency!".
> >
> > char * fgets ( char * str, int num, FILE * stream );
> > Return Value
> > On success, the function returns the same str parameter. etc.
> >
> > Why on earth return an identical pointer most of the time???
> > Returning a count of the number of bytes read would have been a far
> > better choice for the return value, wouldn't it?


> Many of the C standard library functions would have been more useful if
> they'd returned a pointer to the end of a string or buffer, rather than
> to its beginning. I chalk it up to inexperience (with C, that is) by the
> people who invented C. A decent respect for the need to retain backwards
> compatibility means that we can't undo those bad design decisions - but
> that doesn't prevent the creation of new functions with similar
> functionality and a more useful return value.


The designer(s) of fgets() may have been backward looking instead of forward
looking; not intent on making a composable routine--which works well with ad
hoc buffer parsing code--but rather one which works conveniently with the
pre-existing string routines--i.e. read a string then pass that string to
some other string routine which will lazily determine string length while
processing it.

  #17 (permalink)  
Old 04-12-2012, 12:23 AM
BartC
Guest
 
Posts: n/a
Default Re: fgets - design deficiency: no efficient way of finding last character read

"William Ahern" <william@wilbur.25thandClement.com> wrote in message
news:6phh59-6ja.ln1@wilbur.25thandClement.com...
> James Kuyper <jameskuyper@verizon.net> wrote:
>> On 04/11/2012 07:07 PM, John Reye wrote:


>> > Why on earth return an identical pointer most of the time???
>> > Returning a count of the number of bytes read would have been a far
>> > better choice for the return value, wouldn't it?

>
>> Many of the C standard library functions would have been more useful if
>> they'd returned a pointer to the end of a string or buffer, rather than
>> to its beginning. I chalk it up to inexperience (with C, that is) by the
>> people who invented C. A decent respect for the need to retain backwards
>> compatibility means that we can't undo those bad design decisions - but
>> that doesn't prevent the creation of new functions with similar
>> functionality and a more useful return value.

>
> The designer(s) of fgets() may have been backward looking instead of
> forward
> looking; not intent on making a composable routine--which works well with
> ad
> hoc buffer parsing code--but rather one which works conveniently with the
> pre-existing string routines--i.e. read a string then pass that string to
> some other string routine which will lazily determine string length while
> processing it.


Except that fgets() can return NULL on error. That makes it harder to use
the return value unchecked.

--
Bartc

  #18 (permalink)  
Old 04-12-2012, 12:47 AM
William Ahern
Guest
 
Posts: n/a
Default Re: fgets - design deficiency: no efficient way of finding last character read

BartC <bc@freeuk.com> wrote:
> "William Ahern" <william@wilbur.25thandClement.com> wrote in message
> news:6phh59-6ja.ln1@wilbur.25thandClement.com...

<snip>
> > The designer(s) of fgets() may have been backward looking instead of
> > forward looking; not intent on making a composable routine--which works
> > well with ad hoc buffer parsing code--but rather one which works
> > conveniently with the pre-existing string routines--i.e. read a string
> > then pass that string to some other string routine which will lazily
> > determine string length while processing it.


> Except that fgets() can return NULL on error. That makes it harder to use
> the return value unchecked.


I didn't mean _literally_ read and pass to another routine, without checking
the return value. But, point taken. Although, IME the typical usage is
`while (fgets()) { ... }', which makes it convenient to use in the very
common cases where I/O errors are ignored or treated similarly to EOF.
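For reference, a minimal sketch of that idiom. The name count_chunks and the 256-byte buffer are assumptions made up for illustration; the only point is that the loop itself cannot tell EOF from an I/O error, so a caller who cares has to ask ferror() afterwards.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: counts how many times fgets() fills the buffer
 * before it returns NULL. A line longer than the buffer is counted
 * more than once, because fgets() hands it back in pieces. */
static long count_chunks(FILE *fp, int *had_error)
{
    char buf[256];
    long n = 0;

    assert(fp != NULL);
    while (fgets(buf, sizeof buf, fp) != NULL)
        n++;                      /* buf holds a line, or part of a long one */

    /* NULL means "EOF or error"; only ferror() distinguishes the two. */
    if (had_error != NULL)
        *had_error = (ferror(fp) != 0);
    return n;
}
```

Passing NULL for had_error gives the "treat errors like EOF" behaviour described above.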

  #19 (permalink)  
Old 04-12-2012, 02:50 AM
Eric Sosman
Guest
 
Posts: n/a
Default Re: fgets - design deficiency: no efficient way of finding last character read

On 4/11/2012 12:43 PM, John Reye wrote:
> Hello,
>
> The last character read from fgets(buf, sizeof(buf), inputstream) is:
> '\n'
> OR
> any character x, when no '\n' was encountered in sizeof(buf)-1
> consecutive chars, or when x is the last char of the inputstream
>
> ***How can one EFFICIENTLY determine if the last character is '\n'??
> "Efficiently" means: don't use strlen!!!


Kaz' method is pretty slick. However, the time for strlen() is
likely to be insignificant compared to the time for the I/O itself.
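The plain strlen() approach being weighed here could look roughly like the sketch below. This is not Kaz's method, just the straightforward check; the name ends_with_newline is made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the string that fgets() stored in buf ends with '\n',
 * 0 otherwise (including for an empty string). Costs one pass over
 * the data that fgets() has just written. */
static int ends_with_newline(const char *buf)
{
    size_t len;

    assert(buf != NULL);
    len = strlen(buf);
    return len > 0 && buf[len - 1] == '\n';
}
```

Typical use would be `if (fgets(buf, sizeof buf, fp) && !ends_with_newline(buf))` to detect a truncated line or a final line without a newline.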

> A well-designed fgets function should return the length of characters
> read, should it not??


IMHO that would be a more useful return value than the one fgets()
actually delivers, but this is scarcely the only unfortunate choice
to be found in the Standard library. For example, strcpy() and strcat()
"know" where their output strings end and could return that information
instead of echoing back a value the caller already has. In another
thread we've just rehashed the gotchas of <ctype.h> for the umpty-
skillionth time. No doubt other folks have their own pet peeves.
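As an illustration of the strcpy() point, here is a sketch of a copy routine that returns the end of its output instead of the start. The name copy_end is hypothetical; POSIX stpcpy() behaves along these lines where it is available.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies src (including its terminator) to dst
 * and returns a pointer to the terminating '\0' in dst, so the next
 * append can start there without rescanning the string. */
static char *copy_end(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    while ((*dst = *src) != '\0') {
        dst++;
        src++;
    }
    return dst;
}
```

Chained calls such as `end = copy_end(copy_end(out, "foo"), "bar");` then build a string in one pass, where repeated strcat() calls would rescan it from the beginning each time.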

Tell me, though: Are you using a QWERTY keyboard, despite all its
drawbacks? Legend[*] has it that QWERTY was chosen *on purpose* to
slow down typists in the days when too much speed led to mechanical
jams. On today's keyboards that's not a problem -- So, are you still
using a nineteenth-century keyboard layout? If so, ponder your reasons
for not changing to something more modern, and see if those reasons
shed any light on why people still put up with the Standard Warts And
All Library.
[*] Wikipedia disputes the legend, but a Wikipedia page is only
as good as its most recent editor.

--
Eric Sosman
esosman@ieee-dot-org.invalid
  #20 (permalink)  
Old 04-13-2012, 01:49 AM
Nobody
Guest
 
Posts: n/a
Default Re: fgets - design deficiency: no efficient way of finding last character read

On Wed, 11 Apr 2012 15:14:26 -0700, John Reye wrote:

> So fread() is more efficient than continuous getc(). Or am I wrong?


Maybe, maybe not. getc() is allowed to be implemented as a macro, so
a getc() loop could end up as little more than memcpy().

However: if the C library is thread-safe (which may be a compiler option),
it will end up locking the stream for each call, which will definitely be
worse than a single fread().

In GNU libc 1.x, getc was a light-weight macro. This changed in 2.x due to
thread safety, but it has _unlocked versions of many of the stdio
functions, e.g. fgetc_unlocked:

// libio.h:

#define _IO_getc_unlocked(_fp) \
        (_IO_BE ((_fp)->_IO_read_ptr >= (_fp)->_IO_read_end, 0) \
         ? __uflow (_fp) : *(unsigned char *) (_fp)->_IO_read_ptr++)

// bits/stdio.h:

# ifdef __USE_MISC
/* Faster version when locking is not necessary. */
__STDIO_INLINE int
getc_unlocked (FILE *__fp)
{
  return _IO_getc_unlocked (__fp);
}
# endif /* misc */

With the right switches (e.g. disabling thread safety or
-Dgetc=getc_unlocked) and sufficient optimisation, a getc() loop could
realistically be limited by memory bandwidth.
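A portable way to get the same effect on POSIX systems, without redefining getc, is to take the stream lock once and use getc_unlocked() inside the loop. The sketch below assumes a POSIX environment (flockfile, funlockfile and getc_unlocked are POSIX, not ISO C), and the name count_newlines is made up for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* request flockfile()/getc_unlocked() */
#endif
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: counts '\n' characters from the current
 * position to EOF, locking the stream once instead of per call. */
static long count_newlines(FILE *fp)
{
    long n = 0;
    int c;

    assert(fp != NULL);
    flockfile(fp);                          /* one lock for the whole loop */
    while ((c = getc_unlocked(fp)) != EOF) {
        if (c == '\n')
            n++;
    }
    funlockfile(fp);
    return n;
}
```

Whether this actually beats a single fread() depends on the implementation; it only removes the per-character locking cost mentioned above.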

  #21 (permalink)  
Old 04-13-2012, 02:08 AM
Nobody
Guest
 
Posts: n/a
Default Re: fgets - design deficiency: no efficient way of finding last character read

On Wed, 11 Apr 2012 22:50:10 -0400, Eric Sosman wrote:

> Tell me, though: Are you using a QWERTY keyboard, despite all its
> drawbacks? Legend[*] has it that QWERTY was chosen *on purpose* to
> slow down typists in the days when too much speed led to mechanical
> jams. On today's keyboards that's not a problem -- So, are you still
> using a nineteenth-century keyboard layout?


A related issue (which clearly isn't legend) is that nearly all computer
keyboards still have the staggered layout of a mechanical typewriter.

And unlike a completely different layout, eliminating the stagger would be
a fairly minor incompatibility (you'd still be using the same finger for
each letter).

  #22 (permalink)  
Old 04-13-2012, 03:31 AM
Ben Pfaff
Guest
 
Posts: n/a
Default Re: fgets - design deficiency: no efficient way of finding last character read

Eric Sosman <esosman@ieee-dot-org.invalid> writes:

> Tell me, though: Are you using a QWERTY keyboard, despite all its
> drawbacks? Legend[*] has it that QWERTY was chosen *on purpose* to
> slow down typists in the days when too much speed led to mechanical
> jams. On today's keyboards that's not a problem -- So, are you still
> using a nineteenth-century keyboard layout? If so, ponder your reasons
> for not changing to something more modern, and see if those reasons
> shed any light on why people still put up with the Standard Warts And
> All Library.


The same topic came up here back in 2002. Here's a new copy of
what I posted back then:

Have you used a mechanical typewriter? I have. These things have
an array of letterforms on spokes[1] arranged in a half-circular
pattern in the body of the typewriter. When you hit a key, one of
them lunges forward to the place where the letter should go (the
"cursor position") and strikes the paper through the ribbon.

Now, if there's only one of these spokes in motion, there's no
problem. But there's a mutual exclusion problem: if more than one
of them is in motion at once, e.g., one going out and another
coming back, then they'll hit one another and you'll have to take
a moment to disentangle them by hand, which is annoying and
possibly messy. It's a race condition that you will undoubtedly
be bitten by quickly in real typing.

The problem is exacerbated if the letterforms for common digraphs
have adjacent spokes. This is because the closer two spokes are,
the easier they can hit one another: if the spokes are at
opposite ends of the array, then they can only hit at the point
where they converge at the cursor, but if they are adjacent then
they'll hit as soon as they start moving.

One solution, of course, is to introduce serialization through
use of locking: allow only one key to be depressed at a
time. Unfortunately, that reduces parallelism, because many
digraphs that you want to type in the real world do not have
adjacent spokes, even if you just put the keys in alphabetical
order.

The adopted solution, of using a QWERTY layout, is not a real
solution to the problem. Instead, it reduces the chances of the
race condition by putting keys for common digraphs, and therefore
their spokes, far away from each other. You can still jam the
mechanism and have to untangle the spokes, but it happens less
often, at least for English text. This in fact helps you to type
*faster*, not slower, because you don't have to stop so often to
deal with jammed-together spokes.

To conclude: mechanical QWERTY typewriters are at the same time
an example of optimization for the common case and inherently
flawed because of the remaining race condition. This is a great
example of a tradeoff that you should not make when you design a
program!

[1] I don't know any of the proper vocabulary here. I was about 8
years old when I used the one we had at home, and it was thrown
out as obsolete soon after.
  #23 (permalink)  
Old 04-13-2012, 07:24 AM
santosh
Guest
 
Posts: n/a
Default Re: fgets - design deficiency: no efficient way of finding last character read

On Apr 13, 8:31 am, Ben Pfaff <b...@cs.stanford.edu> wrote:
> Eric Sosman <esos...@ieee-dot-org.invalid> writes:
> > Tell me, though: Are you using a QWERTY keyboard, despite all its
> > drawbacks? Legend[*] has it that QWERTY was chosen *on purpose* to
> > slow down typists in the days when too much speed led to mechanical
> > jams. On today's keyboards that's not a problem -- So, are you still
> > using a nineteenth-century keyboard layout? If so, ponder your reasons
> > for not changing to something more modern, and see if those reasons
> > shed any light on why people still put up with the Standard Warts And
> > All Library.

>
> The same topic came up here back in 2002. Here's a new copy of
> what I posted back then:
>
> Have you used a mechanical typewriter? I have. <snip>


Yes, we've still got one! A 1932 manufactured Remington. Used it a lot
during the 90s. It's still in excellent condition except that I'm
unable to acquire a ribbon anywhere.

<Good explanation!>
  #24 (permalink)  
Old 04-13-2012, 10:57 AM
BartC
Guest
 
Posts: n/a
Default Re: fgets - design deficiency: no efficient way of finding last character read



"santosh" <santosh.k83@gmail.com> wrote in message
news:2261d25f-a1f2-4413-b12d-e0c21fa74997@s10g2000pbc.googlegroups.com...
> On Apr 13, 8:31 am, Ben Pfaff <b...@cs.stanford.edu> wrote:
>> Eric Sosman <esos...@ieee-dot-org.invalid> writes:
>> > Tell me, though: Are you using a QWERTY keyboard, despite all its
>> > drawbacks? Legend[*] has it that QWERTY was chosen *on purpose* to
>> > slow down typists in the days when too much speed led to mechanical
>> > jams. On today's keyboards that's not a problem -- So, are you still
>> > using a nineteenth-century keyboard layout? If so, ponder your reasons
>> > for not changing to something more modern, and see if those reasons
>> > shed any light on why people still put up with the Standard Warts And
>> > All Library.

>>
>> The same topic came up here back in 2002. Here's a new copy of
>> what I posted back then:
>>
>> Have you used a mechanical typewriter? I have. <snip>

>
> Yes, we've still got one! A 1932 manufactured Remington. Used it a lot
> during the 90s. It's still in excellent condition except that I'm
> unable to acquire a ribbon anywhere.


I've got an Underwood No. 5 - from 1931. I use it for addressing envelopes,
as it would probably take a week of trial and error (and a wastepaper bin
full of trashed envelopes) to do it on my laser printer.

--
Bartc

  #25 (permalink)  
Old 04-13-2012, 02:39 PM
lawrence.jones@siemens.com
Guest
 
Posts: n/a
Default Re: fgets - design deficiency: no efficient way of finding last character read

Ben Pfaff <blp@cs.stanford.edu> wrote:
>
> To conclude: mechanical QWERTY typewriters are at the same time
> an example of optimization for the common case and inherently
> flawed because of the remaining race condition. This is a great
> example of a tradeoff that you should not make when you design a
> program!


I would assert that it's a great example of the kind of tradeoffs you
frequently *have* to make when designing programs! :-)
--
Larry Jones

Hello, local Navy recruitment office? Yes, this is an emergency... -- Calvin
  #26 (permalink)  
Old 04-13-2012, 02:51 PM
Jorgen Grahn
Guest
 
Posts: n/a
Default Re: fgets - design deficiency: no efficient way of finding last character read

On Wed, 2012-04-11, John Reye wrote:
> On Apr 11, 7:53 pm, Ben Pfaff <b...@cs.stanford.edu> wrote:
>> Rupert Swarbrick <rswarbr...@gmail.com> writes:
>> It's fairly common for machine-generated HTML and XML (which are
>> text-based formats) to be single, very-long lines.

>
> Correct, but I would not read those huge lines, because the '\n' is
> not the logical divider.


True; you use fgets() if you can expect lines to be reasonably short.
Even then you have to handle the case where one extra-long line doesn't
fit into your buffer.

For the original complaint, I can see many scenarios where you:

fgets()
parse the line (in whatever way your application needs)
during the parsing, discover the lack of an \n
try to fgets() again and restart the parsing
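One way to avoid restarting the parse is to keep calling fgets() into a growing buffer until the '\n' shows up. The following is only a sketch under stated assumptions: the name read_line and the initial capacity of 64 are made up, the input is assumed to contain no embedded '\0' bytes (strlen() would stop early), and lines are assumed to stay short enough that the remaining capacity fits in an int.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads one whole line, including the '\n' if
 * there is one, into a malloc'd buffer that the caller must free().
 * Returns NULL at EOF with nothing read, on I/O error, or when an
 * allocation fails. */
static char *read_line(FILE *fp)
{
    size_t cap = 64, len = 0;
    char *buf, *tmp;

    assert(fp != NULL);
    buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    while (fgets(buf + len, (int)(cap - len), fp) != NULL) {
        len += strlen(buf + len);         /* scan only the new piece */
        if (len > 0 && buf[len - 1] == '\n')
            return buf;                   /* got a complete line */

        /* No newline yet: buffer was full or EOF is next. Grow, retry. */
        tmp = realloc(buf, cap * 2);
        if (tmp == NULL) {
            free(buf);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }

    if (len > 0 && !ferror(fp))
        return buf;                       /* final line without '\n' */
    free(buf);
    return NULL;
}
```

Note that the strlen() call only ever looks at the piece fgets() has just stored, so the total scanning cost stays proportional to the line length.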

> I however want a nice routine (which I have to code myself), which
> uses realloc, to adjust a buffer to fit everything until the '\n'.
> C standard lib does not have anything like this - so I have to code it
> myself.
>
> I bet C++ has something useful that one could use. It seems that many
> went into C++, to make it the huge bloated monster that it is! But
> still seems worth a look, to relieve me from having to handle this
> stuff at the basic level.


This is the second time you've made this threat. You can't do *both*
that and insulting the C++ programmers here with things like "huge
bloated monster".

Anyway, I can't see that with your extreme attention to micro-
inefficiencies, you would be comfortable with C++ -- it solves these
things with dynamic memory allocations. Do you really prefer that to
a strlen() in an already warm data cache?

(And do you really believe a strlen() is significant compared to the
I/O that preceded it?)

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .