Go Back   Rhinocerus > Newsgroup > Newsgroup comp.lang.* 1 > Newsgroup comp.lang.fortran

Reply
 
Thread Tools Display Modes
  #31 (permalink)  
Old 03-30-2006, 06:51 PM
glen herrmannsfeldt
Guest
 
Posts: n/a
Default Re: status of quadruple precision arithmetic in g95 and gfortran?

James Giles <jamesgiles@worldnet.att.net> wrote:

> If the compiler make *inconsistent* use of extended precision,
> Kahan's method can actually be less precise than the simple
> sum. This is because Kahan's algorithm is keeping track of
> the discrepancy between the simple sum and what should be
> calculated and adding it back in at each step.



x87 implementations are well known for inconsistent use.

It is expensive to save/reload data frequently, and there
aren't enough x87 registers to keep complicated expressions
completely in registers.

The original 8087 was supposed to be designed to have an
infinite stack, with registers being spilled to memory when
needed. It turned out not to be possible to actually write
the needed software in the end. I don't know if the problems
were fixed in later 80x87 implementations, but it doesn't seem
that it has been done yet, as far as I know.

-- glen
Reply With Quote
Alt Today
Advertising
 
and become member of Rhinocerus
Standard Sponsored Links

  #32 (permalink)  
Old 03-30-2006, 08:03 PM
James Giles
Guest
 
Posts: n/a
Default Re: status of quadruple precision arithmetic in g95 and gfortran?

glen herrmannsfeldt wrote:
> James Giles <jamesgiles@worldnet.att.net> wrote:
>
>> If the compiler make *inconsistent* use of extended precision,
>> Kahan's method can actually be less precise than the simple
>> sum. This is because Kahan's algorithm is keeping track of
>> the discrepancy between the simple sum and what should be
>> calculated and adding it back in at each step.

>
>
> x87 implementations are well known for inconsistent use.


That's why I pointed that out.

> It is expensive to save/reload data frequently, and there
> aren't enough x87 registers to keep complicated expressions
> completely in registers.


But, the instruction to set the precision mode flag in the x87
is simple and fast. It doesn't restrict the exponent range, but
we were talking about consistent precision.

--
J. Giles

"I conclude that there are two ways of constructing a software
design: One way is to make it so simple that there are obviously
no deficiencies and the other way is to make it so complicated
that there are no obvious deficiencies." -- C. A. R. Hoare


Reply With Quote
  #33 (permalink)  
Old 03-30-2006, 09:37 PM
Ian Gay
Guest
 
Posts: n/a
Default Re: status of quadruple precision arithmetic in g95 and gfortran?

"James Giles" <jamesgiles@worldnet.att.net> wrote in
news:xQWWf.655736$qk4.421681@bgtnsc05-news.ops.worldnet.att.net:

> glen herrmannsfeldt wrote:
>> James Giles <jamesgiles@worldnet.att.net> wrote:
>>
>>> If the compiler make *inconsistent* use of extended precision,
>>> Kahan's method can actually be less precise than the simple
>>> sum. This is because Kahan's algorithm is keeping track of
>>> the discrepancy between the simple sum and what should be
>>> calculated and adding it back in at each step.

>>
>>
>> x87 implementations are well known for inconsistent use.

>
> That's why I pointed that out.
>
>> It is expensive to save/reload data frequently, and there
>> aren't enough x87 registers to keep complicated expressions
>> completely in registers.

>
> But, the instruction to set the precision mode flag in the x87
> is simple and fast. It doesn't restrict the exponent range, but
> we were talking about consistent precision.
>

This can be a mixed blessing. On the example which started this
thread, forcing 64-bit stores yields the wrong answer, while
keeping intermediates in the x87 gives the right answer. Of course
if you're using an atgorithm where you _know_ consistencey is
needed, that's another matter.


--
*********** To reply by e-mail, make w single in address
**************
Reply With Quote
  #34 (permalink)  
Old 03-30-2006, 10:08 PM
James Giles
Guest
 
Posts: n/a
Default Re: status of quadruple precision arithmetic in g95 and gfortran?

Ian Gay wrote:
> "James Giles" <jamesgiles@worldnet.att.net> wrote

....
>> But, the instruction to set the precision mode flag in the x87
>> is simple and fast. It doesn't restrict the exponent range, but
>> we were talking about consistent precision.
>>

> This can be a mixed blessing. On the example which started this
> thread, forcing 64-bit stores yields the wrong answer, while
> keeping intermediates in the x87 gives the right answer. Of course
> if you're using an atgorithm where you _know_ consistencey is
> needed, that's another matter.


In fact, Kahan's algorithm should be done in the extended
precision anyway. Unless you're unrolling the loop to do
the SIMD parallel instructions, the loop is simple enough
that all its data should fit into the floating-point processor's
stack easily (there are 16 registers in the stack).

s = 0; c = 0; y = 0; t = 0
do i = 1, N
! here, the 'live' values are S and C
y = x(i) - c
! here, the 'live' values are S and Y
t = s + y
! here, the 'live' values are S, T, and Y
c = (t-s)-y
! here, the 'live' values are T and C
s = t
! this last statement is just a renaming of T for
! the next pass through the loop - it's a no op.
! here, the 'live' values are S and C just like at
! the start of the loop.
end do

Only the 'live' values need be on the stack. And, in the
first assignment you briefly need room for x(i). Well,
the most 'live' values you have at any time is three. That
seems it ought to fit in a stack that's 16 deep. You have to
access some of the values out of stack order, but all the
x87 math instructions allow for that.

So, you can keep the values on the stack and use full extended
precision for the loop and be consistent. This means you can
sum with effectively more precision than the simple summation
loop could using quad. Of course the answer (S) will only hold the
double precision part of the sum. In cases like this thread, where
the actual final answer is exactly representable in double, but the
intermediate calculations are not (but are still within twice the
extended precision), this gives just what you need.

Cost: It's four adds per loop instead of one - that's the main cost.
But there should be no more memory traffic than for the simple
summation. Whether compilers find this implementation is a
different story. At least a *little* data flow analysis is required.
That cost is to test the compiler and choose a good one (even if
it's not "free").

--
J. Giles

"I conclude that there are two ways of constructing a software
design: One way is to make it so simple that there are obviously
no deficiencies and the other way is to make it so complicated
that there are no obvious deficiencies." -- C. A. R. Hoare


Reply With Quote
  #35 (permalink)  
Old 03-30-2006, 10:38 PM
glen herrmannsfeldt
Guest
 
Posts: n/a
Default Re: status of quadruple precision arithmetic in g95 and gfortran?

James Giles <jamesgiles@worldnet.att.net> wrote:
(snip regarding precision of sums, extended precision,
and inconsistent precision)

> In fact, Kahan's algorithm should be done in the extended
> precision anyway. Unless you're unrolling the loop to do
> the SIMD parallel instructions, the loop is simple enough
> that all its data should fit into the floating-point processor's
> stack easily (there are 16 registers in the stack).


Last I knew, it was eight.

> s = 0; c = 0; y = 0; t = 0
> do i = 1, N
> ! here, the 'live' values are S and C
> y = x(i) - c
> ! here, the 'live' values are S and Y
> t = s + y
> ! here, the 'live' values are S, T, and Y
> c = (t-s)-y
> ! here, the 'live' values are T and C
> s = t
> ! this last statement is just a renaming of T for
> ! the next pass through the loop - it's a no op.
> ! here, the 'live' values are S and C just like at
> ! the start of the loop.
> end do


I believe one of the causes of stack flush is a subroutine call,
where the called routine will expect to have the full stack.
While the data could be stored in extended precision temporaries
around a call, I don't know any that do that.

> Only the 'live' values need be on the stack. And, in the
> first assignment you briefly need room for x(i). Well,
> the most 'live' values you have at any time is three. That
> seems it ought to fit in a stack that's 16 deep. You have to
> access some of the values out of stack order, but all the
> x87 math instructions allow for that.


My guess is you have a good chance, at least with some optimizations
turned on.

(snip)

> Cost: It's four adds per loop instead of one - that's the main cost.
> But there should be no more memory traffic than for the simple
> summation. Whether compilers find this implementation is a
> different story. At least a *little* data flow analysis is required.
> That cost is to test the compiler and choose a good one (even if
> it's not "free").


-- glen
Reply With Quote
  #36 (permalink)  
Old 03-30-2006, 11:58 PM
James Giles
Guest
 
Posts: n/a
Default Re: status of quadruple precision arithmetic in g95 and gfortran?

glen herrmannsfeldt wrote:
> James Giles <jamesgiles@worldnet.att.net> wrote:
> (snip regarding precision of sums, extended precision,
> and inconsistent precision)
>
>> In fact, Kahan's algorithm should be done in the extended
>> precision anyway. Unless you're unrolling the loop to do
>> the SIMD parallel instructions, the loop is simple enough
>> that all its data should fit into the floating-point processor's
>> stack easily (there are 16 registers in the stack).

>
> Last I knew, it was eight.


You're right. I was in a hurry and didn't check. (Shows how
often I do any float assembly on an x87!)

Still, only three registers are needed. Eight seems enough.

> I believe one of the causes of stack flush is a subroutine call,
> where the called routine will expect to have the full stack.
> While the data could be stored in extended precision temporaries
> around a call, I don't know any that do that.


Of course, there are no procedure calls in the Kahan summation
algorithm, an any interrupt handler that changes the content of
*any* registers is just broken.

--
J. Giles

"I conclude that there are two ways of constructing a software
design: One way is to make it so simple that there are obviously
no deficiencies and the other way is to make it so complicated
that there are no obvious deficiencies." -- C. A. R. Hoare



Reply With Quote
 
Reply

Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are Off
Pingbacks are Off
Refbacks are Off




All times are GMT. The time now is 09:00 AM.


Copyright ©2009

LinkBacks Enabled by vBSEO 3.3.0 RC2 © 2009, Crawlability, Inc.