Did any compiler fully use 80-bit floating point?


There is a paradox about floating point that I'm trying to understand.

Floating point is an eternal struggle with the problem that real numbers are both essential and incomputable. It's the best solution we have for most calculations involving physical quantities, but it has the perennial problems of limited precision and range. Many volumes have been written about how to deal with these problems, even down to hardware engineers getting headaches implementing support for subnormal numbers, only to have programmers promptly turn it off because it kills performance on workloads where many numbers iterate to zero. (The usual reference here is the introductory document What every computer scientist should know about floating-point arithmetic, but for a more in-depth discussion, it's worth reading the writings of William Kahan, one of the world's foremost experts on the topic, and a very clear writer.)

The usual standard for floating point where substantial precision is required is IEEE 754 double precision, 64 bits. It's the best most hardware provides; doing even slightly better typically requires switching to a software solution, with a dramatic slowdown.

The x87 went one better and provided extended precision, 80 bits. A Google search finds many articles about this, and almost all of them lament the problem that when compilers spill temporaries from registers to memory, they round to 64 bits, so the exact results vary quasi-randomly depending on the behavior of the optimizer, which is admittedly a real problem.

The obvious solution is for the in-memory format to be also 80 bits, so that you get both extended precision and consistency. But I have not encountered any mention, ever, of this being used. It's moot now that one uses SSE2 which doesn't provide extended precision, but I would expect it to have been used in the days when x87 was the available floating-point instruction set.

The paradox is this: on the one hand, there is much discussion of limited precision being a big problem. On the other hand, Intel provided a solution with an extra eleven bits of precision and four bits of exponent, that would cost very little performance to use (since the hardware implemented it whether you used it or not), and yet everyone seemed to behave as though this had no value, and to positively celebrate the move to SSE2 where extended precision is no longer available.

So my question is:

Did any compilers ever make full use of extended precision (i.e. 80 bits in memory as well as in registers)? If not, why not?

edited 2 hours ago

asked 3 hours ago

rwallace


Heh, exactly what I was thinking the other day while debugging my 128-bit floating-point class and comparing its hex output against something that works, like 32-bit float, 64-bit double and 80-bit long double. In my compiler (Borland/Embarcadero) the last is supported for computation (but it's unstable under heavy use, as it can cause crashes, probably because the compiler I use is an old one from 2006 running on a newer CPU, and the hardware FPU could have changed since), but the compiler itself truncates constants to 32/64 bits, so it is not the same whether you write a=1.0/3.0; or a=1.0; b=3.0; a=a/b; the latter is more precise.

– Spektre
2 hours ago

@Spektre It’s been many years since I used Borland Turbo C, but that’s correct behavior. The literal constants 1.0 and 3.0 have type double, so 1.0/3.0 is computed with only double precision, then padded to the length of a long double when it is implicitly converted. You could get full precision by writing a = 1.0L/3.0L;.

– Davislor
1 hour ago

@Davislor It's not the ancient Borland Turbo C++ but the BDS2006 Win32 compiler with VCL. I can see this: 0.3 is 64-bit, 0.3f is 32-bit, and 0.3l or 0.3L is 64-bit, so no 80-bit, just 52+1 bits of mantissa for direct constants.

– Spektre
1 hour ago

Checking with godbolt.org, GCC, Clang and ICC all perform 80-bit math and 80-bit memory stores with long double variables.

– Davislor
1 hour ago

@lvd Sure, going back to x87 now and forgoing vector operations would be a big performance hit, as you say. I was talking about the historical context before SSE2 was invented.

– rwallace
13 mins ago


2 Answers

Yes. For example, the C math library has had full support for long double, which on x87 was 80 bits wide, since C99. Previous versions of the standard library supported only the double type. Conforming C and C++ compilers also perform long double math if you give the operations a long double argument. (Recall that, in C, 1.0/3.0 divides a double by another double, producing a double-precision result, and to get long double precision, you would write 1.0L/3.0L.)

GCC, in particular, even has options such as -ffloat-store to turn off computing intermediate results to a higher precision than a double is supposed to have. However, some versions of gcc (for all languages, not just C) did sometimes spill an 80-bit quantity to 64 bits.

Testing with godbolt.org, GCC, Clang and ICC in x87 mode all perform 80-bit computations and memory stores with long double variables—except that they will optimize constants such as 0.5L to double when that will save memory at no loss of precision. MSVC 2017, however, only supports SSE.

There was never a standard way to specify 80-bit precision in Fortran, despite its main purpose being scientific computation, but some compilers provided extensions such as kind=3. There were other languages that supported 80-bit precision to some degree as well.

Another possible example is Haskell, which provided both exact Rational types and arbitrary-precision floating-point through Data.Number.CReal. So far as I know, no implementation used x87 80-bit hardware, but it might still be an answer to your question.

edited 1 hour ago

answered 1 hour ago

Davislor


The 2nd sentence of your -ffloat-store paragraph is a little weird. There are no versions of gcc that can always avoid spilling intermediate 80-bit temporaries, because there are only 8 x87 stack registers. And passing args to non-inline functions is done in memory according to the ABI + calling convention, which specifies binary64 double. I don't think there are any versions or options for gcc that will make it spill/reload double locals with 80-bit, so you only ever get 80-bit temporaries in cases where the vars can stay in regs (unless you declare them long double of course.)

– Peter Cordes
28 mins ago

(And the x87 regs are all call-clobbered, so any non-inline function call requires spilling everything else, as well as the double you're passing to sin() or whatever.) All the 32-bit calling conventions I'm aware of worked this way, passing FP args on the stack but returning float/double in ST0 as 80-bit. 32-bit MSVC will use x87 with its default calling convention: godbolt.org/z/YGhqFP shows that even in 32-bit though, long double was only 64-bit QWORD like it still is for Windows x64.

– Peter Cordes
26 mins ago

@PeterCordes Unless I misunderstand the output from godbolt, what you said applies only if you pass double arguments. When I tested with long double and sinl(), the compiler spilled them as TBYTE, that is, 80-bit.

– Davislor
20 mins ago

Apparently MSVC's CRT start files used to reduce x87 precision from 64-bit significand down to 53-bit (64-bit double), to maybe speed up fdiv and fsqrt. And/or to avoid extra intermediate precision without the cost of store/reload to round them. Bruce Dawson's randomascii.wordpress.com/2012/03/21/… article explains more. (And apparently D3D9 library init set x87 precision to 24-bit significand single-precision float! making everything less precise for a tiny speed gain on div/sqrt.) Modern MSVC doesn't do that anymore.

– Peter Cordes
19 mins ago

That's correct, you can call functions that take long double arguments, but my point was that gcc can't do that for existing code that uses double. (For standard library functions it could in theory have an option to replace sin with sinl, but some custom function with a double arg might not have a long double version). And if you call sinl(a), double b,c,d,e; locals will all be rounded from 80-bit to 64-bit by the store-reload to save them across that function call. That's probably the more important point.

– Peter Cordes
14 mins ago


> Did any compilers ever make full use of extended precision (i.e. 80 bits in memory as well as in registers)? If not, why not?

Since all calculations inside the x87 FPU have 80-bit precision by default, any compiler that can generate x87 FPU code is already using extended precision.
I also remember using long double even in 16-bit compilers for real mode.

A very similar situation existed in the 68k world, where FPUs like the 68881 and 68882 supported 80-bit precision by default, and any FPU code without special precautions would keep all register values in that precision. There was also a long double data type.

> On the other hand, Intel provided a solution with an extra eleven bits of precision and four bits of exponent, that would cost very little performance to use (since the hardware implemented it whether you used it or not), and yet everyone seemed to behave as though this had no value

Using long double would prevent contemporary compilers from ever doing the calculations with SSE (or similar) registers and instructions. And SSE is actually a very fast engine, able to fetch data in large chunks and perform several computations in parallel, every clock. The x87 FPU is now just a legacy unit and not very fast. So deliberately using 80-bit precision now would certainly be a huge performance hit.

answered 58 mins ago

lvd


Right, I was talking about the historical context in which x87 was the only FPU on x86, so no performance hit from using it. Good point about 68881 being a very similar architecture.

– rwallace
9 mins ago

add a comment |

Your Answer

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "648"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
noCode: true, onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fretrocomputing.stackexchange.com%2fquestions%2f9751%2fdid-any-compiler-fully-use-80-bit-floating-point%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

edited 1 hour ago

answered 1 hour ago

Davislor

1,230210

1

The 2nd sentence of your -ffloat-store paragraph is a little weird. There are no versions of gcc that can always avoid spilling intermediate 80-bit temporaries, because there are only 8 x87 stack registers. And passing args to non-inline functions is done in memory according to the ABI + calling convention, which specifies binary64 double. I don't think there are any versions or options for gcc that will make it spill/reload double locals with 80-bit, so you only ever get 80-bit temporaries in cases where the vars can stay in regs (unless you declare them long double of course.)

– Peter Cordes
28 mins ago

1

(And the x87 regs are all call-clobbered, so any non-inline function call requires spilling everything else, as well as the double you're passing to sin() or whatever.) All the 32-bit calling conventions I'm aware of worked this way, passing FP args on the stack but returning float/double in ST0 as 80-bit. 32-bit MSVC will use x87 with its default calling convention: godbolt.org/z/YGhqFP shows that even in 32-bit though, long double was only 64-bit QWORD like it still is for Windows x64.

– Peter Cordes
26 mins ago

1

@PeterCordes Unless I misunderstand the output from godbolt, what you said applies only if you pass double arguments. When I tested with long double and sinl(), the compiler spilled them as TBYTE, that is, 80-bit.

– Davislor
20 mins ago

1

Apparently MSVC's CRT start files used to reduce x87 precision from 64-bit significand down to 53-bit (64-bit double), to maybe speed up fdiv and fsqrt. And/or to avoid extra intermediate precision without the cost of store/reload to round them. Bruce Dawson's randomascii.wordpress.com/2012/03/21/… article explains more. (And apparently D3D9 library init set x87 precision to 24-bit significand single-precision float! making everything less precise for a tiny speed gain on div/sqrt.) Modern MSVC doesn't do that anymore.

– Peter Cordes
19 mins ago

1

That's correct, you can call functions that take long double arguments, but my point was that gcc can't do that for existing code that uses double. (For standard library functions it could in theory have an option to replace sin with sinl, but some custom function with a double arg might not have a long double version). And if you call sinl(a), double b,c,d,e; locals will all be rounded from 80-bit to 64-bit by the store-reload to save them across that function call. That's probably the more important point.

– Peter Cordes
14 mins ago

|
show 1 more comment

edited 1 hour ago

answered 1 hour ago

Davislor

1,230210

1

The 2nd sentence of your -ffloat-store paragraph is a little weird. There are no versions of gcc that can always avoid spilling intermediate 80-bit temporaries, because there are only 8 x87 stack registers. And passing args to non-inline functions is done in memory according to the ABI + calling convention, which specifies binary64 double. I don't think there are any versions or options for gcc that will make it spill/reload double locals with 80-bit, so you only ever get 80-bit temporaries in cases where the vars can stay in regs (unless you declare them long double of course.)

– Peter Cordes
28 mins ago

1

(And the x87 regs are all call-clobbered, so any non-inline function call requires spilling everything else, as well as the double you're passing to sin() or whatever.) All the 32-bit calling conventions I'm aware of worked this way, passing FP args on the stack but returning float/double in ST0 as 80-bit. 32-bit MSVC will use x87 with its default calling convention: godbolt.org/z/YGhqFP shows that even in 32-bit though, long double was only 64-bit QWORD like it still is for Windows x64.

– Peter Cordes
26 mins ago

1

@PeterCordes Unless I misunderstand the output from godbolt, what you said applies only if you pass double arguments. When I tested with long double and sinl(), the compiler spilled them as TBYTE, that is, 80-bit.

– Davislor
20 mins ago

1

Apparently MSVC's CRT start files used to reduce x87 precision from 64-bit significand down to 53-bit (64-bit double), to maybe speed up fdiv and fsqrt. And/or to avoid extra intermediate precision without the cost of store/reload to round them. Bruce Dawson's randomascii.wordpress.com/2012/03/21/… article explains more. (And apparently D3D9 library init set x87 precision to 24-bit significand single-precision float! making everything less precise for a tiny speed gain on div/sqrt.) Modern MSVC doesn't do that anymore.

– Peter Cordes
19 mins ago

1

That's correct, you can call functions that take long double arguments, but my point was that gcc can't do that for existing code that uses double. (For standard library functions it could in theory have an option to replace sin with sinl, but some custom function with a double arg might not have a long double version). And if you call sinl(a), double b,c,d,e; locals will all be rounded from 80-bit to 64-bit by the store-reload to save them across that function call. That's probably the more important point.

– Peter Cordes
14 mins ago

|
show 1 more comment

edited 1 hour ago

answered 1 hour ago

Davislor

1,230210

edited 1 hour ago

answered 1 hour ago

Davislor

1,230210

edited 1 hour ago

answered 1 hour ago

Davislor

1,230210

answered 1 hour ago

Davislor

1,230210

answered 1 hour ago

Davislor

1,230210

1

The 2nd sentence of your -ffloat-store paragraph is a little weird. There are no versions of gcc that can always avoid spilling intermediate 80-bit temporaries, because there are only 8 x87 stack registers. And passing args to non-inline functions is done in memory according to the ABI + calling convention, which specifies binary64 double. I don't think there are any versions or options for gcc that will make it spill/reload double locals with 80-bit, so you only ever get 80-bit temporaries in cases where the vars can stay in regs (unless you declare them long double of course.)

– Peter Cordes
28 mins ago

1

(And the x87 regs are all call-clobbered, so any non-inline function call requires spilling everything else, as well as the double you're passing to sin() or whatever.) All the 32-bit calling conventions I'm aware of worked this way, passing FP args on the stack but returning float/double in ST0 as 80-bit. 32-bit MSVC will use x87 with its default calling convention: godbolt.org/z/YGhqFP shows that even in 32-bit though, long double was only 64-bit QWORD like it still is for Windows x64.

– Peter Cordes
26 mins ago

1

@PeterCordes Unless I misunderstand the output from godbolt, what you said applies only if you pass double arguments. When I tested with long double and sinl(), the compiler spilled them as TBYTE, that is, 80-bit.

– Davislor
20 mins ago

1

Apparently MSVC's CRT start files used to reduce x87 precision from 64-bit significand down to 53-bit (64-bit double), to maybe speed up fdiv and fsqrt. And/or to avoid extra intermediate precision without the cost of store/reload to round them. Bruce Dawson's randomascii.wordpress.com/2012/03/21/… article explains more. (And apparently D3D9 library init set x87 precision to 24-bit significand single-precision float! making everything less precise for a tiny speed gain on div/sqrt.) Modern MSVC doesn't do that anymore.

– Peter Cordes
19 mins ago

1

That's correct, you can call functions that take long double arguments, but my point was that gcc can't do that for existing code that uses double. (For standard library functions it could in theory have an option to replace sin with sinl, but some custom function with a double arg might not have a long double version). And if you call sinl(a), double b,c,d,e; locals will all be rounded from 80-bit to 64-bit by the store-reload to save them across that function call. That's probably the more important point.

– Peter Cordes
14 mins ago

|
show 1 more comment

1

The 2nd sentence of your -ffloat-store paragraph is a little weird. There are no versions of gcc that can always avoid spilling intermediate 80-bit temporaries, because there are only 8 x87 stack registers. And passing args to non-inline functions is done in memory according to the ABI + calling convention, which specifies binary64 double. I don't think there are any versions or options for gcc that will make it spill/reload double locals with 80-bit, so you only ever get 80-bit temporaries in cases where the vars can stay in regs (unless you declare them long double of course.)

– Peter Cordes
28 mins ago

1

(And the x87 regs are all call-clobbered, so any non-inline function call requires spilling everything else, as well as the double you're passing to sin() or whatever.) All the 32-bit calling conventions I'm aware of worked this way, passing FP args on the stack but returning float/double in ST0 as 80-bit. 32-bit MSVC will use x87 with its default calling convention: godbolt.org/z/YGhqFP shows that even in 32-bit though, long double was only 64-bit QWORD like it still is for Windows x64.

– Peter Cordes
26 mins ago

1

@PeterCordes Unless I misunderstand the output from godbolt, what you said applies only if you pass double arguments. When I tested with long double and sinl(), the compiler spilled them as TBYTE, that is, 80-bit.

– Davislor
20 mins ago

1

Apparently MSVC's CRT start files used to reduce x87 precision from 64-bit significand down to 53-bit (64-bit double), to maybe speed up fdiv and fsqrt. And/or to avoid extra intermediate precision without the cost of store/reload to round them. Bruce Dawson's randomascii.wordpress.com/2012/03/21/… article explains more. (And apparently D3D9 library init set x87 precision to 24-bit significand single-precision float! making everything less precise for a tiny speed gain on div/sqrt.) Modern MSVC doesn't do that anymore.

– Peter Cordes
19 mins ago

1

That's correct, you can call functions that take long double arguments, but my point was that gcc can't do that for existing code that uses double. (For standard library functions it could in theory have an option to replace sin with sinl, but some custom function with a double arg might not have a long double version). And if you call sinl(a), double b,c,d,e; locals will all be rounded from 80-bit to 64-bit by the store-reload to save them across that function call. That's probably the more important point.

– Peter Cordes
14 mins ago
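
The rounding that a store/reload through a 64-bit double slot causes can be sketched in portable C. This is only an illustration of the effect discussed in the comments above, not what any particular gcc or MSVC version emits; the function names survives_double_store and one_plus_ldbl_epsilon are made up for this example, and the volatile is there to force a real memory store.

```c
#include <float.h>

/* Returns 1 if x keeps its exact value after being stored to a 64-bit
   double and reloaded, 0 if the store rounded it. The volatile forces a
   real memory store instead of letting the value stay in a register. */
int survives_double_store(long double x)
{
    volatile double spilled = (double)x;  /* store: rounds to double */
    long double reloaded = spilled;       /* reload: widening is exact */
    return reloaded == x;
}

/* Smallest long double above 1.0. It fits in a double only when
   long double carries no extra significand bits over double. */
long double one_plus_ldbl_epsilon(void)
{
    return 1.0L + LDBL_EPSILON;
}
```

On a target where long double is the 80-bit x87 format, survives_double_store(one_plus_ldbl_epsilon()) should return 0, because the low significand bit is lost in the store. On a target where long double is the same as double (as with MSVC), it should return 1.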


Did any compilers ever make full use of extended precision (i.e. 80
bits in memory as well as in registers)? If not, why not?

On the other hand, Intel provided a solution with an extra eleven bits
of precision and five bits of exponent, that would cost very little
performance to use (since the hardware implemented it whether you used
it or not), and yet everyone seemed to behave as though this had no
value
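
The bit counts in the quoted passage can be checked against the limits in <float.h>. The sketch below is a minimal illustration, assuming a radix-2 floating-point implementation; the function names extra_significand_bits and extra_exponent_bits are hypothetical helpers, not part of any library.

```c
#include <float.h>

/* Extra significand bits that long double has over double. */
int extra_significand_bits(void)
{
    return LDBL_MANT_DIG - DBL_MANT_DIG;
}

/* Extra exponent-field bits, derived from the ratio of the maximum
   exponents. Assumes FLT_RADIX is 2 and that both limits are powers
   of two, which holds for the IEEE binary formats and x87 extended. */
int extra_exponent_bits(void)
{
    int bits = 0;
    long ratio = (long)LDBL_MAX_EXP / DBL_MAX_EXP;
    while (ratio > 1) {
        ratio /= 2;
        ++bits;
    }
    return bits;
}
```

On a target where long double is the 80-bit x87 format, this gives 11 extra significand bits and 4 extra exponent bits (a 15-bit exponent field against 11 for binary64). Where long double is the same type as double, both functions return 0.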

answered 58 mins ago

lvd

2,955721

Right, I was talking about the historical context in which x87 was the only FPU on x86, so no performance hit from using it. Good point about 68881 being a very similar architecture.

– rwallace
9 mins ago


