VHDL: Why is it hard to design a floating point unit in hardware?


Floating-point calculation basically involves representing numbers in scientific notation and then deciding how many bits to devote to the mantissa and the exponent. All calculations on FP numbers therefore manipulate these two quantities. This sounds simple enough and is not hard to do on paper.

I have always seen floating-point hardware design described as difficult, and have heard or read things like multiplying and dividing a number by 1 may not give the same result. This perhaps has something to do with how numbers are "unrolled" when arithmetic is to be performed.

Shouldn't there be a unified approach to designing floating-point hardware? Why are the design and verification of such hardware considered difficult and challenging in spite of there being IEEE 754?







  • IEEE 754 tells you what the results must be, not how to do it. If you can come up with a quicker way to get exactly the same results, then you could sell your idea to a chip manufacturer. The payoff for them is that they could reduce their chip area and so improve yield. You know there are several ways to multiply two numbers together, right? One or another might be a better fit for your process. Start with schoolbook, and improve with various clever factorisations and identities. – Neil_UK, 9 hours ago

  • I assume this would be a major area of research and people come up with new methods from time to time? – quantum231, 9 hours ago

  • I imagine it is because there are a lot of design tradeoffs and objectives that can be prioritized: not only are you working with logic on paper, you're also working with silicon gates. It's like op-amps... they all do the same thing, yet there is no standard design, and there are thousands of varieties. – Toor, 9 hours ago

  • IEEE 754 makes it harder to implement, not easier. Just look at the Xilinx offering and its deviations from 754: xilinx.com/support/documentation/ip_documentation/… There are many corner cases to handle for floating point. – Jonathan Drolet, 9 hours ago

  • Bipolar Integrated Technology (aka BIT) fielded ASIC chips that provided floating point done just as you suggest. That would be prior to 1990, as I was working with one of the engineers from there on a separate project using the MIPS R2000 around 1987-ish. I cannot say how complete they were in terms of implementing IEEE 754, though. I'm pretty sure they didn't implement the full specification. BIT was located in the Beaverton, Oregon area. – jonk, 9 hours ago


















fpga vhdl floating-point






asked 9 hours ago by quantum231; edited 33 mins ago by Ken Williams







3 Answers
$begingroup$

The standard is well designed, and some subtle details ease implementation. For example, when rounding, the carry out of the mantissa can overflow into the exponent, and integer comparisons can be used for floating-point compares.
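Both shortcuts can be checked at the bit level. The sketch below is only an illustration, assuming the IEEE 754 binary32 layout (1 sign bit, 8 exponent bits, 23 fraction bits) and non-negative, non-NaN inputs. The helper names `float_to_bits` and `bits_to_float` are made up for this example.

```python
import struct

def float_to_bits(x: float) -> int:
    """Return the binary32 bit pattern of x as an unsigned 32-bit integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def bits_to_float(bits: int) -> float:
    """Interpret an unsigned 32-bit integer as a binary32 value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

# Largest binary32 value below 2.0: exponent field 127, fraction all ones.
below_two = 0x3FFFFFFF

# Rounding up adds 1 at the least significant fraction bit. The carry ripples
# out of the 23-bit fraction and increments the exponent field, which yields
# exactly 2.0 without a separate renormalization step.
rounded_up = below_two + 1

# For non-negative, non-NaN values the bit patterns sort in the same order as
# the values themselves, so an unsigned integer comparator can be reused.
values = [0.0, 1.5e-40, 0.1, 1.0, 1.9999999, 2.0, 3.0e38]
patterns = [float_to_bits(v) for v in values]
```

Negative values need an extra step because the format is sign-magnitude, and NaNs compare as unordered, so a real comparator still has special cases on top of the integer compare.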



But an FPU is a big heap of combinatorial mess. Besides adding, multiplying, and dividing, there are barrel shifters to align mantissas, leading-zero counters, rounding logic, flags (inexact, overflow, ...), NaNs, and denormals (which need additional hardware for calculations, particularly for mul/div, or must at least trigger an exception for software emulation).
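To give a feel for how many input classes the datapath has to tell apart before it can even start computing, here is a small sketch that decodes a binary32 bit pattern and classifies it. The function names are hypothetical, and the field widths are the standard binary32 ones.

```python
def decode_binary32(bits: int):
    """Split a 32-bit pattern into (sign, exponent field, fraction field)."""
    sign = (bits >> 31) & 0x1
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, exponent, fraction

def classify(bits: int) -> str:
    """Name the IEEE 754 class that a binary32 bit pattern belongs to."""
    _, exponent, fraction = decode_binary32(bits)
    if exponent == 0xFF:
        # All-ones exponent: infinity if the fraction is zero, otherwise NaN.
        return "nan" if fraction else "infinity"
    if exponent == 0:
        # All-zeros exponent: zero or a denormal (no hidden leading one).
        return "subnormal" if fraction else "zero"
    return "normal"
```

In hardware each of these classes tends to become its own bypass path or multiplexer input, for every operand of every operation.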



Most FPUs also need to convert to and from integers and between formats (float, double). That conversion hardware can mostly reuse the existing floating-point datapath, but it incurs additional multiplexers and special cases.
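As a rough sketch of where those special cases come from, here is a binary32 to signed 32-bit integer conversion written out step by step. The policy choices are assumptions made for the example (truncate toward zero, saturate on overflow and infinity, return 0 for NaN); real instruction sets differ on these points.

```python
import struct

INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)

def float_to_bits(x: float) -> int:
    """Return the binary32 bit pattern of x as an unsigned 32-bit integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def binary32_to_int32(bits: int) -> int:
    """Convert a binary32 bit pattern to int32, truncating toward zero."""
    sign = (bits >> 31) & 0x1
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF

    if exponent == 0xFF:
        if fraction:
            return 0                      # NaN: assumed policy for this sketch
        return INT32_MIN if sign else INT32_MAX   # infinity saturates
    if exponent == 0:
        return 0                          # zero and denormals are below 1

    significand = fraction | 0x800000     # restore the hidden leading one
    shift = exponent - 127 - 23           # position of the binary point
    if shift >= 0:
        magnitude = significand << shift
    else:
        magnitude = significand >> -shift  # drops the fractional bits

    value = -magnitude if sign else magnitude
    return max(INT32_MIN, min(INT32_MAX, value))   # saturate on overflow
```

The left or right shift is the same kind of barrel shifter used for operand alignment, which is why this logic can share hardware with the adder, at the cost of extra selection logic.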



Then there is pipelining. Depending on the transistor budget and clock frequency, either add/sub/mul all have the same throughput, or double precision is slower, which can add complexity to the pipeline. Modern FPUs have a pipelined multiply-add operator.



Division is always iterative. It can be a separate unit, or it can reuse the multiplier-adder for Newton-Raphson or Goldschmidt iteration. And while you are busy making a divider, you look for ways to tweak it for square roots.
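A minimal sketch of the Newton-Raphson option, using only multiplies and subtractions as a multiply-add unit would. It assumes a positive, finite, nonzero divisor and works in double precision for convenience. The linear first guess `48/17 - 32/17 * m` is a common textbook starting point and is an assumption of this example, not something the answer specifies.

```python
import math

def reciprocal_newton(d: float, iterations: int = 4) -> float:
    """Approximate 1/d for a positive, finite, nonzero d."""
    # Scale the divisor so that 0.5 <= mantissa < 1 (d = mantissa * 2**exponent).
    mantissa, exponent = math.frexp(d)
    # Linear first guess for 1/mantissa on [0.5, 1).
    x = 48.0 / 17.0 - (32.0 / 17.0) * mantissa
    # Each step roughly squares the relative error: x <- x * (2 - m * x).
    for _ in range(iterations):
        x = x * (2.0 - mantissa * x)
    # Undo the scaling: 1/d = (1/mantissa) * 2**(-exponent).
    return math.ldexp(x, -exponent)

def divide_newton(n: float, d: float) -> float:
    """Approximate n / d as n times the Newton-Raphson reciprocal of d."""
    return n * reciprocal_newton(d)
```

The result is close to the true quotient but is not guaranteed to be correctly rounded, which is why getting exact IEEE 754 rounding out of a multiplicative divider takes extra correction steps.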



Validation is complex because there are many corner cases. There are a few systematic test suites with test patterns for "interesting" cases across all the rounding modes, but things like fast multipliers or dividers are too complex to test easily.
Iterative dividers can have non-obvious bugs (for example the famous Pentium bug in its SRT radix-4 divider), and multiplicative (Newton) dividers are difficult to test for exact rounding (there were some bugs in old IBM computers).



Formal methods are now used to prove these parts.



Modern FPUs also implement SIMD hardware, where FP operators are instantiated several times for parallel processing.



There is also the case of the x87 and MC68881/2 FPUs, which can calculate decimal conversions and hyperbolic and trigonometric functions. These operations are microcoded and built on the basic FP operators; they are not implemented directly in hardware.

















$endgroup$












  • The Standard suffers a bit from trying to serve all purposes, and thus being too complicated to serve some while lacking features needed to serve others. For example, questions about the cases when it should guarantee "perfectly" rounded results were based upon whether that would be possible, rather than upon whether the costs would be worth the benefits for all applications. For many purposes, a computation that yields a result within two units in the last place would be more useful than one which yields a perfectly-rounded result but takes twice as long,... – supercat, 5 hours ago

  • ...and on many implementations, computing a result within two ULP would take less than half as long as computing a perfectly-rounded result (as a simple example, computing x*(1/1.234567) will yield a value within a couple ulp of x/1.234567, but will be much faster than computing the latter value). There are times when perfect rounding is useful or even necessary, but having a means of specifying when it isn't necessary would also have been useful. – supercat, 5 hours ago

  • @supercat Yes. Things like division and inverse square root are often implemented for graphics using pipelined Newton-Raphson on the FPU multiply-add hardware. Denormals are often flushed to zero (for example in DSPs and GPUs). The default rounding mode, "round to nearest-even", is slightly easier to implement than the other modes (to infinity, to zero), because of rounding, so sometimes it is the only mode available. – TEMLIB, 3 hours ago

  • The standard was designed by a mathematician, W. Kahan, and tried to address the shortcomings of previous formats, notably DEC VAX. The first implementation, the 8087, was rather slow but had extended precision and lots of microcode for a "math library in a chip". – TEMLIB, 3 hours ago

  • I've read some of Kahan's papers, and in most cases where things have evolved contrary to what Kahan advocated, I think Kahan's approach would have been better, but he did make a few mistakes. I think the Standard would have benefited from having an unsigned zero along with signed infinitesimals; 1/(tiny * tiny) should yield +Inf, but 1/0 should yield NaN, to avoid the asymmetry with zero. Still, parts of the Standard fail to consider that cost/benefit trade-offs are different in different kinds of application. – supercat, 2 hours ago
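supercat's x*(1/1.234567) example from the comments above can be tried out directly. The sketch below measures, in units in the last place, how far the multiply-by-reciprocal result lands from the correctly rounded quotient. It uses Python doubles rather than any particular hardware, and the helper names are made up for the illustration.

```python
import struct

def double_to_bits(x: float) -> int:
    """Return the binary64 bit pattern of x as an unsigned 64-bit integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def ulps_apart(a: float, b: float) -> int:
    """Count the representable steps between two positive finite doubles."""
    return abs(double_to_bits(a) - double_to_bits(b))

C = 1.234567
RECIPROCAL = 1.0 / C   # rounded once, then reused for every x

def worst_ulp_gap(samples) -> int:
    """Largest gap between x * (1/C) and the correctly rounded x / C."""
    return max(ulps_apart(x * RECIPROCAL, x / C) for x in samples)

samples = [1.0 + k / 1000.0 for k in range(1000)]
gap = worst_ulp_gap(samples)
```

The gap stays small because the reciprocal and the product each contribute at most half an ulp of rounding error, whereas the true division is rounded only once.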






























$begingroup$

Having a look at OpenCores might give some hints, e.g.: https://opencores.org/websvn/filedetails?repname=openfpu64&path=%2Fopenfpu64%2Ftrunk%2Ffpu_mul.vhd



The trouble with floating point is the large number of annoying corner cases. Integer operations have no concept of NaN, but it appears a lot in floating point. Numbers must also be normalised and denormalised correctly.
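A few of those corner cases can be seen from software. This is a small illustration using Python's built-in floats (IEEE 754 doubles on common platforms), not anything taken from the linked VHDL. Each entry in the hypothetical `checks` table is a behaviour the hardware has to reproduce.

```python
import math

nan = float("nan")
inf = float("inf")

# Each entry pairs a description with a check that holds under IEEE 754 rules.
checks = {
    "inf - inf is NaN": math.isnan(inf - inf),
    "0 * inf is NaN": math.isnan(0.0 * inf),
    "NaN propagates through addition": math.isnan(nan + 1.0),
    "NaN is not equal to itself": nan != nan,
    "NaN is not less than 1": not (nan < 1.0),
    "NaN is not greater than 1": not (nan > 1.0),
    "+0 compares equal to -0": 0.0 == -0.0,
}
```

None of these have an integer counterpart, so every comparator and arithmetic unit needs extra detection logic for them.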















$endgroup$












  • OpenCores has several floating-point designs done by different people. – quantum231, 9 hours ago

  • This specific link has code specifically for multiplication, hmmm. – quantum231, 9 hours ago

  • @quantum231 Yes, that's why pjc50 used that code snippet to illustrate why floating point is hard to do right: it's a humongous mess of handling special conditions. – Marcus Müller, 9 hours ago

  • Well, any nontrivial design will be complex and have a lot of conditions to be met. I shall study the code in detail later. – quantum231, 7 hours ago






























$begingroup$

Even if you don't handle all the corner cases, floating-point addition or subtraction of two well-formed numbers requires significant logic, because the scale of the mantissa can change dramatically. Consider (in decimal) the problem 1.9999 - 1.9993 = 0.0006. In floating point the location of the decimal point must be discovered, which isn't trivial, and the mantissa and exponent adjusted. This is even without trying to deal with NaN or denormalized numbers.
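The renormalization step after such a cancellation can be sketched as follows. This assumes 24-bit significands held as integers (the binary32 width) and two operands that already share an exponent; the function names are invented for the example. The leading-zero count stands in for the hardware leading-zero counter, and the left shift for the barrel shifter.

```python
SIGNIFICAND_BITS = 24  # binary32: 23 stored fraction bits plus the hidden bit

def normalize(significand: int, exponent: int):
    """Shift a nonzero significand left until its top bit is set."""
    if significand == 0:
        raise ValueError("zero has no normalized form")
    # Leading-zero count within the 24-bit field.
    shift = SIGNIFICAND_BITS - significand.bit_length()
    # Every left shift by one bit is compensated by decrementing the exponent.
    return significand << shift, exponent - shift

def subtract_same_exponent(sig_a: int, sig_b: int, exponent: int):
    """Compute a - b for normalized operands with equal exponents and a > b."""
    return normalize(sig_a - sig_b, exponent)
```

For example, `subtract_same_exponent(0xFFFFFF, 0xFFFFF0, 0)` leaves only four significant bits, so the result has to be shifted left by 20 places and the exponent lowered by 20 to restore a normalized form.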



All the mention of handling the special cases is quite valid, but even if you put the onus of avoiding special cases on the system designer (which is not uncommon with floating-point IP intended for DSP applications), your floating point arithmetic is still more expensive than equivalent-sized fixed-point arithmetic.



Witness the latest Altera/Intel FPGAs, which have "DSP blocks" that are twinned, and will either do n-bit (I think it's 32-bit, but I'm not sure) fixed-point math in each block, or will do the same-sized floating-point math in one pair of blocks. So going to floating point not only loses precision (because you only have 25 effective bits of mantissa in an IEEE 32-bit floating-point number), but also uses twice the resources, with very limited handling of corner cases.















$endgroup$












  • We live in an age where fabrication feature sizes have become very small and logic resources in FPGAs are quite cheap. What difference does it make how much logic is required for an FPU that complies fully with IEEE 754? – quantum231, 7 hours ago

  • Well, again, look at Altera parts. The FPU can be a big part of a processor if it's fully IEEE compliant. In an FPGA that's filled with a DSP algorithm, those "sorta-compliant" blocks allow far more computation in the same amount of silicon than fully compliant ones would, and FPGA DSP is often limited by the number of blocks you can afford. – TimWescott, 5 hours ago













3 Answers
3






active

oldest

votes








3 Answers
3






active

oldest

votes









active

oldest

votes






active

oldest

votes









7












$begingroup$

The standard is well designed and there are subtle details that ease implementation, for example, when rounding, the carry from the mantissa can overflow to the exponent. Or integer comparisons can be used for floating point compares...



But, an FPU is a big heap of combinatorial mess, besides adding, multiplying, dividing, there are barrel shifters to align matissas, leading zeros counters, rounding, flags (imprecise, overflow, ...), NaN and denormals (which need additional hardware for calculations, particularly for mul/div, or at least trigger an exception for software emulation).



And most FPUs also need to do conversions to/from integer and between formats (float,double). That conversion hardware can be mostly implemented through existing floating point hardware, but it incurs additional multiplexers and special cases...



Then, there is pipelining. Depending on the transistor budget and frequency, either add/sub/mul can have the same throughput, or double precision can be slower, which can incur additional complexity in the pipeline. Modern FPU now have a pipelined multiply-add operator.



For division, it is always iterative, it can be a separate unit or reuse the multiplier-adder for Newton-Raphson or Goldshmidt. And while you are busy making a divider, you look for ways to tweak it for square roots...



Validation is complex because there are many corner cases. There are a few systematic test suites with test patterns for "interesting" cases about all the rounding modes but things like fast multipliers or dividers are too complex to test easily.
Iterative dividers can have non obvious bugs (for example the famous Pentium bug in its SRT radix 4 divider), multiplicative (Newton) are difficult to test exact rounding (some bugs in old IBM computers).



Formal methods are now used to prove these parts.



Modern FPUs also implement SIMD hardware, where FP operators are instantiated several times for parallel processing.



There is also the case of the x87 and MC68881/2 FPUs which can calculate decimal conversions, hyperbolic and trigonometric operations. These operations are microcoded and use basic FP operators, they are not directly implemented in hardware.






share|improve this answer











$endgroup$












  • $begingroup$
    The Standard suffers a bit from trying to serve all purposes, and thus being too complicated to serve some while lacking features needed to serve others. For example, questions about the cases when it should guarantee "perfectly" rounded results were based upon whether that would be possible, rather than upon whether the costs would be worth the benefits for all applications. For many purposes, a computation that yields a result within two units in the last place would be more useful than one which yields a perfectly-rounded result but takes twice as long,...
    $endgroup$
    – supercat
    5 hours ago










  • $begingroup$
    ...and on many implementations, computing a result within two ULP would take less than half as long as computing a perfectly-rounded result (as a simple example, computing x*(1/1.234567) will yield a value within a couple ulp of x/1.234567, but will be much faster than computing the latter value). There are times when perfect rounding is useful or even necessary, but having a means of specifying when it isn't necessary would also have been useful.
    $endgroup$
    – supercat
    5 hours ago










  • $begingroup$
    @supercat Yes. For things like divisions, inverse square root... they are often implemented for graphics using pipelined Newton-Raphson on the FPU multiply-add hardware. Denormals are often flushed to zero (for example in DSPs, GPUs). The default rounding mode "round to nearest-even" is slighly easier to implement than the other modes (to infinity, to zero), because of rounding, so sometimes it is the only mode available.
    $endgroup$
    – TEMLIB
    3 hours ago











  • $begingroup$
    The standard was designed by a mathematician, W. Kahan, and tried to address the shortcomings of previous formats, notably that of the DEC VAX. The first implementation, the 8087, was rather slow but offered extended precision and lots of microcode for a "math library in a chip".
    $endgroup$
    – TEMLIB
    3 hours ago











  • $begingroup$
    I've read some of Kahan's papers, and in most cases where things have evolved contrary to what Kahan advocated, I think Kahan's approach would have been better, but he did make a few mistakes. I think the Standard would have benefited from having an unsigned zero along with signed infinitesimals; 1/(tiny * tiny) should yield +Inf, but 1/0 should yield NaN, to avoid the asymmetry with zero. Still, parts of the Standard fail to consider that cost/benefit trade-offs are different in different kinds of application.
    $endgroup$
    – supercat
    2 hours ago



























$begingroup$

The standard is well designed, and there are subtle details that ease implementation. For example, when rounding, the carry out of the mantissa can overflow into the exponent, and integer comparisons can be used for floating-point compares...
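Both properties can be seen on single-precision bit patterns. The following is a minimal sketch in Python; the helper names `f32_bits` and `bits_f32` are made up for illustration, and the comparison trick is only shown for non-negative, non-NaN values.

```python
import struct

def f32_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(b: int) -> float:
    """Interpret a 32-bit unsigned int as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]

# 1) For non-negative, non-NaN values, ordering the bit patterns as integers
#    gives the same order as ordering the values themselves.
values = [0.0, 1e-40, 1.0, 1.5, 2.0, 3.0e38]
assert sorted(values) == sorted(values, key=f32_bits)

# 2) Rounding carry: 0x3FFFFFFF is the largest float below 2.0 (mantissa all
#    ones). Incrementing the bit pattern carries out of the mantissa field
#    into the exponent field and lands exactly on 2.0, with no special case.
below_two = 0x3FFFFFFF
next_up = bits_f32(below_two + 1)
print(bits_f32(below_two), "->", next_up)
```

Negative numbers need extra care, because the format is sign-magnitude rather than two's complement.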



But an FPU is a big heap of combinational mess. Besides adding, multiplying and dividing, there are barrel shifters to align mantissas, leading-zero counters, rounding, flags (inexact, overflow, ...), NaNs and denormals (which need additional hardware for calculations, particularly for mul/div, or must at least trigger an exception for software emulation).
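To give an idea of the decoding that every operand goes through before any arithmetic starts, here is a small sketch in Python that classifies a single-precision value from its exponent and fraction fields. The function name `classify_f32` and the category labels are illustrative choices only.

```python
import struct

def classify_f32(x: float) -> str:
    """Classify a value by the fields of its IEEE 754 single-precision encoding."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    exponent = (bits >> 23) & 0xFF   # 8-bit biased exponent
    fraction = bits & 0x7FFFFF       # 23 stored mantissa bits
    if exponent == 0xFF:
        # All-ones exponent is reserved for infinities and NaNs.
        return "nan" if fraction else "infinity"
    if exponent == 0:
        # All-zeros exponent means zero or a denormal (no implicit leading 1).
        return "denormal" if fraction else "zero"
    return "normal"

for sample in (1.0, -0.0, 1e-40, float("inf"), float("nan")):
    print(sample, classify_f32(sample))
```

In hardware these checks are cheap comparators, but every operator then needs its own result rules for each combination of classes.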



Most FPUs also need to do conversions to/from integer and between formats (float, double). That conversion hardware can mostly be implemented with the existing floating-point hardware, but it incurs additional multiplexers and special cases...



Then there is pipelining. Depending on the transistor budget and target frequency, add/sub/mul can all have the same throughput, or double precision can be slower, which adds complexity to the pipeline. Modern FPUs have a pipelined multiply-add operator.



Division is always iterative. It can be a separate unit, or it can reuse the multiplier-adder for Newton-Raphson or Goldschmidt iterations. And while you are busy making a divider, you look for ways to tweak it for square roots...
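As a rough illustration of the multiplicative approach, here is a Newton-Raphson reciprocal in Python. It is a sketch under assumptions: the divisor is taken to be already scaled into [0.5, 1), the linear starting estimate and the iteration count are choices made for this example, and the result is not guaranteed to be correctly rounded.

```python
def nr_divide(a: float, b: float, iterations: int = 5) -> float:
    """Approximate a / b for b in [0.5, 1) via Newton-Raphson on 1 / b."""
    if not 0.5 <= b < 1.0:
        raise ValueError("this sketch expects the divisor scaled into [0.5, 1)")
    # Linear initial estimate of 1 / b on [0.5, 1).
    x = 48.0 / 17.0 - (32.0 / 17.0) * b
    for _ in range(iterations):
        # Each step needs only multiplies and a subtract, which maps onto
        # multiply-add hardware; the error roughly squares every iteration.
        x = x * (2.0 - b * x)
    return a * x

print(nr_divide(1.0, 0.75))
print(nr_divide(3.0, 0.6))
```

Getting from "very close" to the exactly rounded IEEE result takes an extra correction step, which is part of what makes such dividers hard to validate.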



Validation is complex because there are many corner cases. There are a few systematic test suites with test patterns for "interesting" cases covering all the rounding modes, but things like fast multipliers or dividers are too complex to test easily.
Iterative dividers can have non-obvious bugs (for example, the famous Pentium bug in its radix-4 SRT divider), and multiplicative (Newton) dividers are difficult to test for exact rounding (there were some bugs in old IBM computers).



Formal methods are now used to prove these parts.



Modern FPUs also implement SIMD hardware, where FP operators are instantiated several times for parallel processing.



There is also the case of the x87 and MC68881/2 FPUs, which can compute decimal conversions and hyperbolic and trigonometric functions. These operations are microcoded and built on the basic FP operators; they are not implemented directly in hardware.

















$endgroup$




















edited 8 hours ago

answered 8 hours ago

TEMLIB























$begingroup$

Having a look at OpenCores might give some hints, e.g.: https://opencores.org/websvn/filedetails?repname=openfpu64&path=%2Fopenfpu64%2Ftrunk%2Ffpu_mul.vhd



The trouble with floating point is the large number of annoying corner cases. Integer operations have no concept of NaN, but it appears a lot in floating point. Numbers must also be normalised and denormalised correctly.
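To make the corner cases concrete, here is a sketch in Python of the checks a multiplier has to make before it can take the ordinary "multiply mantissas, add exponents" path. The function `mul_special_case` is hypothetical and is not taken from the linked `fpu_mul.vhd`; it just follows the IEEE 754 rules for NaN, infinity and signed zero.

```python
import math

def mul_special_case(a: float, b: float):
    """Return the IEEE 754 product if (a, b) is a special case, else None."""
    if math.isnan(a) or math.isnan(b):
        return math.nan                      # NaN propagates
    # The result sign is the XOR of the operand signs, even for zeros.
    sign = math.copysign(1.0, a) * math.copysign(1.0, b)
    if math.isinf(a) or math.isinf(b):
        if a == 0.0 or b == 0.0:
            return math.nan                  # infinity * 0 is an invalid operation
        return sign * math.inf
    if a == 0.0 or b == 0.0:
        return sign * 0.0                    # signed zero
    return None                              # ordinary operands: normal datapath

for pair in ((math.inf, 0.0), (-math.inf, 2.0), (-0.0, 3.0), (2.0, 3.0)):
    print(pair, mul_special_case(*pair))
```

Denormal inputs, overflow, underflow and rounding of the ordinary path still come on top of this.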















$endgroup$












  • $begingroup$
    OpenCores has several floating-point designs done by different people
    $endgroup$
    – quantum231
    9 hours ago










  • $begingroup$
    This specific link has a specific code for multiplication, hmmm
    $endgroup$
    – quantum231
    9 hours ago










  • $begingroup$
    @quantum231 yes, that's why pjc50 used that code snippet to illustrate why floating point is hard to do right: It's a humongous mess of handling special conditions.
    $endgroup$
    – Marcus Müller
    9 hours ago










  • $begingroup$
    Well, any nontrivial design will be complex and have a lot of conditions to be met. I shall study the code in detail later.
    $endgroup$
    – quantum231
    7 hours ago



















answered 9 hours ago

pjc50























$begingroup$

Even if you don't handle all the corner cases, floating-point addition or subtraction of two well-formed numbers requires significant logic, because the scale of the mantissa can change dramatically. Consider (in decimal) the problem 1.9999 - 1.9993 = 0.0006. In floating point, the new location of the decimal point must be discovered, which isn't trivial, and the mantissa and exponent adjusted. This is even without trying to deal with NaN or denormalized numbers.
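The same cancellation in binary can be sketched in a few lines of Python. The 24-bit width, the sample mantissas and the `normalize` helper are assumptions chosen for illustration; in hardware the `bit_length` call corresponds to a leading-zero counter and the shift to a barrel shifter.

```python
def normalize(mantissa: int, exponent: int, width: int = 24):
    """Shift a mantissa left until bit (width - 1) is set, adjusting the exponent."""
    if mantissa == 0:
        return 0, 0  # zero has no leading 1 and needs its own encoding
    leading_zeros = width - mantissa.bit_length()   # leading-zero count
    return mantissa << leading_zeros, exponent - leading_zeros

# Two nearby 24-bit mantissas that share the same exponent (taken as 0 here).
a = 0xFFFF00
b = 0xFFF000
diff = a - b                  # 0x000F00: the top 12 bits cancelled
m, e = normalize(diff, 0)
print(hex(diff), "->", hex(m), "exponent", e)
```

The size of the shift depends on the data, so it cannot be fixed at design time the way it can in fixed-point arithmetic.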



All the mention of handling the special cases is quite valid, but even if you put the onus of avoiding special cases on the system designer (which is not uncommon with floating-point IP intended for DSP applications), your floating point arithmetic is still more expensive than equivalent-sized fixed-point arithmetic.



Witness the latest Altera/Intel FPGAs, which have "DSP blocks" that are twinned and will either do n-bit (I think it's 32-bit, but I'm not sure) fixed-point math in each block or do the same-sized floating-point math in one pair of blocks -- so going to floating point not only loses precision (because you only have 24 effective bits of mantissa, plus a sign bit, in an IEEE 32-bit floating-point number), but uses twice the resources, with very limited handling of corner cases.
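The precision point is easy to check. Here is a small sketch in Python, where the helper `to_f32` (a name made up for this example) rounds a value to IEEE single precision by packing and unpacking it:

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

n = 2**24 + 1                 # 16777217 fits easily in a 32-bit integer
as_float = to_f32(float(n))   # but needs 25 significant bits, one too many

print(n, "->", as_float)
print(2**24, "->", to_f32(float(2**24)))
```

Above 2**24, single precision can no longer represent every integer, while a 32-bit fixed-point word still can.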















$endgroup$












  • $begingroup$
    We live in an age where fabrication feature sizes (in nm) have become quite small and logic resources in FPGAs are quite cheap. What difference does it make how much logic is required for an FPU that complies fully with IEEE 754?
    $endgroup$
    – quantum231
    7 hours ago










  • $begingroup$
    Well, again, look at Altera parts. The FPU can be a big part of a processor if it's fully IEEE compliant. In an FPGA that's filled with a DSP algorithm, those "sorta-compliant" blocks allow far more computation in the same amount of silicon than fully compliant ones would -- and FPGA DSP is often limited by the number of blocks you can afford.
    $endgroup$
    – TimWescott
    5 hours ago















1












$begingroup$

Even if you don't handle all the corner cases, floating-point addition or subtraction of two well-formed numbers requires significant logic, because the scale of the mantissa can dramatically change -- consider the problem (in decimal) of the problem 1.9999 - 1.9993 = 0.0007. In floating point the location of the decimal point must be discovered, which isn't trivial, and the mantissa and exponent adjusted. This is even without trying to deal with NaN or denormalized numbers.



All the mention of handling the special cases is quite valid, but even if you put the onus of avoiding special cases on the system designer (which is not uncommon with floating-point IP intended for DSP applications), your floating point arithmetic is still more expensive than equivalent-sized fixed-point arithmetic.



Witness the latest Altera/Intel FPGAs, which have "DSP blocks" that are twinned, and will either do n-bit (I think it's 32-bit, but I'm not sure) fixed-point math in each block, or will do the same-sized floating-point math in one pair of blocks -- so going to floating point not only loses precision (because you only have 25 effective bits of mantissa in an IEEE 32-bit floating point), but uses twice the resources, with very limited handling of corner cases.






share|improve this answer









$endgroup$












  • $begingroup$
    We live in age where nm resolution in fabrication has become quite small and logic resource in FPGAs is quite cheap. What difference does it make how much logic is required for a FPU that complies fully with IEEE 754?
    $endgroup$
    – quantum231
    7 hours ago










  • $begingroup$
    Well, again, look at Altera parts. The FPU can be a big part of a processor if it's fully IEEE compliant. In an FPGA that's filled with a DSP algorithm, those "sorta-compliant" blocks allow far more computation in the same amount of silicon than fully compliant ones would -- and FPGA DSP is often limited by the number of blocks you can afford.
    $endgroup$
    – TimWescott
    5 hours ago























answered 8 hours ago by TimWescott



























