
x86 - Is floating point math deterministic for all Intel/AMD CPUs? - Stack Overflow


Suppose I have an already compiled binary that does some floating-point calculation and outputs the result. If I provide the same input on different executions, can I assume that the result will be exactly the same (bit-identical)? Does the binary always produce a deterministic result for every instruction (ADDPS, FMADD, or other SSE/AVX floating-point instructions) on all kinds of x86_64 CPUs? If not, is there an example of an instruction or architecture where it does not?


asked Feb 11 at 8:43 by song xs
  • I'd expect most CPUs to get down to +/-0.5 ULP on special functions like trig, log, exp using the same algorithm. But the dither might be slightly different on radically different CPUs. I'd be surprised if there were not a few outliers for specific awkward numbers where near catastrophic rounding occurs in a rational approximation. Offhand I don't know of any examples - pick a pair of very different CPUs and test it. FWIW I'd expect all of the classic binary operators to be exact when used in isolation. sqrt these days might also be but if you go back far enough it won't be. – Martin Brown Commented Feb 11 at 10:59
  • If the CPU conforms to IEEE 754, then all basic operations (including sqrt) must round to a well-defined value depending on the current rounding mode. Not sure about rsqrt and rcp (I would very strongly assume these also give identical results on all capable CPUs). Your code may of course have other sources of nondeterminism (uninitialized memory, concurrency), or you may have faulty hardware, etc. – chtz Commented Feb 11 at 11:40
  • @chtz Some people have observed small differences between the results of 'rsqrtss' on Intel and AMD CPUs; see here – wim Commented Feb 11 at 11:54
  • Are you including the libraries you link with, particularly dynamic libraries, as part of the “binary”? If you execute the same executable file with different dynamically linked libraries (either on different systems or with updated versions), the libraries may behave differently. – Eric Postpischil Commented Feb 11 at 15:51

2 Answers


It depends on your binary executable.

A software developer and/or compiler may choose to use different code paths depending on the instruction set support of the actual CPU and/or OS (runtime CPU dispatching). x86-64 only mandates SSE and SSE2 support. Modern CPUs may also support instruction sets such as AVX2/FMA and AVX-512. These instruction sets may help to improve the performance and/or the accuracy of floating point operations. But, for example, the result of computing a*b+c with a single vfmadd132ss instruction is not necessarily bit-identical to the result of separate multiply and add instructions (vmulss followed by vaddss): the fused instruction rounds once, while the separate pair rounds the product and then rounds the sum. Note that library calls may also cause (unexpected) runtime CPU dispatching.

Moreover, approximation instructions such as vrsqrtss (approximate inverse square root) can return different bits for the same input on AMD and Intel processors.

The basic floating point instructions, such as add, sub, mul, div, fma and sqrt, are deterministic: IEEE 754 requires each of them to return the correctly rounded result for the current rounding mode. With an identical code path but different processors, the outcome should be identical if only these instructions are executed.

[One more attempt...]

In addition to @wim's answer above:

Another reference, from 2016, in which I report on comparing the rsqrt and rcp instructions between Intel and AMD processors is https://github/jeff-arnold/math_routines/blob/main/rsqrt_rcp/docs/rsqrt_rcp.pdf. This shows that the rsqrt and rcp instructions may give different results for the same arguments on Intel and AMD processors, and that these differences may affect the result of an application. It deduces the underlying mechanisms of these instructions and shows how they differ on those two processors.

See also https://members.loria.fr/PZimmermann/papers/accuracy.pdf which is a (continuing) study of the accuracy of various implementations of math library functions. The last paragraph of the introduction is relevant to the original question, explaining that a given library run on different hardware may give different results because of runtime dispatching (i.e., different code paths executed based on the underlying hardware) and, for some particular instructions (e.g., rsqrt and rcp), their execution on different hardware may give different results.
