最新消息:雨落星辰是一个专注网站SEO优化、网站SEO诊断、搜索引擎研究、网络营销推广、网站策划运营及站长类的自媒体原创博客

floating point - If x*y ≠ 2ⁿ, does it follow that x * y = ((x * y)y) * y under IEEE 754 semantics? - Stack Overflow

programmeradmin1浏览0评论

This is a follow-up to my previous question here. Additional restrictions highlighted in bold.

Given two nonzero, finite, double-precision (a.k.a. binary64) floating point numbers x and y, is it always true that the equality

x * y == ((x * y) / y) * y

holds under default¹ IEEE 754 semantics if the result of the floating point operation x * y is not some power of 2? (I.e., if writing out the significand of x * y in base 2 shows at least two bits set.)

I've done a brute-force check over every possible combination of half-precision floating point numbers (a.k.a. binary16) using MPFR, so I know that the assertion holds true within that domain.

I've also searched programmatically over billions of possibilities in binary64 (including in the subnormal range) and am unable to find a counterexample, yet am also unsure how to go about proving the assertion to be true.

(What I do know is that the simpler assertions x == (x / y) * y and x == (x * y) / y are both false, as I can easily find counterexamples for them.)

¹ includes the default rounding mode: "round to nearest, ties to even"

This is a follow-up to my previous question here. Additional restrictions highlighted in bold.

Given two nonzero, finite, double-precision (a.k.a. binary64) floating point numbers x and y, is it always true that the equality

x * y == ((x * y) / y) * y

holds under default¹ IEEE 754 semantics if the result of the floating point operation x * y is not some power of 2? (I.e., if writing out the significand of x * y in base 2 shows at least two bits set.)

I've done a brute-force check over every possible combination of half-precision floating point numbers (a.k.a. binary16) using MPFR, so I know that the assertion holds true within that domain.

I've also searched programmatically over billions of possibilities in binary64 (including in the subnormal range) and am unable to find a counterexample, yet am also unsure how to go about proving the assertion to be true.

(What I do know is that the simpler assertions x == (x / y) * y and x == (x * y) / y are both false, as I can easily find counterexamples for them.)

¹ includes the default rounding mode: "round to nearest, ties to even"

Share Improve this question edited yesterday Hans Brende asked Feb 7 at 18:21 Hans BrendeHans Brende 8,5774 gold badges38 silver badges46 bronze badges 16
  • For all rounding modes, or the default of nearest-to-even? (I don't have an argument why that would matter, just wanted to further clarify the constraints.) – alias Commented Feb 7 at 18:24
  • Default, nearest-to-even! – Hans Brende Commented Feb 7 at 18:24
  • @EricPostpischil. I found counterexamples for x*y/y not being x, but when those were multiplied by y, they produced the same value, thus not providing a counterexample for our master hypothesis. However, by playing with the rounding modes, I easily got a counterexample. – Kaz Commented Feb 7 at 20:27
  • @EricPostpischil It's not obvious what is wrong with the counterexample given in your deleted answer. – Kaz Commented Feb 7 at 20:29
  • @Kaz: x*y was a power of two. – Eric Postpischil Commented Feb 7 at 20:40
 |  Show 11 more comments

1 Answer 1

Reset to default 1

Conclusion

Provided all operands and results are in the normal range and a round-to-nearest method is used, x*yx*y/y*y in binary-floating-point implies x*y is a power of two or the negation of a power of two.

Notation

Text in font expressions floating-point operations: a is a floating-point number. a*b is a floating-point operation, with rounding. Operations in the default font are operations in real-number arithmetic. ab is the real-number-arithmetic product of a and b, with no rounding. ab is the real-number-arithmetic product of the real number a and the floating-point number b.

Sometimes a floating-point number will be written in the default font for ease of reading: a is the same real number as a. (This lets us write xy for the product of x and y, which seems preferable to xy or xy.)

ULP(a) is the unit of least precision of a in the floating-point format.

Let u be ULP(1) (2−52 for binary64).

Proof

Since, within the normal range, the property is invariant under scaling x or y by powers of two or negating them, we may assume without loss of generality that x, y ∊ [1, 2).

If x = 1, 1*y1*y/y*y is clearly false. If y = 1, x*1x*1/1*1 is clearly false. This leaves us with x, y ∊ (1, 2).

Suppose x and y are a counterexample.

If x*y/y = x, then (x*y/y)*y = (x)*y, and thus x*y = x*y/y*y. So x*y/y and x must differ.

As x*y/y has only two rounding errors, each at most ½ ULP at its scale, it cannot produce a result more than 1 ULP different from x.1 Therefore x*y/y = x±u.

Let e0 be the rounding error in x*y: e0 = x*yxy. With round to nearest, |e0| ≤ ½ ULP(xy). ULP(xy) is u or 2u according to whether xy is in [1, 2) or [2, 4).

Let e1 be the rounding error in the division of x*y/y: e1 = x*y/yx*y/y. |e1| ≤ ½ ULP(x) = ½u. (This is true even if the division rounds up to 2. Although ULP(2) is 2u, the division would be rounding up from a value above 2−½u. And it cannot be rounding down; that would imply 2 < x*y/y ⇒ 2y < x*y = xy+e0 ⇒ 2 < x+e0, but x is at least u below 2, and e0 ≤ ½u.)

e0 and e1 must have the same sign.2

From above, x*y/y = x±u, so x±u = x*y/y + e1 = (xy+e0)/y + e1 = x + e0/y + e1, so e0/y + e1 = ±u.

Since |e1| ≤ ½u, |e0/y| ≥ ½u, so |e0| ≥ ½uy. And y > 1, so |e0| > ½u.

So xy cannot be in [1, 2), because then its ULP would be u, and the magnitude of its rounding error e0 cannot exceed ½ u. Therefore xy is in [2, 4). It cannot be 2, since we have assumed x and y are a counterexample to x*y being a power of two, so xy is in (2, 4).

Let e2 be the rounding error in the last multiplication of x*y/y*y: e2 = x*y/y*yx*y/yy.

For x*y/y*y to differ from x*y/y, it must be an ULP away from x*y = xy + e0. If x*y > 2, that ULP is 2u, so ±2u = x*y/y*yx*y = (xy + e0 + e1y + e2) − (xy + e0) = e1y + e2. However, we know |e1| ≤ ½u and y < 2, so |e1y| < u, so ±2u = e1•y + e2 implies |e2| > u, which violates round-to-nearest. Therefore x*y > 2 is false.

However, xy > 2, so the only possibility is x*y = 2. That allows x*y/y*y < 2, which explains the cases previously seen here that have x*yx*y/y*y with x*y equal to a power of two. And now we have shown that is a requirement; for our x and y in [1, 2), x*yx*y/y*y implies x*y is 2, and therefore any scaling of x and y by powers of two and/or negating of one or both will produce an x*y that is a power of two or the negation of a power of two.

Footnotes

1 This is not quite as formal as I would like. I expect we could improve it with a little work.

2 This should have a bit of proof.

与本文相关的文章

发布评论

评论列表(0)

  1. 暂无评论