Харьков

8 August 2023

What's inside a floating point number and how it works

Published in the Random EN group


Image: http://pikabu.ru/

Introduction

In my very first days of learning Java, I stumbled upon a curious kind of primitive: floating-point numbers. I was immediately interested in how they behave and, closely related, in how they are written in binary. Unlike integers, real numbers are infinitely many even within a very small interval (for example, from 1 to 2), and a finite amount of memory cannot represent that whole set. So how are they expressed in binary, and how do they work? Alas, the explanations on the wiki and a rather cool article on Habré did not give me a complete understanding, although they laid the foundation. Real understanding came only the next morning, with the analysis in this article.
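To see how a finite number of bits copes with an infinite set, here is a short Java sketch (the class and method names are illustrative; Float.floatToIntBits, Math.nextUp and Math.ulp are standard JDK methods). It counts how many float values actually lie between 1 and 2 and shows the gap after 1. It relies on the fact that, for positive floats, the raw bit patterns are ordered the same way as the values.

```java
public class FloatGaps {
    // Number of distinct float values in [from, to), assuming 0 < from < to.
    // Works because positive floats and their bit patterns sort in the same order.
    static int countFloats(float from, float to) {
        return Float.floatToIntBits(to) - Float.floatToIntBits(from);
    }

    public static void main(String[] args) {
        System.out.println(countFloats(1.0f, 2.0f)); // 8388608, i.e. 2^23 values, not infinitely many
        System.out.println(Math.nextUp(1.0f));       // 1.0000001, the very next float after 1
        System.out.println(Math.ulp(1.0f));          // gap between them, 2^-23
    }
}
```

Every real number between two neighbouring floats has to be rounded to one of them.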

Excursion into history

(I took this from this article on Habré.) In the 1960s and 1970s, when computers were large and programs were small, there was no single standard for floating-point arithmetic, or even for how a floating-point number is represented. Each computer did it in its own way, with its own errors. In the mid-1970s, however, Intel decided to build new processors with "improved" arithmetic and to standardize that arithmetic at the same time. Professor William Kahan and John Palmer (no, not the author of the books about beer) were involved in the development. It was not without drama, but a new standard was eventually developed, first published in 1985. Today it is known as IEEE 754.

Floating point format

Even in school textbooks, everyone came across an unusual way of writing very large or very small numbers, such as 1.2 × 10³ or 1.2E3, which equals 1.2 × 1000 = 1200. This is called exponential (scientific) notation. Here the number is expressed by the formula $N = M \times n^{p}$, where

N = 1200 is the resulting number;
M = 1.2 is the mantissa: the significant digits of the number, independent of its order of magnitude;
n = 10 is the base. Outside of computers the base is almost always 10, as it is here;
p = 3 is the exponent, the power to which the base is raised.

Quite often the base is implied to be 10, and only the mantissa and the exponent are written, separated by the letter E; that is why 1.2 × 10³ and 1.2E3 in our example are equivalent. If all of this is clear and we have finished the nostalgic excursion into the school curriculum, I now recommend forgetting it: a floating-point number in a computer is built from powers of two, not ten, i.e. n = 2. The neat formula behind 1.2E3 falls apart, and that broke my brain nicely.
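Java itself shows the difference between the two bases. Besides the familiar decimal E notation, it accepts hexadecimal floating-point literals, where the number after p is a power of two rather than ten. A minimal sketch (the class name is arbitrary): 0x1.E is 1 + 14/16 = 1.875, so 0x1.Ep2f means 1.875 × 2², which is 7.5.

```java
public class ExponentNotation {
    public static void main(String[] args) {
        // Decimal scientific notation: 1.2E3 means 1.2 * 10^3
        double decimal = 1.2E3;
        System.out.println(decimal);                  // 1200.0

        // Hexadecimal literal: the exponent after 'p' is a power of TWO
        // 0x1.E = 1 + 14/16 = 1.875, so 0x1.Ep2f = 1.875 * 2^2
        float binary = 0x1.Ep2f;
        System.out.println(binary);                   // 7.5

        // Math.getExponent returns the power of two actually used for the float
        System.out.println(Math.getExponent(binary)); // 2
    }
}
```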

Sign and exponent

So what do we have? The result is again a binary number, made up of the mantissa (the part that gets multiplied by a power of two) and the exponent of that power. In addition, just like signed integer types, floating-point numbers have a bit that determines the sign, i.e. whether the number is positive or negative. As an example, I propose to look at the float type, which takes 32 bits. For double-precision numbers (double) the logic is the same, only with 64 bits: 1 sign bit, 11 exponent bits and 52 mantissa bits. Of the 32 bits of a float, the first (most significant) bit is the sign, the next 8 bits are the exponent, i.e. the power of two by which the mantissa is multiplied, and the remaining 23 bits are the mantissa. To demonstrate, let's look at an example (sign | exponent | mantissa):

0 | 10000001 | 11100000000000000000000
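The three fields can be pulled out of a real float with bit operations. This is a minimal sketch, assuming Java 11 or later (for String.repeat); the class and method names are made up for illustration, while Float.floatToIntBits is the standard JDK method that returns the raw 32-bit pattern. For 7.5f it prints the bits of this article's example, 0 10000001 11100000000000000000000.

```java
public class FloatFields {
    // Split a float into its three IEEE 754 fields using the raw bit pattern
    static int sign(float f)     { return Float.floatToIntBits(f) >>> 31; }          // top bit
    static int exponent(float f) { return (Float.floatToIntBits(f) >>> 23) & 0xFF; } // next 8 bits
    static int mantissa(float f) { return Float.floatToIntBits(f) & 0x7FFFFF; }      // low 23 bits

    // Pad a field with leading zeros so every bit is visible
    static String bits(int value, int width) {
        String s = Integer.toBinaryString(value);
        return "0".repeat(width - s.length()) + s;
    }

    public static void main(String[] args) {
        float f = 7.5f;
        System.out.println(bits(sign(f), 1) + " " + bits(exponent(f), 8) + " " + bits(mantissa(f), 23));
        // 0 10000001 11100000000000000000000
    }
}
```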

With the first bit everything is very simple. If it is 0, the number is positive; if it is 1, the number is negative. The next block of 8 bits is the exponent. It is written as an ordinary unsigned eight-bit number, and to get the actual power we subtract 127 from it. In our case the eight exponent bits are 10000001, which corresponds to the number 129. If you wonder how to compute that: multiply each bit by the power of two that matches its position (from 2⁷ for the leftmost bit down to 2⁰ for the rightmost) and add up the results. A more detailed explanation can be found in any introduction to binary number systems.

$1\times2^{7} + 0\times2^{6} + 0\times2^{5} + 0\times2^{4} + 0\times2^{3} + 0\times2^{2} + 0\times2^{1} + 1\times2^{0} = 128 + 1 = 129$

It's easy to see that the largest number these 8 bits can hold is 11111111₂ = 255₁₀ (the subscripts 2 and 10 denote the binary and decimal systems). However, if the exponent could only be positive (from 0 to 255), we could only build numbers with many digits before the binary point and none after it. To get negative powers, 127 is subtracted from the stored exponent, which shifts the range to -127 through 128. In IEEE 754 the two extreme values are reserved: an all-zeros exponent is used for zero and subnormal numbers, and an all-ones exponent for infinity and NaN, so ordinary (normal) numbers use powers from -126 to 127. In our example the power is 129 - 127 = 2. Let's remember this number for now.
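Both steps, reading the eight bits positionally and subtracting 127, fit in a few lines of Java. This is a sketch with illustrative names; Integer.parseInt(s, 2), Math.getExponent and the Float.MIN_EXPONENT/Float.MAX_EXPONENT constants are standard JDK members used as a cross-check. Note that Math.getExponent reports -127 for zero and 128 for infinity: these are the reserved all-zeros and all-ones exponent fields with 127 subtracted.

```java
public class ExponentField {
    static final int BIAS = 127; // fixed offset of the 8-bit float exponent

    // Positional conversion: each bit is multiplied by 2^position and the terms are added up
    static int toDecimal(String bits) {
        int value = 0;
        int n = bits.length();
        for (int i = 0; i < n; i++) {
            int bit = bits.charAt(i) - '0';
            int power = n - 1 - i;       // the leftmost bit carries the highest power
            value += bit * (1 << power); // bit * 2^power
        }
        return value;
    }

    // Turns the stored exponent field into the actual power of two (normal numbers only)
    static int unbiased(int exponentField) {
        return exponentField - BIAS;
    }

    public static void main(String[] args) {
        int field = toDecimal("10000001");
        System.out.println(field);                                     // 129
        System.out.println(Integer.parseInt("10000001", 2));           // 129, JDK cross-check
        System.out.println(toDecimal("11111111"));                     // 255
        System.out.println(unbiased(field));                           // 2
        System.out.println(Math.getExponent(7.5f));                    // 2
        System.out.println(Float.MIN_EXPONENT + " to " + Float.MAX_EXPONENT); // -126 to 127
        System.out.println(Math.getExponent(0.0f));                    // -127, reserved all-zeros field
        System.out.println(Math.getExponent(Float.POSITIVE_INFINITY)); // 128, reserved all-ones field
    }
}
```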

Mantissa

Now about the mantissa. It consists of 23 bits, but one more leading 1 is always implied in front of them, and no bit is allocated for it. This is done for efficiency and to save space. The same number can be written with different powers by adding zeros to the mantissa before or after the point. This is easiest to see with a decimal exponent: 120,000 = 1.2×10⁵ = 0.12×10⁶ = 0.012×10⁷ = 0.0012×10⁸ and so on. If instead the mantissa is required to always start with a fixed digit, a 1, every combination of bits gives a new number, and since that 1 is always there, it does not need to be stored. (This holds for normal numbers; when the exponent bits are all zeros, the implied digit is 0 instead.) So let's take it for granted that in front of our 23 bits there is one more bit equal to 1. It is usually separated from the rest by a point, which has no special meaning; it is just more convenient that way:

1.11100000000000000000000
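Restoring the hidden bit in code means adding 1 to the stored fraction. The sketch below (illustrative names, normal numbers only) divides the 23-bit field by 2^23 and adds the implicit 1. Numbers that differ only by a power of two, such as 3.75, 7.5 and 15, share the same significand 1.875 and differ only in the exponent, which is the uniqueness that the fixed leading 1 provides.

```java
public class HiddenBit {
    // Rebuilds the full significand 1.xxx from the 23 stored bits.
    // Assumes a normal number; zero and subnormals have an implied 0 instead of 1.
    static double significand(float f) {
        int mantissaField = Float.floatToIntBits(f) & 0x7FFFFF; // low 23 bits
        return 1.0 + mantissaField / (double) (1 << 23);          // implicit 1 + fraction
    }

    public static void main(String[] args) {
        for (float f : new float[] {3.75f, 7.5f, 15f}) {
            System.out.println(f + " = " + significand(f) + " x 2^" + Math.getExponent(f));
        }
        // 3.75 = 1.875 x 2^1
        // 7.5 = 1.875 x 2^2
        // 15.0 = 1.875 x 2^3
    }
}
```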

Now each bit of the resulting mantissa is multiplied by a power of two, going from left to right and lowering the power by one at each step. We start with the power obtained above, i.e. 2, for the leading 1. (I deliberately chose a simple example so as not to write out every power of two; in the calculation below the terms for zero bits are omitted.)

$1\times2^{2} + 1\times2^{1} + 1\times2^{0} + 1\times2^{-1} = 1\times4 + 1\times2 + 1\times1 + 1\times0.5 = 4 + 2 + 1 + 0.5 = 7.5$

So the result is 7.5. You can check it, for example, at this link.
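The whole procedure can be collected into one method. This is a hand-rolled sketch with made-up names that handles normal numbers only (zero, subnormals, infinity and NaN are not covered); Float.intBitsToFloat is the JDK's own decoder and serves as a cross-check. Accumulating in a double is exact here, since a float never has more than 24 significant bits.

```java
public class DecodeFloat {
    // Decodes a normal float bit pattern by hand, following the steps above.
    // Assumption: zero, subnormals, infinity and NaN are not handled.
    static double decode(int bits) {
        int sign = bits >>> 31;                    // 1 bit
        int exponentField = (bits >>> 23) & 0xFF;  // 8 bits
        int mantissaField = bits & 0x7FFFFF;       // 23 bits
        int power = exponentField - 127;           // remove the bias

        double value = Math.pow(2, power);         // the implicit leading 1 times 2^power
        for (int i = 22; i >= 0; i--) {            // stored bits from left to right
            power--;                               // each step right lowers the power by one
            if (((mantissaField >>> i) & 1) == 1) {
                value += Math.pow(2, power);       // add 2^power only for 1 bits
            }
        }
        return sign == 1 ? -value : value;
    }

    public static void main(String[] args) {
        int bits = 0b0_10000001_111_0000_0000_0000_0000_0000; // sign | exponent | mantissa
        System.out.println(decode(bits));                        // 7.5
        System.out.println(Float.intBitsToFloat(bits));          // 7.5
        System.out.println(decode(Float.floatToIntBits(-0.15625f))); // -0.15625
    }
}
```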

Results

A standard floating-point number of type float consists of 32 bits: the first bit is the sign (+ or -), the next eight are the exponent, and the remaining 23 are the mantissa.

Sign: if the bit is 0, the number is positive; if it is 1, the number is negative.

Exponent: convert the 8 bits to a decimal number (from left to right the bits are worth 128, 64, 32, 16, 8, 4, 2 and 1), then subtract 127. The result is the power we start with.

Mantissa: put one more bit with the value 1 in front of the 23 stored bits, multiply it by 2 raised to the power obtained above, and multiply each following bit by the next lower power of two; the sum of all these terms, with the sign applied, is the number.

That's all, folks!
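Running the three steps backwards is a good way to check them. The sketch below uses a hypothetical helper, build, that covers normal numbers only and does no range checks. It packs a sign, a power and the 23 mantissa bits into an int and lets the JDK's Float.intBitsToFloat interpret the result.

```java
public class FloatFromFields {
    // Packs sign (0 or 1), power p and the 23 stored mantissa bits into a float.
    // Assumes -126 <= power <= 127 and 0 <= mantissaField < 2^23 (normal numbers only).
    static float build(int sign, int power, int mantissaField) {
        int exponentField = power + 127; // add the bias back
        int bits = (sign << 31) | (exponentField << 23) | mantissaField;
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        int mantissa = 0b111_0000_0000_0000_0000_0000; // the 23 mantissa bits of the example
        System.out.println(build(0, 2, mantissa));    // 7.5
        System.out.println(build(1, 2, mantissa));    // -7.5
        System.out.println(build(0, 0, 0));           // 1.0
    }
}
```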

PS: As homework, using this article, leave your versions in the comments of why precision errors occur after a large number of arithmetic operations on floating-point numbers.
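Before answering, it may help to see the effect. In this small experiment (standard JDK only; the class and method names are arbitrary), 0.1 has no finite binary expansion, so the 23 mantissa bits can only hold a rounded value, and every further addition rounds again.

```java
import java.math.BigDecimal;

public class PrecisionDemo {
    // Adds x to a running float total the given number of times; each += rounds to a float
    static float addRepeatedly(float x, int times) {
        float sum = 0f;
        for (int i = 0; i < times; i++) {
            sum += x;
        }
        return sum;
    }

    public static void main(String[] args) {
        // The exact value actually stored for 0.1f (widening float to double is exact)
        System.out.println(new BigDecimal(0.1f));  // 0.100000001490116119384765625
        float sum = addRepeatedly(0.1f, 10);
        System.out.println(sum);                    // 1.0000001
        System.out.println(sum == 1.0f);            // false
    }
}
```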
