Real numbers in computer memory. Explanation.

Good afternoon! While studying the lecture “Nuances of working with real numbers” from the first quest (section 2, “The structure of floating-point numbers”) and the additional lectures on the topic, many of you have probably run into a lot of questions. I first tried to answer them for myself, and now I offer those answers to you, in a consistent logical order, to help you understand the topic fully.

1. Decimal and binary number systems.

1.1 The decimal number system is one of the most common systems. It is the one we use for all non-computer calculations at school, at university and in everyday life. It uses the (Arabic) digits 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, ten digits in total. There is also Roman notation for numbers, but it is practically not used nowadays.

In the decimal system we count in units, tens, hundreds, thousands, tens of thousands, hundreds of thousands, millions, and so on. In other words, these are the places (positions) of the number. A place is the position a digit occupies in the written number. The lowest (least significant) place of a natural number is the units place, the rightmost one. Why is it the least significant? Because dropping the units digit changes the number the least (for example, 345 versus 340). The second place is the tens place, and so on.

What does all this mean? Let's take any integer in the decimal system and decompose it by places:

3297 = 3*1000 + 2*100 + 9*10 + 7

So the number 3297 contains 3 units of the fourth place (3 thousands), 2 units of the third place (2 hundreds), 9 units of the second place (9 tens) and 7 units of the first place (7 ones). This is the number three thousand two hundred ninety-seven, and its notation is positional: the value of each digit depends on its position.

What about the places of fractional (real) numbers? The places of the fractional part are called tenths, hundredths, thousandths, ten-thousandths, and so on.
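To make the place-value idea concrete, here is a minimal Java sketch (Java because the lecture is about Java's float and double). It only rewrites a non-negative number given as text, such as 3297 above or 25.076 from the next paragraph, as a sum of digit*place-value terms. The class and method names are hypothetical, and input validation is deliberately omitted.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

public class PlaceValues {
    // Expands a non-negative decimal number, given as text like "3297" or "25.076",
    // into digit*placeValue terms, highest place first. No input validation (assumption).
    static String expand(String number) {
        int point = number.indexOf('.');
        String intPart = point < 0 ? number : number.substring(0, point);
        String fracPart = point < 0 ? "" : number.substring(point + 1);
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < intPart.length(); i++) {
            int exponent = intPart.length() - 1 - i; // the units place has exponent 0
            terms.add(intPart.charAt(i) + "*" + BigDecimal.ONE.movePointRight(exponent).toPlainString());
        }
        for (int i = 0; i < fracPart.length(); i++) {
            // tenths = 0.1, hundredths = 0.01, thousandths = 0.001, ...
            terms.add(fracPart.charAt(i) + "*" + BigDecimal.ONE.movePointLeft(i + 1).toPlainString());
        }
        return String.join(" + ", terms);
    }

    public static void main(String[] args) {
        System.out.println(expand("3297"));   // 3*1000 + 2*100 + 9*10 + 7*1
        System.out.println(expand("25.076")); // 2*10 + 5*1 + 0*0.1 + 7*0.01 + 6*0.001
    }
}
```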
The farther a digit is from the decimal point (from the integer part of the number), the less significant it is: discarding it changes the value of the number only slightly. For example, let's decompose a fractional number written as a decimal:

25.076 = 2*10 + 5*1 + 0*0.1 + 7*0.01 + 6*0.001

So the number 25.076 contains 2 tens, 5 units, 0 tenths, 7 hundredths and 6 thousandths. The decimal system uses 10 digits, and its place values are powers of 10, hence the name "decimal".

1.2 The binary number system is used in almost all modern computers and other electronic computing devices. It writes numbers with only two digits, 0 and 1. To avoid confusion about which system a number is written in, the base of the number system is written as a subscript at the bottom right, for example:

1000₁₀
1000₂

The first number is the familiar one thousand in the decimal system, while the second one is written in binary, and in decimal it equals... 8!

Like the decimal system, the binary system breaks numbers down into places. Each binary digit is called a bit (short for "binary digit"). (If anyone is interested: four bits make a nibble (or tetrad) and 8 bits make a byte; in x86 terminology, 16 bits make a word and 32 bits a double word.) Bits are also numbered from right to left, but, unlike the decimal places above, the numbering starts from zero. The rightmost, least significant bit has index 0, the next one has index 1, then 2, and so on; the higher the bit, the more significant it is. The analogy with the decimal system makes this clear: if you remove the units from 1455, you are left with 1450, which is almost equal to the original number. But if you remove the hundreds, you are left with 1055, which is already far from the original value, because the hundreds place is much more significant (higher-order) than the units place. Example:
[Figure: an example of a fractional binary number with its bits numbered underneath]

In the figure, the bits of this fractional binary number are numbered underneath; in total the number has 18 bits (digits). Looking ahead, note that fractional numbers are stored in computer memory in a completely different way; this is discussed later. In the meantime, let's learn how to convert numbers from one number system to another.

2. Converting integers and fractions from the decimal system to the binary system and back.

2.1 Conversion from decimal to binary.

2.1.1 Integers. To convert a decimal integer to binary, divide it by 2 and write down the remainder (always 0 or 1, depending on whether the number is even or odd). Then divide the quotient by 2 again and write down the remainder (0 or 1) again, divide the new quotient by 2, and so on. Continue until the quotient equals 1. Then write down the resulting zeros and ones in reverse order, starting with the last quotient, which is always 1.

Important note. Repeatedly dividing ANY POSITIVE INTEGER by 2 (discarding the remainder each time) always ends with a quotient of one (1)! While the quotient is greater than 1, we keep dividing it by 2 until we get one. Since we stop as soon as the quotient is 1, a quotient of zero (0) can appear in only one case: when zero itself is divided by 2.

Example. Let's convert the number 145 from decimal to binary.
145/2 = 72 (remainder 1)
72/2 = 36 (remainder 0)
36/2 = 18 (remainder 0)
18/2 = 9 (remainder 0)
9/2 = 4 (remainder 1)
4/2 = 2 (remainder 0)
2/2 = 1 (remainder 0)
Now we “collect” our binary number in reverse order, starting with the final quotient 1: we get 10010001. Done!

Interesting nuance 1. Let's convert the number 1 from decimal to binary. In binary, this number is also written as 1.
After all, the final quotient, which should equal 1, is already the number 1 itself: 1₁₀ = 1₂.

Interesting nuance 2. Let's convert the number 0 from decimal to binary. In binary, this number is also written as 0: 0₁₀ = 0₂.

2.1.2 Fractional numbers. How do we convert fractional numbers to binary? To convert a decimal fraction to the binary system, you need to:
a) convert the integer part of the number to binary using the algorithm from paragraph 2.1.1;
b) multiply the fractional part by 2 and write down the digit of the result that stands BEFORE the point (it is always 0 or 1, which is logical); then multiply ONLY the fractional part of that result by 2 again, write down the digit before the point (0 or 1), and so on, until the fractional part of the product becomes 0 or until you have the required number of binary places (the required precision), which equals the number of multiplications by 2. Then write the recorded zeros and ones IN ORDER after the point that separates the integer and fractional parts of the number.

Example 1. Let's convert the number 2.25 (two and twenty-five hundredths) from decimal to binary. In binary, this number equals 10.01. How did we get this? The number consists of an integer part (before the point), 2, and a fractional part, 0.25.
1) Converting the integer part:
2/2 = 1 (remainder 0)
The integer part is 10.
2) Converting the fractional part:
0.25 * 2 = 0.5 (0)
0.5 * 2 = 1.0 (1)
The fractional part became 0 after successive multiplication by 2, so we stop multiplying. Now we “collect” the fractional part IN ORDER: we get 0.01 in binary.
3) Adding the integer and fractional parts, we get that the decimal number 2.25 equals the binary number 10.01.

Example 2. Let's convert the number 0.116 from decimal to binary.
0.116 * 2 = 0.232 (0)
0.232 * 2 = 0.464 (0)
0.464 * 2 = 0.928 (0)
0.928 * 2 = 1.856 (1) //discard the integer part of this result
0.856 * 2 = 1.712 (1) //discard the integer part of this result
0.712 * 2 = 1.424 (1) //discard the integer part of this result
0.424 * 2 = 0.848 (0)
As we can see, the multiplication goes on and on: the fractional part of the result does not become 0. So we decide to convert our decimal fraction to binary with a precision of 7 binary places (bits) after the point. Recall what we said about insignificant places: the farther a bit is from the integer part, the more safely we can neglect it (see section 1 of this lecture if you have forgotten). We get the binary fraction 0.0001110, accurate to 7 bits after the point.

2.2 Conversion from binary to decimal.

2.2.1 Integers. To convert an integer from binary to decimal, split it into bits and multiply each bit by 2 raised to a non-negative power. The powers are counted from right to left, starting from 0 at the least significant (rightmost) bit. In other words, the power of two equals the index of the bit (but this rule works only for integers; the fractional part of a number is converted differently, see 2.2.2). Then add up the products.

Example. Let's convert the binary number 110011 to decimal.
110011₂ = 1*2⁵ + 1*2⁴ + 0*2³ + 0*2² + 1*2¹ + 1*2⁰ = 32 + 16 + 0 + 0 + 2 + 1 = 51₁₀
As a result, we get the number 51 in the decimal system.
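Both integer algorithms, repeated division by 2 (2.1.1) and the sum of powers of two (2.2.1), fit in a few lines of Java. This is a minimal sketch for non-negative int values; the class IntBinary and its methods are hypothetical names for illustration. Java's built-in Integer.toBinaryString and Integer.parseInt(s, 2) give the same results and are what you would normally use in practice.

```java
public class IntBinary {
    // Section 2.1.1: decimal -> binary by repeated division by 2.
    // Assumes n >= 0; 0 and 1 map to themselves.
    static String toBinary(int n) {
        if (n < 0) throw new IllegalArgumentException("only non-negative numbers are handled here");
        if (n == 0) return "0";
        StringBuilder remainders = new StringBuilder();
        while (n > 1) {
            remainders.append(n % 2); // remainder: 0 for even, 1 for odd
            n /= 2;                   // integer quotient for the next step
        }
        remainders.append(1);         // the last quotient is always 1
        return remainders.reverse().toString(); // collect the digits in reverse order
    }

    // Section 2.2.1: binary -> decimal as the sum of bit * 2^index,
    // indices counted from the right starting at 0. Overflows past 31 bits.
    static int toDecimal(String bits) {
        int result = 0;
        int power = 1; // 2^0
        for (int i = bits.length() - 1; i >= 0; i--) {
            char c = bits.charAt(i);
            if (c == '1') result += power;
            else if (c != '0') throw new IllegalArgumentException("not a binary digit: " + c);
            power *= 2; // next bit to the left has the next power of two
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(toBinary(145));        // 10010001
        System.out.println(toDecimal("110011")); // 51
        System.out.println(toDecimal("1000"));   // 8
        // The standard library agrees:
        System.out.println(Integer.toBinaryString(145));   // 10010001
        System.out.println(Integer.parseInt("110011", 2)); // 51
    }
}
```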
For reference, here are the first powers of 2: 2⁰ = 1, 2¹ = 2, 2² = 4, 2³ = 8, 2⁴ = 16, 2⁵ = 32, 2⁶ = 64, 2⁷ = 128, 2⁸ = 256, 2⁹ = 512, 2¹⁰ = 1024.


Note that any nonzero number raised to the power zero equals 1.

2.2.2 Fractional numbers. To convert a binary fractional (real) number to decimal, you need to:
a) convert its integer part to decimal using the algorithm from paragraph 2.2.1;
b) convert its fractional part as follows: write it as the sum of products of its digits by 2 raised to negative powers (the power for the first digit after the point is -1, for the second digit after the point it is -2, and so on). The value of this sum is the fractional part of the number in the decimal system.

Example. Let's convert the number 10111.01 to the decimal system.
10111.01₂ = (1*2⁴ + 0*2³ + 1*2² + 1*2¹ + 1*2⁰) + (0*2⁻¹ + 1*2⁻²) = (16 + 0 + 4 + 2 + 1) + (0 + 0.25) = 23.25₁₀
As a result, we get the number 23.25 in the decimal system.

For reference, here are the first negative powers of 2: 2⁻¹ = 0.5, 2⁻² = 0.25, 2⁻³ = 0.125, 2⁻⁴ = 0.0625, 2⁻⁵ = 0.03125, 2⁻⁶ = 0.015625, 2⁻⁷ = 0.0078125, 2⁻⁸ = 0.00390625.
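The fractional algorithms from 2.1.2 (repeated multiplication by 2) and 2.2.2 (negative powers of 2) can be sketched in Java as well. This sketch assumes the decimal fraction is passed as a string and uses java.math.BigDecimal so that the arithmetic is exact and matches the hand calculations; with a double, the input 0.116 would already be rounded before the first step. The class and method names are hypothetical.

```java
import java.math.BigDecimal;

public class FractionBinary {
    private static final BigDecimal TWO = BigDecimal.valueOf(2);

    // Section 2.1.2: binary digits after the point for a decimal fraction 0 <= f < 1,
    // by repeated multiplication by 2. Stops when the fractional part becomes 0
    // or after maxBits digits (the chosen precision).
    static String fractionToBinary(String fraction, int maxBits) {
        BigDecimal f = new BigDecimal(fraction);
        StringBuilder bits = new StringBuilder();
        while (f.signum() != 0 && bits.length() < maxBits) {
            f = f.multiply(TWO);
            if (f.compareTo(BigDecimal.ONE) >= 0) { // the digit before the point is 1
                bits.append('1');
                f = f.subtract(BigDecimal.ONE);     // keep ONLY the fractional part
            } else {                                // the digit before the point is 0
                bits.append('0');
            }
        }
        return bits.toString();
    }

    // Section 2.2.2: binary number with an optional point (e.g. "10111.01") to decimal.
    // Integer bits are weighted 2^0, 2^1, ... leftwards from the point;
    // fractional bits are weighted 2^-1, 2^-2, ... rightwards.
    static BigDecimal binaryToDecimal(String binary) {
        int point = binary.indexOf('.');
        String intPart = point < 0 ? binary : binary.substring(0, point);
        String fracPart = point < 0 ? "" : binary.substring(point + 1);

        BigDecimal result = BigDecimal.ZERO;
        BigDecimal weight = BigDecimal.ONE;                 // 2^0
        for (int i = intPart.length() - 1; i >= 0; i--) {
            if (intPart.charAt(i) == '1') result = result.add(weight);
            weight = weight.multiply(TWO);                  // 2^1, 2^2, ...
        }
        BigDecimal half = new BigDecimal("0.5");
        weight = half;                                      // 2^-1
        for (char c : fracPart.toCharArray()) {
            if (c == '1') result = result.add(weight);
            weight = weight.multiply(half);                 // 2^-2, 2^-3, ... (exact in decimal)
        }
        return result;
    }

    public static void main(String[] args) {
        // 2.25: integer part 2 -> "10" (section 2.1.1), fractional part 0.25 -> "01".
        System.out.println(Integer.toBinaryString(2) + "." + fractionToBinary("0.25", 23)); // 10.01
        // 0.116 does not terminate in binary, so we cut it off at 7 bits.
        System.out.println("0." + fractionToBinary("0.116", 7));         // 0.0001110
        System.out.println(binaryToDecimal("10111.01").toPlainString()); // 23.25
    }
}
```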


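2.2.3 General formula for converting numbers from binary to decimal. Let's write a general formula for converting numbers from binary to decimal (covering both the integer and fractional parts). Here aᵢ denotes the bit in position i: positions to the left of the point are numbered 0, 1, 2, ... moving away from the point, and positions to the right of the point are numbered -1, -2, ...:

$$A_2 = a_{n-1}\cdot 2^{n-1} + \dots + a_1\cdot 2^{1} + a_0\cdot 2^{0} + a_{-1}\cdot 2^{-1} + a_{-2}\cdot 2^{-2} + \dots + a_{-m}\cdot 2^{-m} = \sum_{i=-m}^{n-1} a_i\cdot 2^{i}$$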

where A is a number in the binary number system; the base of the number system is 2 (each bit is multiplied by a power of 2); n is the number of integer digits (bits); m is the number of fractional digits (bits) of the number. The first bit of the integer part, counting from the point, is always multiplied by 2 to the power zero; the bit to its left is multiplied by 2 to the first power, and so on. The first bit of the fractional part, counting from the point, is always multiplied by 2 to the power minus one; the next bit to the right is multiplied by 2 to the power minus two, and so on.

3. Scientific notation: the normalized form in both systems. Mantissa, exponent, order.

3.1 Exponential form of a number. Earlier we studied how positional numbers are written place by place. Let's take the number 0.0000000000000000000016. In standard form its notation is very long, but in exponential form it looks like this:

1.6 * 10⁻²¹

So what is the exponential form of a number, and how do we write a number in this form? Scientific (exponential) notation represents a real number as a mantissa and an exponent. It is convenient for very large and very small numbers, and it also unifies the way they are written:

N = M * pⁿ

where N is the number being written, M is the mantissa, p is the base (equal to the base of the number system of the given number), n is an integer called the order (the power; it can be positive or negative), and pⁿ, the base raised to the power of the order, is the characteristic (exponent) of the number.

An important nuance. For a nonzero decimal number in normalized form (see 3.2): if the integer part is different from 0, the order is non-negative (it is zero when the integer part is a single digit, as in 3.56 = 3.56 * 10⁰); if the integer part is 0, the order is negative.

3.2 Normal and normalized forms of numbers.
In the normal form of a number, the mantissa (ignoring its sign) lies in the half-open interval [0, 1), that is, 0 <= M < 1. This form has a drawback: some numbers can be written in several ways (for example, 0.0001 can be written as 0.000001*10², 0.00001*10¹, 0.0001*10⁰, 0.001*10⁻¹, and so on). Therefore another form is widespread, especially in computer science: the normalized form, in which the mantissa of a decimal number takes values from 1 (inclusive) to 10 (exclusive), that is, 1 <= M < 10 (similarly, the mantissa of a binary number takes values from 1 (inclusive) to 2 (exclusive)). In other words, in the decimal system the mantissa must be a number from 1.0 (inclusive) to 10 (exclusive): its integer part contains a single nonzero digit, and its fractional part is not limited mathematically. The advantage of the normalized form is that every number except 0 can be written in exactly one way. The disadvantage is that 0 cannot be represented in this form, so computer number formats reserve a special bit pattern for 0.

3.3 Examples of writing decimal numbers in normalized exponential form. Let's look at some examples.

Example 1. Let's write the decimal number 1015000 (one million fifteen thousand) in normalized exponential form. The number is decimal, so the base is 10. Now let's find the mantissa. To do this, imagine the number as a fraction whose fractional part is zero (since the number is an integer): 1015000.0. If the integer part of the number is greater than 0, move the point to the left of its initial position (within the integer part) until only one digit remains in the integer part, and put the point after it: 1.015000. Then discard the insignificant zeros at the end. We get the mantissa 1.015. Now let's determine the order (the power of the base).
How many positions to the left did the point separating the integer and fractional parts move? Six positions. So the order is 6. In this case the order is positive (we moved the point within an integer part that is not equal to 0). The final normalized notation is 1.015 * 10⁶. We can also write this number as 1.015E6 (where E6 stands for the power of ten, that is, 10 to the 6th power).

Let's check ourselves. The exponential notation of a number is nothing more than the product of one number (the mantissa) and another (the characteristic). What happens if we multiply 1.015 by 10⁶? 1.015*10⁶ = 1.015*1000000 = 1015000. That's right. As noted above, the normalized approach gives every number a unique exponential notation.

Example 2. Let's write the decimal real number 0.0098 in normalized form. The base is 10 (decimal system). The mantissa is 9.8: the integer part of the number is zero, so we move the point to the right until it stands just after the first significant digit (a digit from 1 to 9 inclusive). Now the order: we moved the point three positions, so its absolute value is 3. Is it positive or negative? Since we moved the point to the right (within the fractional part of the number), the order is negative. The final normalized notation is 9.8 * 10⁻³, or 9.8E-3. Let's check ourselves again by multiplying 9.8 by 10⁻³: 9.8 * 10⁻³ = 9.8 * 0.001 = 0.0098. That's right.

Example 3. Let's write the decimal real number 3.56 in normalized form. The base is 10 (decimal system). The mantissa is... 3.56 (the integer part of the number is a single nonzero digit, so the point does not need to move anywhere: the number itself is the mantissa).
Let's find the order: by what number must the mantissa, which equals the number itself, be multiplied so that it does not change? By one. Since 10⁰ = 1, the order is zero. The final normalized notation is 3.56 * 10⁰, or 3.56E0.

4. Storing real numbers in computer memory: float and double.

4.1 The float and double types. Let's move on to the key section of our lecture. As we already know, Java has two types for real numbers: float and double. The float type occupies 32 bits in memory. Its largest finite magnitude is about 3.4 * 10³⁸ (3.4E+38), and its smallest normalized magnitude is about 1.18 * 10⁻³⁸ (1.18E-38); even smaller "subnormal" values, down to about 1.4 * 10⁻⁴⁵, also exist, at the cost of precision.

Important nuance 1. Float numbers can be either positive or negative. The range above describes the absolute values (moduli) of float numbers.

Important nuance 2. In powers of two, the largest float is just under 2¹²⁸ (precisely (2 - 2⁻²³) * 2¹²⁷ ≈ 3.4 * 10³⁸), and the smallest normalized float is 2⁻¹²⁶ ≈ 1.18 * 10⁻³⁸.

The double type takes up twice as much memory, 64 bits, and can hold values with magnitudes up to about 1.8 * 10³⁰⁸ (1.8E+308); its smallest normalized magnitude is about 2.2 * 10⁻³⁰⁸.

4.2 Normalized exponential form of binary numbers. We know that numbers are stored in computer memory in binary form. So let's take the number 1560.256 (of type float) and convert it to binary in positional form: 11000011000.01000001100... You might think that this is how it is stored in memory. But that's not true! In computer memory, the float and double types (floating-point types) are stored in normalized exponential form, with 2 as the base of the power instead of 10. This is because, as stated above, all data in the computer is represented in binary (bits), and a fixed amount of memory is allocated for each number. Let's write the positive number 15.2 in normalized exponential form: 1.52*10¹.
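Java can print a number in normalized exponential form with the %e format specifier, which is a convenient way to check the examples from 3.3. A small sketch follows; the helper name scientific is hypothetical, and passing Locale.ROOT is an assumption added so that the decimal separator is always a point. Note that %e always prints at least two exponent digits (e+06); that is just a formatting convention.

```java
import java.util.Locale;

public class Normalized {
    // Formats x in normalized scientific notation: one nonzero digit before the point,
    // 'digits' mantissa digits after it. Locale.ROOT keeps '.' as the decimal separator.
    static String scientific(double x, int digits) {
        return String.format(Locale.ROOT, "%." + digits + "e", x);
    }

    public static void main(String[] args) {
        System.out.println(scientific(1015000, 3)); // 1.015e+06
        System.out.println(scientific(0.0098, 1));  // 9.8e-03
        System.out.println(scientific(3.56, 2));    // 3.56e+00
        System.out.println(scientific(15.2, 2));    // 1.52e+01

        // Java literals accept the same "E" notation used in the text.
        System.out.println(1.015E6 == 1015000.0); // true
        System.out.println(9.8E-3 == 0.0098);     // true
    }
}
```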
Next, let's write its binary "twin" 1111.00110011001... in normalized exponential form as well, using the same algorithm:
1) The base is 2.
2) The mantissa is 1.11100110011001...
3) The order is positive and equals 3 in decimal (the point is shifted 3 bits to the left). In binary it is 11.
So in binary normalized exponential form the number is 1.11100110011001... * 2¹¹, where the exponent 11 is written in binary (it equals 3 in decimal).

4.3 Storing the binary normalized exponential form of a float number in memory. So we have figured out that a real number is stored in computer memory in binary normalized exponential form. What does that look like in memory? Let's take the float type. The computer allocates 32 bits for each float number, distributed as shown in the figure below, which schematically shows the memory allocated for a 32-bit float number.

[Figure: the 32 bits of a float: bit 31 holds the sign, bits 30 to 23 the biased exponent, bits 22 to 0 the mantissa without the implicit one]

In the figure, the bits are numbered from 0 to 31: 1 bit is allocated for the sign of the number, 8 bits for the shifted (biased) power (order) of the exponential form, and 23 bits for the normalized mantissa without the implicit one. Let's take a closer look.

1) Sign bit. The most significant (leftmost) bit always stores the sign of the number: 1 if the number is negative and 0 if it is positive. Zero is a special case: in floating-point arithmetic zero can be both positive (+0.0) and negative (-0.0), and the two differ only in this bit.

2) Next come the bits of the order (the power of the base 2); 8 bits are allocated for it. As we know, the order of a float number can be either negative (for numbers whose integer part is 0, see paragraph 3.3) or non-negative (for numbers whose integer part is not 0). For normalized float numbers it ranges from -126 (2⁻¹²⁶) to +127 (2¹²⁷). In theory, we could allocate one bit to store the sign of the order, as is done for the sign of the number. But that is not how it works. To avoid spending a bit on the sign of the order, a bias of +127 (binary 0111 1111) is added to the order of float numbers. So instead of powers from -126 to +127, the computer stores values from +1 to +254: all stored values are positive, and no extra bit is wasted on the sign. (The stored values 0 and 255 are reserved: 0 is used for zero and for very small "subnormal" numbers, 255 for infinities and NaN.) In other words, the stored order is shifted by roughly half of the 8-bit range 0 to 255. To obtain the actual order, subtract the bias from the value stored in memory. If the stored value is less than the bias (+127), the order is negative: this is logical.

Example. Let's shift the negative order -18. Adding the bias +127, we get the stored value -18 + 127 = +109. Let's convert this value to binary: 1101101. But 8 bits of memory are allocated for the order, and here we get a 7-bit number. The computer fills the empty high-order bit with 0, so this order is stored in memory as 01101101. Let's check: +109 < +127, which means that the actual order is negative (109 - 127 = -18).

Here are a few values of the stored order and the powers they represent in normalized float numbers: 00000001 (1) means 2⁻¹²⁶; 01111111 (127) means 2⁰; 10000000 (128) means 2¹; 11111110 (254) means 2¹²⁷. As we can see, the bias +127 (0111 1111) lies roughly in the middle of the byte's range 0 to 255.

3) The remaining 23 bits are reserved for the mantissa. But for a normalized binary mantissa, the most significant bit (the integer part of the normalized mantissa) is always 1 (it is called the implicit one), since the mantissa lies in the range 1 <= M < 2 (recall also paragraph 2.1.1 of the lecture: the binary notation of a positive integer always starts with 1). The only exception is the number 0. There is no point in writing this 1 into the 23 allocated bits and wasting memory, so only the rest of the mantissa (its fractional part) is written into them. As a result, the significant part of a float number is effectively 24 bits long, of which only 23 are stored.

An important nuance. Recall that when we converted decimal fractions to binary, the fractional part in binary often turned out to be very long, or even infinite. And we have only 32 bits to store a float number. In this case the lowest, least significant bits of the binary fraction (remember paragraph 2.1.2 of this lecture) do not fit into the allocated memory and are rounded off (by default, to the nearest representable value). The number loses some accuracy, but, as you can see, only a little. In other words, the precision of float numbers is about 6-7 significant decimal digits.

4.4 Storing the binary normalized exponential form of a double number in memory. Real numbers of type double are stored in memory in the same way as float numbers, except for a few characteristics. A double number occupies 64 bits of memory, distributed as follows (also from left to right):
1) Sign bit (see paragraph 4.3). Its index is 63.
2) Order. Double numbers allocate 11 bits to store it. The order is also biased, but for double numbers the bias is +1023.
3) Mantissa (significant part). Double numbers allocate 52 bits (digits) to store it. As with float, the integer part of the mantissa (the implicit one) is not stored in memory.
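The layouts described in 4.3 and 4.4 can be inspected directly: Float.floatToIntBits and Double.doubleToLongBits return the raw bits of a float or double. The sketch below (class and method names are hypothetical) splits those bits into sign, biased exponent and stored mantissa, and subtracts the bias to recover the order. It uses -4.25f and 0.75f, the two numbers worked through by hand in section 4.5, the order -18 from the bias example in 4.3, and the number 15.2 from 4.2.

```java
public class FloatBits {
    // Splits a float's 32 bits into: sign (1) | biased exponent (8) | mantissa without the implicit 1 (23).
    static String floatFields(float x) {
        int bits = Float.floatToIntBits(x);
        // toBinaryString drops leading zeros, so pad back to the full 32 bits.
        String all = String.format("%32s", Integer.toBinaryString(bits)).replace(' ', '0');
        return all.substring(0, 1) + " " + all.substring(1, 9) + " " + all.substring(9);
    }

    // Same split for a double: sign (1) | biased exponent (11) | mantissa (52).
    static String doubleFields(double x) {
        long bits = Double.doubleToLongBits(x);
        String all = String.format("%64s", Long.toBinaryString(bits)).replace(' ', '0');
        return all.substring(0, 1) + " " + all.substring(1, 12) + " " + all.substring(12);
    }

    // Actual order of a normalized float: the stored 8-bit exponent minus the bias 127.
    static int floatOrder(float x) {
        int stored = (Float.floatToIntBits(x) >>> 23) & 0xFF;
        return stored - 127;
    }

    // Actual order of a normalized double: the stored 11-bit exponent minus the bias 1023.
    static int doubleOrder(double x) {
        int stored = (int) ((Double.doubleToLongBits(x) >>> 52) & 0x7FF);
        return stored - 1023;
    }

    public static void main(String[] args) {
        System.out.println(floatFields(-4.25f));                // 1 10000001 00010000000000000000000
        System.out.println(floatFields(0.75f));                 // 0 01111110 10000000000000000000000
        System.out.println(floatFields(Math.scalb(1.0f, -18))); // order -18 is stored as 01101101 (109)
        System.out.println(floatFields(0.0f) + " / " + floatFields(-0.0f)); // differ only in the sign bit
        System.out.println(floatOrder(15.2f) + " " + Math.getExponent(15.2f)); // 3 3
        System.out.println(doubleFields(-4.25));                // exponent field 10000000001 (2 + 1023)
    }
}
```

Math.getExponent is the standard-library way to get the unbiased order; the manual shift-and-mask version is shown only to mirror the bit layout described above.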
It is also worth noting that the precision of double numbers is about 15-16 significant decimal digits.

4.5 Examples of representing a decimal real number in computer memory. The final point of our lecture is an example of converting a fractional decimal number into the form in which it is stored in computer memory, to consolidate the topic.

Example 1. Take the number -4.25 of type float. Let's write it in binary normalized exponential form, remembering everything we have covered in this lecture.
1) Convert the integer part of the number to binary:
4/2 = 2 (remainder 0)
2/2 = 1 (remainder 0)
The integer part is 100 in binary.
2) Convert the fractional part of the number to binary:
0.25*2 = 0.5 (0)
0.5*2 = 1.0 (1)
The fractional part is 0.01 in binary.
3) Thus, -4.25₁₀ = -100.01₂.
4) Let's write -100.01₂ in normalized exponential form in the binary system (so the base of the power is 2):
-100.01₂ = -1.0001 * 2²
Let's convert the order from decimal to binary:
2/2 = 1 (remainder 0)
The order is 10₂. So the number -4.25₁₀ in binary normalized exponential form is -1.0001 * 2¹⁰, where the exponent 10 is written in binary (it equals 2 in decimal).

Let's write down how it looks in computer memory. The sign bit is 1 (negative number). The biased order is 2 + 127 = 129₁₀ = 10000001₂. We remove the implicit one from the mantissa and get 00010000000000000000000 (filling the unoccupied low-order bits with zeros).

Bottom line: 1 10000001 00010000000000000000000 is how the number -4.25 is stored in computer memory.

Example 2. Convert the float number 0.75₁₀ into its binary storage format in computer memory. The result should be 0 01111110 10000000000000000000000.

Thank you for your attention.