Roman
Level 33

Bytes. What are we reading from the file?

Published in the Random EN group
This is mostly information for beginners. When the topic of reading information from a file came up, a question arose: if the file contains letters, why do we read numbers from it in the form of bytes, and what is a byte in this case? What a byte is has already been written up quite well here. But after reading that, the mechanism that turns letters into numbers was still unclear to me, so I had to dig a little deeper on the Internet. What follows can be considered an addition.

The computer stores every file as zeros and ones. Each file is really just a sequence of bytes, one after another. Files are usually divided into two kinds: text files and binary files. A text file contains human-readable characters and can be opened in any text editor. A binary file consists of data that we do not deal with in everyday life, so a special program is needed to read it.

Text files consist of letters, digits and other common characters. Such files have extensions like .txt, .py, .csv, and so on. When we open one, we see the usual characters that form words, but that is not how the content is stored inside the computer. It is stored as bits, that is, as 0s and 1s. Encoding tables such as ASCII or Unicode define a numeric value, and therefore a binary form, for each character. A byte is eight bits and can take 256 different values (0 to 255), so in a single-byte encoding each character gets its own pattern of eight zeros and ones. When the file is opened, the text editor translates each value back into the familiar character and displays it.

For example, number 65 in the ASCII table is 01000001 in binary, and it is displayed as the Latin letter “A”. (That is the Latin letter, not the look-alike Cyrillic one: ASCII itself defines only codes 0 to 127, and in the Windows-1251 code page, for instance, the Cyrillic alphabet starts at position 192.)
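The mapping from a letter to a number to eight bits can be checked directly in Java. This is only a minimal sketch: the class name `CharToBits` and the helper `toBits` are made up for illustration, and the text is assumed to be plain ASCII.

```java
import java.nio.charset.StandardCharsets;

public class CharToBits {
    // Formats a byte value (0..255) as eight binary digits, keeping leading zeros.
    public static String toBits(int value) {
        String bits = Integer.toBinaryString(value & 0xFF);
        return String.format("%8s", bits).replace(' ', '0');
    }

    public static void main(String[] args) {
        // Encode the text with the ASCII table: one byte per character.
        byte[] bytes = "A".getBytes(StandardCharsets.US_ASCII);
        for (byte b : bytes) {
            // Prints: 65 -> 01000001
            System.out.println((b & 0xFF) + " -> " + toBits(b));
        }
    }
}
```

Replacing "A" with a longer ASCII string prints one line per character, which makes it easy to compare the output against an ASCII table.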
Put another way, a byte with the value 01000001 is what ASCII maps to the Latin letter “A”.

Each line of a text file ends with a line break, EOL (End of Line). Often this is the single character "\n" (binary value in ASCII: 00001010); on Windows it is usually two characters, "\r\n" (00001101 followed by 00001010). Having read it, a program interprets it as the end of the line and moves to the line below. There are other similar control characters.

Binary files are also stored as a series of bytes, but those bytes are not meant to be decoded through a character table such as ASCII. What they mean is defined by the file format. Typically these are images, audio and video, as well as compressed archives and executable files (.exe). Such files are not readable by humans in the usual sense, so opening one in an ordinary text editor shows a bunch of incomprehensible garbage. That is why special programs are written to read each format correctly. In a binary file, changing even a single bit can make the entire file unreadable.

The ASCII character table can be viewed here.

So when we read a file, eight bits (each a zero or a one) at a time are read into a byte variable, and that is why we get numbers. A program such as Notepad can then convert those numbers into readable characters. The source that helped me figure it out.
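To see those numbers for yourself, here is a small sketch that writes a short ASCII text to a temporary file and reads it back one byte at a time with `FileInputStream`. The class name `ReadBytesDemo` and the helpers `writeSample` and `readAll` are hypothetical names chosen for this example; `read()` is the standard `InputStream` method, which returns a value from 0 to 255, or -1 at the end of the file.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ReadBytesDemo {
    // Writes ASCII text to a temporary file (hypothetical helper for the demo).
    public static Path writeSample(String text) {
        try {
            Path file = Files.createTempFile("bytes-demo", ".txt");
            file.toFile().deleteOnExit();
            Files.write(file, text.getBytes(StandardCharsets.US_ASCII));
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the file byte by byte; read() returns 0..255, or -1 at end of file.
    public static List<Integer> readAll(Path file) {
        List<Integer> values = new ArrayList<>();
        try (InputStream in = new FileInputStream(file.toFile())) {
            int b;
            while ((b = in.read()) != -1) {
                values.add(b);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        Path file = writeSample("Hi\nA");
        for (int b : readAll(file)) {
            // Show the line break as \n instead of actually breaking the line.
            String shown = (b == '\n') ? "\\n" : String.valueOf((char) b);
            System.out.println(b + " -> " + shown);
        }
    }
}
```

For the sample text the program prints 72, 105, 10 and 65: the letters “H” and “i”, the line break, and the letter “A”. Casting each number to `char` is what a text editor effectively does for an ASCII file.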