Binary file primer
What are binary files?
All the files on your computer, programs as well as documents,
are really large "lists" of numbers. Imagine a numbered
list like this one:
0) 24
1) 33
2) 9
3) 0
4) 115
The numbering starts at zero. At each list index there is a
number; for example, the number 9 is at index 2. The
index is usually called the "offset". Translhextion
displays the current offset in the lower left corner of its
window. Large files contain many numbers, small files not so many.
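To make the "list of numbers" idea concrete, here is a minimal Python sketch that stores the five example values above as raw bytes and prints each offset next to its value. It uses the example list instead of a real file. Reading an actual file would typically use `with open(path, "rb") as f: data = f.read()`, where `path` is any file name you choose.

```python
# The example list from above, stored as raw bytes instead of read from disk.
data = bytes([24, 33, 9, 0, 115])

# Print each offset (index) next to the byte value stored there.
for offset, value in enumerate(data):
    print(f"{offset}) {value}")

# The byte at offset 2 is the number 9.
print(data[2])  # 9
```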
The numbers in the list can only have values from 0 to 255, for a
special reason: each number is a "byte".
What are bits and bytes?
"Bit" is an abbreviation for "BInary digiT".
It can have a value of either 0 or 1. One bit alone is not very
useful, because even the number 2 cannot be described with it.
So you take many bits to describe large numbers. This is how it
works:
The normal decimal system you know, with digits from 0 to 9,
describes numbers larger than 9 by arranging several digits in
such a way that each position in the number means something
different.
For example, the number "245" really means "2 times
100 + 4 times 10 + 5 times 1". The further left a digit's
position in the number, the higher a value it contributes to the
number. Going from right to left, you
multiply the digits by 1, 10, 100, 1000 and so on, and add
the products together. The values you have to
multiply each digit *by* can be obtained if you start out with 1
on the far right, then multiply by 10 every time you go left.
So you multiply by 1, then by 1*10=10, then by 10*10=100,
then by 10*100=1000 and so on. This is our "base 10"
number system.
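The procedure just described (start with a place value of 1 on the right, multiply it by 10 at each step left, and add up digit times place value) can be written as a short Python function. The name `decimal_value` is just an illustrative choice.

```python
def decimal_value(digits: str) -> int:
    """Add up each digit times its place value (1, 10, 100, ...)."""
    total = 0
    place = 1  # the rightmost digit is multiplied by 1
    for ch in reversed(digits):
        total += int(ch) * place
        place *= 10  # one step to the left: the place value grows tenfold
    return total

print(decimal_value("245"))  # 2*100 + 4*10 + 5*1 = 245
```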
Now the *binary system* is a "base 2" system. A binary
number might look like this: 1011
This is not "one thousand and eleven", as it would be
in the decimal system. In the decimal system we multiplied the
digits at the various positions by 1, 10, 10*10=100, 10*100=1000
and so on, but this time, in the binary system (also known as
"dual system"), we multiply by 1, 2, 2*2=4, 2*4=8, 2*8=16
and so on, starting at the right and doing with a "2"
what we did with a "10" before. So the value of the
binary number above is "1 times 1 + 1 times 2 + 0 times 4 + 1
times 8", giving us the number "11" in our decimal
system.
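The binary calculation is the same procedure with 2 in place of 10. This Python sketch (the function name `binary_value` is illustrative) performs it by hand and compares the result with Python's built-in `int(text, 2)` conversion.

```python
def binary_value(bits: str) -> int:
    """Add up each bit times its place value (1, 2, 4, 8, ...)."""
    total = 0
    place = 1  # the rightmost bit is multiplied by 1
    for ch in reversed(bits):
        if ch not in "01":
            raise ValueError(f"not a binary digit: {ch!r}")
        total += int(ch) * place
        place *= 2  # one step to the left: the place value doubles
    return total

print(binary_value("1011"))  # 1*8 + 0*4 + 1*2 + 1*1 = 11
print(int("1011", 2))        # the built-in conversion agrees: 11
```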
Computers use bits for all their calculations, because a bit can easily be
modeled as "current on" or "current off".
With a packet of 4 bits (called a "nibble") you
can describe the numbers from 0 to 15. A packet of 8 bits is
called a "byte", and *these* are what the file contains.
Each byte can describe a number from 0 to 255.
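These ranges follow from the base 2 place values. With n bits all set to 1 you get 1 + 2 + 4 + ... + 2^(n-1), which equals 2^n - 1. A small Python check (the helper name `max_value` is illustrative):

```python
def max_value(bit_count: int) -> int:
    """Largest number that bit_count bits can hold: all bits set to 1."""
    return 2 ** bit_count - 1

print(max_value(1))  # 1   (a single bit: 0 or 1)
print(max_value(4))  # 15  (a nibble)
print(max_value(8))  # 255 (a byte)
print(bin(255))      # 0b11111111 (eight 1-bits)
```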
So, a file is a list (or "array") of "bytes",
with each byte at a certain index or "offset". These
bytes are what Translhextion displays in the middle of the screen,
but in yet another number system: the "hexadecimal"
system, which has base 16. So there are 16
digits, the normal ones from 0 to 9, and additionally the letters
"a" through "f" with the following values:
a = 10
b = 11
c = 12
d = 13
e = 14
f = 15
So the "hex" (short for "hexadecimal") number
"f14a" means: "10 times 1 + 4 times 16 + 1 times 16*16(=256)
+ 15 times 16*256(=4096)", giving a value of 61770 in
decimal. The point of using hex numbers is that any value up to
255 (the maximum value of a byte) can be written with at most
2 hexadecimal digits, which gives neat, evenly aligned columns in
hex editors such as Translhextion.
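The hex calculation is once more the same place-value procedure, now with 16 as the multiplier and the letters a to f standing for 10 to 15. The Python sketch below (names `HEX_DIGITS` and `hex_value` are illustrative) reproduces the "f14a" example, checks it against the built-in `int(text, 16)`, and shows how zero-padded formatting gives every byte exactly two hex digits, roughly as a hex editor displays them.

```python
HEX_DIGITS = "0123456789abcdef"  # a digit's position in this string is its value

def hex_value(digits: str) -> int:
    """Add up each hex digit times its place value (1, 16, 256, 4096, ...)."""
    total = 0
    place = 1  # the rightmost digit is multiplied by 1
    for ch in reversed(digits.lower()):
        # str.index raises ValueError for characters that are not hex digits
        total += HEX_DIGITS.index(ch) * place
        place *= 16  # one step to the left: the place value grows 16-fold
    return total

print(hex_value("f14a"))  # 10*1 + 4*16 + 1*256 + 15*4096 = 61770
print(int("f14a", 16))    # the built-in conversion agrees: 61770

# Zero-padded to two digits, every byte value 0..255 has the same width.
print(format(9, "02x"))    # 09
print(format(255, "02x"))  # ff
```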
On the right side of the Translhextion window there are "normal"
letters, punctuation symbols and so on. This is because each byte
can be seen as an index to a symbol in a "character set".
This means that the computer stores for example the letter "a"
as a number with the value 97 ("61" in hex notation),
the comma "," as a number 44 and so on. So if you write
a text on the computer and save it to disk, then the file will
contain a number of bytes, each byte signifying a letter, other
symbol, or a special code number for "End of line" or
"Tab" and such. But there are several character
sets in use (for example, Windows uses either the ANSI or the OEM
character set): in one character set the number 97 might mean the
letter "a", but in another it could be used for the
apostrophe.
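The mapping between characters and byte values can be explored in Python with `ord`, `chr`, `str.encode` and `bytes.decode`. The sketch below assumes that Python's `cp1252` codec stands in for a Western "ANSI" code page and `cp437` for the US "OEM" code page. Other Windows setups use other code pages. These two agree on the plain ASCII range (so 97 is "a" in both), which is why the example uses a byte above 127 to show the same number turning into two different symbols.

```python
import unicodedata

# Character -> number and back, for plain ASCII characters.
print(ord("a"))       # 97
print(hex(ord("a")))  # 0x61
print(ord(","))       # 44
print(chr(97))        # a

# Saving text produces bytes; tab ("\t") and end of line ("\n") are codes too.
text_bytes = "a,\tb\n".encode("ascii")
print(list(text_bytes))  # [97, 44, 9, 98, 10]

# The same byte can mean different symbols in different character sets.
# Assumption: cp1252 stands in for "ANSI", cp437 for "OEM".
raw = bytes([0xE4])
print(unicodedata.name(raw.decode("cp1252")))  # LATIN SMALL LETTER A WITH DIAERESIS
print(unicodedata.name(raw.decode("cp437")))   # GREEK CAPITAL LETTER SIGMA
```

Printing the Unicode character names rather than the characters themselves keeps the output readable even on consoles that cannot display "ä" or "Σ".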
Primer provided by Raihan Kibria