2.1 Number Systems
The binary, decimal, and hexadecimal number systems are all used by forensic professionals.
As humans, we are used to counting numbers using the decimal system. The decimal system consists of ten digits (0-9). When we count, we start with the ten digits (0, 1, 2, ..., 7, 8, 9). When we run out of digits, we increase the number in the tens place by 1 and reset our ones place back to 0 (10). We continue this pattern for the hundreds place, thousands place, etc. Because of this pattern, we can break down decimal numbers as follows:
Decimal is also known as base 10 because the digit in each place value is multiplied by 10 raised to an exponent.
Binary is similar to decimal. However, instead of being base 10, binary is base 2. As a result, we have only two digits: 0 and 1. When we count, we start with our digits (0, 1). After 1, we run out of digits. Thus, we increase the number in the next place by 1 and reset our ones place back to 0 (10). As a result, 2 in decimal is 10 in binary, 3 in decimal is 11 in binary, and so on. This pattern allows us to break binary numbers down as follows:
Notice how similar this breakdown is to the decimal one: instead of multiplying by 10 raised to an exponent, we multiply by 2 raised to an exponent.
Hexadecimal is base 16. Instead of having ten digits or two digits, we have 16 (0-9 and A-F). As with binary and decimal, we move on to the next place when we run out of digits. Here is an example of how we break down a hexadecimal number:
In this case, we multiply by 16 raised to an exponent. Notice how A = 10, B = 11, ..., F = 15 in hexadecimal.
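The place-value breakdowns described above can be sketched in Python. This is a minimal illustration, not the walkthrough's own example: the function names and the sample values (1234, 1101, 2AF) are chosen here for demonstration only.

```python
def place_value_terms(digits, base):
    """Return (digit_value, exponent) pairs, most significant digit first."""
    n = len(digits)
    return [(int(d, base), n - 1 - i) for i, d in enumerate(digits)]


def from_digits(digits, base):
    """Sum digit * base**exponent over every place."""
    return sum(value * base**exp for value, exp in place_value_terms(digits, base))


print(from_digits("1234", 10))  # 1*10**3 + 2*10**2 + 3*10**1 + 4*10**0 = 1234
print(from_digits("1101", 2))   # 1*2**3 + 1*2**2 + 0*2**1 + 1*2**0 = 13
print(from_digits("2AF", 16))   # 2*16**2 + 10*16**1 + 15*16**0 = 687
```

The same function handles all three systems because only the base changes; the digit-times-power structure stays the same.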
As shown in the examples above, I prefix binary numbers with "0b" and hexadecimal numbers with "0x" in order to make it clear what number system the following digits are written in. Numbers that do not start with any prefix are assumed to be in decimal.
The examples given in the previous section also show how we can convert from binary to decimal and hexadecimal to decimal.
We can convert from decimal to binary as follows:
We can convert from decimal to hexadecimal as follows:
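One common way to do these conversions by hand is repeated division: divide by the target base, record the remainder, and repeat with the quotient until it reaches 0. The remainders, read in reverse order, are the digits. The sketch below assumes that method; the helper name to_base and the sample values are illustrative.

```python
DIGITS = "0123456789ABCDEF"


def to_base(n, base):
    """Convert a non-negative integer to a digit string in a base from 2 to 16."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, remainder = divmod(n, base)  # quotient feeds the next step
        remainders.append(DIGITS[remainder])
    # The first remainder is the least significant digit, so reverse the list.
    return "".join(reversed(remainders))


print(to_base(13, 2))    # 1101
print(to_base(687, 16))  # 2AF
```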
Since one hexadecimal digit is four binary digits, conversions between hexadecimal and binary are straightforward. Notice how each group of four binary digits corresponds to one hex digit.
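The four-bits-per-hex-digit grouping can be sketched in Python as follows. The function names are hypothetical and the value 2AF is only an illustration.

```python
def hex_to_binary(hex_digits):
    """Expand each hex digit into its own four-bit group."""
    return " ".join(format(int(d, 16), "04b") for d in hex_digits)


def binary_to_hex(bits):
    """Collapse each group of four bits into one hex digit."""
    bits = bits.replace(" ", "")
    # Left-pad with zeros so the length is a multiple of four.
    padded_length = -(-len(bits) // 4) * 4
    bits = bits.zfill(padded_length)
    return "".join(format(int(bits[i:i + 4], 2), "X") for i in range(0, len(bits), 4))


print(hex_to_binary("2AF"))         # 0010 1010 1111
print(binary_to_hex("1010101111"))  # 2AF
```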
We can also do conversions using the command line. I denote the beginning of each command with $. The $ is not part of the actual command.
Let's convert from binary and hexadecimal to decimal. Note that the 2# prefix indicates that the digits that follow are to be interpreted as base 2. You can use 16# or 0x to indicate base 16. The $(( )) construction indicates that we want to do shell arithmetic.
To convert from decimal to binary and hexadecimal, we first need to install the basic calculator (bc).
After basic calculator is installed, we can convert:
We can also convert on the command line using C-style format specifiers, such as those accepted by the printf command. %x prints a value as hexadecimal, %d prints it as decimal, and %s prints it as a string.
Python is a high-level, general-purpose programming language. It has widespread use in scientific computing, software engineering, cybersecurity, and more. Python can also be used to convert between the number systems. In order to run Python (version 3) from the command line, first run python3 in the command line.
Any commands that follow will be interpreted as Python. The beginnings of Python commands run from the command line are denoted with >>>. The >>> is not part of the actual command.
Let's convert from binary and hexadecimal to decimal. The first argument of int() is the string of digits to convert. The second argument is the base in which to interpret that string.
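A short sketch of int() with an explicit base is shown below. The sample strings are illustrative; when running interactively, the same expressions can be typed at the >>> prompt without print().

```python
from_binary = int("1101", 2)        # interpret the digits as base 2
from_hex = int("2AF", 16)           # interpret the digits as base 16
with_prefix = int("0x2AF", 16)      # a matching 0x prefix is accepted
auto_detected = int("0b1101", 0)    # base 0 infers the base from the prefix

print(from_binary)    # 13
print(from_hex)       # 687
print(with_prefix)    # 687
print(auto_detected)  # 13
```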
To convert from decimal to binary and hexadecimal, we can use the following commands:
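As a sketch, the built-in bin() and hex() functions return prefixed strings, while format() can produce the digits without a prefix. The sample values are illustrative.

```python
as_binary = bin(13)            # string with the 0b prefix
as_hex = hex(687)              # string with the 0x prefix, lowercase digits
bare_binary = format(13, "b")  # no prefix
bare_hex = format(687, "X")    # no prefix, uppercase digits

print(as_binary)    # 0b1101
print(as_hex)       # 0x2af
print(bare_binary)  # 1101
print(bare_hex)     # 2AF
```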
Press CTRL+D (or type exit()) to leave the Python interpreter. On Windows, press CTRL+Z followed by Enter instead.
ASCII is a method of representing common English-language symbols in computing devices using binary. ASCII defines 128 symbols, each identified by a 7-bit code; in practice, each symbol is usually stored in eight bits, or one byte. We can view the ASCII conversion table in the command line:
As with the number systems, we can convert between ASCII and numbers using the command line or Python. On the command line, we can convert between ASCII and hexadecimal with the following commands:
In Python, conversions between ASCII and decimal can be done as follows:
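A minimal sketch using the built-in ord() and chr() functions, with illustrative sample characters:

```python
code_for_A = ord("A")               # character to decimal code
char_for_97 = chr(97)               # decimal code to character
codes = [ord(c) for c in "Hi"]      # one code per character
word = "".join(chr(n) for n in [72, 105])

print(code_for_A)   # 65
print(char_for_97)  # a
print(codes)        # [72, 105]
print(word)         # Hi
```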
ASCII and hexadecimal conversions can also be done in Python using the binascii module.
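A short sketch with binascii.hexlify() and binascii.unhexlify() follows. Both work on bytes objects, so the result is decoded to get a regular string; the sample text hello is illustrative.

```python
import binascii

hex_bytes = binascii.hexlify(b"hello")          # bytes in, hex digits out
text_bytes = binascii.unhexlify("68656c6c6f")   # hex digits in, bytes out

print(hex_bytes)            # b'68656c6c6f'
print(hex_bytes.decode())   # 68656c6c6f
print(text_bytes.decode())  # hello
```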
You may have heard of UTF-8, an encoding of Unicode and a different method of representing symbols in computing devices. While ASCII uses 7 bits to represent each symbol, UTF-8 uses a variable length of one to four bytes per symbol, and it encodes the ASCII symbols exactly as ASCII does. As a result, UTF-8 is able to represent a much wider range of symbols, especially those that are non-English.
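The variable length can be seen by encoding a few characters in Python. The characters below are chosen only as examples of one-, two-, three-, and four-byte encodings.

```python
samples = ["A", "é", "€", "😀"]

# Map each character to its UTF-8 bytes.
encoded = {ch: ch.encode("utf-8") for ch in samples}

for ch, raw in encoded.items():
    print(ch, len(raw), raw.hex())
# A 1 41
# é 2 c3a9
# € 3 e282ac
# 😀 4 f09f9880
```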
Epoch time is a type of time representation that is especially important for digital forensics. Epoch time represents time as the number of seconds or milliseconds that have passed since a certain time, known as the epoch.
Unix Epoch Time, or POSIX Time, measures time elapsed in seconds from the Unix epoch, 00:00:00 UTC on January 1, 1970.
We can view the current time through the command line.
We can also convert times between a human-readable format and Unix Epoch Time.
We can also see the Unix Epoch Time with Python using the datetime module. Notice how the default time zone is the time zone of our system.
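A minimal sketch of reading the current time with datetime is given below. The variable names are illustrative, and the printed values depend on when and where the code runs.

```python
from datetime import datetime, timezone

now_local = datetime.now()            # naive value in the system's local time
now_utc = datetime.now(timezone.utc)  # time-zone-aware value in UTC

print(now_local)
print(now_utc)
print(now_utc.timestamp())            # seconds since the Unix epoch
```

Passing timezone.utc explicitly avoids ambiguity when timestamps from machines in different time zones are compared.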
We can convert between times with Python as well.
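As a sketch, datetime.fromtimestamp() turns epoch seconds into a date and time, and the timestamp() method goes the other way. The value 1,000,000,000 is used purely as an example, and UTC is specified so the result does not depend on the system's time zone.

```python
from datetime import datetime, timezone

epoch_seconds = 1_000_000_000

# Epoch seconds to a human-readable UTC time.
as_datetime = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
print(as_datetime.isoformat())  # 2001-09-09T01:46:40+00:00

# A human-readable UTC time back to epoch seconds.
moment = datetime(2001, 9, 9, 1, 46, 40, tzinfo=timezone.utc)
back_to_epoch = int(moment.timestamp())
print(back_to_epoch)            # 1000000000
```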
The time module also allows us to get the current time.
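A short sketch with the time module: time.time() returns the current Unix Epoch Time as a floating-point number of seconds, and time.gmtime() breaks epoch seconds into UTC calendar fields. The format string is an illustrative choice.

```python
import time

seconds = time.time()               # current epoch time, in seconds
utc_fields = time.gmtime(seconds)   # broken-down UTC time

print(seconds)
print(time.strftime("%Y-%m-%d %H:%M:%S", utc_fields))

# The epoch itself corresponds to 0 seconds.
epoch_fields = time.gmtime(0)
print(epoch_fields.tm_year, epoch_fields.tm_mon, epoch_fields.tm_mday)  # 1970 1 1
```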
Hash functions are publicly-known functions that take in arbitrary input and output a fixed-length string (called the hash). Hash functions are not designed to be reversed. These functions have many applications, from storing passwords securely to ensuring that a message has integrity and has not been changed (if you're interested in cryptographic theory, I recommend learning about HMAC construction). A secure hash function should have three characteristics:
Preimage resistance: given a hash, it is hard to find the original input
Second preimage resistance: given an input, it is hard to find another input that has the same hash
Collision resistance: it is hard to find a pair of inputs that have the same hash
The command line can be used to find the hash of a particular input under a particular hash function. Note that the hash functions SHA-1 and MD5 are now considered broken, since practical collisions have been found for both. In practice, we use SHA-256 and SHA-3.
The echo command usually adds a newline (\n) at the end of the string it is echoing. The -n flag included in the commands above specifies that we want to compute the hash of the string hello, not hello\n. As you can see below, leaving out this flag dramatically changes the resulting hashes.
We can also find hashes in Python.
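A minimal sketch using the standard hashlib module is shown below. The input hello matches the string used in the echo discussion; the choice of algorithms to loop over is illustrative. Hash functions operate on bytes, so the input is written as a bytes literal.

```python
import hashlib

message = b"hello"

# Compute the digest of the same input under several hash functions.
digests = {
    name: hashlib.new(name, message).hexdigest()
    for name in ("md5", "sha1", "sha256", "sha3_256")
}
for name, digest in digests.items():
    print(name, digest)

# A trailing newline is part of the input, so it changes the hash entirely.
without_newline = hashlib.sha256(b"hello").hexdigest()
with_newline = hashlib.sha256(b"hello\n").hexdigest()
print(without_newline == with_newline)  # False
```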
Endianness describes the order in which bytes are stored in memory. Suppose that we have a byte-addressable memory (such that each memory address holds one byte).
Little endian means that the least significant byte is stored in the smallest address.
Big endian means that the most significant byte is stored in the smallest address.
You can check the endianness of your personal computer:
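In Python, one way to do this check is sys.byteorder, which reports the native byte order of the machine running the interpreter. The sketch below also lays out an example value, 0x12345678, in both orders; the value is chosen only for illustration.

```python
import sys

print(sys.byteorder)  # "little" or "big", depending on the machine

value = 0x12345678
little = value.to_bytes(4, "little")  # least significant byte first
big = value.to_bytes(4, "big")        # most significant byte first

print(little.hex(" "))  # 78 56 34 12
print(big.hex(" "))     # 12 34 56 78
```

Reading the printed bytes from left to right corresponds to walking memory from the smallest address upward.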