2.1 Number Systems

Binary, Decimal, and Hexadecimal

The binary, decimal, and hexadecimal number systems are all used by forensic professionals.

As humans, we are used to counting numbers using the decimal system. The decimal system consists of ten digits (0-9). When we count, we start with the ten digits (0, 1, 2, ..., 7, 8, 9). When we run out of digits, we increase the number in the tens place by 1 and reset our ones place back to 0 (10). We continue this pattern for the hundreds place, thousands place, etc. Because of this pattern, we can break down decimal numbers as follows:

324 = 3*10^2 + 2*10^1 + 4*10^0

Decimal is also known as base 10 because the digit in each place value is multiplied by 10 raised to an exponent.

Binary is similar to decimal. However, instead of being base 10, binary is base 2. As a result, we have only two digits: 0 and 1. When we count, we start with our digits (0, 1). After 1, we run out of digits. Thus, we increase the number in the next place by 1 and reset our ones place back to 0 (10). As a result, 2 in decimal is 10 in binary, 3 in decimal is 11 in binary, and so on. This pattern allows us to break binary numbers down as follows:

0b1011 = 1*2^3 + 0*2^2 + 1*2^1 + 1*2^0 = 11

Notice how similar this breakdown is to the decimal one: instead of multiplying by 10 raised to an exponent, we multiply by 2 raised to an exponent.

Hexadecimal is base 16. Instead of having ten digits or two digits, we have 16 (0-9 and A-F). As with binary and decimal, we move on to the next place when we run out of digits. Here is an example of how we break down a hexadecimal number:

0xAB2 = 10*16^2 + 11*16^1 + 2*16^0 = 2738

In this case, we multiply by 16 raised to an exponent. Notice how A = 10, B = 11, ..., F = 15 in hexadecimal.
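
The place-value breakdown works the same way in every base, so it can be sketched as one small Python function. This is only an illustration: the helper name place_value_sum and the DIGITS table are assumptions made for this example, not part of any library.

```python
# Hypothetical helper for illustration: evaluate a digit string using place values.
DIGITS = "0123456789ABCDEF"

def place_value_sum(digits: str, base: int) -> int:
    """Sum digit * base**position, with position 0 at the rightmost digit."""
    total = 0
    for position, char in enumerate(reversed(digits.upper())):
        value = DIGITS.index(char)      # 'A' -> 10, ..., 'F' -> 15
        if value >= base:
            raise ValueError(f"digit {char!r} is not valid in base {base}")
        total += value * base ** position
    return total

print(place_value_sum("324", 10))   # 324
print(place_value_sum("1011", 2))   # 11
print(place_value_sum("AB2", 16))   # 2738
```

Python's built-in int(string, base) does the same job; the loop is spelled out here only to mirror the breakdown by hand.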

As shown in the examples above, I prefix binary numbers with "0b" and hexadecimal numbers with "0x" in order to make it clear what number system the following digits are written in. Numbers that do not start with any prefix are assumed to be in decimal.

Converting Between the Number Systems

The examples given in the previous section also show how we can convert from binary to decimal and hexadecimal to decimal.

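We can convert from decimal to binary by dividing by 2 repeatedly and recording each remainder. Reading the remainders from last to first gives the binary digits:

11 / 2 = 5 remainder 1
5 / 2 = 2 remainder 1
2 / 2 = 1 remainder 0
1 / 2 = 0 remainder 1
11 = 0b1011

We can convert from decimal to hexadecimal in the same way, dividing by 16 instead:

2738 / 16 = 171 remainder 2
171 / 16 = 10 remainder 11 (B)
10 / 16 = 0 remainder 10 (A)
2738 = 0xAB2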
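
The repeated-division procedure (divide by the target base, keep the remainder, continue with the quotient until it reaches 0) can be sketched in Python. The helper name to_base is an assumption made for this example, and the sketch only handles non-negative integers in bases 2 through 16.

```python
# Hypothetical helper for illustration: decimal -> base-N digits by repeated division.
DIGITS = "0123456789ABCDEF"

def to_base(number: int, base: int) -> str:
    """Convert a non-negative integer to a digit string in the given base."""
    if number < 0 or not 2 <= base <= 16:
        raise ValueError("expects number >= 0 and 2 <= base <= 16")
    if number == 0:
        return "0"
    remainders = []
    while number > 0:
        number, remainder = divmod(number, base)
        remainders.append(DIGITS[remainder])
    # The first remainder is the least significant digit, so reverse the list.
    return "".join(reversed(remainders))

print(to_base(126, 2))    # 1111110
print(to_base(2738, 16))  # AB2
```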

Since each hexadecimal digit corresponds to exactly four binary digits (16 = 2^4), conversions between hexadecimal and binary are straightforward: each group of four binary digits becomes one hex digit. For example, 0xAB2 = 0b1010 1011 0010, because A = 1010, B = 1011, and 2 = 0010.
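
The four-bits-per-hex-digit correspondence can be sketched in Python as follows. The function names hex_to_bin and bin_to_hex are assumptions made for this example; only built-in functions are used.

```python
# Hypothetical helpers for illustration: convert digit by digit, four bits at a time.
def hex_to_bin(hex_digits: str) -> str:
    """Expand each hex digit into its own group of four bits."""
    return " ".join(format(int(digit, 16), "04b") for digit in hex_digits)

def bin_to_hex(bits: str) -> str:
    """Collapse each group of four bits into one hex digit."""
    bits = bits.replace(" ", "")
    bits = bits.zfill(-(-len(bits) // 4) * 4)   # left-pad to a multiple of 4
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join(format(int(group, 2), "X") for group in groups)

print(hex_to_bin("AB2"))            # 1010 1011 0010
print(bin_to_hex("101010110010"))   # AB2
```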

We can also do conversions using the command line. I denote the beginnings of commands with $. These $ are not part of the actual commands.

Let's convert from binary and hexadecimal to decimal. Note that the 2# prefix indicates that the following digits are to be interpreted as base 2. You can use 16# or 0x to indicate base 16. The $(( )) construction indicates that we want to do shell arithmetic.

$ echo "$((2#101010))"
42
$ echo "$((16#AB2))"
2738
$ echo "$((0xAB2))"
2738

To convert from decimal to binary and hexadecimal, we first need to install basic calculator (bc). On Debian- or Ubuntu-based systems:

$ sudo apt install bc

After basic calculator is installed, we can convert:

$ echo "obase=2;126" | bc
1111110
$ echo "obase=16;2738" | bc
AB2

We can also use the shell's printf command, which borrows its format specifiers from the printf function in C. %x prints a number as hexadecimal, %d prints it as decimal, and %s prints a string.

$ printf "%x" 2738
ab2

Python is a high-level, general-purpose programming language. It has widespread use in scientific computing, software engineering, cybersecurity, and more. Python can also be used to convert between the number systems. In order to run Python (version 3) from the command line, first type python3 in the command line.

$ python3

Any commands that follow will be interpreted as Python. The beginnings of Python commands run from the command line are denoted with >>>. These >>> are not part of the actual commands.

Let's convert from binary and hexadecimal to decimal. The first argument of int() is the string. The second argument indicates which base to interpret the string as.

>>> int("101010", 2)
42
>>> int("ab2", 16)
2738

To convert from decimal to binary and hexadecimal, we can use the following commands:

>>> bin(126)
'0b1111110'
>>> hex(2375)
'0x947'
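
bin() and hex() always include the 0b or 0x prefix. When only the digits are wanted, or when they need zero-padding to a fixed width, the built-in format() function and f-strings are an alternative. A short sketch, written as a script rather than typed at the >>> prompt:

```python
value = 2738

# format() with 'b' or 'x' gives the digits without the 0b/0x prefix.
binary_digits = format(value, "b")
hex_lower = format(value, "x")
hex_upper = format(value, "X")
print(binary_digits)   # 101010110010
print(hex_lower)       # ab2
print(hex_upper)       # AB2

# A width with a leading 0 pads on the left, e.g. one byte as 8 bits or 2 hex digits.
padded_bits = format(5, "08b")
padded_hex = f"{255:02x}"
print(padded_bits)     # 00000101
print(padded_hex)      # ff
```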

Press CTRL+D (or type exit()) to exit Python mode. On Linux, CTRL+Z only suspends the Python process instead of exiting it.

American Standard Code for Information Interchange (ASCII)

ASCII is a method of representing common English-language symbols in computing devices using binary. ASCII defines 128 symbols, each with a seven-bit code; in practice, each symbol is stored in one eight-bit byte with the leading bit set to 0. We can view the ASCII conversion table in the command line:

$ man ascii

As with the number systems, we can convert between ASCII and numbers using the command line or Python. On the command line, we can convert between ASCII and hexadecimal with the following commands:

$ printf "\x48\x65\x6c\x6c\x6f"
Hello
$ echo -n Hello | xxd
00000000: 4865 6c6c 6f                             Hello

In Python, conversions between ASCII symbols and their decimal codes can be done as such:

>>> chr(97)
'a'
>>> ord('a')  
97

ASCII and hexadecimal conversions can also be done in Python using the binascii module.

>>> import binascii
>>> binascii.hexlify(b'hello')
b'68656c6c6f'
>>> binascii.unhexlify(b'68656c6c6f')
b'hello'
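
The same ASCII and hexadecimal round trip can also be sketched with methods built into Python's bytes type, without importing binascii. The variable names below are chosen only for this example.

```python
text = "Hello"

# Encode the text as ASCII bytes, then show each byte in decimal and in hex.
data = text.encode("ascii")
codes = list(data)
hex_string = data.hex()
print(codes)        # [72, 101, 108, 108, 111]
print(hex_string)   # 48656c6c6f

# Reverse direction: hex string -> bytes -> text.
decoded = bytes.fromhex("48656c6c6f").decode("ascii")
print(decoded)      # Hello
```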

You may have heard of UTF-8, an encoding of Unicode and a different method of representing symbols in computing devices. While ASCII uses 7 bits per symbol, UTF-8 uses a variable length of one to four bytes per symbol, and the first 128 symbols are encoded exactly as in ASCII. As a result, UTF-8 is able to represent a much wider range of symbols, especially those that are non-English.
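
To make the variable length concrete, the sketch below encodes four sample symbols with Python's str.encode and reports how many bytes each one takes. The sample symbols (a, é, €, and an emoji) are arbitrary choices for this example and are written as escape sequences so the script itself stays plain ASCII.

```python
# Sample characters: 'a', e with acute accent, euro sign, grinning-face emoji.
samples = ["a", "\u00e9", "\u20ac", "\U0001F600"]

lengths = {}
for ch in samples:
    encoded = ch.encode("utf-8")
    lengths[ch] = len(encoded)
    print(f"U+{ord(ch):04X} -> {encoded.hex()} ({len(encoded)} byte(s))")
# U+0061 -> 61 (1 byte(s))
# U+00E9 -> c3a9 (2 byte(s))
# U+20AC -> e282ac (3 byte(s))
# U+1F600 -> f09f9880 (4 byte(s))
```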

Epoch Time and Timestamps

Epoch time is a type of time representation that is especially important for digital forensics. Epoch time represents time as the number of seconds or milliseconds that have passed since a certain time, known as the epoch.

Unix Epoch Time, or POSIX Time, measures time elapsed in seconds from the Unix epoch, 00:00:00 UTC on January 1, 1970.

We can view the current time through the command line.

$ date
Sun Mar 17 11:04:04 EDT 2024
$ date "+%s"
1710687863

We can also convert times between a human-readable format and Unix Epoch Time.

$ date -d @17106878300
Fri Feb  5 01:38:20 EST 2512
$ date -d @17106878300 --utc
Fri Feb  5 06:38:20 UTC 2512

We can also see the Unix Epoch Time with Python using the datetime module. Notice how our default time zone is the timezone of our system.

>>> from datetime import datetime, timezone
>>> datetime.now()
datetime.datetime(2024, 3, 17, 11, 48, 15, 575908)
>>> datetime.now(timezone.utc)
datetime.datetime(2024, 3, 17, 15, 48, 41, 32137, tzinfo=datetime.timezone.utc)
>>> datetime.now().timestamp()
1710690562.438871

We can convert a Unix Epoch Time to local time or to UTC with Python as well.

>>> from datetime import datetime, timezone
>>> datetime.fromtimestamp(1675008832)
datetime.datetime(2023, 1, 29, 11, 13, 52)
>>> datetime.fromtimestamp(1675008832, timezone.utc)
datetime.datetime(2023, 1, 29, 16, 13, 52, tzinfo=datetime.timezone.utc)
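
Some timestamps are recorded in milliseconds rather than seconds, so they need to be divided by 1000 before conversion. A minimal sketch, assuming two hypothetical helpers named epoch_to_utc and utc_to_epoch (not part of the datetime module) and always working in UTC to avoid depending on the system time zone:

```python
from datetime import datetime, timezone

def epoch_to_utc(value: float, unit: str = "s") -> datetime:
    """Convert a Unix epoch value in seconds ("s") or milliseconds ("ms") to UTC."""
    seconds = value / 1000 if unit == "ms" else value
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

def utc_to_epoch(moment: datetime) -> int:
    """Convert a timezone-aware datetime back to whole epoch seconds."""
    return int(moment.timestamp())

print(epoch_to_utc(1675008832))            # 2023-01-29 16:13:52+00:00
print(epoch_to_utc(1675008832000, "ms"))   # 2023-01-29 16:13:52+00:00
print(utc_to_epoch(datetime(2023, 1, 29, 16, 13, 52, tzinfo=timezone.utc)))  # 1675008832
```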

The time module also allows us to get the current time.

>>> import time
>>> time.ctime()
'Sun Mar 17 11:42:17 2024'
>>> time.time()
1710690139.564818

Hash Code

Hash functions are publicly known functions that take in arbitrary input and output a fixed-length string (called the hash). Hash functions are designed to be impractical to reverse. These functions have many applications, from storing passwords securely to ensuring that a message has integrity and has not been changed (if you're interested in cryptographic theory, I recommend learning about HMAC construction). A secure hash function should have three characteristics:

Preimage resistance: given a hash, it is hard to find the original input
Second preimage resistance: given an input, it is hard to find another input that has the same hash
Collision resistance: it is hard to find a pair of inputs that have the same hash

The command line can be used to find the hash of a particular input under a particular hash function. Note that SHA-1 and MD5 are now considered broken for security purposes, since practical collision attacks exist against both. In practice, we use SHA-256 and SHA-3.

$ echo -n hello | sha1sum
aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d  -
$ echo -n hello | md5sum
5d41402abc4b2a76b9719d911017c592  -

The echo command usually adds a newline (\n) at the end of the string it is echoing. The -n flag included in the commands above specifies that we want to compute the hash of the string "hello", not "hello\n". As you can see below, leaving out this flag completely changes the resulting hashes.

$ echo hello | sha1sum
f572d396fae9206628714fb2ce00f72e94f2258f  -
$ echo hello | md5sum
b1946ac92492d2347c6235b4d2611184  -

We can also find hashes in Python.

>>> import hashlib
>>> hashlib.sha1("hello".encode("utf-8")).hexdigest()
'aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d'
>>> hashlib.md5("hello".encode("utf-8")).hexdigest()
'5d41402abc4b2a76b9719d911017c592'
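
Since SHA-256 and SHA-3 are the functions used in practice, here is a short sketch of computing them with hashlib as well. The wrapper name hash_bytes is an assumption made for this example; hashlib.new() looks up the algorithm by name.

```python
import hashlib

def hash_bytes(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hex digest of data under the named hashlib algorithm."""
    return hashlib.new(algorithm, data).hexdigest()

print(hash_bytes(b"hello"))               # SHA-256: 64 hex characters
print(hash_bytes(b"hello", "sha3_256"))   # SHA-3 (256-bit variant): 64 hex characters
print(hash_bytes(b"hello", "sha1"))       # aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d

# A single extra byte (the trailing newline) gives an unrelated digest.
print(hash_bytes(b"hello") == hash_bytes(b"hello\n"))   # False
```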

Data Stored in Memory

Endianness describes the order in which bytes are stored in memory. Suppose that we have a byte-addressable memory (such that each memory address holds one byte).

Little endian means that the least significant byte is stored in the smallest address.
Big endian means that the most significant byte is stored in the smallest address.

You can check the endianness of your computer with lscpu; look for the "Byte Order" line in the output:

$ lscpu
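
The two byte orders can also be seen from Python. The sketch below stores the example value 0x12345678 (chosen arbitrarily) in four bytes both ways, with the byte at the smallest address printed first, and then reports the byte order of the machine running the interpreter.

```python
import sys

value = 0x12345678

# Lay the value out in 4 bytes; the first byte printed sits at the smallest address.
little = value.to_bytes(4, byteorder="little")
big = value.to_bytes(4, byteorder="big")
print(little.hex(" "))   # 78 56 34 12
print(big.hex(" "))      # 12 34 56 78

# Reading the same four bytes with the wrong byte order gives a different number.
misread = int.from_bytes(little, byteorder="big")
print(hex(misread))      # 0x78563412

# Byte order of the machine running this interpreter: 'little' or 'big'.
print(sys.byteorder)
```

This matters when reading raw disk or memory dumps: a multi-byte field has to be interpreted with the byte order used by the system that wrote it.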
