
Dispatches from Class: How Your Website Boils Down to 1s and 0s. Part 2.

In the previous post I explained how computers store data internally. Everything is stored as a base-2 (binary) number. That number can then be taken at face value (if it’s treated as an integer), mapped to a character (if it represents text), or interpreted some other way depending on the use case. We’re now in a position to understand how computers execute binary instructions.

Computers are actually quite dumb machines, and as such they can only execute very simple instructions. They can add. They can multiply. They can move data from one location to another. Add in comparisons, jumps to other instructions, and a few other arithmetic and logic operations, and you’ve named most of what your CPU can do at the base level. The instructions a given processor can execute are defined by its instruction set architecture (ISA). I’m typing this on a MacBook Pro with an Intel i7 processor, so my computer can execute the instructions defined by the x86-64 architecture.

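Processors execute machine code, which is instructions encoded as binary numbers. Assembly is a low-level, human-readable language whose instructions map almost one-to-one onto machine code. It covers basic operations like adding, moving data, and reading from and writing to memory addresses. Let’s look at an example.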

Here’s a very simple C++ program that adds the integers 1 and 2 together and outputs the result.

When you run this program, it’s tempting to imagine the computer executing each line as we ourselves read it. What actually happens is that the compiler translates the code above into machine instructions, which we can read in their human-readable form: assembly. Think of assembly as the way you’d explain code to a 6-year-old. For example, instead of just printing the result of one + two, you would:

1) store the variable one,
2) store the variable two,
3) add them together,
4) store the result somewhere, and
5) call the print routine and pass in the result you just computed.

It’s a much more verbose, but much more explicit, way of doing things. Here is the assembly generated from the above C++ code:

Understanding each line of that assembly code is beyond the scope of this post, but let’s look at 4 lines as a sample of what assembly does:

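Each line has three ingredients: the instruction, the source, and the target. In this notation (AT&T syntax, which gobjdump uses by default), the source comes before the target.

In line 1, the instruction is movl, meaning “move.” The l suffix means it moves a 32-bit value, which is the size of an int here. The source is $0x1: the $ marks a constant, and 0x1 is 1 written in hexadecimal. The target is -0x8(%rbp). To decipher it, add the number in front (-0x8, which is -8) to the address stored in %rbp. %rbp is a register, which you can think of as one of the CPU’s built-in variables; here it holds the base address of the current stack frame.

Altogether, line 1 says, “move the integer 1 into the memory on our stack that is 8 bytes below the address in %rbp.” When I said assembly is very explicit, I wasn’t kidding.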

Line 2 does more of the same: it moves the integer 2 into -0xc(%rbp), the stack address 12 bytes below %rbp. Line 3 uses -0x8(%rbp) as its source, which you’ll recognize as the target from line 1, and the register %eax as its target. After line 3 is executed, %eax holds the number 1 and -0xc(%rbp) holds the number 2.

The actual addition happens on line 4. The instruction is addl (add), the source is -0xc(%rbp), and the target is %eax. In English this means, “add the value at -0xc(%rbp) to the value in %eax, and store the result in %eax.” This is the CPU adding 1 and 2 together and keeping the result, 3, in a register.

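Now we are almost down to the layer of 1s and 0s. The final step is to map our assembly instructions to the numbers that encode them, known as machine code. Each instruction is encoded as a short sequence of bytes: an opcode that identifies the operation, followed by bytes that identify the registers, memory offsets, and constants it uses. I used gobjdump (the GNU objdump disassembler) on the compiled program to show each assembly instruction next to its machine code. Here’s the result: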

The hex numbers to the left of each instruction are its machine code bytes. Strung together, they become:

c7 45 f8 01 00 00 00 c7 45 f4 02 00 00 00 8b 45 f8 03 45 f4

Huzzah! We’ve gotten to the point where our instructions are boiled down to hexadecimal numbers. The last step is to convert these hexadecimal numbers to binary. Each hex digit maps to exactly four binary digits, so c7 becomes 1100 0111:

11000111 01000101 11111000 00000001 00000000 00000000 00000000 11000111 01000101 11110100 00000010 00000000 00000000 00000000 10001011 01000101 11111000 00000011 01000101 11110100

And there you have it. (Note that the binary above represents just the four lines of assembly I highlighted; the machine code for the full program would be much longer.)

The JavaScript and Ruby you write also end up as machine instructions when executed, except that they aren’t compiled up front. Their runtimes typically start by interpreting the code. They then use JIT (just-in-time) compilation, turning frequently run blocks of code into machine code while the program is already running. You’ve seen how much assembly is generated from a simple program that adds integers. Since JavaScript makes it trivial to use complex data structures like objects, a single line can translate to tens or even hundreds of assembly instructions.

I’ve purposely stripped out a ton of detail from this high-level post. If anything is confusing or you think it deserves expanding, let me know in the comments.