Dispatches from Class: Performance-Tuning Your Code

Welcome to Dispatches from Class, a blog series where I attempt to connect the theory I'm learning in my computer science classes with web development.

As software engineers, we try to squeeze as much performance and efficiency out of our code as possible. I've been studying how processors actually execute machine code, and I've picked up some surprising lessons that aren't apparent from reading ordinary source code. Here I'll detail two ways you can speed up code snippets you probably use every day.

In my code examples you'll see calls to console.time(). I use this timer (built into all modern browsers), paired with console.timeEnd(), to measure the result times throughout this post.

Use a local variable when looping over objects

Objects are fun and easy to use in JavaScript, but like any non-primitive data structure, you pay a cost when accessing them. The following (slow) example loops over an object 10000 times, incrementing key3 by 1 each iteration:
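Here is a minimal sketch of that slow loop. Only key3 and the 10000 iterations come from the description above; the names slowObj and incrementNonLocal and the filler keys are placeholders I'm assuming for illustration.

```javascript
// Hypothetical object: only key3 is named in the text; key1 and key2 are filler.
const slowObj = { key1: 1, key2: 2, key3: 0 };

// Increment obj.key3 directly on every pass through the loop.
function incrementNonLocal(obj, iterations) {
  for (let i = 0; i < iterations; i++) {
    // Each iteration reads the property, adds 1, and writes it back.
    obj.key3 += 1;
  }
  return obj;
}

console.time("non-local");
incrementNonLocal(slowObj, 10000);
console.timeEnd("non-local");
```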

Each time the loop runs, you have to reach all the way into memory to get key3, bring it to the CPU to operate on it, then stick it back in memory. Accessing data stored in memory is slow compared to working in registers, and accessing it 10000 times is just painful.

To speed this up, store the value of key3 in a local variable before your loop. After you're done looping, simply overwrite the key3 value with your temp variable. Doing this allows your CPU to use a super-fast register file while looping, avoiding any fetching or writing to memory. In other words, instead of accessing memory 10000 times, you access it twice (once before the loop and once after):
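A minimal sketch of the local-variable version might look like this. As before, fastObj, incrementLocal, and temp are names I'm assuming for illustration; whether the value really stays in a register is up to the JavaScript engine.

```javascript
// Hypothetical object: only key3 is named in the text; key1 and key2 are filler.
const fastObj = { key1: 1, key2: 2, key3: 0 };

// Do the work on a local variable, then copy the result back once.
function incrementLocal(obj, iterations) {
  let temp = obj.key3; // single read before the loop
  for (let i = 0; i < iterations; i++) {
    temp += 1; // the loop body only touches the local variable
  }
  obj.key3 = temp; // single write after the loop
  return obj;
}

console.time("local");
incrementLocal(fastObj, 10000);
console.timeEnd("local");
```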

This code does exactly the same thing as the first example, but in my test it finished in well under half the time. TL;DR: avoid repeatedly reading and writing objects or other non-primitive data structures inside loops. Do all the work with a local variable and copy that value to your object after the loop.

Result:
non-local: 0.280 ms
local: 0.106 ms

When iterating through data, keep memory references sequential

Arrays typically store their elements sequentially, so an array of [1,2,3] will store the value 1 in memory, then store the value 2 right next to it, and so on. Because CPUs cache data that sits physically near data you previously accessed, you can get major speed gains by tweaking your code to use memory sequentially. For example, take this code snippet that defines and instantiates a class with two member arrays. We want a sum of the values from the two arrays, so our first (slow) example will use a simple loop:
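A minimal sketch of that slow version follows. The member names arrayone and arraytwo and the instance name myinstance come from the text; the class name ArrayPair, the array size, and the fill values are assumptions for illustration.

```javascript
// Hypothetical class with two member arrays; size and fill values are assumed.
class ArrayPair {
  constructor(size) {
    this.arrayone = new Array(size).fill(1);
    this.arraytwo = new Array(size).fill(2);
  }
}

const myinstance = new ArrayPair(10000);

// One loop that alternates between the two arrays on every iteration.
function sumInterleaved(instance) {
  let sum = 0;
  for (let i = 0; i < instance.arrayone.length; i++) {
    sum += instance.arrayone[i] + instance.arraytwo[i];
  }
  return sum;
}

console.time("slow array");
const slowSum = sumInterleaved(myinstance);
console.timeEnd("slow array");
```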

The problem here is that we're accessing myinstance.arrayone and then immediately jumping to access myinstance.arraytwo (in the loop). Because the two arrays are separate and live in physically distinct spaces in memory, the CPU has to jump around to access those values.

To fix this, let's access only one of the member arrays per loop, using two loops instead of one. This allows the processor to take drastically smaller jumps when accessing data, and it lets the cache do its job better:
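A minimal sketch of the sequential version, again with assumed names (SequentialPair, seqInstance, sumSequential), an assumed array size, and assumed fill values:

```javascript
// Hypothetical class with two member arrays; size and fill values are assumed.
class SequentialPair {
  constructor(size) {
    this.arrayone = new Array(size).fill(1);
    this.arraytwo = new Array(size).fill(2);
  }
}

const seqInstance = new SequentialPair(10000);

// Two loops, each walking a single array from start to finish.
function sumSequential(instance) {
  let sum = 0;
  for (let i = 0; i < instance.arrayone.length; i++) {
    sum += instance.arrayone[i]; // only arrayone is touched in this loop
  }
  for (let i = 0; i < instance.arraytwo.length; i++) {
    sum += instance.arraytwo[i]; // only arraytwo is touched in this loop
  }
  return sum;
}

console.time("fast array");
const fastSum = sumSequential(seqInstance);
console.timeEnd("fast array");
```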

If you look closely, you'll see a way to integrate the first strategy into this example as well.

Result:
slow array: 0.139 ms
fast array: 0.071 ms

Looking at the results for both tests, you can see that the optimized versions ran roughly twice as fast or better. Simple tweaks like these can noticeably increase the speed of your code. Happy fine-tuning!