Old Code / New Code

It can be hard to put yourself in the shoes of your past self. Some stylistic preferences seem to flow from the mind itself, and hints are there from the beginning. On the other hand, learning and experience change you, as you grow in response to them.

It’d be interesting to see exactly what’s changed, wouldn’t it?

Recently I happened into an accidental experiment that lends itself well to such a comparison. I hopped onto a site I hadn’t visited in years, codewars.com. I clicked on the first problem I saw put in front of my face. I composed a solution. And then, on the “answers” page, I noticed: a solution of mine to this problem from years before was saved!

There are some caveats. It’s a little too simple of a problem to read a lot into, and it is in the context of a “code challenge” website. That said, it does give an interesting accidental snapshot comparison.

The code challenge was this:

Code Challenge DescriptionStrip Comments
1
2
3
4
5
6
7
8
9
var commentString =
`apples, pears # and bananas
grapes
bananas !apples`
var result = solution(commentString, ["#", "!"])
// result should ==
`apples, pears
grapes
bananas`

2015

Here’s the solution I posted back in the day:

Fresh Out Of Bootcamp 2015 versionStrip Comments
1
2
3
4
5
6
7
8
9
10
11
function solution(input, markers) {
var arr = input.split("\n"); //split input into lines
for (var i = 0; i < arr.length; i++){ //for each line
for (var j = 0; j < arr[i].length; j++){ //for each letter in that line
for (var k = 0; k < markers.length; k++){ //for each marker
if (arr[i][j] === markers[k]) arr[i] = arr[i].slice(0,j);
}
}
}
return arr.join("\n");
}

I can see that I cared about making my code terse. I wasn’t yet comfortable with ES6, which had just been officially released months earlier. I hadn’t yet rejected single-letter variable names, apparently. At least I used comments–and to be fair, with the comments, it’s not too hard to figure out what’s going on here. And in its favor, it attacks the nature of the problem fairly directly: compare every character to every marker, and then cut if needed.

But it’s terse. It isn’t the kind of code you can glance at and understand. It prioritizes the wrong things. It isn’t humanized. One might say that my emphasis seems to be on writing an algorithm for machines, as opposed to writing code for humans. And while this is, of course, an algorithm challenge, it’s also a solution written in Javascript, not C.

(To be fair: in its own way, it is readable, to those familiar with writing lots of algorithmic code. The concept being applied here is clear. However, it wasn’t until writing this post and pouring over it closely that I found a tiny bug in the code that had even gotten past the tests on codewars.com. This is what I mean by unreadable–mistakes will be hard to spot in code of this style.)

2018

My approach a few years later:

2018 versionStrip Comments
1
2
3
4
5
6
7
8
9
10
11
function solution(input, markers) {
return input
.split("\n")
.map(line => {
markers.forEach(marker => {
if (line.includes(marker)) line = line.slice(0, line.indexOf(marker));
});
return line.trimRight();
})
.join("\n");
}

It feels much more relaxed, it’s clear. Variables have descriptive names, making comments unnecessary. Instead of raw for(;;) loops, all loops are with the more declarative Array built-ins map(), includes(), and forEach(). As a result, intentions are far more clear. Every line is digestible.

It’s significanly fewer characters, lines don’t run wide, and comments are no longer needed. At the same time, while the two solutions have the same algorithmic complexity, in the real world the newer code runs substantially faster.

How much faster?

Performance

Comparing the two on JSperf, in extreme cases I was able to get a 188x (and in theory even more) speed improvement in the new code.



To be completely clear: outside of minimizing algorithmic complexity, pursuing speed in a JS is usually a fool’s errand, given that the programmer cannot know what their code will be turned into by the interpreter, and that it’s almost always premature optimization to think about it. That said, this is a pretty interesting speed difference. What’s going on here?

Playing with JSperf, it seems that with a heavy bias to short lines (even shorter than what you might see in normal code, rarely more than 15 chars per line) the speed difference can be forced down to as low as only 5x faster. As we bias the test towards significantly longer lines (more like what you might see in minified code), we start seeing multiple orders of magnitude in difference–and the longer the lines, the more the improvement. That tells us that the source of the gains is probably in large part due to includes() in the 2018 code, stretching its legs on long lines, which intuition and evidence indicate may be nearly as optimized as a regex–the 2015 code, in the meantime, imperatively dictates the algorithm details in JS itself, and V8 can’t infer potential optimizations as effectively.

To make the test more comprehensive and honest, I modified my test suite to run a multi-dimensional analysis with a broad range of influencing factors–short/medium/long lines, different total character lengths, and varying numbers of comment markers (none, some, several). Testing across all dimensions, the net difference across the board is about 10x–still an order of magnitude.

Other variations

Out of curiousity, I saw some room to optimize the 2015 solution in the spirit of the 2015 code. I removed join(), split() and slice() completely, making it effectively the streaming character reducer it was clearly trying to be. This adds its own complexity, since you end up having to handle newlines by hand, but I was able to eek out a 25-33% speed improvement over the original 2015 code in many test cases, but on the comprehensive suite it’s closer to 10% faster (and in selected cases, it’s actually slower).

It is still far slower than the 2018 code–and relies on the bad practice of modifying the iterator directly, which creates bug-prone code and further kills any hope of v8 optimizing. (Almost too perfectly, at one point running these tests this solution started to crash intermittently. The cause was an update to my tests that resulted in the test input occasionally not terminating with a newline, which would then cause a subtle off-by-one error and throw an error.)

I went a step further and ran the top solutions on codewars.com, as well. Of the top two, one was a straight regex wrapper (which of course runs about 8 orders of magnitude faster than anything else), and the other was a nice solution that was quite similar to mine, but a bit more terse by relying on map() -> reduce() -> split() as opposed to map() -> forEach() -> includes(). The interpreter’s rendering of my 2018 solution does seem to handily outperform that one–after some more tests, it seems that split(), which generates an array for every line as an intermediary step, can’t be optimized to the same level that includes() can here.

I toyed with some other options as well. You can run all of these, and examine the tests I wrote, on JSperf.

Fun experiment.

Bonus: Python

I hadn’t touched Python in a year and a half, so I decided to throw together a solution to that problem one day. Python looks like Python:

2018 python versionStrip Comments
1
2
3
4
5
6
7
8
9
10
11
12
def solution(string,markers):
string_array = string.split("\n")
output = ""

for line in string_array:
for marker in markers:
if marker in line:
line = line[0:line.index(marker)]
line = line.rstrip()
output += line + "\n"

return output[0:len(output)-1]

Slice calls are replaced with the arguably cleaner and more readable string index[start:end] notation, significant whitespace removes the need for brackets. Python seems to just beg for its clean for loops as opposed to a more functional style, which isn’t really the ‘one obvious way’ of Python.