Interview Code Challenge II: Zipcode Region Highlighter

kylebaker.io

2018-04-02

GeoJSON, code challenge, fullstack, google maps, map, maps api, node, zipcodes

I was approached by a recruiter on behalf of a small, profitable startup built around a group of web+mobile apps that was starting to outgrow the ability to keep up with its own tech needs. They were looking for a CTO. I wasn’t really looking at the time, but this recruiter appeared to have looked deeply into my profile, which I appreciated, so I decided to hear him out.

I wasn’t immediately sold, but to make a long story short, after some phone calls and emails over the course of a couple of weeks, I ended up with plane tickets to fly out to North Carolina for a final-stage, in-person interview to join as a co-founder CTO.

The Challenge

Of course, before I arrived, they wanted one last thing: a code challenge.

Many users had requested a feature that was just beyond their technical reach and bandwidth. Their product relied on mapping, and customers wanted to be able to select and define regional boundaries by zipcode.

So the challenge was to create, in a couple of days, a tool that:

Was a Google Maps API map.
Highlighted the zipcode containing any point the user clicked.
Let multiple zipcodes be toggled on and off this way, with as many left highlighted as desired.
Allowed these collections of zipcodes to be saved as ‘regions’.
Ran fast.

The size of the data meant that loading everything into the browser wasn’t ideal, and even subsets of the data could overwhelm the browser, so creating responsive clicks and preventing the browser from chugging on the mapping data was crucial. If it was possible, it was likely that what I put together would end up being used as a feature in the product, as well.

Honestly, I was shocked that this wasn’t possible using built-ins from the Google API itself, and spent some time in disbelief looking for a built-in way to do this. If there wasn’t one, this seemed like a pretty tall order. They offered another code challenge that sounded much more bland, but at this point I was committed. I didn’t know if this was really manageable in a 48-hour window, but because it was for a real feature we’d likely add, and because a crash course in the Maps API would prove valuable in this job, I decided to give it a shot. At some level, I had just become curious. Even if it was too much for 48 hours, I was confident that my work would show what I was capable of.

Lay of the land

I dove into the Google Maps API. I got a map up on an HTML page and started reading through the documentation, getting a feel for the interface. It’s honestly an incredible product, and there’s a lot to it.

The first step was figuring out reverse geocoding. That looked promising. A short while later I had console statements informing me of the zipcode of any point I clicked on the map.
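
As a rough illustration of that step, the sketch below pulls a zipcode out of a reverse-geocoding response. It assumes the documented response shape of the Geocoding service (an array of results, each with `address_components` whose `types` may include `postal_code`); the `extractZipcode` helper and the sample data are hypothetical, not the original code.

```javascript
// Pull the first postal code out of a Geocoding-style results array.
// A component whose `types` include "postal_code" holds the zipcode
// in `long_name`.
function extractZipcode(results) {
  for (const result of results || []) {
    for (const component of result.address_components || []) {
      if ((component.types || []).includes("postal_code")) {
        return component.long_name;
      }
    }
  }
  return null; // no postal code found (for example, a click in the ocean)
}

// Hypothetical, trimmed-down sample response, for illustration only.
const sampleResults = [
  {
    address_components: [
      { long_name: "Durham", short_name: "Durham", types: ["locality", "political"] },
      { long_name: "27701", short_name: "27701", types: ["postal_code"] },
    ],
  },
];

console.log(extractZipcode(sampleResults)); // "27701"
```

In the browser, the results array would come from the Maps API geocoder, called with the clicked location.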

I knew from experience with Google Maps that looking up cities or states or countries would lead to them being highlighted, just like what we wanted for zipcodes. Surely this functionality existed, right?

It doesn’t. I found many people requesting this feature for years and years from Google, but no one with a solution posted. At least, no solution from Google. There were some solutions from third parties…

They were just expensive.

I found one website that did have a map of all the US zipcodes visible… and a post on their website offering a database of the polygons for $4000-$5000. At that point, if someone wanted that much money for the data, I realized this problem must not be trivial to solve. I guess this might be an interesting code challenge after all: it seemed I’d be building out the MVP of a product worth a few grand.

In a real product scenario, going deeper into the mapping community would yield a lot of fruit here, but time was at a premium. I needed to learn how to draw shapes in Google Maps, and I needed a data source for those shapes.

Piecing the puzzle together

My research turned up a common data set for geographical zipcode boundaries, from the US Census Bureau¹, who put out a shapefile (a well known standard in the GIS mapping community) of US zipcodes that is widely used by the public and available for download. Thankfully I had Google Fiber at the time–nearly 1 gigabyte later, I had a shapefile.

Now what? How do we plug this into Google Maps?

The next hurdle was figuring out how to draw polygons on the map, so I could figure out exactly what kind of data I’d actually need to make these polygons. With the excellent docs, it wasn’t long before that started to make sense, too. I lingered here just long enough to get the idea and start looking for the next piece.
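
To make "what kind of data I'd need" concrete, here is a minimal sketch of turning a GeoJSON polygon into the `{ lat, lng }` paths that a Maps polygon takes. The helper names and the sample square are made up for illustration; only the coordinate-order swap is the point.

```javascript
// GeoJSON stores positions as [longitude, latitude]; Google Maps polygon
// paths expect { lat, lng } objects, so the order has to be swapped.
function ringToPath(ring) {
  return ring.map(([lng, lat]) => ({ lat, lng }));
}

// Convert a GeoJSON Polygon geometry (outer ring plus optional holes)
// into an array of paths, one per ring.
function polygonToPaths(geometry) {
  if (geometry.type !== "Polygon") {
    throw new Error("Expected a Polygon geometry, got " + geometry.type);
  }
  return geometry.coordinates.map(ringToPath);
}

// Hypothetical one-degree square, for illustration only.
const square = {
  type: "Polygon",
  coordinates: [[[-79, 36], [-78, 36], [-78, 35], [-79, 35], [-79, 36]]],
};
const paths = polygonToPaths(square);
console.log(paths[0][0]); // { lat: 36, lng: -79 }

// In the browser, with the Maps JavaScript API loaded and a `map` instance:
// new google.maps.Polygon({ paths, map, fillOpacity: 0.3 });
```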

That piece was, how do we turn a shapefile into collections of collections of coordinates, which we can then turn into polygons?

There are many options. There are some nice domain-specific applications that rely on specialized database systems like PostGIS, a “spatial database extender for PostgreSQL”, or CartoDB, and so on, that would be worth looking into in the long run, and I considered them for this project. I decided that, given the tight time limit, I should avoid the potential complexity of another piece of the toolchain if possible. Having decided that, research showed that it was possible to use an open source command-line toolset to turn shapefiles into GeoJSON directly, and that GeoJSON was an input format supported by Google Maps natively. With JSON, we could avoid the database question entirely for this challenge.

Filetype Fun: Shaping up

Things were coming together nicely.

One of the tools in that toolset was the somewhat obtuse ogr2ogr. Using that, generating a GeoJSON file wasn’t difficult, though it took a few minutes for my beefy computer to crunch the data. I opened the large resulting JSON file. I copied a polygon, opened my index.html draft with the Google Maps API loaded, pasted it into the console… and after a bit of tinkering, got a polygon of a zip code to show up on the screen. Nice!

But the file was 1.8 GB, and that wasn’t going to cut it for this project. Large, highly detailed polygons, in aggregate and in the context of an already heavy mapping web app, were exactly what this project had to show could be avoided. A Google map starts to really churn if you overload it, and I knew from discussions with them that this was already going to be a map frequently filled with thousands of visual points.

Unfortunately, the man pages for ogr2ogr aren’t very helpful here. They list a large number of obscure options with extremely brief descriptions. It’s clearly a tool written for domain specialists. A few experiments, obscure blog posts, and Stack Overflow answers later, though, I had crafted a set of options and parameters that smoothed out line segments enough (but not too much) and generated a respectable 120 MB GeoJSON version of the zipcode dataset.
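
The exact ogr2ogr options aren't recorded here, so the following is only an illustration of what "smoothing out line segments" means in general: a small sketch of Douglas-Peucker, a common line-simplification algorithm. It is not a claim about what ogr2ogr did with the options I used; the function names and sample points are made up.

```javascript
// Perpendicular distance from point p to the line through a and b.
// If a and b coincide (as in a closed ring), fall back to point distance.
function perpendicularDistance(p, a, b) {
  const dx = b[0] - a[0];
  const dy = b[1] - a[1];
  const lengthSquared = dx * dx + dy * dy;
  if (lengthSquared === 0) {
    return Math.hypot(p[0] - a[0], p[1] - a[1]);
  }
  const cross = Math.abs(dx * (a[1] - p[1]) - (a[0] - p[0]) * dy);
  return cross / Math.sqrt(lengthSquared);
}

// Douglas-Peucker: keep the endpoints, find the point farthest from the
// line between them, and recurse on both halves if it exceeds the tolerance.
function simplify(points, tolerance) {
  if (points.length < 3) return points.slice();
  const first = points[0];
  const last = points[points.length - 1];
  let maxDistance = 0;
  let index = 0;
  for (let i = 1; i < points.length - 1; i++) {
    const d = perpendicularDistance(points[i], first, last);
    if (d > maxDistance) {
      maxDistance = d;
      index = i;
    }
  }
  if (maxDistance <= tolerance) return [first, last];
  const left = simplify(points.slice(0, index + 1), tolerance);
  const right = simplify(points.slice(index), tolerance);
  return left.slice(0, -1).concat(right); // drop the duplicated split point
}

// The near-collinear point [1, 0.01] is dropped; the real corner survives.
const line = [[0, 0], [1, 0.01], [2, 0], [3, 1], [4, 0]];
console.log(simplify(line, 0.1)); // [[0, 0], [2, 0], [3, 1], [4, 0]]
```

A larger tolerance removes more points, which is the trade-off between file size and boundary fidelity described above.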

I copied another polygon out by hand, loaded it in. A simpler shape, but definitely serviceable. Deciding precision level was now just an implementation detail, and this was definitely small enough for my purposes and served as a viable proof of concept.

Now I needed a server. Node would be fast to develop in, eliminate the need to context switch in this project, and play well with JSON, so I threw together a Node.js process that would serve my index.html, built out endpoints for serving zipcode polygons and getting user data, and added a skeleton model for a mock NoSQL database held in memory as a JS object. Similarly, I wanted to load the JSON straight into memory as a dictionary object in JS. The GeoJSON we had would need to be tweaked a little to use it that way, though. I stared at the data a bit, and wrote up a shell script that streamed input from the source GeoJSON and then streamed output to build up a new JSON file with an ideal format for the import I wanted.
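
The reshaping itself was done with a streaming shell script, which isn't reproduced here. As a simplified in-memory sketch of the same idea, the function below turns a GeoJSON FeatureCollection into a dictionary keyed by zipcode. The property name `ZCTA5CE10` is an assumption about the Census file's attribute naming, and the sample data is invented.

```javascript
// Build a { zipcode: geometry } lookup from a GeoJSON FeatureCollection.
// ASSUMPTION: the zipcode is stored in properties.ZCTA5CE10; pass a
// different property name if the source file uses another one.
function indexByZipcode(featureCollection, zipProperty = "ZCTA5CE10") {
  const index = {};
  for (const feature of featureCollection.features) {
    const zip = feature.properties && feature.properties[zipProperty];
    if (zip) index[zip] = feature.geometry;
  }
  return index;
}

// Hypothetical two-feature collection, for illustration only.
const collection = {
  type: "FeatureCollection",
  features: [
    {
      type: "Feature",
      properties: { ZCTA5CE10: "27701" },
      geometry: { type: "Polygon", coordinates: [[[-79, 36], [-78, 36], [-78, 35], [-79, 36]]] },
    },
    {
      type: "Feature",
      properties: { ZCTA5CE10: "27703" },
      geometry: { type: "Polygon", coordinates: [[[-78, 36], [-77, 36], [-77, 35], [-78, 36]]] },
    },
  ],
};

const polygonsByZip = indexByZipcode(collection);
console.log(Object.keys(polygonsByZip)); // [ '27701', '27703' ]
```

With the data in this shape, serving a polygon for a zipcode is a single property lookup, which is what keeps the endpoint fast.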

Next, a bit more work in the client.

A click handler on that map would fire a reverse geocode request,
get back the zipcode the click was within from the Maps API,
use that zipcode to form a request to my internal Node.js endpoint,
get back the GeoJSON polygon data, and load it into the map.
As a little stretch goal, I threw in a textbox alternative to the reverse-geocoding-clicking method.
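
The click flow above, plus the on/off toggling from the challenge, can be sketched roughly as follows. The geocoding call, the polygon request, and the map drawing are passed in as functions so the logic stands on its own; all four parameter names (`reverseGeocode`, `fetchPolygon`, `show`, `hide`) are hypothetical and not taken from the original code.

```javascript
// Returns a click handler that toggles the highlight for whichever
// zipcode the click falls in. Dependencies are injected:
//   reverseGeocode(latLng) -> Promise of a zipcode string (or null)
//   fetchPolygon(zip)      -> Promise of GeoJSON geometry from the server
//   show(geometry)         -> draws it and returns a handle
//   hide(handle)           -> removes it from the map
function createZipToggler({ reverseGeocode, fetchPolygon, show, hide }) {
  const highlighted = new Map(); // zipcode -> handle returned by show()

  return async function handleClick(latLng) {
    const zip = await reverseGeocode(latLng);
    if (!zip) return [...highlighted.keys()]; // nothing to toggle

    if (highlighted.has(zip)) {
      hide(highlighted.get(zip));
      highlighted.delete(zip);
    } else {
      const geometry = await fetchPolygon(zip);
      highlighted.set(zip, show(geometry));
    }
    return [...highlighted.keys()]; // zipcodes currently highlighted
  };
}

// In the browser this could be wired to the map's click event, e.g.:
// map.addListener("click", (event) => handleClick(event.latLng.toJSON()));
```

Keeping the highlighted set in one `Map` also makes the later "save as region" step simple, since the keys are exactly the zipcodes to store.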

Playing with the code now, I am reminded of one more roadblock that came up in the client near the end. Google’s reverse geocoding of a click location doesn’t always match the polygon the click actually falls on. A few factors contribute, including problems with zipcode definitions¹ and our polygon smoothing steps; these are implementation problems that needed a more nuanced response than the deadline allowed. So code had to be written to handle this collision scenario.
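
I won't reconstruct the exact handling here, but one way to detect such a mismatch is to check whether the clicked point really lies inside the polygon that came back. The sketch below does that with a standard ray-casting test on GeoJSON-style `[lng, lat]` rings; it is an illustration under that assumption, not the original code.

```javascript
// Ray casting: count how many ring edges a horizontal ray from the point
// crosses. An odd count means the point is inside the ring.
function pointInRing(point, ring) {
  const [x, y] = point;
  let inside = false;
  for (let i = 0, j = ring.length - 1; i < ring.length; j = i++) {
    const [xi, yi] = ring[i];
    const [xj, yj] = ring[j];
    const crosses =
      (yi > y) !== (yj > y) &&
      x < ((xj - xi) * (y - yi)) / (yj - yi) + xi;
    if (crosses) inside = !inside;
  }
  return inside;
}

// A point is in a GeoJSON Polygon if it is inside the outer ring
// and not inside any of the holes.
function pointInPolygon(point, polygonCoordinates) {
  const [outer, ...holes] = polygonCoordinates;
  if (!pointInRing(point, outer)) return false;
  return !holes.some((hole) => pointInRing(point, hole));
}

// Hypothetical square with a square hole, for illustration only.
const outerRing = [[0, 0], [4, 0], [4, 4], [0, 4], [0, 0]];
const holeRing = [[1, 1], [2, 1], [2, 2], [1, 2], [1, 1]];
console.log(pointInPolygon([3, 3], [outerRing, holeRing])); // true
console.log(pointInPolygon([1.5, 1.5], [outerRing, holeRing])); // false (in the hole)
console.log(pointInPolygon([5, 2], [outerRing, holeRing])); // false (outside)
```

In the browser, the Maps geometry library's `google.maps.geometry.poly.containsLocation` offers a comparable check against a drawn polygon.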

Map() to Map

We should basically have been done! On the cusp of completion, however, I was getting errors from the Maps API for lots of zipcodes. What was happening?

The errors were obscure. GeoJSON is only semi-human-friendly, and is pretty heavily nested. After staring and tweaking for a while, I realized that the output from ogr2ogr didn’t correspond to the Google Maps API’s GeoJSON expectations: it sometimes seemed to include nested polygons. It’s been quite some time since this happened, and at that point I was pulling an all-nighter to finish on time, so I’m reconstructing this from memory and code as best I can. It appears I dug deep enough into the Maps API’s GeoJSON implementation, the data structure, and the errors I was seeing to come up with a way to map the ogr2ogr-style GeoJSON into the Maps API-style GeoJSON. I expanded the shell script so that it would also detect the nesting problem where it existed and map it into a new format, essentially taking what was previously nested and storing it instead as a flat array under a property name that the Maps API would properly identify.
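
Since I'm working from memory, the following is a guess at the shape of that fix rather than the script itself: it assumes the troublesome nesting was a GeometryCollection wrapping polygons, and flattens whatever it finds into a single MultiPolygon. Both function names are made up for the sketch.

```javascript
// Collect every polygon's coordinate array from a geometry, descending
// into GeometryCollections. Non-polygon geometries are ignored.
function collectPolygons(geometry) {
  switch (geometry.type) {
    case "Polygon":
      return [geometry.coordinates];
    case "MultiPolygon":
      return geometry.coordinates;
    case "GeometryCollection":
      return geometry.geometries.flatMap(collectPolygons);
    default:
      return [];
  }
}

// Normalize any of the above into one flat MultiPolygon geometry.
function toMultiPolygon(geometry) {
  return { type: "MultiPolygon", coordinates: collectPolygons(geometry) };
}

// Hypothetical nested input, for illustration only.
const ringA = [[0, 0], [1, 0], [1, 1], [0, 0]];
const ringB = [[2, 2], [3, 2], [3, 3], [2, 2]];
const nested = {
  type: "GeometryCollection",
  geometries: [
    { type: "Polygon", coordinates: [ringA] },
    { type: "MultiPolygon", coordinates: [[ringB]] },
  ],
};

const flat = toMultiPolygon(nested);
console.log(flat.coordinates.length); // 2
```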

Luckily, it was only that relatively small tweak to the data structure that was needed. It worked. Not bad for the 11th hour.

I stayed up a few more hours through the morning, adding a basic skeleton of a user login system, the ability to save regions, and I cleaned up a bit. Other than the map, which felt slick, the interface was crude. But the map worked well. Clicks were extremely responsive, and it was a visually very satisfying product.
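
For a sense of how little "save regions" needs at the prototype stage, here is a minimal sketch of an in-memory store in the spirit of the mock database mentioned earlier. The `createRegionStore` API and the sample user and region names are hypothetical.

```javascript
// A tiny in-memory stand-in for a document store:
// userId -> { regionName -> sorted, de-duplicated array of zipcodes }
function createRegionStore() {
  const db = {};
  return {
    saveRegion(userId, name, zipcodes) {
      if (!db[userId]) db[userId] = {};
      db[userId][name] = [...new Set(zipcodes)].sort();
      return db[userId][name];
    },
    getRegions(userId) {
      return db[userId] || {};
    },
  };
}

const store = createRegionStore();
store.saveRegion("demo-user", "Downtown", ["27703", "27701", "27701"]);
console.log(store.getRegions("demo-user")); // { Downtown: [ '27701', '27703' ] }
```

Everything is lost on restart, which is fine for a demo and is the first thing to replace in a real product.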

I was completely exhausted, so rather than my normal process of deploying a code challenge, I just filmed a demo and emailed the code. Satisfied, I went to bed at dawn.

Reflecting

This challenge was unusual, but from their perspective, the results were very practical. This challenge entailed coming up with scalable, usable solutions to real problems, inhaling new technologies and executing them on the fly. Solving anything thrown at you, fast, was a big part of the job description for taking ownership of the tech side of an early stage startup.

They were impressed. I flew out there a couple days later to show them the project in person, and I got the job. :)

Try it

I recently decided to deploy it–you can view its current iteration here.


Notes:

¹Actually, it’s important to note that most people think in terms of USPS zip codes, and that those from the Census Bureau don’t perfectly match up (while the USPS regularly changes theirs). It’s also pretty difficult to get geographic data for zip codes, since they aren’t defined geographically. That said, at least one company has spent time carefully generating them and has gotten the nod from the USPS; in a real-world version of this product, I’d probably recommend using that dataset. This lack of quality mapping data for zip codes is a problem known within the USPS, and a known pain point in the GIS community in general.