Delight Customers By Not Giving Them What They Want

It sounds odd, doesn’t it? I’ve spent the last 2 years at Amazon, where we preach customer obsession day in and day out. It’s one of the things that makes me love my job. Yet I’m firmly convinced that in some very few cases, the best thing to do for a customer is to not give them exactly what they are asking for.

Consider a recent discussion that happened at work during a meeting. To set the stage, you have to understand that Amazon has made a massive bet in continuous deployment over the last several years. John Jenkins, a long time Amazonian before joining Pinterest, gave some impressive stats on how frequently Amazon deploys code to production during his talk at Velocity in 2011. In the two years since that talk the company has doubled down on it’s investment to ship code to production continuously, and while I can’t disclose specific details the numbers today would blow the 2011 numbers out of the water. We have an internal saying that “bake time is for cookies”, and the normal mode of operation for most teams is that changes flow to production within minutes or hours of when they are checked in. Of course, deploying continuously comes with a cost. First, you have to have tooling in place that can drive the correct integration and deployment workflows and respond to failures gracefully. Second, applications have to be designed and implemented a certain way: interfaces must be hardened to allow independent deployment of components, adequate automated tests must be written, etc. In practice this state can be difficult to achieve when you’re starting from a messy, legacy code base.

In this particular instance a team was trying to move to continuous deployment, but they wanted to take baby steps to get there. They were asking us to build legacy features into our shiny new continuous deployment tooling that would allow them to use some of the tools while still running a bunch of manual tests, batching up changes, and deploying infrequently. My boss has an amazing knack for analogies, and he described the situation this way:

“We’re trying to sell modern medicine. We’ve built a beautiful new hospital that people are excited about. But some people can’t fully justify the cost of buying in, and now they’re starting to come to us and ask us to put rooms in the new hospital for bloodletting.”

There were several problems with building the features that this team was requesting. First, it would enable the team not to make the massive leap that was necessary to change the way that they develop and deploy software. In the world of software engineering, states that are viewed as stepping stones often become much more permanent. Worse yet, it would allow other teams to leverage the features and not make the leap. And finally, the work required to build the features would prevent our team that built the continuous deployment tooling from building cool new features that people who were really doing continuous deployment needed.

Don’t get me wrong, this kind of circumstance is the exception and not the rule. If you think that your users are asking for the wrong thing you should first doubt and question yourself, then question yourself again. But occasionally the impulse is correct and the way to delight a customer you have to not give them what they want. They won’t appreciate it in the short term, but if you’re confident that you’re making the correct call they may thank you down the road.

The Final Nail In The Windows Coffin

I generally boot over to Windows for one of 2 reasons: to play games, or to use Office. The rest of my time is happily spent in Ubuntu. I’ve been under the impression that people generally use Windows because it’s more “polished”. My mother is never going to be able to hack away at the command line or understand the dark magics of device drivers, so she needs the neat and tidy packaging that Microsoft offers. Tonight I decided to upgrade from Windows 7 to 8, and it was the worst experience possible. My motivation was that my Windows 7 installation had developed a weird tendency to BSOD (for seemingly random reasons after some debugging) with the dreaded “Page Fault in Nonpaged Area” message, so I figured I would try a clean OS install and thought I would upgrade in the process to see what Windows 8 is all about.

I started by downloading the Microsoft Windows Update utility, as recommended. I went through the steps and was told that I had two purchase options: Windows 8, or Windows 8 Pro. The former was $120, so I spent a while poking around looking for a way to select that I wanted the less expensive “Upgrade” version. I couldn’t figure it out, so I eventually caved and bought the full meal deal. I’m a firm believer in clean installation for Operating Systems based on some anecdotal past experiences, so I downloaded the ISO and burned a DVD. A few minutes later I was booted into the installation utility and was ready to install.

That’s when I hit my first speed bump. When I selected the appropriate disk, Windows told me that it couldn’t create a new partition or locate an existing one, and that I should check the setup log files for more info. I had everything important backed up to Dropbox, so I tried deleting the partition, formatting, and every other option available to me. I reboot and went through the process again with the same result. Before hunting down where the “setup log files” were, I hit Google on my cell phone and stumbled on this article and tried the command line partitioning utilities that were suggested. I rebooted again, and still no dice. After a lot of tinkering, I ended up having to unplug my other drives including the one where Linux was installed and reboot the computer, and then things magically worked.

I hadn’t ever messed with Windows 8, so I surprised to be greeted by no start button and no immediately obvious way to launch applications. I was told that I needed to activate Windows, and asked to re-enter my Product Key that I had already entered a million times while trying to get the installation working (fortunately by this point I had it practically memorized). When I tried to activate I got an error message telling me that my product key could only be used to upgrade Windows, despite the fact that I had been using Windows 7 just an hour prior, was under the impression that I bought a full non-upgrade version of Windows 8, and didn’t see any clear warnings to this effect during the purchase process. I went back to Google and poked around for a while and found this suggestion on hacking the registry to make activation work anyway, which seemed to do the trick.

Next I tried to change the wallpaper from the ugly flower, and that didn’t work without any obvious error messages. I was able to click on other images, but all I saw was weird flashing behavior in the edges of the window and the background didn’t change. Again per Google it sounds like I may need to wait until a while after activation to change my wallpaper, which is just bizarre.

I started downloading apps, and when I hit the Skype site they sent me over to the Windows App Store to download it. Inconveniently, there was no clear visual indication how to get back to my desktop from the Metro style UI. I started trying to poke around with Metro and was annoyed at how poorly it’s visual metaphor seemed to map to the mouse and keyboard, so I searched around for a way to permanently disable Metro for desktop users. Unfortunately that seems to require downloading (or in most cases purchasing) a separate application, which seems absurd.

The icing on the cake was that on my next reboot, I again hit the new and slightly less ugly BSOD with the same error that I was getting before. Both the Windows and Linux memory and disk analysis tools seem to suggest that all is fine on the hardware front, and I have yet to have any issues with Ubuntu which is running on the same machine down to the disk. I guess I’m back to trying to troubleshoot that issue later tonight.

After multiple hours of just trying to get things up and running, I’m to picture my mom buying the latest version of Windows because of “ease of use” and having to run disk partitioning utilities from the command line and edit registry keys. Clearly that ain’t happening. I’m also flashing back to how seamless and straightforward installing Ubuntu was last time around. If my experience isn’t atypical, then I think the final nail has been driven into the Windows coffin. That may sound like a sensational claim, but Windows has already lost the battle for mobile to Android (and to a lesser extent these days, iOS) and more and more of computing is moving away from the desktop. At some point individuals and companies that do use desktops for niche activities aren’t going to be willing to pay $120 for an inferior product to something that they can get for free, particularly if they’re already having to retrain habits because existing UI conventions are already broken in any option.

I’m excited that Steam is out for Linux because it feels like that may start a movement for PC games to ship on non-Windows operating systems. Now if only I can get Office working with Wine, I will never have to boot over to Windows again…

String Alignment, Dynamic Programming, & DNA

Most modern runtime environments provide an implementation of some basic functionality for working with strings. We’re used to leveraging simple tasks like splicing together strings or checking whether one string contains another, but what if we wanted to do something more complex and determine whether two strings have a shared etymology or are likely to have a similar meaning? The problem is interesting for a few reasons. First, there are some very interesting applications for predicting how closely related strings are. In terms of spoken and written language, being able to glean information about the evolutionary origin of words can serve as one building block for more complex problems like both natural language processing and translation. There are also very important uses for this kind of algorithm in related fields like DNA sequencing, and even in comparing the source code of programs. Second, the algorithms at the core of the solution to the problem provide a neat example of Dynamic Programming, which is a handy tool for any developer to have in their toolbox.

To get started, let’s consider a simple real world example. Suppose that we need to programmatically decide whether the English word “dictionary” and the Spanish word “diccionario” are related in order to draw some conclusions about the meaning of each word in absence of a… well, dictionary. We begin by noting that in practice, words tend to evolve over time through the insertion, deletion, and mutation of characters. To capitalize on this observation, we need to start by trying to find the most likely alignment between the two words by inserting gaps to make similar characters match up. The first step in finding that alignment is to establish a scoring system between characters in each word. The scoring system should be based on data and should reward similarity while penalizing differences between the strings. For our example, let’s start with a very simple scoring system where letters that match exactly are worth +2, vowels that don’t match are still worth +1 (and we’ll include the character ‘y’ with vowels), and letters that are mismatched (or matched with gaps) are worth -3. This scoring system takes advantage of a somewhat anecdotal observation that vowels tend to turn into other vowels more frequently than other mutations.

Given these words and the scoring system, our brains can actually scan the two words rather intuitively and posit a valid alignment. We can represent the alignment by showing the two words with dashes inserted for gaps, and a third row where letters indicate a direct match and plus signs indicate a non-match that was still assigned a positive score:


Our alignment has 8 letters that match perfectly and a score of 11, and at a glance it seems like a very strong match. It also happens to be a global alignment, which I’ll touch on later. The gap in “dictionary” is also somewhat arbitrary in that the score would be identical if the gap followed the ‘y’ instead of preceding it, which means that in this case there are multiple alignments that are equally valid.

So we’ve produced an alignment, but how could we automate the alignment finding process? To take things a step further, how can we calculate the probability that the alignment is meaningful and not just coincidence? Let’s deal with the former problem first. At first blush, it looks like we could just implement a recursive top down algorithm that finds the optimal alignment by drilling down to individual characters and then building back up by adding together the maximum scoring alignments of each possible combination of substrings and gaps. If we want to align “dictionary”, we could first find the possible alignments and scores of ‘d’ and ‘i’ and then find the alignments and scores for “di” by picking the from individual character alignments and gaps, and so on. The problem with this approach is that we would end up considering each character in the first string with each character in the second string, and then we would also consider each combination of characters in the first string with each combination in the second string. The result is a less than desirable runtime complexity of O(n^2m) for strings with length n and m, respectively. Unfortunately that won’t scale well with bigger strings (like DNA, which we’ll touch on later) when we have to run a large number alignments to try to calculate the probability that the alignment is significant (which we’ll also touch on later).

An alternative is to use Dynamic Programming (DP): breaking the problem up into smaller sub-problems and solving from the bottom up. The beauty of DP is that we can optimize so that we don’t have to redo some of the simple calculations that get repeated with recursion, and we can also take advantage of nuances of the problem in the way that we setup the algorithm. For string alignment the particular nuance that we can capitalize on is the fact that subsequent characters of the string must occur in order, so at each character we can only either insert the next character or insert a gap in one string or the other.

I’ll pause briefly before explaining the DP algorithm to clarify that there are two flavors to string alignment. Global alignment requires that we use each string in it’s entirety. Local alignment requires that we find only the most aligned substring between the two strings. The most widely used global alignment algorithm is called Needleman-Wunsch, while the local equivalent is an algorithm called Smith-Waterman. Both algorithms are examples of DP, and begin by building up a two dimensional matrix of integers with dimensions of the size of each respective string plus one (because we can start each aligned string with either a character or a gap). I’m going to use the local alignment algorithm in my example, but global alignment is very similar to implement in practice.

The first step in finding a local alignment is building up the matrix of integers, where each location in the matrix represents a score as we traverse the alignment. We start by initializing the values along the top and left of the matrix to zero, since local alignment can start at any point in the string. If we were finding a global alignment we would have initialized the edges so that at each subsequent location we imposed the mismatch or gap penalty, because we would need to include the entire string in the result. Next we start with the top left corner that remains and work either right or down (it doesn’t matter as long as we stay consistent), and at each location we enter the maximum of four values: the value to the left plus the score of the top character and a gap, the value to the top plus the score of the left character and a gap, the value up and left plus the score of the characters to the top and left, or zero. If that explanation isn’t clear or the intuition isn’t there, consider what each decision is implying with respect to the alignment that we’re scoring. The first two options imply inserting a gap in one of the strings, the third implies aligning two characters, and the forth implies ending a local alignment. For our contrived example and scoring system, the matrix looks like the following:

  – D I C C I O N A R I O
– 0 0 0 0 0 0 0 0 0 0 0 0
D 0 2 0 0 0 0 0 0 0 0 0 0
I 0 0 4 1 0 2 0 0 0 0 2 0
C 0 0 1 6 3 0 0 0 0 0 0 0
T 0 0 0 3 3 0 0 0 0 0 0 0
I 0 0 2 0 0 5 2 0 0 0 2 0
O 0 0 0 0 0 2 7 4 1 0 0 4
N 0 0 0 0 0 0 4 9 6 3 0 1
A 0 0 0 0 0 0 1 611 8 5 2
R 0 0 0 0 0 0 0 3 81310 7
Y 0 0 0 0 0 0 0 0 51010 7

Apologies if my somewhat crude tables don’t format well on your device, I was lazy and copied/pasted them from program output without applying a lot of formatting. You can see that building up the matrix takes O(nm) time for strings with length n and m, so we’ve reduced our runtime complexity by an order of magnitude.

The next step is retrieving the alignment from the matrix we start with the location with the maximum value, which we can store as we build up the matrix. From that location we trace backwards and figure out which action resulted in that value by considering the locations to the top, left, and top left of the current location and calculating which value was a valid part of the calculation for the current value. In some cases there will be multiple valid alignments, as indicated by either multiple starting locations with identical scores or multiple valid locations that could have contributed to the calculation for the current location. In that case it’s important to consider the context and decide whether it makes sense to return multiple alignments, pick a single alignment arbitrarily, or create a criteria to pick between alignments (for example penalizing longer alignments). Since we’re finding a local alignment, we trace until we reach a location where the score is equal to or less than zero. If we were doing global alignment we would use the same logic, but we would start in the bottom right location and go until we reach the top left location. The same matrix below shows our trace back with the path that we’ve selected in bold:

  – D I C C I O N A R I O
– 0 0 0 0 0 0 0 0 0 0 0 0
D 0 2 0 0 0 0 0 0 0 0 0 0
I 0 0 4 1 0 2 0 0 0 0 2 0
C 0 0 1 6 3 0 0 0 0 0 0 0
T 0 0 0 3 3 0 0 0 0 0 0 0
I 0 0 2 0 0 5 2 0 0 0 2 0
O 0 0 0 0 0 2 7 4 1 0 0 4
N 0 0 0 0 0 0 4 9 6 3 0 1
A 0 0 0 0 0 0 1 611 8 5 2
R 0 0 0 0 0 0 0 3 81310 7
Y 0 0 0 0 0 0 0 0 51010 7

At this point we’ve got an alignment and a score of 13, which is pretty rad. But how can we tell whether the score is statistically significant? Clearly we can’t make hard and fast rules, because the number that the score produces is relative to a lot of factors including the size of the matrix and the alphabet that was used. One option is to create a bunch (in practice a bunch usually means a minimum of 10^4) of random permutations of one of the strings that reuses all the characters and is the same length, align them with the other string, and then see how many alignments score equal to or better than the initial match. The resulting “P-Value” can be calculated by taking the number of better matches plus one divided by the number of attempts plus one (because we include the initial match in our set). Interpretation of the value depends on the context, but in our case we may decide that a probability value of .5 may indicate that the relationship is random while a value of 10^-4 may suggest a more compelling relationship between the strings. For the given example I tried running 10^5 different alignments while randomizing a string and didn’t see a single alignment with a score that was equal or better, which isn’t a surprise because the alignment that we produced is almost a direct match. The obvious conclusion is that the words almost certainly did evolve from the same origin at some point in human history, and thus likely share meaning.

Now that we’re armed with our nifty algorithm, where else can we put it to use? Some of the more interesting applications for string alignment occur when trying to sequence and compare DNA. At the end of the day, DNA is just a long string that consists of a 4 letter alphabet of nucleotides. DNA produces RNA, which in turn also produces proteins that are made up of a chain of amino acids. Each protein is also just a string that consists of a 20 letter alphabet of amino acids. DNA (and as a result, the proteins that it produces) has evolved through a series of insertions, deletions, and mutations just like written or spoken language. DNA evolution is actually much more pure than that of written and spoken language, because it’s much more difficult for an entire section of DNA to instantly mutate than it is for a brand new word to become adopted in a culture. A gene is the DNA equivalent of a word: it’s a section of DNA that is ultimately responsible for the production of a protein that may contribute to a physical trait. By aligning either DNA or protein sequences, we can make very educated guesses about whether genes are evolutionarily linked, and whether the proteins that they produce fold and function similarly. By looking at highly conserved regions of related proteins over time and making statistical observations about the likelihood of certain amino acids and common mutations, researchers have produced scoring systems like BLOSUM that are very effective when comparing proteins.

Another interesting application for our algorithm is in examining iterations to the source code of a program, or comparing multiple implementations of a program. A program is really just a string where the alphabet is generally defined by the context free grammar of a programming language. Suppose that we want to try to figure out whether a student in a computer science class copied his assignment from a peer, or did the work on his own. In this case we can simply align each two programs, look at the probabilities that the programs are related, and apply some manual investigation and common sense investigation to any results that appear closely related.

That’s a lot of information to pack into a single post, but I hope you find it helpful and are able to think of other creative applications for the algorithms and concepts. As always, I’ll look forward to reading your comments!

Debunking the Myths of RPC & REST

The internet is chock-full of articles, blog posts, and discussions about RPC and REST. Most are targeted at answering a question about using RPC or REST for a particular application, which in itself is a false dichotomy. The answers that are provided generally leave something to be desired and give me the impression that there are a slew of developers plugging RESTful architectures because they’re told that REST is cool, but without understanding why. Ironically, Roy Fielding took issue with this type of “design by fad” in the dissertation in which he introduced and defined REST:

“Consider how often we see software projects begin with adoption of the latest fad in architectural design, and only later discover whether or not the system requirements call for such an architecture.”

If you want to deeply understand REST and can read only one document, don’t read this post. Stop here and read Fielding’s dissertation. If you don’t have time to read his dissertation or you’re just looking for a high level overview on RPC and REST, read on. To get started, let’s take a look at RPC, REST, and HTTP each in some detail.

Remote Procedure Call (RPC) is way to describe a mechanism that lets you call a procedure in another process and exchange data by message passing. It typically involves generating some method stubs on the client process that makes making the call appear local, but behind the stub is logic to marshall the request and send it to the server process. The server process then unmarshalls the request and invokes the desired method before repeating the process in reverse to get whatever the method returns back to the client. HTTP is sometimes used as the underlying protocol for message passing, but nothing about RPC is inherently bound to HTTP. Remote Method Invocation (RMI) is closely related to RPC, but it takes remote invocation a step further by making it object oriented and providing the capability to keep references to remote objects and invoke their methods.

Representational State Transfer (REST) is an architectural style that imposes a particular set of constraints on an interface to achieve a set of goals. REST enforces a client/server model, where the client is interested in gaining information and acting on a set of resources that are managed by the server. The server tells the client about resources by providing a representation of one or more resources at a time and providing actions that can either get a new representation of resources or manipulate resources. All communication between the client and the server must be stateless and cachable. Implementations of a REST architecture are said to be RESTful.

Hypertext Transfer Protocol (HTTP) is a RESTful protocol for exposing resources across distributed systems. In HTTP the server tells clients about resources by providing a representation about those resources in the body of an HTTP request. If the body is HTML, legal subsequent actions are given in A tags that let the client either get new representations via additional GET requests, or act on resources via POST/PUT or DELETE requests.

Given those definitions, there are a few important observations to make:

  • It doesn’t make sense to talk about RPC vs REST. In fact you can implement a RESTful service on top of any RPC implementation by creating methods that conform to the constraints of REST. You can even create an HTTP style REST implementation on top of an RPC implementation by creating methods for GET, POST, PUT, DELETE that take in some metadata that mirrors HTTP headers and return a string that mirrors the body of an HTTP request.
  • HTTP doesn’t map 1:1 to REST, it’s an implementation of REST. REST is a set of constraints, but it doesn’t include aspects of the HTTP specific implementation. For example your service could implement a RESTful interface that exposes methods other than the ones that HTTP exposes and still be RESTful.

What people are really asking when they ask whether they should use RPC or REST is: “Should I make my service RESTful by exposing my resources via vanilla HTTP, or should I build on a higher level abstraction like SOAP or XML-RPC to expose resources in a more customized way?”. To answer that question, let’s start by exploring some of the benefits of REST and HTTP. Note that there are separate arguments for why you would want to make a service RESTful and why you would want to use vanilla HTTP, although all the arguments for the former apply to the latter (but not vice versa).

The beauty of REST is that the set of legal actions from any state is always controlled by the server. The contract with the client is very minimal; in the case of HTTP’s implementation of REST the contract is a single top level URI that I can send a GET request to. From that point on the set of legal actions is given to the client by the server at run time. This is in direct contrast to typical RPC implementations where the set of legal actions are much more rigid; as defined by the procedures that are consumed by the client and explicitly called out at build time. To illustrate the point, consider the difference between finding a particular product by navigating to a webpage and following a set of links (for things like product category) that you’re given until you find what you’re looking for, versus calling a procedure to get a product by name. In the first case changes to the API can be made on the server only, but in the second case a coordinated deployment to the client and server is required. The obvious issue with this example is that computers aren’t naturally good at consuming API’s in a dynamic fashion, so it’s difficult (but possible in most cases) to get the benefit of server controlled actions if you’re building a service to be consumed by other services.

One main advantage of using vanilla HTTP to expose resources is that you can take advantage of a whole bunch of technologies that are built for HTTP without any additional effort. To consider one specific example, let’s look at caching. Because the inputs of an HTTP response are well defined (querystring, HTTP headers, POST data) I can stand up a Squid server in front of my service and it will work out of the box. In RPC even if I’m using HTTP for message passing under the hood, there isn’t any guarantee that messages map 1:1 with procedure calls, since a single call may pass multiple messages depending on implementation. Also, the inputs to RPC procedures may or may not be passed in a standard way that makes it possible to map requests to cached responses, and requests may or may not include the appropriate metadata for to know how to handle caching (like HTTP does). It’s nice to not have to reinvent the wheel, and caching is just one example of an area where HTTP gives us the ability to stand on the shoulders of others.

Another advantage to HTTP that I’ll just touch on briefly is that it exposes resources in a very generic way via GET, POST/PUT, and DELETE operations which have stood the test of time. In practice when you roll your own ways to expose resources you’re likely to expose things in a way that is too narrow for future use cases, which results in an explosion in the number of methods, interfaces, or services that end up becoming a part of your SOA. This point really warrants a blog post of it’s own, so I’ll avoid going deeper in this particular post.

There is a lot of additional depth that I could go into, but in the interest of keeping this post reasonably brief I’ll stop here. My hope is that at least some folks find the material here to be helpful when deciding how to best expose resources via a service! If I’ve missed anything or if you feel that any of the details are inaccurate, please feel free to comment and I’ll do my best to update the material accordingly.

A Strategy For The Dreaded Interview Coding Question

If you’re a software developer and you’re thinking about changing jobs, you’re probably at least a bit anxious (if not downright freaked out) about the prospect of facing a whiteboard armed with only a trusty dry erase marker and your wits while an interviewer fires a coding question at you. That’s not shocking because software development interviews are weird: the skills necessary to answer the technical and behavioral/situational questions that are asked don’t necessarily map 1:1 with the skills to be a good developer. We’re used to developing with access to tools like IDE’s and StackOverflow, without unnatural time constraints and the pressure of landing a job in the balance. I’ve interviewed literally hundreds of candidates in my roles as a manager both at Microsoft and Amazon, and I’ve seen hundreds bomb coding questions. That doesn’t shock me for the reasons previously mentioned, but what does shock me is the number of bright folks who fail on the questions simply because they don’t approach them with a solid strategy.

The anti-patterns are crystal clear and they almost always lead to a wipe, for example: diving straight in on code, assuming that language/syntax doesn’t matter, or failing to consider edge cases before implementation. To avoid these pitfalls, I recommend that every interviewing developer should practice the following strategy before going into interviews and put it into practice (without fail, no matter how simple the question seems) during the process.

Restate the problem and ask clarifying questions.

Repeating the problem in your own words and asking some follow up questions only takes a second, and it’s a good way to quickly tease out any bad assumptions that have been made. It also gives the interviewer confidence that you’re used to attacking real world coding tasks the right way: being sure that you’ve correctly interpreted requirements and thinking through questions that impact various potential approaches. Ask how important optimization is instead of just assuming that implementing the naive solution is bad. Ask what you should optimize for, for example quickest execution speed or smallest memory footprint.

Walk through a basic example in detail and consider a few edge cases.

Take the time to think through at least one straightforward case, as well as a few relevant edge cases. Talk through your thought process as you’re going through them, utilizing the whiteboard as much as you can. Consider null or zero length inputs. Consider very large inputs, and be prepared to answer questions about whether your implementation would fit in memory on specific hardware given specific inputs. The process of walking through these cases should get you very close to pseudocode.

Write up pseudocode.

Be sure that you’re not writing in a real programming language. Pick a spot on the board where you don’t have to erase your pseudocode when you start to write real code, and will be able to read it. Lots of interview questions require thinking about recursive versus iterative implementations, so it doesn’t hurt to always consider whether that question is in play if it is relevant to the problem. Don’t abandon the pseudocode to dive into real code until you have completed the problem. Be sure to continue the dialogue with the interviewer while you’re thinking, and show that you can listen and course correct given hints.

Pick a language, and ask how important syntax is.

Always assume that for actual implementation, the interviewer cares about the details. I’m generally not a stickler for small syntactical minutia, but I get annoyed I get when an interviewer just assumes that it’s alright for the final implementation to be in pseudocode or some hodge-podge of languages. If you decide to code in a language other than the one that you indicated that you’re the most comfortable with on your resume, be sure to explain why. Asking how much the interviewer cares about syntax can help you decide whether to take an extra pass at the end of the loop being sure that everything is spot on; if the interviewer doesn’t care they may see it as a waste of precious time.

Code it!

You’ve done all the hard work, getting from pseudocode to your language of choice should be fairly trivial.

It’s important to remember that a typical interview in a loop will run 45-60 minutes, and most interviewers are going to want to touch on more than a single coding question. The expectation is most likely that you can complete the question in 20-30 minutes, so be sure that you’re not spending a ton of time on each step. A lot of interviewers will tell you that they’re not necessarily looking for you to get to a fully working implementation, but don’t believe them. If you don’t get code up on the board or stall out they will definitely ding you. Don’t spend more than a few minutes restating the question and walking through edge cases. The bulk of your time should be spent in an even split between pseudocode and code.

The beauty of following this strategy is that you will come across as organized and informed even if you don’t understand the question. It also provides an opportunity to work with the interviewer through follow up questions while running through examples and pseudocoding. Remember, the interviewer knows the answer to the question and they probably want to get you hints as you move in the right direction, so engaging them and using them as a resource is critical. Hope that these ideas help, now go nail that interview loop!

The Perl Script That May Save Your Life

I had a major “oh !@#$” moment tonight. While playing around with Maven, M2Eclipse and moving some project folders around I hastily hammered out a “sudo rm -R” and realized seconds later that I had blown away some code that I wrote last night that wasn’t in version control. All deleted. Not cool.

Fortunately I stumbled on this simple yet life saving article + perl script that greps the nothing of sda* for a particular string that was in your file and prints the contents around it:

open(DEV, '/dev/sda1') or die "Can't open: $!\n";
while (read(DEV, $buf, 4096)) {
  print tell(DEV), "\n", $buf, "\n"
    if $buf =~ /textToSearchFor/;

A quick run and a few minutes later I had my code back in one beautiful piece again. Mad props to chipmunk!

How To Automagically Classify Webpages By Topic

Ever wondered how you can automate the process of figuring out whether a webpage is about a particular topic? I’ve spent some time recently on a side project that involved solving a flavor of that exact problem. I’ve seen several related questions on Stack Overflow and other sites, so I thought I would throw together a quick post to describe bits of my implementation.

For our example, let’s suppose that we need to implement something that exposes an API like the following:

  • boolean classifyWebpage(Webpage webpage)
  • void trainClassifier(Map < Webpage, boolean > trainingExamples)

We will mandate that consumers call the function to train the classifier once with a training set before we can call the function to evaluate whether a webpage is about our topic. Our train classifier function will take a bunch of webpages and whether or not they are about the given topic, to use as training examples. Our classify webpage method will take a webpage and it returns true if the webpage is about the topic and false if it isn’t. To achieve this, we’ll implement a few helper functions:

  • String cleanHtml(String html)
  • Map < String, int > breakTextIntoTokens(String text)
  • Map < String, float > getTokenValues(Map < String, int > tokenCounts)

Let’s look at how we can implement some of these pieces in detail.

Cleaning up HTML

The first piece of infrastructure that we’ll want to build is something that strips markup from an HTML string and splits it into tokens, because words like “href” and “li” are about formatting and aren’t part of the true document content. A naive but decently effective and low cost way to this is to use regular expressions to strip out everything in the contents between script and style tags, and then everything between < and >. We’ll also want to replace things like non-breaking space characters with literal spaces. Assuming that we’re working with fairly conventional webpage layouts, the blob of text that we’re left with will include body of the webpage plus some noise from things like navigation and ads. That’s good enough for our purposes, so we’ll return that and make a mental note that our classification algorithm needs to be good at ignoring some noise.

Break Text into Tokens

Once we have clean text, we’ll want to break it into tokens by splitting on spaces or punctuation and storing the results in a data structure with the number of occurrences of each token. This gives us a handy representation of the document for doing a particular kind of lexicographical analysis to bubble up the words that matter the most. Again, regular expressions are our friend.

Find the Keywords

Armed with a map of tokens and count of occurrences for each token, we want to build something that can pick the keywords for the document. Words like “the” and “to” don’t provide any clues about what a document is about, so we want to find a way to focus on keywords. The important words in a document are likely to be repeated, and they’re also not likely to be found often in most other documents about different topics. There’s a nifty algorithm called Term Frequency Inverse Document Frequency that is both easy to implement and does a great job find keywords by comparing the frequency of words in a single document with the frequency of words in a corpus of documents.

To make this work we’ll need to start by building a corpus. One option is to bootstrap by crawling a bunch of websites and running the entire set through the our initial function for cleaning and tokenizing. If we’re going to go this route we need to be sure that we’ve got a good mix of pages and not just ones about our topic, otherwise the corpus will be skewed and it will see things that should be keywords as less valuable. A better option in most cases is to use an existing corpus , assuming that one is available for the desired language, and manipulate it into whatever format we want to use for classification.

Classify a Webpage based on Keywords

The next bit is the secret sauce. We know that given any webpage we can extract keywords by doing some prep work and then comparing against a corpus, but given those keywords we need to decide whether a webpage is about a given topic. We need to pick an algorithm that will give us a boolean result that tells us whether a webpage is about our topic. Keep in mind that while we’re setting up our algorithm we have some training examples to work with where we’re given a class, in other words we know whether they are about the topic or not.

The first option that most people think of is to come up with a mathematical formula to tell whether a webpage matches a topic. We could start by boiling the problem down to how well two specific webpages match each other by coming up with a mathematical formula to compare two webpages based on similar keywords. For example we could compute a running similarity total, adding to it the product for the ranking values in each respective page for keywords that match. The result would be a scalar value, but we could convert it to a boolean value by coming up with some arbitrary threshold based on experimentation and saying that pages with similarity over our threshold are indeed about the same topic. In practice, this actually works decently well with some exceptions. With these building blocks we could figure out whether a given webpage is about a topic by finding how similar it is to webpages in our training set that are about that topic versus ones that aren’t, and making a decision based on which group has a higher percentage of matches. While it may be effective, but it has several flaws. First, like Instance Based Learning it requires comparison with the training set during classification which is slow at runtime because we have to consider many permutations. More significantly, we would have applied a human element to the algorithm by defining the threshold for a match, and humans aren’t very good at making these kind of determinations because they can’t process certain kinds of data as quickly as a computer can.

Using machine learning, we can enlist the help of a computer to apply the data to a particular learner that will output a classifier for the domain. Frameworks like Weka offer all kinds of learners that we can try use out of the box with our training data to create classifiers. For example Naive Bayes is an example of one such algorithm that tends to do a great job with text classification. If we use our words with weights as attributes and each website in the training set as an example to train on, a Naive Bayes learner will find probabilistic correlation between the occurrence of words and the topic of a webpage and will output a classifier that is likely to give more accurate results than any algorithm that a human could come up with in a reasonable amount of time.

Wiring it Up

So how do we wire these pieces together, and what does it look like to consume the finished product? Let’s suppose that we want to be able to tell whether a website is about soccer. We start by creating a whitelist of websites that we know produce editorial content about soccer. We’ll also want to create a blacklist of sites that produce content about world news, technology, rock and roll, ponies, and anything that isn’t soccer. We throw together a function that crawls the sites and for each example we infer a class based on the source (we may be wrong in some edge cases, but in general we’re going to assume that the sites in our whitelist/blacklist are producing content that is or isn’t soccer across the board). We run the webpages through our cleaning, tokenizing, and ranking functions and we end up with training examples that look like the following contrived ones:

  • – True. Manchester (.85), Rooney (.75), United (.64), match (.5).
  • – False. Muse (.9), Bellamy (.72), guitar (.72), cool (.48), show (.43).

Getting a Weka to speak our language may require massaging the examples into ARFF or some format that the framework understands, but at this point we can directly apply the training set to the learner. For subsequent webpages we run them through the same functions to get ranked keywords, and then we pass the new example into the classifier and we’re given a boolean result. Magic.

Simple Optimization

Note that we only used the words in the body of a webpage, but in the real world we would have access to more data. We could potentially look at the hints provided in the page title, meta tags, and other tags like heading/bold or bigger font sizes and make some assumptions about the importance of the words (of course we have to do this before stripping tags out). If we could get our hands on link text for sites that link to the article that could also serve as valuable input, although search engines aren’t making it as easy to access link data from their indexes these days. We could use this additional data to either augment our Naive Bayes learner using arbitrary weights, or we can use more complex learners like a Perceptron or a Support Vector Machine to try to let the computer decide how important we should consider these other inputs to be. It’s certainly possible that for some topics other kinds of learners may produce better results. Or we could investigate ways to use learners in combination (via Bagging or Boosting, for example) to get better accuracy than any single learner.


Classifying webpages by topic is a fairly common example of a problem that can be solved by an algorithm created by either a human or a computer. My aim in this post was to provide a quick look at one way to attack the problem and to touch on some very introductory machine learning concepts. If you’re interested in reading more about machine learning specifically there are countless resources online and some great books available on the subject. Hope you found the post helpful. Classify away, and I’ll certainly look forward hearing back from any readers who have tackled similar challenges or want to provide feedback!

Dependency Injection (& Small Furry Animals)

Dependency Injection (DI) is a game changing design pattern that most programmers should have in their toolbox. DI was initially known as Inversion of Control (IoC), but because frameworks inherently invert some sort of control Martin Fowler proposed the naming switch to DI in his canonical post on the subject to better describe which aspect of control is being inverted. There are plenty of tutorials on DI that are extremely helpful when ramping up on the topic, although some of them are either tightly coupled to a particular language or framework while others can be a bit lengthy. For that reason I thought I would take the time to write a quick post that explains the concept of DI at a high level, in a (somewhat) concise fashion. I’ll try to explain things in a way that is language agnostic, but I will use Java for my initial example. After that I’ll touch on ways to use DI in a few common languages, and then I’ll close by listing a few reasons why DI is cool.

To illustrate what DI is all about, let’s suppose that I’m writing a Java application that makes small furry animals sing and dance on the screen (because everyone loves a minstrel marmot). I begin by creating a SmallFurryAnimal Interface like so:

Interface SmallFurryAnimal {
  void Sing();
  void Dance();

I then create a couple of animal Classes called Marmot and Squirrel that implement SmallFurryAnimal with their own animal specific implementation of Sing and Dance. I tie everything together in a main method that looks like this:

Class Application {
  public static void main(String [ ] args) {
    while (true) {

When I run javac to compile my application to bytecode I need to be sure to pass the source files for Marmot and Squirrel to compile Application because they are hardcoded compile time dependencies. When I run the application using java I also need to be sure that the ClassLoader is able to load Squirrel and Marmot. This is all fine and dandy until I realize that I want to give people who use my application the ability to design, implement, and plugin their own arbitrary small furry animals without having to worry about recompiling or even knowing about the rest of my application. Essentially I want to do the following:

Class Application {
  public static void main(String [ ] args) {
    while (true) {
      for (all the aniamls that I inject at runtime) {

With these changes I’m only bound to the interface for SmallFurryAnimal, not any particular implementation. I’m free to code up animals to my heart’s content, and I can use DI to inject them into my application at runtime.

It turns out that this kind of ability to inject dependencies into an application at runtime is very common for all sorts of applications. One very common example is applications that run workflows and allow users to inject custom workflow tasks. Another similar example is a state machine application that allows users to inject custom states behaviors that relate to states. In fact almost any time you implement an interface or derive from a base class may be a candidate for injecting the dependency on the implementation at runtime unless the number of possible or necessary implementations is of a relatively small fixed size that isn’t likely to change.

It’s possible to use DI in Java without taking advantage of any additional frameworks by using Reflection to crack open Class Files and look for Classes that implement a specific interface. The most common way to use DI in Java is Spring, a popular Java Framework targeted at a wide array of Java deployment use cases. One of the things that Spring provides is an IoC container which provides the ability to load Java objects using Reflection (typically by specifying which objects to load in configuration) and handles managing the lifecycle of those objects.

C# is obviously very similar to Java, and it allows you to implement DI using Reflection. .NET has also introduced the Managed Extensibility Framework which allows usage of the DI pattern without configuration by using class attributes to tell the Composition Container what to load at runtime.

If you’re using dynamic programming languages like Perl or Ruby then DI is essentially baked in, you’re probably already using it whether or not you realize it.

There are a bunch of reasons why DI is super important, and why many programmers who first stumble on the pattern feel like it’s a game changer. It improves testability by providing the ability to inject mock objects during test execution. It improves reusability of code by allowing developers to build components once and dynamically injected them into multiple applications, and by allowing application consumers to inject objects that are specific to their requirements. It makes code more readable by making dependencies explicit in configuration or metadata. It makes applications easier to deploy or upgrade by guaranteeing loose coupling with dependencies so that either the application or any injected object can be deployed in isolation.

It’s not the only way to accomplish these objectives, but it’s certainly a great way to do so. Hopefully this quick overview has served to help introduce you to the DI pattern, and provided some useful links in case you want to dig deeper.

The "Base" Suffix

Came across some code today that seemed a bit backwards (and made me chuckle), it was effectively the following:

public class FooBase extends Foo { }

Bass ackward nomenclature aside, the code brings a separate point to mind on use of the Base suffix. Krzysztof Cwalina aptly points out that the Base type can lead to some ugly code in common cases where you want to return the base type. This also implicitly suggests that it shouldn’t be used in public facing interfaces. I would add the fact that words with the Base suffix aren’t usually descriptive of real objects (Dog extends Mammal extends… MammalBase?) and for that reason I often see examples where code in classes suffixed with the word Base would be a more natural fit either in the child class itself, in a separate utility class, or in a better thought out base class that represents a real object.

The Religion of Coding Style

I’ve been doing some investigation of different “Coding Style Enforcement” tools (primarily for C# code) recently on behalf of a Bing v-team on coding conventions and I’ve learned a few things:

1. ReSharper kicks ass.

2. Developers are willing to shed blood over coding style preferences.

3. ReSharper really kicks ass.

First a quick introduction on a few tools in the space:

ReSharper (or R#) is a Visual Studio add-on that does on the fly code analysis, and is also great for refactoring code. It gives you nice little squiggly lines that represent errors, warnings, suggestions, or hints for your code as you type. The thing that makes R# ‘leet is that it can also often fix issues for you.

StyleCop is a tool that was developed by Microsoft and is available under the MPL. It has a good stock set of rules that many Microsoft teams tend to follow (for the most part). Current versions now ship with a ReSharper add-on that really rocks, because R# can now automatically fix a lot of StyleCop issues and give R# style real time warnings as you type. The only downside is that as of right now StyleCop 4.6 hasn’t yet released (should come very soon), so there is no intergration with R# 6.0.

FxCop is a different breed of tool so I won’t go into great detail, but people often wonder what the difference is so I’ll touch on it briefly. FxCop analyzes built assemblies (not code) for design, localization, performance, or security issues. VS has built in FxCop integration that’s controllable via the Code Analysis tab at the project level.

CodeRush is a competitor to R#. Agent Smith is a competitor to StyleCop, only from what I can tell it is only available as a R# plugin. I don’t know much about either except that I’ve met people who use and think they’re cool.

That’s certainly not any kind of exhaustive list, but those are a few technologies that I’ve either played with or seen mentioned in the managed code style enforcement space. Some props go out to Scott Marlowe, I found his blog post to be a pretty good starting point for investigation.

Tools aside, the question that I hear posed on a fairly regular basis is “How much does adherence to coding conventions really matter?” My answer to this question has changed a bit throughout my development career. For the last few years I haven’t been huge on conventions, I’ve generally encouraged developers who work for me that the guiding principle should be to match the style of the project that you’re working in. I still believe this is probably the most important idea when contributing to a project, but I also find myself more compelled to accept coding standards for projects where more than a few developers are partying. There’s something beautiful about looking at code and not being able to tell who wrote it because the style is consistent across the board. It makes it easy to jump in on new projects because they’re always readable to me, not just the dude who wrote the code. It also avoids the kind of “should I use var or not” battles that tend to become religious crusades and can actually even fracture a team in extreme cases.

Speaking of var, I’ve become a huge believer lately in part because I started using QuickGraph which can result in some crazy nested generic types, but also because the default R# rules encourage it. I actually find myself agreeing with Michael Brennan’s arguments in favor of always using var. But then again, I’m not quite ready to shed anyone’s blood over it.