Kata 19 and Optimisation

by Charles Miller on October 30, 2003

Over the weekend, I did PragDave's Kata 19: Word Chains. I like the kata. After spending most of my day dealing with framework this, architecture that and library the other, it's nice to dive down and deal with something small and concrete.

There’s a type of puzzle where the challenge is to build a chain of words, starting with one particular word and ending with another. Successive entries in the chain must all be real words, and each can differ from the previous word by just one letter... The objective of this kata is to write a program that accepts start and end words and, using words from the dictionary, builds a word chain between them. For added programming fun, return the shortest word chain that solves each puzzle.

I highly recommend you pull out your language-of-choice and play around with this problem yourself. The idea of the exercise is to do, after all, rather than to read how someone else did it.

I did the kata in Ruby, because that seems to be my spare-time language of choice these days. I finally ended up with this little library. You can use it like this:

load "kata19-cm.rb"
dict = Dictionary.load(4)
from, to = dict.find_word("ruby"), dict.find_word("gold")
puts dict.find_path(from, to).join(" -> ")

And it will print out this:

ruby -> rubs -> cubs -> cues -> cued -> coed -> cold -> gold

If you're feeling more complicated, you could do this:

load "kata19-cm.rb"
dict = Dictionary.load(4)
dict.calculate_all_paths(dict.find_word("ruby"))
longest_word = nil
touched = 0
dict.each_word do |word|
    longest_word = word if 
       !longest_word or 
       longest_word.fwd_path.length < word.fwd_path.length
   touched += 1 if word.fwd_path.length != 0
end
puts "longest 'shortest path' from ruby is %d words, e.g. %s" %
    [longest_word.fwd_path.length, longest_word.fwd_path.join(" -> ")]
puts "there are %d words reachable from ruby" % (touched - 1)

Which prints out1:

longest 'shortest path' from ruby is 13 words, e.g. 
  ruby -> rubs -> dubs -> dues -> dyes -> ayes -> aces -> 
  ices -> ires -> Ares -> Ames -> Amos -> Abos
there are 1915 words reachable from ruby

The search is pretty simple: the trick is to stop thinking of them as words, and just deal with it as a shortest-path problem for a graph. The word to word search is breadth-first, starting at both ends and working towards the middle until they meet (or run out of places to go). For the calculation of every path from a single word, it's an exhaustive breadth-first search.

The problem is that in this form, it takes bloody ages. The graph-traversal part happens in 0.01s, but it takes several seconds to build the graph from the word-list. There are several ways this could be done faster (see here for some ruminations on that subject in Python by Ian Bicking), but in the end, I decided not to bother with any of them. From reading Ian's notes this turned out to be a good decision: it looks like it took a long time to optimize, and the attempt topped out at 3.5s.

Instead, I just pre-compiled the dictionaries for the word-lengths I needed.

dict = Dictionary.load(4)
File.open("dictionary-4.dump", "w+") do |f|
    Marshal.dump(dict, f)
end

It takes about 0.35s to un-marshall the dictionary from disk, then you're ready to rumble.

dict = nil
File.open("dictionary-4.dump") do |f|
    dict = Marshal.load(f)
end

I think the moral of the story here is that optimization is sometimes the judicious application of brute force. Identify those things you only ever have to do once, do them once, and don't bother optimising them further. The time you spend running the un-optimised code once is probably far less than the time you'd spend trying to get it to run faster.

1 The dictionary is deliberately case-sensitive: 'a' and 'A' are considered to be different letters.

Previously: struts-menu Permissions

Next: Why the "Hacker Logo" is stupid