Recently, I’ve done a bunch of work to clean up the cryptic crossword clue solver which I first described in an earlier post. I have simplified the code layout and added more flexibility to the solver’s interface, so it should be easier to experiment and play around with. What follows are a few examples of what the new solver can do. You can also view these examples as a rendered IPython notebook on nbviewer.org.
from pycryptics.grammar.clue_parse import generate_clues
from pycryptics.solve_clue import Constraints
from pycryptics.grammar.clue_tree import ClueUnsolvableError
Let's demonstrate what we can do with a simple cryptic clue. Here's our clue, broken up into a list of words:
phrases = 'spin broken shingle'.split()
We also have constraints on the final answer, namely that it will have 7 letters and the answer won't be any of the words which directly appear in the clue.
con = Constraints(phrases=phrases,
lengths=(7,),
pattern='',
known_answer=None)
Now we can generate all possible clue interpretations of those phrases:
clues = generate_clues(con)
clues.sort() # sort the clues list so that we can reliably find
# specific clues for this demo. You won't need to do this.
For example, here's one interpretation:
print clues[0]
And another:
print clues[10]
Here are all the possible ways 'spin broken shingle' could be interpreted:
for c in clues:
print c
Let's try solving one of these clues:
try:
print clues[0].answers
except ClueUnsolvableError as e:
print e
That clue had no answers which matched our constraints. This raises an exception so that when we're solving a clue, any of its subparts being unsolvable will trigger the exception and let us skip out of solving the other subparts.
Let's look at the subparts of a clue. Each clue is a tree, and we can access the children of any node in the tree using the [] notation.
print clues[-1]
print clues[-1][0]
print clues[-1][0][0]
print clues[-1][0][0].answers
The .answers
property returns the clue's possible answers or raises a ClueUnsolvableException
if there are no answers. The first time a clue's .answers
is requested, it automatically calls ClueTree.solve() and caches that answer so that subsequent requests for .answers
will be instananeous.
The format of a clue's answers is a dictionary in which each key is a possible answer and the corresponding value is the answers to the children of that clue which yielded the given answer. For example, the first answer given above is 'emat', with corresponding child answers of '' and 'tame'. The ordering of these sub-answers corresponds to the ordering in the (rev (rev_ spin) (rev_arg (syn broken)))
, so the rev_
reversal marker produced an empty string ('') and the syn
synonym operator acting on 'broken' produced 'tame'. Reversing 'tame' gave the answer 'emit'.
We can also ask any clue how it got a particular answer:
print clues[-1][0][0].derivation('emat')
If it's not clear what this means, we can also ask for a human-readible derivation:
print clues[-1][0][0].long_derivation('emat')
We can ask for the derivation or long derivation of any of this sub-clue's answers:
print clues[-1][0][0].long_derivation('tsrub')
Now, let's find a clue that actually produces a final answer. How about this one:
print clues[9]
print "\nAnswers:", clues[9].answers
print clues[9].long_derivation('english')
In general, it would seem like generating all of these different clue interpretations and then solving them separately would be terribly inefficient. However, the way our parser is constructed ensures that identical sub-clues actually refer to the same python object, so solving one sub-clue solves all of its instances contained in other clues. We can check this easily:
print clues[0]
print clues[1]
print clues[0][1]
print clues[1][1]
clues[0][1] is clues[1][1] # 'is' tests whether two variables refer to the same python object
Now let's try actually searching for the best answer. For this, we'll need the full CrypticClueSolver
from pycryptics.solve_clue import CrypticClueSolver
solver = CrypticClueSolver()
annotated_answers = solver.solve_constraints(con)
for a in annotated_answers:
print a
The annotated_answers
contains a list of possible solutions, including the actual answer, its confidence score (between 0 and 1), and its short derivation, and it's automatically sorted to put the best answers first. We can also check the long derivation of any answer:
print annotated_answers[0].long_derivation()
Now let's try a slightly harder clue:
phrases = 'couch unfinished until now'.split()
con = Constraints(phrases=phrases,
lengths=(4,),
pattern='',
known_answer=None)
First, let's just try solving this the same way as before:
ann = solver.solve_constraints(con)
print ann[0].long_derivation()
This is not a very good answer, since it only matches with 56% confidence. In fact, the correct answer is SOFA, which we get by taking a synonym of "until now" to get "SO FAR" and then removing the last letter to get SOFA. So what went wrong? Well, the solver treated "until" and "now" as separate phrases, so it looked for synonyms of both of them, but not for the phrase "until now". To make sure we get the right combinations of phrases, the solver can actually search over all possible combinations of phrases to find the best answer:
ann = solver.solve_all_phrasings(con)
print ann[0].long_derivation()
The lists which are printed above show all the combinations of phrases that the solver tried. We can see that it got the correct answer by combining "until now" into a single phrase, which we've shown by joining those words with an underscore.