This version has been updated since the first posting. It adds a hash table so that known SSNs can be recognised with a slightly higher degree of confidence, but *without* embedding the SSNs themselves in the program, so it should not be possible to reverse-engineer the SSN list from the binary. The job of telling a successful hit from a false positive has also been moved out of the code and is now done by an external program. Perhaps someone with some statistics experience can write a better function for that.
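
To illustrate the hash-table idea described above, here is a minimal sketch of matching against known SSNs while storing only digests. It is not the code from the program itself. The hash function (SHA-256), the truncation length, the function names and the sample numbers are all assumptions made for the example.

```python
import hashlib

# Assumption: keep only the first few bytes of each digest, so the table
# holds short fingerprints rather than the SSNs themselves.
DIGEST_BYTES = 4


def normalise(ssn: str) -> str:
    """Strip dashes, spaces and any other non-digit characters."""
    return "".join(ch for ch in ssn if ch.isdigit())


def fingerprint(ssn: str) -> bytes:
    """Return a truncated SHA-256 digest of the normalised SSN."""
    digest = hashlib.sha256(normalise(ssn).encode("ascii")).digest()
    return digest[:DIGEST_BYTES]


def build_table(known_ssns):
    """Build the lookup table; only fingerprints are stored."""
    return {fingerprint(s) for s in known_ssns}


def is_probable_known(candidate: str, table) -> bool:
    """True if the candidate's fingerprint is in the table.

    Truncated digests can collide, so a match raises confidence
    but does not prove the candidate is on the original list.
    """
    return fingerprint(candidate) in table


# Made-up placeholder numbers, not real SSNs.
table = build_table(["000-12-3456", "000-98-7654"])
print(is_probable_known("000 12 3456", table))
```

In a compiled program the table would be generated offline and only the fingerprints shipped, which is the point of the approach: the binary can answer "have I seen this one?" without carrying the list in readable form.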