A few months ago, I came across a job posting with a code challenge to solve and submit with your resume.
I'm not in the job market but I generally like these sort of challenges and since the job is no longer listed, I will post what I came up with.
The Rules:
Given a dictionary, output all word pairs where both words
share all their letters except the last two, which are
distinct and reversed.
Notes:
- use a reasonable dictionary of your choice
- potential word pairs are: "ear, era" ; "revies, revise" ; "burglaries, burglarise"
- "shear, era" is not a valid pair because "she" != "e"
- "bell, bell" is not a valid pair because the last two letters are not distinct
- this will be benchmarked
Here is the final result I came up with, I don't have anything to compare it with, but I believe it's correct. (filter.pl)
# filter.pl
#to run:
#cat /usr/share/dict/words | perl filter.pl
use strict;
use warnings;
my $save_possible = {};
while (<>) {
chomp $_;
# lower case
$_ = lc $_;
# skip words smaller then 3 chars
next unless length $_ >= 3;
my $word_minus_2 = substr($_, 0, -2);
my $last_2_chars_from_word = substr($_, length($_)-2, length($_));
$save_possible->{$_} = $_;
my $tmp = $_;
# the last two chars of word, reversed
my $last_2_chars_from_word_rev = chop($tmp);
$last_2_chars_from_word_rev .= chop($tmp);
if ($save_possible->{$word_minus_2.$last_2_chars_from_word_rev} #exists
and ($_ ne $word_minus_2.$last_2_chars_from_word_rev)){#skip same word
my $w2_rev=$save_possible->{$word_minus_2.$last_2_chars_from_word_rev};
print "$_ -> $w2_rev\n";
$save_possible->{$_} = undef; # skip dups
}
}
real 0m0.992s
user 0m0.897s
sys 0m0.047s
Results verified with:
$ wc -l /usr/share/dict/words
234936
$ time cat /usr/share/dict/words | perl filter.pl | sed -e 's/ ->.*//g' > OUT
$ wc -l OUT
565 OUT
$ grep -ciwf OUT /usr/share/dict/words
565
=~ .2% of the words in this dict fit our criteria
Ok, now that I've posted that I will illustrate how the wrong assumptions can lead you down the wrong path, here was my original approach:
My assumption was that the cases would be consecutive.
i.e. (would exist lexicographically, in succession)
aab
aba
abel
able
ala
aal
...
So, my (as I thought, 'slick') idea was to use the Perl's native sort function, providing my own sorting routine, that should be very fast and I don't have to do much, great!
Here is my first draft in Perl: (yes, it's wrong - and ugly!)
#!/usr/bin/perl -w
#on my mac os x system one run's this as:
#$ cat /usr/share/dict/words | perl filter.pl
use strict;
use warnings;
my %possible = ();
sub _same {
chomp $a;
chomp $b;
return -1 unless length $a > 1 && length $b > 2;
my $last_two_a = substr($a, length($a)-2, length($a));
my $last_two_b_r = substr($b, length($b)-1, length($b));
$last_two_b_r .= substr($b, length($b)-2, 1);
if (((substr($a, 0, length($a)-2)
cmp
substr($b, 0, length($b)-2)) == 0)
&&
(($last_two_a cmp $last_two_b_r) == 0)){
$possible{lc $a} = lc $b;
return 0;
}
# we dont care about the returned sort, it's a byproduct
return 1;
}
my @len_sort = sort _same <>;
foreach my $w (keys %possible){
print "$w => $possible{$w}\n";
}
real 0m0.715s
user 0m0.660s
sys 0m0.053s
Great, results! and great performance, Well, that settles it, it must be working (haha).
One night a few weeks later, this popped into my head again (especially, since I had a feeling my approach was not accurate), so I revisited it.
Well, of course my 'ass'-umption was incredibly WRONG!
This approach does find some cases, I never bothered to really verify it, the quick eyeball said, 'geez, I would think there would be more, but ok,...'. (Ok, this is what I get for working past midnight all week)
After other attempts of trying to utilize Perl's native sort function. (I'll save you the from the example code, since at this point it's messy and complex, which alone is a red flag, since this should be simple!)
Basically by this point, it became clear that the sort method is not providing any benefit, is the wrong approach and is helping me to obfuscating the code in this specific situation.
That is when I came up with the approach I posted at the beginning. (I avoided doing this the first time because I thought, "I just need to compare consecutive words and how would I save the previous word, to compare with the next via an input stream" (i.e. without using global storage, saving all of them; and since I assumed they were consecutive, I would just provide a method to sort, which provides much for free. lazy! in a bad way). Had I realised, of course, that they are not consecutive I would have abandoned the sort method right off the bat.