Re: Finding matching words in a word game - Mailing list pgsql-general

From Chris Angelico
Subject Re: Finding matching words in a word game
Date
Msg-id CAPTjJmoQ9sdw81Hfb7eGd+Uvxx66N-62J8BnPdjM8z7upmtSpA@mail.gmail.com
Whole thread Raw
In response to Finding matching words in a word game  (Alexander Farber <alexander.farber@gmail.com>)
Responses Re: Finding matching words in a word game
List pgsql-general
On Tue, Mar 5, 2013 at 8:29 PM, Alexander Farber
<alexander.farber@gmail.com> wrote:
> is there maybe a clever way of finding all possible words
> from a given set of letters by means of PostgreSQL
> (i.e. inside the database vs. scanning all database
> rows by a PHP script, which would take too long) -
> if the dictionary is kept in a simple table like:
>
> create table good_words (
>         word varchar(16) primary key,
>         stamp timestamp default current_timestamp
> );

How many words are you looking at, in your dictionary? I wrote an
anagramming program in C++ that works off a fairly small dictionary of
common words (~60K words) and gives adequate performance without any
indexing - the time is dwarfed by just scrolling the text up the
console. The only thing I'd recommend doing differently is the
language - use one that has a proper hash/tree type, saving you the
trouble of implementing one (I implemented my own non-balancing binary
tree for the task... no wait, on examination, it seems to actually be
a linear search - and yet it has passable performance). PHP can quite
probably do everything you want here; otherwise, I'd recommend
something like Python or Pike.

Simple Python example:

words = {}
for word in dictionary:   # provide a dictionary somehow - maybe from a file/db
    words.setdefault(''.join(sorted(word)),[]).append(word)
# Voila! You now have your mapping. One-off initialization complete.

find_anagrams_of = "stop"
anagrams = words.get(''.join(sorted(find_anagrams_of)),[])
# anagrams is now a list of all known anagrams of the target -
possibly an empty list
print(anagrams)


['opts', 'post', 'pots', 'spot', 'stop', 'tops']

On my laptop, loading ~100K words took about 1 second, and the lookup
took effectively no time. I don't think there's any need for a heavy
database engine here, unless you're working with millions and millions
of words :)

Chris Angelico


pgsql-general by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Change owner for all tables in a database in one batch
Next
From: Alexander Farber
Date:
Subject: Re: Finding matching words in a word game