Thread: Finding matching words in a word game

Finding matching words in a word game

From
Alexander Farber
Date:
Hello,

is there maybe a clever way of finding all possible words
from a given set of letters by means of PostgreSQL
(i.e. inside the database vs. scanning all database
rows by a PHP script, which would take too long) -
if the dictionary is kept in a simple table like:

create table good_words (
        word varchar(16) primary key,
        stamp timestamp default current_timestamp
);

I could add a column above, where same letters as in "word"
would be sorted alphabetically... but then I don't know.

I've described my question in more detail at
http://stackoverflow.com/questions/15220072/postgresql-and-word-games

Thank you for any suggestions
Alex


Re: Finding matching words in a word game

From
Alexander Farber
Date:
Or I could add integer columns 'a', 'b', ... ,'z' to the table

On Tue, Mar 5, 2013 at 10:29 AM, Alexander Farber
<alexander.farber@gmail.com> wrote:
>
> create table good_words (
>         word varchar(16) primary key,
>         stamp timestamp default current_timestamp
> );
>
> http://stackoverflow.com/questions/15220072/postgresql-and-word-games


Re: Finding matching words in a word game

From
Alexander Farber
Date:
I've come up with the following INSERT trigger,
if you have any improvement suggestions,
please let me know (and also I wonder
what to do with non-english language here,
where I can't name columns "a", "b", etc.) -

On Tue, Mar 5, 2013 at 10:59 AM, Alexander Farber
<alexander.farber@gmail.com> wrote:
>> http://stackoverflow.com/questions/15220072/postgresql-and-word-games

create table good_words (
        word varchar(16) primary key,
        a integer not null default 0,
        b integer not null default 0,
        c integer not null default 0,
        d integer not null default 0,
        e integer not null default 0,
        /* ...skipped 20 letters... */
        z integer not null default 0
);

CREATE or REPLACE FUNCTION count_letters() RETURNS trigger AS $BODY$
    BEGIN
        SELECT into NEW.a LENGTH(NEW.word) - LENGTH(REPLACE(NEW.word, 'a', ''));
        SELECT into NEW.b LENGTH(NEW.word) - LENGTH(REPLACE(NEW.word, 'b', ''));
        SELECT into NEW.c LENGTH(NEW.word) - LENGTH(REPLACE(NEW.word, 'c', ''));
        SELECT into NEW.d LENGTH(NEW.word) - LENGTH(REPLACE(NEW.word, 'd', ''));
        /* ...skipped 20 letters... */
        SELECT into NEW.z LENGTH(NEW.word) - LENGTH(REPLACE(NEW.word, 'z', ''));
        RETURN NEW;
    END;
$BODY$ LANGUAGE plpgsql;

CREATE TRIGGER count_letters BEFORE INSERT OR UPDATE ON good_words
    FOR EACH ROW EXECUTE PROCEDURE count_letters();


Re: Finding matching words in a word game

From
Jeff Janes
Date:
On Tue, Mar 5, 2013 at 1:29 AM, Alexander Farber <alexander.farber@gmail.com> wrote:
Hello,

is there maybe a clever way of finding all possible words
from a given set of letters by means of PostgreSQL
(i.e. inside the database vs. scanning all database
rows by a PHP script, which would take too long) -
if the dictionary is kept in a simple table like:

create table good_words (
        word varchar(16) primary key,
        stamp timestamp default current_timestamp
);

I could add a column above, where same letters as in "word"
would be sorted alphabetically... but then I don't know.


Yes, that is how I'd do it.  Then you'd just build an index on the sorted column and select on the sorted letters.

You could do the canonicalization (sorting of the letters) in PHP for every query (and insert) and pass in that value, and so have the database not know what the meaning of the new column is. 

select word from good_words were sorted_letters=?

Or you could create a function in the database that does the sort, and then your PHP would not have to pass in already sorted letters.

select word from good_words were sorted_letters=sort_function(?),

and also have triggers automatically compute and store the "sorted_letters" from "word" upon insert or update.

Cheers,

Jeff

Re: Finding matching words in a word game

From
Misa Simic
Date:
Hi,

I think you can make another table:

Word, letter, count (word, letter - pk)

In good_words add column sorted_letters.

Now we can make a view based on that two tables:

Word, letter, count, sorted_letters


Now we need two immutable functions:

1. For given word returns sorted_letters "word"
2. For given word returns set of our_view

Trigger on insert good_words will set sorted_letters and insert rows in word_letter_count table.... Using above functions...

And now we have the letters: "ogdssoedillrthyhtmkjilsdaio"

We can now say

Select distinct our_view.word from second_function(the_letters) f

Join our_view using(letter)

Where f.sorted_letters like our_view.sorted_letters || '%' and our_view.count <= f.count

Now to improve performance i think would be good to put index on (letter, count) and maybe second part in where move to join part... But it would depend on explain analyze...


Kind regards,
Misa

On Tuesday, March 5, 2013, Alexander Farber wrote:
I've come up with the following INSERT trigger,
if you have any improvement suggestions,
please let me know (and also I wonder
what to do with non-english language here,
where I can't name columns "a", "b", etc.) -

On Tue, Mar 5, 2013 at 10:59 AM, Alexander Farber
<alexander.farber@gmail.com> wrote:
>> http://stackoverflow.com/questions/15220072/postgresql-and-word-games

create table good_words (
        word varchar(16) primary key,
        a integer not null default 0,
        b integer not null default 0,
        c integer not null default 0,
        d integer not null default 0,
        e integer not null default 0,
        /* ...skipped 20 letters... */
        z integer not null default 0
);

CREATE or REPLACE FUNCTION count_letters() RETURNS trigger AS $BODY$
    BEGIN
        SELECT into NEW.a LENGTH(NEW.word) - LENGTH(REPLACE(NEW.word, 'a', ''));
        SELECT into NEW.b LENGTH(NEW.word) - LENGTH(REPLACE(NEW.word, 'b', ''));
        SELECT into NEW.c LENGTH(NEW.word) - LENGTH(REPLACE(NEW.word, 'c', ''));
        SELECT into NEW.d LENGTH(NEW.word) - LENGTH(REPLACE(NEW.word, 'd', ''));
        /* ...skipped 20 letters... */
        SELECT into NEW.z LENGTH(NEW.word) - LENGTH(REPLACE(NEW.word, 'z', ''));
        RETURN NEW;
    END;
$BODY$ LANGUAGE plpgsql;

CREATE TRIGGER count_letters BEFORE INSERT OR UPDATE ON good_words
    FOR EACH ROW EXECUTE PROCEDURE count_letters();


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: Finding matching words in a word game

From
Chris Angelico
Date:
On Tue, Mar 5, 2013 at 8:29 PM, Alexander Farber
<alexander.farber@gmail.com> wrote:
> is there maybe a clever way of finding all possible words
> from a given set of letters by means of PostgreSQL
> (i.e. inside the database vs. scanning all database
> rows by a PHP script, which would take too long) -
> if the dictionary is kept in a simple table like:
>
> create table good_words (
>         word varchar(16) primary key,
>         stamp timestamp default current_timestamp
> );

How many words are you looking at, in your dictionary? I wrote an
anagramming program in C++ that works off a fairly small dictionary of
common words (~60K words) and gives adequate performance without any
indexing - the time is dwarfed by just scrolling the text up the
console. The only thing I'd recommend doing differently is the
language - use one that has a proper hash/tree type, saving you the
trouble of implementing one (I implemented my own non-balancing binary
tree for the task... no wait, on examination, it seems to actually be
a linear search - and yet it has passable performance). PHP can quite
probably do everything you want here; otherwise, I'd recommend
something like Python or Pike.

Simple Python example:

words = {}
for word in dictionary:   # provide a dictionary somehow - maybe from a file/db
    words.setdefault(''.join(sorted(word)),[]).append(word)
# Voila! You now have your mapping. One-off initialization complete.

find_anagrams_of = "stop"
anagrams = words.get(''.join(sorted(find_anagrams_of)),[])
# anagrams is now a list of all known anagrams of the target -
possibly an empty list
print(anagrams)


['opts', 'post', 'pots', 'spot', 'stop', 'tops']

On my laptop, loading ~100K words took about 1 second, and the lookup
took effectively no time. I don't think there's any need for a heavy
database engine here, unless you're working with millions and millions
of words :)

Chris Angelico


Re: Finding matching words in a word game

From
Alexander Farber
Date:
Thanks, will try that (the "dumb" approach) too :-)

Still working on my dictionary (will be auto-generated by a script).


On Wed, Mar 6, 2013 at 2:57 PM, Chris Angelico <rosuav@gmail.com> wrote:
> words = {}
> for word in dictionary:   # provide a dictionary somehow - maybe from a file/db
>     words.setdefault(''.join(sorted(word)),[]).append(word)
> # Voila! You now have your mapping. One-off initialization complete.
>
> find_anagrams_of = "stop"
> anagrams = words.get(''.join(sorted(find_anagrams_of)),[])
> # anagrams is now a list of all known anagrams of the target -
> possibly an empty list
> print(anagrams)
>
>
> ['opts', 'post', 'pots', 'spot', 'stop', 'tops']
>
> On my laptop, loading ~100K words took about 1 second, and the lookup
> took effectively no time. I don't think there's any need for a heavy
> database engine here, unless you're working with millions and millions
> of words :)
>
> Chris Angelico
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general