Thread: Finding matching words in a word game
Hello, is there maybe a clever way of finding all possible words from a given set of letters by means of PostgreSQL (i.e. inside the database vs. scanning all database rows by a PHP script, which would take too long) - if the dictionary is kept in a simple table like: create table good_words ( word varchar(16) primary key, stamp timestamp default current_timestamp ); I could add a column above, where same letters as in "word" would be sorted alphabetically... but then I don't know. I've described my question in more detail at http://stackoverflow.com/questions/15220072/postgresql-and-word-games Thank you for any suggestions Alex
Or I could add integer columns 'a', 'b', ... ,'z' to the table On Tue, Mar 5, 2013 at 10:29 AM, Alexander Farber <alexander.farber@gmail.com> wrote: > > create table good_words ( > word varchar(16) primary key, > stamp timestamp default current_timestamp > ); > > http://stackoverflow.com/questions/15220072/postgresql-and-word-games
I've come up with the following INSERT trigger, if you have any improvement suggestions, please let me know (and also I wonder what to do with non-english language here, where I can't name columns "a", "b", etc.) - On Tue, Mar 5, 2013 at 10:59 AM, Alexander Farber <alexander.farber@gmail.com> wrote: >> http://stackoverflow.com/questions/15220072/postgresql-and-word-games create table good_words ( word varchar(16) primary key, a integer not null default 0, b integer not null default 0, c integer not null default 0, d integer not null default 0, e integer not null default 0, /* ...skipped 20 letters... */ z integer not null default 0 ); CREATE or REPLACE FUNCTION count_letters() RETURNS trigger AS $BODY$ BEGIN SELECT into NEW.a LENGTH(NEW.word) - LENGTH(REPLACE(NEW.word, 'a', '')); SELECT into NEW.b LENGTH(NEW.word) - LENGTH(REPLACE(NEW.word, 'b', '')); SELECT into NEW.c LENGTH(NEW.word) - LENGTH(REPLACE(NEW.word, 'c', '')); SELECT into NEW.d LENGTH(NEW.word) - LENGTH(REPLACE(NEW.word, 'd', '')); /* ...skipped 20 letters... */ SELECT into NEW.z LENGTH(NEW.word) - LENGTH(REPLACE(NEW.word, 'z', '')); RETURN NEW; END; $BODY$ LANGUAGE plpgsql; CREATE TRIGGER count_letters BEFORE INSERT OR UPDATE ON good_words FOR EACH ROW EXECUTE PROCEDURE count_letters();
On Tue, Mar 5, 2013 at 1:29 AM, Alexander Farber <alexander.farber@gmail.com> wrote:
Yes, that is how I'd do it. Then you'd just build an index on the sorted column and select on the sorted letters.
You could do the canonicalization (sorting of the letters) in PHP for every query (and insert) and pass in that value, and so have the database not know what the meaning of the new column is.
select word from good_words were sorted_letters=?
Or you could create a function in the database that does the sort, and then your PHP would not have to pass in already sorted letters.
select word from good_words were sorted_letters=sort_function(?),
and also have triggers automatically compute and store the "sorted_letters" from "word" upon insert or update.
Cheers,
Jeff
Hello,
is there maybe a clever way of finding all possible words
from a given set of letters by means of PostgreSQL
(i.e. inside the database vs. scanning all database
rows by a PHP script, which would take too long) -
if the dictionary is kept in a simple table like:
create table good_words (
word varchar(16) primary key,
stamp timestamp default current_timestamp
);
I could add a column above, where same letters as in "word"
would be sorted alphabetically... but then I don't know.
Yes, that is how I'd do it. Then you'd just build an index on the sorted column and select on the sorted letters.
You could do the canonicalization (sorting of the letters) in PHP for every query (and insert) and pass in that value, and so have the database not know what the meaning of the new column is.
select word from good_words were sorted_letters=?
Or you could create a function in the database that does the sort, and then your PHP would not have to pass in already sorted letters.
select word from good_words were sorted_letters=sort_function(?),
and also have triggers automatically compute and store the "sorted_letters" from "word" upon insert or update.
Cheers,
Jeff
Hi,
I think you can make another table:
Word, letter, count (word, letter - pk)
In good_words add column sorted_letters.
Now we can make a view based on that two tables:
Word, letter, count, sorted_letters
Now we need two immutable functions:
1. For given word returns sorted_letters "word"
2. For given word returns set of our_view
Trigger on insert good_words will set sorted_letters and insert rows in word_letter_count table.... Using above functions...
And now we have the letters: "ogdssoedillrthyhtmkjilsdaio"
We can now say
Select distinct our_view.word from second_function(the_letters) f
Join our_view using(letter)
Where f.sorted_letters like our_view.sorted_letters || '%' and our_view.count <= f.count
Now to improve performance i think would be good to put index on (letter, count) and maybe second part in where move to join part... But it would depend on explain analyze...
Kind regards,
Misa
On Tuesday, March 5, 2013, Alexander Farber wrote:
On Tuesday, March 5, 2013, Alexander Farber wrote:
I've come up with the following INSERT trigger,
if you have any improvement suggestions,
please let me know (and also I wonder
what to do with non-english language here,
where I can't name columns "a", "b", etc.) -
On Tue, Mar 5, 2013 at 10:59 AM, Alexander Farber
<alexander.farber@gmail.com> wrote:
>> http://stackoverflow.com/questions/15220072/postgresql-and-word-games
create table good_words (
word varchar(16) primary key,
a integer not null default 0,
b integer not null default 0,
c integer not null default 0,
d integer not null default 0,
e integer not null default 0,
/* ...skipped 20 letters... */
z integer not null default 0
);
CREATE or REPLACE FUNCTION count_letters() RETURNS trigger AS $BODY$
BEGIN
SELECT into NEW.a LENGTH(NEW.word) - LENGTH(REPLACE(NEW.word, 'a', ''));
SELECT into NEW.b LENGTH(NEW.word) - LENGTH(REPLACE(NEW.word, 'b', ''));
SELECT into NEW.c LENGTH(NEW.word) - LENGTH(REPLACE(NEW.word, 'c', ''));
SELECT into NEW.d LENGTH(NEW.word) - LENGTH(REPLACE(NEW.word, 'd', ''));
/* ...skipped 20 letters... */
SELECT into NEW.z LENGTH(NEW.word) - LENGTH(REPLACE(NEW.word, 'z', ''));
RETURN NEW;
END;
$BODY$ LANGUAGE plpgsql;
CREATE TRIGGER count_letters BEFORE INSERT OR UPDATE ON good_words
FOR EACH ROW EXECUTE PROCEDURE count_letters();
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On Tue, Mar 5, 2013 at 8:29 PM, Alexander Farber <alexander.farber@gmail.com> wrote: > is there maybe a clever way of finding all possible words > from a given set of letters by means of PostgreSQL > (i.e. inside the database vs. scanning all database > rows by a PHP script, which would take too long) - > if the dictionary is kept in a simple table like: > > create table good_words ( > word varchar(16) primary key, > stamp timestamp default current_timestamp > ); How many words are you looking at, in your dictionary? I wrote an anagramming program in C++ that works off a fairly small dictionary of common words (~60K words) and gives adequate performance without any indexing - the time is dwarfed by just scrolling the text up the console. The only thing I'd recommend doing differently is the language - use one that has a proper hash/tree type, saving you the trouble of implementing one (I implemented my own non-balancing binary tree for the task... no wait, on examination, it seems to actually be a linear search - and yet it has passable performance). PHP can quite probably do everything you want here; otherwise, I'd recommend something like Python or Pike. Simple Python example: words = {} for word in dictionary: # provide a dictionary somehow - maybe from a file/db words.setdefault(''.join(sorted(word)),[]).append(word) # Voila! You now have your mapping. One-off initialization complete. find_anagrams_of = "stop" anagrams = words.get(''.join(sorted(find_anagrams_of)),[]) # anagrams is now a list of all known anagrams of the target - possibly an empty list print(anagrams) ['opts', 'post', 'pots', 'spot', 'stop', 'tops'] On my laptop, loading ~100K words took about 1 second, and the lookup took effectively no time. I don't think there's any need for a heavy database engine here, unless you're working with millions and millions of words :) Chris Angelico
Thanks, will try that (the "dumb" approach) too :-) Still working on my dictionary (will be auto-generated by a script). On Wed, Mar 6, 2013 at 2:57 PM, Chris Angelico <rosuav@gmail.com> wrote: > words = {} > for word in dictionary: # provide a dictionary somehow - maybe from a file/db > words.setdefault(''.join(sorted(word)),[]).append(word) > # Voila! You now have your mapping. One-off initialization complete. > > find_anagrams_of = "stop" > anagrams = words.get(''.join(sorted(find_anagrams_of)),[]) > # anagrams is now a list of all known anagrams of the target - > possibly an empty list > print(anagrams) > > > ['opts', 'post', 'pots', 'spot', 'stop', 'tops'] > > On my laptop, loading ~100K words took about 1 second, and the lookup > took effectively no time. I don't think there's any need for a heavy > database engine here, unless you're working with millions and millions > of words :) > > Chris Angelico > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general