N-grams - Mailing list pgsql-hackers

From Anthony Gentile
Subject N-grams
Date
Msg-id AANLkTi=Gs8obcr_suRmEOUYUXpYRVNGzO9s2TWWMqn2m@mail.gmail.com
List pgsql-hackers
Hello,

Today I was reading a blog post from a coworker, http://www.depesz.com/index.php/2010/12/11/waiting-for-9-1-knngist/, and started to experiment with the trigram contrib package for Postgres, using different word dictionaries for English and German. I wanted to see how performant particular queries could be if SIGLENINT in trgm.h were adjusted to the average character length of the words in a particular dictionary.

http://packages.ubuntu.com/dapper/wamerican
compling=# SELECT AVG(LENGTH(CAST(word AS bytea), 'UTF8')) FROM english_words;
        avg
--------------------
 8.4498980409662267

vs.

http://packages.ubuntu.com/dapper/wngerman
compling=# SELECT AVG(LENGTH(CAST(word AS bytea), 'UTF8')) FROM words; -- German
         avg
---------------------
 11.9518056504365566

(Unsurprisingly, German words are on average longer than English ones.)

Effectively, I want the trigram package to act more like an n-gram package, where I explicitly set N when it is built. I am, however, not very proficient in C, and I doubt that this is the only change needed to convert the trigram contrib module into an n-gram one: after changing SIGLENINT to 12 in trgm.h, I still get trigram results from show_trgm(). I was hoping someone familiar with the code could outline the changes needed to make the trigram implementation behave as an n-gram. Thanks for your time; I appreciate any advice anyone can give me.

Anthony Gentile
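
To make concrete what "n-gram instead of trigram" would mean for the output of show_trgm(), here is a rough Python sketch of the extraction step with N as a parameter. It is not the contrib module's code: the function name show_ngrams is made up for illustration, and the padding rule (N-1 spaces before each word, one space after) is an assumption chosen only because it gives the familiar trigram set when N is 3.

```python
import re


def show_ngrams(text: str, n: int = 3) -> list[str]:
    """Return the sorted, de-duplicated n-grams of every word in text.

    Hypothetical helper for illustration; not part of pg_trgm.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    grams = set()
    # Lower-case the input and treat runs of letters/digits as words.
    for word in re.findall(r"[^\W_]+", text.lower()):
        # Assumed padding: n-1 spaces on the left, one space on the right.
        padded = " " * (n - 1) + word + " "
        # Slide a window of width n across the padded word.
        for i in range(len(padded) - n + 1):
            grams.add(padded[i:i + n])
    return sorted(grams)


if __name__ == "__main__":
    print(show_ngrams("cat", 3))
    print(show_ngrams("cat", 4))
```

With N fixed at 3 this yields the same four trigrams for 'cat' that show_trgm('cat') reports; a larger N produces longer, more specific grams per word, which is the behavior being asked about here.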
