Thread: N-grams
Hello,<br /><br /> Today I was reading a blog post from a coworker, <a href="http://www.depesz.com/index.php/2010/12/11/waiting-for-9-1-knngist/" target="_blank">http://www.depesz.com/index.php/2010/12/11/waiting-for-9-1-knngist/</a>, and started to experiment with the trigram contrib package for Postgres, using word dictionaries for English and German. I wanted to see how well particular queries would perform if SIGLENINT in trgm.h were set to the average character length of the words in a particular dictionary.<br /><br /><a href="http://packages.ubuntu.com/dapper/wamerican" target="_blank">http://packages.ubuntu.com/dapper/wamerican</a><br/>compling=# SELECT AVG(LENGTH(CAST(word AS bytea), 'UTF8')) FROM english_words;<br /> avg <br /> --------------------<br /> 8.4498980409662267<br /><br />vs. <br/><br /><a href="http://packages.ubuntu.com/dapper/wngerman" target="_blank">http://packages.ubuntu.com/dapper/wngerman</a><br/>compling=# SELECT AVG(LENGTH(CAST(word AS bytea), 'UTF8')) FROM words; -- German<br /> avg <br />---------------------<br /> 11.9518056504365566<br /><br />(Unsurprisingly, German words are on average longer than English ones.)<br /><br />Effectively, I want the trigram package to behave like an n-gram package, where I explicitly set N at build time. I am, however, not very proficient in C, and I doubt that SIGLENINT is the only thing that needs to change to turn the trigram contrib into an n-gram one: after changing SIGLENINT to 12 in trgm.h, I still get trigram results from show_trgm(). I was hoping someone familiar with the code could help by outlining what would need to change for the trigram implementation to behave as an n-gram implementation. Thanks for your time; I appreciate any advice anyone can give me.<br /><br clear="all" />Anthony Gentile<br/>
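To make the requested behaviour concrete, here is a minimal sketch in Python rather than C. It is not the pg_trgm implementation: the function name `show_ngrams` is hypothetical, it handles a single word only, and the padding rule (n - 1 leading blanks and one trailing blank) is an assumption that generalises the two-leading, one-trailing blank padding that show_trgm() output displays for n = 3.

```python
def show_ngrams(word: str, n: int = 3) -> list[str]:
    """Return the sorted, de-duplicated n-grams of a single word.

    Hypothetical helper for illustration only; not part of pg_trgm.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Assumption: pad with n - 1 leading blanks and one trailing blank,
    # which matches the padding seen in show_trgm() output when n = 3.
    padded = " " * (n - 1) + word.lower() + " "
    # Slide a window of width n over the padded word and collect each slice.
    grams = {padded[i:i + n] for i in range(len(padded) - n + 1)}
    return sorted(grams)


if __name__ == "__main__":
    print(show_ngrams("cat", 3))
    print(show_ngrams("cat", 4))
```

With n = 3 this yields the same four trigrams that show_trgm('cat') reports; with n = 4 it yields the 4-grams that an n-gram build would presumably be expected to return for the same word.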