> I am investigating whether it is useful to directly query a database
> containing a rather large text corpus (order of magnitude 100k - 1m
> newspaper articles, so around 100 million words), or whether I should
> use third party text indexing services. I want to know things such as:
> how often is a certain word (or pattern) mentioned in an article and how
> often it is mentioned with the condition that another word is nearby
> (same article or n words distant).
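To pin down the two counts being asked about, here is a minimal in-memory sketch in Python. The names `tokenize`, `word_count` and `near_count` are hypothetical, and the lowercase regex tokenizer is an assumption standing in for real text parsing. This is not how tsearch2 works internally. It only illustrates "how often does a word occur" and "how often does it occur with another word in the same article or within n words".

```python
import re


def tokenize(text):
    # Assumed tokenizer: lowercase runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def word_count(articles, word):
    """Return (total occurrences of `word`, number of articles containing it)."""
    total = 0
    docs = 0
    for text in articles:
        n = tokenize(text).count(word)
        total += n
        if n:
            docs += 1
    return total, docs


def near_count(articles, word, other, n=None):
    """Count occurrences of `word` that have `other` in the same article
    (n=None) or within n tokens of it. Assumes `word` != `other`."""
    count = 0
    for text in articles:
        tokens = tokenize(text)
        # Positions of the second word in this article.
        other_pos = [j for j, t in enumerate(tokens) if t == other]
        if not other_pos:
            continue
        for i, t in enumerate(tokens):
            if t != word:
                continue
            if n is None or any(abs(i - j) <= n for j in other_pos):
                count += 1
    return count


articles = [
    "The bank raised rates. The central bank said rates will rise.",
    "A river bank flooded.",
    "No match here.",
]
print(word_count(articles, "bank"))             # (3, 2)
print(near_count(articles, "bank", "rates"))    # 2: same article
print(near_count(articles, "bank", "rates", 1)) # 0: never adjacent
print(near_count(articles, "bank", "rates", 2)) # 2: each within 2 words
```

A scan like this re-reads every article on every query. With 100k to 1m articles, that cost is what a full-text index is meant to avoid, at least for the "which articles contain this word" part.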
You really want to use the contrib/tsearch2 module that ships with
PostgreSQL. To install it, from the top of the PostgreSQL source tree:
cd contrib/tsearch2
gmake install
psql <mydb> < tsearch2.sql
more README.tsearch2
Chris