Re: 512,600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk? - Mailing list pgsql-performance

From Pierre Frédéric Caillaud
Subject Re: 512,600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?
Date
Msg-id op.u72og7fccke6l8@soyouz
Whole thread Raw
In response to Re: 512,600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?  (Bryce Nesbitt <bryce2@obviously.com>)
List pgsql-performance
Your Query :

SELECT contexts.context_key FROM contexts
JOIN articles ON (articles.context_key=contexts.context_key)
JOIN matview_82034 ON (contexts.context_key=matview_82034.context_key)
WHERE contexts.context_key IN
     (SELECT context_key FROM article_words JOIN words using (word_key)
WHERE word = 'insider'
      INTERSECT
      SELECT context_key FROM article_words JOIN words using (word_key)
WHERE word = 'trading')
AND contexts.context_key IN
          (SELECT a.context_key FROM virtual_ancestors a JOIN bp_categories
ON (a.ancestor_key = bp_categories.context_key)
          WHERE lower(bp_categories.category) = 'law') AND articles.indexed;


I guess this is some form of keyword search, like :
- search for article
- with keywords "insider" and "trading"
- and belongs to a subcategory of "law"

The way you do it is exactly the same as the way phpBB forum implements
it, in the case you use a database that doesn't support full text search.
It is a fallback mechanism only meant for small forums on old versions of
MySQL, because it is extremely slow.

Even your faster timing (7500 ms) is extremely slow.

Option 1 :

a) Instead of building your own keywords table, use Postgres' fulltext
search, which is a lot smarter about combining keywords than using
INTERSECT.
You can either index the entire article, or use a separate keyword field,
or both.

b) If an article belongs to only one category, use an integer field. If,
as is most often the case, an article can belong to several categories,
use gist. When an article belongs to categories 1,2,3, set a column
article_categories to the integer array {1,2,3}::INTEGER[]. Then, use a
gist index on it.

You can then do a SELECT from articles (only one table) using an AND on
the intersection of article_categories with an array of the required
categories, and using Postgres' full text search on keywords.

This will most likely result in a Bitmap Scan, which will do the ANDing
much faster than any other solution.

Alternately, you can also use keywords like category_1234, stuff
everything in your keywords column, and use only Fulltext search.

You should this solution first, it works really well. When the data set
becomes quite a bit larger than your RAM, it can get slow, though.

Option 2 :

Postgres' full text search is perfectly integrated and has benefits :
fast, high write concurrency, etc. However full text search can be made
much faster with some compromises.

For instance, I have tried Xapian : it is a lot faster than Postgres for
full text search (and more powerful too), but the price you pay is
- a bit of work to integrate it
    - I suggest using triggers and a Python indexer script running in the
background to update the index
    - You can't SQL query it, so you need some interfacing
- updates are not concurrent (single-writer).

So, if you don't make lots of updates, Xapian may work for you. Its
performance is unbelievable, even on huge datasets.

pgsql-performance by date:

Previous
From: Scott Marlowe
Date:
Subject: Re: Dell PERC H700/H800
Next
From: Віталій Тимчишин
Date:
Subject: Re: Deferred constraint and delete performance