Re: Yet another abort-early plan disaster on 9.3 - Mailing list pgsql-performance

From Craig James
Subject Re: Yet another abort-early plan disaster on 9.3
Date
Msg-id CAFwQ8renpGVZPorJn4j7O7L_dGLaLqsmccR+7EhGxF0fvDgyTg@mail.gmail.com
Whole thread Raw
In response to Re: Yet another abort-early plan disaster on 9.3  (Tomas Vondra <tv@fuzzy.cz>)
Responses Re: Yet another abort-early plan disaster on 9.3
List pgsql-performance
On Fri, Oct 10, 2014 at 9:53 AM, Tomas Vondra <tv@fuzzy.cz> wrote:

On 10.10.2014 16:21, Craig James wrote:
> Our index is for chemical structures. Chemicals are indexed on
> chemical fragments
> <http://emolecules.com/info/molecular-informatics>. A search
> typically starts with 50-200 indexed "columns" (chemical fragments).
> The query is always flat, "A and B and ... and Z". The indexed
> fragments are both correlated (the existence of one strongly raises
> the chances of another) and anti-correlated (certain combinations are
> very rare).

Maybe I don't understand the problem well enough, but isn't this a
perfect match for GIN indexes? I mean, you essentially need to do
queries like "WHERE substance @@ ('A & B & !C')" etc. Which is exactly
what GIN does, because it keeps pointers to tuples for each fragment.

On the day our web site opened we were using tsearch. Before the end of the day we realized it was a bad idea, for the very reasons discussed here. The early-abort/late-start problem ("offset N limit M") could take minutes to return the requested page. With the external dynamically-optimized index, we can almost always get answers in less than a couple seconds, often in 0.1 seconds.

Craig

pgsql-performance by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: Yet another abort-early plan disaster on 9.3
Next
From: Tomas Vondra
Date:
Subject: Re: Yet another abort-early plan disaster on 9.3