Re: Performance problem with low correlation data - Mailing list pgsql-general

From Greg Stark
Subject Re: Performance problem with low correlation data
Date
Msg-id 407d949e0907061123g5d2008e6l712d36ee4a28fce8@mail.gmail.com
Whole thread Raw
In response to Performance problem with low correlation data  (Scara Maccai <m_lists@yahoo.it>)
List pgsql-general
On Mon, Jul 6, 2009 at 6:32 PM, Scara Maccai<m_lists@yahoo.it> wrote:


> The "best" way to read the table would still be a nested loop, but a loop on the
> "t" values, not on the ne_id values, since data for the same timestamp is "close".

But that would be a different query -- there's no restrictions on the
t values in this one.

> How can I tell to PG to use an algorithm such as:
>
> fetch the heap
>        for each quarter
>                for each id found where groupname='a group name'
>                        fetch all the indexes

Have you tried something using IN or EXISTS instead of a join? The
algorithm you describe doesn't work for the join because it has to
produce a record which includes the matching group columns. A bitmap
scan would return the various groups (I know in your case there's only
one but the optimizer can't be sure) mixed together.

That might work better in 8.4 than 8.3 as the IN and EXISTS handling
is improved.

Actually I wonder if doing a sequential scan with a hash join against
the group list wouldn't be a better option. That would scan more of
the heap but if they're really that spread out it might not make much
difference, and it would avoid having to do all the index scans. You
could experiment with turning enable_nestloop off and see how fast the
hash join plan is.

--
greg
http://mit.edu/~gsstark/resume.pdf

pgsql-general by date:

Previous
From: Dave Page
Date:
Subject: Re: Windows installer for pg-8.4 confusion
Next
From: Michael Gould
Date:
Subject: Added parameter for CREATE ROLE