Re: More index / search speed questions - Mailing list pgsql-general

From Tom Lane
Subject Re: More index / search speed questions
Date
Msg-id 25884.969591711@sss.pgh.pa.us
Whole thread Raw
In response to More index / search speed questions  ("Mitch Vincent" <mitch@venux.net>)
List pgsql-general
"Mitch Vincent" <mitch@venux.net> writes:
> This is a bit long, sorry about that..

One good thing to ask yourself is always "do the planner's row-count
estimates have anything to do with reality?"

In this case the issue seems to be that the planner is using an
indexscan over the whole of resumes_fti --- there is no way to limit the
scan using app_id, so it must be using the index just as a way to order
the data for a mergejoin.

In your quicker example, the innermost nested loop is pulling out
potential app_id values from the applicants table and using each one
to perform a constrained indexscan on resumes_fti.  That's a great
strategy as long as you don't have very many hits in the applicants
table (else the repeated indexscan startup overhead kills you).  I don't
know if the planner's estimate of 111 hits is very accurate, but clearly
it's guessed right that the number of hits is not large, else you'd not
be happy with the performance of that plan ;-)

In the slower case, the planner is estimating quite a few thousand
potential matches, and that leads it to use a mergejoin, which
may be relatively slow here but it won't fall apart completely when
there are many matches.  Since you're complaining, I guess that that
estimate was *not* accurate.  But what are the correct numbers?

Also, you might experiment with "set enable_mergejoin = OFF' to see what
sort of plan you get (probably a hashjoin) and what its performance is
like.

            regards, tom lane

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: Re: Large Objects
Next
From: Adam Haberlach
Date:
Subject: Re: Re: Large Objects