Re: Incorrect estimates on columns - Mailing list pgsql-performance

From Tom Lane
Subject Re: Incorrect estimates on columns
Msg-id 13440.1192667019@sss.pgh.pa.us
In response to Re: Incorrect estimates on columns  (Chris Kratz <chris.kratz@vistashare.com>)
Responses Re: Incorrect estimates on columns  (Chris Kratz <chris.kratz@vistashare.com>)
List pgsql-performance
Chris Kratz <chris.kratz@vistashare.com> writes:
> On Wednesday 17 October 2007 14:49, Tom Lane wrote:
>> Evidently it's not realizing that every row of par will have a join
>> partner, but why not?  I suppose a.activityid is unique, and in most
>> cases that I've seen the code seems to get that case right.
>>
>> Would you show us the pg_stats rows for par.activity and a.activityid?

> Here are the pg_stats rows for par.activity and a.activityid.

Hmm, nothing out of the ordinary there.

I poked at this a bit and realized that what seems to be happening is
that the a.programid = 171 condition is reducing the selectivity
estimate: the planner knows that the condition will filter out X percent
of the activity rows, and it assumes that *the size of the join result
will be reduced by that same percentage*, since join partners would then
be missing for some of the par rows.  The fact that the join result doesn't
actually decrease in size at all suggests that there's some hidden
correlation between the programid condition and the condition on
par.provider_lfm.  Is that true?  Maybe you could eliminate one of the
two conditions from the query?
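
To make the arithmetic above concrete, here is a minimal sketch of how an
independence assumption can shrink a join estimate. It is not the planner's
actual code: the function names and all row counts and percentages are
hypothetical, and it assumes a.activityid is unique so that each par row has
at most one join partner.

```python
def estimated_join_rows(par_rows, activity_rows, programid_selectivity):
    """Join size under the independence assumption (hypothetical inputs).

    Assumes a.activityid is unique, so the join clause matches each par row
    to one of `activity_rows` activities with probability 1 / activity_rows.
    """
    # Activity rows expected to survive the a.programid = 171 filter.
    filtered_activity_rows = activity_rows * programid_selectivity
    # Independence: the filter is assumed to remove join partners at random,
    # so the join result shrinks by the same fraction as the activity table.
    return par_rows * filtered_activity_rows / activity_rows


def actual_join_rows(par_rows, fraction_with_partner):
    """Real join size when a known fraction of par rows keep their partner."""
    return par_rows * fraction_with_partner


# Hypothetical numbers, chosen only for illustration.
par_rows = 10_000              # par rows passing the provider_lfm condition
activity_rows = 50_000         # total rows in the activity table
programid_selectivity = 0.02   # fraction of activity rows with programid = 171

estimate = estimated_join_rows(par_rows, activity_rows, programid_selectivity)
# Hidden correlation: every selected par row points at a programid = 171
# activity, so no join partners are actually lost.
actual = actual_join_rows(par_rows, 1.0)

print("estimated rows:", round(estimate))
print("actual rows:", round(actual))
print("underestimate factor:", round(actual / estimate))
```

With these made-up inputs the estimate is cut by the full filter percentage
even though the real join loses nothing, which is the kind of gap a hidden
correlation between the two conditions would produce.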

Since PG doesn't have any cross-table (or even cross-column) statistics,
it's not currently possible for the optimizer to deal very well with
hidden correlations like this ...

            regards, tom lane
