Re: merge join killing performance - Mailing list pgsql-performance

From Scott Marlowe
Subject Re: merge join killing performance
Msg-id AANLkTik0NxYr5FAqlYOqseHhT7WaJv6uJuMvD7ScONnV@mail.gmail.com
In response to Re: merge join killing performance  (Scott Marlowe <scott.marlowe@gmail.com>)
Responses Re: merge join killing performance
Re: merge join killing performance
List pgsql-performance
On Wed, May 19, 2010 at 8:06 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Wed, May 19, 2010 at 8:04 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> On Wed, May 19, 2010 at 7:46 PM, Matthew Wakeling <matthew@flymine.org> wrote:
>>> On Wed, 19 May 2010, Scott Marlowe wrote:
>>>>>
>>>>> It's apparently estimating (wrongly) that the merge join won't have to
>>>>> scan very much of "files" before it can stop because it finds an eid
>>>>> value larger than any eid in the other table.  So the issue here is an
>>>>> inexact stats value for the max eid.
>>>
>>> I wondered if it could be something like that, but I rejected that idea, as
>>> it obviously wasn't the real world case, and statistics should at least get
>>> that right, if they are up to date.
>>>
>>>> I changed stats target to 1000 for that field and still get the bad plan.
>>>
>>> What do the stats say the max values are?
>>
>> 5277063,5423043,13843899 (I think).
>>
>> # select count(distinct eid) from files;
>>  count
>> -------
>>   365
>> (1 row)
>>
>> # select count(*) from files;
>>  count
>> ---------
>>  3793748
>
> A followup: of those rows,
>
> select count(*) from files where eid is null;
>  count
> ---------
>  3793215
>
> are null.

So, Tom, do you think it's possible that the planner isn't noticing
all those nulls and thinks it'll just take a row or two to get to the
value it needs to join on?
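
One way to see what the planner currently believes about that column is to
look at its row in the pg_stats view. This is only a sketch, assuming the
table and column names from the queries above (files, eid) and that ANALYZE
has been run recently. From the counts above, null_frac should come out very
close to 1 if the stats reflect the nulls:

```sql
-- Sketch: inspect the planner's stored statistics for files.eid.
-- null_frac is the estimated fraction of NULL entries in the column;
-- histogram_bounds shows the range of non-null values the planner knows about.
SELECT null_frac,
       n_distinct,
       most_common_vals,
       histogram_bounds
FROM pg_stats
WHERE tablename = 'files'
  AND attname = 'eid';
```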
