Re: TB-sized databases - Mailing list pgsql-performance

From Tom Lane
Subject Re: TB-sized databases
Msg-id 18104.1196966087@sss.pgh.pa.us
In response to Re: TB-sized databases  (Matthew <matthew@flymine.org>)
List pgsql-performance
Matthew <matthew@flymine.org> writes:
> On Thu, 6 Dec 2007, Tom Lane wrote:
>> Hmm.  IIRC, there are smarts in there about whether a mergejoin can
>> terminate early because of disparate ranges of the two join variables.

> Very cool. Would that be a planner cost estimate fix (so it avoids the
> merge join), or a query execution fix (so it does the merge join on the
> table subset)?

Cost estimate fix.  Basically what I'm thinking is that the startup cost
attributed to a mergejoin ought to account for any rows that have to be
skipped over before we reach the first join pair.  In general this is
hard to estimate, but for mergejoin it can be estimated using the same
type of logic we already use at the other end.
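Here is a minimal Python sketch of the startup-cost idea. It assumes an ascending equi-depth histogram, given as a sorted list of bucket bounds (roughly what pg_stats.histogram_bounds holds), with values spread uniformly within each bucket. The helper names `fraction_below`, `mergejoin_fractions` and `startup_cost` are hypothetical. This is not the planner's C code. It only shows that skipping rows below the other input's minimum at the start mirrors the existing stop-early estimate at the other end.

```python
from bisect import bisect_left
from typing import Sequence


def fraction_below(bounds: Sequence[float], value: float) -> float:
    """Estimated fraction of rows with column < value.

    Hypothetical helper: `bounds` are ascending equi-depth histogram
    boundaries; each bucket holds an equal share of the rows and values
    are assumed uniform inside a bucket.
    """
    if value <= bounds[0]:
        return 0.0
    if value >= bounds[-1]:
        return 1.0
    nbuckets = len(bounds) - 1
    i = bisect_left(bounds, value)          # bounds[i-1] < value <= bounds[i]
    lo, hi = bounds[i - 1], bounds[i]
    within = (value - lo) / (hi - lo)       # position inside bucket i-1
    return ((i - 1) + within) / nbuckets


def mergejoin_fractions(outer: Sequence[float], inner: Sequence[float]) -> dict:
    """Ascending merge join: fraction of each input read before the first
    possible join pair (start) and before the join can stop (end)."""
    return {
        # rows below the other side's minimum are skipped before any match
        "outer_start": fraction_below(outer, inner[0]),
        "inner_start": fraction_below(inner, outer[0]),
        # rows above the other side's maximum never need to be read
        "outer_end": fraction_below(outer, inner[-1]),
        "inner_end": fraction_below(inner, outer[-1]),
    }


def startup_cost(run_cost: float, start_frac: float) -> float:
    """Hypothetical: charge the skipped prefix of an input to startup."""
    return run_cost * start_frac
```

Suppose `outer` spans 0 to 40 (`[0, 10, 20, 30, 40]`) and `inner` spans 15 to 100 (`[15, 40, 60, 80, 100]`). Then `outer_start` is 0.375: about 37.5% of the outer input is read before the first join pair can occur. That prefix is what the proposal would attribute to startup cost.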

After looking at the code a bit, I'm realizing that there's actually a
bug in there as of 8.3: mergejoinscansel() is expected to be able to
derive numbers for either direction of scan, but if it's asked to
compute numbers for a DESC-order scan, it looks for a pg_stats entry
sorted with '>', which isn't gonna be there.  It needs to know to
look for an '<' histogram and switch the min/max.  So the lack of
symmetry here is causing an actual bug in logic that already exists.
That makes the case for fixing this now a bit stronger ...
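The DESC-order fix described above can be sketched the same way. Assume only the ascending ('<') histogram is available. A descending scan can then reuse it by measuring positions from the top and swapping which of the other input's bounds, min or max, marks the start and the stopping point. This is again a hypothetical Python illustration under the same uniform-within-bucket assumption. It is not the actual mergejoinscansel() code, and `scan_fractions` is an invented name.

```python
from bisect import bisect_left
from typing import Sequence, Tuple


def fraction_below(bounds: Sequence[float], value: float) -> float:
    """Fraction of rows < value, from ascending equi-depth histogram
    bounds with values assumed uniform inside each bucket (hypothetical)."""
    if value <= bounds[0]:
        return 0.0
    if value >= bounds[-1]:
        return 1.0
    i = bisect_left(bounds, value)          # bounds[i-1] < value <= bounds[i]
    lo, hi = bounds[i - 1], bounds[i]
    return ((i - 1) + (value - lo) / (hi - lo)) / (len(bounds) - 1)


def scan_fractions(mine: Sequence[float], other: Sequence[float],
                   descending: bool) -> Tuple[float, float]:
    """Return (start, end): fraction of `mine` read before the first
    possible match, and before the scan passes the last possible match.

    Only the ascending ('<') histogram exists, so a DESC scan reuses it:
    positions are measured from the top, and the roles of the other
    input's min and max are swapped.
    """
    other_min, other_max = other[0], other[-1]
    if not descending:
        # ASC: skip rows below other_min; stop once above other_max
        return (fraction_below(mine, other_min),
                fraction_below(mine, other_max))
    # DESC: skip rows above other_max; stop once below other_min
    return (1.0 - fraction_below(mine, other_max),
            1.0 - fraction_below(mine, other_min))
```

Take `mine = [0, 10, 20, 30, 40]` and `other = [15, 40, 60, 80, 100]`. An ascending scan gives `(0.375, 1.0)`. A descending scan gives `(0.0, 0.625)`: it starts matching immediately at the top and can stop after reading down to 15. Looking up a '>' histogram instead would find nothing in pg_stats.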

            regards, tom lane
