Thread: Re: slow join on postgresql6.5

Re: slow join on postgresql6.5

From

Don Baccus

Date:

30 March 2000, 19:28:01

At 11:10 AM 3/30/00 -0500, Tom Lane wrote:
>Don Baccus <dhogaza@pacifier.com> writes:
>> This is an example where commercial systems that have indices
>> synchronized with data such that queries referencing only the
>> fields in indices can win big vs. PG in SOME (not all) cases.
>> In particular, when the indices are to a table that has a bunch
>> of other, perhaps long, columns.  PG has to read the table and
>> drag all that dead weight around to do RI referential checking
>> and semantic actions.
>
>Keep in mind, though, that once we have TOAST the long columns are
>likely to get pushed out to a secondary table, so that the amount
>of data you have to read is reduced (as long as you don't touch
>any of the long columns, of course).

Sure...and you can BLOB or CLOB longer data in Oracle, too.  TOASTing
isn't without costs, either...life's a tradeoff!

>
>The main reason that Postgres indexes can't be used without also
>consulting the main table is that we do not store transaction status
>information in index entries, only in real tuples.  After finding
>an index entry we must still consult the referenced tuple to see
>if it's been deleted, or even committed yet.  I believe this is a
>pretty good tradeoff.

I must wonder, though, given that proper syncing seems to be the
norm in commercial systems.  Or so I'm lead to believe when Gray's
book, for instance.  Or a good book on speeding up Oracle queries.

Whatever ... in this particular case  - referential integrity 
with MATCH <unspecified> and MATCH PARTIAL and multi-column
foreign keys - performance will likely drop spectacularly once the
leading column is NULL, while (say) with Oracle you'd expect much
less of a performance hit. 

The point of my note is that this is probably worth documenting.

Don't get me wrong, these semantics and RI and multi-column keys
appear to be pretty inefficient by nature, I don't think anyone
is likely to be horrified to read that it might well be even worse
in PG than in certain commercial systems... 

>I suppose that keeping tuple status in index entries could be a win
>on nearly-read-only tables, but I think that on average it'd be
>a performance loser.

Well...I've personally not studied the issue in detail, but just
have to wonder if the folks at Oracle are really as stupid as the
above analysis would make them appear to be.  I presume that they
have a pretty good idea of the kind of mix large database installations
make, and presumably make choices designed to win on average.

- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert
Serviceand other goodies at http://donb.photo.net.

RE: slow join on postgresql6.5

From

"Hiroshi Inoue"

Date:

31 March 2000, 05:00:10

> -----Original Message-----
> From: majordomo-owner@hub.org [mailto:majordomo-owner@hub.org]On Behalf
> Of Don Baccus
> 
> Whatever ... in this particular case  - referential integrity 
> with MATCH <unspecified> and MATCH PARTIAL and multi-column
> foreign keys - performance will likely drop spectacularly once the
> leading column is NULL, while (say) with Oracle you'd expect much
> less of a performance hit. 
>

As for NULL,it seems possible to look up NULL keys in a btree index
because NULL == NULL for btree indexes.
I've wondered why PostgreSQL's planner/executor never looks up
indexes for queries using 'IS NULL'.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp

RE: slow join on postgresql6.5

From

Don Baccus

Date:

31 March 2000, 07:14:11

At 07:05 PM 3/31/00 +0900, Hiroshi Inoue wrote:
>> -----Original Message-----
>> From: majordomo-owner@hub.org [mailto:majordomo-owner@hub.org]On Behalf
>> Of Don Baccus
>> 
>> Whatever ... in this particular case  - referential integrity 
>> with MATCH <unspecified> and MATCH PARTIAL and multi-column
>> foreign keys - performance will likely drop spectacularly once the
>> leading column is NULL, while (say) with Oracle you'd expect much
>> less of a performance hit. 
>>
>
>As for NULL,it seems possible to look up NULL keys in a btree index
>because NULL == NULL for btree indexes.
>I've wondered why PostgreSQL's planner/executor never looks up
>indexes for queries using 'IS NULL'.

Unfortunately for the RI MATCH PARTIAL case, NULL is a "wildcard".

This doesn't affect the validity of your observation in the general
case, though.



- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert
Serviceand other goodies at http://donb.photo.net.

RE: slow join on postgresql6.5

From

"Hiroshi Inoue"

Date:

02 April 2000, 11:16:24

> -----Original Message-----
> From: Don Baccus [mailto:dhogaza@pacifier.com]
> Sent: Friday, March 31, 2000 11:34 PM
>
> At 07:05 PM 3/31/00 +0900, Hiroshi Inoue wrote:
> >> -----Original Message-----
> >> From: majordomo-owner@hub.org [mailto:majordomo-owner@hub.org]On Behalf
> >> Of Don Baccus
> >>
> >> Whatever ... in this particular case  - referential integrity
> >> with MATCH <unspecified> and MATCH PARTIAL and multi-column
> >> foreign keys - performance will likely drop spectacularly once the
> >> leading column is NULL, while (say) with Oracle you'd expect much
> >> less of a performance hit.
> >>
> >
> >As for NULL,it seems possible to look up NULL keys in a btree index
> >because NULL == NULL for btree indexes.
> >I've wondered why PostgreSQL's planner/executor never looks up
> >indexes for queries using 'IS NULL'.
>
> Unfortunately for the RI MATCH PARTIAL case, NULL is a "wildcard".
>

Oops I misunderstood NULL.

Hmm,is the following TODO worth the work ?
* Use index to restrict rows returned by multi-key index when used with non-consecutive keys or OR clauses, so fewer
heapaccesses.

Probably this is for the case likeSELECT .. FROM .. WHERE key1 = val1 and key3 = val3;
,where (key1,key2,key3) is a multi-column index.
Currently index scan doesn't take 'key3=val3' into account because
(key1,key3) isn't consecutive.
The TODO may include the caseSELECT .. FROM .. WHERE key2 = val2;
Though we have to scan the index entirely,access to the main table
is needed only when key2 = val2. If (key2 = val2) is sufficiently
restrictive,
the scan would be faster than simple sequential scan.

Comments ?

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp

Re: slow join on postgresql6.5

From

Bruce Momjian

Date:

02 April 2000, 12:39:25

> > >As for NULL,it seems possible to look up NULL keys in a btree index
> > >because NULL == NULL for btree indexes.
> > >I've wondered why PostgreSQL's planner/executor never looks up
> > >indexes for queries using 'IS NULL'.
> >
> > Unfortunately for the RI MATCH PARTIAL case, NULL is a "wildcard".
> >
> 
> Oops I misunderstood NULL.
> 
> Hmm,is the following TODO worth the work ?
> * Use index to restrict rows returned by multi-key index when used with
>   non-consecutive keys or OR clauses, so fewer heap accesses.

This is a Vadim item.


--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026