Re: slow join on postgresql6.5 - Mailing list pgsql-hackers

From Don Baccus
Subject Re: slow join on postgresql6.5
Date
Msg-id 3.0.1.32.20000330190005.00f683ec@mail.pacifier.com
Whole thread Raw
Responses RE: slow join on postgresql6.5  ("Hiroshi Inoue" <Inoue@tpf.co.jp>)
List pgsql-hackers
At 11:10 AM 3/30/00 -0500, Tom Lane wrote:
>Don Baccus <dhogaza@pacifier.com> writes:
>> This is an example where commercial systems that have indices
>> synchronized with data such that queries referencing only the
>> fields in indices can win big vs. PG in SOME (not all) cases.
>> In particular, when the indices are to a table that has a bunch
>> of other, perhaps long, columns.  PG has to read the table and
>> drag all that dead weight around to do RI referential checking
>> and semantic actions.
>
>Keep in mind, though, that once we have TOAST the long columns are
>likely to get pushed out to a secondary table, so that the amount
>of data you have to read is reduced (as long as you don't touch
>any of the long columns, of course).

Sure...and you can BLOB or CLOB longer data in Oracle, too.  TOASTing
isn't without costs, either...life's a tradeoff!

>
>The main reason that Postgres indexes can't be used without also
>consulting the main table is that we do not store transaction status
>information in index entries, only in real tuples.  After finding
>an index entry we must still consult the referenced tuple to see
>if it's been deleted, or even committed yet.  I believe this is a
>pretty good tradeoff.

I must wonder, though, given that proper syncing seems to be the
norm in commercial systems.  Or so I'm lead to believe when Gray's
book, for instance.  Or a good book on speeding up Oracle queries.

Whatever ... in this particular case  - referential integrity 
with MATCH <unspecified> and MATCH PARTIAL and multi-column
foreign keys - performance will likely drop spectacularly once the
leading column is NULL, while (say) with Oracle you'd expect much
less of a performance hit. 

The point of my note is that this is probably worth documenting.

Don't get me wrong, these semantics and RI and multi-column keys
appear to be pretty inefficient by nature, I don't think anyone
is likely to be horrified to read that it might well be even worse
in PG than in certain commercial systems... 

>I suppose that keeping tuple status in index entries could be a win
>on nearly-read-only tables, but I think that on average it'd be
>a performance loser.

Well...I've personally not studied the issue in detail, but just
have to wonder if the folks at Oracle are really as stupid as the
above analysis would make them appear to be.  I presume that they
have a pretty good idea of the kind of mix large database installations
make, and presumably make choices designed to win on average.



- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert
Serviceand other goodies at http://donb.photo.net.
 


pgsql-hackers by date:

Previous
From: Don Baccus
Date:
Subject: Re: slow join on postgresql6.5
Next
From: Michael Meskes
Date:
Subject: Re: BIT datatype