RE: slow join on postgresql6.5 - Mailing list pgsql-hackers

From Wenjin Zheng
Subject RE: slow join on postgresql6.5
Date
Msg-id E3E65B3C4E12D311BE6900A0C9EC2FA11554AC@GENOMICS1
Whole thread Raw
List pgsql-hackers
Thank you all very much for the help.  The post really solved my problem.
Your help is greatly appreciated.

Wenjin Zheng
Bioinformatic Analyst
Biosource Technologies, Inc.
3333 Vaca Valley Parkway
Vacaville, CA 95688
(707)469-2353
email: wenjin.zheng@lsbc.com 

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Thursday, March 30, 2000 8:11 AM
To: Don Baccus
Cc: Wenjin Zheng; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] slow join on postgresql6.5 


Don Baccus <dhogaza@pacifier.com> writes:
> This is an example where commercial systems that have indices
> synchronized with data such that queries referencing only the
> fields in indices can win big vs. PG in SOME (not all) cases.
> In particular, when the indices are to a table that has a bunch
> of other, perhaps long, columns.  PG has to read the table and
> drag all that dead weight around to do RI referential checking
> and semantic actions.

Keep in mind, though, that once we have TOAST the long columns are
likely to get pushed out to a secondary table, so that the amount
of data you have to read is reduced (as long as you don't touch
any of the long columns, of course).

The main reason that Postgres indexes can't be used without also
consulting the main table is that we do not store transaction status
information in index entries, only in real tuples.  After finding
an index entry we must still consult the referenced tuple to see
if it's been deleted, or even committed yet.  I believe this is a
pretty good tradeoff.  If we kept a copy of the status info in the
index, then UPDATE and DELETE would get hugely slower and more
complex, since they'd have to be able to find and mark all the
index entries pointing at a tuple as well as the tuple itself.
The extra info would also increase the size of index entries,
which is bad because the point of an index is that reading it
takes less disk traffic than reading the underlying table.
(BTW, to return to the original thread, one of the biggest reasons
that indexes with many columns are a loser is that they're so big.)

I suppose that keeping tuple status in index entries could be a win
on nearly-read-only tables, but I think that on average it'd be
a performance loser.
        regards, tom lane


pgsql-hackers by date:

Previous
From: The Hermit Hacker
Date:
Subject: Re: AW: src/test/locale/de_DE.ISO-8859-1
Next
From: wieck@debis.com (Jan Wieck)
Date:
Subject: Re: caching query results