Re: More tablescanning fun - Mailing list pgsql-performance

From Jim C. Nasby
Subject Re: More tablescanning fun
Date
Msg-id 20030424235924.B66185@flake.decibel.org
Whole thread Raw
In response to Re: More tablescanning fun  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: More tablescanning fun
Re: More tablescanning fun
List pgsql-performance
On Thu, Apr 24, 2003 at 07:58:30PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <jim@nasby.net> writes:
> > It seems like the metrics used for the cost of index scanning v. table
> > scanning on large tables need to be revisited. It might be such a huge
> > difference in this case because the table is essentially clustered on
> > the primary key.
>
> Probably.  What does the correlation figure in pg_stats show as?

stats=# select attname, correlation from pg_stats where
tablename='email_contrib';
  attname   | correlation
  ------------+-------------
  project_id |           1
  id         |    0.449204
  date       |    0.271775
  team_id    |    0.165588
  work_units |   0.0697928

> There's been some previous debate about the equation used to correct
> for correlation, which is certainly bogus (I picked it more or less
> out of the air ;-)).  But so far no one has proposed a replacement
> equation with any better foundation ... take a look in
> src/backend/optimizer/path/costsize.c if you want to get involved.

Are you reffering to the PF formula?

> > Also, is there a TODO to impliment
> > real clustered indexes?
>
> No.  It's not apparent to me how you could do that without abandoning
> MVCC, which we're not likely to do.

Hmm... does MVCC mandate inserts go at the end? My understanding is that
each tuple indicates it's insert/last modified time; if this is the
case, why would a true clustered index break mvcc? I guess an update
that moves the tuple would be tricky, but I'm guesing there's some kind
of magic that could happen there... worst case would be adding an
'expired' timestamp.

On the other hand, it might be possible to get the advantages of a
clustered index without doing a *true* clustered index. The real point
is to be able to use indexes; I've heard things like 'if you need to
access more than 10% of a table then using an index would be
disasterous', and that's not good... that number should really be over
50% for most reasonable ratios of fields indexed to fields in table (of
course field size plays a factor).
--
Jim C. Nasby (aka Decibel!)                    jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: More tablescanning fun
Next
From: Tom Lane
Date:
Subject: Re: More tablescanning fun