Re: More tablescanning fun - Mailing list pgsql-performance
From | Jim C. Nasby |
---|---|
Subject | Re: More tablescanning fun |
Date | |
Msg-id | 20030424235924.B66185@flake.decibel.org Whole thread Raw |
In response to | Re: More tablescanning fun (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: More tablescanning fun
Re: More tablescanning fun |
List | pgsql-performance |
On Thu, Apr 24, 2003 at 07:58:30PM -0400, Tom Lane wrote: > "Jim C. Nasby" <jim@nasby.net> writes: > > It seems like the metrics used for the cost of index scanning v. table > > scanning on large tables need to be revisited. It might be such a huge > > difference in this case because the table is essentially clustered on > > the primary key. > > Probably. What does the correlation figure in pg_stats show as? stats=# select attname, correlation from pg_stats where tablename='email_contrib'; attname | correlation ------------+------------- project_id | 1 id | 0.449204 date | 0.271775 team_id | 0.165588 work_units | 0.0697928 > There's been some previous debate about the equation used to correct > for correlation, which is certainly bogus (I picked it more or less > out of the air ;-)). But so far no one has proposed a replacement > equation with any better foundation ... take a look in > src/backend/optimizer/path/costsize.c if you want to get involved. Are you reffering to the PF formula? > > Also, is there a TODO to impliment > > real clustered indexes? > > No. It's not apparent to me how you could do that without abandoning > MVCC, which we're not likely to do. Hmm... does MVCC mandate inserts go at the end? My understanding is that each tuple indicates it's insert/last modified time; if this is the case, why would a true clustered index break mvcc? I guess an update that moves the tuple would be tricky, but I'm guesing there's some kind of magic that could happen there... worst case would be adding an 'expired' timestamp. On the other hand, it might be possible to get the advantages of a clustered index without doing a *true* clustered index. The real point is to be able to use indexes; I've heard things like 'if you need to access more than 10% of a table then using an index would be disasterous', and that's not good... that number should really be over 50% for most reasonable ratios of fields indexed to fields in table (of course field size plays a factor). -- Jim C. Nasby (aka Decibel!) jim@nasby.net Member: Triangle Fraternity, Sports Car Club of America Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?"
pgsql-performance by date: