Re: CLUSTER versus a dedicated table - Mailing list pgsql-performance
From | mark |
---|---|
Subject | Re: CLUSTER versus a dedicated table |
Date | |
Msg-id | 005b01cc20c2$f041f360$d0c5da20$@com Whole thread Raw |
In response to | CLUSTER versus a dedicated table (Robert James <srobertjames@gmail.com>) |
List | pgsql-performance |
> -----Original Message----- > From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance- > owner@postgresql.org] On Behalf Of Robert James > Sent: Wednesday, June 01, 2011 5:55 PM > To: pgsql-performance@postgresql.org > Subject: [PERFORM] CLUSTER versus a dedicated table > > Hi. I'm interested in understanding the differences between > CLUSTERing a table and making a dedicated one. > > We have a table with about 1 million records. On a given day, only > about 1% of them are of interest. That 1% changes every day (it's > WHERE active_date = today), and so we index and cluster on it. > > Even so, the planner shows a very large cost for the Index Scan: about > 3500. If I instead do a SELECT INTO temp_table FROM big_table WHERE > active_date = today, and then do SELECT * FROM temp_table, I get a > planned cost of 65. Yet, the actual time for both queries is almost > identical. > > Questions: > 1. Why is there such a discrepancy between the planner's estimate and > the actual cost? > > 2. In a case like this, will I in general see a performance gain by > doing a daily SELECT INTO and then querying from that table? My ad hoc > test doesn't indicate I would (despite the planner's prediction), and > I'd rather avoid this if it won't help. > > 3. In general, does CLUSTER provide all the performance benefits of a > dedicated table? If it doesn't, what does it lack? > > Thank you. Start here: http://wiki.postgresql.org/wiki/SlowQueryQuestions 1: there could be many reasons for the planner to come up a grossly inaccurate ESTIMATE for some values. Last time the table was analyzed, is usually where people start. 2: look at table partitioning it's pretty straight forward and sounds like it might be a good fit for you. It will however involve some triggers or rules and check constraints. Table partitioning has some downsides though, you should be aware of what they are before you commit to it. 3: clustering, from a high level, just reorders the data on disk by a given index. Depending on your setup keeping it close to that clustered ordering might be trivial or it might not be. Big tables are relative to different people, 1M rows might be a big table or it might not be, since you didn't post the size of the table and indexes we can only guess. Table partitioning helps most with table maintenance, IMO, but can be very useful it the constraint exclusion can eliminate a large number of child tables right off so it doesn't have to traverse large indexes or do lots of random IO. You will need to post at lot more specific info if you want more specific help. The guide to reporting slow queries or guide to reporting problems and start gathering specific information and then post back to the list. -Mark > > -- > Sent via pgsql-performance mailing list (pgsql- > performance@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-performance
pgsql-performance by date: