Thread: Performance of CLUSTER
What is the expected performance of CLUSTER and which tuning parameters have the most effect? I have a set of 5 tables with identical structure (all inherit a common table). The sizes given are total relation size. The clustering index is a GiST index on a (non-null) geography(linestring) column.

1. 327600 rows, 105MB, 15.8s
2. 770165 rows, 232MB, 59.5s
3. 1437041 rows, 424MB, 140s
4. 3980922 rows, 1167MB, 276s
5. 31843368 rows, 9709MB, ~10 hours

Server is version 9.1 with PostGIS 1.5.4.

Regards,
Mark Thornton
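For reference, timings like the ones above can be reproduced with psql's `\timing`. The table and index names below are hypothetical, not taken from the thread:

```sql
-- Hypothetical names: roads_4 stands in for one of the inherited
-- tables, roads_4_geom_gist for its GiST index on the geography column.
\timing on

-- Reorder the heap to match the index; USING also records the index
-- as the table's clustering index for future bare CLUSTER calls.
CLUSTER roads_4 USING roads_4_geom_gist;

-- Total relation size (heap + indexes + TOAST), as quoted above.
SELECT pg_size_pretty(pg_total_relation_size('roads_4'));
```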
On 06/10/2012 03:20 AM, Mark Thornton wrote:

> 4. 3980922 rows, 1167MB, 276s
> 5. 31843368 rows, 9709MB, ~10 hours

Just judging by the difference between these two, it would appear to be from a lot of temp space thrashing. An order of magnitude more rows shouldn't take over 100x longer to cluster, even with GiST. What's your maintenance_work_mem setting?

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email
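As a sketch (the 1GB value is illustrative, not from the thread), the setting can be inspected and raised for a single session before running CLUSTER:

```sql
-- Show the current value for this session.
SHOW maintenance_work_mem;

-- Raise it for this session only; sorts during index rebuilds run in
-- memory up to this limit before spilling to base/pgsql_tmp.
SET maintenance_work_mem = '1GB';
```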
On 11/06/12 14:52, Shaun Thomas wrote:

> On 06/11/2012 08:46 AM, Mark Thornton wrote:
>
>> 500m --- though it isn't clear if cluster uses maintenance memory or
>> the regular work memory. I could readily use a higher value for
>> maintenance_work_mem.
>
> For an operation like that, having a full GB wouldn't hurt. Though if
> you haven't already, you might think about pointing

I didn't think the process was using even the 500m it ought to have had available, whereas creating an index did appear to use that much. Note though that I didn't stay up all night watching it!

> your pgsql_tmp to /dev/shm for a while, even for just this operation.
>
> Then again, is your CPU at 100% during the entire operation?

No, the CPU utilization is quite low. Most of the time is spent waiting for IO.

> If it's not fetching anything from disk or writing out much, reducing
> IO won't help. :) One deficiency we've had with CLUSTER is that it's a
> synchronous operation. It does each thing one after the other. So
> it'll organize the table contents, then it'll reindex each index
> (including the primary key) one after the other. If you have a lot of
> those, that can take a while, especially if any composite or complex
> indexes exist.

In this case there are only two indexes: the GiST one and a primary key (on a bigint value).

> You might actually be better off running parallel REINDEX commands on
> the table (I wrote a script for this because we have a 250M row table
> where each index takes 1-2.5 hours to build). You might also consider
> pg_reorg, which seems to handle some parts of a table rebuild a little
> better.
>
> That should give you an escalation pattern, though. :)

Thanks for your help,
Mark Thornton
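The parallel-REINDEX suggestion can be sketched in shell. The database and index names here are hypothetical placeholders; a real script would enumerate the indexes from pg_stat_user_indexes or `\di`:

```shell
#!/bin/sh
# Hypothetical sketch: rebuild each index of a table in its own psql
# backend so the builds run concurrently, instead of one after the
# other as CLUSTER does. All names below are illustrative.
DB=mydb

psql -d "$DB" -c 'REINDEX INDEX roads_4_geom_gist;' &
psql -d "$DB" -c 'REINDEX INDEX roads_4_pkey;' &

# Block until both background backends have finished.
wait
```

Note that each plain REINDEX takes a lock on its index, so this is best run in a maintenance window.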
On 06/11/2012 09:02 AM, Mark Thornton wrote:

> I didn't think the process was using even the 500m it ought to have
> had available, whereas creating an index did appear to use that much.
> Note though that I didn't stay up all night watching it!

You'd be surprised. If you look in your base/pgsql_tmp directory during a cluster of that table (make a copy of it if you don't want to interfere with a running system), you should see that directory fill with temporary structures, mostly during the index rebuild portions.

It also wouldn't hurt to bootstrap the system cache with the contents of that table. Do an EXPLAIN ANALYZE on a SELECT * with no WHERE clause and all of that table should be in memory.

Oh, actually that reminds me... does your 10GB table fit into memory? If not, that might explain it right there.

--
Shaun Thomas
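The cache-warming step can be sketched like this (table name hypothetical; on 9.1 there is no pg_prewarm, so a full scan is the usual trick):

```sql
-- Force a full sequential scan so the table's pages are pulled into
-- shared buffers and the OS page cache; BUFFERS shows how many pages
-- were read from disk versus found in cache.
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM roads_4;

-- While CLUSTER then runs, temp spill can be watched from the shell:
--   du -sh $PGDATA/base/pgsql_tmp
```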
On 06/11/2012 09:25 AM, Mark Thornton wrote:

> Certainly not --- the server only has 5GB of memory. Nevertheless I
> don't expect quadratic behaviour for CLUSTER (n log n would be my
> expected time).

And there it is. :)

Since that's the case, *DO NOT* create the symlink from pgsql_tmp to /dev/shm like I suggested before. You don't have enough memory for that, and it will likely cause problems. I need to stop assuming everyone has huge servers. I know low-end laptops have 4GB of RAM these days, but servers have longer shelf-lives, and VMs can play larger roles.

So here's the thing, and I honestly should have realized it the second I noted the >100x jump in execution time. All of your previous tables fit in memory. Nice, speedy, >100x-faster-than-disk memory. It's not that the table is only 10x larger than the other tables in your examples, it's that the entire thing doesn't fit in memory. Since it can't just read the table and assume it's in memory, reads have a chance of fetching from disk. Since it's also maintaining several temporary files for the new index and replacement table structures, it's fighting for random reads and writes during the whole process. That's in addition to any transaction log traffic and checkpoints, since the process will span several.

Actually, your case is a good illustration of how memory and high-performance IO devices can reduce maintenance costs. If you played around with steadily increasing table sizes, I bet you could even find the exact row count and table size at which the table no longer fits in PostgreSQL or OS cache and suddenly takes 100x longer to process. That kind of steady table growth is often seen in databases, and admins sometimes see this without understanding why it happens.

--
Shaun Thomas
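The fits-in-memory question can be checked directly against the server's RAM (table name hypothetical):

```sql
-- Full on-disk footprint of the table (heap + indexes + TOAST);
-- compare this against physical RAM (5GB here) and shared_buffers.
SELECT pg_size_pretty(pg_total_relation_size('roads_5')) AS table_size;

-- The portion of the cache PostgreSQL itself manages.
SHOW shared_buffers;
```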
On Mon, 2012-06-11 at 08:42 -0500, Shaun Thomas wrote:

> On 06/10/2012 03:20 AM, Mark Thornton wrote:
>
>> 4. 3980922 rows, 1167MB, 276s
>> 5. 31843368 rows, 9709MB, ~10 hours
>
> Just judging based on the difference between these two, it would appear
> to be from a lot of temp space thrashing. An order of magnitude more
> rows shouldn't take over 100x longer to cluster, even with GiST. What's
> your maintenance_work_mem setting?

GiST can have a large impact depending on all kinds of factors, including data distribution. 9.2 contains some important improvements in GiST build times for cases where the index doesn't fit in memory.

Mark, can you please try your experiments on the 9.2 beta and tell us whether that helps you?

Regards,
Jeff Davis