Thread: regarding CLUSTER and HUGE work_mem / maintenance_work_mem
Let's say I have a 7GB table with 3-4 indices for a total of 10-12GB. Furthermore, let's say I have a machine with sufficient memory for me to set the work_mem and maintenance_work_mem to 20GB (just for this session). When I issue a CLUSTER using one of the indices, I see PostgreSQL (by way of strace) performing an index scan which amounts to large quantities of random I/O. In my case, that means it takes a very, very long time. PostgreSQL is largely at defaults, except for a 2GB shared_buffers and a few unrelated changes. The system itself has 32GB of physical RAM and has plenty free. Why didn't PostgreSQL just read the table into memory (and the interesting index) as a sequential scan, sort, and then write it out? It seems like there would be more than enough memory for that. The sequential I/O rate on this machine is 50-100x the random I/O rate. I'm using 8.4.10 (with the 'inet' de-toasting patch) on Scientific Linux 6.1. -- Jon
On 27.01.2012 19:43, Jon Nelson wrote: > Let's say I have a 7GB table with 3-4 indices for a total of 10-12GB. > Furthermore, let's say I have a machine with sufficient memory for me > to set the work_mem and maintenance_work_mem to 20GB (just for this > session). > When I issue a CLUSTER using one of the indices, I see PostgreSQL (by > way of strace) performing an index scan which amounts to large > quantities of random I/O. > In my case, that means it takes a very, very long time. PostgreSQL is > largely at defaults, except for a 2GB shared_buffers and a few > unrelated changes. The system itself has 32GB of physical RAM and has > plenty free. > Why didn't PostgreSQL just read the table into memory (and the > interesting index) as a sequential scan, sort, and then write it out? > It seems like there would be more than enough memory for that. The > sequential I/O rate on this machine is 50-100x the random I/O rate. > > I'm using 8.4.10 (with the 'inet' de-toasting patch) on Scientific Linux 6.1. The suppport for doing a seqscan+sort in CLUSTER was introduced in version 9.1. Before that, CLUSTER always did an indexscan. See release notes: http://www.postgresql.org/docs/9.1/static/release-9-1.html#AEN107416 -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Fri, Jan 27, 2012 at 12:05 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > On 27.01.2012 19:43, Jon Nelson wrote: >> >> Let's say I have a 7GB table with 3-4 indices for a total of 10-12GB. >> Furthermore, let's say I have a machine with sufficient memory for me >> to set the work_mem and maintenance_work_mem to 20GB (just for this >> session). >> When I issue a CLUSTER using one of the indices, I see PostgreSQL (by >> way of strace) performing an index scan which amounts to large >> quantities of random I/O. >> In my case, that means it takes a very, very long time. PostgreSQL is >> largely at defaults, except for a 2GB shared_buffers and a few >> unrelated changes. The system itself has 32GB of physical RAM and has >> plenty free. >> Why didn't PostgreSQL just read the table into memory (and the >> interesting index) as a sequential scan, sort, and then write it out? >> It seems like there would be more than enough memory for that. The >> sequential I/O rate on this machine is 50-100x the random I/O rate. >> >> I'm using 8.4.10 (with the 'inet' de-toasting patch) on Scientific Linux >> 6.1. > > > The suppport for doing a seqscan+sort in CLUSTER was introduced in version > 9.1. Before that, CLUSTER always did an indexscan. See release notes: > http://www.postgresql.org/docs/9.1/static/release-9-1.html#AEN107416 That's what I get for digging through the source (git) but working with 8.4.10, on a Friday, at the end of a long week. Thanks for pointing that out to somebody that should have known better. -- Jon
On Fri, Jan 27, 2012 at 7:34 PM, Jon Nelson <jnelson+pgsql@jamponi.net> wrote: > On Fri, Jan 27, 2012 at 12:05 PM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: >> On 27.01.2012 19:43, Jon Nelson wrote: >>> >>> Let's say I have a 7GB table with 3-4 indices for a total of 10-12GB. >>> Furthermore, let's say I have a machine with sufficient memory for me >>> to set the work_mem and maintenance_work_mem to 20GB (just for this >>> session). >>> When I issue a CLUSTER using one of the indices, I see PostgreSQL (by >>> way of strace) performing an index scan which amounts to large >>> quantities of random I/O. >>> In my case, that means it takes a very, very long time. PostgreSQL is >>> largely at defaults, except for a 2GB shared_buffers and a few >>> unrelated changes. The system itself has 32GB of physical RAM and has >>> plenty free. >>> Why didn't PostgreSQL just read the table into memory (and the >>> interesting index) as a sequential scan, sort, and then write it out? >>> It seems like there would be more than enough memory for that. The >>> sequential I/O rate on this machine is 50-100x the random I/O rate. >>> >>> I'm using 8.4.10 (with the 'inet' de-toasting patch) on Scientific Linux >>> 6.1. >> >> >> The suppport for doing a seqscan+sort in CLUSTER was introduced in version >> 9.1. Before that, CLUSTER always did an indexscan. See release notes: >> http://www.postgresql.org/docs/9.1/static/release-9-1.html#AEN107416 > > That's what I get for digging through the source (git) but working > with 8.4.10, on a Friday, at the end of a long week. > Thanks for pointing that out to somebody that should have known better. But if you're stuck on < 9.1 for a while, the workaround is to cluster the table yourself by using a select * ... order by pkey. For randomly distributed tables this is far faster for a first time cluster. After that, subsequent clusters won't have as much work to do and the older method for clustering should work ok. It's kinda funny to have a complaint against pgsql for NOT using a sequential scan. Most DBAs that come from other DBAs are upset when it doesn't use an index.