Thread: Questions about vacuum analyze
System: Compaq ProLiant 3000, dual 336MHz Pentium II, 512MB RAM
O/S: Linux 2.2.6 (Slackware 4.0 dist)
DB: PostgreSQL 6.5.1
DB on 8GB 4-disk stripe-set using md.
DB with two tables, each with about 1M rows. Current DB size about 1.5GB.

I am having trouble deciding when to run vacuum analyze, how often, and what
state the system should be in when I run it. I have found that if I run it
once a night, on a system that is quiet (no other processes connected to the
DB), it will still take several (4-8) hours.

Does anyone have recommendations regarding vacuum analyze? Specifically:

1) Should it be run on a system that is quiet, or will it run acceptably with
other processes accessing the DB?
2) How often should it be run?

Thanks,

--
Steven Wheeler
UNIX Engineering
(918) 292-4119
"Steven M. Wheeler" <swheeler@sabre.com> writes: > Does anyone have recommendations regarding vacuum analyze? > Specifically: > 1) Should it be run on a system that is quiet or will it run acceptably > with other processes accessing the DB? Vacuum will run fine, but it obtains an exclusive lock on each table while it is working on that table. You are likely to find your other processes waiting for the lock... > 2) How often should it be run? How fast does your database change? I'd doubt that running vacuum analyze, as opposed to a plain vacuum, needs to be done very often --- it's unlikely that the statistics vacuum analyze measures are changing that fast, especially not in million-row tables. The other function of vacuum is reclaiming space from dead tuples, and the need for that depends on how often you update or delete tuples. I'm just guessing here, but a rule of thumb might be that it's worth vacuuming when something like 20% of the tuples in your table are dead, ie, the number of updates/deletes you've done since last vacuum is about 20% of the table row count. 6.5 seems to have some performance problems with vacuuming large indexes, BTW. We are looking into that, but in the meantime you might experiment with dropping indexes on a table, vacuum table, recreating indexes to see if that's faster than a straight vacuum. regards, tom lane
System: Compaq ProLiant 3000 with 2 300MHz PentPro, 512MB RAM, 32GB
hardware-based stripe-set for the DB area.
OS: Linux kernel 2.2.12
DB: version 6.5

Thanks for the info; thought I'd send a followup regarding my ongoing problems
with vacuum.

After the last email, I vacuumed the DB a couple of times, with indexes
dropped. The first time it completed in 2-3 hours; the next time was
appreciably longer. After this I stopped doing anything except inserts, trying
to catch up on the incoming data. This went on for a little over a week, until
I had inserted several million additional rows (it's now up to 31M+). I then
dropped the indexes and started vacuum again.

Start:  09/28/1999@10:06:57
Finish: 09/30/1999@19:13:14

33 hours - WOW!

This morning, I rebuilt the indexes and tried to do a "select distinct
statdate from currnt;". This select statement has been running for several
hours now. The only output has been the following message, repeated twice:

NOTICE: BufFileRead: should have flushed after writing

The system is running at about 50% utilization. I have checked everything I
can think of, and at this point I have very little hair left to lose. What the
heck am I doing wrong?!?!?!? Have I mentioned how desperate I am ;-}

--
Steven Wheeler
Mid-Range UNIX Engineering
Sabre Inc.
(918) 292-4119
> System: Compaq ProLiant 3000 with 2 300MHz PentPro, 512MB RAM, 32GB hardware
> based stripe-set for the DB area.
> OS: Linux kernel 2.2.12
> DB: version 6.5
>
> Thanks for the info, thought I'd send a followup regarding my ongoing
> problems with vacuum.
>
> After the last email, I vacuumed the DB a couple of times, with indexes
> dropped. The first time it completed in 2-3 hours, the next time was
> appreciably longer. After this I stopped doing anything except for inserts,
> trying to catch up on the incoming data. This went on for a little over a
> week until I had inserted several million additional rows (its now up to
> 31M+). I then dropped the indexes and started vacuum again.
> Start: 09/28/1999@10:06:57
> Finish: 09/30/1999@19:13:14
>
> 33 hours - WOW!

Yikes, that is a long time.

> This morning, I rebuilt the indexes and tried to do a "select distinct
> statdate from currnt;" This select statement has been running for several

SELECT DISTINCT is going to take forever because it has to read all rows, then
sort them to remove the duplicates. That could take some time. DISTINCT
doesn't use the indexes, because if it did, it would be paging in all over the
place. Better to read the table sequentially, then sort.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
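As a quick sanity check before kicking off the long-running query again, the
planner can be asked what it intends to do (a sketch; the database name is a
stand-in, and the exact EXPLAIN output format varies between versions):

    # The plan should show a sequential scan feeding a sort, with duplicates
    # removed on top -- not an index scan.
    psql mydb -c "EXPLAIN SELECT DISTINCT statdate FROM currnt;"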
"Steven M. Wheeler" <swheeler@sabre.com> writes: > DB: version 6.5 > 33 hours [to vacuum] - WOW! Yipes. We did find and fix a glitch in memory allocation in vacuum (causing both slowness and excessive memory usage) between 6.5 and 6.5.2. You might try updating to see if it's gotten any better... regards, tom lane
Boy this is fun! ;-}

I have upgraded to 6.5.2 as you suggested. Binaries are built and I am now
re-running my vacuum just to give myself a benchmark starting point.

Just wanted to let you know that, for perl5, when I tried to run "make test"
after having built the perl kit, I get the following:

cobra:/opt/pgsql/postgresql-6.5.2/src/interfaces/perl5 $ make test
PERL_DL_NONLAZY=1 /usr/bin/perl -Iblib/arch -Iblib/lib -I/usr/lib/perl5/i386-linux -I/usr/lib/perl5 test.pl
Pg::conndefaults ........ ok
Pg::connectdb ........... ok
$conn->exec ............. ok
$conn->errorMessage ..... ok
pqReadData() -- backend closed the channel unexpectedly.
        This probably means the backend terminated abnormally
        before or while processing the request.
make: *** [test_dynamic] Error 255

Am I better off re-writing my perl scripts in C? I've been thinking about it
in hopes of boosting the performance.

--
Steven Wheeler
UNIX Engineering
(918) 292-4119
Tom,

I believe that 6.5.2 and a little space reclamation in my DB directory may
have taken care of the vacuum problem. The last vacuum ran in about 10 minutes
on an already-vacuumed DB. I am now loading another 400K inserts and will
perform the vacuum again at the conclusion.

A couple of questions remain.

1) With 6.5.2, should I still drop the indexes prior to running vacuum?

2) Is there a command/script that will clean up the temp files and previous
table files in the DB directory? The reason I ask: I now have 3 copies of my
currnt table file (currnt, currnt.1, currnt.2) for a total usage of about 3GB.
Since it appears that currnt.2 is the one being accessed, can I safely delete
currnt & currnt.1?

Thanks again for all the help!

--
Steven Wheeler
Mid-Range UNIX Engineering
Sabre Inc.
(918) 292-4119
"Steven M. Wheeler" <swheeler@sabre.com> writes: > I believe that 6.5.2 and a little space reclamation in my DB directory may > have taken care of the vacuum problem. The last vacuum ran in about 10 > minutes on an already vacuumed DB. Ah, that's more like it... > 1) With 6.5.2 should I still drop the indexes prior to running vacuum? Probably, but I'm not sure. Try it both ways and see. > 2) Is there a command/script that will cleanup the temp files and previous > table files in the DB directory. The reason I ask: I now have 3 copies of > my currnt table file (currnt, currnt.1, currnt.2) for a total usage of > about 3GB. Since it appears that currnt.2 is the one being accessed, can I > safely delete currnt & currnt.1? NO NO NO NO NO!!!! Files named like that are NOT temp files!! What they are are segments of a large table. We segment big tables into gigabyte-sized chunks to avoid problems on systems that have an int32-related limit on the size of individual files (which is most Unixes these days, I think). If you see anything named like pg_tempNNN.NNN, then that really is a temp file, and if it's not got a very recent mod time then it's probably left over from a crashed backend. Old temp files should be safe to get rid of. (IIRC, one of the components of the name is the PID of the creating backend, so you can check to be sure that the backend is no longer around if you want to be doubly sure.) regards, tom lane
More information on the continuing vacuum saga:

The latest run, after having processed an additional 40K+ msgs (4M+ inserts):

Start:    10/06/1999@07:28:07
Complete: 10/06/1999@21:17:32

About 14 hours. Two tables. The currnt table has 39.5M+ rows in it; select
count(*) took 14m41s. The history table only has 190K rows.

I am now running into another problem. I need to do a select distinct from
currnt to get dates for maintenance activity (rows to summarize and move to
history). Whenever I do, I get this message in my server.log file:

NOTICE: BufFileRead: should have flushed after writing

And I get 0 records selected. I can't find this message in the docs. Any
suggestions? Should I submit this on the SQL list?

Thanks, your help is greatly appreciated!

--
Steven Wheeler
Mid-Range UNIX Engineering
Sabre Inc.
(918) 292-4119
"Steven M. Wheeler" <swheeler@sabre.com> writes: > I am now running into another problem. I need to do select distinct from the > currnt to get dates for maintenance activity (rows to summarize and move to > history). When ever I do I get this message in my server.log file: > NOTICE: BufFileRead: should have flushed after writing Hmph --- just looking at the code that issues that message, it looks like it is a harmless gripe (it complains, but then goes and does the missing step). However, it's suggestive that there may be a logic error in the sorting code that's calling the BufFileXXX routines. > And I get 0 records selected. And that's even more suggestive ;-) This is going to be tough to debug if it takes a multi-gig database to replicate. But if I'm guessing right about where the problem is, it probably can be replicated on smaller tables if you use a smaller -S (sort memory limit) setting when starting the backend. You can set -S as small as 32 (kb), and having done that the bug might show up on tables with a few hundred K of data. Would you try that and let me know? The easiest way to experiment is to set -S on a per-backend basis via the PGOPTIONS environment variable, egsetenv PGOPTIONS "-S 32"psql ... The backend started for this psql session will set S=32 instead of whatever the prevailing default is. regards, tom lane
"Steven M. Wheeler" <swheeler@sabre.com> writes: > I am now running into another problem. I need to do select distinct from the > currnt to get dates for maintenance activity (rows to summarize and move to > history). When ever I do I get this message in my server.log file: > NOTICE: BufFileRead: should have flushed after writing > And I get 0 records selected. After some code browsing I have come up with a theory: is it possible that you are running out of disk space during the sort? I see psort.c neglects to check for write failure :-(, although I am not entirely clear on why that particular message and nothing else would come out after it bulled ahead with an incomplete temporary file. Sorting a large table seems to require temp disk space equal to several times the size of the table. Since you said your table is 40M rows, I can easily believe you don't have that much free space... I don't believe anyone has taken a hard look at psort.c in a long time. I will try to dive into it and see if its resource consumption can't be reduced. Mergesort inherently requires 2X the original data space, but it seems that psort is using even more than that, and perhaps this can be improved. regards, tom lane
I wrote:
> "Steven M. Wheeler" <swheeler@sabre.com> writes:
>> I am now running into another problem. I need to do select distinct from the
>> currnt to get dates for maintenance activity (rows to summarize and move to
>> history). When ever I do I get this message in my server.log file:
>>
>> NOTICE: BufFileRead: should have flushed after writing
>>
>> And I get 0 records selected.

> After some code browsing I have come up with a theory: is it possible
> that you are running out of disk space during the sort? I see psort.c
> neglects to check for write failure :-(, although I am not entirely
> clear on why that particular message and nothing else would come out
> after it bulled ahead with an incomplete temporary file.

> Sorting a large table seems to require temp disk space equal to several
> times the size of the table. Since you said your table is 40M rows,
> I can easily believe you don't have that much free space...

A further thought: 40M rows could well be more than 4 gig of data. Although
Postgres supports tables that large (by segmenting them into 1-gig-apiece
files to avoid OS limitations on file size), we currently do not cope with
temporary sort files that exceed 4 gig, which means a "select distinct" on
that much data will fail regardless of whether you have enough free disk
space :-(. We have a TODO item to fix this.

After looking at psort.c, I see a bunch of other shortcomings, one being that
it's using a polyphase merge algorithm --- which was hot stuff back when
people did this sort of thing on magnetic tape drives, but it's less than
appropriate for sorting on disk. The worst problem with it is that the space
consumption is about 4x the actual data volume, which is not too cool when
you're talking about a huge file to begin with...

Meanwhile, there is a completely separate implementation of external sorting
over in nbtsort.c (where it's only used for CREATE INDEX, AFAICS), with a
somewhat different set of deficiencies.

I think I'll add "rewrite sort code from the ground up" to my to-do list ;-)

			regards, tom lane
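Rough numbers make the problem concrete (the row width used here is an
assumption for illustration, not a figure from the thread; the real value is
the table size divided by the row count):

    #!/bin/sh
    # Back-of-the-envelope temp-space estimate for a polyphase-merge sort.
    ROWS=40000000
    WIDTH=100    # assumed bytes per row -- substitute the real row width
    echo "$ROWS $WIDTH" | awk '{
        data_gb = $1 * $2 / (1024 * 1024 * 1024)
        printf "data ~ %.1f GB; polyphase temp space ~ %.1f GB (about 4x)\n", data_gb, 4 * data_gb
    }'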
> A further thought: 40M rows could well be more than 4gig of data.
> Although Postgres supports tables that large (by segmenting them into
> 1gig-apiece files to avoid OS limitations on file size), we currently
> do not cope with temporary sort files that exceed 4 gig, which means
> a "select distinct" on that much data will fail regardless of whether
> you have enough free disk space :-(. We have a TODO item to fix this.
>
> After looking at psort.c, I see a bunch of other shortcomings, one being
> that it's using a polyphase merge algorithm --- which was hot stuff back
> when people did this sort of thing on magnetic tape drives, but it's
> less than appropriate for sorting on disk. The worst problem with it is
> that the space consumption is about 4x the actual data volume, which is
> not too cool when you're talking about a huge file to begin with...
>
> Meanwhile, there is a completely separate implementation of external
> sorting over in nbtsort.c (where it's only used for CREATE INDEX,
> AFAICS), with a somewhat different set of deficiencies.
>
> I think I'll add "rewrite sort code from the ground up" to my to-do
> list ;-)

I have on TODO:

	Make index creation use psort code, because it is now faster (Vadim)

I didn't know sorting algorithms for tape and disk had different
optimizations. I figured the paging in of disk blocks had a similar penalty to
tape rewinding. None of us really knows a lot about the best algorithm for
that job. Nice you recognized it.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> I didn't know sorting algorithms for tape and disk had different
> optimizations. I figured the paging in of disk blocks had a similar
> penalty to tape rewinding. None of us really knows a lot about the best
> algorithm for that job. Nice you recognized it.

I didn't know much about it either until the comments in psort.c led me to
read the relevant parts of Knuth. Tape rewind and disk seek times are not
remotely comparable, especially when you're talking about N tape units versus
one disk unit --- using more files (tapes) to minimize rewind time can be a
big win on tapes, but put those same files on a disk and you're just spreading
the seek time around differently. More to the point, though, tape-oriented
algorithms tend to assume that tape space is free. Polyphase merge doesn't
care in the least that it has several copies of any given record laying about
on different tapes, only one of which is of interest anymore.

I know how to get the sort space requirement down to 2x the actual data volume
(just use a balanced merge instead of the "smarter" polyphase), but I am
thinking about ways to get it down to data volume plus a few percent with an
extra layer of bookkeeping. The trick is to release and recycle space as soon
as we have read in the tuples stored in it...

			regards, tom lane
> I didn't know much about it either until the comments in psort.c led me
> to read the relevant parts of Knuth. Tape rewind and disk seek times

Yes, that's what I did too.

> are not remotely comparable, especially when you're talking about N tape
> units versus one disk unit --- using more files (tapes) to minimize
> rewind time can be a big win on tapes, but put those same files on a
> disk and you're just spreading the seek time around differently. More
> to the point, though, tape-oriented algorithms tend to assume that tape
> space is free. Polyphase merge doesn't care in the least that it has
> several copies of any given record laying about on different tapes, only
> one of which is of interest anymore.
>
> I know how to get the sort space requirement down to 2x the actual data
> volume (just use a balanced merge instead of the "smarter" polyphase)
> but I am thinking about ways to get it down to data volume + a few
> percent with an extra layer of bookkeeping. The trick is to release
> and recycle space as soon as we have read in the tuples stored in it...

The thing I liked about the existing algorithm is that it did the sorting
without the page faulting/thrashing caused by many sort algorithms. I can see
better space reuse being a big win, because we would get even less page
faulting: the replaced page would already be in memory.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> The thing I liked about the existing algorithm is that it did the
> sorting without the page faulting/thrashing caused by many sort
> algorithms.

Yes, I think we want to stick with the basic architecture of an initial
replacement-selection filter operating in limited memory, followed by merge
passes. I'm just thinking about how to reduce the amount of disk space chewed
up by intermediate merge results...

			regards, tom lane
> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> > The thing I liked about the existing algorithm is that it did the
> > sorting without the page faulting/thrashing caused by many sort
> > algorithms.
>
> Yes, I think we want to stick with the basic architecture of an
> initial replacement-selection filter operating in limited memory,
> followed by merge passes. I'm just thinking about how to reduce
> the amount of disk space chewed up by intermediate merge results...

Wonder how our create index does it. Seems it must be similar.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> Wonder how our create index does it. Seems it must be similar.

CREATE INDEX (for btrees) is very similar, and really ought to share code.
Someone apparently didn't want to figure out how to generalize psort.c to
handle index tuples, though. The CREATE INDEX code is a little better for
large amounts of data but a lot worse for small amounts --- AFAICT it doesn't
take any shortcuts, even when everything fits in memory. (I'm sure you've
noticed that CREATE INDEX hits the disk pretty hard even when the source table
is empty :-(.)

I'm planning to merge the two sets of code and keep the best features of both.
I already have some first-draft code that allows them to work with >2-gig data
sets.

			regards, tom lane
> Bruce Momjian <maillist@candle.pha.pa.us> writes:
> > Wonder how our create index does it. Seems it must be similar.
>
> CREATE INDEX (for btrees) is very similar, and really ought to share
> code. Someone apparently didn't want to figure out how to generalize
> psort.c to handle index tuples, though. The CREATE INDEX code is a
> little better for large amounts of data but a lot worse for small
> amounts --- AFAICT it doesn't take any shortcuts, even when everything
> fits in memory. (I'm sure you've noticed that CREATE INDEX hits the
> disk pretty hard even when the source table is empty :-(.)
>
> I'm planning to merge the two sets of code and keep the best features of
> both. I already have some first-draft code that allows them to work
> with >2gig data sets.

Great.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Tom,

I tried your suggestion for -S 32 through -S 1, running a select distinct
against my history table. I normally run with -S 2048 on the startup of the
master process. FYI: the history table uses the same layout as the currnt
table; it's just an hourly summation of the minute-by-minute data in currnt.
history has 189,724 records.

Bottom line: the select never failed.

Thanks

--
Steven Wheeler
Mid-Range UNIX Engineering
Sabre Inc.
(918) 292-4119
"Steven M. Wheeler" <swheeler@sabre.com> writes: > I tried your suggestion for -S 32 through -S 1 running a select > distinct against my history table. I normally run with -S 2048 on the > startup of the master process. FYI: the history table uses the same > layout as the currnt table, its just an hourly summation of the minute > by minute data in currnt. history has 189,724 records. > Bottom line: The select never failed. My current theory is that you were running out of disk space or else running up against a 4-gig-per-temp-file limit in the sort that's done for SELECT DISTINCT. So -S wouldn't really affect matters, and testing on a smaller table definitely won't prove anything one way or the other. BTW, I believe -S 32 is the smallest value the backend will actually accept --- it just ignores any attempt to set a smaller value. regards, tom lane
Space taken up by entire DB:    6.7GB
Space available on filesystem:  9.2GB
Space taken by currnt table:    3.7GB
Space taken by currnt index:    2.5GB

Fluctuation of available space during a select distinct on the currnt table:
max: 9.2GB, min: 4.1GB

--
Steven Wheeler
Mid-Range UNIX Engineering
Sabre Inc.
(918) 292-4119
"Steven M. Wheeler" <swheeler@sabre.com> writes: > Space taken up by entire DB: 6.7GB > Space available on filesystem: 9.2GB > Space taken by currnt table: 3.7GB > Space taken by currnt index: 2.5GB > Fluctuation of available space during a select distinct on the currnt table: > max: 9.2GB, min: 4.1GB OK, so you're not running out of space, but it could easy be that you are hitting the limit on temp file size --- which is likely either 2 or 4Gb depending on whether your OS uses unsigned arithmetic for file offsets. Can you watch the pg_sorttempNNNN.NN files in the database directory and see how large they get during the sort? There should be seven of 'em generated by the sorting process, and at least one needs to grow to reach the same size as the table (more or less). BTW, I have some code which solves this problem by splitting large temporary sort files into segments, just as we do for the tables themselves ... but it's not even well-tested enough to commit to current yet, let alone backpatch into 6.5.* ;-) regards, tom lane