Thread: BUG #2225: Backend crash -- BIG table
The following bug has been logged online:

Bug reference:      2225
Logged by:          Patrick Rotsaert
Email address:      patrick.rotsaert@arrowup.be
PostgreSQL version: 8.1.2
Operating system:   Linux
Description:        Backend crash -- BIG table
Details:

Situation:
----------
Database with 1 table:

CREATE TABLE pptran (
    cchk int4,
    trid char(14),
    csnr char(13),
    amount int4,
    date date,
    "time" time,
    lane int4,
    "type" int4,
    data char(24),
    stat int4,
    skip int4,
    retry int4,
    points_bc int4,
    points_chip int4,
    trid_tid int2,
    trid_seq int2
);

This table contains approx. 36 million records!

CREATE INDEX pptran_trid ON pptran USING btree (trid);

Problem:
--------
Executing a select query causes the backend to crash. This is the output
from the psql frontend:

pointspp=# select trid, count(*) from pptran group by trid having count(*) > 1;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

This error appears after several minutes. During execution, 'top' shows
about 4% CPU usage and 98% memory usage of the postmaster process.

At the time of the crash, the server logs:

LOG:  server process (PID 21815) was terminated by signal 9
LOG:  terminating any other active server processes
FATAL:  the database system is in recovery mode
LOG:  all server processes terminated; reinitializing
LOG:  database system was interrupted at 2006-01-30 15:04:31 CET
LOG:  checkpoint record is at 3/275944AC
LOG:  redo record is at 3/275944AC; undo record is at 0/0; shutdown TRUE
LOG:  next transaction ID: 5415; next OID: 16444
LOG:  next MultiXactId: 1; next MultiXactOffset: 0
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  record with zero length at 3/275944F0
LOG:  redo is not required
LOG:  database system is ready
LOG:  transaction ID wrap limit is 2147484146, limited by database "postgres"

Platform info:
--------------
- PostgreSQL version 8.1.2
- Linux Slackware 9.1.0, Kernel 2.4.22
- /proc/cpuinfo
  processor       : 0
  vendor_id       : GenuineIntel
  cpu family      : 15
  model           : 2
  model name      : Intel(R) Pentium(R) 4 CPU 2.20GHz
  stepping        : 4
  cpu MHz         : 2192.973
  cache size      : 512 KB
  fdiv_bug        : no
  hlt_bug         : no
  f00f_bug        : no
  coma_bug        : no
  fpu             : yes
  fpu_exception   : yes
  cpuid level     : 2
  wp              : yes
  flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
  bogomips        : 4377.80
- /proc/meminfo
           total:     used:       free:  shared:  buffers:   cached:
  Mem:  926220288  33148928   893071360        0   1163264  17268736
  Swap: 2048376832 18296832  2030080000
  MemTotal:       904512 kB
  MemFree:        872140 kB
  MemShared:           0 kB
  Buffers:          1136 kB
  Cached:          15288 kB
  SwapCached:       1576 kB
  Active:           5824 kB
  Inactive:        15820 kB
  HighTotal:           0 kB
  HighFree:            0 kB
  LowTotal:       904512 kB
  LowFree:        872140 kB
  SwapTotal:     2000368 kB
  SwapFree:      1982500 kB
"Patrick Rotsaert" <patrick.rotsaert@arrowup.be> writes: > At the time of the crash, the server logs: > LOG: server process (PID 21815) was terminated by signal 9 You're running on a Linux machine with memory overcommit enabled. Turn that off, or nothing will ever work very reliably --- the OOM killer is entirely capable of zapping innocent processes that have nothing to do with the one eating too much memory; and even when it kills the right process, "kill -9" is not IMHO an acceptable way for the system to tell a process it can't have any more memory. See http://www.postgresql.org/docs/8.1/static/kernel-resources.html#AEN18105 regards, tom lane
on 31/01/2006 16:18 Tom Lane wrote :

> "Patrick Rotsaert" <patrick.rotsaert@arrowup.be> writes:
>> At the time of the crash, the server logs:
>>   LOG:  server process (PID 21815) was terminated by signal 9
>
> You're running on a Linux machine with memory overcommit enabled.
> Turn that off, or nothing will ever work very reliably --- the OOM
> killer is entirely capable of zapping innocent processes that have
> nothing to do with the one eating too much memory; and even when it
> kills the right process, "kill -9" is not IMHO an acceptable way for
> the system to tell a process it can't have any more memory.  See
> http://www.postgresql.org/docs/8.1/static/kernel-resources.html#AEN18105
>
> 			regards, tom lane

Syslog indeed shows:

    kernel: Out of Memory: Killed process 21815 (postmaster).

Looking at the kernel source mm/mmap.c, the function
`int vm_enough_memory(long pages)' does 1 simple test:

	/* Sometimes we want to use more memory than we have. */
	if (sysctl_overcommit_memory)
		return 1;

But /proc/sys/vm/overcommit_memory reads `0', so my guess is that
overcommit is not enabled... right?

Any hints?

Thanks,
Patrick Rotsaert
Patrick Rotsaert <patrick.rotsaert@arrowup.be> writes:
> on 31/01/2006 16:18 Tom Lane wrote :
>> http://www.postgresql.org/docs/8.1/static/kernel-resources.html#AEN18105

> But /proc/sys/vm/overcommit_memory reads `0', so my guess is that
> overcommit is not enabled... right?

Please read the reference I pointed you to.

			regards, tom lane
Tom Lane wrote:

> Patrick Rotsaert <patrick.rotsaert@arrowup.be> writes:
>> on 31/01/2006 16:18 Tom Lane wrote :
>>> http://www.postgresql.org/docs/8.1/static/kernel-resources.html#AEN18105
>
>> But /proc/sys/vm/overcommit_memory reads `0', so my guess is that
>> overcommit is not enabled... right?
>
> Please read the reference I pointed you to.
>
> 			regards, tom lane

I did read it, very carefully. The proposed fix will only work on 2.6
kernels. Mine is a 2.4 and upgrading it is not an option. The document
suggests looking at the kernel source for 2.4 kernels. I did that, as I
wrote in the previous mail. Setting the overcommit parameter to '2', or
any other value for that matter, won't do any good because in this kernel
it is only tested for being non-zero. On my system, the parameter is 0,
so overcommit is *not* enabled. I don't know what else I can do.

The other proposed option is to install more memory. Sorry, not an
option, 1GB has to be sufficient.

Apart from the overcommit subject, why is postgres consuming so much
memory? Shouldn't the solution to this problem be sought there?

Thanks,
Patrick Rotsaert
Patrick Rotsaert wrote:
> On my system, the parameter is 0, so overcommit is *not* enabled.

0 means overcommit is enabled.  You want to set it to something other
than 0 to prevent overcommitting and the consequent surprising process
deaths.  Exactly which other values are accepted varies, but 0 isn't
the one for you.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
On Wed, Feb 01, 2006 at 17:57:50 +0100,
  Patrick Rotsaert <patrick.rotsaert@arrowup.be> wrote:
> I did read it, very carefully. The proposed fix will only work on 2.6
> kernels. Mine is a 2.4 and upgrading it is not an option. The document
> suggests looking at the kernel source for 2.4 kernels. I did that, as I
> wrote in the previous mail. Setting the overcommit parameter to '2', or
> any other value for that matter, won't do any good because in this kernel
> it is only tested for being non-zero. On my system, the parameter is 0,
> so overcommit is *not* enabled. I don't know what else I can do.
>
> The other proposed option is to install more memory. Sorry, not an
> option, 1GB has to be sufficient.

Is there some reason you can't add more swap space?

> Apart from the overcommit subject, why is postgres consuming so much
> memory? Shouldn't the solution to this problem be sought there?

How do you know it is Postgres that is using lots of memory? The OOM killer
doesn't just kill off memory hogs, so you can't assume that the process
being killed tells you which process was using lots of memory.

The memory that Postgres uses is controlled in postgresql.conf. One
particular gotcha is that work_mem (sort_mem in older releases) is per
sort, so if you have a number of concurrent sorts you might be using more
memory than you expected.
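For example, a quick way to check the setting and cap it for a single
session (the value below is purely illustrative, and assumes the 8.1
spelling work_mem):

    SHOW work_mem;
    SET work_mem = 16384;   -- value in kB, so about 16MB per sort, this session only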
On Mon, 30 Jan 2006, Patrick Rotsaert wrote:

> Problem:
> --------
> Executing a select query causes the backend to crash. This is the output
> from the psql frontend:
>
> pointspp=# select trid, count(*) from pptran group by trid having count(*) > 1;
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> !>

One question is what the explain (without analyze) plan looks like for the
above, and whether the row estimates are valid in the case of one of the
hash plans.

Failing that, how many rows should the above return?
On Fri, 3 Feb 2006, Patrick Rotsaert wrote:

>> One question is what the explain (without analyze) plan looks like for
>> the above, and whether the row estimates are valid in the case of one of
>> the hash plans.
>
> pointspp=# explain select trid, count(*) from pptran group by trid
> having count(*) > 1;
>                                 QUERY PLAN
> --------------------------------------------------------------------------
>  HashAggregate  (cost=1311899.28..1311902.78 rows=200 width=18)
>    Filter: (count(*) > 1)
>    ->  Seq Scan on pptran  (cost=0.00..1039731.02 rows=36289102 width=18)
> (3 rows)

Hmm, if you set enable_hashagg=false and then run the query (without
explain), does it work then?

>> Failing that, how many rows should the above return?
>
> That is exactly what I am trying to find out. I can only guess, but it
> should not be more than a couple of 10k rows.

Okay, so it's unlikely to be a client-side memory issue (if it were
returning a lot of rows, that could also be an issue).
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
>> pointspp=# explain select trid, count(*) from pptran group by trid
>> having count(*) > 1;
>>                                 QUERY PLAN
>> --------------------------------------------------------------------------
>>  HashAggregate  (cost=1311899.28..1311902.78 rows=200 width=18)
>>    Filter: (count(*) > 1)
>>    ->  Seq Scan on pptran  (cost=0.00..1039731.02 rows=36289102 width=18)
>> (3 rows)

>>> Failing that, how many rows should the above return?

>> That is exactly what I am trying to find out. I can only guess, but it
>> should not be more than a couple of 10k rows.

The problem is that the HashAgg will have to maintain a counter for
every distinct value of trid, not just those that occur more than
once.  So if there are a huge number of one-time-only values you could
still blow out memory (and HashAgg doesn't currently know how to spill
to disk).

That "rows=200" estimate looks suspiciously like a default.  Has this
table been ANALYZEd recently?  I'd expect the planner not to choose
HashAgg if it has a more realistic estimate of the number of groups.

			regards, tom lane
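(A quick way to see what distinct-count estimate the planner is working
from, assuming the standard pg_stats view, is something like:

    ANALYZE pptran;
    SELECT n_distinct FROM pg_stats
     WHERE tablename = 'pptran' AND attname = 'trid';

A negative n_distinct is stored as a fraction of the row count; a small
positive value near 200 would explain the choice of HashAggregate.)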
Patrick Rotsaert <patrick.rotsaert@arrowup.be> writes:
> I did a vacuum analyze, now the explain gives different results.

> pointspp=# explain select trid, count(*) from pptran group by trid having count(*) > 1;
>                                    QUERY PLAN
> --------------------------------------------------------------------------------
>  GroupAggregate  (cost=9842885.29..10840821.57 rows=36288592 width=18)
>    Filter: (count(*) > 1)
>    ->  Sort  (cost=9842885.29..9933606.77 rows=36288592 width=18)
>          Sort Key: trid
>          ->  Seq Scan on pptran  (cost=0.00..1039725.92 rows=36288592 width=18)
> (5 rows)

OK that looks more reasonable.

> pointspp=# select trid, count(*) from pptran group by trid having count(*) > 1;
> ERROR:  could not write block 661572 of temporary file: No space left on device
> HINT:  Perhaps out of disk space?

> I have 5.1GB of free disk space. If this is the cause, I have a
> problem... or is there another way to extract (and remove) duplicate rows?

Time to buy more disk :-(

			regards, tom lane
> Is there some reason you can't add more swap space?

Yes, disk space. I have about 2 GB of swap space enabled.

> How do you know it is Postgres that is using lots of memory? The OOM killer
> doesn't just kill off memory hogs, so you can't assume that the process
> being killed tells you which process was using lots of memory.

I keep 'top' running while I launch the query. One single postmaster
climbs to the top, claiming 98% MEM and about 8% CPU.

> The memory that Postgres uses is controlled in postgresql.conf. One
> particular gotcha is that work_mem (sort_mem in older releases) is per
> sort, so if you have a number of concurrent sorts you might be using more
> memory than you expected.

I am very sure there are no other queries running at the same time. This
is a development machine over which I have full control.
> One question is what the explain (without analyze) plan looks like for
> the above, and whether the row estimates are valid in the case of one of
> the hash plans.

pointspp=# explain select trid, count(*) from pptran group by trid having count(*) > 1;
                                QUERY PLAN
--------------------------------------------------------------------------
 HashAggregate  (cost=1311899.28..1311902.78 rows=200 width=18)
   Filter: (count(*) > 1)
   ->  Seq Scan on pptran  (cost=0.00..1039731.02 rows=36289102 width=18)
(3 rows)

> Failing that, how many rows should the above return?

That is exactly what I am trying to find out. I can only guess, but it
should not be more than a couple of 10k rows.
> 0 means overcommit is enabled.  You want to set it to something other
> than 0 to prevent overcommitting and the consequent surprising process
> deaths.  Exactly which other values are accepted varies, but 0 isn't
> the one for you.

I do not understand how 0 could mean overcommit is enabled. I do not know
how it is in recent kernels, but the source code of the 2.4 kernel I use
is this:

	int vm_enough_memory(long pages)
	{
		unsigned long free;

		/* Sometimes we want to use more memory than we have. */
		if (sysctl_overcommit_memory)
			return 1;

		// ...
	}

Seems pretty straightforward to me. I also did a recursive grep through
all of the kernel source and this is the only place where this parameter
is used.

I tried setting the parameter to 1, but it did not make any difference.
> Hmm, if you set enable_hashagg=false and then run the query (without
> explain), does it work then?

pointspp=# set enable_hashagg = false;
SET
pointspp=# select trid, count(*) from pptran group by trid having count(*) > 1;
ERROR:  could not write block 661582 of temporary file: No space left on device
HINT:  Perhaps out of disk space?

Still does not work, but it no longer consumes the same amount of memory.
> The problem is that the HashAgg will have to maintain a counter for
> every distinct value of trid, not just those that occur more than
> once.  So if there are a huge number of one-time-only values you could
> still blow out memory (and HashAgg doesn't currently know how to spill
> to disk).

One-time-only values are in my case more probable, so it will use a lot
of counters.

> That "rows=200" estimate looks suspiciously like a default.  Has this
> table been ANALYZEd recently?  I'd expect the planner not to choose
> HashAgg if it has a more realistic estimate of the number of groups.
>
> 			regards, tom lane

I did a vacuum analyze, now the explain gives different results.

pointspp=# vacuum analyze;
VACUUM
pointspp=# explain select trid, count(*) from pptran group by trid having count(*) > 1;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 GroupAggregate  (cost=9842885.29..10840821.57 rows=36288592 width=18)
   Filter: (count(*) > 1)
   ->  Sort  (cost=9842885.29..9933606.77 rows=36288592 width=18)
         Sort Key: trid
         ->  Seq Scan on pptran  (cost=0.00..1039725.92 rows=36288592 width=18)
(5 rows)

pointspp=# select trid, count(*) from pptran group by trid having count(*) > 1;
ERROR:  could not write block 661572 of temporary file: No space left on device
HINT:  Perhaps out of disk space?

I have 5.1GB of free disk space. If this is the cause, I have a
problem... or is there another way to extract (and remove) duplicate rows?
On Fri, Feb 03, 2006 at 19:38:04 +0100,
  Patrick Rotsaert <patrick.rotsaert@arrowup.be> wrote:
>
> I have 5.1GB of free disk space. If this is the cause, I have a
> problem... or is there another way to extract (and remove) duplicate rows?

How about processing a subset of the ids in one pass, and then making
multiple passes to check all of the ids? As long as the chunks don't have
to be too small, this might work for you.
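For example (only a sketch -- the split points below are made up and assume
the trid values are spread fairly evenly), one pass per range of trid:

    select trid, count(*)
      from pptran
     where trid >= '0' and trid < '3'   -- repeat with the next range, etc.
     group by trid
    having count(*) > 1;

Each pass then only has to sort the rows in that range, so the temporary
files should fit in the 5.1GB of free space. Once the duplicated trids are
known, the extra copies could be removed one trid at a time by ctid (again
just a sketch, and it assumes rows sharing a trid really are duplicates):

    delete from pptran
     where trid = '<a duplicated trid>'
       and ctid not in (select ctid from pptran
                         where trid = '<a duplicated trid>'
                         limit 1);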