Thread: BUG #2168: 45.000.000 records too much?
The following bug has been logged online: Bug reference: 2168 Logged by: Steven Mooij Email address: steven@mooij.name PostgreSQL version: 8.1 Operating system: Linux Description: 45.000.000 records too much? Details: Tried to copy the result of a join into another table, about 45.000.000 (small) records. This is the result: testsearch=> insert into t_documentword2 (SELECT document_id, t_word2.id, frequency from t_documentword, t_word2 where t_documentword.word = t_word2.word); server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. I got the same result in version 7.5.15.
""Steven Mooij"" <steven@mooij.name> wrote > > > Tried to copy the result of a join into another table, about 45.000.000 > (small) records. This is the result: > > testsearch=> insert into t_documentword2 (SELECT document_id, t_word2.id, > frequency from t_documentword, t_word2 where t_documentword.word = > t_word2.word); > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > The connection to the server was lost. Attempting reset: Failed. > I guess you are Rudolph's friend ;-) Please follow the post "Error when inserting millions of record" in pg-general. Regards, Qingqing
"Steven Mooij" <steven@mooij.name> writes: > testsearch=> insert into t_documentword2 (SELECT document_id, t_word2.id, > frequency from t_documentword, t_word2 where t_documentword.word = > t_word2.word); > server closed the connection unexpectedly There's not enough information here to guess what the problem is. (But it's not table size; people routinely manipulate tables much bigger than that in Postgres.) A stack trace from the core dump would be really helpful, or even better a self-contained test case that other people could replicate the failure with. See the bug reporting guidelines at http://www.postgresql.org/docs/8.1/static/bug-reporting.html > I got the same result in version 7.5.15. There is no version of Postgres named 7.5.anything. regards, tom lane
Some additional info. We repeated the test on different hardware with unfortunately the same result. This is the output of the postgresql logfile of that experiment: LOG: background writer process (PID 8208) was terminated by signal 9 LOG: terminating any other active server processes LOG: statistics collector process (PID 8209) was terminated by signal 9 LOG: all server processes terminated; reinitializing LOG: database system was interrupted at 2006-01-18 11:25:34 CET LOG: checkpoint record is at 3/AE5CF484 LOG: redo record is at 3/AE5CB388; undo record is at 0/0; shutdown FALSE LOG: next transaction ID: 10061; next OID: 25648 LOG: database system was not properly shut down; automatic recovery in progress LOG: redo starts at 3/AE5CB388 LOG: record with zero length at 3/AE5E7314 LOG: redo done at 3/AE5E72D0 LOG: database system is ready This time we tested with postgresql 8.0.3 on a 2.6.12 linux kernel. (Previously i tested with postgresql 7.4.9 on a 2.6.15 kernel.) I think you will be able to repeat the experiment with this information. Regards, Steven
Tom Lane wrote: > "Steven Mooij" <steven@mooij.name> writes: > > >> testsearch=> insert into t_documentword2 (SELECT document_id, >> t_word2.id, >> frequency from t_documentword, t_word2 where t_documentword.word = >> t_word2.word); >> server closed the connection unexpectedly >> > > > There's not enough information here to guess what the problem is. > (But it's not table size; people routinely manipulate tables much > bigger than that in Postgres.) A stack trace from the core dump > would be really helpful, or even better a self-contained test case > that other people could replicate the failure with. See the bug > reporting guidelines at > http://www.postgresql.org/docs/8.1/static/bug-reporting.html > > > I created a self-contained test case, here's a script that produces a similar error: CREATE TABLE t_test1 ( x bigint, y bigint, PRIMARY KEY(x) ) WITHOUT OIDS; CREATE TABLE t_test2 ( x bigint, z bigint, PRIMARY KEY (x), FOREIGN KEY (x) REFERENCES t_test1(x) ) WITHOUT OIDS; INSERT INTO t_test1 VALUES (1, 2); INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y + (SELECT count(*) FROM t_test1) FROM t_test1); INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y + (SELECT count(*) FROM t_test1) FROM t_test1); INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y + (SELECT count(*) FROM t_test1) FROM t_test1); INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y + (SELECT count(*) FROM t_test1) FROM t_test1); INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y + (SELECT count(*) FROM t_test1) FROM t_test1); INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y + (SELECT count(*) FROM t_test1) FROM t_test1); INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y + (SELECT count(*) FROM t_test1) FROM t_test1); INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y + (SELECT count(*) FROM t_test1) FROM t_test1); INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y + (SELECT count(*) FROM t_test1) FROM t_test1); INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y + (SELECT count(*) FROM t_test1) FROM t_test1); INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y + (SELECT count(*) FROM t_test1) FROM t_test1); INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y + (SELECT count(*) FROM t_test1) FROM t_test1); INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y + (SELECT count(*) FROM t_test1) FROM t_test1); INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y + (SELECT count(*) FROM t_test1) FROM t_test1); INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y + (SELECT count(*) FROM t_test1) FROM t_test1); INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y + (SELECT count(*) FROM t_test1) FROM t_test1); INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y + (SELECT count(*) FROM t_test1) FROM t_test1); INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y + (SELECT count(*) FROM t_test1) FROM t_test1); INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y + (SELECT count(*) FROM t_test1) FROM t_test1); INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y + (SELECT count(*) FROM t_test1) FROM t_test1); INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y + (SELECT count(*) FROM t_test1) FROM t_test1); INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y + (SELECT count(*) FROM t_test1) FROM t_test1); INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y + (SELECT count(*) FROM t_test1) FROM t_test1); INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y + (SELECT count(*) FROM t_test1) FROM t_test1); INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y + (SELECT count(*) FROM t_test1) FROM t_test1); INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y + (SELECT count(*) FROM t_test1) FROM t_test1); INSERT INTO t_test2 (SELECT x, x * 2 FROM t_test1); The first block of inserts completes flawlessly and fills t_test1 with over 64.000.000 records. It's the last statement copying (no join involved this time) from t_test1 to t_test2 that results in: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. I am not 100% sure, but i think this is the relevant part of the logfile that goes with it: LOG: database system was interrupted at 2006-01-18 00:07:48 CET LOG: checkpoint record is at B/A2857768 LOG: redo record is at B/A28535B0; undo record is at 0/0; shutdown FALSE LOG: next transaction ID: 2724; next OID: 75506450 LOG: database system was not properly shut down; automatic recovery in progressLOG: incomplete startup packet LOG: redo starts at B/A28535B0 LOG: record with zero length at B/A28577EC LOG: redo done at B/A28577A8 FATAL: the database system is starting up LOG: database system is ready >> I got the same result in version 7.5.15. >> > > > There is no version of Postgres named 7.5.anything. > > > I wrongfully looked at the version of the Debian package. When I start psql it says "Welcome to psql 7.4.9, the PostgreSQL interactive terminal." Hopefully this is enough information for you to look into it, but if you do need anything else please let me know...
Steven Mooij wrote: > Some additional info. We repeated the test on different hardware with > unfortunately the same result. This is the output of the postgresql > logfile of that experiment: [log messages about processes kill 9ed] > This time we tested with postgresql 8.0.3 on a 2.6.12 linux kernel. > (Previously i tested with postgresql 7.4.9 on a 2.6.15 kernel.) Does dmesg or syslog mention the OOM killer? I tried running your test case while watching memory usage and stopped the postmaster before it finished as I didn't want to risk the kernel killing it (or some random other process) the hard way. -- Seneca Cunningham scunning@ca.afilias.info
Seneca Cunningham wrote: >Steven Mooij wrote: > > >>Some additional info. We repeated the test on different hardware with >>unfortunately the same result. This is the output of the postgresql >>logfile of that experiment: >> >> > >[log messages about processes kill 9ed] > > > >>This time we tested with postgresql 8.0.3 on a 2.6.12 linux kernel. >>(Previously i tested with postgresql 7.4.9 on a 2.6.15 kernel.) >> >> > >Does dmesg or syslog mention the OOM killer? I tried running your test >case while watching memory usage and stopped the postmaster before it >finished as I didn't want to risk the kernel killing it (or some random >other process) the hard way. > > > Yes, the oom-killer kills the postmaster. But that's besides the problem. In my opinion the bug is that PostgreSQL has an inefficient implementation of this operation, that takes way too much memory. (Of course, I say this with a lot of respect for all the good PostgreSQL-code that has been written.) Jan 18 11:29:40 laptop2 kernel: [4304395.479000] oom-killer: gfp_mask=0x1d2 Jan 18 11:29:41 laptop2 kernel: [4304395.479000] DMA per-cpu: Jan 18 11:29:41 laptop2 kernel: [4304395.479000] cpu 0 hot: low 2, high 6, batch 1 Jan 18 11:29:41 laptop2 kernel: [4304395.479000] cpu 0 cold: low 0, high 2, batch 1 Jan 18 11:29:41 laptop2 kernel: [4304395.479000] Normal per-cpu: Jan 18 11:29:41 laptop2 kernel: [4304395.479000] cpu 0 hot: low 62, high 186, batch 31 Jan 18 11:29:41 laptop2 kernel: [4304395.479000] cpu 0 cold: low 0, high 62, batch 31 Jan 18 11:29:41 laptop2 kernel: [4304395.479000] HighMem per-cpu: empty Jan 18 11:29:41 laptop2 kernel: [4304395.479000] Jan 18 11:29:41 laptop2 kernel: [4304395.479000] Free pages: 4964kB (0kB HighMem) Jan 18 11:29:41 laptop2 kernel: [4304395.479000] Active:52023 inactive:52455 dirty:0 writeback:0 unstable:0 free:1241 slab:3563mapped:102467 pagetables:751 Jan 18 11:29:41 laptop2 kernel: [4304395.479000] DMA free:2032kB min:88kB low:108kB high:132kB active:5492kB inactive:5512kBpresent:16384kB pages_scanned:16847 all_unreclaimable? yes Jan 18 11:29:41 laptop2 kernel: [4304395.479000] lowmem_reserve[]: 0 486 486 Jan 18 11:29:41 laptop2 kernel: [4304395.479000] Normal free:2932kB min:2776kB low:3468kB high:4164kB active:202600kB inactive:204308kBpresent:498560kB pages_scanned:379686 all_unreclaimable? no Jan 18 11:29:41 laptop2 kernel: [4304395.479000] lowmem_reserve[]: 0 0 0 Jan 18 11:29:41 laptop2 kernel: [4304395.479000] HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kBpresent:0kB pages_scanned:0 all_unreclaimable? no Jan 18 11:29:41 laptop2 kernel: [4304395.479000] lowmem_reserve[]: 0 0 0 Jan 18 11:29:41 laptop2 kernel: [4304395.479000] DMA: 0*4kB 0*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB0*4096kB = 2032kB Jan 18 11:29:41 laptop2 kernel: [4304395.479000] Normal: 63*4kB 1*8kB 7*16kB 6*32kB 3*64kB 1*128kB 0*256kB 0*512kB 0*1024kB1*2048kB 0*4096kB = 2932kB Jan 18 11:29:41 laptop2 kernel: [4304395.479000] HighMem: empty Jan 18 11:29:41 laptop2 kernel: [4304395.479000] Swap cache: add 267257, delete 267257, find 1913/4312, race 0+3 Jan 18 11:29:41 laptop2 kernel: [4304395.479000] Free swap = 0kB Jan 18 11:29:41 laptop2 kernel: [4304395.479000] Total swap = 979956kB Jan 18 11:29:41 laptop2 kernel: [4304395.479000] Out of Memory: Killed process 8208 (postmaster). Jan 18 11:29:41 laptop2 kernel: [4304399.118000] oom-killer: gfp_mask=0x1d2 Jan 18 11:29:41 laptop2 kernel: [4304399.118000] DMA per-cpu: Jan 18 11:29:41 laptop2 kernel: [4304399.118000] cpu 0 hot: low 2, high 6, batch 1 Jan 18 11:29:41 laptop2 kernel: [4304399.118000] cpu 0 cold: low 0, high 2, batch 1 Jan 18 11:29:41 laptop2 kernel: [4304399.118000] Normal per-cpu: Jan 18 11:29:41 laptop2 kernel: [4304399.118000] cpu 0 hot: low 62, high 186, batch 31 Jan 18 11:29:41 laptop2 kernel: [4304399.118000] cpu 0 cold: low 0, high 62, batch 31 Jan 18 11:29:41 laptop2 kernel: [4304399.118000] HighMem per-cpu: empty Jan 18 11:29:41 laptop2 kernel: [4304399.118000] Jan 18 11:29:41 laptop2 kernel: [4304399.118000] Free pages: 4716kB (0kB HighMem) Jan 18 11:29:41 laptop2 kernel: [4304399.118000] Active:51957 inactive:52556 dirty:0 writeback:0 unstable:0 free:1179 slab:3547mapped:102505 pagetables:745 Jan 18 11:29:41 laptop2 kernel: [4304399.118000] DMA free:2032kB min:88kB low:108kB high:132kB active:5496kB inactive:5528kBpresent:16384kB pages_scanned:18629 all_unreclaimable? yes Jan 18 11:29:41 laptop2 kernel: [4304399.118000] lowmem_reserve[]: 0 486 486 Jan 18 11:29:41 laptop2 kernel: [4304399.118000] Normal free:2684kB min:2776kB low:3468kB high:4164kB active:202332kB inactive:204696kBpresent:498560kB pages_scanned:463523 all_unreclaimable? no Jan 18 11:29:41 laptop2 kernel: [4304399.118000] lowmem_reserve[]: 0 0 0 Jan 18 11:29:41 laptop2 kernel: [4304399.118000] HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kBpresent:0kB pages_scanned:0 all_unreclaimable? no Jan 18 11:29:41 laptop2 kernel: [4304399.118000] lowmem_reserve[]: 0 0 0 Jan 18 11:29:41 laptop2 kernel: [4304399.118000] DMA: 0*4kB 0*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB0*4096kB = 2032kB Jan 18 11:29:41 laptop2 kernel: [4304399.118000] Normal: 1*4kB 1*8kB 7*16kB 6*32kB 3*64kB 1*128kB 0*256kB 0*512kB 0*1024kB1*2048kB 0*4096kB = 2684kB Jan 18 11:29:41 laptop2 kernel: [4304399.118000] HighMem: empty Jan 18 11:29:41 laptop2 kernel: [4304399.118000] Swap cache: add 267302, delete 267302, find 1913/4316, race 0+3 Jan 18 11:29:41 laptop2 kernel: [4304399.118000] Free swap = 0kB Jan 18 11:29:41 laptop2 kernel: [4304399.118000] Total swap = 979956kB Jan 18 11:29:41 laptop2 kernel: [4304399.118000] Out of Memory: Killed process 8209 (postmaster). Jan 18 11:29:41 laptop2 kernel: [4304399.141000] oom-killer: gfp_mask=0x1d2 Jan 18 11:29:41 laptop2 kernel: [4304399.141000] DMA per-cpu: Jan 18 11:29:41 laptop2 kernel: [4304399.141000] cpu 0 hot: low 2, high 6, batch 1 Jan 18 11:29:41 laptop2 kernel: [4304399.141000] cpu 0 cold: low 0, high 2, batch 1 Jan 18 11:29:41 laptop2 kernel: [4304399.141000] Normal per-cpu: Jan 18 11:29:41 laptop2 kernel: [4304399.141000] cpu 0 hot: low 62, high 186, batch 31 Jan 18 11:29:41 laptop2 kernel: [4304399.141000] cpu 0 cold: low 0, high 62, batch 31 Jan 18 11:29:41 laptop2 kernel: [4304399.141000] HighMem per-cpu: empty Jan 18 11:29:41 laptop2 kernel: [4304399.141000] Jan 18 11:29:41 laptop2 kernel: [4304399.141000] Free pages: 4716kB (0kB HighMem) Jan 18 11:29:41 laptop2 kernel: [4304399.141000] Active:51990 inactive:52547 dirty:0 writeback:26 unstable:0 free:1179 slab:3547mapped:102473 pagetables:741 Jan 18 11:29:41 laptop2 kernel: [4304399.141000] DMA free:2032kB min:88kB low:108kB high:132kB active:5496kB inactive:5528kBpresent:16384kB pages_scanned:18629 all_unreclaimable? yes Jan 18 11:29:41 laptop2 kernel: [4304399.141000] lowmem_reserve[]: 0 486 486 Jan 18 11:29:41 laptop2 kernel: [4304399.141000] Normal free:2684kB min:2776kB low:3468kB high:4164kB active:202464kB inactive:204660kBpresent:498560kB pages_scanned:491576 all_unreclaimable? yes Jan 18 11:29:41 laptop2 kernel: [4304399.141000] lowmem_reserve[]: 0 0 0 Jan 18 11:29:41 laptop2 kernel: [4304399.141000] HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kBpresent:0kB pages_scanned:0 all_unreclaimable? no Jan 18 11:29:41 laptop2 kernel: [4304399.141000] lowmem_reserve[]: 0 0 0 Jan 18 11:29:41 laptop2 kernel: [4304399.141000] DMA: 0*4kB 0*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB0*4096kB = 2032kB Jan 18 11:29:41 laptop2 kernel: [4304399.141000] Normal: 1*4kB 1*8kB 7*16kB 6*32kB 3*64kB 1*128kB 0*256kB 0*512kB 0*1024kB1*2048kB 0*4096kB = 2684kB Jan 18 11:29:41 laptop2 kernel: [4304399.141000] HighMem: empty Jan 18 11:29:41 laptop2 kernel: [4304399.141000] Swap cache: add 267329, delete 267303, find 1913/4317, race 0+3 Jan 18 11:29:41 laptop2 kernel: [4304399.141000] Free swap = 0kB Jan 18 11:29:41 laptop2 kernel: [4304399.141000] Total swap = 979956kB Jan 18 11:29:41 laptop2 kernel: [4304399.141000] Out of Memory: Killed process 9294 (postmaster). -- Steven Mooij Koekamp 83 2623XP Delft e steven@mooij.name t 015 256 8304