Re: [HACKERS] memory problems in copying large table to STDOUT - Mailing list pgsql-hackers
From:           Martin Weinberg
Subject:        Re: [HACKERS] memory problems in copying large table to STDOUT
Date:
Msg-id:         199910101438.KAA15476@osprey.astro.umass.edu
In response to: Re: [HACKERS] memory problems in copying large table to STDOUT (Tom Lane <tgl@sss.pgh.pa.us>)
Responses:      Re: [HACKERS] memory problems in copying large table to STDOUT (Tom Lane <tgl@sss.pgh.pa.us>)
List:           pgsql-hackers
Tom,

I am attaching the backtrace.  This one simultaneously generated this
kernel message from the md driver:

    raid0_map bug: hash->zone0==NULL for block 1132810879
    Bad md_map in ll_rw_block

Definitely a problem but no longer sure if it's the same one . . . sigh.

I inserted a pause() in the beginning of elog.c so that I could attach
remotely; I'm keeping the sleeping process around in case there's
anything else you would like me to check.

Guess I'm (foolishly?) pushing the envelope here with a 100GB database
on software raid.

Thanks!!!

--Martin

P.S. After the fact, I realized that my source is Oliver Elphick's
Debian 6.5.1 source package rather than a pure vanilla source.  Hope
this is not a problem . . .

----------------------------------------------------------------------

(gdb) bt
#0  0x4012eb77 in pause ()
#1  0x81160e9 in elog (lev=-1, fmt=0x8146a4b "cannot read block %d of %s")
    at elog.c:81
#2  0x80e76ef in smgrread (which=0, reln=0x822af60, blocknum=2638753,
    buffer=0x44a85040 "h") at smgr.c:235
#3  0x80dd7a2 in ReadBufferWithBufferLock (reln=0x822af60, blockNum=2638753,
    bufferLockHeld=0) at bufmgr.c:302
#4  0x80dd682 in ReadBuffer (reln=0x822af60, blockNum=2638753) at bufmgr.c:180
#5  0x80ddf1d in ReleaseAndReadBuffer (buffer=9175, relation=0x822af60,
    blockNum=2638753) at bufmgr.c:954
#6  0x806ad13 in heapgettup (relation=0x822af60, tuple=0x8235374, dir=1,
    buffer=0x8235398, snapshot=0x8232af0, nkeys=0, key=0x0) at heapam.c:469
#7  0x806b6bf in heap_getnext (scandesc=0x8235360, backw=0) at heapam.c:912
#8  0x8084eb3 in CopyTo (rel=0x822af60, binary=0 '\000', oids=0 '\000',
    fp=0x0, delim=0x813c829 "\t") at copy.c:405
#9  0x8084ce4 in DoCopy (relname=0x82350c0 "psc", binary=0 '\000',
    oids=0 '\000', from=0 '\000', pipe=1 '\001', filename=0x0,
    delim=0x813c829 "\t") at copy.c:323
#10 0x80ea8c6 in ProcessUtility (parsetree=0x82350d8, dest=Remote)
    at utility.c:227
#11 0x80e8a36 in pg_exec_query_dest (
    query_string=0xbfffaef4 "copy psc to STDOUT;", dest=Remote,
    aclOverride=0) at postgres.c:727
#12 0x80e8944 in pg_exec_query (
    query_string=0xbfffaef4 "copy psc to STDOUT;") at postgres.c:656
#13 0x80e9b88 in PostgresMain (argc=11, argv=0xbffff46c, real_argc=12,
    real_argv=0xbffff984) at postgres.c:1647
#14 0x80d1adc in DoBackend (port=0x81ef748) at postmaster.c:1628
#15 0x80d1613 in BackendStartup (port=0x81ef748) at postmaster.c:1373
#16 0x80d0ca6 in ServerLoop () at postmaster.c:823
#17 0x80d080c in PostmasterMain (argc=12, argv=0xbffff984) at postmaster.c:616
#18 0x80a4597 in main (argc=12, argv=0xbffff984) at main.c:93

Tom Lane wrote on Sat, 09 Oct 1999 14:42:56 EDT

>Martin Weinberg <weinberg@osprey.astro.umass.edu> writes:
>> Tom Lane wrote on Sat, 09 Oct 1999 11:54:34 EDT
>>>> FATAL 1:  Memory exhausted in AllocSetAlloc()
>>>
>>> Hmm.  What is the exact declaration of the table?
>>>
>>> The only explanation I can think of offhand is that the output
>>> conversion function for one of the column types is leaking memory...
>>> copy.c itself looks to be pretty careful not to.
>
>> The table def is:
>
>> CREATE TABLE psc (
>>    hemis text,
>>    date date,
>>    scan int2,
>>    id int4,
>>    ra float4,
>>    [ lots more fields of these same types ]
>
>Hmm, nothing unusual there.  I made up a dummy table containing these
>column types, filled it with 16 meg of junk data, and copied in and
>out without observing any process memory usage growth at all, under
>both current sources and 6.5.2.  I also used gdb to set a breakpoint
>at AllocSetAlloc, and checked that the inner loop of the copy wasn't
>allocating anything it didn't free.  So there's no obvious memory
>leakage bug here.  (It'd be pretty surprising if there was, really,
>for such commonly used data types.)
>
>I'm now thinking that there must be either a problem specific to your
>platform, or some heretofore unnoticed problem with copying from a
>multi-segment (ie, multi-gigabyte) table.  I don't have enough disk
>space to check the latter theory here...
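The multi-segment theory can be sanity-checked against the backtrace:
with segmented table files, the failing block number in the smgrread
frame maps to a particular segment file.  A minimal shell sketch,
assuming the default 8192-byte block size and 1 GB segments
(131072 blocks, the usual RELSEG_SIZE) of this era:

```shell
#!/bin/sh
# Map block 2638753 (from the smgrread frame) to a segment file,
# assuming 8192-byte blocks and 131072-block (1 GB) segments.
blockno=2638753
blocks_per_seg=131072              # 1 GB / 8 kB
seg=$((blockno / blocks_per_seg))  # which segment file
off=$((blockno % blocks_per_seg))  # block within that segment
echo "segment $seg, block $off within segment"
```

This puts the failing block in segment file 20, roughly 20 GB into
the table, i.e. well past the single-segment boundary, which is at
least consistent with a multi-segment problem.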
>
>Can you prepare a debugger backtrace showing what the backend is doing
>when it gets the error?  If you're not familiar with gdb, it'd go
>something like this:
>
>1. Build & install postgres with debugging symbols enabled
>   ("make CUSTOM_COPT=-g all").
>2. Start gdb on the postgres executable, eg
>   "gdb /usr/local/pgsql/bin/postgres".
>3. Fire up the copy-out operation as usual.  (I assume this takes long
>   enough that you have plenty of time for the next two steps ;-))
>4. Use ps or top to find out the process number of the backend handling
>   the session.
>5. Attach to that process number in gdb:
>   (gdb) attach NNNN
>6. Set a breakpoint at elog, and let the backend continue running:
>   (gdb) break elog
>   (gdb) continue
>7. When the breakpoint is hit, get a backtrace:
>   Breakpoint 1, elog ...
>   (gdb) bt
>   After copying & pasting the resulting printout, you can "quit" to
>   get out of gdb.
>
>        regards, tom lane
>
>************
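Collected into a single session for reference, the attach procedure
quoted above looks like this (NNNN stands for the backend process id
found with ps or top in step 4):

    $ gdb /usr/local/pgsql/bin/postgres
    (gdb) attach NNNN
    (gdb) break elog
    (gdb) continue
    Breakpoint 1, elog ...
    (gdb) bt
    (gdb) quit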