Re: [HACKERS] memory problems in copying large table to STDOUT - Mailing list pgsql-hackers

From: Martin Weinberg
Subject: Re: [HACKERS] memory problems in copying large table to STDOUT
Msg-id: 199910101438.KAA15476@osprey.astro.umass.edu
In response to: Re: [HACKERS] memory problems in copying large table to STDOUT (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: [HACKERS] memory problems in copying large table to STDOUT (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
Tom,

I am attaching the backtrace.  This one simultaneously generated
these kernel messages from the md driver:

raid0_map bug: hash->zone0==NULL for block 1132810879
Bad md_map in ll_rw_block

Definitely a problem, but I'm no longer sure it's the same one . . .
sigh.

I inserted a pause() at the beginning of elog() so that I could
attach remotely; I'm keeping the sleeping process around in case
there's anything else you would like me to check.
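
In case it's useful, the hack looks roughly like this (a sketch, not
the exact diff -- the pause() might equally have been unconditional):

    /* at the top of elog() in src/backend/utils/error/elog.c */
    void
    elog(int lev, const char *fmt, ...)
    {
        if (lev == ERROR)   /* ERROR is -1 in 6.5's elog.h */
            pause();        /* <unistd.h>: sleep until a signal arrives,
                             * giving me time to attach gdb */

        /* ... rest of elog() unchanged ... */
    }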

Guess I'm (foolishly?) pushing the envelope here with a 100GB
database on software RAID.

Thanks!!!

--Martin

P.S. After the fact, I realized that my source is Oliver Elphick's
Debian 6.5.1 source package rather than a pure vanilla source.  
Hope this is not a problem . . .

----------------------------------------------------------------------

(gdb) bt
#0  0x4012eb77 in pause ()
#1  0x81160e9 in elog (lev=-1, fmt=0x8146a4b "cannot read block %d of %s") at elog.c:81
#2  0x80e76ef in smgrread (which=0, reln=0x822af60, blocknum=2638753, buffer=0x44a85040 "h") at smgr.c:235
#3  0x80dd7a2 in ReadBufferWithBufferLock (reln=0x822af60, blockNum=2638753, bufferLockHeld=0) at bufmgr.c:302
#4  0x80dd682 in ReadBuffer (reln=0x822af60, blockNum=2638753) at bufmgr.c:180
#5  0x80ddf1d in ReleaseAndReadBuffer (buffer=9175, relation=0x822af60, blockNum=2638753) at bufmgr.c:954
#6  0x806ad13 in heapgettup (relation=0x822af60, tuple=0x8235374, dir=1, buffer=0x8235398, snapshot=0x8232af0, nkeys=0, key=0x0) at heapam.c:469
#7  0x806b6bf in heap_getnext (scandesc=0x8235360, backw=0) at heapam.c:912
#8  0x8084eb3 in CopyTo (rel=0x822af60, binary=0 '\000', oids=0 '\000', fp=0x0, delim=0x813c829 "\t") at copy.c:405
#9  0x8084ce4 in DoCopy (relname=0x82350c0 "psc", binary=0 '\000', oids=0 '\000', from=0 '\000', pipe=1 '\001', filename=0x0, delim=0x813c829 "\t") at copy.c:323
#10 0x80ea8c6 in ProcessUtility (parsetree=0x82350d8, dest=Remote) at utility.c:227
#11 0x80e8a36 in pg_exec_query_dest (query_string=0xbfffaef4 "copy psc to STDOUT;", dest=Remote, aclOverride=0) at postgres.c:727
#12 0x80e8944 in pg_exec_query (query_string=0xbfffaef4 "copy psc to STDOUT;") at postgres.c:656
#13 0x80e9b88 in PostgresMain (argc=11, argv=0xbffff46c, real_argc=12, real_argv=0xbffff984) at postgres.c:1647
#14 0x80d1adc in DoBackend (port=0x81ef748) at postmaster.c:1628
#15 0x80d1613 in BackendStartup (port=0x81ef748) at postmaster.c:1373
#16 0x80d0ca6 in ServerLoop () at postmaster.c:823
#17 0x80d080c in PostmasterMain (argc=12, argv=0xbffff984) at postmaster.c:616
#18 0x80a4597 in main (argc=12, argv=0xbffff984) at main.c:93
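
For what it's worth: assuming the default 8k block size and 1GB
segment files (RELSEG_SIZE = 131072 blocks), the block in frame #2
sits well past the first segment:

    2638753 blocks * 8192 bytes/block ~= 20 GB into the table
    2638753 = 20 * 131072 + 17313     => segment 20, block 17313

so the failure is at least consistent with your multi-segment theory.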


Tom Lane wrote on Sat, 09 Oct 1999 14:42:56 EDT
>Martin Weinberg <weinberg@osprey.astro.umass.edu> writes:
>> Tom Lane wrote on Sat, 09 Oct 1999 11:54:34 EDT
>>>> FATAL 1:  Memory exhausted in AllocSetAlloc()
>>> 
>>> Hmm.  What is the exact declaration of the table?
>>> 
>>> The only explanation I can think of offhand is that the output
>>> conversion function for one of the column types is leaking memory...
>>> copy.c itself looks to be pretty careful not to.
>
>> The table def is:
>
>> CREATE TABLE psc (
>>     hemis        text,
>>     date        date,
>>     scan        int2,
>>     id        int4,
>>     ra        float4,
>>       [ lots more fields of these same types ]
>
>Hmm, nothing unusual there.  I made up a dummy table containing these
>column types, filled it with 16 meg of junk data, and copied in and
>out without observing any process memory usage growth at all, under
>both current sources and 6.5.2.  I also used gdb to set a breakpoint
>at AllocSetAlloc, and checked that the inner loop of the copy wasn't
>allocating anything it didn't free.  So there's no obvious memory
>leakage bug here.  (It'd be pretty surprising if there was, really,
>for such commonly used data types.)
>
>I'm now thinking that there must be either a problem specific to your
>platform, or some heretofore unnoticed problem with copying from a
>multi-segment (ie, multi-gigabyte) table.  I don't have enough disk
>space to check the latter theory here...
>
>Can you prepare a debugger backtrace showing what the backend is doing
>when it gets the error?  If you're not familiar with gdb, it'd go
>something like this:
>
>1. Build & install postgres with debugging symbols enabled
>   ("make CUSTOM_COPT=-g all"). 
>2. Start gdb on the postgres executable, eg
>   "gdb /usr/local/pgsql/bin/postgres".
>3. Fire up the copy-out operation as usual.  (I assume this takes long
>   enough that you have plenty of time for the next two steps ;-))
>4. Use ps or top to find out the process number of the backend handling
>   the session.
>5. Attach to that process number in gdb:
>    (gdb)    attach NNNN
>6. Set a breakpoint at elog, and let the backend continue running:
>    (gdb)    break elog
>    (gdb)    continue
>7. When the breakpoint is hit, get a backtrace:
>    Breakpoint 1, elog ...
>    (gdb)    bt
>   After copying & pasting the resulting printout, you can "quit" to
>   get out of gdb.
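>
>   Put together, the whole session would look roughly like this
>   (NNNN being the PID you found in step 4):
>
>    $ gdb /usr/local/pgsql/bin/postgres
>    (gdb)    attach NNNN
>    (gdb)    break elog
>    (gdb)    continue
>    Breakpoint 1, elog ...
>    (gdb)    bt
>    (gdb)    quit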
>
>            regards, tom lane

