Re: [HACKERS] memory problems in copying large table to STDOUT - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] memory problems in copying large table to STDOUT
Date
Msg-id 3580.939605580@sss.pgh.pa.us
Whole thread Raw
In response to Re: [HACKERS] memory problems in copying large table to STDOUT  (Martin Weinberg <weinberg@osprey.astro.umass.edu>)
Responses Re: [HACKERS] memory problems in copying large table to STDOUT  (Martin Weinberg <weinberg@osprey.astro.umass.edu>)
List pgsql-hackers
Martin Weinberg <weinberg@osprey.astro.umass.edu> writes:
> I got the backtrace with ""Memory exhausted in
> AllocSetAlloc()" this time.

> #4  0x8119732 in MemoryContextAlloc (context=0x81efbd8, size=875628846)
>     at mcxt.c:230
> #5  0x810ebf1 in textout (vlena=0x4106182c) at varlena.c:190
> #6  0x808508c in CopyTo (rel=0x822af08, binary=0 '\000', oids=0 '\000', 
>     fp=0x0, delim=0x813c829 "\t") at copy.c:421

OK, that shoots down the "memory leak" theory.  It sure looks like
what you've got is corrupt data: textout is reading a length word of
875628846 (plus or minus a couple bytes) from what is supposed to be
a text datum.    Obviously that's not right.  Next question is how
it got that way.

I think it's pretty likely that the original cause is the kernel disk
driver or disk hardware flakiness that we already have evidence for.
However, I hate passing the buck like that, so I'm willing to continue
digging if you are.

> Again, I've saved the process in case checking any symbols
> would be helpful.

You should look at the source tuple location info in CopyTo ---
something like(gdb)    f 6        -- frame 6, ie, CopyTo(gdb)    p i        -- get column number(gdb)   p *tuple    --
printcontents of HeapTupleData(gdb)   p *tuple->t_data  -- print contents of HeapTupleHeaderData
 
The last is mainly to find out the tuple's OID for possible future
reference.  What we want right now is the tuple location info,
tuple->t_self, which will give us a block number (bi_hi and bi_lo in
that struct are the high and low 16 bits of the block number).  Then,
if you can use dd and od to get a hex dump of that block from the
relation's data files, we can see what's really on disk there.
(Remember that the "blocks" are 8K each; also, if you get an offset
beyond 1 gig, then it's going to be in one of the continuation files
"psc.1", "psc.2", etc --- one gig apiece.)

It would also be useful to look at the contents of the disk block as
sitting in memory in the backend, to see if they are the same as what
you read using dd; I would not be too surprised to find they are not.
The t_data pointer should be pointing into a disk buffer in Postgres'
shared memory block, but offhand I'm not sure what's the easiest way to
discover the starting address of that buffer using gdb.  (Can any other
hackers lend a hand here?)
        regards, tom lane


pgsql-hackers by date:

Previous
From: Martin Weinberg
Date:
Subject: Re: [HACKERS] memory problems in copying large table to STDOUT
Next
From: Victor Kane
Date:
Subject: Looking for pgsql programmer