Re: [HACKERS] memory problems in copying large table to STDOUT - Mailing list pgsql-hackers

From Martin Weinberg
Subject Re: [HACKERS] memory problems in copying large table to STDOUT
Date
Msg-id 199910110417.AAA18542@osprey.astro.umass.edu
Whole thread Raw
In response to Re: [HACKERS] memory problems in copying large table to STDOUT  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] memory problems in copying large table to STDOUT  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hi Tom,

Tom Lane wrote on Sun, 10 Oct 1999 21:33:00 EDT
>
>I think it's pretty likely that the original cause is the kernel disk
>driver or disk hardware flakiness that we already have evidence for.
>However, I hate passing the buck like that, so I'm willing to continue
>digging if you are.

Honestly, I'm inclined to agree but I'd like to know for sure before
go buy a hardware raid controller (or give up trying to build a
database this large with Linux altogether).  The "coincidence" that's
haunting me here is that I used pgsql on a 12GB database of the same
sort with no trouble for about 6 months.  All of this started when I
tried to load 60GB.  And after loading the table, I index it and
exercise it and everything is fine.  Twice now, this problem has
arisen when I've tried to backup the table with a "copy".  The first
time, I wiped out the raid array, got the latest kernel with the
up-to-date raid patches and rebuilt (takes a while . . .).  Same thing
again.  Of course, this copy hits the disks pretty hard, so maybe it's
not so coincidental.  Anyway, ruling out a pgsql problem would be
progress.

>> Again, I've saved the process in case checking any symbols
>> would be helpful.
>
>You should look at the source tuple location info in CopyTo ---
>something like
>    (gdb)    f 6        -- frame 6, ie, CopyTo
>    (gdb)    p i        -- get column number
>    (gdb)   p *tuple    -- print contents of HeapTupleData
>    (gdb)   p *tuple->t_data  -- print contents of HeapTupleHeaderData
>    
>The last is mainly to find out the tuple's OID for possible future
>reference.  What we want right now is the tuple location info,
>tuple->t_self, which will give us a block number (bi_hi and bi_lo in
>that struct are the high and low 16 bits of the block number).  Then,
>if you can use dd and od to get a hex dump of that block from the
>relation's data files, we can see what's really on disk there.
>(Remember that the "blocks" are 8K each; also, if you get an offset
>beyond 1 gig, then it's going to be in one of the continuation files
>"psc.1", "psc.2", etc --- one gig apiece.)
>

Ok done.  Here's what I find:

(gdb) p i
$1 = 48
(gdb) p *tuple
$2 = {t_len = 352, t_self = {ip_blkid = {bi_hi = 24, bi_lo = 26279},    ip_posid = 19}, t_data = 0x41061710}
(gdb) p *tuple->t_data
$3 = {t_oid = 37497689, t_cmin = 0, t_cmax = 0, t_xmin = 17943, t_xmax
= 0,  t_ctid = {ip_blkid = {bi_hi = 24, bi_lo = 26279}, ip_posid = 19},  t_natts = 63, t_infomask = 2307, t_hoff = 40
'(',t_bits = "<FF><FF><FF><FF>"}
 

Now, check me to make sure I've followed you correctly:

Since 1GB of blocks is 0x20000, this data in the 13th GB.
The offset into the 13th is 26279.

So I did:

dd if=psc.12 skip=26279 count=1 bs=8k | od -t x > ~/dump.hex

I'm not sure what I'm looking for in dump.hex.  The first hand full
of lines are:

0000000 01840064 20002000 02c09ea0 02809d60
0000020 02c09c00 02989ab4 02c09954 02989808
0000040 028096c8 02809588 02c09428 02c092c8
0000060 02c09168 02c09008 02c08ea8 02988d5c
0000100 02c08bfc 02c08a9c 0280895c 02588830
0000120 02c086d0 02c08570 02588444 02c082e4
0000140 02c08184 00000000 00000000 00000000
0000160 00000000 00000000 00000000 00000000
*
0000600 00000000 023c2b5d 00000000 00000000
0000620 00004617 00000000 66a70018 003f0017
0000640 ff280903 ffffffff 000fffff 00000005
0000660 00000073 fffffd95 000000ac 00003d2b
.
.
.

--Martin


===========================================================================

Martin Weinberg                      Phone: (413) 545-3821
Dept. of Physics and Astronomy       FAX:   (413) 545-2117/0648
530 Graduate Research Tower         weinberg@astro.umass.edu
University of Massachusetts         http://www.astro.umass.edu/~weinberg/
Amherst, MA  01003-4525


pgsql-hackers by date:

Previous
From: Victor Kane
Date:
Subject: Looking for pgsql programmer
Next
From: "Allan Huffman"
Date:
Subject: --with-mb-encoding?