7.0.2 crash, backtrace with debug available - Mailing list pgsql-hackers

From Alfred Perlstein
Subject 7.0.2 crash, backtrace with debug available
Date
Msg-id 20000920061808.P9141@fw.wintelcom.net
Whole thread Raw
Responses Re: 7.0.2 crash, backtrace with debug available
Re: 7.0.2 crash, backtrace with debug available
Re: 7.0.2 crash, backtrace with debug available
List pgsql-hackers
There's two tracebacks of crashed 7.0.2 backends at the end of this
email.

I posted earlier this week about a table of ours getting corrupted
after some time.  The table looks like this:

details:id         | integer     | attr_type  | varchar(32) | attr_name  | varchar(32) | attr_vers  | varchar(32) |
attr_hits | bigint      | default 0
 

We're using a perl script to update this table from a "raw" table:

details_raw:id         | integer     | stat_date  | timestamp   | attr_type  | varchar(32) | attr_name  | varchar(32) |
attr_vers | varchar(32) | attr_hits  | bigint      | default 0
 


SELECT id, attr_type, attr_name, attr_vers, sum(attr_hits) as attr_hits
FROM details_raw
WHERE stat_date < ? 
GROUP BY counter_id, attr_type, attr_name, attr_vers 
;

then we loop over this UPDATE query calling an INSERT query if UPDATE
returns 0 rows.  We are vacuuming the table after every update (which
can be several hundred rows) which is why i need the vacuum.

INSERT INTO details   (id, attr_type, attr_name, attr_vers, attr_hits)   VALUES ( ?, ?, ?, ?, ? )
;

UPDATE details
SET attr_hits = attr_hits + ? 
WHERE id = ? AND attr_type = ? AND attr_name = ? AND attr_vers = ?
;

After a while we get this crash apparently, followed by crashes of any
backend that scans this table, not updating this specific table makes
the crashes go away, the problem seems pretty isolated to this data.

% gdb /usr/local/pgsql/bin/postgres postgres.54738.core 
#0  0x4829f77a in memmove () from /usr/lib/libc.so.4
(gdb) bt
#0  0x4829f77a in memmove () from /usr/lib/libc.so.4
#1  0x53260f14 in ?? ()
#2  0x8093c93 in vc_attrstats (onerel=0x84a2788, vacrelstats=0x8496290,    tuple=0xbfbfe93c) at vacuum.c:2354
#3  0x8091609 in vc_scanheap (vacrelstats=0x8496290, onerel=0x84a2788,    vacuum_pages=0xbfbfe9d0,
fraged_pages=0xbfbfe9c0)at vacuum.c:980
 
#4  0x8090ccb in vc_vacone (relid=2238241037, analyze=1, va_cols=0x0)   at vacuum.c:599
#5  0x8090454 in vc_vacuum (VacRelP=0xbfbfea60, analyze=1 '\001', va_cols=0x0)   at vacuum.c:299
#6  0x80903dc in vacuum (vacrel=0x84960e8 "\230`I\b ", verbose=1,    analyze=1 '\001', va_spec=0x0) at vacuum.c:223
#7  0x80fa444 in ProcessUtility (parsetree=0x8496110, dest=Remote)   at utility.c:694
#8  0x80f7e5e in pg_exec_query_dest (   query_string=0x81a9370 "VACUUM verbose analyze webhit_details_formatted;",
dest=Remote,aclOverride=0) at postgres.c:617
 
#9  0x80f7db9 in pg_exec_query (   query_string=0x81a9370 "VACUUM verbose analyze webhit_details_formatted;")   at
postgres.c:562
#10 0x80f8d1a in PostgresMain (argc=9, argv=0xbfbff0e0, real_argc=10,    real_argv=0xbfbffb40) at postgres.c:1590
#11 0x80e1d06 in DoBackend (port=0x843f000) at postmaster.c:2009
#12 0x80e1899 in BackendStartup (port=0x843f000) at postmaster.c:1776
#13 0x80e0abd in ServerLoop () at postmaster.c:1037
#14 0x80e04be in PostmasterMain (argc=10, argv=0xbfbffb40) at postmaster.c:725
#15 0x80aee43 in main (argc=10, argv=0xbfbffb40) at main.c:93
#16 0x80633c5 in _start ()
(gdb) list
2349                                    stats->guess1_hits = 1;
2350                                    stats->guess2_hits = 1;
2351                            }
2352                            if (!value_hit)
2353                            {
2354                                    vc_bucketcpy(stats->attr, value, &stats->guess2, &stats->guess2_len);
2355                                    stats->guess1_hits = 1;
2356                                    stats->guess2_hits = 1;
2357                            }
2358                    }
(gdb) print stats
No symbol "stats" in current context.
(gdb) print stats->attr
No symbol "stats" in current context.
(gdb) print value_hit
$1 = 0 '\000'
(gdb) print value
$2 = 1395003156
(gdb) print stats->guess2_len
No symbol "stats" in current context.
(gdb) print i
$1 = 3
(gdb) print attr_cnt
$2 = 167920861
(gdb) print *vacattrstats
Cannot access memory at address 0xb5a9e104.
(gdb) print tupDesc
$3 = 0x84a6368
(gdb) print *tupDesc
$4 = {natts = 5, attrs = 0x8492748, constr = 0x84a6380}
(gdb) print *onerel 
$5 = {rd_fd = 138985496, rd_nblocks = 892691556, rd_refcnt = 25698,  rd_myxactonly = 53 '5', rd_isnailed = 53 '5',
rd_isnoname= 100 'd',  rd_unlinked = 102 'f', rd_am = 0xac70000, rd_rel = 0x40580044, rd_id = 77,  rd_lockInfo =
{lockRelId= {relId = 1937204590, dbId = 1848586042}},  rd_att = 0x31737765, rd_rules = 0x2e63712e, rd_istrat =
0x706d7973, rd_support = 0x63697461, trigdesc = 0x61632e6f}
 
(gdb) print value;
Invalid character ';' in expression.
(gdb) print value 
$6 = 1395003156
(gdb) print *value
$7 = 892691554
(gdb) print isnull
$8 = 0 '\000'
(gdb) 
* IDENTIFICATION*        $Header: /home/pgcvs/pgsql/src/backend/commands/vacuum.c,v 1.148 2000/
05/19 03:22:29 tgl Exp $


here's another that happened to occur during what seems to be a 'COPY OUT':
#0  0x482a7d95 in ?? ()
(gdb) bt
#0  0x482a7d95 in ?? ()
#1  0x808c393 in CopyTo (rel=0x8777890, binary=0 '\000', oids=0 '\000',    fp=0x0, delim=0x8159fa9 "\t",
null_print=0x8159fab"\\N") at copy.c:508
 
#2  0x808bf99 in DoCopy (relname=0x87230e8 "~+", binary=0 '\000',    oids=0 '\000', from=0 '\000', pipe=1 '\001',
filename=0x0,   delim=0x8159fa9 "\t", null_print=0x8159fab "\\N") at copy.c:374
 
#3  0x80f98a3 in ProcessUtility (parsetree=0x8723110, dest=Remote)   at utility.c:262
#4  0x80f7e5e in pg_exec_query_dest (query_string=0x81a9388 "", dest=Remote,    aclOverride=0) at postgres.c:617
#5  0x80f7db9 in pg_exec_query (query_string=0x81a9388 "") at postgres.c:562
#6  0x80f8d1a in PostgresMain (argc=9, argv=0xbfbff0e0, real_argc=10,    real_argv=0xbfbffb40) at postgres.c:1590
#7  0x80e1d06 in DoBackend (port=0x843f000) at postmaster.c:2009
#8  0x80e1899 in BackendStartup (port=0x843f000) at postmaster.c:1776
#9  0x80e0abd in ServerLoop () at postmaster.c:1037
#10 0x80e04be in PostmasterMain (argc=10, argv=0xbfbffb40) at postmaster.c:725
#11 0x80aee43 in main (argc=10, argv=0xbfbffb40) at main.c:93
#12 0x80633c5 in _start ()
(gdb) up
#1  0x808c393 in CopyTo (rel=0x8777890, binary=0 '\000', oids=0 '\000',    fp=0x0, delim=0x8159fa9 "\t",
null_print=0x8159fab"\\N") at copy.c:508
 
508                                             string = (char *) (*fmgr_faddr(&out_functions[i]))
(gdb) list
503                                             continue;
504                                     }
505     #endif   /* _DROP_COLUMN_HACK__ */
506                                     if (!isnull)
507                                     {
508                                             string = (char *) (*fmgr_faddr(&out_functions[i]))
509                                                     (value, elements[i], typmod[i]);
510                                             CopyAttributeOut(fp, string, delim);
511                                             pfree(string);
512                                     }
(gdb) print isnull
$1 = 0 '\000'
(gdb) print string
$2 = 0xfffffffc <Address 0xfffffffc out of bounds>
(gdb) print value
$3 = 1072255572
(gdb) print elements[i]
$4 = 11134
(gdb) print typmod[i]  
$5 = 1742544
(gdb) print out_functions[i]
$6 = {fn_addr = 0x4005a, fn_plhandler = 0x208900, fn_oid = 11134, fn_nargs = 0}
(gdb) 
* IDENTIFICATION*    $Header: /home/pgcvs/pgsql/src/backend/commands/copy.c,v 1.106.2.2 2000/06/28 06:13:01 tgl Exp $

I know there's been a couple of updates to the source since this date
(compiled on Aug 3), any idea if:
a) an upgrade is a good idea
b) an upgrade is a safe idea

If there's anything that I can do to provide clearer information?

thanks very much for your time,
-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: pg_dump tries to do too much per query
Next
From: "Oliver Elphick"
Date:
Subject: Debian Bug#72084: Broken permissions required with foreign keys (fwd)