repeatable system index corruption on 7.4.2 - Mailing list pgsql-hackers

From: Joe Conway
Subject: repeatable system index corruption on 7.4.2
Msg-id: 4124EA7A.4030706@joeconway.com
Responses: Re: repeatable system index corruption on 7.4.2
           Re: repeatable system index corruption on 7.4.2
List: pgsql-hackers
I'm seeing the following errors after a few hours of fairly aggressive
bulk loading of a database running on Postgres 7.4.2:

cyspec=# select version();
ERROR:  root page 43 of "pg_proc_proname_args_nsp_index" has level 0, expected 2
cyspec=# select * from pg_class;
ERROR:  root page 3 of "pg_attribute_relid_attnum_index" has level 0, expected 1

When I say aggressive, I mean up to 6 simultaneous COPY processes. This
differs from the issue Tom solved the other day in that we don't get a
SIGABRT, just corrupt index pages. Here is a backtrace:

#0  errfinish (dummy=0) at elog.c:319
#1  0x081cbc26 in elog_finish (elevel=20, fmt=0x81e6ee0 "root page %u of \"%s\" has level %u, expected %u") at elog.c:853
#2  0x0808c632 in _bt_getroot (rel=0x82d58d4, access=1429534320) at nbtpage.c:287
#3  0x080902f3 in _bt_search (rel=0x82d58d4, keysz=2, scankey=0x8307358, bufP=0xfeffefa8, access=1) at nbtsearch.c:46
#4  0x08090fea in _bt_first (scan=0x8307198, dir=ForwardScanDirection) at nbtsearch.c:575
#5  0x0808ed47 in btgettuple (fcinfo=0x82a65cc) at nbtree.c:326
#6  0x081ce5b6 in FunctionCall2 (flinfo=0x0, arg1=136996300, arg2=136996300) at fmgr.c:993
#7  0x08088329 in index_getnext (scan=0x8307198, direction=ForwardScanDirection) at indexam.c:503
#8  0x08087951 in systable_getnext (sysscan=0x82a65cc) at genam.c:253
#9  0x081c3c43 in RelationBuildTupleDesc (buildinfo={infotype = 2, i = {info_id = 136666169, info_name = 0x8255c39 "pg_index_indrelid_index"}}, relation=0x5a2cdea4) at relcache.c:548
#10 0x081c459f in RelationBuildDesc (buildinfo={infotype = 2, i = {info_id = 136666169, info_name = 0x8255c39 "pg_index_indrelid_index"}}, oldrelation=0x0) at relcache.c:884
#11 0x081c56c0 in RelationSysNameGetRelation (relationName=0x8255c39 "pg_index_indrelid_index") at relcache.c:1637
#12 0x0807febe in relation_openr (sysRelationName=0x8255c39 "pg_index_indrelid_index", lockmode=0) at heapam.c:529
#13 0x08087ab5 in index_openr (sysRelationName=0x8255c39 "pg_index_indrelid_index") at indexam.c:179
#14 0x0808790a in systable_beginscan (heapRelation=0x82ee11c, indexRelname=0x8255c39 "pg_index_indrelid_index", indexOK=1 '\001', snapshot=0x0, nkeys=1, key=0xfefff2d0) at genam.c:192
#15 0x081c6f53 in RelationGetIndexList (relation=0x82d846c) at relcache.c:2717
#16 0x08147bc4 in get_relation_info (relationObjectId=1259, rel=0x83070d4) at plancat.c:81
#17 0x081492b4 in make_base_rel (root=0x82a65cc, relid=1) at relnode.c:159
#18 0x08148f91 in build_base_rel (root=0x8302164, relid=1) at relnode.c:70
#19 0x0813d5b8 in add_base_rels_to_query (root=0x8302164, jtnode=0x83070b8) at initsplan.c:86
#20 0x0813e4c9 in query_planner (root=0x8302164, tlist=0x83067a0, tuple_fraction=0, cheapest_path=0xfefff4e0, sorted_path=0xfefff4e4) at planmain.c:119
#21 0x0813f03b in grouping_planner (parse=0x8302164, tuple_fraction=0) at planner.c:897
#22 0x0813e9b4 in subquery_planner (parse=0x8302164, tuple_fraction=0) at planner.c:315
#23 0x0813e69c in planner (parse=0x8302164, isCursor=0 '\0', cursorOptions=0) at planner.c:119
#24 0x0816e8ba in pg_plan_query (querytree=0x8302164) at postgres.c:589
#25 0x0816e944 in pg_plan_queries (querytrees=0x830673c, needSnapshot=0 '\0') at postgres.c:656
#26 0x0816eafc in exec_simple_query (query_string=0x8301dbc "select * from pg_class;\n") at postgres.c:814

Any thoughts? Do you think the patch in 7.4.5 will address this also, or 
are we looking at something else?

Thanks,

Joe


