Thread: reldesc does not exit
Howdy!

I continually get the following error when I truncate a very large table in my db:

NOTICE: trying to delete a reldesc that does not exist

Is this something that I should be concerned about?

Thanks for the help!

Darrin

"Darrin Ladd" <dladd@newfoundcomm.net> writes:
> I continually get the following error when I truncate a very large table in
> my db:
> NOTICE: trying to delete a reldesc that does not exist
> Is this something that I should be concerned about?

Possibly. That's a "shouldn't ever happen" kind of message, so it certainly indicates some sort of bug. Severity of bug is unguessable at this point. I don't see this when doing TRUNCATE on a plain-vanilla table, so I guess there is something special about your situation. What Postgres version are you running, on what platform? May we see the full definition of the table in question? Also, how large is "very large"?

regards, tom lane
I have Postgres 7.0.2 installed on an Alpha running Red Hat Linux 6.2. The table is truncated and loaded with approximately 40,000 records per day. The load is done by performing a COPY FROM. The first time it was ever loaded, the load was done using insert statements, with autocommit on and the box froze half way through. Ever since then, every time the truncation is performed, the reldesc warning has been displayed.

Currently this is just a demo version of the application, but the production version is planned to be rolled out within a month and the volume of records held in this table will eventually get up to 1 million.

I added the definition of the table below. Any direction that you can give me to help me hunt this down is greatly appreciated. I am still pretty new at all of this.

Thank you very much!

Darrin

CREATE TABLE foo (
    bar varchar,
    last_category_cde varchar,
    last_bite_cnt int,
    last_page_cnt int,
    last_site_cnt int,
    dtd_category_cde varchar,
    dtd_bite_cnt int,
    dtd_page_cnt int,
    dtd_site_cnt int,
    dtd_run_cnt int,
    dtd_categ_1 varchar,
    dtd_rating_1 int,
    dtd_categ_2 varchar,
    dtd_rating_2 int,
    dtd_categ_3 varchar,
    dtd_rating_3 int,
    wtd_category_cde varchar,
    wtd_bite_cnt int,
    wtd_page_cnt int,
    wtd_site_cnt int,
    wtd_run_cnt int,
    wtd_categ_1 varchar,
    wtd_rating_1 int,
    wtd_categ_2 varchar,
    wtd_rating_2 int,
    wtd_categ_3 varchar,
    wtd_rating_3 int,
    mtd_category_cde varchar,
    mtd_bite_cnt int,
    mtd_page_cnt int,
    mtd_site_cnt int,
    mtd_run_cnt int,
    mtd_categ_1 varchar,
    mtd_rating_1 int,
    mtd_categ_2 varchar,
    mtd_rating_2 int,
    mtd_categ_3 varchar,
    mtd_rating_3 int,
    tot_category_cde varchar,
    tot_bite_cnt int,
    tot_page_cnt int,
    tot_site_cnt int,
    tot_run_cnt int,
    tot_categ_1 varchar,
    tot_rating_1 int,
    tot_categ_2 varchar,
    tot_rating_2 int,
    tot_categ_3 varchar,
    tot_rating_3 int,
    last_bite_dte timestamp,
    added_dte timestamp,
    CONSTRAINT pk_foo PRIMARY KEY (bar)
);

-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us>
To: Darrin Ladd <dladd@newfoundcomm.net>
Cc: PGSQL General <pgsql-general@postgresql.org>
Date: Friday, September 29, 2000 12:40 AM
Subject: Re: [GENERAL] reldesc does not exit

>"Darrin Ladd" <dladd@newfoundcomm.net> writes:
>> I continually get the following error when I truncate a very large table in
>> my db:
>> NOTICE: trying to delete a reldesc that does not exist
>> Is this something that I should be concerned about?
>
>Possibly. That's a "shouldn't ever happen" kind of message, so it
>certainly indicates some sort of bug. Severity of bug is unguessable
>at this point. I don't see this when doing TRUNCATE on a plain-vanilla
>table, so I guess there is something special about your situation.
>What Postgres version are you running, on what platform? May we see
>the full definition of the table in question? Also, how large is "very
>large"?
>
> regards, tom lane

"Darrin Ladd" <dladd@newfoundcomm.net> writes:
> I have Postgres 7.0.2 installed on an Alpha running Red Hat Linux 6.2. The
> table is truncated and loaded with approximately 40,000 records per day.
> The load is done by performing a COPY FROM. The first time it was ever
> loaded, the load was done using insert statements, with autocommit on and
> the box froze half way through. Ever since then, every time the truncation
> is performed, the reldesc warning has been displayed.

Hm. Nothing out-of-the-ordinary about your table definition or what you're doing with it, and 40K records is certainly not anything that's going to stress the system.

An Alpha, on the other hand, is not such a common platform. I am thinking that there is probably some 64-bit portability bug lurking in the hashtable code that manages the reldesc cache. It might be dependent on the exact table name and/or OID. Could you tell us the real name of this table (I assume it's not "foo") and the OID (do "select oid from pg_class where relname = 'table name'")? Do you see the same notice when you do a TRUNCATE on other tables?

Does anyone else running an Alpha see this sort of notice when doing a TRUNCATE TABLE?

regards, tom lane
Sorry, don't know why I felt I should change the table's name. The table's name is spider and the oid is 443616.

Uh oh, yes, I do get the same notice when truncating other tables, even tables not in the same database :(

-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us>
To: Darrin Ladd <dladd@newfoundcomm.net>
Cc: PGSQL General <pgsql-general@postgresql.org>
Date: Friday, September 29, 2000 10:35 AM
Subject: Re: [GENERAL] reldesc does not exit

>"Darrin Ladd" <dladd@newfoundcomm.net> writes:
>> I have Postgres 7.0.2 installed on an Alpha running Red Hat Linux 6.2. The
>> table is truncated and loaded with approximately 40,000 records per day.
>> The load is done by performing a COPY FROM. The first time it was ever
>> loaded, the load was done using insert statements, with autocommit on and
>> the box froze half way through. Ever since then, every time the truncation
>> is performed, the reldesc warning has been displayed.
>
>Hm. Nothing out-of-the-ordinary about your table definition or what
>you're doing with it, and 40K records is certainly not anything that's
>going to stress the system.
>
>An Alpha, on the other hand, is not such a common platform. I am
>thinking that there is probably some 64-bit portability bug lurking
>in the hashtable code that manages the reldesc cache. It might be
>dependent on the exact table name and/or OID. Could you tell us the
>real name of this table (I assume it's not "foo") and the OID (do
>"select oid from pg_class where relname = 'table name'")? Do you
>see the same notice when you do a TRUNCATE on other tables?
>
>Does anyone else running an Alpha see this sort of notice when doing
>a TRUNCATE TABLE?
>
> regards, tom lane

"Darrin Ladd" <dladd@newfoundcomm.net> writes:
> Uh oh, yes, I do get the same notice when truncating other tables, even
> tables not in the same database :(

OK, so it's not so data-dependent after all. Sounds like it's probably a flat-out bug associated with 64-bit-int machines. Curious that we haven't heard about it before; you're certainly not the only person running Postgres on Alphas.

The notice as such is pretty harmless, but if I'm right that it indicates a lookup failure in the hashtable code, there's a potential for much more serious problems caused by lookup failures in other hashtables. It needs to be investigated.

pgsql.com has an Alpha on loan from DEC that I will try to reproduce the problem on, but that machine seems to be down at the moment :-(. I believe that machine isn't running RedHat anyway, but DEC Unix, so it's possible that it won't show the problem. In that case I might need to ask for a temporary account on your machine (doesn't have to be the postgres account, just an unprivileged user account that I can compile a debug version of the code on...)

Will keep you posted.

regards, tom lane
I wrote:
> "Darrin Ladd" <dladd@newfoundcomm.net> writes:
>> Uh oh, yes, I do get the same notice when truncating other tables, even
>> tables not in the same database :(

> OK, so it's not so data-dependent after all. Sounds like it's probably
> a flat-out bug associated with 64-bit-int machines.

No, I was guessing wrong. Turns out it's a fundamental bug in TRUNCATE that could show up on any machine, depending on chance behavior of memory allocation, with consequences up to and including backend coredump. (TRUNCATE on a table with indexes would fail unless closing and re-opening the relcache entry recreated the relcache entry at exactly the same memory address it had before :-(.) Apparently the RedHat LinuxAlpha distro is somewhat more likely than other platforms to move things around in memory, for reasons not immediately obvious; else we'd have seen this sooner on other machines.

I have fixed this for 7.0.3, due out soon. If you need a fix now the patch against 7.0.2 is attached.

regards, tom lane

*** heap.c.orig Thu May 25 17:25:32 2000
--- heap.c Sat Sep 30 14:41:51 2000
***************
*** 1091,1134 ****
   * RelationTruncateIndexes - This routine is used to truncate all
   * indices associated with the heap relation to zero tuples.
   * The routine will truncate and then reconstruct the indices on
!  * the relation specified by the heapRelation parameter.
   * --------------------------------
   */
  static void
! RelationTruncateIndexes(Relation heapRelation)
  {
!     Relation indexRelation,
!              currentIndex;
      ScanKeyData entry;
      HeapScanDesc scan;
!     HeapTuple indexTuple,
!               procTuple,
!               classTuple;
!     Form_pg_index index;
!     Oid heapId,
!         indexId,
!         procId,
!         accessMethodId;
!     Node *oldPred = NULL;
!     PredInfo *predInfo;
!     List *cnfPred = NULL;
!     AttrNumber *attributeNumberA;
!     FuncIndexInfo fInfo,
!                  *funcInfo = NULL;
!     int i,
!         numberOfAttributes;
!     char *predString;
!
!     heapId = RelationGetRelid(heapRelation);
!
!     /* Scan pg_index to find indexes on heapRelation */
      indexRelation = heap_openr(IndexRelationName, AccessShareLock);
      ScanKeyEntryInitialize(&entry, 0, Anum_pg_index_indrelid, F_OIDEQ,
                             ObjectIdGetDatum(heapId));
      scan = heap_beginscan(indexRelation, false, SnapshotNow, 1, &entry);
      while (HeapTupleIsValid(indexTuple = heap_getnext(scan, 0)))
      {
          /*
           * For each index, fetch index attributes so we can apply
--- 1091,1132 ----
   * RelationTruncateIndexes - This routine is used to truncate all
   * indices associated with the heap relation to zero tuples.
   * The routine will truncate and then reconstruct the indices on
!  * the relation specified by the heapId parameter.
   * --------------------------------
   */
  static void
! RelationTruncateIndexes(Oid heapId)
  {
!     Relation indexRelation;
      ScanKeyData entry;
      HeapScanDesc scan;
!     HeapTuple indexTuple;
+     /* Scan pg_index to find indexes on specified heap */
      indexRelation = heap_openr(IndexRelationName, AccessShareLock);
      ScanKeyEntryInitialize(&entry, 0, Anum_pg_index_indrelid, F_OIDEQ,
                             ObjectIdGetDatum(heapId));
      scan = heap_beginscan(indexRelation, false, SnapshotNow, 1, &entry);
+
      while (HeapTupleIsValid(indexTuple = heap_getnext(scan, 0)))
      {
+         Relation heapRelation,
+                  currentIndex;
+         HeapTuple procTuple,
+                   classTuple;
+         Form_pg_index index;
+         Oid indexId,
+             procId,
+             accessMethodId;
+         Node *oldPred = NULL;
+         PredInfo *predInfo;
+         List *cnfPred = NULL;
+         AttrNumber *attributeNumberA;
+         FuncIndexInfo fInfo,
+                      *funcInfo = NULL;
+         int i,
+             numberOfAttributes;
+         char *predString;
          /*
           * For each index, fetch index attributes so we can apply
***************
*** 1183,1192 ****
              elog(ERROR, "RelationTruncateIndexes: index access method not found");
          accessMethodId = ((Form_pg_class) GETSTRUCT(classTuple))->relam;
          /* Open our index relation */
          currentIndex = index_open(indexId);
-         if (currentIndex == NULL)
-             elog(ERROR, "RelationTruncateIndexes: can't open index relation");
          /* Obtain exclusive lock on it, just to be sure */
          LockRelation(currentIndex, AccessExclusiveLock);
--- 1181,1197 ----
              elog(ERROR, "RelationTruncateIndexes: index access method not found");
          accessMethodId = ((Form_pg_class) GETSTRUCT(classTuple))->relam;
+         /*
+          * We have to re-open the heap rel each time through this loop
+          * because index_build will close it again. We need grab no lock,
+          * however, because we assume heap_truncate is holding an exclusive
+          * lock on the heap rel.
+          */
+         heapRelation = heap_open(heapId, NoLock);
+         Assert(heapRelation != NULL);
+
          /* Open our index relation */
          currentIndex = index_open(indexId);
          /* Obtain exclusive lock on it, just to be sure */
          LockRelation(currentIndex, AccessExclusiveLock);
***************
*** 1205,1220 ****
          InitIndexStrategy(numberOfAttributes, currentIndex, accessMethodId);
          index_build(heapRelation, currentIndex, numberOfAttributes,
                      attributeNumberA, 0, NULL, funcInfo, predInfo);
-
          /*
           * index_build will close both the heap and index relations (but
!          * not give up the locks we hold on them). That's fine for the
!          * index, but we need to open the heap again. We need no new
!          * lock, since this backend still has the exclusive lock grabbed
!          * by heap_truncate.
           */
-         heapRelation = heap_open(heapId, NoLock);
-         Assert(heapRelation != NULL);
      }
      /* Complete the scan and close pg_index */
--- 1210,1219 ----
          InitIndexStrategy(numberOfAttributes, currentIndex, accessMethodId);
          index_build(heapRelation, currentIndex, numberOfAttributes,
                      attributeNumberA, 0, NULL, funcInfo, predInfo);
          /*
           * index_build will close both the heap and index relations (but
!          * not give up the locks we hold on them).
           */
      }
      /* Complete the scan and close pg_index */
***************
*** 1270,1286 ****
      rel->rd_nblocks = 0;
      /* If this relation has indexes, truncate the indexes too */
!     RelationTruncateIndexes(rel);
      /*
       * Close the relation, but keep exclusive lock on it until commit.
       */
      heap_close(rel, NoLock);
-
-     /*
-      * Is this really necessary?
-      */
-     RelationForgetRelation(rid);
  }
--- 1269,1280 ----
      rel->rd_nblocks = 0;
      /* If this relation has indexes, truncate the indexes too */
!     RelationTruncateIndexes(rid);
      /*
       * Close the relation, but keep exclusive lock on it until commit.
       */
      heap_close(rel, NoLock);
  }
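
For readers who want the shape of the fix without the backend details, here is a rough sketch in C of the pattern the patch moves to: the caller keeps its own open reference to the heap, and the index-rebuilding routine is handed the heap's id and opens a reference of its own on each pass through the loop, because the build step closes whatever it is given. This is a toy model and not PostgreSQL code. ToyRel, toy_cache, toy_open, toy_close, toy_index_build, toy_truncate_indexes and toy_truncate are all hypothetical names, and the single-slot, reference-counted cache is an assumption made purely for illustration; the real relcache is considerably more involved.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a relcache entry; NOT PostgreSQL's Relation struct. */
typedef struct ToyRel
{
    unsigned int id;        /* plays the role of the relation OID */
    int          refcount;  /* number of outstanding toy_open() calls */
} ToyRel;

/* Assumed single-slot cache: this sketch only ever tracks one relation. */
static ToyRel *toy_cache = NULL;

/* Hypothetical open: builds the entry on first use, then counts references. */
static ToyRel *
toy_open(unsigned int id)
{
    if (toy_cache == NULL)
    {
        toy_cache = malloc(sizeof(ToyRel));
        if (toy_cache == NULL)
            return NULL;
        toy_cache->id = id;
        toy_cache->refcount = 0;
    }
    toy_cache->refcount++;
    return toy_cache;
}

/* Hypothetical close: frees the entry when the last reference goes away. */
static void
toy_close(ToyRel *rel)
{
    assert(rel != NULL && rel == toy_cache && rel->refcount > 0);
    if (--rel->refcount == 0)
    {
        free(rel);
        toy_cache = NULL;
    }
}

/* Stand-in for index_build: it closes the heap reference it was handed. */
static void
toy_index_build(ToyRel *heap, unsigned int index_id)
{
    (void) index_id;        /* a real build would use the index here */
    toy_close(heap);
}

/*
 * Takes the heap's id, not a pointer, and opens its own reference on every
 * iteration, so the close inside toy_index_build() never drops the
 * reference the caller is still holding.
 */
static int
toy_truncate_indexes(unsigned int heap_id, const unsigned int *index_ids,
                     int n_indexes)
{
    int i;

    for (i = 0; i < n_indexes; i++)
    {
        ToyRel *heap = toy_open(heap_id);

        if (heap == NULL)
            return -1;
        toy_index_build(heap, index_ids[i]);
    }
    return n_indexes;
}

/* Caller side: holds one reference for the whole operation. */
static int
toy_truncate(unsigned int heap_id, const unsigned int *index_ids,
             int n_indexes)
{
    ToyRel *rel = toy_open(heap_id);
    int     rebuilt;

    if (rel == NULL)
        return -1;
    rebuilt = toy_truncate_indexes(heap_id, index_ids, n_indexes);
    assert(rel->refcount == 1);     /* our reference is still intact */
    toy_close(rel);
    return rebuilt;
}
```

In this model, calling toy_truncate with a heap id and an array of three index ids rebuilds three "indexes" and leaves the cache empty afterwards. Passing the caller's pointer straight into toy_index_build instead would free the entry out from under the caller, which is the general kind of hazard the message above describes.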