Re: Postgresql 8.4.1 segfault, backtrace - Mailing list pgsql-bugs
From | Richard Neill |
---|---|
Subject | Re: Postgresql 8.4.1 segfault, backtrace |
Date | |
Msg-id | 4AD76B73.7020007@hermes.cam.ac.uk Whole thread Raw |
In response to | Re: Postgresql 8.4.1 segfault, backtrace (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-bugs |
Dear Tom, Thanks for this, and sorry for not replying earlier. We finally obtained a window to deploy this patch on the real (rather busy!) production system as of last Saturday evening. The good news is that the patch has now been in place for 5 days, and, despite some very high loading, it has survived without a single crash. I'd venture to say that this issue is now fixed. Best wishes, Richard Tom Lane wrote: > I wrote: >> I'll get you a real fix as soon as I can, but might not be till >> tomorrow. > > The attached patch (against 8.4.x) fixes the problem as far as I can > tell. Please test. > > regards, tom lane > > > > ------------------------------------------------------------------------ > > Index: src/backend/utils/cache/relcache.c > =================================================================== > RCS file: /cvsroot/pgsql/src/backend/utils/cache/relcache.c,v > retrieving revision 1.287 > diff -c -r1.287 relcache.c > *** src/backend/utils/cache/relcache.c 11 Jun 2009 14:49:05 -0000 1.287 > --- src/backend/utils/cache/relcache.c 25 Sep 2009 17:32:02 -0000 > *************** > *** 1386,1392 **** > * > * The data we insert here is pretty incomplete/bogus, but it'll serve to > * get us launched. RelationCacheInitializePhase2() will read the real > ! * data from pg_class and replace what we've done here. > */ > relation->rd_rel = (Form_pg_class) palloc0(CLASS_TUPLE_SIZE); > > --- 1386,1394 ---- > * > * The data we insert here is pretty incomplete/bogus, but it'll serve to > * get us launched. RelationCacheInitializePhase2() will read the real > ! * data from pg_class and replace what we've done here. Note in particular > ! * that relowner is left as zero; this cues RelationCacheInitializePhase2 > ! * that the real data isn't there yet. > */ > relation->rd_rel = (Form_pg_class) palloc0(CLASS_TUPLE_SIZE); > > *************** > *** 2603,2619 **** > * rows and replace the fake entries with them. Also, if any of the > * relcache entries have rules or triggers, load that info the hard way > * since it isn't recorded in the cache file. > */ > hash_seq_init(&status, RelationIdCache); > > while ((idhentry = (RelIdCacheEnt *) hash_seq_search(&status)) != NULL) > { > Relation relation = idhentry->reldesc; > > /* > * If it's a faked-up entry, read the real pg_class tuple. > */ > ! if (needNewCacheFile && relation->rd_isnailed) > { > HeapTuple htup; > Form_pg_class relp; > --- 2605,2635 ---- > * rows and replace the fake entries with them. Also, if any of the > * relcache entries have rules or triggers, load that info the hard way > * since it isn't recorded in the cache file. > + * > + * Whenever we access the catalogs to read data, there is a possibility > + * of a shared-inval cache flush causing relcache entries to be removed. > + * Since hash_seq_search only guarantees to still work after the *current* > + * entry is removed, it's unsafe to continue the hashtable scan afterward. > + * We handle this by restarting the scan from scratch after each access. > + * This is theoretically O(N^2), but the number of entries that actually > + * need to be fixed is small enough that it doesn't matter. > */ > hash_seq_init(&status, RelationIdCache); > > while ((idhentry = (RelIdCacheEnt *) hash_seq_search(&status)) != NULL) > { > Relation relation = idhentry->reldesc; > + bool restart = false; > + > + /* > + * Make sure *this* entry doesn't get flushed while we work with it. > + */ > + RelationIncrementReferenceCount(relation); > > /* > * If it's a faked-up entry, read the real pg_class tuple. > */ > ! if (relation->rd_rel->relowner == InvalidOid) > { > HeapTuple htup; > Form_pg_class relp; > *************** > *** 2630,2636 **** > * Copy tuple to relation->rd_rel. (See notes in > * AllocateRelationDesc()) > */ > - Assert(relation->rd_rel != NULL); > memcpy((char *) relation->rd_rel, (char *) relp, CLASS_TUPLE_SIZE); > > /* Update rd_options while we have the tuple */ > --- 2646,2651 ---- > *************** > *** 2639,2660 **** > RelationParseRelOptions(relation, htup); > > /* > ! * Also update the derived fields in rd_att. > */ > ! relation->rd_att->tdtypeid = relp->reltype; > ! relation->rd_att->tdtypmod = -1; /* unnecessary, but... */ > ! relation->rd_att->tdhasoid = relp->relhasoids; > > ReleaseSysCache(htup); > } > > /* > * Fix data that isn't saved in relcache cache file. > */ > if (relation->rd_rel->relhasrules && relation->rd_rules == NULL) > RelationBuildRuleLock(relation); > if (relation->rd_rel->relhastriggers && relation->trigdesc == NULL) > RelationBuildTriggers(relation); > } > > /* > --- 2654,2710 ---- > RelationParseRelOptions(relation, htup); > > /* > ! * Check the values in rd_att were set up correctly. (We cannot > ! * just copy them over now: formrdesc must have set up the > ! * rd_att data correctly to start with, because it may already > ! * have been copied into one or more catcache entries.) > */ > ! Assert(relation->rd_att->tdtypeid == relp->reltype); > ! Assert(relation->rd_att->tdtypmod == -1); > ! Assert(relation->rd_att->tdhasoid == relp->relhasoids); > > ReleaseSysCache(htup); > + > + /* relowner had better be OK now, else we'll loop forever */ > + if (relation->rd_rel->relowner == InvalidOid) > + elog(ERROR, "invalid relowner in pg_class entry for \"%s\"", > + RelationGetRelationName(relation)); > + > + restart = true; > } > > /* > * Fix data that isn't saved in relcache cache file. > + * > + * relhasrules or relhastriggers could possibly be wrong or out of > + * date. If we don't actually find any rules or triggers, clear the > + * local copy of the flag so that we don't get into an infinite loop > + * here. We don't make any attempt to fix the pg_class entry, though. > */ > if (relation->rd_rel->relhasrules && relation->rd_rules == NULL) > + { > RelationBuildRuleLock(relation); > + if (relation->rd_rules == NULL) > + relation->rd_rel->relhasrules = false; > + restart = true; > + } > if (relation->rd_rel->relhastriggers && relation->trigdesc == NULL) > + { > RelationBuildTriggers(relation); > + if (relation->trigdesc == NULL) > + relation->rd_rel->relhastriggers = false; > + restart = true; > + } > + > + /* Release hold on the relation */ > + RelationDecrementReferenceCount(relation); > + > + /* Now, restart the hashtable scan if needed */ > + if (restart) > + { > + hash_seq_term(&status); > + hash_seq_init(&status, RelationIdCache); > + } > } > > /*
pgsql-bugs by date: