Thread: failed runcheck
First a core dump which can be relieved by: Index: catalog.c =================================================================== RCS file: /home/projects/pgsql/cvsroot/pgsql/src/backend/catalog/catalog.c,v retrieving revision 1.34 diff -c -u -r1.34 catalog.c --- catalog.c 2000/10/16 14:52:02 1.34 +++ catalog.c 2000/10/21 17:07:09 @@ -173,7 +173,7 @@boolIsSystemRelationName(const char *relname){ - if (relname[0] && relname[1] && relname[2]) + if (relname && relname[0] && relname[1] && relname[2]) return (relname[0] == 'p' && relname[1]== 'g' && relname[2] == '_'); (symptoms at the end of message) But now the bit I don't see how to solve: the regression postmaster doesn't startup because it can't find tmp_check/data/base/1/1259. The only files I see are 1/{1255,PG_VERSION}. Where does 1259 come from? Cheers, Patrick #0 IsSystemRelationName (relname=0x0) at catalog.c:176 #1 0x807ed9a in IsSharedSystemRelationName (relname=0x0) at catalog.c:197 #2 0x80e9272 in RelationInitLockInfo (relation=0x82af018) at lmgr.c:119 #3 0x81202ef in formrdesc (relationName=0x816ad7e "pg_class", natts=22, att=0x8173600) at relcache.c:1193 #4 0x8120c12 in RelationCacheInitialize () at relcache.c:1953 #5 0x81266b3 in InitPostgres (dbname=0xbfbfd666 "template1", username=0x0) at postinit.c:329 #6 0x807dde0 in BootstrapMain (argc=7, argv=0xbfbfd510) at bootstrap.c:358 #7 0x80bc67c in main (argc=8, argv=0xbfbfd50c) at main.c:119 #8 0x806367e in ___start () (gdb) print *relation $3 = {rd_fd = -1, rd_nblocks = 0, rd_refcnt = 0, rd_myxactonly = 0 '\000', rd_isnailed = 0 '\000', rd_unlinked = 0 '\000',rd_indexfound = 0 '\000', rd_uniqueindex = 0 '\000', rd_am = 0x1000001, rd_rel = 0x0, rd_id = 0, rd_indexlist =0x82af0a0, rd_lockInfo = {lockRelId = {relId = 0, dbId = 0}}, rd_att = 0x0, rd_rules = 0x0, rd_rulescxt = 0x82af128, rd_istrat= 0x0, rd_support = 0x0, trigdesc = 0x0}
Applied > First a core dump which can be relieved by: > > Index: catalog.c > =================================================================== > RCS file: /home/projects/pgsql/cvsroot/pgsql/src/backend/catalog/catalog.c,v > retrieving revision 1.34 > diff -c -u -r1.34 catalog.c > --- catalog.c 2000/10/16 14:52:02 1.34 > +++ catalog.c 2000/10/21 17:07:09 > @@ -173,7 +173,7 @@ > bool > IsSystemRelationName(const char *relname) > { > - if (relname[0] && relname[1] && relname[2]) > + if (relname && relname[0] && relname[1] && relname[2]) > return (relname[0] == 'p' && > relname[1] == 'g' && > relname[2] == '_'); > > (symptoms at the end of message) > > But now the bit I don't see how to solve: the regression postmaster doesn't > startup because it can't find tmp_check/data/base/1/1259. The only files I > see are 1/{1255,PG_VERSION}. Where does 1259 come from? > > Cheers, > > Patrick > > > #0 IsSystemRelationName (relname=0x0) at catalog.c:176 > #1 0x807ed9a in IsSharedSystemRelationName (relname=0x0) at catalog.c:197 > #2 0x80e9272 in RelationInitLockInfo (relation=0x82af018) at lmgr.c:119 > #3 0x81202ef in formrdesc (relationName=0x816ad7e "pg_class", natts=22, > att=0x8173600) at relcache.c:1193 > #4 0x8120c12 in RelationCacheInitialize () at relcache.c:1953 > #5 0x81266b3 in InitPostgres (dbname=0xbfbfd666 "template1", username=0x0) > at postinit.c:329 > #6 0x807dde0 in BootstrapMain (argc=7, argv=0xbfbfd510) at bootstrap.c:358 > #7 0x80bc67c in main (argc=8, argv=0xbfbfd50c) at main.c:119 > #8 0x806367e in ___start () > (gdb) print *relation > $3 = {rd_fd = -1, rd_nblocks = 0, rd_refcnt = 0, rd_myxactonly = 0 '\000', > rd_isnailed = 0 '\000', rd_unlinked = 0 '\000', rd_indexfound = 0 '\000', > rd_uniqueindex = 0 '\000', rd_am = 0x1000001, rd_rel = 0x0, rd_id = 0, > rd_indexlist = 0x82af0a0, rd_lockInfo = {lockRelId = {relId = 0, dbId = 0}}, > rd_att = 0x0, rd_rules = 0x0, rd_rulescxt = 0x82af128, rd_istrat = 0x0, > rd_support = 0x0, trigdesc = 0x0} > > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Patrick Welche <prlw1@newn.cam.ac.uk> writes: > [ core dump due to ] > #0 IsSystemRelationName (relname=0x0) at catalog.c:176 > #1 0x807ed9a in IsSharedSystemRelationName (relname=0x0) at catalog.c:197 > #2 0x80e9272 in RelationInitLockInfo (relation=0x82af018) at lmgr.c:119 > #3 0x81202ef in formrdesc (relationName=0x816ad7e "pg_class", natts=22, > att=0x8173600) at relcache.c:1193 > #4 0x8120c12 in RelationCacheInitialize () at relcache.c:1953 and proposes to fix this by having IsSystemRelationName make an arbitrary decision about whether a NULL input pointer should be considered to represent a system relation name or not. I do not like that, because AFAICS the decision is completely arbitrary. IsSystemRelationName shouldn't be called with a NULL pointer in the first place, and it has every right to cause a coredump if that happens. It looks to me like the immediate bug here is that RelationGetPhysicalRelationName is returning a NULL pointer. Probably there are some missing or out-of-order steps in relcache initialization? I've been offline for a couple days due to DSL line failure :-( so I haven't seen Vadim's latest checkins. But I'm betting this is a bug in the changes to use OIDs as physical relnames. Do we even need RelationGetPhysicalRelationName anymore, and if so, what does it mean? Next question: why is RelationInitLockInfo using RelationGetPhysicalRelationName to get the input data for IsSharedSystemRelationName --- shouldn't that be a test on logical relation name? Or maybe the entire premise of IsSharedSystemRelationName is bogus now, and we ought to use some other way to decide if a relation is cross-database or not? regards, tom lane
> Patrick Welche <prlw1@newn.cam.ac.uk> writes: > > [ core dump due to ] > > #0 IsSystemRelationName (relname=0x0) at catalog.c:176 > > #1 0x807ed9a in IsSharedSystemRelationName (relname=0x0) at catalog.c:197 > > #2 0x80e9272 in RelationInitLockInfo (relation=0x82af018) at lmgr.c:119 > > #3 0x81202ef in formrdesc (relationName=0x816ad7e "pg_class", natts=22, > > att=0x8173600) at relcache.c:1193 > > #4 0x8120c12 in RelationCacheInitialize () at relcache.c:1953 > > and proposes to fix this by having IsSystemRelationName make an > arbitrary decision about whether a NULL input pointer should be > considered to represent a system relation name or not. I do not > like that, because AFAICS the decision is completely arbitrary. > IsSystemRelationName shouldn't be called with a NULL pointer > in the first place, and it has every right to cause a coredump > if that happens. I have removed the change. It will dump core again. > > It looks to me like the immediate bug here is that > RelationGetPhysicalRelationName is returning a NULL pointer. Probably > there are some missing or out-of-order steps in relcache initialization? > > I've been offline for a couple days due to DSL line failure :-( so That is bad. Mine has been great, but I had my first DSL burp for 3 hours on Friday after 4 months of continuous uptime. > I haven't seen Vadim's latest checkins. But I'm betting this is a bug > in the changes to use OIDs as physical relnames. Do we even need > RelationGetPhysicalRelationName anymore, and if so, what does it mean? Good bet. > > Next question: why is RelationInitLockInfo using > RelationGetPhysicalRelationName to get the input data for > IsSharedSystemRelationName --- shouldn't that be a test on logical > relation name? Or maybe the entire premise of > IsSharedSystemRelationName is bogus now, and we ought to use some other > way to decide if a relation is cross-database or not? No, because if they create a temp table that masks a system table in the current session, you want the physical name so it can know if it is a real system table, or a temp/fake one. You can ask why would someone try this, but it will work, or do something while trying. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: >> Next question: why is RelationInitLockInfo using >> RelationGetPhysicalRelationName to get the input data for >> IsSharedSystemRelationName --- shouldn't that be a test on logical >> relation name? Or maybe the entire premise of >> IsSharedSystemRelationName is bogus now, and we ought to use some other >> way to decide if a relation is cross-database or not? > No, because if they create a temp table that masks a system table in the > current session, you want the physical name so it can know if it is a > real system table, or a temp/fake one. Well, you clearly don't want to be fooled by temp relations. I was sorta visualizing a check based on relation OIDs instead of names... regards, tom lane
Did you run make distclean? I've run regtests before committing changes. Vadim ----- Original Message ----- From: "Patrick Welche" <prlw1@newn.cam.ac.uk> To: <pgsql-hackers@postgresql.org> Sent: Saturday, October 21, 2000 10:17 AM Subject: [HACKERS] failed runcheck > First a core dump which can be relieved by: > > Index: catalog.c > =================================================================== > RCS file: /home/projects/pgsql/cvsroot/pgsql/src/backend/catalog/catalog.c,v > retrieving revision 1.34 > diff -c -u -r1.34 catalog.c > --- catalog.c 2000/10/16 14:52:02 1.34 > +++ catalog.c 2000/10/21 17:07:09 > @@ -173,7 +173,7 @@ > bool > IsSystemRelationName(const char *relname) > { > - if (relname[0] && relname[1] && relname[2]) > + if (relname && relname[0] && relname[1] && relname[2]) > return (relname[0] == 'p' && > relname[1] == 'g' && > relname[2] == '_'); > > (symptoms at the end of message) > > But now the bit I don't see how to solve: the regression postmaster doesn't > startup because it can't find tmp_check/data/base/1/1259. The only files I > see are 1/{1255,PG_VERSION}. Where does 1259 come from? > > Cheers, > > Patrick > > > #0 IsSystemRelationName (relname=0x0) at catalog.c:176 > #1 0x807ed9a in IsSharedSystemRelationName (relname=0x0) at catalog.c:197 > #2 0x80e9272 in RelationInitLockInfo (relation=0x82af018) at lmgr.c:119 > #3 0x81202ef in formrdesc (relationName=0x816ad7e "pg_class", natts=22, > att=0x8173600) at relcache.c:1193 > #4 0x8120c12 in RelationCacheInitialize () at relcache.c:1953 > #5 0x81266b3 in InitPostgres (dbname=0xbfbfd666 "template1", username=0x0) > at postinit.c:329 > #6 0x807dde0 in BootstrapMain (argc=7, argv=0xbfbfd510) at bootstrap.c:358 > #7 0x80bc67c in main (argc=8, argv=0xbfbfd50c) at main.c:119 > #8 0x806367e in ___start () > (gdb) print *relation > $3 = {rd_fd = -1, rd_nblocks = 0, rd_refcnt = 0, rd_myxactonly = 0 '\000', > rd_isnailed = 0 '\000', rd_unlinked = 0 '\000', rd_indexfound = 0 '\000', > rd_uniqueindex = 0 '\000', rd_am = 0x1000001, rd_rel = 0x0, rd_id = 0, > rd_indexlist = 0x82af0a0, rd_lockInfo = {lockRelId = {relId = 0, dbId = 0}}, > rd_att = 0x0, rd_rules = 0x0, rd_rulescxt = 0x82af128, rd_istrat = 0x0, > rd_support = 0x0, trigdesc = 0x0} >
On Sat, Oct 21, 2000 at 01:48:39PM -0700, Vadim Mikheev wrote: > Did you run make distclean? I've run regtests before committing changes. Just made sure - different computer - fresh cvs update/distclean/configure/make cd src/test/regress gmake clean gmake all gmake runcheck same coredump #1 0x807f4be in IsSharedSystemRelationName (relname=0x0) at catalog.c:197 ^^^^^^^^^^^ ie. relname[0] at catalog.c:176 is dereferencing a null pointer. Cheers, Patrick
Patrick Welche <prlw1@newn.cam.ac.uk> writes: > On Sat, Oct 21, 2000 at 01:48:39PM -0700, Vadim Mikheev wrote: >> Did you run make distclean? I've run regtests before committing changes. > Just made sure - different computer - fresh cvs update/distclean/configure/make > same coredump > #1 0x807f4be in IsSharedSystemRelationName (relname=0x0) at catalog.c:197 > ^^^^^^^^^^^ > ie. relname[0] at catalog.c:176 is dereferencing a null pointer. Interesting. Current sources pass regress tests on my machine, same as for Vadim. I think you have found a platform-specific bug. Could you dig into it a little further and try to determine where the NULL is coming from? regards, tom lane
On Sun, Oct 22, 2000 at 01:26:07AM -0400, Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > >> Next question: why is RelationInitLockInfo using > >> RelationGetPhysicalRelationName to get the input data for > >> IsSharedSystemRelationName --- shouldn't that be a test on logical > >> relation name? Or maybe the entire premise of > >> IsSharedSystemRelationName is bogus now, and we ought to use some other > >> way to decide if a relation is cross-database or not? > > > No, because if they create a temp table that masks a system table in the > > current session, you want the physical name so it can know if it is a > > real system table, or a temp/fake one. > > Well, you clearly don't want to be fooled by temp relations. I was > sorta visualizing a check based on relation OIDs instead of names... > Well, when I did a test implementation of OID filenames, lo these many moons ago, I hacked around this problem by adding the (fixed) shared system table oids to the static array that is searched for matches by IsSharedSystemRelationName. Admittedly, a hack, but it got past all the regresion tests. Ross -- Open source code is like a natural resource, it's the result of providing food and sunshine to programmers, and then staying out of their way. [...] [It] is not going away because it has utility for both the developers and users independent of economic motivations. Jim Flynn, Sunnyvale, Calif.
"Ross J. Reedstrom" <reedstrm@rice.edu> writes: >> Well, you clearly don't want to be fooled by temp relations. I was >> sorta visualizing a check based on relation OIDs instead of names... > Well, when I did a test implementation of OID filenames, lo these many > moons ago, I hacked around this problem by adding the (fixed) shared > system table oids to the static array that is searched for matches by > IsSharedSystemRelationName. > Admittedly, a hack, but it got past all the regresion tests. Not a hack at all, IMHO, since all the shared system rels have nailed-down OIDs. It's pure historical artifact that IsSharedSystemRelationName wasn't IsSharedSystemRelationOID in the first place. We probably want to be thinking about merging the "shared system relation" concept together with the "tablespace" concept once we start to implement tablespaces. But for now, I think testing the OIDs is fine. regards, tom lane
On Mon, Oct 23, 2000 at 10:26:39AM -0400, Tom Lane wrote: > > Could you dig into it a little further and try to determine where the > NULL is coming from? All clear now! (I did do another cvs update in the meantime, but either way, I can't now repeat the previously repeatable core dump) Cheers, Patrick