Thread: New regression driver
I't committed. There is a new shell script run_check.sh in the regression test directory. It is driven by the conficuration file ./sql/run_check.tests and runs most of our tests parallel. It is invoked with the new GNUmakefile target 'runcheck'. The regress.sh is using the new tests file too by extracting the tests to run via awk, so ./sql/tests is obsolete now and subject to be removed soon. Bruce Momjian wrote: > Any modifications to shared pg_ tables would be a problem. Also, pg_log > and pg_variable locking is not happening in there either, is it? Thus, it does a complete independant database installation below the regression test, starting it's own postmaster (and terminating it at the end, of course). The entire test suite can be run without even shutting down the currently installed database. So a ...src > ./configure ...src > make ...src > cd test/regression ...src/test/regression > make clean all runcheck sequence will compile and temporarily install the new build under the regression path, and then run all the tests against it. I think if my new test driver has settled, we should change the GNUmakefile to just print some messages if 'make runtest' is typed. The current runtest target should IMHO still be availabe under another name, to test the real life installation created by 'make install'. Alternatively (IMHO better) some parameter to run_check.sh could tell if it should create it's own, temporary installation, or if it should use the existing installed database system and the already running postmaster. Tom Lane wrote: > In other words, you've already exposed a bug! Right on! Absolutely right and I've commented out that code for now. It is in utils/cache/catcache.c line 996. The comments say that the code should prevent the backend from entering infinite recursion while loading new cache entries. But the flag used for it seems to live in shared memory, thus it is affected by other backends too. If the flag is true doesn't tell if a backend set it itself, or if another one did. If we really need this check, it must be implemented in another way. Another bug I discoverd is that Vadims WAL code allways looks for the pg_control file in the PGDATA or compiled in default directory, ignoring the -D switch. Haven't fixed it up to now, but run_check.sh uses PGDATA, so it's safe at the moment. I ran the old regress.sh using the default installation parallel to the run_check.sh using it's own installation and postmaster. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #========================================= wieck@debis.com (Jan Wieck) #
wieck@debis.com (Jan Wieck) writes: > There is a new shell script run_check.sh in the regression > test directory. It is driven by the conficuration file > ./sql/run_check.tests and runs most of our tests parallel. It > is invoked with the new GNUmakefile target 'runcheck'. This is way cool. I had to fix a couple of silly little portability problems, but I like it. > I think if my new test driver has settled, we should change > the GNUmakefile to just print some messages if 'make runtest' > is typed. The current runtest target should IMHO still be > availabe under another name, to test the real life > installation created by 'make install'. > Alternatively (IMHO better) some parameter to run_check.sh > could tell if it should create it's own, temporary > installation, or if it should use the existing installed > database system and the already running postmaster. We should leave the old driver available, so that if an unexpected problem arises one can easily check to see if it's being triggered by concurrent execution or not. Or, run_check could have a parameter to force serialized execution, if you would rather have just one script. In that case we could toss the old runtest and rename run_check to runtest. (If we do keep both scripts, can we pick more helpful names than "runtest" and "run_check"? The difference is not immediately obvious...) I agree that run_check needs to be able to test a normal installation as well as a temporary one. > Absolutely right and I've commented out that code for now. > It is in utils/cache/catcache.c line 996. The comments say > that the code should prevent the backend from entering > infinite recursion while loading new cache entries. But the > flag used for it seems to live in shared memory, thus it is > affected by other backends too. If the flag is true doesn't > tell if a backend set it itself, or if another one did. If we > really need this check, it must be implemented in another > way. I will look at this. I don't think that the catcaches live in shared memory, so the problem is probably not what you suggest. The fact that the behavior is different under load may point to a real problem, not just an insufficiently clever debugging check. > I ran the old regress.sh using the default installation > parallel to the run_check.sh using it's own installation and > postmaster. They both give the same results on my platform, too. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > wieck@debis.com (Jan Wieck) writes: >> It is in utils/cache/catcache.c line 996. The comments say >> that the code should prevent the backend from entering >> infinite recursion while loading new cache entries. > I will look at this. I don't think that the catcaches live in > shared memory, so the problem is probably not what you suggest. > The fact that the behavior is different under load may point to a > real problem, not just an insufficiently clever debugging check. Indeed, this is a real bug, and commenting out the code that caught it is not the right fix! What is happening is that utils/inval.c is trying to initialize some variables that contain OIDs of system relations. This means calling the catcache routines in order to look up relation names in pg_class. However, if a shared cache inval message arrives from another backend while that's happening, we recursively invoke inval.c to deal with the message. And inval.c sees that its OID variables aren't initialized yet, so it recursively calls the catcache routines to try to get them initialized. Or, if just the first one's been initialized so far, ValidateHacks() assumes they're all valid, and you can end up at the elog(FATAL) panic at the bottom of CacheIdInvalidate(). I've got a core dump which contains a ten-deep recursion between inval.c and syscache.c, culminating in elog(FATAL) because the eleventh incoming sinval message was just slow enough to let inval.c's first OID variable get filled in before it arrived. In short: we don't deal very robustly with cache invals happening during backend startup. Send invals at a new backend with just the right timing, and it'll choke. I am not sure if this bug is of long standing or if we introduced it since 6.5. It's possible I created it while messing with the relcache stuff a month or two ago. But I can easily believe that it's been there a long time and we never had a way of reproducing the problem with any reliability before. I think the fix is to rip out inval.c's attempt to look up system relation names, and just give it hardwired knowledge of their OIDs. Even though it sort-of works to do the lookups, it's bad practice for routines that are potentially called during catcache initialization to depend on the catcache to be already working. And there are other places that already have hardwired knowledge of the system relation OIDs, so... regards, tom lane
> I think the fix is to rip out inval.c's attempt to look up system > relation names, and just give it hardwired knowledge of their OIDs. > Even though it sort-of works to do the lookups, it's bad practice for > routines that are potentially called during catcache initialization > to depend on the catcache to be already working. And there are other > places that already have hardwired knowledge of the system relation > OIDs, so... FYI, I am in the process of coding all cache miss lookups to use new system indexes. I have also added code to SearchSelfReferences() because pg_operator has some fancy depency on its lookup using an index, and has to have certain lookup happen with an sequential and not an index scan. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: > ... I have also added code to SearchSelfReferences() > because pg_operator has some fancy depency on its lookup using an index, > and has to have certain lookup happen with an sequential and not an > index scan. Say what? That's got to be a symptom of a bug somewhere. Maybe pg_operator needs some CommandCounterIncrement calls so that the tuples it inserts become visible earlier? What are you seeing exactly? For that matter, SearchSelfReferences looks like one giant kluge to me. Who added this, and why, and what's the logic? (Undocumented kluges are very high on my hate list.) regards, tom lane
> -----Original Message----- > From: owner-pgsql-hackers@postgreSQL.org > [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Tom Lane > Sent: Sunday, November 21, 1999 11:12 AM > To: Bruce Momjian > Cc: PostgreSQL HACKERS > Subject: Re: [HACKERS] New regression driver > > > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > ... I have also added code to SearchSelfReferences() > > because pg_operator has some fancy depency on its lookup using an index, > > and has to have certain lookup happen with an sequential and not an > > index scan. > > Say what? That's got to be a symptom of a bug somewhere. Maybe > pg_operator needs some CommandCounterIncrement calls so that the > tuples it inserts become visible earlier? What are you seeing exactly? > > For that matter, SearchSelfReferences looks like one giant kluge to me. > Who added this, and why, and what's the logic? (Undocumented kluges > are very high on my hate list.) > It's me who added the function. I left it undocumented,sorry. Bruce,could you add an document on it ? Bruce added a new index to pg_index. Index scan needs an information of pg_index. If we use the new index,we needs the information about the index in pg_index. Doesn't this cause a real cycle ? I added the function in order to hold one tuple which causes a real cycle. The tuple in pg_index should be scanned sequentially. I don't think it's the best solution. Please change it if there's a better way. Regards. Hiroshi Inoue Inoue@tpf.co.jp
> It's me who added the function. > I left it undocumented,sorry. > Bruce,could you add an document on it ? Done. Will commit soon. > > Bruce added a new index to pg_index. > Index scan needs an information of pg_index. > If we use the new index,we needs the information about the index > in pg_index. > Doesn't this cause a real cycle ? Yes. I am using it for a pg_operator index too. > > I added the function in order to hold one tuple which causes a real > cycle. The tuple in pg_index should be scanned sequentially. > > I don't think it's the best solution. > Please change it if there's a better way. I talked to Tom, and we think it is a good solution. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026