Thread: New regression driver

New regression driver

From

wieck@debis.com (Jan Wieck)

Date:

19 November 1999, 14:38:36

I't committed.

    There  is  a  new shell script run_check.sh in the regression
    test directory.  It  is  driven  by  the  conficuration  file
    ./sql/run_check.tests and runs most of our tests parallel. It
    is invoked with the new GNUmakefile target 'runcheck'.


    The regress.sh is using the new tests file too by  extracting
    the  tests to run via awk, so ./sql/tests is obsolete now and
    subject to be removed soon.

Bruce Momjian wrote:

> Any modifications to shared pg_ tables would be a problem.  Also, pg_log
> and pg_variable locking is not happening in there either, is it?

    Thus, it does a complete  independant  database  installation
    below  the regression test, starting it's own postmaster (and
    terminating it at the end, of course). The entire test  suite
    can be run without even shutting down the currently installed
    database.

    So a

       ...src > ./configure
       ...src > make
       ...src > cd test/regression
       ...src/test/regression > make clean all runcheck

    sequence will compile and temporarily install the  new  build
    under the regression path, and then run all the tests against
    it.

    I think if my new test driver has settled, we  should  change
    the GNUmakefile to just print some messages if 'make runtest'
    is typed.  The current runtest target should  IMHO  still  be
    availabe   under   another   name,  to  test  the  real  life
    installation created by 'make install'.

    Alternatively (IMHO better) some  parameter  to  run_check.sh
    could   tell   if   it  should  create  it's  own,  temporary
    installation, or if it  should  use  the  existing  installed
    database system and the already running postmaster.

Tom Lane wrote:

> In other words, you've already exposed a bug!  Right on!

    Absolutely  right  and  I've commented out that code for now.
    It is in utils/cache/catcache.c line 996.  The  comments  say
    that  the  code  should  prevent  the  backend  from entering
    infinite recursion while loading new cache entries.  But  the
    flag  used  for it seems to live in shared memory, thus it is
    affected by other backends too. If the flag is  true  doesn't
    tell if a backend set it itself, or if another one did. If we
    really need this check, it must  be  implemented  in  another
    way.

    Another bug I discoverd is that Vadims WAL code allways looks
    for the pg_control file in the PGDATA or compiled in  default
    directory,  ignoring  the  -D  switch. Haven't fixed it up to
    now, but run_check.sh  uses  PGDATA,  so  it's  safe  at  the
    moment.

    I  ran  the  old  regress.sh  using  the default installation
    parallel to the run_check.sh using it's own installation  and
    postmaster.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #

Re: [HACKERS] New regression driver

From

Tom Lane

Date:

20 November 1999, 17:00:24

wieck@debis.com (Jan Wieck) writes:
>     There  is  a  new shell script run_check.sh in the regression
>     test directory.  It  is  driven  by  the  conficuration  file
>     ./sql/run_check.tests and runs most of our tests parallel. It
>     is invoked with the new GNUmakefile target 'runcheck'.

This is way cool.  I had to fix a couple of silly little portability
problems,  but I like it.

>     I think if my new test driver has settled, we  should  change
>     the GNUmakefile to just print some messages if 'make runtest'
>     is typed.  The current runtest target should  IMHO  still  be
>     availabe   under   another   name,  to  test  the  real  life
>     installation created by 'make install'.

>     Alternatively (IMHO better) some  parameter  to  run_check.sh
>     could   tell   if   it  should  create  it's  own,  temporary
>     installation, or if it  should  use  the  existing  installed
>     database system and the already running postmaster.

We should leave the old driver available, so that if an unexpected
problem arises one can easily check to see if it's being triggered by
concurrent execution or not.  Or, run_check could have a parameter to
force serialized execution, if you would rather have just one script.
In that case we could toss the old runtest and rename run_check to
runtest.  (If we do keep both scripts, can we pick more helpful names
than "runtest" and "run_check"?  The difference is not immediately
obvious...)

I agree that run_check needs to be able to test a normal installation
as well as a temporary one.

>     Absolutely  right  and  I've commented out that code for now.
>     It is in utils/cache/catcache.c line 996.  The  comments  say
>     that  the  code  should  prevent  the  backend  from entering
>     infinite recursion while loading new cache entries.  But  the
>     flag  used  for it seems to live in shared memory, thus it is
>     affected by other backends too. If the flag is  true  doesn't
>     tell if a backend set it itself, or if another one did. If we
>     really need this check, it must  be  implemented  in  another
>     way.

I will look at this.  I don't think that the catcaches live in
shared memory, so the problem is probably not what you suggest.
The fact that the behavior is different under load may point to a
real problem, not just an insufficiently clever debugging check.

>     I  ran  the  old  regress.sh  using  the default installation
>     parallel to the run_check.sh using it's own installation  and
>     postmaster.

They both give the same results on my platform, too.
        regards, tom lane

Re: [HACKERS] New regression driver

From

Tom Lane

Date:

20 November 1999, 19:11:25

Tom Lane <tgl@sss.pgh.pa.us> writes:
> wieck@debis.com (Jan Wieck) writes:
>> It is in utils/cache/catcache.c line 996.  The  comments  say
>> that  the  code  should  prevent  the  backend  from entering
>> infinite recursion while loading new cache entries.

> I will look at this.  I don't think that the catcaches live in
> shared memory, so the problem is probably not what you suggest.
> The fact that the behavior is different under load may point to a
> real problem, not just an insufficiently clever debugging check.

Indeed, this is a real bug, and commenting out the code that caught
it is not the right fix!

What is happening is that utils/inval.c is trying to initialize some
variables that contain OIDs of system relations.  This means calling
the catcache routines in order to look up relation names in pg_class.
However, if a shared cache inval message arrives from another backend
while that's happening, we recursively invoke inval.c to deal with the
message.  And inval.c sees that its OID variables aren't initialized
yet, so it recursively calls the catcache routines to try to get them
initialized.  Or, if just the first one's been initialized so far,
ValidateHacks() assumes they're all valid, and you can end up at the
elog(FATAL) panic at the bottom of CacheIdInvalidate().  I've got a core
dump which contains a ten-deep recursion between inval.c and syscache.c,
culminating in elog(FATAL) because the eleventh incoming sinval message
was just slow enough to let inval.c's first OID variable get filled in
before it arrived.

In short: we don't deal very robustly with cache invals happening
during backend startup.  Send invals at a new backend with just the
right timing, and it'll choke.

I am not sure if this bug is of long standing or if we introduced it
since 6.5.  It's possible I created it while messing with the relcache
stuff a month or two ago.  But I can easily believe that it's been
there a long time and we never had a way of reproducing the problem
with any reliability before.

I think the fix is to rip out inval.c's attempt to look up system
relation names, and just give it hardwired knowledge of their OIDs.
Even though it sort-of works to do the lookups, it's bad practice for
routines that are potentially called during catcache initialization
to depend on the catcache to be already working.  And there are other
places that already have hardwired knowledge of the system relation
OIDs, so...
        regards, tom lane

Re: [HACKERS] New regression driver

From

Bruce Momjian

Date:

20 November 1999, 20:01:25

> I think the fix is to rip out inval.c's attempt to look up system
> relation names, and just give it hardwired knowledge of their OIDs.
> Even though it sort-of works to do the lookups, it's bad practice for
> routines that are potentially called during catcache initialization
> to depend on the catcache to be already working.  And there are other
> places that already have hardwired knowledge of the system relation
> OIDs, so...

FYI, I am in the process of coding all cache miss lookups to use new
system indexes.  I have also added code to SearchSelfReferences()
because pg_operator has some fancy depency on its lookup using an index,
and has to have certain lookup happen with an sequential and not an
index scan.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026

Re: [HACKERS] New regression driver

From

Tom Lane

Date:

20 November 1999, 21:13:26

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> ... I have also added code to SearchSelfReferences()
> because pg_operator has some fancy depency on its lookup using an index,
> and has to have certain lookup happen with an sequential and not an
> index scan.

Say what?  That's got to be a symptom of a bug somewhere.  Maybe
pg_operator needs some CommandCounterIncrement calls so that the
tuples it inserts become visible earlier?  What are you seeing exactly?

For that matter, SearchSelfReferences looks like one giant kluge to me.
Who added this, and why, and what's the logic?  (Undocumented kluges
are very high on my hate list.)
        regards, tom lane

RE: [HACKERS] New regression driver

From

"Hiroshi Inoue"

Date:

21 November 1999, 21:38:43

> -----Original Message-----
> From: owner-pgsql-hackers@postgreSQL.org
> [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Tom Lane
> Sent: Sunday, November 21, 1999 11:12 AM
> To: Bruce Momjian
> Cc: PostgreSQL HACKERS
> Subject: Re: [HACKERS] New regression driver
>
>
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > ... I have also added code to SearchSelfReferences()
> > because pg_operator has some fancy depency on its lookup using an index,
> > and has to have certain lookup happen with an sequential and not an
> > index scan.
>
> Say what?  That's got to be a symptom of a bug somewhere.  Maybe
> pg_operator needs some CommandCounterIncrement calls so that the
> tuples it inserts become visible earlier?  What are you seeing exactly?
>
> For that matter, SearchSelfReferences looks like one giant kluge to me.
> Who added this, and why, and what's the logic?  (Undocumented kluges
> are very high on my hate list.)
>

It's me who added the function.
I left it undocumented,sorry.
Bruce,could you add an document on it ?

Bruce added a new index to pg_index.
Index scan needs an information of pg_index.
If we use the new index,we needs the information about the index
in pg_index.
Doesn't this cause a real cycle ?

I added the function in order to hold one tuple which causes a real
cycle. The tuple in pg_index should be scanned sequentially.

I don't think it's the best solution.
Please change it if there's a better way.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp

Re: [HACKERS] New regression driver

From

Bruce Momjian

Date:

21 November 1999, 21:50:43

> It's me who added the function.
> I left it undocumented,sorry.
> Bruce,could you add an document on it ?

Done.  Will commit soon.

> 
> Bruce added a new index to pg_index.
> Index scan needs an information of pg_index.
> If we use the new index,we needs the information about the index
> in pg_index.
> Doesn't this cause a real cycle ?

Yes. I am using it for a pg_operator index too.

> 
> I added the function in order to hold one tuple which causes a real
> cycle. The tuple in pg_index should be scanned sequentially.
> 
> I don't think it's the best solution.
> Please change it if there's a better way.

I talked to Tom, and we think it is a good solution.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026