Thread: Re: Regression test failure date.

Re: Regression test failure date.

From: Tom Lane

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I am now seeing this error in 2003-03-03.

>   CREATE TABLE INSERT_CHILD (cx INT default 42,
>         cy INT CHECK (cy > x))
>         INHERITS (INSERT_TBL);
> + ERROR:  RelationClearRelation: relation 130996 deleted while still in use

I have a theory about the failures that occur while creating tables.
If a relcache flush were to occur due to SI buffer overrun between
creation of the new rel's relcache entry by RelationBuildLocalRelation
and completion of the command, then you'd see an error exactly like the
above, because the relcache would try to rebuild the cache entry by
reading the pg_class and pg_attribute rows for the relation.  Those rows
might not exist yet, and even if they did exist they would be
invisible under SnapshotNow rules.

However, this bug is of long standing, and it doesn't seem all that
probable as an explanation for your difficulties.  It would be worth
running the tests with log_min_messages set to DEBUG4 (along with
log_error_verbosity set to verbose, please) and seeing whether
"cache state reset" log entries appear just before the failures.

In any case this would not explain failures during DROP TABLE, so
there's another issue to look for.
        regards, tom lane


Re: Regression test failure date.

From: Bruce Momjian

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I am now seeing this error in 2003-03-03.
> 
> >   CREATE TABLE INSERT_CHILD (cx INT default 42,
> >         cy INT CHECK (cy > x))
> >         INHERITS (INSERT_TBL);
> > + ERROR:  RelationClearRelation: relation 130996 deleted while still in use
> 
> Define "now seeing".  Did you change something?  Did you just run more
> test cycles and it happened one time?  Did it suddenly start to happen a
> lot?

Ran more cycles, that's all.  I had reported 2003-03-03 was fine, but
only ran a few tests that previous time.  I am looking at the
mid-February date range now.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 


Re: Regression test failure date.

From: Bruce Momjian

Tom, is the attached regression diff considered normal?  This was
generated by current CVS.

I am trying to determine what is a normal error and what is something to
be concerned about.

Also, I am up to Feb 25 with no errors, but am still testing.

*** ./expected/constraints.out    Mon Jul 28 13:50:13 2003
--- ./results/constraints.out    Mon Jul 28 18:32:55 2003
***************
*** 80,102 ****
  CREATE TABLE CHECK2_TBL (x int, y text, z int,
      CONSTRAINT SEQUENCE_CON
      CHECK (x > 3 and y <> 'check failed' and z < 8));
  INSERT INTO CHECK2_TBL VALUES (4, 'check ok', -2);
  INSERT INTO CHECK2_TBL VALUES (1, 'x check failed', -2);
! ERROR:  new row for relation "check2_tbl" violates CHECK constraint "sequence_con"
  INSERT INTO CHECK2_TBL VALUES (5, 'z check failed', 10);
! ERROR:  new row for relation "check2_tbl" violates CHECK constraint "sequence_con"
  INSERT INTO CHECK2_TBL VALUES (0, 'check failed', -2);
! ERROR:  new row for relation "check2_tbl" violates CHECK constraint "sequence_con"
  INSERT INTO CHECK2_TBL VALUES (6, 'check failed', 11);
! ERROR:  new row for relation "check2_tbl" violates CHECK constraint "sequence_con"
  INSERT INTO CHECK2_TBL VALUES (7, 'check ok', 7);
  SELECT '' AS two, * from CHECK2_TBL;
!  two | x |    y     | z
! -----+---+----------+----
!      | 4 | check ok | -2
!      | 7 | check ok |  7
! (2 rows)
!
  --
  -- Check constraints on INSERT
  --
--- 80,100 ----
  CREATE TABLE CHECK2_TBL (x int, y text, z int,
      CONSTRAINT SEQUENCE_CON
      CHECK (x > 3 and y <> 'check failed' and z < 8));
+ ERROR:  relation 126581 deleted while still in use
  INSERT INTO CHECK2_TBL VALUES (4, 'check ok', -2);
+ ERROR:  relation "check2_tbl" does not exist
  INSERT INTO CHECK2_TBL VALUES (1, 'x check failed', -2);
! ERROR:  relation "check2_tbl" does not exist
  INSERT INTO CHECK2_TBL VALUES (5, 'z check failed', 10);
! ERROR:  relation "check2_tbl" does not exist
  INSERT INTO CHECK2_TBL VALUES (0, 'check failed', -2);
! ERROR:  relation "check2_tbl" does not exist
  INSERT INTO CHECK2_TBL VALUES (6, 'check failed', 11);
! ERROR:  relation "check2_tbl" does not exist
  INSERT INTO CHECK2_TBL VALUES (7, 'check ok', 7);
+ ERROR:  relation "check2_tbl" does not exist
  SELECT '' AS two, * from CHECK2_TBL;
! ERROR:  relation "check2_tbl" does not exist
  --
  -- Check constraints on INSERT
  --

======================================================================

*** ./expected/misc.out    Mon Jul 28 13:50:13 2003
--- ./results/misc.out    Mon Jul 28 18:33:04 2003
***************
*** 580,586 ****
   c
   c_star
   char_tbl
-  check2_tbl
   check_seq
   check_tbl
   circle_tbl
--- 580,585 ----
***************
*** 660,666 ****
   toyemp
   varchar_tbl
   xacttest
! (96 rows)

  --SELECT name(equipment(hobby_construct(text 'skywalking', text 'mer'))) AS equip_name;
  SELECT hobbies_by_name('basketball');
--- 659,665 ----
   toyemp
   varchar_tbl
   xacttest
! (95 rows)

  --SELECT name(equipment(hobby_construct(text 'skywalking', text 'mer'))) AS equip_name;
  SELECT hobbies_by_name('basketball');

======================================================================


Re: Regression test failure date.

From: Tom Lane

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom, is the attached regression diff considered normal?  This was
> generated by current CVS.

Well, this *looks* like it could be an example of the SI-overrun-
during-create behavior I was talking about.  But if you weren't running
a verbose log to show whether a cache flush occurred just before the
error, there's no way to know for sure.

Right at the moment I am more interested in the other cases though
(cache lookup failure during DROP) since I have no plausible
explanation for them.
        regards, tom lane


Re: Regression test failure date.

From: Bruce Momjian

Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom, is the attached regression diff considered normal?  This was
> > generated by current CVS.
> 
> Well, this *looks* like it could be an example of the SI-overrun-
> during-create behavior I was talking about.  But if you weren't running
> a verbose log to show whether a cache flush occurred just before the
> error, there's no way to know for sure.

OK.

> Right at the moment I am more interested in the other cases though
> (cache lookup failure during DROP) since I have no plausible
> explanation for them.

Thanks.  That's what I need to know.



Re: Regression test failure date.

From: Tom Lane

I said:
>> I have a theory about the failures that occur while creating tables.
>> If a relcache flush were to occur due to SI buffer overrun between
>> creation of the new rel's relcache entry by RelationBuildLocalRelation
>> and completion of the command, then you'd see an error exactly like the
>> above, because the relcache would try to rebuild the cache entry by
>> reading the pg_class and pg_attribute rows for the relation.

After further study, though, the above theory falls flat on its face:
the relcache does *not* attempt to rebuild new relcache entries after
an SI overrun (see the comments to RelationCacheInvalidate).  So I'm
back to wondering what the heck is causing any of these messages.
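To make the withdrawn theory and its refutation concrete, here is a toy model in C. It is hypothetical and is not the relcache.c code; every name in it (ToyRelCache, toy_cache_invalidate, and so on) is invented for illustration. Entries built locally by the still-running command have no visible catalog rows, so trying to rebuild them during a full flush fails (the failure mode the original theory predicted). Skipping such entries during the flush, as the RelationCacheInvalidate comments are described above as doing, removes that failure mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only, NOT PostgreSQL source. */
#define MAX_ENTRIES 8

typedef struct {
    unsigned int relid;
    bool created_in_xact;   /* built locally by the current command */
    bool valid;             /* entry usable after the last flush */
} ToyRelCacheEntry;

typedef struct {
    ToyRelCacheEntry entries[MAX_ENTRIES];
    size_t n;
} ToyRelCache;

/* Stand-in for reading pg_class/pg_attribute under SnapshotNow:
 * rows written by the still-running command are treated as invisible. */
static bool toy_catalog_row_visible(const ToyRelCacheEntry *e)
{
    return !e->created_in_xact;
}

/* Flush every entry, as after an SI buffer overrun, and try to rebuild
 * it from the catalogs.  Returns how many rebuilds failed; each failure
 * corresponds to a "relation %u deleted while still in use" error.
 * If skip_new is true, entries created in the current transaction are
 * left untouched, since there is nothing visible to rebuild them from. */
static int toy_cache_invalidate(ToyRelCache *c, bool skip_new)
{
    int failures = 0;

    assert(c->n <= MAX_ENTRIES);
    for (size_t i = 0; i < c->n; i++) {
        ToyRelCacheEntry *e = &c->entries[i];

        if (skip_new && e->created_in_xact)
            continue;                          /* keep the local entry */
        e->valid = toy_catalog_row_visible(e); /* attempt the rebuild */
        if (!e->valid)
            failures++;
    }
    return failures;
}
```

For example, a cache holding one committed table and one table created by the current command reports one failed rebuild when new entries are flushed too, and none when they are skipped.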

I think we really need to see a stack trace from one of the failures.
Could you try running CVS tip with an "abort()" call replacing the
"relation %u deleted while still in use" elog?  (It's line 1797
in src/backend/utils/cache/relcache.c in CVS tip.)  Then when you
get the failure, get a stack trace with gdb from the core dump.
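A minimal sketch of the kind of debugging change being requested, written as a standalone C function rather than as a patch against relcache.c. The function name and the DEBUG_ABORT macro are hypothetical. Calling abort() instead of reporting the error makes the process die with SIGABRT at the exact failure point, so the core file preserves the call stack that led there.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not the relcache.c code.  Build with -DDEBUG_ABORT
 * to turn the error report into an abort(), so a core file captures the
 * full stack at the moment the condition is detected. */
static int report_relation_deleted(unsigned int relid)
{
#ifdef DEBUG_ABORT
    fprintf(stderr, "relation %u deleted while still in use: aborting\n", relid);
    abort();                 /* does not return; raises SIGABRT */
#else
    fprintf(stderr, "ERROR:  relation %u deleted while still in use\n", relid);
    return -1;               /* normal path: caller propagates the error */
#endif
}
```

With core dumps enabled (for example `ulimit -c unlimited` in the shell that starts the postmaster), the stack can then be printed by running `gdb /path/to/postgres core` and typing `bt`; the binary and core file paths here are placeholders.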
        regards, tom lane


Re: Regression test failure date.

From: Bruce Momjian

OK, on it now!
