Thread: Re: I might be getting closer?
[ cc to hackers]

It certainly looks closer, particularly because the failure is a simple domain constraint failure and not a more internal error.

Have you tried moving ahead a few days to see if the bug was fixed in CVS?

---------------------------------------------------------------------------

Robert Creager wrote:
-- Start of PGP signed section.
>
> Hey Bruce,
>
> I can get version 2003-02-01 to only fail one test, and sporadically at
> that (2 out of 50 runs):
>
> *** ./expected/domain.out Sat Jul 26 12:24:18 2003
> --- ./results/domain.out Sat Jul 26 12:56:01 2003
> ***************
> *** 263,269 ****
>   insert into domcontest values (5);
>   alter domain con drop constraint t;
>   insert into domcontest values (-5); --fails
> ! ERROR: ExecEvalConstraintTest: Domain con constraint $1 failed
>   insert into domcontest values (42);
>   -- cleanup
>   drop domain ddef1 restrict;
> --- 263,269 ----
>   insert into domcontest values (5);
>   alter domain con drop constraint t;
>   insert into domcontest values (-5); --fails
> ! ERROR: ExecEvalConstraintTest: Domain con constraint failed
>   insert into domcontest values (42);
>   -- cleanup
>   drop domain ddef1 restrict;
>
> ======================================================================
>
> --
> 13:04:42 up 8 days, 17:05, 2 users, load average: 1.84, 1.24, 1.34
-- End of PGP section, PGP failed!

--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Sat, 26 Jul 2003 16:49:27 -0400 (EDT)
Bruce Momjian <pgman@candle.pha.pa.us> said something like:

> [ cc to hackers]
>
> It certainly looks closer, particularly because the failure is a
> simple domain constraint failure and not a more internal error.
>
> Have you tried moving ahead a few days to see if the bug was fixed in
> CVS?
>

No. I'll run 2003-02-15 next. I just got the domain failure on 2003-01-26 after 42 passes.

--
15:03:30 up 8 days, 19:04, 2 users, load average: 2.40, 2.15, 2.31
I found it (I think)...

Looks like something was done after the 15th...

2003-02-15 passes 50/50 and 33/33 on second pass (so far)
2003-02-16 fails 6/50
  vacuum failed 1 times
  misc failed 3 times
  sanity_check failed 3 times
  inherit failed 1 times
  triggers failed 4 times
2003-02-18 fails 11/50
  constraints failed 5 times
  sanity_check failed 3 times
  misc failed 8 times
  inherit failed 2 times
  rules failed 1 times
  triggers failed 5 times

Cheers,
Rob

--
17:42:41 up 8 days, 21:43, 2 users, load average: 3.62, 2.69, 2.35
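A summary like the one above (failing runs out of N, plus a per-test count) can be produced from repeated runs with a small tally. The following is only a sketch: the input is assumed to be a list of failed-test names per run, the function name `tally_failures` is hypothetical, and the sample data is made up for illustration rather than taken from the runs reported here.

```python
from collections import Counter


def tally_failures(runs):
    """Summarize repeated regression runs.

    `runs` is a list with one entry per run; each entry is the list of
    test names that failed in that run (empty list = clean run).
    Returns (number_of_failing_runs, Counter of failures per test).
    """
    failing_runs = sum(1 for failed in runs if failed)
    # Count each test at most once per run.
    per_test = Counter(name for failed in runs for name in set(failed))
    return failing_runs, per_test


# Made-up sample: 47 clean runs followed by 3 runs with failures.
runs = [[] for _ in range(47)] + [
    ["misc", "triggers"],
    ["sanity_check", "misc"],
    ["vacuum"],
]

failing_runs, per_test = tally_failures(runs)
print(f"fails {failing_runs}/{len(runs)}")
for name, count in per_test.most_common():
    print(f"  {name} failed {count} times")
```

Because one run can fail several tests at once, the per-test counts can add up to more than the number of failing runs, which matches the shape of the numbers above (6 failing runs, 12 test failures).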
Robert Creager <Robert_Creager@LogicalChaos.org> writes: > Looks like something was done after the 15'th... > 2003-02-15 passes 50/50 and 33/33 on second pass (so far) > 2003-02-16 fails 6/50 As far back as that! Okay, many thanks for the info --- that will help. I'm buried in error message editing right now but will look at the diffs in that timeframe tomorrow, unless someone beats me to it. regards, tom lane
Robert Creager <Robert_Creager@LogicalChaos.org> writes:
> 2003-02-15 passes 50/50 and 33/33 on second pass (so far)
> 2003-02-16 fails 6/50

I looked in the CVS logs while waiting for a compile, and the only patch I see that goes anywhere near the locking or cache code around that time is this one:

2003-02-17 21:13 momjian

	* src/: backend/storage/lmgr/deadlock.c,
	backend/storage/lmgr/lock.c, backend/storage/lmgr/proc.c,
	backend/utils/adt/lockfuncs.c, include/storage/lock.h,
	include/storage/proc.h: Rename 'holder' references to 'proclock'
	for PROCLOCK references, for consistency.

which seems like a safe change (I assume it was just a search-and-replace; do you recall, Bruce?) and anyway the time is not quite right.

What time of day did your successive pulls correspond to, anyway? (I believe my cvs2cl printout above is showing me EST.)

regards, tom lane
Tom Lane wrote:
> Robert Creager <Robert_Creager@LogicalChaos.org> writes:
> > 2003-02-15 passes 50/50 and 33/33 on second pass (so far)
> > 2003-02-16 fails 6/50
>
> I looked in the CVS logs while waiting for a compile, and the only patch
> I see that goes anywhere near the locking or cache code around that time
> is this one:
>
> 2003-02-17 21:13 momjian
>
> 	* src/: backend/storage/lmgr/deadlock.c,
> 	backend/storage/lmgr/lock.c, backend/storage/lmgr/proc.c,
> 	backend/utils/adt/lockfuncs.c, include/storage/lock.h,
> 	include/storage/proc.h: Rename 'holder' references to 'proclock'
> 	for PROCLOCK references, for consistency.
>
> which seems like a safe change (I assume it was just a
> search-and-replace; do you recall, Bruce?) and anyway the time is not
> quite right.

Yes, just a rename operation.

> What time of day did your successive pulls correspond to, anyway?
> (I believe my cvs2cl printout above is showing me EST.)

For the date range:

	pgcvs log -d'2003-02-15 00:00:00 GMT<2003-02-18 00:00:00 GMT' -rHEAD

I see:

---------------------------------------------------------------------------

/src/include/optimizer/pathnode.h
tgl
Teach planner how to propagate pathkeys from sub-SELECTs in FROM up to
the outer query. (The implementation is a bit klugy, but it would take
nontrivial restructuring to make it nicer, which this is probably not
worth.) This avoids unnecessary sort steps in examples like
	SELECT foo,count(*) FROM (SELECT ... ORDER BY foo,bar) sub GROUP BY foo
which means there is now a reasonable technique for controlling the
order of inputs to custom aggregates, even in the grouping case.
---
/src/test/regress/expected/case.out
tgl
COALESCE() and NULLIF() are now first-class expressions, not macros
that turn into CASE expressions. They evaluate their arguments at most
once. Patch by Kris Jurka, review and (very light) editorializing by me.
---
/doc/TODO.detail/exists
momjian
Remove IN/EXISTS TODO.detail item.
I am seeing repeatable success from a CVS of 2003-05-01, and repeatable failure from current CVS.

I have only been running nightly parallel regression runs since June 27, so it is possible that the parallel regression was broken in February, fixed in May, then broken some time after that.

I will test June 1 now.

---------------------------------------------------------------------------

Robert Creager wrote:
-- Start of PGP signed section.
>
> I found it (I think)...
>
> Looks like something was done after the 15th...
>
> 2003-02-15 passes 50/50 and 33/33 on second pass (so far)
> 2003-02-16 fails 6/50
>   vacuum failed 1 times
>   misc failed 3 times
>   sanity_check failed 3 times
>   inherit failed 1 times
>   triggers failed 4 times
> 2003-02-18 fails 11/50
>   constraints failed 5 times
>   sanity_check failed 3 times
>   misc failed 8 times
>   inherit failed 2 times
>   rules failed 1 times
>   triggers failed 5 times
>
> Cheers,
> Rob
>
> --
> 17:42:41 up 8 days, 21:43, 2 users, load average: 3.62, 2.69, 2.35
-- End of PGP section, PGP failed!
On Sat, 26 Jul 2003 20:24:56 -0400
Tom Lane <tgl@sss.pgh.pa.us> said something like:

>
> What time of day did your successive pulls correspond to, anyway?
> (I believe my cvs2cl printout above is showing me EST.)
>
> regards, tom lane
>

I'm MST, and I did not specify a timezone on the cvs updates. I just ran <cvs update -D 2003-02-16>.

I can re-do with a specific time/date if you tell me what you want. Or give me a range. It takes a few minutes to do a complete cvs download.

Later,
Rob

--
19:10:13 up 8 days, 23:10, 2 users, load average: 0.00, 0.00, 0.00
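The timezone question matters because a date-only checkout cuts off at midnight in some timezone. As a rough illustration, assuming cvs interprets a -D date with no timezone as local time (my understanding of its behaviour, not something confirmed in this thread), the sketch below converts a local-midnight cutoff to UTC. The helper name `local_midnight_as_utc` is hypothetical, and fixed offsets are used because no daylight saving applies in February.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets; no daylight saving time is in effect in February.
MST = timezone(timedelta(hours=-7), "MST")
EST = timezone(timedelta(hours=-5), "EST")


def local_midnight_as_utc(date_str, tz):
    """Return midnight of `date_str` (YYYY-MM-DD) in `tz`, expressed in UTC."""
    local = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=tz)
    return local.astimezone(timezone.utc)


# Cutoff of <cvs update -D 2003-02-16> run on a machine in MST,
# under the local-time assumption stated above.
cutoff = local_midnight_as_utc("2003-02-16", MST)
print(cutoff.isoformat())  # 2003-02-16T07:00:00+00:00

# The same calendar date read as EST lands two hours earlier in UTC.
print(local_midnight_as_utc("2003-02-16", EST).isoformat())
```

Under that assumption, the 2003-02-16 snapshot would contain commits up to 07:00 GMT on the 16th, so a change dated in EST needs a two-hour shift before it is compared against an MST cutoff.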
On Sat, 26 Jul 2003 21:08:46 -0400 (EDT)
Bruce Momjian <pgman@candle.pha.pa.us> said something like:

>
> I am seeing repeatable success from a CVS of 2003-05-01, and
> repeatable failure from current CVS.
>
> I have only been running nightly parallel regression runs since June
> 27, so it is possible that the parallel regression was broken in
> February, fixed in May, then broken some time after that.
>
> I will test June 1 now.
>

I don't know about that, Bruce. When I grabbed 2003-05-01, I have 2 failures in 15 runs so far. One item I did have to change was to move from bison 1.5 to bison 1.875. I've included the first failure.

*** ./expected/triggers.out Sat Nov 23 11:13:22 2002
--- ./results/triggers.out Sat Jul 26 20:10:18 2003
***************
*** 87,92 ****
--- 87,93 ----
  NOTICE: check_pkeys_fkey_cascade: 1 tuple(s) of fkeys are deleted
  NOTICE: check_pkeys_fkey_cascade: 1 tuple(s) of fkeys2 are deleted
  DROP TABLE pkeys;
+ ERROR: cache lookup of relation 129432 failed
  DROP TABLE fkeys;
  DROP TABLE fkeys2;
  -- -- I've disabled the funny_dup17 test because the new semantics

======================================================================

*** ./expected/sanity_check.out Mon Aug 19 13:33:36 2002
--- ./results/sanity_check.out Sat Jul 26 20:10:20 2003
***************
*** 58,68 ****
  pg_statistic | t
  pg_trigger | t
  pg_type | t
  road | t
  shighway | t
  tenk1 | t
  tenk2 | t
! (52 rows)

  --
  -- another sanity check: every system catalog that has OIDs should have
--- 58,69 ----
  pg_statistic | t
  pg_trigger | t
  pg_type | t
+ pkeys | t
  road | t
  shighway | t
  tenk1 | t
  tenk2 | t
! (53 rows)

  --
  -- another sanity check: every system catalog that has OIDs should have

======================================================================

*** ./expected/misc.out Sat Jul 26 20:03:48 2003
--- ./results/misc.out Sat Jul 26 20:10:22 2003
***************
*** 633,638 ****
--- 633,639 ----
  onek2
  path_tbl
  person
+ pkeys
  point_tbl
  polygon_tbl
  ramp
***************
*** 657,663 ****
  toyemp
  varchar_tbl
  xacttest
! (93 rows)

  --SELECT name(equipment(hobby_construct(text 'skywalking', text 'mer'))) AS equip_name;
  SELECT hobbies_by_name('basketball');
--- 658,664 ----
  toyemp
  varchar_tbl
  xacttest
! (94 rows)

  --SELECT name(equipment(hobby_construct(text 'skywalking', text 'mer'))) AS equip_name;
  SELECT hobbies_by_name('basketball');

======================================================================

--
20:11:31 up 9 days, 12 min, 2 users, load average: 2.86, 2.30, 1.52
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I have only been running nightly parallel regression runs since June 27,
> so it is possible that the parallel regression was broken in February,
> fixed in May, then broken some time after that.

Any further progress on this?

My best theory at the moment is that we have a problem with relcache entry creation failing if it's interrupted by an SI inval message at just the right time. I don't much want to grovel through six months' worth of changelog entries looking for candidate mistakes, though.

regards, tom lane
I will stand by the fact that I cannot generate failures from 2003-02-15 (200+ runs), and I can from 2003-02-16. Just to make sure I didn't screw up the cvs usage, I'll try again tonight if I get the chance and re-download and re-test these two days.

I can set up a script that will step through weekly dates starting from 'now' and see if the 02-16 problem might have been fixed and then re-introduced, if you like.

2003-02-16 fails 6/50
  vacuum failed 1 times
  misc failed 3 times
  sanity_check failed 3 times
  inherit failed 1 times
  triggers failed 4 times

Cheers,
Rob

On Mon, 28 Jul 2003 02:14:32 -0400
Tom Lane <tgl@sss.pgh.pa.us> said something like:

> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I have only been running nightly parallel regression runs since June
> > 27, so it is possible that the parallel regression was broken in
> > February, fixed in May, then broken some time after that.
>
> Any further progress on this?
>
> My best theory at the moment is that we have a problem with relcache
> entry creation failing if it's interrupted by an SI inval message at
> just the right time. I don't much want to grovel through six months'
> worth of changelog entries looking for candidate mistakes, though.
>
> regards, tom lane
>

--
06:57:40 up 10 days, 10:57, 2 users, load average: 2.17, 2.08, 1.83
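The weekly stepping script proposed above could start from something like the following sketch. It only builds the list of snapshot dates and the matching checkout commands and does not run anything. The function names are hypothetical, the start and end dates are chosen for illustration, and the explicit GMT timestamp format is borrowed from the cvs log invocation earlier in the thread to avoid the local-timezone ambiguity.

```python
from datetime import date, timedelta


def weekly_snapshots(newest, oldest):
    """Yield dates from `newest` back towards `oldest` in 7-day steps."""
    current = newest
    while current >= oldest:
        yield current
        current -= timedelta(days=7)


def checkout_command(day):
    """Build (but do not run) a cvs command for the snapshot at 00:00 GMT."""
    return ["cvs", "update", "-D", f"{day.isoformat()} 00:00:00 GMT"]


# Illustrative range: from the date of this message back to the first
# snapshot that showed failures.
snapshots = list(weekly_snapshots(date(2003, 7, 28), date(2003, 2, 16)))

for day in snapshots:
    print(" ".join(checkout_command(day)))
```

Each command would then be followed by a rebuild and a batch of parallel regression runs. Since the failures are intermittent, a snapshot should only be called clean after enough runs to make a lucky streak unlikely.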
I am testing this today. I found 2003-03-03 to not generate a failure in 20 tests, so I am moving forward to April/May.

---------------------------------------------------------------------------

Robert Creager wrote:
-- Start of PGP signed section.
>
> I will stand by the fact that I cannot generate failures from
> 2003-02-15 (200+ runs), and I can from 2003-02-16. Just to make sure I
> didn't screw up the cvs usage, I'll try again tonight if I get the
> chance and re-download and re-test these two days.
>
> I can set up a script that will step through weekly dates starting from
> 'now' and see if the 02-16 problem might have been fixed and then
> re-introduced, if you like.
>
> 2003-02-16 fails 6/50
>   vacuum failed 1 times
>   misc failed 3 times
>   sanity_check failed 3 times
>   inherit failed 1 times
>   triggers failed 4 times
>
> Cheers,
> Rob
>
> --
> 06:57:40 up 10 days, 10:57, 2 users, load average: 2.17, 2.08, 1.83
-- End of PGP section, PGP failed!
I am now seeing this error in 2003-03-03.

	CREATE TABLE INSERT_CHILD (cx INT default 42,
		cy INT CHECK (cy > x))
		INHERITS (INSERT_TBL);
	+ ERROR: RelationClearRelation: relation 130996 deleted while still in use

---------------------------------------------------------------------------

Bruce Momjian wrote:
>
> I am testing this today. I found 2003-03-03 to not generate a failure
> in 20 tests, so I am moving forward to April/May.
>
Bruce Momjian <pgman@candle.pha.pa.us> writes: > I am now seeing this error in 2003-03-03. > CREATE TABLE INSERT_CHILD (cx INT default 42, > cy INT CHECK (cy > x)) > INHERITS (INSERT_TBL); > + ERROR: RelationClearRelation: relation 130996 deleted while still in use Define "now seeing". Did you change something? Did you just run more test cycles and it happened one time? Did it suddenly start to happen a lot? regards, tom lane
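Whether a failure that shows up after 20 clean runs is surprising depends on the underlying failure rate. As a back-of-the-envelope sketch, assuming runs are independent and each fails with a fixed probability (a simplification; the function names are hypothetical), the rates reported earlier in the thread (6/50 on 2003-02-16 and 2/50 on 2003-02-01) give:

```python
import math


def prob_all_pass(failure_rate, runs):
    """Probability that `runs` independent runs all pass."""
    return (1.0 - failure_rate) ** runs


def runs_needed(failure_rate, confidence=0.95):
    """Smallest number of runs for which at least one failure is expected
    with probability `confidence`, given the per-run failure rate."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


for label, rate in (("6/50", 6 / 50), ("2/50", 2 / 50)):
    print(
        f"rate {label}: P(20 clean runs) = {prob_all_pass(rate, 20):.3f}, "
        f"runs for 95% detection = {runs_needed(rate)}"
    )
```

With a 6/50 rate, 20 clean runs in a row happen by chance about 8% of the time; with a 2/50 rate, about 44% of the time. Under these assumptions, 20 passes alone are weak evidence that a snapshot is free of the bug, and a later failure on the same snapshot does not require anything to have changed.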