Thread: Re: [COMMITTERS] pgsql: Fix brain fade in DefineIndex(): it was continuing to access the

Andrew Dunstan <andrew@dunslane.net> writes:
> [redirected to -hackers]

> Tom Lane wrote:
>> Found by trying to run the regression tests with CLOBBER_CACHE_ALWAYS enabled.
>> Maybe we should try to do that on a regular basis --- it's awfully slow,
>> but perhaps some fast buildfarm machine could do it once in a while.

> We currently don't have any way of providing for settings other than via 
> the configure script options. We could in theory make provision for 
> adding stuff to pg_config_manual.h, although it would be a bit more 
> difficult in the vpath case (I guess for vpath we could just blow that file 
> away along with the files that bison, flex etc. lob into the repo).
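> (By "adding stuff" I mean dropping in a line like
>
>     #define CLOBBER_CACHE_ALWAYS    /* sketch: flush the syscaches at every
>                                        plausible opportunity */
>
> pg_config_manual.h is pulled in by every backend compile, so defining the
> symbol there should have the same effect as putting it on the compiler
> command line.)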

This particular issue could be implemented just by adding
-DCLOBBER_CACHE_ALWAYS to CFLAGS (or CPPFLAGS if you want to be anal
about it).  I suppose that no new buildfarm mechanism is required ---
someone just needs to set up an animal configured that way, and
scheduled to run only maybe once a week or something like that.
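For anyone wanting to try it by hand first, it's just something like

    ./configure CPPFLAGS='-DCLOBBER_CACHE_ALWAYS'   # plus your usual options
    make
    make check

The exact configure options don't matter; only the -D flag does.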
        regards, tom lane



Tom Lane wrote:
> This particular issue could be implemented just by adding
> -DCLOBBER_CACHE_ALWAYS to CFLAGS (or CPPFLAGS if you want to be anal
> about it).  I suppose that no new buildfarm mechanism is required ---
> someone just needs to set up an animal configured that way, and
> scheduled to run only maybe once a week or something like that.

Ah. Ok. That makes sense. How long does such a regression run usually take?

cheers

andrew


Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> This particular issue could be implemented just by adding
>> -DCLOBBER_CACHE_ALWAYS to CFLAGS (or CPPFLAGS if you want to be anal
>> about it).  I suppose that no new buildfarm mechanism is required ---
>> someone just needs to set up an animal configured that way, and
>> scheduled to run only maybe once a week or something like that.

> Ah. Ok. That makes sense. How long does such a regression run usually take?

On my x86_64 machine (dual 2.8GHz Xeon EM64T) it's on the order of two
or three hours --- I haven't timed it carefully, but somewhere along
there.  That's just for the core regression tests, I've never tried
contrib or PL tests.

It should be a separate animal, and not something that an existing one
does every so often, or we might mistake anything it finds for an
irreproducible transient failure.  Consistent failures on the same
animal will stand out from the noise, though.
        regards, tom lane



Tom Lane wrote:
> On my x86_64 machine (dual 2.8GHz Xeon EM64T) it's on the order of two
> or three hours --- I haven't timed it carefully, but somewhere along
> there.  That's just for the core regression tests, I've never tried
> contrib or PL tests.
>
> It should be a separate animal, and not something that an existing one
> does every so often, or we might mistake anything it finds for an
> irreproducible transient failure.  Consistent failures on the same
> animal will stand out from the noise, though.


I tried this on a little P3 I have lying around:

[andrew@marmaduke bf]$ ./run_build.pl --test --conf=figeater.conf
Mon Aug 27 17:03:55 2007: buildfarm run for figeater:HEAD starting
[17:03:55] checking out source ...
[17:04:17] checking if build run needed ...
[17:04:18] creating vpath build dir pgsql.11834 ...
[17:04:18] running configure ...
[17:06:06] running make ...
[17:31:14] running make check ...
[00:43:28] running make contrib ...
Branch: HEAD
Stage Contrib failed with status 2


Good thing it failed - goodness only knows how long the extra runs would 
have taken :-)

Does someone have a box with lots of grunt that can spare some cycles 
for a few hours once a week or so?

cheers

andrew



Hi all,

On Tue, 28 Aug 2007, Andrew Dunstan wrote:

> Does someone have a box with lots of grunt that can spare some cycles
> for a few hours once a week or so?
I've just configured centaur (CentOS 5) like this.
Do you need it for every version (8.1, 8.2) or just HEAD?
Do you need it just once a week or on every run?
I can also configure wharthog (unixware) like this if you need...

-- 
Olivier PRENANT                    Tel: +33-5-61-50-97-00 (Work)
15, Chemin des Monges                +33-5-61-50-97-01 (Fax)
31190 AUTERIVE                       +33-6-07-63-80-64 (GSM)
FRANCE                          Email: ohp@pyrenet.fr
------------------------------------------------------------------------------
Make your life a dream, make your dream a reality. (St Exupery)



ohp@pyrenet.fr wrote:
>> Does someone have a box with lots of grunt that can spare some cycles
>> for a few hours once a week or so?
>
> I've just configured centaur (CentOS 5) like this.
> Do you need it for every version (8.1, 8.2) or just HEAD?
> Do you need it just once a week or on every run?
> I can also configure wharthog (unixware) like this if you need...

I have no idea - probably just HEAD, but Tom will have a better idea :-) 
I imagine once a week would do, but as he says it should be a separate 
animal, not an occasional change in an existing animal. So please don't 
change your existing buildfarm members.

cheers

andrew


Andrew Dunstan <andrew@dunslane.net> writes:
> ohp@pyrenet.fr wrote:
>> I've just configured centaur (CentOS 5) like this.
>> Do you need it for every version (8.1, 8.2) or just HEAD?
>> Do you need it just once a week or on every run?
>> I can also configure wharthog (unixware) like this if you need...

> I have no idea - probably just HEAD, but Tom will have a better idea :-) 

HEAD only is enough IMHO.  If we find any bugs it should be pretty
obvious whether they need to be back-patched.
        regards, tom lane


ohp@pyrenet.fr writes:
> Ok, do you need it on unixware?
> Do you need it more than once a week?

No, and no.

> FYI, centaurc has been running make check for 3 hours now and hasn't finished!

You're finding out why I suggested a really low duty cycle ;-)
        regards, tom lane


Hi Tom,

On Thu, 30 Aug 2007, Tom Lane wrote:

> HEAD only is enough IMHO.  If we find any bugs it should be pretty
> obvious whether they need to be back-patched.
Ok, do you need it on unixware?
Do you need it more than once a week?
FYI, centaurc has been running make check for 3 hours now and hasn't finished!
Regards

-- 
Olivier PRENANT                    Tel: +33-5-61-50-97-00 (Work)
15, Chemin des Monges                +33-5-61-50-97-01 (Fax)
31190 AUTERIVE                       +33-6-07-63-80-64 (GSM)
FRANCE                          Email: ohp@pyrenet.fr
------------------------------------------------------------------------------
Make your life a dream, make your dream a reality. (St Exupery)



ohp@pyrenet.fr wrote:
> Ok, do you need it on unixware?
> Do you need it more than once a week?
> FYI, centaurc has been running make check for 3 hours now and hasn't finished!

The platform probably doesn't matter so much, I suspect. Once a week on 
a separately registered animal should be fine. Whatever you have that's 
fastest.

And don't be surprised at how long it takes - "make check" on my little 
PIII/FC5 box took 7 hours.

cheers

andrew


Hi Tom and Andrew,

On Thu, 30 Aug 2007, Tom Lane wrote:

> HEAD only is enough IMHO.  If we find any bugs it should be pretty
> obvious whether they need to be back-patched.
While waiting for my application for another animal to go through, I ran some
tests and was surprised to see the cluster test fail with a row-ordering
difference.  Maybe an ORDER BY is missing in the test query.

Here's an excerpt of check.log:

============== creating temporary installation        ==============
============== initializing database system           ==============
============== starting postmaster                    ==============
running on port 55678 with pid 6928
============== creating database "regression"         ==============
CREATE DATABASE
ALTER DATABASE
============== installing plpgsql                     ==============
CREATE LANGUAGE
============== running regression test queries        ==============
parallel group (15 tests):  text name oid char float4 int2 boolean varchar int8 float8 int4 uuid bit enum numeric
     boolean              ... ok
     char                 ... ok
     name                 ... ok
     varchar              ... ok
     text                 ... ok
     int2                 ... ok
     int4                 ... ok
     int8                 ... ok
     oid                  ... ok
     float4               ... ok
     float8               ... ok
     bit                  ... ok
     numeric              ... ok
     uuid                 ... ok
     enum                 ... ok
test strings              ... ok
test numerology           ... ok
parallel group (18 tests):  comments path lseg time timetz reltime polygon tinterval box circle abstime tstypes point interval timestamptz timestamp date inet
     point                ... ok
     lseg                 ... ok
     box                  ... ok
     path                 ... ok
     polygon              ... ok
     circle               ... ok
     date                 ... ok
     time                 ... ok
     timetz               ... ok
     timestamp            ... ok
     timestamptz          ... ok
     interval             ... ok
     abstime              ... ok
     reltime              ... ok
     tinterval            ... ok
     inet                 ... ok
     tstypes              ... ok
     comments             ... ok
parallel group (5 tests):  geometry horology type_sanity oidjoins opr_sanity
     geometry             ... ok
     horology             ... ok
     oidjoins             ... ok
     type_sanity          ... ok
     opr_sanity           ... ok
test insert               ... ok
test create_function_1    ... ok
test create_type          ... ok
test create_table         ... ok
test create_function_2    ... ok
parallel group (2 tests):  copy copyselect
     copy                 ... ok
     copyselect           ... ok
parallel group (8 tests):  create_operator create_aggregate vacuum drop_if_exists create_misc constraints inherit triggers
     constraints          ... ok
     triggers             ... ok
     create_misc          ... ok
     create_aggregate     ... ok
     create_operator      ... ok
     inherit              ... ok
     vacuum               ... ok
     drop_if_exists       ... ok
parallel group (2 tests):  create_view create_index
     create_index         ... ok
     create_view          ... ok
test sanity_check         ... ok
test errors               ... ok
test select               ... ok
parallel group (20 tests):  select_distinct_on btree_index select_into delete select_distinct namespace update hash_index select_having union case random select_implicit prepared_xacts aggregates transactions subselect portals join arrays
     select_into          ... ok
     select_distinct      ... ok
     select_distinct_on   ... ok
     select_implicit      ... ok
     select_having        ... ok
     subselect            ... ok
     union                ... ok
     case                 ... ok
     join                 ... ok
     aggregates           ... ok
     transactions         ... ok
     random               ... ok
     portals              ... ok
     arrays               ... ok
     btree_index          ... ok
     hash_index           ... ok
     update               ... ok
     namespace            ... ok
     prepared_xacts       ... ok
     delete               ... ok
test privileges           ... ok
test misc                 ... ok
parallel group (9 tests):  select_views combocid guc portals_p2 dependency tsearch cluster rules foreign_key
     select_views         ... ok
     portals_p2           ... ok
     rules                ... ok
     foreign_key          ... ok
     cluster              ... FAILED
     dependency           ... ok
     guc                  ... ok
     combocid             ... ok
     tsearch              ... ok
parallel group (18 tests):  limit xml plancache rowtypes prepare sequence temp without_oid largeobject copy2 polymorphism conversion returning truncate rangefuncs domain alter_table plpgsql
     plancache            ... ok
     limit                ... ok
     plpgsql              ... ok
     copy2                ... ok
     temp                 ... ok
     domain               ... ok
     rangefuncs           ... ok
     prepare              ... ok
     without_oid          ... ok
     conversion           ... ok
     truncate             ... ok
     alter_table          ... ok
     sequence             ... ok
     polymorphism         ... ok
     rowtypes             ... ok
     returning            ... ok
     largeobject          ... ok
     xml                  ... ok
test stats                ... ok
test tablespace           ... ok
============== shutting down postmaster               ==============
server stopped

========================
 1 of 111 tests failed.
========================

The differences that caused some tests to fail can be viewed in the
file "./regression.diffs".  A copy of the test summary that you see
above is saved in the file "./regression.out".

make: *** [check] Erreur 1


================== pgsql.25500/src/test/regress/regression.diffs ===================
*** ./expected/cluster.out    Fri Aug 31 16:08:02 2007
--- ./results/cluster.out    Fri Aug 31 16:30:32 2007
***************
*** 252,259 ****
  SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass;
      conname
  ----------------
-  clstr_tst_pkey
   clstr_tst_con
  (2 rows)
  
  SELECT relname, relkind,
--- 252,259 ----
  SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass;
      conname
  ----------------
   clstr_tst_con
+  clstr_tst_pkey
  (2 rows)
  
  SELECT relname, relkind,

======================================================================

Best regards
-- 
Olivier PRENANT                    Tel: +33-5-61-50-97-00 (Work)
15, Chemin des Monges                +33-5-61-50-97-01 (Fax)
31190 AUTERIVE                       +33-6-07-63-80-64 (GSM)
FRANCE                          Email: ohp@pyrenet.fr
------------------------------------------------------------------------------
Make your life a dream, make your dream a reality. (St Exupery)


ohp@pyrenet.fr wrote:
> Hi Tom and Andrew,
> 
> On Thu, 30 Aug 2007, Tom Lane wrote:
> 
> While waiting for my application for another animal, I made some tests and
> was surprised that cluster test failed with an ordering error.

This is running with CLOBBER_CACHE_ALWAYS set, right?  I think it is
quite possible that an autovacuum run came along and processed the catalog,
leading to a different physical row ordering.

> Maybe an ORDER BY is missing in the test query.

Yeah, an ORDER BY should be enough.
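That is, something along the lines of

    SELECT conname FROM pg_constraint
      WHERE conrelid = 'clstr_tst'::regclass
      ORDER BY 1;  -- sketch of the obvious fix: sort, don't rely on
                   -- physical row order

which pins down the output order no matter where the rows land on disk.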

>   SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass;
>       conname
>   ----------------
> -  clstr_tst_pkey
>    clstr_tst_con
>   (2 rows)
> 
>   SELECT relname, relkind,
> --- 252,259 ----
>   SELECT conname FROM pg_constraint WHERE conrelid = 'clstr_tst'::regclass;
>       conname
>   ----------------
>    clstr_tst_con
> +  clstr_tst_pkey
>   (2 rows)


-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Alvaro Herrera <alvherre@commandprompt.com> writes:
> ohp@pyrenet.fr wrote:
>> While waiting for my application for another animal, I made some tests and
>> was surprised that cluster test failed with an ordering error.

> This is running with CLOBBER_CACHE_ALWAYS set, right?  I think it is
> quite possible that an autovacuum came and processed the catalog,
> leading to different ordering.

I've seen this exact ordering difference once or twice before but hadn't
got round to looking into the cause.  I think Alvaro is right though,
because what I see in pg_constraint after a "typical" serial regression
test is
  ctid  |            conname
--------+-------------------------------
 (0,1)  | cardinal_number_domain_check
 (0,5)  | check_con
 (0,6)  | sequence_con
 (0,7)  | insert_con
 (0,8)  | insert_tbl_check
 (0,9)  | rule_and_refint_t1_pkey
 (0,10) | rule_and_refint_t2_pkey
 (0,11) | rule_and_refint_t3_pkey
 (0,12) | rule_and_refint_t3_id3a_fkey
 (0,13) | rule_and_refint_t3_id3a_fkey1
 (1,1)  | copy_con
 (1,10) | foo
 (1,11) | inhx_pkey
 (3,4)  | clstr_tst_s_pkey
 (3,5)  | clstr_tst_pkey
 (3,6)  | clstr_tst_con
 (3,26) | con_check
 (4,2)  | str_domain2_check
 (4,3)  | pos_int_check
(19 rows)
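
(That listing is from a query along the lines of

    SELECT ctid, conname FROM pg_constraint;  -- ctid = physical row position

run in the regression database; a seqscan returns rows in ctid order, which
is why the physical placement matters here.)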

The planner seems to prefer to do the query at issue by seqscan,
regardless of whether pg_constraint has been vacuumed/analyzed lately.
So the result will depend on where these two rows happen to land.  As
you can see, page 2 is entirely empty, so we could see the reported
result if clstr_tst_pkey went into page 3 and then an autovacuum
reported page 2 as having free space before the clstr_tst_con row was
inserted.  This is a sufficiently narrow window to be unlikely, but
not impossible; and it's easy to believe that CLOBBER_CACHE_ALWAYS could
widen the window.

ORDER BY added, as suggested by Alvaro.
        regards, tom lane