Thread: windows regression failure - prepared xacts

windows regression failure - prepared xacts

From
Andrew Dunstan
Date:
I am consistently seeing the regression failure shown below on my 
Windows machine. See 
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=loris&dt=2005-07-07%2013:54:13

(On the plus side, I am now building happily and passing regression 
tests with ASPerl, and hope to add ASPython and ASTcl to the list shortly).

cheers

andrew


================== pgsql.2072/src/test/regress/regression.diffs ===================
*** ./expected/prepared_xacts.out    Thu Jul  7 09:55:18 2005
--- ./results/prepared_xacts.out    Thu Jul  7 10:20:37 2005
***************
*** 179,189 ****
  -- Commit table creation
  COMMIT PREPARED 'regress-one';
  \d pxtest2
!     Table "public.pxtest2"
!  Column |  Type   | Modifiers
! --------+---------+-----------
!  a      | integer |
!
  SELECT * FROM pxtest2;
   a
  ---
--- 179,185 ----
  -- Commit table creation
  COMMIT PREPARED 'regress-one';
  \d pxtest2
! ERROR:  cache lookup failed for relation 27240
  SELECT * FROM pxtest2;
   a
  ---

======================================================================







Re: windows regression failure - prepared xacts

From
Andrew Dunstan
Date:
I never got a reply to this, but I am still seeing it from time to time 
- twice today in fact. Any suggestions?

cheers

andrew

Andrew Dunstan wrote:

>
> I am consistently seeing the regression failure shown below on my 
> Windows machine. See 
> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=loris&dt=2005-07-07%2013:54:13 
>
>
>
> ================== pgsql.2072/src/test/regress/regression.diffs ===================
> *** ./expected/prepared_xacts.out    Thu Jul  7 09:55:18 2005
> --- ./results/prepared_xacts.out    Thu Jul  7 10:20:37 2005
> ***************
> *** 179,189 ****
>   -- Commit table creation
>   COMMIT PREPARED 'regress-one';
>   \d pxtest2
> !     Table "public.pxtest2"
> !  Column |  Type   | Modifiers
> ! --------+---------+-----------
> !  a      | integer |
> !
>   SELECT * FROM pxtest2;
>    a
>   ---
> --- 179,185 ----
>   -- Commit table creation
>   COMMIT PREPARED 'regress-one';
>   \d pxtest2
> ! ERROR:  cache lookup failed for relation 27240
>   SELECT * FROM pxtest2;
>    a
>   ---
>
> ======================================================================



Re: windows regression failure - prepared xacts

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> I never got a reply to this, but I am still seeing it from time to time 
> - twice today in fact. Any suggestions?

I've been puzzled by that too.  It seems to indicate that the syscache
inval message that the COMMIT should send is either not getting sent at
all, or is being processed too late.  Neither of these ideas seems very
promising, especially considering that we're looking at a single backend
as both source and recipient of the message --- a race condition doesn't
seem credible.  And there's nothing very platform-specific in that code
either.  (I tried for a while to explain it as some kind of deficiency
in the signal emulation we use on Windows, but there are no signals used
for normal sinval processing, so that doesn't seem to hold water.)

Are we sure that it only happens on Windows?  Anyone else seen a similar
failure in the prepared_xacts test?

>> *** ./expected/prepared_xacts.out    Thu Jul  7 09:55:18 2005
>> --- ./results/prepared_xacts.out    Thu Jul  7 10:20:37 2005
>> ***************
>> *** 179,189 ****
>>   -- Commit table creation
>>   COMMIT PREPARED 'regress-one';
>>   \d pxtest2
>> !     Table "public.pxtest2"
>> !  Column |  Type   | Modifiers
>> ! --------+---------+-----------
>> !  a      | integer |
>> !
>>   SELECT * FROM pxtest2;
>>    a
>>   ---
>> --- 179,185 ----
>>   -- Commit table creation
>>   COMMIT PREPARED 'regress-one';
>>   \d pxtest2
>> ! ERROR:  cache lookup failed for relation 27240
>>   SELECT * FROM pxtest2;
>>    a
>>   ---
        regards, tom lane


Re: windows regression failure - prepared xacts

From
Andrew Dunstan
Date:
Further (anecdotal) data point: I have usually seen this after doing a 
number of builds. Rebooting seems to cure the problem (and that's 
happened today again - I have just seen 2 builds work). Maybe some sort 
of strange shmem corruption?

cheers

andrew

Tom Lane wrote:

>Andrew Dunstan <andrew@dunslane.net> writes:
>  
>
>>I never got a reply to this, but I am still seeing it from time to time 
>>- twice today in fact. Any suggestions?
>>    
>>
>
>I've been puzzled by that too.  It seems to indicate that the syscache
>inval message that the COMMIT should send is either not getting sent at
>all, or is being processed too late.  Neither of these ideas seems very
>promising, especially considering that we're looking at a single backend
>as both source and recipient of the message --- a race condition doesn't
>seem credible.  And there's nothing very platform-specific in that code
>either.  (I tried for a while to explain it as some kind of deficiency
>in the signal emulation we use on Windows, but there are no signals used
>for normal sinval processing, so that doesn't seem to hold water.)
>
>Are we sure that it only happens on Windows?  Anyone else seen a similar
>failure in the prepared_xacts test?
>
>  
>
>>>*** ./expected/prepared_xacts.out    Thu Jul  7 09:55:18 2005
>>>--- ./results/prepared_xacts.out    Thu Jul  7 10:20:37 2005
>>>***************
>>>*** 179,189 ****
>>>  -- Commit table creation
>>>  COMMIT PREPARED 'regress-one';
>>>  \d pxtest2
>>>!     Table "public.pxtest2"
>>>!  Column |  Type   | Modifiers
>>>! --------+---------+-----------
>>>!  a      | integer |
>>>!
>>>  SELECT * FROM pxtest2;
>>>   a
>>>  ---
>>>--- 179,185 ----
>>>  -- Commit table creation
>>>  COMMIT PREPARED 'regress-one';
>>>  \d pxtest2
>>>! ERROR:  cache lookup failed for relation 27240
>>>  SELECT * FROM pxtest2;
>>>   a
>>>  ---
>>>      
>>>
>
>            regards, tom lane


Re: windows regression failure - prepared xacts

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> further (anecdotal) data point: I have usually seen this after doing a 
> number of builds. Rebooting seems to cure the problem (and that's 
> happened today again - I have just seen 2 builds work). Maybe some sort 
> of strange shmem corruption?

Hmmm ... that still doesn't make any sense, given that the test is
being run on a freshly started postmaster.  Unless it's a hardware
problem?  Have you seen this on more than one machine?
        regards, tom lane


Re: windows regression failure - prepared xacts

From
Andrew Dunstan
Date:

Tom Lane wrote:

>Andrew Dunstan <andrew@dunslane.net> writes:
>  
>
>>further (anecdotal) data point: I have usually seen this after doing a 
>>number of builds. Rebooting seems to cure the problem (and that's 
>>happened today again - I have just seen 2 builds work). Maybe some sort 
>>of strange shmem corruption?
>>    
>>
>
>Hmmm ... that still doesn't make any sense, given that the test is
>being run on a freshly started postmaster.  Unless it's a hardware
>problem?  Have you seen this on more than one machine?
>
>
>  
>
No :-( But I find it hard to believe that a hardware failure would lead 
to this precise error repeatedly. Stranger things have happened, I guess.

On a related note, we need more Windows boxes on the buildfarm - ideally 
living in a data center somewhere so we can automate builds, rather than 
relying on my laptop and Jim's Windows box which seems to build 
intermittently.

cheers

andrew


Re: windows regression failure - prepared xacts

From
"Dave Page"
Date:

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of
> Andrew Dunstan
> Sent: 14 July 2005 13:35
> To: Tom Lane
> Cc: PostgreSQL-development
> Subject: Re: [HACKERS] windows regression failure - prepared xacts
>
>
>
> On a related note, we need more Windows boxes on the buildfarm -
> ideally living in a data center somewhere so we can automate builds,
> rather than relying on my laptop and Jim's Windows box which seems
> to build intermittently.

I might be able to help out there. What's required to be part of the
build farm?

Regards, Dave.


Re: windows regression failure - prepared xacts

From
Andrew Dunstan
Date:

Dave Page wrote:

> 
>
>  
>
>>-----Original Message-----
>>From: pgsql-hackers-owner@postgresql.org 
>>[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of 
>>Andrew Dunstan
>>Sent: 14 July 2005 13:35
>>To: Tom Lane
>>Cc: PostgreSQL-development
>>Subject: Re: [HACKERS] windows regression failure - prepared xacts
>>
>>
>>
>>On a related note, we need more Windows boxes on the buildfarm -
>>ideally living in a data center somewhere so we can automate builds,
>>rather than relying on my laptop and Jim's Windows box which seems
>>to build intermittently.
>>    
>>
>
>I might be able to help out there. 
>

Excellent.

>What's required to be part of the
>build farm?
>
>
>  
>

Short answer:

. your box will need to be able to contact http://www.pgbuildfarm.org
either directly or via proxy, and it will need access to a CVS repo,
either the one at postgresql.org or a mirror (you can set up your own
mirror using CVSup on a Linux or FreeBSD box).
. have a working PostgreSQL build environment for your platform (for
Windows this means MSys/MinGW with the libz and libintl stuff, and
ideally native Python and Tcl).
. Windows only: you will need a native perl installed as well as the one
in the MSys DTK. The one from ActiveState works fine.
. download and unpack the latest release of client code from
http://pgfoundry.org/frs/?group_id=1000040
. read instructions at
http://pgfoundry.org/docman/view.php/1000040/4/PGBuildFarm-HOWTO.txt
. get the software running locally using flags --force --nostatus --nosend
. register your machine at http://www.pgbuildfarm.org/register.html
. when you receive credentials, put them in the config file, and
schedule regular builds (without those flags) for the branches you want
to support. (For Windows that should be HEAD and optionally REL8_0_STABLE).

Feel free to ask me questions if anything isn't clear.

There's a short description of how it works at 
http://www.onlamp.com/pub/a/onlamp/2005/02/24/pg_buildfarm.html

cheers

andrew





Re: windows regression failure - prepared xacts

From
"Dave Page"
Date:

> -----Original Message-----
> From: Andrew Dunstan [mailto:andrew@dunslane.net]
> Sent: 14 July 2005 14:36
> To: Dave Page
> Cc: Tom Lane; PostgreSQL-development
> Subject: Re: [HACKERS] windows regression failure - prepared xacts
>
>
>
> Short answer:
>
> . your box will need to be able to contact http://www.pgbuildfarm.org
> either directly or via proxy, and it will need access to a CVS repo,
> either the one at postgresql.org or a mirror (you can set up your own
> mirror using CVSup on a Linux or FreeBSD box).

Right, that should be OK. As long as you don't need access /to/ the box.

> . have a working PostgreSQL build environment for your platform (for
> Windows this means MSys/MinGW with the libz and libintl stuff, and
> ideally native Python and Tcl).
> . Windows only: you will need a native perl installed as well as the
> one in the MSys DTK. The one from ActiveState works fine.

Yep, no problem there. Well, I say that - I find that if I have the DTK
perl installed, --with-perl fails miserably on my laptop, so I normally
only have ActiveState installed.

> . download and unpack the latest release of client code from
> http://pgfoundry.org/frs/?group_id=1000040
> . read instructions at
> http://pgfoundry.org/docman/view.php/1000040/4/PGBuildFarm-HOWTO.txt
> . get the software running locally using flags
> --force --nostatus --nosend
> . register your machine at http://www.pgbuildfarm.org/register.html
> . when you receive credentials, put them in the config file, and
> schedule regular builds (without those flags) for the branches you
> want to support. (For Windows that should be HEAD and optionally
> REL8_0_STABLE).
>
> Feel free to ask me questions if anything isn't clear.
>
> There's a short description of how it works at
> http://www.onlamp.com/pub/a/onlamp/2005/02/24/pg_buildfarm.html

OK. I'll have to run it past one of my colleagues (who is out until
Monday) as he technically 'owns' our Windows dev server. It will be a
2K3 Server in case you're interested.

I'll let you know either way.

Regards, Dave


Re: windows regression failure - prepared xacts

From
Andrew Dunstan
Date:

Dave Page wrote:

>>
>>Short answer:
>>
>>. your box will need to be able to contact http://www.pgbuildfarm.org
>>either directly or via proxy, and it will need access to a CVS repo,
>>either the one at postgresql.org or a mirror (you can set up your own
>>mirror using CVSup on a Linux or FreeBSD box).
>>    
>>
>
>Right, that should be OK. As long as you don't need access /to/ the box.
>  
>

No. You own the box, and you run it. I never touch it.

>>. Windows only: you will need a native perl installed as well as the
>>one in the MSys DTK. The one from ActiveState works fine.
>>    
>>
>
>Yep, no problem there. Well, I say that - I find that if I have the DTK
>perl installed, --with-perl fails miserably on my laptop, so I normally
>only have ActiveState installed.
>  
>

For buildfarm you need both, as the main buildfarm script needs to run 
under the DTK perl, while the auxiliary script that does the web txn 
needs to run under native perl.

It all works quite happily - the trick is that in the buildfarm config 
you say something like this:
    'build_env' => {
        PATH => "/c/tcl/bin:/c/python24:/c/perl/bin/:$ENV{PATH}",
    }

Then you will build with the native perl while running under the DTK perl.
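To illustrate why prepending to PATH has this effect, here is a minimal sketch in Python (not part of the buildfarm client, which is written in Perl). It assumes only that the build looks up perl by searching PATH in order, so the first directory containing a match wins. The directory names and the helpers make_fake_tool, resolve and demo are hypothetical and exist only for this illustration.

```python
import os
import shutil
import stat
import tempfile


def make_fake_tool(directory, name):
    """Create a tiny executable placeholder so the lookup has something to find."""
    # On Windows, shutil.which matches by PATHEXT, so give the file an .exe suffix.
    filename = name + ".exe" if os.name == "nt" else name
    path = os.path.join(directory, filename)
    with open(path, "w") as handle:
        handle.write("#!/bin/sh\nexit 0\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


def resolve(tool, prepend, base_path):
    """Return the first match for `tool` with `prepend` dirs placed ahead of base_path."""
    search_path = os.pathsep.join(list(prepend) + [base_path])
    return shutil.which(tool, path=search_path)


def demo():
    """Show that a prepended directory shadows a later one holding the same tool."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical stand-ins for the native perl and DTK perl directories.
        native_dir = os.path.join(root, "native_perl_bin")
        dtk_dir = os.path.join(root, "dtk_bin")
        os.mkdir(native_dir)
        os.mkdir(dtk_dir)
        make_fake_tool(native_dir, "perl")
        make_fake_tool(dtk_dir, "perl")

        without_prepend = resolve("perl", [], dtk_dir)
        with_prepend = resolve("perl", [native_dir], dtk_dir)
        return (
            os.path.dirname(without_prepend) == dtk_dir,
            os.path.dirname(with_prepend) == native_dir,
        )
```

Calling demo() reports whether the DTK stand-in is found when nothing is prepended, and whether the native stand-in is found once its directory is placed first, which mirrors what the build_env PATH entry does for the build.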


>OK. I'll have to run it past one of my colleagues (who is out until
>Monday) as he technically 'owns' our Windows dev server. It will be a
>2K3 Server in case you're interested.
>
>I'll let you know either way.
>
>
>  
>

Thanks

andrew


Re: windows regression failure - prepared xacts

From
"Dave Page"
Date:

> -----Original Message-----
> From: Andrew Dunstan [mailto:andrew@dunslane.net]
> Sent: 14 July 2005 15:17
> To: Dave Page
> Cc: Tom Lane; PostgreSQL-development
> Subject: Re: [HACKERS] windows regression failure - prepared xacts
>
>
> >Yep, no problem there. Well, I say that - I find that if I have the DTK
> >perl installed, --with-perl fails miserably on my laptop, so I normally
> >only have ActiveState installed.
> >
> >
>
> For buildfarm you need both, as the main buildfarm script needs to run
> under the DTK perl, while the auxiliary script that does the web txn
> needs to run under native perl.
>
> It all works quite happily - the trick is that in the buildfarm config
> you say something like this:
>
>     'build_env' => {
>         PATH => "/c/tcl/bin:/c/python24:/c/perl/bin/:$ENV{PATH}",
>     }
>
> Then you will build with the native perl while running under the DTK
> perl.

Right. I'll sort it out - it's going to become a PITA real soon now
anyway as this release I will be doing the pgInstaller builds again,
plus I'm working on Slony for which I need DTK's autoconf, so I can't
just uninstall DTK this time.

Regards, Dave.


Re: windows regression failure - prepared xacts

From
Petr Jelinek
Date:
I did some testing and I think that I can confirm this - on my 
workstation under Windows 2000 with the latest CVS and gcc 3.2.3 it 
randomly fails on prepared_xacts (it sometimes works and sometimes fails 
even with the same binary) and always fails on rules.

I tested 3 rebuilds with about 10 checks on each, and I see about a 50% 
failure rate on prepared_xacts.
I used configure without any parameters; the error is the same as in your 
mail (cache lookup failed ...).
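For anyone wanting to repeat this kind of measurement, the following is a rough sketch of how the repeated runs could be scripted. It is not what was actually used for the numbers above. It assumes only that the check command exits non-zero when a test fails; the names failure_rate and python_exit_command are hypothetical helpers made up for this example.

```python
import subprocess
import sys


def failure_rate(command, runs):
    """Run `command` `runs` times and return the fraction of runs that exit non-zero."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    failures = 0
    for _ in range(runs):
        # Output is discarded; only the exit status is counted here.
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failures += 1
    return failures / runs


def python_exit_command(code):
    """Build a stand-in command that exits with `code`, for trying the sketch
    without a PostgreSQL source tree."""
    return [sys.executable, "-c", "raise SystemExit({})".format(code)]
```

In a configured source tree one might call failure_rate(["make", "check"], 10), on the assumption that "checks" here means runs of make check. Note that this counts any failing test, not only prepared_xacts, so regression.diffs still has to be inspected to see which test failed.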

I have two other Windows machines with different hardware and different 
Windows versions (the first is XP and the second is Win 2003); I could 
do some testing on them tomorrow if you are interested.

-- 
Regards
Petr Jelinek (PJMODOS)



Re: windows regression failure - prepared xacts

From
Andrew Dunstan
Date:
So it's not hardware and doesn't seem to be Windows version specific (my 
tests were run on XPPro).

Looks like we have some major digging to do :-(

cheers

andrew

Petr Jelinek wrote:

> I did some testing and I think that I can confirm this - on my 
> workstation under Windows 2000 with the latest CVS and gcc 3.2.3 it 
> randomly fails on prepared_xacts (it sometimes works and sometimes 
> fails even with the same binary) and always fails on rules.
>
> I tested 3 rebuilds with about 10 checks on each, and I see about a 
> 50% failure rate on prepared_xacts.
> I used configure without any parameters; the error is the same as in 
> your mail (cache lookup failed ...).
>
> I have two other Windows machines with different hardware and 
> different Windows versions (the first is XP and the second is Win 
> 2003); I could do some testing on them tomorrow if you are interested.
>


Re: The reason for loris' intermittent prepared_xacts failures

From
Petr Jelinek
Date:
Andrew Dunstan wrote:

> How things will work with that setting on a non-English locale box I
> don't know - it might help if we had a Windows buildfarm member from
> one of our colleagues in a non-English speaking country.

Actually my workstation is non-English (Czech) and it works (All 98
tests passed several times). It also fixed my rules test problem - it
always failed before.

About Tom's conclusion - I was trying to analyze it too, but I lack
knowledge of Postgres internals so I didn't come up with any final
conclusion. I can however agree on a few things - that OID isn't the OID
of pxtest2, and it seems to happen only if a concurrent drop table occurs
in one of the parallel tests. And I also agree with that PS.

On a related note, I was also testing that signal emulation when
analyzing this and it seems to work without any errors (except for start
of course, but that's known) - I think Tom had some doubts about it after
what he wrote in his first reply.

-- 
Regards
Petr Jelinek (PJMODOS)

Re: The reason for loris' intermittent prepared_xacts failures

From
Tom Lane
Date:
Petr Jelinek <pjmodos@parba.cz> writes:
> On a related note, I was also testing that signal emulation when analyzing 
> this and it seems to work without any errors (except for start of course, 
> but that's known) - I think Tom had some doubts about it after what he 
> wrote in his first reply.

It was just one of the first things that came to mind when wondering
"why does this only seem to happen on Windows?"  At this point I'm
convinced that the answer is "because pg_regress sets up the locale
differently on Windows than anywhere else".  But at the time I hadn't
suspected a locale issue ...
        regards, tom lane