Thread: windows regression failure - prepared xacts
I am consistently seeing the regression failure shown below on my Windows machine. See http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=loris&dt=2005-07-07%2013:54:13

(On the plus side, I am now building happily and passing regression tests with ASPerl, and hope to add ASPython and ASTcl to the list shortly.)

cheers

andrew

================== pgsql.2072/src/test/regress/regression.diffs ===================
*** ./expected/prepared_xacts.out  Thu Jul 7 09:55:18 2005
--- ./results/prepared_xacts.out  Thu Jul 7 10:20:37 2005
***************
*** 179,189 ****
  -- Commit table creation
  COMMIT PREPARED 'regress-one';
  \d pxtest2
! Table "public.pxtest2"
!  Column |  Type   | Modifiers
! --------+---------+-----------
!  a      | integer |
!
  SELECT * FROM pxtest2;
   a
  ---
--- 179,185 ----
  -- Commit table creation
  COMMIT PREPARED 'regress-one';
  \d pxtest2
! ERROR:  cache lookup failed for relation 27240
  SELECT * FROM pxtest2;
   a
  ---
======================================================================
I never got a reply to this, but I am still seeing it from time to time - twice today in fact. Any suggestions?

cheers

andrew
Andrew Dunstan <andrew@dunslane.net> writes:
> I never got a reply to this, but I am still seeing it from time to time
> - twice today in fact. Any suggestions?

I've been puzzled by that too. It seems to indicate that the syscache inval message that the COMMIT should send is either not getting sent at all, or is being processed too late. Neither of these ideas seems very promising, especially considering that we're looking at a single backend as both source and recipient of the message --- a race condition doesn't seem credible. And there's nothing very platform-specific in that code either. (I tried for a while to explain it as some kind of deficiency in the signal emulation we use on Windows, but there are no signals used for normal sinval processing, so that doesn't seem to hold water.)

Are we sure that it only happens on Windows? Has anyone else seen a similar failure in the prepared_xacts test?

regards, tom lane
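Tom gives two hypotheses: the invalidation message is never sent, or it is processed too late. The toy model below may make them concrete. It is a sketch under loud assumptions. `ToyBackend`, `send_inval`, `accept_invals` and `describe` are hypothetical names, and the OIDs are invented. Nothing here mirrors PostgreSQL's actual syscache or sinval code. It only shows why a stale cached OID would surface as a "cache lookup failed for relation N" error.

```python
class ToyBackend:
    """Hypothetical toy model, not PostgreSQL code: a backend-local
    name -> OID cache kept coherent by invalidation messages."""

    def __init__(self, catalog):
        self.catalog = catalog  # shared "catalog": oid -> relation name
        self.cache = {}         # backend-local cache: relation name -> oid
        self.pending = []       # queued invalidation messages (names to forget)

    def send_inval(self, relname):
        # Stands in for the message a committing transaction should queue.
        self.pending.append(relname)

    def accept_invals(self):
        # Drain the queue, dropping every cached entry it names.
        while self.pending:
            self.cache.pop(self.pending.pop(0), None)

    def describe(self, relname, process_invals=True):
        # In this model, pending messages are absorbed at the start of each
        # lookup; process_invals=False simulates "processed too late".
        if process_invals:
            self.accept_invals()
        oid = self.cache.get(relname)
        if oid is None:
            matches = [o for o, n in self.catalog.items() if n == relname]
            if not matches:
                raise LookupError(f'relation "{relname}" does not exist')
            oid = self.cache[relname] = matches[0]
        if oid not in self.catalog:
            # The cached OID points at a catalog row that no longer exists.
            raise LookupError(f"cache lookup failed for relation {oid}")
        return oid
```

Suppose the table is replaced: OID 100 is dropped and OID 200 is created. If `send_inval` is called and the message is absorbed before the lookup, `describe("pxtest2")` returns 200. Two other paths raise "cache lookup failed for relation 100" instead:

- skipping `send_inval` (the "not sent" case);
- calling `describe(..., process_invals=False)` (the "too late" case).

Petr later reports that the failing OID is not pxtest2's own. So this models only the hypothesis as first stated, not the locale explanation Tom arrives at later.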
Further (anecdotal) data point: I have usually seen this after doing a number of builds. Rebooting seems to cure the problem (and that's happened today again - I have just seen 2 builds work). Maybe some sort of strange shmem corruption?

cheers

andrew
Andrew Dunstan <andrew@dunslane.net> writes:
> further (anecdotal) data point: I have usually seen this after doing a
> number of builds. Rebooting seems to cure the problem (and that's
> happened today again - I have just seen 2 builds work). Maybe some sort
> of strange shmem corruption?

Hmmm ... that still doesn't make any sense, given that the test is being run on a freshly started postmaster. Unless it's a hardware problem? Have you seen this on more than one machine?

regards, tom lane
Tom Lane wrote:
>Andrew Dunstan <andrew@dunslane.net> writes:
>>further (anecdotal) data point: I have usually seen this after doing a
>>number of builds. Rebooting seems to cure the problem (and that's
>>happened today again - I have just seen 2 builds work). Maybe some sort
>>of strange shmem corruption?
>
>Hmmm ... that still doesn't make any sense, given that the test is
>being run on a freshly started postmaster. Unless it's a hardware
>problem? Have you seen this on more than one machine?

No :-( But I find it hard to believe that a hardware failure would lead to this precise error repeatedly. Stranger things have happened, I guess.

On a related note, we need more Windows boxes on the buildfarm - ideally living in a data center somewhere so we can automate builds, rather than relying on my laptop and Jim's Windows box, which seems to build only intermittently.

cheers

andrew
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Andrew Dunstan
> Sent: 14 July 2005 13:35
> To: Tom Lane
> Cc: PostgreSQL-development
> Subject: Re: [HACKERS] windows regression failure - prepared xacts
>
> On a related note, we need more Windows boxes on the buildfarm - ideally
> living in a data center somewhere so we can automate builds, rather than
> relying on my laptop and Jim's Windows box which seems to build
> intermittently.

I might be able to help out there. What's required to be part of the build farm?

Regards, Dave.
Dave Page wrote:
>I might be able to help out there.

Excellent.

>What's required to be part of the
>build farm?

Short answer:

. your box will need to be able to contact http://www.pgbuildfarm.org either directly or via proxy, and it will need access to a CVS repo, either the one at postgresql.org or a mirror (you can set up your own mirror using CVSup on a Linux or FBSD box).
. have a working postgresql build environment for your platform (for Windows this means MSys/MinGW with the libz and libintl stuff, and ideally native Python and Tcl).
. Windows only: you will need a native perl installed as well as the one in the MSys DTK. The one from ActiveState works fine.
. download and unpack the latest release of client code from http://pgfoundry.org/frs/?group_id=1000040
. read instructions at http://pgfoundry.org/docman/view.php/1000040/4/PGBuildFarm-HOWTO.txt
. get the software running locally using flags --force --nostatus --nosend
. register your machine at http://www.pgbuildfarm.org/register.html
. when you receive credentials, put them in the config file, and schedule regular builds (without those flags) for the branches you want to support. (For Windows that should be HEAD and optionally REL8_0_STABLE.)

Feel free to ask me questions if anything isn't clear.

There's a short description of how it works at http://www.onlamp.com/pub/a/onlamp/2005/02/24/pg_buildfarm.html

cheers

andrew
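The last steps in the list above have two phases:

- do a trial run with --force --nostatus --nosend;
- after registering, schedule regular runs without those flags for each branch (HEAD, optionally REL8_0_STABLE).

Here is a minimal sketch of the command lines that implies. The client script name `run_build.pl` and passing the branch as the final argument are assumptions. The thread does not state either, so check the HOWTO before relying on them. `build_command` and `scheduled_commands` are hypothetical helper names.

```python
# Hypothetical helper: the script name and the trailing branch argument
# are assumptions, not confirmed by this thread.
CLIENT_SCRIPT = "run_build.pl"
LOCAL_TEST_FLAGS = ["--force", "--nostatus", "--nosend"]
WINDOWS_BRANCHES = ("HEAD", "REL8_0_STABLE")  # REL8_0_STABLE is optional


def build_command(branch, local_test):
    """Return the argv for one buildfarm run of `branch`.

    local_test=True adds the trial-run flags used before registering;
    scheduled runs after registration omit them."""
    cmd = ["perl", CLIENT_SCRIPT]
    if local_test:
        cmd += LOCAL_TEST_FLAGS
    return cmd + [branch]


def scheduled_commands(branches=WINDOWS_BRANCHES):
    # One regular (non-test) run per supported branch.
    return [build_command(b, local_test=False) for b in branches]
```

Keeping the trial flags in one list makes the "without those flags" rule explicit: a scheduled run differs from the trial run only by dropping `LOCAL_TEST_FLAGS`.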
> -----Original Message-----
> From: Andrew Dunstan [mailto:andrew@dunslane.net]
> Sent: 14 July 2005 14:36
> To: Dave Page
> Cc: Tom Lane; PostgreSQL-development
> Subject: Re: [HACKERS] windows regression failure - prepared xacts
>
> Short answer:
>
> . your box will need to be able to contact http://www.pgbuildfarm.org
> either directly or via proxy, and it will need access to a CVS repo,
> either the one at postgresql.org or a mirror (you can set up your own
> mirror using CVSup on a Linux or FBSD box).

Right, that should be OK. As long as you don't need access /to/ the box.

> . have a working postgresql build environment for your platform (for
> Windows this means MSys/MinGW with the libz and libintl stuff, and
> ideally native Python and Tcl).
> . Windows only: you will need a native perl installed as well as the one
> in the MSys DTK. The one from ActiveState works fine.

Yep, no problem there. Well, I say that - I find that if I have the DTK perl installed, --with-perl fails miserably on my laptop, so I normally only have ActiveState installed.

> . download and unpack the latest release of client code from
> http://pgfoundry.org/frs/?group_id=1000040
> . read instructions at
> http://pgfoundry.org/docman/view.php/1000040/4/PGBuildFarm-HOWTO.txt
> . get the software running locally using flags --force --nostatus --nosend
> . register your machine at http://www.pgbuildfarm.org/register.html
> . when you receive credentials, put them in the config file, and
> schedule regular builds (without those flags) for the branches you want
> to support. (For Windows that should be HEAD and optionally
> REL8_0_STABLE.)
>
> Feel free to ask me questions if anything isn't clear.
>
> There's a short description of how it works at
> http://www.onlamp.com/pub/a/onlamp/2005/02/24/pg_buildfarm.html

OK. I'll have to run it past one of my colleagues (who is out until Monday) as he technically 'owns' our Windows dev server.
It will be a 2K3 Server in case you're interested. I'll let you know either way. Regards, Dave
Dave Page wrote:
>>Short answer:
>>
>>. your box will need to be able to contact http://www.pgbuildfarm.org
>>either directly or via proxy, and it will need access to a CVS repo,
>>either the one at postgresql.org or a mirror (you can set up your own
>>mirror using CVSup on a Linux or FBSD box).
>
>Right, that should be OK. As long as you don't need access /to/ the box.

No. You own the box, and you run it. I never touch it.

>>. Windows only: you will need a native perl installed as well as the one
>>in the MSys DTK. The one from ActiveState works fine.
>
>Yep, no problem there. Well, I say that - I find that if I have the DTK
>perl installed, --with-perl fails miserably on my laptop, so I normally
>only have ActiveState installed.

For buildfarm you need both, as the main buildfarm script needs to run under the DTK perl, while the auxiliary script that does the web txn needs to run under native perl. It all works quite happily - the trick is that in the buildfarm config you say something like this:

    'build_env' => {
        PATH => "/c/tcl/bin:/c/python24:/c/perl/bin/:$ENV{PATH}",
    }

Then you will build with the native perl while running under the DTK perl.

>OK. I'll have to run it past one of my colleagues (who is out until
>Monday) as he technically 'owns' our Windows dev server. It will be a
>2K3 Server in case you're interested.
>
>I'll let you know either way.

Thanks

andrew
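The build_env trick works because a command is found by scanning PATH left to right. Prepending /c/perl/bin (and the Tcl and Python directories) makes the native tools win during the build, while the script itself was already started by the DTK perl. Here is a minimal Python sketch of that lookup order.

Two assumptions apply:

- The DTK perl is placed in /usr/bin purely for illustration.
- `available` is a hypothetical stand-in for the filesystem, so the sketch runs anywhere.

```python
# Directories from the build_env example, in MSys path syntax.
NATIVE_DIRS = ["/c/tcl/bin", "/c/python24", "/c/perl/bin/"]


def build_path(native_dirs, inherited_path):
    """Prepend native tool dirs so they are searched before the inherited
    PATH, like "/c/tcl/bin:/c/python24:/c/perl/bin/:$ENV{PATH}"."""
    return ":".join(list(native_dirs) + [inherited_path])


def resolve(program, path, available):
    """Return the first PATH directory that provides `program`, or None.

    `available` maps directory -> set of program names (hypothetical
    stand-in for checking the filesystem)."""
    for d in path.split(":"):
        d = d.rstrip("/")
        if program in available.get(d, set()):
            return d
    return None
```

Take the inherited PATH "/usr/bin:/bin", where both /usr/bin and /c/perl/bin provide perl. With that PATH alone, `resolve("perl", ...)` returns "/usr/bin". With the PATH produced by `build_path`, it returns "/c/perl/bin".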
> -----Original Message-----
> From: Andrew Dunstan [mailto:andrew@dunslane.net]
> Sent: 14 July 2005 15:17
> To: Dave Page
> Cc: Tom Lane; PostgreSQL-development
> Subject: Re: [HACKERS] windows regression failure - prepared xacts
>
> >Yep, no problem there. Well, I say that - I find that if I have the DTK
> >perl installed, --with-perl fails miserably on my laptop, so I normally
> >only have ActiveState installed.
>
> For buildfarm you need both, as the main buildfarm script needs to run
> under the DTK perl, while the auxiliary script that does the web txn
> needs to run under native perl.
>
> It all works quite happily - the trick is that in the buildfarm config
> you say something like this:
>
>     'build_env' => {
>         PATH => "/c/tcl/bin:/c/python24:/c/perl/bin/:$ENV{PATH}",
>     }
>
> Then you will build with the native perl while running under the DTK perl.

Right. I'll sort it out - it's going to become a PITA real soon now anyway, as for this release I will be doing the pgInstaller builds again, plus I'm working on Slony, for which I need DTK's autoconf, so I can't just uninstall DTK this time.

Regards, Dave.
I did some testing and I think I can confirm this: on my workstation under Windows 2000 with the latest CVS and gcc 3.2.3, it fails randomly on prepared_xacts (it sometimes works and sometimes fails, even with the same binary) and always fails on rules.

I tested 3 rebuilds with about 10 checks on each, and I see about a 50% failure rate on prepared_xacts. I used configure without any parameters; the error is the same as in your mail (cache lookup failed ...).

I have another 2 Windows machines with different hardware and different Windows versions (the first is XP and the second is Win 2003). I could do some testing on them tomorrow if you are interested.

-- 
Regards
Petr Jelinek (PJMODOS)
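Measuring an intermittent failure like this means repeating the same check many times and counting. Here is a minimal Python sketch of such a harness. It assumes `make check` is run from the PostgreSQL source tree and that a nonzero exit status means the suite failed.

It also picks failing test names out of src/test/regress/regression.diffs, using the "*** ./expected/NAME.out" header lines seen in the diff at the top of this thread. A later run may replace that file, so read it after each failing run. `make_check`, `failure_rate` and `failed_tests` are hypothetical names.

```python
import re
import subprocess
from typing import Callable, List


def make_check(srcdir: str) -> bool:
    """Run the regression suite once; True if it passed (exit status 0)."""
    return subprocess.run(["make", "check"], cwd=srcdir).returncode == 0


def failure_rate(run_once: Callable[[], bool], runs: int) -> float:
    """Call run_once `runs` times and return the fraction that failed."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    failures = sum(1 for _ in range(runs) if not run_once())
    return failures / runs


def failed_tests(diffs_text: str) -> List[str]:
    """Names of tests whose expected output appears in regression.diffs."""
    return sorted(set(re.findall(r"\./expected/(\w+)\.out", diffs_text)))
```

For example, `failure_rate(lambda: make_check("/path/to/pgsql"), 10)` mirrors "about 10 checks" per build; the path is a placeholder. Passing the runner in as a callable keeps the counting logic testable without a real build.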
So it's not hardware and doesn't seem to be Windows version specific (my tests were run on XP Pro). Looks like we have some major digging to do :-(

cheers

andrew
Andrew Dunstan wrote:
> How things will work with that setting on a non-English locale box I don't know - it might help if we had a Windows buildfarm member from one of our colleagues in a non-English speaking country.

Actually, my workstation is non-English (Czech) and it works (all 98 tests passed several times). It also fixed my rules test problem - it always failed before.

About Tom's conclusion - I was trying to analyze it too, but I lack knowledge of Postgres internals, so I didn't come up with any final conclusion. I can however agree on a few things: that OID isn't the OID of pxtest2, and it seems to happen only if a concurrent DROP TABLE occurs in one of the parallel tests. And I also agree with that PS.

On a related note, I was also testing that signal emulation when analyzing this, and it seems to work without any errors (except for start, of course, but that's known) - I think Tom had some doubts about it after what he wrote in his first reply.

-- 
Regards
Petr Jelinek (PJMODOS)
Petr Jelinek <pjmodos@parba.cz> writes: > On related note, I was also testing that signal emulation when analyzing > this and it seems to work without any errors (except for start of course > but thats known) - I think Tom had some doubts about it after what he > wrote in his first reply. It was just one of the first things that came to mind when wondering "why does this only seem to happen on Windows?" At this point I'm convinced that the answer is "because pg_regress sets up the locale differently on Windows than anywhere else". But at the time I hadn't suspected a locale issue ... regards, tom lane