Thread: Beta 6 Regression results on RedHat 7.0.
Ok, thanks to our snowstorm :-0 I have been working on the beta 6 RPM situation on my _slow_ notebook today (power outages for ten minutes at a time happening at hour or so intervals due to 45mph+ winds and a foot of snow....).

Well, I have preliminary RPM's built -- just need to work on the contrib tree situation.  I ran regression the usual RPM way (which I am fully aware is not the normally approved method, but it _would_ be the method any RPM beta testers would use), and got a different failure, one that is not locale related (LC_ALL=C both for the initdb and the postmaster startup in the newest initscript).  See attached regression.diffs for details of the temptest failure I experienced.

Regression run with CWD=/usr/share/test/regress, user=postgres.
    ./pg_regress --schedule=parallel_schedule

This is the only regression test failure I have found thus far.  I have never seen this failure before, so I'm not sure where to proceed.

Now to attack the contrib tree (looking forward to my new notebook, as this old P133 takes an hour and twenty minutes to slog through a full build....).

Seeing that RC1 is in prep, is there a pressing need to upload and release beta 6 RPM's, or will it be a day or two before RC1?
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11
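For anyone who wants to repeat the run described above, it boils down to roughly the following; the initscript path is an assumption and the regress directory is taken from the message, so adjust both for your install:

    # rough sketch of the RPM-style regression run described above;
    # initscript path is assumed -- adjust for your install
    /etc/rc.d/init.d/postgresql start
    su - postgres -c 'cd /usr/share/test/regress && \
        ./pg_regress --schedule=parallel_schedule'
    # any failures are recorded in regression.diffs under that directory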
Lamar Owen <lamar.owen@wgcr.org> writes:
> DROP TABLE temptest;
> + NOTICE: FlushRelationBuffers(temptest, 0): block 0 is referenced (private 0, global 1)
> + ERROR: heap_drop_with_catalog: FlushRelationBuffers returned -2
> SELECT * FROM temptest;

Hoo, that's interesting ... Exactly what fileset were you using again?

> Seeing that RC1 is in prep, is there a pressing need to upload and
> release beta 6 RPM's, or will it be a day or two before RC1?

I think you might as well wait for RC1 as far as actually making RPMs goes.  But do you want to let anyone else check out the RPM build process?  For instance, I've been wondering what you did about the which-set-of-headers-to-install issue.

regards, tom lane
On Tue, 20 Mar 2001, Lamar Owen wrote:
> Ok, thanks to our snowstorm :-0 I have been working on the beta 6 RPM situation
> on my _slow_ notebook today (power outages for ten minutes at a time happening
> at hour or so intervals due to 45mph+ winds and a foot of snow....).
>
> Well, I have preliminary RPM's built -- just need to work on the contrib tree
> situation.  I ran regression the usual RPM way (which I am fully aware is not
> the normally approved method, but it _would_ be the method any RPM beta testers
> would use), and got a different failure, one that is not locale related
> (LC_ALL=C both for the initdb and the postmaster startup in the newest
> initscript).  See attached regression.diffs for details of the temptest failure
> I experienced.
>
> Regression run with CWD=/usr/share/test/regress, user=postgres.
> ./pg_regress --schedule=parallel_schedule
>
> This is the only regression test failure I have found thus far.  I have never
> seen this failure before, so I'm not sure where to proceed.
>
> Now to attack the contrib tree (looking forward to my new notebook, as this old
> P133 takes an hour and twenty minutes to slog through a full build....).
>
> Seeing that RC1 is in prep, is there a pressing need to upload and release beta
> 6 RPM's, or will it be a day or two before RC1?

I'm going to do RC1 tonight ... so no pressing need :)
On Tue, 20 Mar 2001, Tom Lane wrote:
> Lamar Owen <lamar.owen@wgcr.org> writes:
> > DROP TABLE temptest;
> > + NOTICE: FlushRelationBuffers(temptest, 0): block 0 is referenced (private 0, global 1)
> > + ERROR: heap_drop_with_catalog: FlushRelationBuffers returned -2
> > SELECT * FROM temptest;
> Hoo, that's interesting ... Exactly what fileset were you using again?

When you say 'fileset', I'm assuming you are referring to the --schedule parameter -- I am invoking the following command:

    ./pg_regress --schedule=parallel_schedule

7.1beta6 distribution tarball.  LC_ALL=C.  Compiled on RedHat 7 as shipped.

I'm rerunning to see if it is intermittent.  Second run -- no error.  Running a third time......no error.  Now I'm confused.  What would cause such an error, Tom?

I'm going to check on my desktop, once power gets more stable (and it quits lightning -- yes, a snowstorm with lightning :-0 I certainly got what I wanted.....).  So, more to come later.

> > Seeing that RC1 is in prep, is there a pressing need to upload and
> > release beta 6 RPM's, or will it be a day or two before RC1?
> I think you might as well wait for RC1 as far as actually making RPMs
> goes.  But do you want to let anyone else check out the RPM build
> process?  For instance, I've been wondering what you did about the
> which-set-of-headers-to-install issue.

Oh, ok.  Spec file attached.  All other files needed are the beta6 tarball and the contents of the beta4-1 source rpm, with names changed to match the beta6 version number.  There are some other changes I have to merge in -- particularly a set from Karl for the optional PL/Perl build, as well as others, so this is a preliminary spec file.  But I was just getting the basic build done and tested.

To directly answer your question, I'm using 'make install-all-headers' and stuffing it into the devel rpm in one piece at this time.
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11
Lamar Owen <lamar.owen@wgcr.org> writes:
> DROP TABLE temptest;
> + NOTICE: FlushRelationBuffers(temptest, 0): block 0 is referenced (private 0, global 1)
> + ERROR: heap_drop_with_catalog: FlushRelationBuffers returned -2
> SELECT * FROM temptest;

>> Hoo, that's interesting ... Exactly what fileset were you using again?

> When you say 'fileset', I'm assuming you are referring to the --schedule
> parameter --

No, I was wondering about whether you had an inconsistent set of source files, or had managed to not do a complete rebuild, or something like that.  The above error should be entirely impossible considering that the table in question is a temp table that's not been touched by any other backend.  If you did manage to get this from a clean build then I think we have a serious problem to look at.

>> I think you might as well wait for RC1 as far as actually making RPMs
>> goes.  But do you want to let anyone else check out the RPM build
>> process?  For instance, I've been wondering what you did about the
>> which-set-of-headers-to-install issue.

> Oh, ok.  Spec file attached.  All other files needed are the beta6 tarball and
> the contents of the beta4-1 source rpm, with names changed to match the beta6
> version number.

OK, I will pull the files and try to replicate this on my own laptop.  Does anyone else have time to try to duplicate the problem tonight?  If it's replicatable at all, I think it's a release stopper.

> To directly answer your question, I'm using 'make install-all-headers' and
> stuffing it into the devel rpm in one piece at this time.

Works for me.

regards, tom lane
On Tue, 20 Mar 2001, Tom Lane wrote:
> Lamar Owen <lamar.owen@wgcr.org> writes:
> > DROP TABLE temptest;
> > + NOTICE: FlushRelationBuffers(temptest, 0): block 0 is referenced (private 0, global 1)
> > + ERROR: heap_drop_with_catalog: FlushRelationBuffers returned -2
> > SELECT * FROM temptest;
> >> Hoo, that's interesting ... Exactly what fileset were you using again?
> > When you say 'fileset', I'm assuming you are referring to the --schedule
> > parameter --
> No, I was wondering about whether you had an inconsistent set of source
> files, or had managed to not do a complete rebuild, or something like
> that.  The above error should be entirely impossible considering that
> the table in question is a temp table that's not been touched by any
> other backend.  If you did manage to get this from a clean build then
> I think we have a serious problem to look at.

Standard RPM rebuild -- always wipes the whole build tree out and re-expands from the tarball, reapplies patches, and rebuilds from scratch every time I change even the smallest detail in the spec file -- which is why it takes so long to get these things out.  So, no, this is a scratch build from a fresh tarball.

> Does anyone else have time to try to duplicate the problem tonight?
> If it's replicatable at all, I think it's a release stopper.

I have not yet been able to repeat the problem.  I am running my fifth regression test run (which takes a long time on this P133) with a freshly initdb'ed PGDATA -- the previous regression runs were done on the same PGDATA tree as the first run was done on.  Took 12 minutes 40 seconds, but I can't repeat the error.

I'm hoping it was a problem on my machine -- educate me on what caused the error so I can see if something in my setup did something not so nice.

So, the score is one error out of six test runs, thus far.
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11
Lamar Owen <lamar.owen@wgcr.org> writes:
> I'm hoping it was a problem on my machine -- educate me on
> what caused the error

Well, that's exactly what I'd like to know.  The direct cause of the error is that DROP TABLE is finding that some other backend has a reference-count hold on a page of the temp table it's trying to drop.  Since no other backend should be trying to touch this temp table, there's something pretty fishy here.

Given that this is a parallel test, you may be looking at a low-probability timing-dependent failure.  I'd say set up the machine and run repeat tests for an hour or three ... that's what I plan to do here.

BTW, what postmaster parameters are you using --- -B and so forth?

regards, tom lane
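One crude way to run the repeat tests Tom suggests, assuming pg_regress exits nonzero when any test fails (run as the postgres user, from the regress directory mentioned earlier):

    # keep running the parallel schedule until something breaks; this
    # assumes pg_regress returns a nonzero exit status on failure
    cd /usr/share/test/regress
    n=0
    while ./pg_regress --schedule=parallel_schedule; do
        n=`expr $n + 1`
        echo "pass $n clean"
    done
    echo "failure after $n clean passes -- see regression.diffs"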
> I'm rerunning to see if it is intermittent.  Second run -- no
> error.  Running a third time......no error.  Now I'm confused.
> What would cause such an error, Tom?  I'm going to check on my

Hmm, concurrent checkpoint?  Probably we could simplify dirty test in BufferSync() - ie test bufHdr->cntxDirty without holding shlock (and pin!) on buffer: should be good as long as we set cntxDirty flag *before* XLogInsert in access methods.  Have to look more...

Vadim
On Tue, 20 Mar 2001, Tom Lane wrote:
> Since no other backend should be trying to touch this temp table,
> there's something pretty fishy here.

I see.

> Given that this is a parallel test, you may be looking at a
> low-probability timing-dependent failure.  I'd say set up the machine
> and run repeat tests for an hour or three ... that's what I plan to do
> here.

As a broadcast engineer, I'm a little too familiar with such things.  But this isn't an engineer list, so I'll spare you the war stories. :-)

> BTW, what postmaster parameters are you using --- -B and so forth?

Default.  To be changed before RPM release, but currently it is the default.  The only option that postmaster.opts records is -D, and I'm not passing anything else.
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11
On Tue, 20 Mar 2001, Tom Lane wrote:
> Lamar Owen <lamar.owen@wgcr.org> writes:
> > I'm hoping it was a problem on my machine -- educate me on
> > what caused the error
>
> Well, that's exactly what I'd like to know.  The direct cause of the
> error is that DROP TABLE is finding that some other backend has a
> reference-count hold on a page of the temp table it's trying to drop.
> Since no other backend should be trying to touch this temp table,
> there's something pretty fishy here.
>
> Given that this is a parallel test, you may be looking at a
> low-probability timing-dependent failure.  I'd say set up the machine
> and run repeat tests for an hour or three ... that's what I plan to do
> here.

Okay, I roll'd an RC1 but haven't put it up for FTP yet ... I'll wait for a few hours to see if anyone can reproduce this, and, if not, put out what I've rolled ... say, 00:00 AST ...
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes: > Hmm, concurrent checkpoint? Probably we could simplify dirty test > in ByfferSync() - ie test bufHdr->cntxDirty without holding > shlock (and pin!) on buffer: should be good as long as we set > cntxDirty flag *before* XLogInsert in access methods. Have to > look more... Yes, I'm wondering if some other backend is trying to write/flush the buffer (maybe as part of a checkpoint, maybe not). But seems like we should have seen this before, if so; that's not a low- probability scenario, particularly with just 64 buffers... regards, tom lane
The Hermit Hacker <scrappy@hub.org> writes:
> Okay, I roll'd an RC1 but haven't put it up for FTP yet ... I'll wait for
> a few hours to see if anyone can reproduce this, and, if not, put out what
> I've rolled ...

This will not be RC1 :-(

I've been running one backend doing repeated iterations of

    CREATE TABLE temptest(col int);
    INSERT INTO temptest VALUES (1);

    CREATE TEMP TABLE temptest(col int);
    INSERT INTO temptest VALUES (2);
    SELECT * FROM temptest;
    DROP TABLE temptest;

    SELECT * FROM temptest;
    DROP TABLE temptest;

and another one doing repeated CHECKPOINTs.  I've already gotten a couple occurrences of Lamar's failure.

I think the problem is that BufferSync unconditionally does PinBuffer on each buffer, and holds the pin during intervals where it's released BufMgrLock, even if there's not really anything for it to do on that buffer.  If someone else is running FlushRelationBuffers then it's possible for that routine to see a nonzero pin count when it looks.

Vadim, what do you think about how to change this?  I think this is BufferSync's fault not FlushRelationBuffers's ...

regards, tom lane
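The two sessions above can be driven from a shell roughly like this (the database name 'regression' and having psql on the PATH are assumptions; any scratch database should work):

    # hypothetical driver for the two concurrent sessions described above:
    # one backend loops over the temp-table script, the other issues
    # repeated CHECKPOINTs; watch the output for the FlushRelationBuffers
    # NOTICE/ERROR, then interrupt with Ctrl-C
    cat > temptest.sql <<'EOF'
    CREATE TABLE temptest(col int);
    INSERT INTO temptest VALUES (1);
    CREATE TEMP TABLE temptest(col int);
    INSERT INTO temptest VALUES (2);
    SELECT * FROM temptest;
    DROP TABLE temptest;
    SELECT * FROM temptest;
    DROP TABLE temptest;
    EOF
    while :; do psql -f temptest.sql regression; done &
    while :; do psql -c "CHECKPOINT;" regression; done &
    wait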
On Tue, 20 Mar 2001, Tom Lane wrote:
> This will not be RC1 :-(

> I've already gotten a
> couple occurrences of Lamar's failure.

Well, I was at least hoping it was a problem here -- particularly since I haven't been able to reproduce it.  But, since it is not a local problem, I'm glad I caught it -- on the first regression test run, no less.  I've run a dozen tests since without duplication.

Although, like you, Tom, I'm curious as to why it hadn't shown up before -- is the fact that this is a slow machine a factor, possibly?

Although I am now much more leery of our regression suite -- this issue isn't even tested, in reality.  Do we have _any_ WAL-related tests?  The parallel testing is a good thing -- but I wonder what boundary conditions aren't getting tested.
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11
Lamar Owen <lamar.owen@wgcr.org> writes:
> Although I am now much more leery of our regression suite

The regression tests are not at all designed to test concurrent behavior, and never have been.  The parallel form runs some tests in parallel, true, but those tests are deliberately designed not to interact.  So I don't put any faith in the regression tests as a means to catch bugs like this.  We need some thought and work on better concurrent tests...

regards, tom lane
> I think the problem is that BufferSync unconditionally does PinBuffer
> on each buffer, and holds the pin during intervals where it's released
> BufMgrLock, even if there's not really anything for it to do on that
> buffer.  If someone else is running FlushRelationBuffers then it's
> possible for that routine to see a nonzero pin count when it looks.
>
> Vadim, what do you think about how to change this?  I think this is
> BufferSync's fault not FlushRelationBuffers's ...

I'm looking there right now...

Vadim
>> I think the problem is that BufferSync unconditionally does PinBuffer
>> on each buffer, and holds the pin during intervals where it's released
>> BufMgrLock, even if there's not really anything for it to do on that
>> buffer.  If someone else is running FlushRelationBuffers then it's
>> possible for that routine to see a nonzero pin count when it looks.

Further note: this bug does not arise in 7.0.* because in that code, BufferSync will only pin buffers that have been dirtied in the current transaction.  This cannot affect a concurrent FlushRelationBuffers, which should be holding exclusive lock on the table it's flushing.

Or can it?  The above is safe enough for user tables, but on system tables we have a bad habit of releasing locks early.  It seems possible that a VACUUM on a system table might see pins due to BufferSyncs running in concurrent transactions that have altered that system table.

Perhaps this issue does explain some of the reports of FlushRelationBuffers failure that we've seen from the field.

regards, tom lane
> Seeing that RC1 is in prep, is there a pressing need to upload and release beta
> 6 RPM's, or will it be a day or two before RC1?

Can I get the src rpm to give a try on Mandrake?  I had trouble with 7.0.3 (a mysterious disappearing file in the perl build) and would like to see where we are at with 7.1...

- Thomas
> Further note: this bug does not arise in 7.0.* because in that code,
> BufferSync will only pin buffers that have been dirtied in the current
> transaction.  This cannot affect a concurrent FlushRelationBuffers,
> which should be holding exclusive lock on the table it's flushing.
>
> Or can it?  The above is safe enough for user tables, but on system
> tables we have a bad habit of releasing locks early.  It seems possible
> that a VACUUM on a system table might see pins due to BufferSyncs
> running in concurrent transactions that have altered that system table.
>
> Perhaps this issue does explain some of the reports of
> FlushRelationBuffers failure that we've seen from the field.

Another possible source of this problem (in 7.0.X) is BufferReplace..?

Vadim
On Tue, 20 Mar 2001, Thomas Lockhart wrote:
> > Seeing that RC1 is in prep, is there a pressing need to upload and release beta
> > 6 RPM's, or will it be a day or two before RC1?
> Can I get the src rpm to give a try on Mandrake?  I had trouble with
> 7.0.3 (a mysterious disappearing file in the perl build) and would like
> to see where we are at with 7.1...

Sure.  If you want to try out one already up there, pull the beta4 set off the ftp site.  I'm on dialup right now -- it will take quite some time to get an src.rpm up for beta 6.  Although, it does look like it may be a little bit before RC1, now.

I'm at beta6-0.2 right now, with several changes to make in the line, but I can upload if you can wait a couple of hours (I'm in a rebuild right now for 0.2, which will take 77 minutes or more on this machine, and then I have to scp it over to hub.).  Tomorrow morning, if I can get out of the snow-covered driveway and to work, I can upload it much quicker.

I'll go ahead and upload the one I'm testing with right now if you'd like.
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11
> I'll go ahead and upload the one I'm testing with right now if you'd like.

Not necessary, unless (I suppose) you know the rpm for beta 4 is broken.  That vintage CVS tree behaved well enough for me to try it out, AFAICR...

- Thomas
On Tue, 20 Mar 2001, Thomas Lockhart wrote:
> > I'll go ahead and upload the one I'm testing with right now if you'd like.
> Not necessary, unless (I suppose) you know the rpm for beta 4 is
> broken.  That vintage CVS tree behaved well enough for me to try it out,
> AFAICR...

It's a good start to test with for the purposes for which I think you want to test for.  (and I'm an English teacher by night -- argh).

Beta 6 changes a few minor things and one major thing -- the minor things are:
- Separate libs package with requisite dependency redo
- Change in the initscript to use pg_ctl to (properly) stop postmaster (no kill -9's here this time :-))
- Change in the initscript to initdb with LC_ALL=C and to start postmaster with LC_ALL=C as well.
- devel subpackage now uses make install-all-headers instead of cpp hack to pull in required headers for client and server development.

The major thing is going to be a build of the contrib tree and a contrib subpackage -- the source will remain as part of the docs, but now that whole set of useful files will be built out.  That is what I was beginning to do when I stumbled across the regression failure that subsequently took the rest of the afternoon to track.

Before final release I have a rewrite of the README to do, as well as a full update of the migration scripts for testing.  I'm looking at /usr/lib/pgsql/contrib/* for the contrib stuff.
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11
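For reference, the initscript changes in that list amount to fragments along these lines; PGDATA, the binary paths, and the shutdown mode are assumptions here, not the actual packaged script:

    # hedged sketch of the initscript behaviour described above;
    # locations are assumed and the real script differs in detail
    PGDATA=/var/lib/pgsql/data

    start() {
        if [ ! -f "$PGDATA/PG_VERSION" ]; then
            su -l postgres -c "LC_ALL=C /usr/bin/initdb --pgdata=$PGDATA"
        fi
        su -l postgres -c "LC_ALL=C /usr/bin/pg_ctl -w -D $PGDATA start"
    }

    stop() {
        # pg_ctl shuts the postmaster down cleanly -- no more kill -9
        su -l postgres -c "/usr/bin/pg_ctl -D $PGDATA -m fast stop"
    }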
> It's a good start to test with for the purposes for which I think you want to
> test for.  (and I'm an English teacher by night -- argh).

:)

Mandrake (as of 7.2) still does a brain-dead mix of "-O3" and "-ffast-math", which is a risky and unnecessary combination according to the gcc folks (and which kills some of our date/time rounding).  From the man page for gcc:

    -ffast-math
        This option should never be turned on by any `-O' option
        since it can result in incorrect output for programs which
        depend on an exact implementation of IEEE or ANSI
        rules/specifications for math functions.

I'd like to get away from having to post a non-brain-dead /root/.rpmrc file which omits the -ffast-math flag.  Can you suggest mechanisms for putting a "-fno-fast-math" into the spec file?  Isn't there a mechanism to mark things as "distro specific"?  Suggestions?

Also, I'm getting the same symptom as I had for 7.0.3 with a "disappearing file".  Anyone seen this?  I recall tracing this back for the 7.0.3 case and found that Pg.bs existed in the build tree, at least at some point in the build, but then goes away.  7.0.2, at least at the time I did the build, did not have the problem :(

    File not found: /var/tmp/postgresql-7.1beta4-root/usr/lib/perl5/site_perl/5.6.0/i386-linux/auto/Pg/Pg.bs

- Thomas
At 3/20/2001 09:24 PM, Thomas Lockhart wrote:
> > It's a good start to test with for the purposes for which I think you want to
> > test for.  (and I'm an English teacher by night -- argh).
>
> :)
>
> Mandrake (as of 7.2) still does a brain-dead mix of "-O3" and
> "-ffast-math", which is a risky and unnecessary combination according to
> the gcc folks (and which kills some of our date/time rounding).  From the
> man page for gcc:
>
> -ffast-math
>     This option should never be turned on by any `-O' option
>     since it can result in incorrect output for programs which
>     depend on an exact implementation of IEEE or ANSI
>     rules/specifications for math functions.
>
> I'd like to get away from having to post a non-brain-dead /root/.rpmrc
> file which omits the -ffast-math flag.  Can you suggest mechanisms for
> putting a "-fno-fast-math" into the spec file?  Isn't there a mechanism
> to mark things as "distro specific"?  Suggestions?

I don't know if it helps, but a stock install has the environment MACHTYPE=i586-mandrake-linux.  If you hunt for mandrake in the MACHTYPE variable you could reset those variables.

Also, I think those are set in the rpmrc file of the distro for the i386 target.  If you specify anything else like i486, i686, you don't have that problem.  It would be in the RPM_OPT_FLAGS or RPM_OPTS part of the build environment.

I don't think there would be a problem overriding it; in fact, I would recommend the following: RPM_OPTS="$RPM_OPTS -fno-fast-math".  Since gcc will take the last argument as overriding the first, it would be a nice safeguard.  Even setting CFLAGS="$CFLAGS -fno-fast-math" might be a good idea.

Hope this helps,

Thomas
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
> > It's a good start to test with for the purposes for which I think you want to
> > test for.  (and I'm an English teacher by night -- argh).
>
> :)
>
> Mandrake (as of 7.2) still does a brain-dead mix of "-O3" and
> "-ffast-math", which is a risky and unnecessary combination according to
> the gcc folks (and which kills some of our date/time rounding).  From the
> man page for gcc:
>
> -ffast-math
>     This option should never be turned on by any `-O' option
>     since it can result in incorrect output for programs which
>     depend on an exact implementation of IEEE or ANSI
>     rules/specifications for math functions.
>
> I'd like to get away from having to post a non-brain-dead /root/.rpmrc
> file which omits the -ffast-math flag.  Can you suggest mechanisms for
> putting a "-fno-fast-math" into the spec file?  Isn't there a mechanism
> to mark things as "distro specific"?  Suggestions?

If Mandrake wants to be broken, let them - and tell them.

--
Trond Eivind Glomsrød
Red Hat, Inc.
> If Mandrake wants to be broken, let them - and tell them.

They know ;) But just as with RH, they build ~1500 packages, so it is probably not realistic to get them to change their build standards over one misbehavior in one package.

The goal here is to get PostgreSQL to work well for as many platforms as possible.  Heck, we even build for M$ ;)

So, I'm still looking for the best way to add a compile flag while making it clear that it is for one distro only.  Of course, it would be possible to just add it at the end of the flags, but it would be nice to do that only when necessary.

Regards.

- Thomas
> I've been running one backend doing repeated iterations of
>
> CREATE TABLE temptest(col int);
> INSERT INTO temptest VALUES (1);
>
> CREATE TEMP TABLE temptest(col int);
> INSERT INTO temptest VALUES (2);
> SELECT * FROM temptest;
> DROP TABLE temptest;
>
> SELECT * FROM temptest;
> DROP TABLE temptest;
>
> and another one doing repeated CHECKPOINTs.  I've already gotten a
> couple occurrences of Lamar's failure.

I wasn't able to reproduce the failure with current sources.

Vadim
Thomas Lockhart writes:
> Mandrake (as of 7.2) still does a brain-dead mix of "-O3" and
> "-ffast-math", which is a risky and unnecessary combination according to
> the gcc folks (and which kills some of our date/time rounding).  From the
> man page for gcc:
>
> -ffast-math
>     This option should never be turned on by any `-O' option
>     since it can result in incorrect output for programs which
>     depend on an exact implementation of IEEE or ANSI
>     rules/specifications for math functions.

You're reading this wrong.  What this means is: "If you're working on GCC, do not ever think of enabling -ffast-math implicitly by any -Ox level [since most other -fxxx options are grouped under some -Ox], since programs that might want optimization could still depend on correct IEEE math."

In particular, Mandrake is not wrong to compile with -O3 and -ffast-math.  The consequence would only be slightly incorrect math results, and that is what indeed happened.

--
Peter Eisentraut   peter_e@gmx.net   http://yi.org/peter-e/
NO! It's not "Mandrake" that will be broken. Mandrake is also often used by new Linux users who wouldn't have the slightest idea about setting GCC options. It'll be THEM that have broken installations if we take this approach (as an aside, that means that WE will be probably also be answering more questions about PostgreSQL being broken on Mandrake systems). Isn't it better that PostgreSQL works with what it's got on a system AND ALSO that someone notifies the Mandrake people regarding the problem? Regards and best wishes, Justin Clift Trond Eivind Glomsrød wrote: > <snip> > > If Mandrake wants to be broken, let them - and tell them. > > -- > Trond Eivind Glomsrød > Red Hat, Inc. > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
Is the right approach for the ./configure script to check for the existence of the /etc/mandrake-release file as at least an initial indicator that the compile is happening on Mandrake?

Regards and best wishes,

Justin Clift

Thomas Lockhart wrote:
> > If Mandrake wants to be broken, let them - and tell them.
>
> They know ;) But just as with RH, they build ~1500 packages, so it is
> probably not realistic to get them to change their build standards over
> one misbehavior in one package.
>
> The goal here is to get PostgreSQL to work well for as many platforms as
> possible.  Heck, we even build for M$ ;)
>
> So, I'm still looking for the best way to add a compile flag while
> making it clear that it is for one distro only.  Of course, it would be
> possible to just add it at the end of the flags, but it would be nice to
> do that only when necessary.
>
> Regards.
>
> - Thomas
Justin Clift <aa2@bigpond.net.au> writes:
> It's not "Mandrake" that will be broken.  Mandrake is also often used by
> new Linux users who wouldn't have the slightest idea about setting GCC
> options.  It'll be THEM that have broken installations if we take this
> approach (as an aside, that means that WE will probably also be
> answering more questions about PostgreSQL being broken on Mandrake
> systems).
>
> Isn't it better that PostgreSQL works with what it's got on a system AND
> ALSO that someone notifies the Mandrake people regarding the problem?

Most people will use what the vendor ships - a vendor (like us) looks into the benefits (stability, performance, compatibility) of different packages and makes a selection.  If they've made a choice of which options are used in their distribution, they are obviously fine with the consequences.

--
Trond Eivind Glomsrød
Red Hat, Inc.
Justin Clift <aa2@bigpond.net.au> writes:
>> So, I'm still looking for the best way to add a compile flag while
>> making it clear that it is for one distro only.

Since this is only an RPM problem, it should be solved in the RPM spec file, not by hacking the configure script.  We had at least one similar patch in the 7.0 spec file (for -fsigned-char stupidity in the RPM configuration on LinuxPPC).  That's not needed anymore, but couldn't you fix Mandrake the same way?

regards, tom lane
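A minimal sketch of what that could look like in the spec file's %build section, combining Justin's /etc/mandrake-release test with the last-flag-wins behavior Thomas Swan pointed out; the detection method and flag handling here are assumptions, not the shipped spec:

    %build
    # hypothetical Mandrake-only workaround: append -fno-fast-math so it
    # overrides the -ffast-math that the distro's RPM_OPT_FLAGS carry
    CFLAGS="$RPM_OPT_FLAGS"
    if [ -f /etc/mandrake-release ]; then
        CFLAGS="$CFLAGS -fno-fast-math"
    fi
    export CFLAGS
    ./configure --prefix=/usr    # existing configure options go here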
> You're reading this wrong.  What this means is:
> "If you're working on GCC, do not ever think of enabling -ffast-math
> implicitly by any -Ox level [since most other -fxxx options are grouped
> under some -Ox], since programs that might want optimization could still
> depend on correct IEEE math."
> In particular, Mandrake is not wrong to compile with -O3 and -ffast-math.
> The consequence would only be slightly incorrect math results, and that is
> what indeed happened.

?? I think we agree.  It happens to be the case that slightly incorrect results are wrong results, and that full IEEE math conformance gives exactly correct results.  For the case of date/time, the "slightly wrong" results round up to 60.0 seconds for times on an even minute boundary, which is just plain wrong.

- Thomas
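A crude spot-check for that symptom, if anyone wants to poke at a suspect build (the literal and the database name are only illustrative, and a clean result here proves little -- the horology regression test is the thorough check):

    # crude probe for the ":60 seconds" symptom described above; a correct
    # build shows the seconds field as :00, while an -ffast-math build has
    # been described as printing :60 instead
    psql -c "SELECT '2001-03-20 12:00:00'::timestamp + '34 minutes'::interval;" template1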
Thomas Lockhart writes:
> ?? I think we agree.  It happens to be the case that slightly incorrect
> results are wrong results, and that full IEEE math conformance gives
> exactly correct results.  For the case of date/time, the "slightly wrong"
> results round up to 60.0 seconds for times on an even minute boundary,
> which is just plain wrong.

Well, you're going to have to ask a numerical analyst about this.  If you take that stance then -ffast-math is always wrong, no matter what the combination of other switches.  The "wrong" results might be harder to reproduce without any optimization going on, but they could still happen.

--
Peter Eisentraut   peter_e@gmx.net   http://yi.org/peter-e/
> Well, you're going to have to ask a numerical analyst about this.  If you
> take that stance then -ffast-math is always wrong, no matter what the
> combination of other switches.  The "wrong" results might be harder to
> reproduce without any optimization going on, but they could still happen.

Grumble.  OK, I'll rephrase my statement: it is not "wrong", but "does not produce the *required* result".  The date/time stuff relies on conventional IEEE arithmetic rounding and truncation rules to produce the world-wide, universally accepted conventions for date/time representation.  And will do so *if* the compiler produces math which conforms to IEEE (and many other, in my experience) conventions for arithmetic.

So, if someone actually would want to get date/time results which conform to those conventions, and if they would characterize that conformance as "correct", then they might make the leap of phrase to characterize nonconformance to those conventions as "wrong".

- Thomas (who is just finishing eight days of jury duty ;)