Thread: Call for port testing on fmgr changes
I have finished updating about half of Postgres' builtin functions to the new style of fmgr interface. At this point, everything that accepts any pass-by-value datatypes is converted; the remaining work is for functions that use only pass-by-reference datatypes, and therefore receive only pointer arguments. This should already take care of function-call-related portability problems on many platforms. In particular these changes should eliminate the need for Ryan Kirkpatrick's Linux/Alpha patches, and should also solve the known problems on PPC builds with optimization higher than -O0. We might be able to increase the optimization level on other platforms that have had trouble with function-call optimizations, too. I'd like to get the current code tested by some people with Alphas (or other 64-bit platforms), PPCs, and anything else that has had optimization-related problems. You can get the 7.1 development code from our CVS server, or use the current daily-snapshot tarball (see ftp://ftp.postgresql.org/pub/dev/). regards, tom lane
> eliminate the need for Ryan Kirkpatrick's Linux/Alpha patches, and > should also solve the known problems on PPC builds with optimization > higher than -O0. We might be able to increase the optimization level I have change the -O0 to -O2. -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
On Wed, 14 Jun 2000, Tom Lane wrote: > I have finished updating about half of Postgres' builtin functions to > the new style of fmgr interface. At this point, everything that accepts > any pass-by-value datatypes is converted; the remaining work is for > functions that use only pass-by-reference datatypes, and therefore > receive only pointer arguments. Cool! I need to study up a bit on fmgr, but anything that help Linux/Alpha is a good thing. :) > This should already take care of function-call-related portability > problems on many platforms. In particular these changes should > eliminate the need for Ryan Kirkpatrick's Linux/Alpha patches, and Not all of my Linux/Alpha patches are related to fmgr, there are a few related to s_lock and such (which can probably be put into the source tree with #ifdefs). But this should take care of the majority of the Linux/Alpha patch. I will download the snapshot today at work (where I have "real" bandwidth :) and test things out this weekend. I should have a report by Monday. Maybe I will even have a patch that can be safely applied to the development source tree. :) TTYL. --------------------------------------------------------------------------- | "For to me to live is Christ, and to die is gain." | | --- Philippians 1:21 (KJV) | --------------------------------------------------------------------------- | Ryan Kirkpatrick | Boulder, Colorado | http://www.rkirkpat.net/ | ---------------------------------------------------------------------------
Ryan Kirkpatrick <pgsql@rkirkpat.net> writes: > Not all of my Linux/Alpha patches are related to fmgr, there are a > few related to s_lock and such (which can probably be put into the source > tree with #ifdefs). Right-o. Up to now we haven't worried about that, but now that the fundamental problem is fixed (I hope), we can start cleaning up the smaller details so that Linux/Alpha will be supported out-of-the-box. regards, tom lane
On Fri, 16 Jun 2000, Ryan Kirkpatrick wrote: > I will download the snapshot today at work (where I have "real" > bandwidth :) and test things out this weekend. I should have a report by > Monday. Maybe I will even have a patch that can be safely applied to > the development source tree. :) TTYL. Ok, I have tested the new snapshot. I have good news and I have bad news... First of all, to build pgsql correctly from the snapshot tarball (dated Fri, Jun 16th), I had to run a 'make distclean' first. There were some config.status files laying around that were confusing the configure run. Don't know if that is just par for pgsql devel snapshots or if there was mistake somewhere in the building of the snapshot. Once I got past that, I realized that the only non-fmgr related Linux/Alpha patch was a adding a single line to the linux_alpha template file. If I had realized that earlier, I would have submitted that patch long ago. But it is attached now, probably not even work using 'patch' on (i.e. easier to hand code it in), and really should not break any other platforms. :) Now, for the fmgr rewrite testing... I compiled pgsql w/o any patches or source code modifications save for the two I mentioned above (make distclean and added line to linux_alpha template). 'initdb' ran without problems. Regression tests failed on geometry as expected (off by one in nth decimal place), but also failed on timestamp, tinterval, horology, and abstime. These latter failures were characteristic of running release versions of pgsql w/o the Linux/Alpha patches. Years that are several centuries off. :( I tracked the problem down to the following suspect functions. I found these as they were functions listed in the Linux/Alpha patches (as needing Datum datatypes for function params), but did not appear to have been rewritten from the new fmgr (i.e. no PG_FUNCTION_ARGS in declaration). These functions are all in src/backend/utils/adt/nabstime.c. abstime2tm tm2abstime AbsoluteTimeIsBefore AbsoluteTimeIsAfter reltime2tm If these are converted to the new fmgr format, then I think the regression tests mentioned above should pass on Linux/Alpha. I think I will leave the conversion to some one who is more familiar with the new fmgr than me. Once that is done, and the attached patch is applied to the source tree, there is a very good chance that pgsql will FINALLY work out of the box on Linux/Alpha. :) Let me know when the new fmgr is ready to test again, if you need any more information on the above, or if there is anything else I can do help this process. TTYL. --------------------------------------------------------------------------- | "For to me to live is Christ, and to die is gain." | | --- Philippians 1:21 (KJV) | --------------------------------------------------------------------------- | Ryan Kirkpatrick | Boulder, Colorado | http://www.rkirkpat.net/ | ---------------------------------------------------------------------------
Attachment
Applied. > On Fri, 16 Jun 2000, Ryan Kirkpatrick wrote: > > > I will download the snapshot today at work (where I have "real" > > bandwidth :) and test things out this weekend. I should have a report by > > Monday. Maybe I will even have a patch that can be safely applied to > > the development source tree. :) TTYL. > > Ok, I have tested the new snapshot. I have good news and I have > bad news... > First of all, to build pgsql correctly from the snapshot tarball > (dated Fri, Jun 16th), I had to run a 'make distclean' first. There were > some config.status files laying around that were confusing the configure > run. Don't know if that is just par for pgsql devel snapshots or if there > was mistake somewhere in the building of the snapshot. > Once I got past that, I realized that the only non-fmgr related > Linux/Alpha patch was a adding a single line to the linux_alpha template > file. If I had realized that earlier, I would have submitted that patch -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Ryan Kirkpatrick <pgsql@rkirkpat.net> writes: > First of all, to build pgsql correctly from the snapshot tarball > (dated Fri, Jun 16th), I had to run a 'make distclean' first. There were > some config.status files laying around that were confusing the configure > run. That's odd, the snapshot is supposed to have been made from a distcleaned directory tree. Maybe something busted in the snapshot generation script? Marc? > I tracked the problem down to the following suspect functions. I > found these as they were functions listed in the Linux/Alpha patches (as > needing Datum datatypes for function params), but did not appear to have > been rewritten from the new fmgr (i.e. no PG_FUNCTION_ARGS in > declaration). These functions are all in src/backend/utils/adt/nabstime.c. > abstime2tm > tm2abstime > AbsoluteTimeIsBefore > AbsoluteTimeIsAfter > reltime2tm > If these are converted to the new fmgr format, then I think the regression > tests mentioned above should pass on Linux/Alpha. Hmm. I did not touch these because they aren't fmgr-callable (and in fact I feel fairly safe in saying that AbsoluteTimeIsAfter isn't called period, since it's ifdef'd out...). As best I can tell, the other four are only called from places that see valid prototypes for them, so if your compiler is failing to call them correctly then your compiler is broken. I suspect that the real problem is not call sequences, but something else that happens to be affecting these routines (and maybe related code that doesn't get exercised by the regression tests). Maybe something like macro-constants that are declared with not quite the right type (unsigned or not, long or not) and need to be casted to match whatever they're being compared to. We've seen that before. Could you dig a little more and try to identify exactly what's going wrong? Anyway, it sure sounds like we've broken the back of the problems. Couple more bug fixes and we'll be there. That's good news indeed! regards, tom lane
On Mon, 19 Jun 2000, Tom Lane wrote: > > declaration). These functions are all in src/backend/utils/adt/nabstime.c. > > abstime2tm > > tm2abstime > > AbsoluteTimeIsBefore > > AbsoluteTimeIsAfter > > reltime2tm > > Hmm. I did not touch these because they aren't fmgr-callable (and in ... > I suspect that the real problem is not call sequences, but something > else that happens to be affecting these routines (and maybe related code ... > they're being compared to. We've seen that before. > Could you dig a little more and try to identify exactly what's > going wrong? Will do. I ran out of time last weekend to actually test if these functions were the cause of the problem or not. They just looked suspcious given the patches I had. I will did deeper and see what I can find, but it will probably not happen until next weekend. Will post when I have found something. > Anyway, it sure sounds like we've broken the back of the problems. > Couple more bug fixes and we'll be there. That's good news indeed! Yes, very good news indeed! :) --------------------------------------------------------------------------- | "For to me to live is Christ, and to die is gain." | | --- Philippians 1:21 (KJV) | --------------------------------------------------------------------------- | Ryan Kirkpatrick | Boulder, Colorado | http://www.rkirkpat.net/ | ---------------------------------------------------------------------------
> On Mon, 19 Jun 2000, Tom Lane wrote: > > > > declaration). These functions are all in src/backend/utils/adt/nabstime.c. > > > abstime2tm > > > tm2abstime > > > AbsoluteTimeIsBefore > > > AbsoluteTimeIsAfter > > > reltime2tm > > > > Hmm. I did not touch these because they aren't fmgr-callable (and in > ... > > I suspect that the real problem is not call sequences, but something > > else that happens to be affecting these routines (and maybe related code > ... > > they're being compared to. We've seen that before. > > Could you dig a little more and try to identify exactly what's > > going wrong? > > Will do. I ran out of time last weekend to actually test if these > functions were the cause of the problem or not. They just looked suspcious > given the patches I had. I will did deeper and see what I can find, but it > will probably not happen until next weekend. Will post when I have found > something. Tamotsu Nakagawa has posted a fix for this to a local mail list in Japan. Can someone comment on this? According to him, with the patch now only the geometry test fails. void -abstime2tm(AbsoluteTime time, int *tzp, struct tm * tm, char *tzn) +abstime2tm(AbsoluteTime _time, int *tzp, struct tm * tm, char *tzn) { + time_t time = (time_t) _time; #ifdef USE_POSIX_TIME struct tm *tx;
Tatsuo Ishii <t-ishii@sra.co.jp> writes: > Tamotsu Nakagawa has posted a fix for this to a local mail list in > Japan. Can someone comment on this? According to him, with the patch > now only the geometry test fails. > void > -abstime2tm(AbsoluteTime time, int *tzp, struct tm * tm, char *tzn) > +abstime2tm(AbsoluteTime _time, int *tzp, struct tm * tm, char *tzn) > { > + time_t time = (time_t) _time; > #ifdef USE_POSIX_TIME > struct tm *tx; Hmm, that makes all kinds of sense if time_t is not the same size as AbsoluteTime --- which wouldn't surprise me at all on a 64-bit system. time_t *ought* to be 64-bits on such a machine. The casts in that routine, tx = localtime((time_t *) &time); are obviously bogus if so. Can anyone with an Alpha comment? What surprises me more is the implication that this is the only place that makes such a bogus assumption about the size of time_t. I'd have guessed there are more places... regards, tom lane
> Hmm, that makes all kinds of sense if time_t is not the same size as > AbsoluteTime --- which wouldn't surprise me at all on a 64-bit system. > time_t *ought* to be 64-bits on such a machine. The casts in that > routine, > tx = localtime((time_t *) &time); > are obviously bogus if so. Can anyone with an Alpha comment? I haven't had an Alpha for a couple of years, but I *strongly* recall that time_t is 64 bits on that machine. - Thomas
Thomas Lockhart wrote: > > Hmm, that makes all kinds of sense if time_t is not the same size as > > AbsoluteTime --- which wouldn't surprise me at all on a 64-bit system. > > time_t *ought* to be 64-bits on such a machine. The casts in that > > routine, > > tx = localtime((time_t *) &time); > > are obviously bogus if so. Can anyone with an Alpha comment? > > I haven't had an Alpha for a couple of years, but I *strongly* recall > that time_t is 64 bits on that machine. > In <sys/types.h> time_t is defined as an int4, i.e. 4 bytes. To double-check I wrote a program to print sizeof: sizeof(time_t)=4 (DU 4.0F, cc) So I guess it is 32 bits. On the whole they have stuck to traditional sizes for traditional types -- it would just have broken too many programmes otherwise. Of course they are going to have to make time_t 64 bits within the next 30 years .... Adriaan
On Sun, 25 Jun 2000, Tom Lane wrote: > Tatsuo Ishii <t-ishii@sra.co.jp> writes: > > Tamotsu Nakagawa has posted a fix for this to a local mail list in > > Japan. Can someone comment on this? According to him, with the patch > > now only the geometry test fails. > > > void > > -abstime2tm(AbsoluteTime time, int *tzp, struct tm * tm, char *tzn) > > +abstime2tm(AbsoluteTime _time, int *tzp, struct tm * tm, char *tzn) > > { > > + time_t time = (time_t) _time; > > #ifdef USE_POSIX_TIME > > struct tm *tx; > > Hmm, that makes all kinds of sense if time_t is not the same size as > AbsoluteTime --- which wouldn't surprise me at all on a 64-bit system. > time_t *ought* to be 64-bits on such a machine. The casts in that > routine, > tx = localtime((time_t *) &time); > are obviously bogus if so. Can anyone with an Alpha comment? On Linux/Alpha, time_t is indeed 64 bit (i.e. printf("%d\n",sizeof(time_t)) results in an '8'). If AbsoluteTime is only 32 bit, then that would definitely cause a problem. > What surprises me more is the implication that this is the only place > that makes such a bogus assumption about the size of time_t. I'd have > guessed there are more places... I will apply the above patch, check for other instances of time_t size assumptions, and see what the result is this afternoon. TTYL. --------------------------------------------------------------------------- | "For to me to live is Christ, and to die is gain." | | --- Philippians 1:21 (KJV) | --------------------------------------------------------------------------- | Ryan Kirkpatrick | Boulder, Colorado | http://www.rkirkpat.net/ | ---------------------------------------------------------------------------
> > > void > > > -abstime2tm(AbsoluteTime time, int *tzp, struct tm * tm, char *tzn) > > > +abstime2tm(AbsoluteTime _time, int *tzp, struct tm * tm, char *tzn) > > > { > > > + time_t time = (time_t) _time; > > > #ifdef USE_POSIX_TIME > > > struct tm *tx; > > Ok, the above patch does indeed solve the problem. And this appears to be the only place AbsoluteTime needs to be copied to a time_t variable. I can't find any other casts of AbsoluteTime to time_t, and with this patch applied all regression tests pass just fine (save for geometry of course with its standard off by one in nth decimal place difference). Additionally, I do not see how this patch could break other platforms. At worst, it is a minor slow down that might even be optimized out by some compiliers when they see that sizeof(AbsoluteTime) == sizeof(time_t). I will defer to the core developers on how you want to apply this patch to the source tree (i.e. with #ifdef alpha && linux or as above). Though probably best to add a bit of a comment beside it so someone does not remove it later thinking they are "optimizing" the code. :) At this point, Linux/Alpha should actually run out of the box! Let me know when this patch is applied (in what ever form it ends up as) and I will download a new snapshot and test it. TTYL. --------------------------------------------------------------------------- | "For to me to live is Christ, and to die is gain." | | --- Philippians 1:21 (KJV) | --------------------------------------------------------------------------- | Ryan Kirkpatrick | Boulder, Colorado | http://www.rkirkpat.net/ | ---------------------------------------------------------------------------
Tom Lane writes: > What surprises me more is the implication that this is the only place > that makes such a bogus assumption about the size of time_t. I'd have > guessed there are more places... A lot of those were fixed for 7.0. -- Peter Eisentraut Sernanders väg 10:115 peter_e@gmx.net 75262 Uppsala http://yi.org/peter-e/ Sweden
Ryan Kirkpatrick <pgsql@rkirkpat.net> writes: > Ok, the above patch does indeed solve the problem. And this > appears to be the only place AbsoluteTime needs to be copied to a time_t > variable. I can't find any other casts of AbsoluteTime to time_t, Great! > and > with this patch applied all regression tests pass just fine (save for > geometry of course with its standard off by one in nth decimal place > difference). Probably we should write that off as a platform issue and create an Alpha-specific expected-output file for geometry. See the documentation about platform-specific files, and please send along a patch to add one. > Additionally, I do not see how this patch could break other > platforms. At worst, it is a minor slow down that might even be optimized > out by some compiliers when they see that sizeof(AbsoluteTime) == > sizeof(time_t). I will defer to the core developers on how you want to > apply this patch to the source tree (i.e. with #ifdef alpha && linux or as > above). No, we should just apply it as is, no #ifdef. There are going to be more and more platforms with 64-bit time_t. regards, tom lane
On Sun, 25 Jun 2000, Tom Lane wrote: > > with this patch applied all regression tests pass just fine (save for > > geometry of course with its standard off by one in nth decimal place > > difference). > > Probably we should write that off as a platform issue and create an > Alpha-specific expected-output file for geometry. See the documentation > about platform-specific files, and please send along a patch to add one. I remember finding a geometry.out file in the expected directory that did match. Seems to me it was a Solaris/Sun one.... Anyway, I will look into it and generate a patch to rid ourselves of that annoyance. Probably will be next Sunday before it happens. As for the comments about different geometry outputs with different optimization levels, the linux_alpha template is set to use -O2, which seems pretty standard for safe, yet useful optimization. Though I have had reports that newer versions of gcc (from Mandrake 7.1) break pgsql (spinlocks get stuck) with that optimization level. My alpha is still running Debian 2.1 and so my gcc is a little old. Probably will upgrade before the end of the summer and have to deal with that issue then. Yuck. :( Of course, if some one wants to beat me to it, feel free :) > > Additionally, I do not see how this patch could break other > > platforms. At worst, it is a minor slow down that might even be optimized > > out by some compiliers when they see that sizeof(AbsoluteTime) == > > sizeof(time_t). I will defer to the core developers on how you want to > > apply this patch to the source tree (i.e. with #ifdef alpha && linux or as > > above). > > No, we should just apply it as is, no #ifdef. There are going to be > more and more platforms with 64-bit time_t. No problem here with that, go ahead and apply it. --------------------------------------------------------------------------- | "For to me to live is Christ, and to die is gain." | | --- Philippians 1:21 (KJV) | --------------------------------------------------------------------------- | Ryan Kirkpatrick | Boulder, Colorado | http://www.rkirkpat.net/ | ---------------------------------------------------------------------------
Applied as you suggested. > > On Mon, 19 Jun 2000, Tom Lane wrote: > > > > > > declaration). These functions are all in src/backend/utils/adt/nabstime.c. > > > > abstime2tm > > > > tm2abstime > > > > AbsoluteTimeIsBefore > > > > AbsoluteTimeIsAfter > > > > reltime2tm > > > > > > Hmm. I did not touch these because they aren't fmgr-callable (and in > > ... > > > I suspect that the real problem is not call sequences, but something > > > else that happens to be affecting these routines (and maybe related code > > ... > > > they're being compared to. We've seen that before. > > > Could you dig a little more and try to identify exactly what's > > > going wrong? > > > > Will do. I ran out of time last weekend to actually test if these > > functions were the cause of the problem or not. They just looked suspcious > > given the patches I had. I will did deeper and see what I can find, but it > > will probably not happen until next weekend. Will post when I have found > > something. > > Tamotsu Nakagawa has posted a fix for this to a local mail list in > Japan. Can someone comment on this? According to him, with the patch > now only the geometry test fails. > > void > -abstime2tm(AbsoluteTime time, int *tzp, struct tm * tm, char *tzn) > +abstime2tm(AbsoluteTime _time, int *tzp, struct tm * tm, char *tzn) > { > + time_t time = (time_t) _time; > #ifdef USE_POSIX_TIME > struct tm *tx; > > -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
On Sun, 25 Jun 2000, Tom Lane wrote: > Probably we should write that off as a platform issue and create an > Alpha-specific expected-output file for geometry. I noticed the other day that the geometry output differs with the compiler optimization level. That can't be good. -- Peter Eisentraut Sernanders väg 10:115 peter_e@gmx.net 75262 Uppsala http://yi.org/peter-e/ Sweden
Linux/Alpha Regression Test Patch (Was Re: Call for port testing on fmgr changes -- Results! )
From
Ryan Kirkpatrick
Date:
On Sun, 25 Jun 2000, Tom Lane wrote: > > and > > with this patch applied all regression tests pass just fine (save for > > geometry of course with its standard off by one in nth decimal place > > difference). > > Probably we should write that off as a platform issue and create an > Alpha-specific expected-output file for geometry. See the documentation > about platform-specific files, and please send along a patch to add one. The patch is attached, just adds a line to the resultmap file as the geometry-solaris-precision.out file matched the Linux/Alpha output. Also, the geometry-cygwin-precision.out is an exact match to the geometry-solaris-precision.out file if anyone is interested in reducing the number of geometry files. :) Now, all regression tests are passing (snapshot from 2000/7/1), and this is on a build of pgsql straight out of the box. Just untarred, configure (w/linux_alpha template), and a make. So, it looks like we have finally reached out-of-the-box compatiblity for Linux/Alpha. :) Though I have heard that some of the optional interfaces, ODBC in particular, are not compiling on the Linux/Alpha. I guess one's work is never done... TTYL. PS. The 'make runcheck' for the regression tests is quite a nice way of testing a pgsql build without affecting a current pgsql install. One gripe, it calls 'gmake' to do the temporary install, which does not exists, at least under Debian 2.1 Linux, as GNU make is the only make installed. Maybe, test for gmake, if it not found, just use make, or use make outright on linux machines (detected from arch configurations). Just a suggestion. PPS. The snapshots still have a dirty config.cache and config.status file included with them, requiring a make distclean immediately after untar. --------------------------------------------------------------------------- | "For to me to live is Christ, and to die is gain." | | --- Philippians 1:21 (KJV) | --------------------------------------------------------------------------- | Ryan Kirkpatrick | Boulder, Colorado | http://www.rkirkpat.net/ | ---------------------------------------------------------------------------
Attachment
Re: [HACKERS] Linux/Alpha Regression Test Patch (Was Re: Call for port testing on fmgr changes -- Results! )
From
Bruce Momjian
Date:
Applied. > On Sun, 25 Jun 2000, Tom Lane wrote: > > > > and > > > with this patch applied all regression tests pass just fine (save for > > > geometry of course with its standard off by one in nth decimal place > > > difference). > > > > Probably we should write that off as a platform issue and create an > > Alpha-specific expected-output file for geometry. See the documentation > > about platform-specific files, and please send along a patch to add one. > > The patch is attached, just adds a line to the resultmap file as > the geometry-solaris-precision.out file matched the Linux/Alpha output. > Also, the geometry-cygwin-precision.out is an exact match to the > geometry-solaris-precision.out file if anyone is interested in reducing > the number of geometry files. :) -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
On Wed, 5 Jul 2000, Bruce Momjian wrote: > Applied. > > > The patch is attached, just adds a line to the resultmap file as > > the geometry-solaris-precision.out file matched the Linux/Alpha output. > > Also, the geometry-cygwin-precision.out is an exact match to the > > geometry-solaris-precision.out file if anyone is interested in reducing > > the number of geometry files. :) Verified as valid. I downloaded today's snapshot, built it, and then ran the regression checks (make runcheck). I made no modifications to the source tree (i.e. no patches), and all regression tests passed. I think at this point we can declare that pgsql works out of the box on Linux/Alpha. :) I will of course periodically check the snapshots to make sure nothing new is broken on the Linux/Alpha platform. TTYL. --------------------------------------------------------------------------- | "For to me to live is Christ, and to die is gain." | | --- Philippians 1:21 (KJV) | --------------------------------------------------------------------------- | Ryan Kirkpatrick | Boulder, Colorado | http://www.rkirkpat.net/ | ---------------------------------------------------------------------------