Thread: Should we add crc32 in libpgport?
I have been working with xlogdump and noticed that unfortunately it cannot be installed without access to a postgres build directory, which makes the exported functionality in src/include/utils/pg_crc.h useless unless one has access to pg_crc.o -- which would only happen if a build directory is lying around. Yet, pg_crc.h is *installed* in server/utils, at least from my Debian package. Worse yet, those crc implementations are the same but ever-so-slightly different (hopefully in an entirely non-semantically-important way). On more inspection, I also realized that the hstore and ltree contribs *also* have crc32 implementations, dating back to 2006 and 2002, respectively. So I think the following actions might make sense: * stop the distribution of pg_crc.h XOR include the crc tables into libpgport.a necessary to make them work * Utilize the canonical pgport implementation of crc in both contribs I tried my very best (mostly git log and reading the linked discussion in the archives, as well as searching the archives) to find any explanation why this is anything but an oversight that seems to consistently result in less-desirable engineering in anything that is compiled separately from Postgres but intends to link against it when it comes to computing a CRC. Copying CRC32 implementations everywhere is not the worst thing, but I find it inadequately explained why it's necessary for now, at least. -- fdr
On Mon, Jan 16, 2012 at 4:56 PM, Daniel Farina <daniel@heroku.com> wrote: > I have been working with xlogdump and noticed that unfortunately it > cannot be installed without access to a postgres build directory, > which makes the exported functionality in src/include/utils/pg_crc.h > useless unless one has access to pg_crc.o -- which would only happen > if a build directory is lying around. Yet, pg_crc.h is *installed* in > server/utils, at least from my Debian package. Worse yet, those crc > implementations are the same but ever-so-slightly different (hopefully > in an entirely non-semantically-important way). Out-of-order editing snafu. "Worse yet, those crc implementations are the..." should come after the note about there being two additional crc implementations in the postgres contribs. Looking back on it, it's obvious why those contribs had it in the first place: because they started external, and needed CRC, and the most expedient thing to do is include yet another implementation. So arguably this problem has occurred three times: in xlogdump, and in pre-contrib versions of hstore, and ltree. It's just the latter two sort of get a free pass by the virtue of having access to the postgres build directory as contribs in this day and age. -- fdr
On Mon, Jan 16, 2012 at 8:09 PM, Daniel Farina <daniel@heroku.com> wrote: > On Mon, Jan 16, 2012 at 4:56 PM, Daniel Farina <daniel@heroku.com> wrote: >> I have been working with xlogdump and noticed that unfortunately it >> cannot be installed without access to a postgres build directory, >> which makes the exported functionality in src/include/utils/pg_crc.h >> useless unless one has access to pg_crc.o -- which would only happen >> if a build directory is lying around. Yet, pg_crc.h is *installed* in >> server/utils, at least from my Debian package. Worse yet, those crc >> implementations are the same but ever-so-slightly different (hopefully >> in an entirely non-semantically-important way). > > Out-of-order editing snafu. "Worse yet, those crc implementations are > the..." should come after the note about there being two additional > crc implementations in the postgres contribs. > > Looking back on it, it's obvious why those contribs had it in the > first place: because they started external, and needed CRC, and the > most expedient thing to do is include yet another implementation. So > arguably this problem has occurred three times: in xlogdump, and in > pre-contrib versions of hstore, and ltree. It's just the latter two > sort of get a free pass by the virtue of having access to the postgres > build directory as contribs in this day and age. I think you make a compelling case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Jan 16, 2012 at 6:18 PM, Robert Haas <robertmhaas@gmail.com> wrote: > I think you make a compelling case. That's enough for me to just do it. Expect a patch soon. -- fdr
Daniel Farina <daniel@heroku.com> writes: > Copying CRC32 implementations everywhere is not the worst thing, but I > find it inadequately explained why it's necessary for now, at least. Agreed, but I don't care for your proposed solution (put it in libpgport) because that assumes a fact not in evidence, namely that external projects have access to libpgport either. Is it possible to put enough stuff in pg_crc.h so that external code could just include that, perhaps after an extra #define to enable extra code? In the worst case we could just do #ifdef PROVIDE_CRC_IMPLEMENTATION ... current contents of pg_crc.c ...#endif but perhaps there's some intermediate possibility that's less ugly. As for whether we could drop the existing near-duplicate code in contrib/, I think we'd first have to convince ourselves that it was functionally identical, because otherwise replacing those versions would break existing ltree and hstore indexes. regards, tom lane
On Mon, Jan 16, 2012 at 9:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Daniel Farina <daniel@heroku.com> writes: >> Copying CRC32 implementations everywhere is not the worst thing, but I >> find it inadequately explained why it's necessary for now, at least. > > Agreed, but I don't care for your proposed solution (put it in > libpgport) because that assumes a fact not in evidence, namely that > external projects have access to libpgport either. I see. Because of ./configure --disabled-shared is a supported option. > Is it possible to put enough stuff in pg_crc.h so that external code could > just include that, perhaps after an extra #define to enable extra code? Yes. As a nice side effect, we manage to get rid of a self-described ugly hack, involving exposing the function from libpgport, so outside the ugly preprocessor dealing, we do score a victory. Related to that, I have also demoted the symbol from extern to static. There are a couple of build-process special-cases for utilities like pg_controldata and pg_resetxlog that are thankfully able to be removed. In addition, it seemed pretty weird that this wasn't so much a "port" (like stub gettimeofday implementations) but rather a function desired on all platforms -- the degenerate case, where zero platforms have the function already. So a minor plus of anti-awkwardness of calling it a 'port'. > As for whether we could drop the existing near-duplicate code in > contrib/, I think we'd first have to convince ourselves that it was > functionally identical, because otherwise replacing those versions would > break existing ltree and hstore indexes. True. It *is* billed CRC32, so unless there's a bug it *should* be identical -- but if not, a version bump of the extension/type may be necessary (do we even know what to do about that, given pg_upgrade?). I'm not sure what beyond careful inspection (which I haven't done) and testing a small corpus for binary equivalence what is to be done about that to be convincing, though. I'll submit the dedup patch separately, I currently only have ltree done. See the attached patch. It has a detailed cover letter/comment at the top of the file. I have confirmed it applies, builds, and relieves one of my problems in building xlogdump without access to postgres .o files. I think the other is surmountable in that project (sprompt.o, which seems hardly as fundamental). I don't think I've tested the CRC64 path at all, as it is not used anywhere -- it's sort of there just to occupy symbol space, as well as I can tell, per its comments ("reserved"). -- fdr
Attachment
On Tue, Jan 17, 2012 at 2:14 AM, Daniel Farina <daniel@heroku.com> wrote: > See the attached patch. It has a detailed cover letter/comment at the > top of the file. I have amended that description to be more accurate. -- fdr
Attachment
At 2012-01-17 13:43:11 -0800, daniel@heroku.com wrote: > > > See the attached patch. It has a detailed cover letter/comment at > > the top of the file. The patch looks good, but the resetxlog and controldata Makefile hunks didn't apply. I don't know why, but I've attached updated versions of those hunks below, to save someone a moment. Everything else is fine. -- ams
Attachment
Daniel Farina <daniel@heroku.com> writes: > On Tue, Jan 17, 2012 at 2:14 AM, Daniel Farina <daniel@heroku.com> wrote: >> See the attached patch. �It has a detailed cover letter/comment at the >> top of the file. > I have amended that description to be more accurate. I looked at this a bit, and it seems to go considerably further than I had in mind: unless I've missed something, this will instantiate a couple of kilobytes of static data in every .c file that includes pg_crc.h, directly or indirectly. While that might be tolerable in an external project, there are going to be a a LOT of copies of that table in the backend, many of them unused. Did you check to see how much larger the backend executable got? For that matter, aren't there a lot of build warnings about unused static variables? I think we'd be better off to continue to instantiate the table in just one file in the backend. I'm not sure whether there'd be multiple copies in libpq, but that would be appropriate to worry about too. I am not sure if we'd still need the CRCDLLIMPORT hack or not in that scenario. It occurs to me that rather than an #ifdef symbol, it might be appropriate to put the constant table into a separate .h file, say pg_crc_tables.h, so that users would control it by including that file or not rather than an #ifdef symbol. This is pretty trivial but might look a bit nicer. regards, tom lane
On Tue, Jan 31, 2012 at 3:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Daniel Farina <daniel@heroku.com> writes: >> On Tue, Jan 17, 2012 at 2:14 AM, Daniel Farina <daniel@heroku.com> wrote: >>> See the attached patch. It has a detailed cover letter/comment at the >>> top of the file. > >> I have amended that description to be more accurate. > > I looked at this a bit, and it seems to go considerably further than > I had in mind: unless I've missed something, this will instantiate a > couple of kilobytes of static data in every .c file that includes > pg_crc.h, directly or indirectly. While that might be tolerable in an > external project, there are going to be a a LOT of copies of that table > in the backend, many of them unused. Did you check to see how much > larger the backend executable got? For that matter, aren't there a lot > of build warnings about unused static variables? Ah, yes, I think my optimizations were off when building, or something. I didn't get such verbosity at first, and then I remember doing something slightly different and then getting a lot of output. I didn't pay attention to the build size. I will investigate. > It occurs to me that rather than an #ifdef symbol, it might be > appropriate to put the constant table into a separate .h file, > say pg_crc_tables.h, so that users would control it by including > that file or not rather than an #ifdef symbol. This is pretty > trivial but might look a bit nicer. I agree, I was about to say "what about a preprocessor hack..." after the last paragraph, but saw you beat me to the punch. I'll have a look soon. -- fdr
On Fri, Feb 3, 2012 at 7:33 PM, Daniel Farina <daniel@heroku.com> wrote: > Ah, yes, I think my optimizations were off when building, or > something. I didn't get such verbosity at first, and then I remember > doing something slightly different and then getting a lot of output. > I didn't pay attention to the build size. I will investigate. [...] > > I agree, I was about to say "what about a preprocessor hack..." after > the last paragraph, but saw you beat me to the punch. I'll have a look soon. Ping! -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Feb 16, 2012 at 6:09 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Feb 3, 2012 at 7:33 PM, Daniel Farina <daniel@heroku.com> wrote: >> Ah, yes, I think my optimizations were off when building, or >> something. I didn't get such verbosity at first, and then I remember >> doing something slightly different and then getting a lot of output. >> I didn't pay attention to the build size. I will investigate. > [...] >> >> I agree, I was about to say "what about a preprocessor hack..." after >> the last paragraph, but saw you beat me to the punch. I'll have a look soon. > > Ping! Err, yes. Clearly I've managed to not do this, and not see your email until now. Here's what I intend to do: 1) Split the tables into another header file, per Tom's suggestion 2) #include those tables in pgport exactly once. Per Tom's objection that pgport is not always available in distributions, that is not the only way the table will be exposed, but as pgport is definitely built and available when building postgres proper. 3) Third-party projects and contribs should use the header file, and not libpgport It's still a bit awkward in that one is including something that's not really a "port" (except in the degenerate sense, as no system has these tables defined vs, say, gettimeofday, where Windows needs a port), but it's the only thing that I can see that is compiled once and can be linked against repeatedly in postgres's build without having to, say, directly cross-reference the pg_crc.o file (as seen in the two command line utilities that need crc). Thoughts? -- fdr
On Wed, Feb 22, 2012 at 8:17 PM, Daniel Farina <daniel@heroku.com> wrote: > Thoughts? Thinking unnecessary. Motion is progress. Here is a patch that uses this exact plan: pgport for the tables, broken out into a header file that is included in the building of libpgport. I have confirmed by objdump -t that multiple copies of the table are not included in the postgres binary and the bloat has not occurred. The patch has a detailed cover letter, as per the previous submissions. -- fdr
Attachment
Daniel Farina <daniel@heroku.com> writes: > Thinking unnecessary. Motion is progress. Here is a patch that uses > this exact plan: pgport for the tables, broken out into a header file > that is included in the building of libpgport. I have confirmed by > objdump -t that multiple copies of the table are not included in the > postgres binary and the bloat has not occurred. Applied with minor adjustments; mainly, I cleaned up some additional traces of the old way of building pg_resetxlog and pg_controldata. regards, tom lane