Thread: split TOAST support out of postgres.h
Most backend code doesn't actually need the variable-length data types support (TOAST support) in postgres.h. So I figured we could try to put it into a separate header file. That makes postgres.h more manageable, and it avoids including a bunch of complicated unused stuff everywhere. I picked "varatt.h" as the name. Then we could either 1) Include varatt.h in postgres.h, similar to elog.h and palloc.h. That way we clean up the files a bit but don't change any external interfaces. 2) Just let everyone who needs it include the new file. 3) Compromise: You can avoid most "damage" by having fmgr.h include varatt.h. That satisfies most data types and extension code. That way, there are only a few places that need an explicit include of varatt.h. I went with the last option in my patch. Thoughts?
Attachment
On Wed, 28 Dec 2022 at 08:07, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
Most backend code doesn't actually need the variable-length data types
support (TOAST support) in postgres.h. So I figured we could try to put
it into a separate header file. That makes postgres.h more manageable,
and it avoids including a bunch of complicated unused stuff everywhere.
I picked "varatt.h" as the name. Then we could either
[…]
I went with the last option in my patch.
Thoughts?
This is a bit of a bikeshed suggestion, but I'm wondering if you considered calling it toast.h? Only because the word is so distinctive within Postgres; everybody knows exactly to what it refers.
I definitely agree with the principle of organizing and splitting up the header files. Personally, I don't mind importing a bunch of headers if I'm using a bunch of subsystems so I would be OK with needing to import this new header if I need it.
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes: > ... Then we could either > 1) Include varatt.h in postgres.h, similar to elog.h and palloc.h. That > way we clean up the files a bit but don't change any external interfaces. > 2) Just let everyone who needs it include the new file. > 3) Compromise: You can avoid most "damage" by having fmgr.h include > varatt.h. That satisfies most data types and extension code. That way, > there are only a few places that need an explicit include of varatt.h. > I went with the last option in my patch. I dunno, #3 seems kind of unprincipled. Also, since fmgr.h is included so widely, I doubt it is buying very much in terms of reducing header footprint. How bad is it to do #2? regards, tom lane
Hi,
I've thought about this while working on Pluggable TOAST and 64-bit
TOAST value ID myself. Agree with #2 to seem the best of the above.
There are not so many places where a new header should be included.
On Wed, Dec 28, 2022 at 6:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> ... Then we could either
> 1) Include varatt.h in postgres.h, similar to elog.h and palloc.h. That
> way we clean up the files a bit but don't change any external interfaces.
> 2) Just let everyone who needs it include the new file.
> 3) Compromise: You can avoid most "damage" by having fmgr.h include
> varatt.h. That satisfies most data types and extension code. That way,
> there are only a few places that need an explicit include of varatt.h.
> I went with the last option in my patch.
I dunno, #3 seems kind of unprincipled. Also, since fmgr.h is included
so widely, I doubt it is buying very much in terms of reducing header
footprint. How bad is it to do #2?
regards, tom lane
Hi, On 2022-12-28 09:07:12 -0500, Isaac Morland wrote: > This is a bit of a bikeshed suggestion, but I'm wondering if you considered > calling it toast.h? Only because the word is so distinctive within > Postgres; everybody knows exactly to what it refers. We have a bunch of toast*.h files already. The new header should pretty much only contain the types, given how widely the header is going to be included. So maybe toast_type.h? Greetings, Andres Freund
On Thu, 29 Dec 2022 at 18:16, Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2022-12-28 09:07:12 -0500, Isaac Morland wrote: > > This is a bit of a bikeshed suggestion, but I'm wondering if you considered > > calling it toast.h? Only because the word is so distinctive within > > Postgres; everybody knows exactly to what it refers. > > We have a bunch of toast*.h files already. The new header should pretty much > only contain the types, given how widely the header is going to be > included. So maybe toast_type.h? My 2 cents: I don't think that toast_anything.h is appropriate, because even though the varatt infrastructure does enable externally-stored oversized attributes (which is the essence of TOAST), this is not the only (or primary) use of the type. Example: Indexes do not (can not?) support toasted values, but generally do support variable length attributes that would be pulled in with varatt.h. I don't see why we'd call the headers of variable-length attributes after one small - but not insignifcant - use case. Kind regards, Matthias van de Meent
Matthias van de Meent <boekewurm+postgres@gmail.com> writes: > On Thu, 29 Dec 2022 at 18:16, Andres Freund <andres@anarazel.de> wrote: >> We have a bunch of toast*.h files already. The new header should pretty much >> only contain the types, given how widely the header is going to be >> included. So maybe toast_type.h? > My 2 cents: I don't think that toast_anything.h is appropriate, > because even though the varatt infrastructure does enable > externally-stored oversized attributes (which is the essence of > TOAST), this is not the only (or primary) use of the type. +1 ... varatt.h sounded fine to me. I'd suggest varlena.h except we have one already. regards, tom lane
On 28.12.22 16:07, Tom Lane wrote: > Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes: >> ... Then we could either > >> 1) Include varatt.h in postgres.h, similar to elog.h and palloc.h. That >> way we clean up the files a bit but don't change any external interfaces. > >> 2) Just let everyone who needs it include the new file. > >> 3) Compromise: You can avoid most "damage" by having fmgr.h include >> varatt.h. That satisfies most data types and extension code. That way, >> there are only a few places that need an explicit include of varatt.h. > >> I went with the last option in my patch. > > I dunno, #3 seems kind of unprincipled. Also, since fmgr.h is included > so widely, I doubt it is buying very much in terms of reducing header > footprint. How bad is it to do #2? See this incremental patch set. It seems like maybe there is some intermediate abstraction that a lot of these places should be using that we haven't thought of yet.
Attachment
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes: > On 28.12.22 16:07, Tom Lane wrote: >> I dunno, #3 seems kind of unprincipled. Also, since fmgr.h is included >> so widely, I doubt it is buying very much in terms of reducing header >> footprint. How bad is it to do #2? > See this incremental patch set. Wow, 41 files requiring varatt.h is a lot fewer than I would have guessed. I think that bears out my feeling that fmgr.h wasn't a great location: I count 117 #includes of that, many of which are in .h files themselves so that many more .c files would be required to read them. (You did check that this passes cpluspluscheck/headerscheck, right?) > It seems like maybe there is some intermediate abstraction that a lot of > these places should be using that we haven't thought of yet. Hmm. Perhaps, but I think I'm content with this version of the patch. regards, tom lane
On 2022-12-30 Fr 11:50, Tom Lane wrote: > Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes: >> On 28.12.22 16:07, Tom Lane wrote: >>> I dunno, #3 seems kind of unprincipled. Also, since fmgr.h is included >>> so widely, I doubt it is buying very much in terms of reducing header >>> footprint. How bad is it to do #2? >> See this incremental patch set. > Wow, 41 files requiring varatt.h is a lot fewer than I would have guessed. > I think that bears out my feeling that fmgr.h wasn't a great location: > I count 117 #includes of that, many of which are in .h files themselves > so that many more .c files would be required to read them. > > (You did check that this passes cpluspluscheck/headerscheck, right?) > >> It seems like maybe there is some intermediate abstraction that a lot of >> these places should be using that we haven't thought of yet. > Hmm. Perhaps, but I think I'm content with this version of the patch. Looked good to me too. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On 30.12.22 17:50, Tom Lane wrote: > Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes: >> On 28.12.22 16:07, Tom Lane wrote: >>> I dunno, #3 seems kind of unprincipled. Also, since fmgr.h is included >>> so widely, I doubt it is buying very much in terms of reducing header >>> footprint. How bad is it to do #2? > >> See this incremental patch set. > > Wow, 41 files requiring varatt.h is a lot fewer than I would have guessed. > I think that bears out my feeling that fmgr.h wasn't a great location: > I count 117 #includes of that, many of which are in .h files themselves > so that many more .c files would be required to read them. committed
On Tue, Jan 10, 2023 at 06:07:49AM +0100, Peter Eisentraut wrote: > On 30.12.22 17:50, Tom Lane wrote: > >Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes: > >>On 28.12.22 16:07, Tom Lane wrote: > >>>I dunno, #3 seems kind of unprincipled. Also, since fmgr.h is included > >>>so widely, I doubt it is buying very much in terms of reducing header > >>>footprint. How bad is it to do #2? > > > >>See this incremental patch set. > > > >Wow, 41 files requiring varatt.h is a lot fewer than I would have guessed. > >I think that bears out my feeling that fmgr.h wasn't a great location: > >I count 117 #includes of that, many of which are in .h files themselves > >so that many more .c files would be required to read them. > > committed SET_VARSIZE alone appears in 74 pgxn distributions, so I predict extension breakage en masse. I would revert this.
On 10.01.23 08:39, Noah Misch wrote: > On Tue, Jan 10, 2023 at 06:07:49AM +0100, Peter Eisentraut wrote: >> On 30.12.22 17:50, Tom Lane wrote: >>> Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes: >>>> On 28.12.22 16:07, Tom Lane wrote: >>>>> I dunno, #3 seems kind of unprincipled. Also, since fmgr.h is included >>>>> so widely, I doubt it is buying very much in terms of reducing header >>>>> footprint. How bad is it to do #2? >>> >>>> See this incremental patch set. >>> >>> Wow, 41 files requiring varatt.h is a lot fewer than I would have guessed. >>> I think that bears out my feeling that fmgr.h wasn't a great location: >>> I count 117 #includes of that, many of which are in .h files themselves >>> so that many more .c files would be required to read them. >> >> committed > > SET_VARSIZE alone appears in 74 pgxn distributions, so I predict extension > breakage en masse. I would revert this. Well, that was sort of my thinking, but people seemed to like this. I'm happy to consider alternatives.
On Tue, Jan 10, 2023 at 3:48 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote: > >>> Wow, 41 files requiring varatt.h is a lot fewer than I would have guessed. > >>> I think that bears out my feeling that fmgr.h wasn't a great location: > >>> I count 117 #includes of that, many of which are in .h files themselves > >>> so that many more .c files would be required to read them. > >> > >> committed > > > > SET_VARSIZE alone appears in 74 pgxn distributions, so I predict extension > > breakage en masse. I would revert this. > > Well, that was sort of my thinking, but people seemed to like this. I'm > happy to consider alternatives. I don't think that the number of extensions that get broken is really the right metric. It's not fantastic to break large numbers of extensions, of course, but if the solution is merely to add an #if PG_VERSION_NUM >= whatever #include "newstuff" #endif then I don't think it's really an issue. If an extension doesn't have an author who can do at least that much updating when a new PG release comes out, then it's basically unmaintained, and I just don't feel that bad about breaking unmaintained extensions now and then, even annually. Of course, if we go and remove something that's very widely used and for which there's no simple workaround, that sucks. Say, removing LWLocks entirely. But we don't usually do that sort of thing unless there's a good reason and significant benefits. I don't think it would be very nice to do something like this in a minor release. But in a new major release, I think it's fine. I've been on the hook to maintain extensions in the face of these kinds of changes at various times over the years, and it's never taken me much time. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Jan 10, 2023 at 3:48 AM Peter Eisentraut > <peter.eisentraut@enterprisedb.com> wrote: >>> SET_VARSIZE alone appears in 74 pgxn distributions, so I predict extension >>> breakage en masse. I would revert this. >> Well, that was sort of my thinking, but people seemed to like this. I'm >> happy to consider alternatives. > I don't think it would be very nice to do something like this in a > minor release. But in a new major release, I think it's fine. I've > been on the hook to maintain extensions in the face of these kinds of > changes at various times over the years, and it's never taken me much > time. Yeah, that was my thinking. We could never do any header refactoring at all if the standard is "will some extension author need to add a #if". In practice, we make bigger adjustments than this all the time, both in header layout and in individual function APIs. Now, there is a fair question whether splitting this code out of postgres.h is worth any trouble at all. TBH my initial reaction had been "no". But once we found that only 40-ish backend files need to read this new header, I became a "yes" vote because it seems clear that there will be a total-compilation-time benefit. regards, tom lane
On Tue, Jan 10, 2023 at 9:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Now, there is a fair question whether splitting this code out of > postgres.h is worth any trouble at all. TBH my initial reaction > had been "no". But once we found that only 40-ish backend files > need to read this new header, I became a "yes" vote because it > seems clear that there will be a total-compilation-time benefit. I wasn't totally about this, either, but I think on balance it's probably a smart thing to do. I always found it kind of weird that we put that stuff in postgres.h. It seems to privilege the TOAST mechanism to an undue degree; what's the argument, for example, that TOAST macros are more generally relevant than CHECK_FOR_INTERRUPTS() or LWLockAcquire or HeapTuple? It felt to me like we'd just decided that one subsystem gets to go into the main header file and everybody else just had to have their own headers. One thing that's particularly awkward about that is that if you want to write some front-end code that knows about how varlenas are stored on disk, it was very awkward with the old structure. You're not supposed to include "postgres.h" into frontend code, but if the stuff you need is defined there then what else can you do? So I generally think that anything that's in postgres.h should have a strong claim to be not only widely-needed in the backend, but also never needed at all in any other executable. -- Robert Haas EDB: http://www.enterprisedb.com
On Tue, Jan 10, 2023 at 12:00:46PM -0500, Robert Haas wrote: > On Tue, Jan 10, 2023 at 9:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Now, there is a fair question whether splitting this code out of > > postgres.h is worth any trouble at all. TBH my initial reaction > > had been "no". But once we found that only 40-ish backend files > > need to read this new header, I became a "yes" vote because it > > seems clear that there will be a total-compilation-time benefit. A time claim with no benchmarks is a red flag. I've chosen to run one: export CCACHE_DISABLE=1 change=d952373a987bad331c0e499463159dd142ced1ef for commit in $change $change^; do echo === git checkout $commit git checkout $commit for n in `seq 1 200`; do make -j20 clean; env time make -j20; done done Results: commit median mean count d952373a987bad331c0e499463159dd142ced1ef 49.35 49.37 200 d952373a987bad331c0e499463159dd142ced1ef^ 49.33 49.36 200 That is to say, the patch made the build a bit slower, not faster. That's with GCC 4.8.5 (RHEL 7). I likely should have interleaved the run types, but in any case the speed win didn't show up. > I wasn't totally about this, either, but I think on balance it's > probably a smart thing to do. I always found it kind of weird that we > put that stuff in postgres.h. It seems to privilege the TOAST > mechanism to an undue degree; what's the argument, for example, that > TOAST macros are more generally relevant than CHECK_FOR_INTERRUPTS() > or LWLockAcquire or HeapTuple? It felt to me like we'd just decided > that one subsystem gets to go into the main header file and everybody > else just had to have their own headers. > > One thing that's particularly awkward about that is that if you want > to write some front-end code that knows about how varlenas are stored > on disk, it was very awkward with the old structure. You're not > supposed to include "postgres.h" into frontend code, but if the stuff > you need is defined there then what else can you do? So I generally > think that anything that's in postgres.h should have a strong claim to > be not only widely-needed in the backend, but also never needed at all > in any other executable. If the patch had just made postgres.h include varatt.h, like it does elog.h, I'd consider that change a nonnegative. Grouping things is nice, even if it makes compilation a bit slower. That also covers your frontend use case. How about that? I agree fixing any one extension is easy enough. Thinking back to the htup_details.h refactor, I found the aggregate pain unreasonable relative to alleged benefits, even though each individual extension wasn't too bad. SET_VARSIZE is used more (74 pgxn distributions) than htup_details.h (62).
On Wed, Jan 11, 2023 at 1:14 AM Noah Misch <noah@leadboat.com> wrote: > If the patch had just made postgres.h include varatt.h, like it does elog.h, > I'd consider that change a nonnegative. Grouping things is nice, even if it > makes compilation a bit slower. That also covers your frontend use case. How > about that? I'm not direly opposed to that, but I'm also unconvinced that having the varatt.h stuff is important enough relative to other things to justify giving it a privileged place forever. > I agree fixing any one extension is easy enough. Thinking back to the > htup_details.h refactor, I found the aggregate pain unreasonable relative to > alleged benefits, even though each individual extension wasn't too bad. > SET_VARSIZE is used more (74 pgxn distributions) than htup_details.h (62). What annoyed me about that refactoring is that, in most cases where I had been including htup.h, I had to change it to include htup_details.h. In the main source tree, too, we've got 31 places that include access/htup.h and 241 that include access/htup_details.h. It's hard to argue that it was worth splitting the file given those numbers, and in fact I think that change was a mistake. But that doesn't mean every such change is a mistake. -- Robert Haas EDB: http://www.enterprisedb.com
On 10.01.23 09:48, Peter Eisentraut wrote: > On 10.01.23 08:39, Noah Misch wrote: >> On Tue, Jan 10, 2023 at 06:07:49AM +0100, Peter Eisentraut wrote: >>> On 30.12.22 17:50, Tom Lane wrote: >>>> Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes: >>>>> On 28.12.22 16:07, Tom Lane wrote: >>>>>> I dunno, #3 seems kind of unprincipled. Also, since fmgr.h is >>>>>> included >>>>>> so widely, I doubt it is buying very much in terms of reducing header >>>>>> footprint. How bad is it to do #2? >>>> >>>>> See this incremental patch set. >>>> >>>> Wow, 41 files requiring varatt.h is a lot fewer than I would have >>>> guessed. >>>> I think that bears out my feeling that fmgr.h wasn't a great location: >>>> I count 117 #includes of that, many of which are in .h files themselves >>>> so that many more .c files would be required to read them. >>> >>> committed >> >> SET_VARSIZE alone appears in 74 pgxn distributions, so I predict >> extension >> breakage en masse. I would revert this. > > Well, that was sort of my thinking, but people seemed to like this. I'm > happy to consider alternatives. Given the subsequent discussion, I'll keep it as is for now but consider it a semi-open item. It's easy to change.