Thread: split TOAST support out of postgres.h

split TOAST support out of postgres.h

From
Peter Eisentraut
Date:
Most backend code doesn't actually need the variable-length data types 
support (TOAST support) in postgres.h.  So I figured we could try to put 
it into a separate header file.  That makes postgres.h more manageable, 
and it avoids including a bunch of complicated unused stuff everywhere. 
I picked "varatt.h" as the name.  Then we could either

1) Include varatt.h in postgres.h, similar to elog.h and palloc.h.  That 
way we clean up the files a bit but don't change any external interfaces.

2) Just let everyone who needs it include the new file.

3) Compromise: You can avoid most "damage" by having fmgr.h include 
varatt.h.  That satisfies most data types and extension code.  That way, 
there are only a few places that need an explicit include of varatt.h.

I went with the last option in my patch.

Thoughts?
Attachment

Re: split TOAST support out of postgres.h

From
Isaac Morland
Date:
On Wed, 28 Dec 2022 at 08:07, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
Most backend code doesn't actually need the variable-length data types
support (TOAST support) in postgres.h.  So I figured we could try to put
it into a separate header file.  That makes postgres.h more manageable,
and it avoids including a bunch of complicated unused stuff everywhere.
I picked "varatt.h" as the name.  Then we could either
[…] 
I went with the last option in my patch.

Thoughts?

This is a bit of a bikeshed suggestion, but I'm wondering if you considered calling it toast.h? Only because the word is so distinctive within Postgres; everybody knows exactly to what it refers.

I definitely agree with the principle of organizing and splitting up the header files. Personally, I don't mind importing a bunch of headers if I'm using a bunch of subsystems so I would be OK with needing to import this new header if I need it.

Re: split TOAST support out of postgres.h

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> ... Then we could either

> 1) Include varatt.h in postgres.h, similar to elog.h and palloc.h.  That 
> way we clean up the files a bit but don't change any external interfaces.

> 2) Just let everyone who needs it include the new file.

> 3) Compromise: You can avoid most "damage" by having fmgr.h include 
> varatt.h.  That satisfies most data types and extension code.  That way, 
> there are only a few places that need an explicit include of varatt.h.

> I went with the last option in my patch.

I dunno, #3 seems kind of unprincipled.  Also, since fmgr.h is included
so widely, I doubt it is buying very much in terms of reducing header
footprint.  How bad is it to do #2?

            regards, tom lane



Re: split TOAST support out of postgres.h

From
Nikita Malakhov
Date:
Hi,

I've thought about this while working on Pluggable TOAST and 64-bit
TOAST value ID myself. Agree with #2 to seem the best of the above.
There are not so many places where a new header should be included.

On Wed, Dec 28, 2022 at 6:08 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> ... Then we could either

> 1) Include varatt.h in postgres.h, similar to elog.h and palloc.h.  That
> way we clean up the files a bit but don't change any external interfaces.

> 2) Just let everyone who needs it include the new file.

> 3) Compromise: You can avoid most "damage" by having fmgr.h include
> varatt.h.  That satisfies most data types and extension code.  That way,
> there are only a few places that need an explicit include of varatt.h.

> I went with the last option in my patch.

I dunno, #3 seems kind of unprincipled.  Also, since fmgr.h is included
so widely, I doubt it is buying very much in terms of reducing header
footprint.  How bad is it to do #2?

                        regards, tom lane




--
Regards,
Nikita Malakhov
Postgres Professional 

Re: split TOAST support out of postgres.h

From
Andres Freund
Date:
Hi,

On 2022-12-28 09:07:12 -0500, Isaac Morland wrote:
> This is a bit of a bikeshed suggestion, but I'm wondering if you considered
> calling it toast.h? Only because the word is so distinctive within
> Postgres; everybody knows exactly to what it refers.

We have a bunch of toast*.h files already. The new header should pretty much
only contain the types, given how widely the header is going to be
included. So maybe toast_type.h?

Greetings,

Andres Freund



Re: split TOAST support out of postgres.h

From
Matthias van de Meent
Date:
On Thu, 29 Dec 2022 at 18:16, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2022-12-28 09:07:12 -0500, Isaac Morland wrote:
> > This is a bit of a bikeshed suggestion, but I'm wondering if you considered
> > calling it toast.h? Only because the word is so distinctive within
> > Postgres; everybody knows exactly to what it refers.
>
> We have a bunch of toast*.h files already. The new header should pretty much
> only contain the types, given how widely the header is going to be
> included. So maybe toast_type.h?

My 2 cents: I don't think that toast_anything.h is appropriate,
because even though the varatt infrastructure does enable
externally-stored oversized attributes (which is the essence of
TOAST), this is not the only (or primary) use of the type.

Example: Indexes do not (can not?) support toasted values, but
generally do support variable length attributes that would be pulled
in with varatt.h. I don't see why we'd call the headers of
variable-length attributes after one small - but not insignifcant -
use case.

Kind regards,

Matthias van de Meent



Re: split TOAST support out of postgres.h

From
Tom Lane
Date:
Matthias van de Meent <boekewurm+postgres@gmail.com> writes:
> On Thu, 29 Dec 2022 at 18:16, Andres Freund <andres@anarazel.de> wrote:
>> We have a bunch of toast*.h files already. The new header should pretty much
>> only contain the types, given how widely the header is going to be
>> included. So maybe toast_type.h?

> My 2 cents: I don't think that toast_anything.h is appropriate,
> because even though the varatt infrastructure does enable
> externally-stored oversized attributes (which is the essence of
> TOAST), this is not the only (or primary) use of the type.

+1 ... varatt.h sounded fine to me.  I'd suggest varlena.h except
we have one already.

            regards, tom lane



Re: split TOAST support out of postgres.h

From
Peter Eisentraut
Date:
On 28.12.22 16:07, Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
>> ... Then we could either
> 
>> 1) Include varatt.h in postgres.h, similar to elog.h and palloc.h.  That
>> way we clean up the files a bit but don't change any external interfaces.
> 
>> 2) Just let everyone who needs it include the new file.
> 
>> 3) Compromise: You can avoid most "damage" by having fmgr.h include
>> varatt.h.  That satisfies most data types and extension code.  That way,
>> there are only a few places that need an explicit include of varatt.h.
> 
>> I went with the last option in my patch.
> 
> I dunno, #3 seems kind of unprincipled.  Also, since fmgr.h is included
> so widely, I doubt it is buying very much in terms of reducing header
> footprint.  How bad is it to do #2?

See this incremental patch set.

It seems like maybe there is some intermediate abstraction that a lot of 
these places should be using that we haven't thought of yet.
Attachment

Re: split TOAST support out of postgres.h

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> On 28.12.22 16:07, Tom Lane wrote:
>> I dunno, #3 seems kind of unprincipled.  Also, since fmgr.h is included
>> so widely, I doubt it is buying very much in terms of reducing header
>> footprint.  How bad is it to do #2?

> See this incremental patch set.

Wow, 41 files requiring varatt.h is a lot fewer than I would have guessed.
I think that bears out my feeling that fmgr.h wasn't a great location:
I count 117 #includes of that, many of which are in .h files themselves
so that many more .c files would be required to read them.

(You did check that this passes cpluspluscheck/headerscheck, right?)

> It seems like maybe there is some intermediate abstraction that a lot of 
> these places should be using that we haven't thought of yet.

Hmm.  Perhaps, but I think I'm content with this version of the patch.

            regards, tom lane



Re: split TOAST support out of postgres.h

From
Andrew Dunstan
Date:
On 2022-12-30 Fr 11:50, Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
>> On 28.12.22 16:07, Tom Lane wrote:
>>> I dunno, #3 seems kind of unprincipled.  Also, since fmgr.h is included
>>> so widely, I doubt it is buying very much in terms of reducing header
>>> footprint.  How bad is it to do #2?
>> See this incremental patch set.
> Wow, 41 files requiring varatt.h is a lot fewer than I would have guessed.
> I think that bears out my feeling that fmgr.h wasn't a great location:
> I count 117 #includes of that, many of which are in .h files themselves
> so that many more .c files would be required to read them.
>
> (You did check that this passes cpluspluscheck/headerscheck, right?)
>
>> It seems like maybe there is some intermediate abstraction that a lot of 
>> these places should be using that we haven't thought of yet.
> Hmm.  Perhaps, but I think I'm content with this version of the patch.


Looked good to me too.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: split TOAST support out of postgres.h

From
Peter Eisentraut
Date:
On 30.12.22 17:50, Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
>> On 28.12.22 16:07, Tom Lane wrote:
>>> I dunno, #3 seems kind of unprincipled.  Also, since fmgr.h is included
>>> so widely, I doubt it is buying very much in terms of reducing header
>>> footprint.  How bad is it to do #2?
> 
>> See this incremental patch set.
> 
> Wow, 41 files requiring varatt.h is a lot fewer than I would have guessed.
> I think that bears out my feeling that fmgr.h wasn't a great location:
> I count 117 #includes of that, many of which are in .h files themselves
> so that many more .c files would be required to read them.

committed




Re: split TOAST support out of postgres.h

From
Noah Misch
Date:
On Tue, Jan 10, 2023 at 06:07:49AM +0100, Peter Eisentraut wrote:
> On 30.12.22 17:50, Tom Lane wrote:
> >Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> >>On 28.12.22 16:07, Tom Lane wrote:
> >>>I dunno, #3 seems kind of unprincipled.  Also, since fmgr.h is included
> >>>so widely, I doubt it is buying very much in terms of reducing header
> >>>footprint.  How bad is it to do #2?
> >
> >>See this incremental patch set.
> >
> >Wow, 41 files requiring varatt.h is a lot fewer than I would have guessed.
> >I think that bears out my feeling that fmgr.h wasn't a great location:
> >I count 117 #includes of that, many of which are in .h files themselves
> >so that many more .c files would be required to read them.
> 
> committed

SET_VARSIZE alone appears in 74 pgxn distributions, so I predict extension
breakage en masse.  I would revert this.



Re: split TOAST support out of postgres.h

From
Peter Eisentraut
Date:
On 10.01.23 08:39, Noah Misch wrote:
> On Tue, Jan 10, 2023 at 06:07:49AM +0100, Peter Eisentraut wrote:
>> On 30.12.22 17:50, Tom Lane wrote:
>>> Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
>>>> On 28.12.22 16:07, Tom Lane wrote:
>>>>> I dunno, #3 seems kind of unprincipled.  Also, since fmgr.h is included
>>>>> so widely, I doubt it is buying very much in terms of reducing header
>>>>> footprint.  How bad is it to do #2?
>>>
>>>> See this incremental patch set.
>>>
>>> Wow, 41 files requiring varatt.h is a lot fewer than I would have guessed.
>>> I think that bears out my feeling that fmgr.h wasn't a great location:
>>> I count 117 #includes of that, many of which are in .h files themselves
>>> so that many more .c files would be required to read them.
>>
>> committed
> 
> SET_VARSIZE alone appears in 74 pgxn distributions, so I predict extension
> breakage en masse.  I would revert this.

Well, that was sort of my thinking, but people seemed to like this.  I'm 
happy to consider alternatives.




Re: split TOAST support out of postgres.h

From
Robert Haas
Date:
On Tue, Jan 10, 2023 at 3:48 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
> >>> Wow, 41 files requiring varatt.h is a lot fewer than I would have guessed.
> >>> I think that bears out my feeling that fmgr.h wasn't a great location:
> >>> I count 117 #includes of that, many of which are in .h files themselves
> >>> so that many more .c files would be required to read them.
> >>
> >> committed
> >
> > SET_VARSIZE alone appears in 74 pgxn distributions, so I predict extension
> > breakage en masse.  I would revert this.
>
> Well, that was sort of my thinking, but people seemed to like this.  I'm
> happy to consider alternatives.

I don't think that the number of extensions that get broken is really
the right metric. It's not fantastic to break large numbers of
extensions, of course, but if the solution is merely to add an #if
PG_VERSION_NUM >= whatever #include "newstuff" #endif then I don't
think it's really an issue. If an extension doesn't have an author who
can do at least that much updating when a new PG release comes out,
then it's basically unmaintained, and I just don't feel that bad about
breaking unmaintained extensions now and then, even annually.

Of course, if we go and remove something that's very widely used and
for which there's no simple workaround, that sucks. Say, removing
LWLocks entirely. But we don't usually do that sort of thing unless
there's a good reason and significant benefits.

I don't think it would be very nice to do something like this in a
minor release. But in a new major release, I think it's fine. I've
been on the hook to maintain extensions in the face of these kinds of
changes at various times over the years, and it's never taken me much
time.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: split TOAST support out of postgres.h

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jan 10, 2023 at 3:48 AM Peter Eisentraut
> <peter.eisentraut@enterprisedb.com> wrote:
>>> SET_VARSIZE alone appears in 74 pgxn distributions, so I predict extension
>>> breakage en masse.  I would revert this.

>> Well, that was sort of my thinking, but people seemed to like this.  I'm
>> happy to consider alternatives.

> I don't think it would be very nice to do something like this in a
> minor release. But in a new major release, I think it's fine. I've
> been on the hook to maintain extensions in the face of these kinds of
> changes at various times over the years, and it's never taken me much
> time.

Yeah, that was my thinking.  We could never do any header refactoring
at all if the standard is "will some extension author need to add a #if".
In practice, we make bigger adjustments than this all the time,
both in header layout and in individual function APIs.

Now, there is a fair question whether splitting this code out of
postgres.h is worth any trouble at all.  TBH my initial reaction
had been "no".  But once we found that only 40-ish backend files
need to read this new header, I became a "yes" vote because it
seems clear that there will be a total-compilation-time benefit.

            regards, tom lane



Re: split TOAST support out of postgres.h

From
Robert Haas
Date:
On Tue, Jan 10, 2023 at 9:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Now, there is a fair question whether splitting this code out of
> postgres.h is worth any trouble at all.  TBH my initial reaction
> had been "no".  But once we found that only 40-ish backend files
> need to read this new header, I became a "yes" vote because it
> seems clear that there will be a total-compilation-time benefit.

I wasn't totally about this, either, but I think on balance it's
probably a smart thing to do. I always found it kind of weird that we
put that stuff in postgres.h. It seems to privilege the TOAST
mechanism to an undue degree; what's the argument, for example, that
TOAST macros are more generally relevant than CHECK_FOR_INTERRUPTS()
or LWLockAcquire or HeapTuple? It felt to me like we'd just decided
that one subsystem gets to go into the main header file and everybody
else just had to have their own headers.

One thing that's particularly awkward about that is that if you want
to write some front-end code that knows about how varlenas are stored
on disk, it was very awkward with the old structure. You're not
supposed to include "postgres.h" into frontend code, but if the stuff
you need is defined there then what else can you do? So I generally
think that anything that's in postgres.h should have a strong claim to
be not only widely-needed in the backend, but also never needed at all
in any other executable.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: split TOAST support out of postgres.h

From
Noah Misch
Date:
On Tue, Jan 10, 2023 at 12:00:46PM -0500, Robert Haas wrote:
> On Tue, Jan 10, 2023 at 9:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Now, there is a fair question whether splitting this code out of
> > postgres.h is worth any trouble at all.  TBH my initial reaction
> > had been "no".  But once we found that only 40-ish backend files
> > need to read this new header, I became a "yes" vote because it
> > seems clear that there will be a total-compilation-time benefit.

A time claim with no benchmarks is a red flag.  I've chosen to run one:

export CCACHE_DISABLE=1
change=d952373a987bad331c0e499463159dd142ced1ef
for commit in $change $change^; do
    echo === git checkout $commit
    git checkout $commit
    for n in `seq 1 200`; do make -j20 clean; env time make -j20; done
done

Results:

commit                                  median mean count
d952373a987bad331c0e499463159dd142ced1ef 49.35 49.37 200
d952373a987bad331c0e499463159dd142ced1ef^ 49.33 49.36 200

That is to say, the patch made the build a bit slower, not faster.  That's
with GCC 4.8.5 (RHEL 7).  I likely should have interleaved the run types, but
in any case the speed win didn't show up.

> I wasn't totally about this, either, but I think on balance it's
> probably a smart thing to do. I always found it kind of weird that we
> put that stuff in postgres.h. It seems to privilege the TOAST
> mechanism to an undue degree; what's the argument, for example, that
> TOAST macros are more generally relevant than CHECK_FOR_INTERRUPTS()
> or LWLockAcquire or HeapTuple? It felt to me like we'd just decided
> that one subsystem gets to go into the main header file and everybody
> else just had to have their own headers.
> 
> One thing that's particularly awkward about that is that if you want
> to write some front-end code that knows about how varlenas are stored
> on disk, it was very awkward with the old structure. You're not
> supposed to include "postgres.h" into frontend code, but if the stuff
> you need is defined there then what else can you do? So I generally
> think that anything that's in postgres.h should have a strong claim to
> be not only widely-needed in the backend, but also never needed at all
> in any other executable.

If the patch had just made postgres.h include varatt.h, like it does elog.h,
I'd consider that change a nonnegative.  Grouping things is nice, even if it
makes compilation a bit slower.  That also covers your frontend use case.  How
about that?

I agree fixing any one extension is easy enough.  Thinking back to the
htup_details.h refactor, I found the aggregate pain unreasonable relative to
alleged benefits, even though each individual extension wasn't too bad.
SET_VARSIZE is used more (74 pgxn distributions) than htup_details.h (62).



Re: split TOAST support out of postgres.h

From
Robert Haas
Date:
On Wed, Jan 11, 2023 at 1:14 AM Noah Misch <noah@leadboat.com> wrote:
> If the patch had just made postgres.h include varatt.h, like it does elog.h,
> I'd consider that change a nonnegative.  Grouping things is nice, even if it
> makes compilation a bit slower.  That also covers your frontend use case.  How
> about that?

I'm not direly opposed to that, but I'm also unconvinced that having
the varatt.h stuff is important enough relative to other things to
justify giving it a privileged place forever.

> I agree fixing any one extension is easy enough.  Thinking back to the
> htup_details.h refactor, I found the aggregate pain unreasonable relative to
> alleged benefits, even though each individual extension wasn't too bad.
> SET_VARSIZE is used more (74 pgxn distributions) than htup_details.h (62).

What annoyed me about that refactoring is that, in most cases where I
had been including htup.h, I had to change it to include
htup_details.h. In the main source tree, too, we've got 31 places that
include access/htup.h and 241 that include access/htup_details.h. It's
hard to argue that it was worth splitting the file given those
numbers, and in fact I think that change was a mistake. But that
doesn't mean every such change is a mistake.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: split TOAST support out of postgres.h

From
Peter Eisentraut
Date:
On 10.01.23 09:48, Peter Eisentraut wrote:
> On 10.01.23 08:39, Noah Misch wrote:
>> On Tue, Jan 10, 2023 at 06:07:49AM +0100, Peter Eisentraut wrote:
>>> On 30.12.22 17:50, Tom Lane wrote:
>>>> Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
>>>>> On 28.12.22 16:07, Tom Lane wrote:
>>>>>> I dunno, #3 seems kind of unprincipled.  Also, since fmgr.h is 
>>>>>> included
>>>>>> so widely, I doubt it is buying very much in terms of reducing header
>>>>>> footprint.  How bad is it to do #2?
>>>>
>>>>> See this incremental patch set.
>>>>
>>>> Wow, 41 files requiring varatt.h is a lot fewer than I would have 
>>>> guessed.
>>>> I think that bears out my feeling that fmgr.h wasn't a great location:
>>>> I count 117 #includes of that, many of which are in .h files themselves
>>>> so that many more .c files would be required to read them.
>>>
>>> committed
>>
>> SET_VARSIZE alone appears in 74 pgxn distributions, so I predict 
>> extension
>> breakage en masse.  I would revert this.
> 
> Well, that was sort of my thinking, but people seemed to like this.  I'm 
> happy to consider alternatives.

Given the subsequent discussion, I'll keep it as is for now but consider 
it a semi-open item.  It's easy to change.