Thread: Should we make Bitmapsets a kind of Node?

Should we make Bitmapsets a kind of Node?

From
Tom Lane
Date:
I got annoyed (not for the first time) by the fact that the
partitioned_rels field of AppendPath and MergeAppendPath is a list of
Relids, i.e., Bitmapsets.  This is problematic because a Bitmapset is
not a type of Node, and thus a List of them is really an invalid data
structure.  The main practical consequence is that pprint() fails to
print these path types accurately, which is an issue for debugging.

We've had some related problems before, so I'm wondering if it's time
to bite the bullet and turn Bitmapsets into legal Nodes.  We'd have
to add a nodetag field to them, which is free on 64-bit machines due
to alignment considerations, but would increase the size of most
Bitmapsets on 32-bit machines.  OTOH, I do not think we're optimizing
for 32-bit machines anymore.

Another issue is that the outfuncs/readfuncs print representation
currently looks like "(b 1 2 ...)" which isn't a normal
representation for a Node.  I'd be inclined to try to preserve that
representation, because I think we'd have to special-case Bitmapsets
anyway given their variable number of unnamed entries.  But I've not
tried to actually code anything for it.

Thoughts?

            regards, tom lane



Re: Should we make Bitmapsets a kind of Node?

From
Peter Geoghegan
Date:
On Fri, Jan 29, 2021 at 12:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I got annoyed (not for the first time) by the fact that the
> partitioned_rels field of AppendPath and MergeAppendPath is a list of
> Relids, i.e., Bitmapsets.  This is problematic because a Bitmapset is
> not a type of Node, and thus a List of them is really an invalid data
> structure.  The main practical consequence is that pprint() fails to
> print these path types accurately, which is an issue for debugging.

So we don't actually require T_List-type Lists to only contain entries
of type Node already? ISTM that T_List-type Lists cannot *mostly* be a
Node that consists of a collection of linked Nodes. It has to be
all-or-nothing. The "Node-ness" of a List should never be situational
or implicit -- allowing that seems like a recipe for disaster. This
kind of "code reuse" is not a virtue at all.

If tightening things up here turns out to be a problem someplace, then
I'm okay with that code using some other solution. That could mean
expanding the definition of a Node in some way that was not originally
considered (when it nevertheless makes sense), or it could mean using
some other data structure instead.

Might be good to Assert() that this rule is followed in certain key
list.c functions.

> We've had some related problems before, so I'm wondering if it's time
> to bite the bullet and turn Bitmapsets into legal Nodes.  We'd have
> to add a nodetag field to them, which is free on 64-bit machines due
> to alignment considerations, but would increase the size of most
> Bitmapsets on 32-bit machines.  OTOH, I do not think we're optimizing
> for 32-bit machines anymore.

+1 from me.

I'm prepared to say that 32-bit performance shouldn't be a concern
these days, except perhaps with really significant regressions. And
even then, only when there is no clear upside. If anybody really does
run Postgres 14 on a 32-bit platform, they should be much more
concerned about bugs that slip in because nobody owns hardware like
that anymore. It's probably much riskier to use 32-bit x86 today than
it is to use (say) POWER8, or some other contemporary minority
platform.

-- 
Peter Geoghegan



Re: Should we make Bitmapsets a kind of Node?

From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes:
> On Fri, Jan 29, 2021 at 12:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I got annoyed (not for the first time) by the fact that the
>> partitioned_rels field of AppendPath and MergeAppendPath is a list of
>> Relids, i.e., Bitmapsets.  This is problematic because a Bitmapset is
>> not a type of Node, and thus a List of them is really an invalid data
>> structure.  The main practical consequence is that pprint() fails to
>> print these path types accurately, which is an issue for debugging.

> So we don't actually require T_List-type Lists to only contain entries
> of type Node already?

I seem to recall that there are some places that use Lists to store
plain "char *" strings (not wrapped in T_String), and there are
definitely places that use lists of non-Node structs.  That's a kluge,
but I don't really object to it in narrowly-scoped data structures.
I think it's a good bit south of acceptable in anything declared in
include/nodes/*.h, though.  Publicly visible Node types ought to be
fully manipulable by the standard backend/nodes/ functions.

> ISTM that T_List-type Lists cannot *mostly* be a
> Node that consists of a collection of linked Nodes. It has to be
> all-or-nothing.

Right.  Any situation where you have a List of things that aren't
Nodes has to be one where you know a-priori that everything in this
List is a $whatever.  If the List is only used within a small bit
of code, that's fine, and adding the overhead to make the contents
be real Nodes wouldn't be worth the trouble.

> It's probably much riskier to use 32-bit x86 today than
> it is to use (say) POWER8, or some other contemporary minority
> platform.

We do still have x86 in the buildfarm, as well as some other
32-bit platforms, so I don't agree that it's that much less
tested than non-mainstream 64-bit platforms.  But I do agree
it's not our main development focus anymore, and shouldn't be.

            regards, tom lane



Re: Should we make Bitmapsets a kind of Node?

From
Peter Geoghegan
Date:
On Fri, Jan 29, 2021 at 4:01 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > It's probably much riskier to use 32-bit x86 today than
> > it is to use (say) POWER8, or some other contemporary minority
> > platform.
>
> We do still have x86 in the buildfarm, as well as some other
> 32-bit platforms, so I don't agree that it's that much less
> tested than non-mainstream 64-bit platforms.  But I do agree
> it's not our main development focus anymore, and shouldn't be.

I was arguing that it's much less tested *in effect*. It seems like
the trend is very much in the direction of less and less ISA level
differentiation.

Consider (just to pick one example) the rationale behind the RISC-V initiative:

https://en.wikipedia.org/wiki/RISC-V#Rationale

In many ways my x86-64 Macbook is closer to the newer M1 Macbook than
it is to some old 32-bit x86 machine. I suspect that this matters. I
am speculating here, of course -- I have to because there is no
guidance to work off of. I don't know anybody that still runs Postgres
(or anything like it) on a 32-bit platform. I think that Michael
Paquier owns a Raspberry Pi zero, but that hardly counts.

-- 
Peter Geoghegan



Re: Should we make Bitmapsets a kind of Node?

From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes:
> On Fri, Jan 29, 2021 at 4:01 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> We do still have x86 in the buildfarm, as well as some other
>> 32-bit platforms, so I don't agree that it's that much less
>> tested than non-mainstream 64-bit platforms.

> I don't know anybody that still runs Postgres
> (or anything like it) on a 32-bit platform. I think that Michael
> Paquier owns a Raspberry Pi zero, but that hardly counts.

Hmph ... three of my five buildfarm animals are 32-bit, plus I
have got 32-bit OSes for my Raspberry Pi ;-).  Admittedly, none
of those represent hardware someone would put a serious database
on today.  But in terms of testing diversity, I think they're
a lot more credible than thirty-one flavors of Linux on x86_64.

            regards, tom lane



Re: Should we make Bitmapsets a kind of Node?

From
Michael Paquier
Date:
On Fri, Jan 29, 2021 at 08:53:49PM -0500, Tom Lane wrote:
> Peter Geoghegan <pg@bowt.ie> writes:
>> I don't know anybody that still runs Postgres
>> (or anything like it) on a 32-bit platform. I think that Michael
>> Paquier owns a Raspberry Pi zero, but that hardly counts.

hamster died a couple of years ago, it was a RPI1 and I have not
bought one after.  RIP to it.  I still have dangomushi, a RPI2, based
on armv7l and that's 32-bit.  Heikki has chipmunk, which is a RPI1
last time we discussed about that.  The good thing about those
machines is that they are low-energy consumers, and silent.  So it is
easy to forget about them and just let them be.

> Hmph ... three of my five buildfarm animals are 32-bit, plus I
> have got 32-bit OSes for my Raspberry Pi ;-).  Admittedly, none
> of those represent hardware someone would put a serious database
> on today.  But in terms of testing diversity, I think they're
> a lot more credible than thirty-one flavors of Linux on x86_64.

Those 32-bit modules are still being sold actively by the RPI
foundation, and used as cheap machines for education purposes, so I
think that it is still useful for Postgres to have active buildfarm
members for 32-bit architectures.
--
Michael

Attachment

Re: Should we make Bitmapsets a kind of Node?

From
Peter Geoghegan
Date:
On Fri, Jan 29, 2021 at 5:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hmph ... three of my five buildfarm animals are 32-bit, plus I
> have got 32-bit OSes for my Raspberry Pi ;-).  Admittedly, none
> of those represent hardware someone would put a serious database
> on today.  But in terms of testing diversity, I think they're
> a lot more credible than thirty-one flavors of Linux on x86_64.

Fair enough.

To be clear I meant testing in the deepest and most general sense --
not simply running the tests. If you happen to be using approximately
the same platform as most Postgres hackers, it's reasonable to expect
to run into fewer bugs tied to portability issues. Regardless of
whether or not the minority platform you were considering has
theoretical testing parity.

Broad trends have made it easier to write portable C code, but that
doesn't apply to 32-bit machines, I imagine. Including even the
extremely low power 32-bit chips that are not yet fully obsolete, like
the Raspberry Pi Zero's chip.

-- 
Peter Geoghegan



Re: Should we make Bitmapsets a kind of Node?

From
Peter Geoghegan
Date:
On Fri, Jan 29, 2021 at 6:33 PM Michael Paquier <michael@paquier.xyz> wrote:
> Those 32-bit modules are still being sold actively by the RPI
> foundation, and used as cheap machines for education purposes, so I
> think that it is still useful for Postgres to have active buildfarm
> members for 32-bit architectures.

But I'm not arguing against that. I'm merely arguing that it is okay
to regress 32-bit platforms (within reason) in order to make them more
like 64-bit platforms. This makes them less prone to subtle
portability bugs that the regression tests won't catch, so even 32-bit
Postgres may well come out ahead, in a certain sense.

-- 
Peter Geoghegan



Re: Should we make Bitmapsets a kind of Node?

From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes:
> Broad trends have made it easier to write portable C code, but that
> doesn't apply to 32-bit machines, I imagine. Including even the
> extremely low power 32-bit chips that are not yet fully obsolete, like
> the Raspberry Pi Zero's chip.

Meh.  To my mind, the most interesting aspects of different hardware
platforms for our purposes are

* alignment sensitivity (particularly, is unaligned access expensive);
* spinlock support, and after that various other atomic instructions;
* endianness

Pointer width is interesting, but really it's a solved problem
compared to these.

            regards, tom lane



Re: Should we make Bitmapsets a kind of Node?

From
Peter Geoghegan
Date:
On Fri, Jan 29, 2021 at 6:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Pointer width is interesting, but really it's a solved problem
> compared to these.

What about USE_FLOAT8_BYVAL?

-- 
Peter Geoghegan



Re: Should we make Bitmapsets a kind of Node?

From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes:
> On Fri, Jan 29, 2021 at 6:44 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Pointer width is interesting, but really it's a solved problem
>> compared to these.

> What about USE_FLOAT8_BYVAL?

That's an annoyance, sure, but I don't recall many recent bugs
related to violations of that coding rule.  As long as you don't
violate the Datum abstraction it's pretty safe.

            regards, tom lane



Re: Should we make Bitmapsets a kind of Node?

From
Tom Lane
Date:
Now that commit f003a7522 did away with the partitioned_rels fields,
my original motivation for doing $SUBJECT is gone.  It might still be
worth doing, but I'm not planning to tackle it right now.

            regards, tom lane



Re: Should we make Bitmapsets a kind of Node?

From
David Rowley
Date:
On Tue, 2 Feb 2021 at 09:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Now that commit f003a7522 did away with the partitioned_rels fields,
> my original motivation for doing $SUBJECT is gone.  It might still be
> worth doing, but I'm not planning to tackle it right now.

I'm not sure if the misuse of Lists to store non-Node types should be
all that surprising. lappend() accepts a void pointer rather than a
Node *. I also didn't catch anything that indicates storing non-Node
types is bad practise.

Maybe it's worth still adding something to some comments in list.c to
try and reduce the chances of someone making this mistake again in the
future?

David



Re: Should we make Bitmapsets a kind of Node?

From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> On Tue, 2 Feb 2021 at 09:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Now that commit f003a7522 did away with the partitioned_rels fields,
>> my original motivation for doing $SUBJECT is gone.  It might still be
>> worth doing, but I'm not planning to tackle it right now.

> I'm not sure if the misuse of Lists to store non-Node types should be
> all that surprising.

Well, as I tried to clarify upthread, it's only a problem if the list
is a subfield of a recognized Node type.  Random private data structures
can and do contain lists of $whatever.  But if you put something in a
Node type then you'd better be prepared to teach backend/nodes/*.c about
it.

            regards, tom lane