On Tue, 8 Jun 2021 at 08:28, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
>
> I wrote a script to automatically generate the node support functions
> (copy, equal, out, and read, as well as the node tags enum) from the
> struct definitions.
Thanks for working on this. I agree that it would be nice to see
improvements in this area.
It's been almost 2 years now, but I'm wondering if you saw what
Andres proposed in [1]?  The idea was basically to build a metadata
array describing each node struct so that, instead of having to
output large amounts of .c code to do read/write/copy/equals, we'd
just have small functions that loop over the elements in the array
for the given struct and perform the required operation based on the
field type.
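
To make that a bit more concrete, here's a very rough sketch of the
kind of thing I mean.  The struct layout, field kinds and function
names below are just made up for illustration; they don't
necessarily match what Andres had in his patch:

#include "postgres.h"
#include "nodes/nodes.h"        /* Node, copyObjectImpl() */
#include "utils/palloc.h"       /* pstrdup() */

/* what kind of thing a given field is */
typedef enum NodeFieldKind
{
    NFK_SCALAR,                 /* ints, bools, enums, ... copied byte-wise */
    NFK_STRING,                 /* char * */
    NFK_NODE                    /* Node * child, recurse */
} NodeFieldKind;

/* per-field metadata; the generator emits one array of these per node type */
typedef struct NodeTypeComponents
{
    const char *fieldname;      /* used by the out/read functions */
    NodeFieldKind kind;
    uint16      offset;         /* offsetof() the field in the node struct */
    uint16      size;           /* sizeof() the field */
} NodeTypeComponents;

/* one generic loop replaces all of the per-node-type copy functions */
static void
copy_node_fields(char *dst, const char *src,
                 const NodeTypeComponents *fields, int nfields)
{
    for (int i = 0; i < nfields; i++)
    {
        const NodeTypeComponents *f = &fields[i];
        char       *dfield = dst + f->offset;
        const char *sfield = src + f->offset;

        switch (f->kind)
        {
            case NFK_NODE:
                *(Node **) dfield =
                    (Node *) copyObjectImpl(*(Node *const *) sfield);
                break;
            case NFK_STRING:
                {
                    const char *s = *(const char *const *) sfield;

                    *(char **) dfield = s ? pstrdup(s) : NULL;
                }
                break;
            default:
                memcpy(dfield, sfield, f->size);
                break;
        }
    }
}
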
There were still quite a lot of unsolved problems, for example, how
to determine the length of arrays so that we know how many bytes to
compare in the equal funcs.  I had a quick look at what you've got
and I see you have a solution for that: you look at the last "int"
field before the array and use that.  (I wonder if you'd be better
off using something more along the lines of your pg_node_attr() for
that?)
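
For example, something like the following, where pg_node_attr()
expands to nothing and is only parsed by the generation script.  The
array_size() spelling and the SortSpec node are just made up to show
the idea:

#include "postgres.h"
#include "access/attnum.h"      /* AttrNumber */
#include "nodes/nodes.h"        /* NodeTag */

/* expands to nothing; only the generation script would look at it */
#define pg_node_attr(...)

typedef struct SortSpec         /* made-up node, for illustration only */
{
    NodeTag     type;
    int         numCols;

    /* tie the array lengths to numCols explicitly */
    AttrNumber *sortColIdx pg_node_attr(array_size(numCols));
    Oid        *sortOperators pg_node_attr(array_size(numCols));
} SortSpec;

That way equal/copy/read/write wouldn't need to guess which int
field holds the length, and the annotation would survive the struct
fields being reordered.
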
There are quite a few advantages to having the metadata array rather
than the current approach:

1. We don't need to compile 4 huge .c files and link them into the
postgres binary.  I imagine this will make the binary a decent
amount smaller.

2. We can easily add more operations on nodes, e.g. serializing
nodes for sending plans to parallel workers, or generating a hash
value so that we can store node types in a hash table (a rough
sketch of the latter is below).
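
For #2, each new operation would just be another small loop over the
same metadata.  Here's a rough sketch of the hashing case, reusing
the illustrative NodeTypeComponents and NodeFieldKind from the
earlier sketch; hash_bytes() and hash_combine() are the existing
ones from common/hashfn.h, while hash_node() is a hypothetical
recursive entry point:

#include "common/hashfn.h"      /* hash_bytes(), hash_combine() */

extern uint32 hash_node(const Node *node);      /* hypothetical */

static uint32
hash_node_fields(const char *node, const NodeTypeComponents *fields,
                 int nfields)
{
    uint32      hash = 0;

    for (int i = 0; i < nfields; i++)
    {
        const NodeTypeComponents *f = &fields[i];
        const char *field = node + f->offset;

        if (f->kind == NFK_NODE)
        {
            const Node *child = *(const Node *const *) field;

            hash = hash_combine(hash, child ? hash_node(child) : 0);
        }
        else if (f->kind == NFK_STRING)
        {
            const char *s = *(const char *const *) field;

            hash = hash_combine(hash,
                                s ? hash_bytes((const unsigned char *) s,
                                               (int) strlen(s)) : 0);
        }
        else
            hash = hash_combine(hash,
                                hash_bytes((const unsigned char *) field,
                                           f->size));
    }

    return hash;
}
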
One disadvantage would be what Andres mentioned in [2]: he found
around a 5% performance regression.  However, looking at the
NodeTypeComponents struct in [1], we might be able to speed it up
further by shrinking that struct down a bit and just storing a
uint16 offset into a giant char array which contains all of the
field names.  I imagine those wouldn't take more than 64k.
fieldtype could see a similar change.  That would take the
NodeTypeComponents struct from 26 bytes down to 14 bytes, which
means we could fit about twice as many field metadata entries on a
cache line.
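
To show what I mean by that, roughly (again, I don't know the exact
layout of the struct in Andres' patch; this is just to illustrate
swapping the name pointers for uint16 offsets into one packed blob):

/* all field names concatenated, NUL-separated, emitted by the generator */
static const char node_field_names[] =
    "startup_cost\0total_cost\0plan_rows\0plan_width\0";

/* shrunken variant of the earlier sketch */
typedef struct NodeTypeComponents
{
    uint16      name_off;       /* offset into node_field_names[], was char * */
    uint16      type_off;       /* same treatment for fieldtype */
    uint16      offset;         /* offsetof() the field in the node struct */
    uint16      size;           /* sizeof() the field */
    uint16      kind;           /* NodeFieldKind, as before */
} NodeTypeComponents;

#define NODE_FIELD_NAME(f)      (node_field_names + (f)->name_off)
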
Do you have any thoughts about that approach instead?
David
[1] https://www.postgresql.org/message-id/20190828234136.fk2ndqtld3onfrrp@alap3.anarazel.de
[2] https://www.postgresql.org/message-id/20190920051857.2fhnvhvx4qdddviz@alap3.anarazel.de