Re: WIP: Generic functions for Node types using generated metadata - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: WIP: Generic functions for Node types using generated metadata
Date
Msg-id alpine.DEB.2.21.1908301414100.28828@lancre
In response to WIP: Generic functions for Node types using generated metadata  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Hello Andres,

Just my 0.02 €:

> There's been a lot of complaints over the years about how annoying it is
> to keep the out/read/copy/equalfuncs.c functions in sync with the actual
> underlying structs.
>
> There've been various calls for automating their generation, but no
> actual patches that I am aware of.

I started something a while back, AFAICR after spending a stupid amount of 
time looking for a stupid missing field copy or some such. I wrote a 
(simple) Perl script deriving all (well, most) node utility functions from 
the header files.

I gave up because the idea did not gather much momentum from committers, 
so I assumed the effort would be rejected in the end. AFAICR the spirit of 
the feedback was something like "node definitions do not change often, we 
can manage them by hand".

> There also recently has been discussion about generating more efficient
> memory layout for node trees that we know are read only (e.g. plan trees
> inside the plancache), and about copying such trees more efficiently
> (e.g. by having one big allocation, and then just adjusting pointers).

If pointers are stored relative to the start of the allocation, they are 
really just offsets, which need no adjusting when the tree is copied.
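
A toy sketch of that idea (Python here, not PostgreSQL code; all names are 
made up for illustration): the tree is flattened into one contiguous array, 
and children are referenced by index relative to the start, so copying the 
whole thing is a single block copy with no pointer fixup.

```python
import copy

def flatten(node, out):
    """Append node to out; children are stored as indexes into out,
    i.e. offsets relative to the start of the flat array."""
    idx = len(out)
    out.append({"tag": node["tag"], "kids": []})
    for child in node.get("kids", []):
        out[idx]["kids"].append(flatten(child, out))
    return idx

tree = {"tag": "Plan", "kids": [{"tag": "Expr"}, {"tag": "Expr"}]}
flat = []
root = flatten(tree, flat)       # flat[0]["kids"] == [1, 2]

dup = copy.deepcopy(flat)        # one "big allocation" copy
assert dup[root]["tag"] == "Plan"  # indexes still valid, no adjustment
```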

> One way to approach this problem would be to parse the type 
> definitions, and directly generate code for the various functions. But 
> that does mean that such a code-generator needs to be expanded for each 
> such function.

No big deal for the effort I made. The issue was more about dealing with 
exceptions (e.g. "we do not serialize this field because it is not used 
for some reason") and understanding some implicit assumptions in the 
struct declarations.

> An alternative approach is to have a parser of the node definitions that
> doesn't generate code directly, but instead generates metadata. And then
> use that metadata to write node aware functions.  This seems more
> promising to me.

Hmmm. The approach we had in an (old) research project was to write the 
metadata first, and derive both the structs and the utility functions from 
it. It is simpler this way because you avoid parsing C, and it can be made 
language agnostic (i.e. serializing the data structure from one language 
and reading it back from another).
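
A minimal sketch of that "metadata first" direction, just to make it 
concrete (names, node, and helper like equal_field are all hypothetical, 
not actual PostgreSQL generator code): node definitions live as plain 
data, and both the struct declaration and a utility function are emitted 
from the same source.

```python
# Node definitions as plain data: (field name, C type) pairs.
NODE_DEFS = {
    "FuncCall": [("funcname", "List*"), ("args", "List*"), ("location", "int")],
}

def gen_struct(name, fields):
    """Emit the C struct declaration from the metadata."""
    body = "".join(f"\t{ftype} {fname};\n" for fname, ftype in fields)
    return f"typedef struct {name}\n{{\n{body}}} {name};\n"

def gen_equal(name, fields):
    """Emit an equality function comparing every listed field
    (equal_field is a placeholder for a per-type comparison)."""
    cmps = " &&\n\t       ".join(f"equal_field(a->{f}, b->{f})"
                                 for f, _ in fields)
    return (f"static bool\n_equal{name}(const {name} *a, const {name} *b)\n"
            f"{{\n\treturn {cmps};\n}}\n")

print(gen_struct("FuncCall", NODE_DEFS["FuncCall"]))
print(gen_equal("FuncCall", NODE_DEFS["FuncCall"]))
```

A missing field then shows up as a diff in every generated function at 
once, rather than as a silent omission in one hand-written copy function.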

> I'm fairly sure this metadata can also be used to write the other
> currently existing node functions.

Beware of strange exceptions…

> With regards to using libclang for the parsing: I chose that because it
> seemed the easiest to experiment with, compared to annotating all the
> structs with enough metadata to be able to easily parse them from a perl
> script.

I did not find this an issue when I tried, because the annotation needed 
is basically the type name of the field.

> The node definitions are after all distributed over quite a few headers.

Yep.

> I think it might even be the correct way forward, over inventing our own
> mini-languages and writing ad-hoc parsers for those. It sure is easier
> to understand plain C code, compared to having to understand various
> embedded mini-languages consisting of macros.

Dunno.

> The obvious drawback is that it'd require more people to install 
> libclang - a significant imposition.

Indeed. A Perl-only dependency would be much simpler than relying on a 
particular library from a particular compiler to compile postgres, 
possibly with an unrelated compiler.

> Alternatively we could annotate the code enough to be able to write our
> own parser, or use some other C parser.

If you can dictate some conventions, e.g. one field per line, simple Perl 
regexps would work well, I think; you would not need a parser per se.
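
To illustrate (in Python here, though the thread talks about Perl; the 
header snippet is shortened and the pattern is a sketch, not a complete 
grammar): with the one-field-per-line convention, a single regexp per line 
recovers (type, name) without a real C parser.

```python
import re

header = """\
typedef struct FuncCall
{
\tNodeTag type;
\tList *funcname;\t/* qualified name of function */
\tint location;\t/* token location, or -1 if unknown */
} FuncCall;
"""

# One field declaration per line: type, optional '*', name, ';'.
FIELD_RE = re.compile(r'^\s*(\w+)\s*(\*?)\s*(\w+);')

fields = []
for line in header.splitlines():
    m = FIELD_RE.match(line)
    if m:
        ctype, star, name = m.groups()
        fields.append((ctype + star, name))

# fields == [("NodeTag", "type"), ("List*", "funcname"), ("int", "location")]
```

Anything that breaks the convention (nested declarations, fields to skip) 
would then need an explicit annotation rather than smarter parsing.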

> I don't really want to invest significantly more time into this without
> first debating the general idea.

That is what I did, and I quit quickly :-)

On the general idea, I'm 100% convinced that stupid utility functions 
should be either generic or generated, not maintained by hand.

-- 
Fabien.
