Re: Thoughts on using Text::Template for our autogenerated code? - Mailing list pgsql-hackers

From Corey Huinker
Subject Re: Thoughts on using Text::Template for our autogenerated code?
Date
Msg-id CADkLM=dwGKCgicsyTPBjtgtRMcgqAE+V4zgLXTAVLs8fe1oVJA@mail.gmail.com
In response to Re: Thoughts on using Text::Template for our autogenerated code?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
> Yeah, it's somewhat hard to believe that the cost/benefit ratio would be
> attractive.  But maybe you could mock up some examples of what the input
> could look like, and get people on board (or not) before writing any
> code.


tl;dr - I tried a few things. Nothing persuades me, let alone the community, but perhaps there are some ideas here for the future.

I borrowed Bertrand's ongoing work for waiteventnames.* because that is what got me thinking about this in the first place. I considered a few different templating libraries:

There is no Perl implementation of the Go template library (example of that here: https://blog.gopheracademy.com/advent-2017/using-go-templates/ ) that I could find.

Text::Template has no loop directive of its own (a loop means embedding Perl code in the template), and as such it is no better than here-docs.

Template Toolkit seems to do what we need, but it has a kitchen sink of dependencies that make it an unattractive option, so I didn't even attempt it.

HTML::Template has looping and if/then/else constructs, and it is a single standalone library. It also enforces a "separation of concerns": you pass in parameter names and values, and a loop parameter takes an arrayref of hashrefs that the template engine iterates over. That's where the advantages stop, however. It is fairly verbose, and because it is HTML-centric it isn't very good at controlling whitespace, which leads to piling template directives onto the same line in order to avoid spurious newlines. As such I cannot recommend it.

My ideal template library would have text something like this:

[% loop events %]
[% $enum_value %]
[% if __first__ +%] = [%+ $initial_value %][% endif %]
[% if ! __last__ %],[% endif +%]
[% end loop %]
[% loop xml_blocks indent: relative,spaces,4 %]
<row>
  <SomeElement attrib="[%attrib_val%]">[%element_body%]</SomeElement>
</row>
[% end loop %]

Here [%+ means "leading whitespace matters" and +%] means "trailing whitespace matters".
That pseudocode is a mix of ASP and HTML::Template. The special variables __first__ and __last__ indicate whether the current loop iteration is the first or the last one. You would pass it a data structure like this:

{events: [ { enum_value: "abc", initial_value: "def"}, ... { enum_value: "wuv", initial_value: "xyz" } ],
 xml_blocks: [ {attrib_val: "one", element_body: "two"} ]
 }
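
To make the role of __first__ and __last__ concrete, here is a minimal sketch of the loop such an engine would run over the events list. It is in Python rather than Perl purely to keep it short, and it only approximates the whitespace handling of the pseudocode above. The function name render_enum_body and the sample values are assumptions for illustration, not part of any existing library or of the attached files.

```python
def render_enum_body(events):
    """Render one line per event, mimicking the hypothetical [% loop events %] block."""
    lines = []
    last_index = len(events) - 1
    for i, event in enumerate(events):
        is_first = i == 0           # plays the role of __first__
        is_last = i == last_index   # plays the role of __last__
        line = event["enum_value"]
        if is_first:
            # only the first enum value gets an explicit initializer
            line += " = " + event["initial_value"]
        if not is_last:
            # every value except the last is followed by a comma
            line += ","
        lines.append(line)
    return "\n".join(lines)


# Hypothetical data, shaped like the structure shown above.
events = [
    {"enum_value": "abc", "initial_value": "def"},
    {"enum_value": "wuv", "initial_value": "xyz"},
]
print(render_enum_body(events))
```

The point is only that the template, not the caller, decides where the initializer and the commas go; the caller just hands over a list of records.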

My first pass just converted the printf statements to here-docs, and the results were pretty unsatisfying: it wasn't really possible to "see" the output files take shape.

My next attempt was to take the "separation of concerns" code from the HTML::Template version, constructing the nested data structure of resolved output values, and then iterating over that once per output file. This resulted in something cleaner, partly because we're only writing one file type at a time, partly because the interpolated variables have names much closer to their output meaning.
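
The shape of that approach, roughly: parse the input once into a list of records whose keys are named after what they become in the output, then run one small rendering function per output file. The sketch below is Python only for brevity (the real generator is Perl), and the input format, the key names, the WAIT_EVENT_ prefixing, and the WaitEventExample type name are all hypothetical stand-ins, not what the attached files actually do.

```python
def parse_events(text):
    """Input processing: turn raw lines into output-friendly records.

    The "name description" line format is a hypothetical stand-in.
    """
    events = []
    for raw in text.splitlines():
        raw = raw.strip()
        if not raw or raw.startswith("#"):
            continue  # skip blank lines and comments
        name, description = raw.split(None, 1)
        events.append({
            "enum_value": "WAIT_EVENT_" + name.upper(),
            "display_name": name,
            "description": description,
        })
    return events


def render_header(events):
    """Output processing for one file: a C enum, one value per line."""
    body = ",\n".join("\t" + e["enum_value"] for e in events)
    return "typedef enum\n{\n" + body + "\n} WaitEventExample;\n"


def render_docs(events):
    """Output processing for a second file, driven by the same records."""
    return "".join(
        "{}: {}\n".format(e["display_name"], e["description"]) for e in events
    )


sample = """
# name  description
foo  Waiting for foo.
bar  Waiting for bar.
"""
events = parse_events(sample)
print(render_header(events))
print(render_docs(events))
```

Each rendering function only ever deals with one file type, which is most of where the readability gain came from.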

In doing this, it occurred to me that a lot of this effort goes into getting the generated code to conform to our own style guide, at the cost of the generator code being less readable. What if we wrote the generator and formatted its output in a way that made sense for the generator, and then ran pgindent on the result? That's not the workflow right now, but perhaps it could be.

Conclusions:
- There is no "good enough" template engine that doesn't require big changes in dependencies.
- pgindent will not save you from a run-on sentence, like putting all of a typedef enum's values on one line.
- There is some clarity value in either separating input processing from the output processing, or making the input align more closely with the outputs.
- Fiddling with indentation and spacing detracts from legibility no matter what method is used.
- Here-docs are basically OK, but they necessarily confuse output indentation with code indentation. It is possible to de-indent them with <<~, but that requires Perl 5.26 or later.
- Any of these principles can be applied at any time, with no overhaul required.
 

"sorted-" is the slightly modified version of Bertrand's code.
"eof-as-is-" is a direct conversion of the original but using here-docs.
"heredoc-fone-file-at-a-time-" first generates an output-friendly data structure
"needs-pgindent-" is what is possible if we format for our own readability and make pgindent fix the output, though it was not a perfect output match
Attachment
