Re: Portal->commandTag as an enum - Mailing list pgsql-hackers
From: John Naylor
Subject: Re: Portal->commandTag as an enum
Msg-id: CACPNZCs1McaYxMrBXC39rMBfFerQYyaDFHL1-H=8JFUnRE7Rhw@mail.gmail.com
In response to: Re: Portal->commandTag as an enum (Mark Dilger <mark.dilger@enterprisedb.com>)
Responses: Re: Portal->commandTag as an enum
List: pgsql-hackers
Hi Mark,

On Wed, Feb 19, 2020 at 10:40 AM Mark Dilger <mark.dilger@enterprisedb.com> wrote:

> This would only make sense to me if the string held in $_ had already been checked for safety, but Catalog.pm does very little to verify that the string is safe to eval. The assumption here, so far as I can infer, is that we don't embed anything dangerous in our .dat files, so this should be ok. That may be true for the moment, but I can imagine a day when we start embedding perl functions as quoted text inside a data file, or shell commands as text which look enough like perl for eval() to be able to execute them. Developers who edit these .dat files and mess up the quoting, and then rerun 'make' to get the new .c and .h files generated, may not like the side effects. Perhaps I'm being overly paranoid....

The use case for that seems slim. However, at a brief glance your module seems more robust in other ways.

> Rather than add more code generation logic based on the design of Catalog.pm, I wrote a perl based data file parser that parses .dat files and returns vivified perl data, as Catalog.pm does, but with much stricter parsing logic to make certain nothing dangerous gets eval()ed. I put the new module in DataFile.pm.
> [...]
> The new parser is more flexible about the structure of the data, which seems good to me for making it easier to add or modify data files in the future. The new parser does not yet have a means of hacking up the data to add autogenerated fields and such that Catalog.pm does, but I think a cleaner break between parsing and autovivifying fields would be good anyway.

Separation of concerns sounds like a good idea, but I've not fully thought it through. For one advantage, I think it might be nicer to have indexing.dat and toasting.dat instead of having to dig the info out of C macros. This was evident while recently experimenting with generating catcache control data.
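To illustrate the difference between eval()ing file contents and strict parsing, here is a minimal sketch in C (not from the patch; the names and the `key => 'value'` line format are assumptions for illustration). A whitelist tokenizer like this can only ever produce inert strings, whereas handing raw file text to an evaluator can execute whatever is embedded in it:

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Accept only lines of the form:  identifier => 'literal',
 * and reject everything else.  Nothing here is interpreted as code,
 * so a mangled or malicious .dat line fails the parse instead of
 * being executed.  Purely illustrative, not the DataFile.pm logic.
 */
static bool
parse_dat_line(const char *line, char *key, size_t keysz,
			   char *val, size_t valsz)
{
	const char *p = line;
	size_t		n = 0;

	while (isspace((unsigned char) *p))
		p++;

	/* key: [A-Za-z_][A-Za-z0-9_]* */
	if (!isalpha((unsigned char) *p) && *p != '_')
		return false;
	while (isalnum((unsigned char) *p) || *p == '_')
	{
		if (n + 1 >= keysz)
			return false;
		key[n++] = *p++;
	}
	key[n] = '\0';

	while (isspace((unsigned char) *p))
		p++;
	if (strncmp(p, "=>", 2) != 0)
		return false;
	p += 2;
	while (isspace((unsigned char) *p))
		p++;

	/* value: single-quoted literal only */
	if (*p++ != '\'')
		return false;
	n = 0;
	while (*p && *p != '\'')
	{
		if (n + 1 >= valsz)
			return false;
		val[n++] = *p++;
	}
	if (*p++ != '\'')
		return false;
	val[n] = '\0';

	while (isspace((unsigned char) *p))
		p++;
	if (*p == ',')
	{
		p++;
		while (isspace((unsigned char) *p))
			p++;
	}
	return *p == '\0';
}
```

The point is only the shape of the guarantee: anything outside the whitelisted grammar is a parse error, never a side effect.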
As for the patch, I have not done a full review, but I have some comments based on a light read-through:

utils/Makefile:

+# location of commandtag.dat
+headerdir = $(top_srcdir)/src/include/utils

This variable name is too generic for what the comment says it is. A better abstraction, if we want one, would be the full path to the commandtag input file. The other script invocations in this Makefile don't do it this way, but that's a separate patch.

+# location to write generated headers
+sourcedir = $(top_srcdir)/src/backend/utils

Calling the output the source is bound to confuse people. The comment implies all generated headers, not just the ones introduced by the patch. I would just output to the current directory (i.e. have an --output option that defaults to an empty string). Also, if we want to output somewhere else, I would imagine it'd be under the top builddir, not srcdir.

+$(PERL) -I $(top_srcdir)/src/include/utils $< --headerdir=$(headerdir) --sourcedir=$(sourcedir) --inputfile=$(headerdir)/commandtag.dat

1. headerdir is entirely unused by the script.
2. We can default to the working directory for the output, as mentioned above.
3. -I $(top_srcdir)/src/include/utils is supposed to point to the dir containing DataFile.pm, but since gencommandtag.pl has "use lib ..." it's probably not needed here. I don't recall why we keep the "-I" elsewhere. (Ditto in Solution.pm.)

I'm thinking it would look something like this:

+$(PERL) $< --inputfile=$(top_srcdir)/src/include/utils/commandtag.dat

--
utils/misc/Makefile

+all: distprep
+
 # Note: guc-file.c is not deleted by 'make clean',
 # since we want to ship it in distribution tarballs.
 clean:
 	@rm -f lex.yy.c
+
+maintainer-clean: clean

Seems non-functional.

--
DataFiles.pm

I haven't studied this in detail, but I'll suggest that if this is meant to have wider application, maybe it should live in src/tools? I'm not familiar with using different IO routines depending on the OS -- what's the benefit of that?
--
gencommandtag.pl

slurp_without_comments() is unused.

sanity_check_data() seems longer than the main body of the script (minus header boilerplate), and I wonder if we can pare it down some. For one, I have difficulty imagining anyone would accidentally type an unprintable or non-ascii character in a command tag and somehow not realize it. For another, duplicating checks that were done earlier seems like a maintenance headache.

dataerror() is defined near the top, but other functions are defined at the bottom.

+# Generate all output internally before outputting anything, to avoid
+# partially overwriting generated files under error conditions

My personal preference is that having this as a design requirement sacrifices readability for unclear gain, especially since a "chunk" also includes things like header boilerplate. That said, the script is also short enough that it doesn't make a huge difference either way.

Speaking of boilerplate, it's better for readability to separate it from actual code such as:

typedef enum CommandTag
{
#define FIRST_COMMANDTAG COMMANDTAG_$sorted[0]->{taglabel})

--
tcop/dest.c

+ * We no longer display LastOid, but to preserve the wire protocol,
+ * we write InvalidOid where the LastOid used to be written.  For
+ * efficiency in the snprintf(), hard-code InvalidOid as zero.

Hmm, is hard-coding zero going to make any difference here?

--
John Naylor
https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
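For readers following along, here is a sketch of the general shape such a generated header and the dest.c change could take. The tag list, the marker macros beyond FIRST_COMMANDTAG, and the helper names are illustrative assumptions, not the patch's actual output:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical excerpt of a generated commandtag.h: one enum member per
 * entry in commandtag.dat, plus first/last markers for iteration and
 * array sizing.  This is a tiny illustrative subset of tags.
 */
typedef enum CommandTag
{
	COMMANDTAG_UNKNOWN,
	COMMANDTAG_DELETE,
	COMMANDTAG_INSERT,
	COMMANDTAG_SELECT,
	COMMANDTAG_UPDATE
} CommandTag;

#define FIRST_COMMANDTAG COMMANDTAG_UNKNOWN
#define LAST_COMMANDTAG  COMMANDTAG_UPDATE

/* Generated companion table mapping enum values to display strings. */
static const char *const tag_names[] = {
	"UNKNOWN", "DELETE", "INSERT", "SELECT", "UPDATE"
};

static const char *
GetCommandTagName(CommandTag tag)		/* name is an assumption */
{
	return tag_names[tag];
}

/*
 * The tcop/dest.c point: the INSERT completion tag on the wire is
 * "INSERT <oid> <rows>".  With LastOid gone, a literal 0 is written
 * where the oid used to go (hard-coded rather than formatted from
 * InvalidOid).  Sketch only; buffer handling simplified.
 */
static void
build_insert_tag(char *buf, size_t bufsz, unsigned long rows)
{
	snprintf(buf, bufsz, "INSERT 0 %lu", rows);
}
```

The marker macros let callers size lookup arrays as LAST_COMMANDTAG + 1 and loop over all tags without knowing the list's contents.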