Re: Portal->commandTag as an enum - Mailing list pgsql-hackers

From:            Mark Dilger
Subject:         Re: Portal->commandTag as an enum
Msg-id:          DD6EF499-E5DF-44A0-8A9B-06B775062076@enterprisedb.com
In response to:  Re: Portal->commandTag as an enum (John Naylor <john.naylor@2ndquadrant.com>)
Responses:       Re: Portal->commandTag as an enum
List:            pgsql-hackers
> On Feb 19, 2020, at 3:31 AM, John Naylor <john.naylor@2ndquadrant.com> wrote:
>
> Hi Mark,

Hi John, thanks for reviewing!

> On Wed, Feb 19, 2020 at 10:40 AM Mark Dilger
> <mark.dilger@enterprisedb.com> wrote:
>> This would only make sense to me if the string held in $_ had already
>> been checked for safety, but Catalog.pm does very little to verify that
>> the string is safe to eval. The assumption here, so far as I can infer,
>> is that we don't embed anything dangerous in our .dat files, so this
>> should be ok. That may be true for the moment, but I can imagine a day
>> when we start embedding perl functions as quoted text inside a data
>> file, or shell commands as text which look enough like perl for eval()
>> to be able to execute them. Developers who edit these .dat files, mess
>> up the quoting, and then rerun 'make' to get the new .c and .h files
>> generated, may not like the side effects. Perhaps I'm being overly
>> paranoid....
>
> The use case for that seems slim. However, at a brief glance your
> module seems more robust in other ways.
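To make the worry concrete: a string eval runs whatever the "data" contains. A contrived fragment, not taken from any real .dat file, illustrates it:

    # Contrived "data" string -- nothing like this exists in the real
    # .dat files -- showing that string eval executes embedded code:
    my $dat = q( { name => 'foo', oid => do { warn "ran arbitrary code"; 123 } } );
    my $parsed = eval $dat;    # emits the warning: eval ran the do-block
    # A stricter parser would reject the do-block rather than execute it.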
>> Rather than add more code generation logic based on the design of
>> Catalog.pm, I wrote a perl-based data file parser that parses .dat
>> files and returns vivified perl data, as Catalog.pm does, but with much
>> stricter parsing logic to make certain nothing dangerous gets eval()ed.
>> I put the new module in DataFile.pm.
>> [...]
>> The new parser is more flexible about the structure of the data, which
>> seems good to me for making it easier to add or modify data files in
>> the future. The new parser does not yet have a means of hacking up the
>> data to add autogenerated fields and such that Catalog.pm does, but I
>> think a cleaner break between parsing and autovivifying fields would be
>> good anyway.
>
> Separation of concerns sounds like a good idea, but I've not fully
> thought it through. For one advantage, I think it might be nicer to
> have indexing.dat and toasting.dat instead of having to dig the info
> out of C macros. This was evident while recently experimenting with
> generating catcache control data.

I guess you mean the macros DECLARE_UNIQUE_INDEX and DECLARE_TOAST. I don't mind converting those to .dat files, though I'm mindful of Tom's concern, expressed early in this thread, about the amount of code churn. Is there sufficient demand for refactoring this stuff? There are more reasons in the conversation below to refactor the perl modules and code generation scripts.

> As for the patch, I have not done a full review, but I have some
> comments based on a light read-through:
>
> utils/Makefile:
>
> +# location of commandtag.dat
> +headerdir = $(top_srcdir)/src/include/utils
>
> This variable name is too generic for what the comment says it is. A
> better abstraction, if we want one, would be the full path to the
> commandtag input file. The other script invocations in this Makefile
> don't do it this way, but that's a separate patch.
>
> +# location to write generated headers
> +sourcedir = $(top_srcdir)/src/backend/utils
>
> Calling the output the source is bound to confuse people. The comment
> implies all generated headers, not just the ones introduced by the
> patch. I would just output to the current directory (i.e. have an
> --output option with a default empty string). Also, if we want to
> output somewhere else, I would imagine it'd be under the top builddir,
> not srcdir.
>
> +$(PERL) -I $(top_srcdir)/src/include/utils $<
> --headerdir=$(headerdir) --sourcedir=$(sourcedir)
> --inputfile=$(headerdir)/commandtag.dat
>
> 1. headerdir is entirely unused by the script
> 2. We can default to working dir for the output as mentioned above
> 3. -I $(top_srcdir)/src/include/utils is supposed to point to the dir
> containing DataFile.pm, but since gencommandtag.pl has "use lib..."
> it's probably not needed here. I don't recall why we keep the "-I"
> elsewhere. (ditto in Solution.pm)
>
> I'm thinking it would look something like this:
>
> +$(PERL) $< --inputfile=$(top_srcdir)/src/include/utils/commandtag.dat \

I have taken all this advice in v5 of the patch. --inputfile and --outputdir (previously named --sourcedir) are now optional, with the defaults as you suggested.

> --
> utils/misc/Makefile
>
> +all: distprep
> +
> # Note: guc-file.c is not deleted by 'make clean',
> # since we want to ship it in distribution tarballs.
> clean:
> @rm -f lex.yy.c
> +
> +maintainer-clean: clean
>
> Seems non-functional.

Yeah, I also had an unnecessary addition to .gitignore in that directory. I had originally placed the commandtag stuff there before moving it one directory up. Thanks for catching that.

> --
> DataFiles.pm
>
> I haven't studied this in detail, but I'll suggest that if this is
> meant to have wider application, maybe it should live in src/tools ?

We don't seem to have a standard place for perl modules. src/test/perl has some that are specifically for tap testing, and src/backend/catalog has some for catalog data file processing. I put DataFile.pm in src/backend/catalog because that's where most data file processing is currently located. src/tools has PerfectHash.pm, and a bunch of Windows-specific modules under src/tools/msvc.

> I'm not familiar with using different IO routines depending on the OS
> -- what's the benefit of that?

I think you are talking about the slurp_file routine. That came directly from the TestLib.pm module. I don't have enough perl-on-windows experience to comment on why it does things that way. I was reluctant to have DataFile.pm 'use TestLib', since DataFile has absolutely nothing to do with regression testing. I don't like copying the function, either, though I chose that as the lesser evil. Which is more evil is debatable.

src/test/perl/ contains SimpleTee.pm and RecursiveCopy.pm, neither of which contains functionality limited to just testing. I think they could be moved to src/tools. src/test/perl/TestLib.pm contains a mixture of testing-specific functions and more general purpose functions. For instance, TestLib.pm contains functions to read in a file or directory (slurp_file(filepath) and slurp_dir(dirpath), respectively). I think we should have just one implementation of those in just one place. Neither TestLib nor DataFile seems appropriate, nor does src/test/perl seem right. I checked whether Perl ships with core module support for this and didn't find anything. There is a cpan module named File::Slurp, but it is not a core module so far as I can tell, and it does more than we want.

Should I submit a separate patch refactoring the location of perl modules and functions of general interest into src/tools? src/tools/perl?

I am not changing DataFile.pm's duplicate copy of slurp_file in v5 of the patch, as I don't yet know the best way to approach the problem. I expect there will have to be a v6 once this has been adequately debated.

> --
> gencommandtag.pl
>
> slurp_without_comments() is unused.

Right. An earlier version of gencommandtag.pl didn't use DataFile.pm, and I neglected to remove this function when I transitioned to using DataFile.pm. Thanks for noticing!
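For reference, the kind of slurp helper under discussion is roughly the following. This is a simplified sketch, not the actual TestLib.pm implementation, which also has the OS-dependent IO handling mentioned above:

    # Simplified sketch of a slurp_file() helper, for illustration only;
    # the real TestLib.pm version differs (notably on Windows).
    sub slurp_file
    {
        my ($filename) = @_;
        open(my $fh, '<', $filename)
          or die "could not open \"$filename\": $!";
        local $/;    # undef the input record separator: slurp mode
        my $contents = <$fh>;
        close($fh);
        return $contents;
    }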
> sanity_check_data() seems longer than the main body of the script
> (minus header boilerplate), and I wonder if we can pare it down some.
> For one, I have difficulty imagining anyone would accidentally type an
> unprintable or non-ascii character in a command tag and somehow not
> realize it.

I'm uncertain about that. There is logic in EndCommand in tcop/dest.c that specifically warns that no encoding conversion will be performed, due to the assumption that command tags contain only 7-bit ascii. I think that's a perfectly reasonable assumption in the C code, but it needs to be checked by gencommandtag.pl, because the bugs that might ensue from inserting an accented character or whatever could be subtle enough not to be caught right away. Such mistakes only get easier as time goes by, as the tendency for editors to change your quotes into "smart quotes" and such gets more common, and potentially as the assumption that PostgreSQL has been internationalized gets more common. Hopefully, we're moving more and more towards supporting non-ascii in more and more places. It might be less obvious to a contributor some years hence that they cannot stick an accented character into a command tag. (Compare, for example, that it used to be widely accepted that you shouldn't stick spaces and hyphens into file names, but now a fair number of programmers will do that without flinching.)

As for checking for unprintable characters, the case is weaker. I'm not too motivated to remove the check, though.

> For another, duplicating checks that were done earlier
> seems like a maintenance headache.

Hmmm. As long as gencommandtag.pl is the only user of DataFile.pm, I'm inclined to agree that we're double-checking some things. The code comments I wrote certainly say so. But if DataFile.pm gets wider adoption, it might start to accept more varied input, and then gencommandtag.pl will need to assert its own set of validation. There is also the distinction between checking that the input data file meets the syntax requirements of the *parser* vs. making certain that the vivified perl structures meet the semantic requirements of the *code generator*. You may at this point be able to assert that meeting the first guarantees meeting the second, but that can't be expected to hold indefinitely.

It would be easier to decide these matters if we knew whether the commandtag logic will ever be removed and whether DataFile will ever gain wider adoption for code generation purposes....

> dataerror() is defined near the top, but other functions are defined
> at the bottom.

Moved.

> +# Generate all output internally before outputting anything, to avoid
> +# partially overwriting generated files under error conditions
>
> My personal preference is, having this as a design requirement
> sacrifices readability for unclear gain, especially since a "chunk"
> also includes things like header boilerplate. That said, the script is
> also short enough that it doesn't make a huge difference either way.

Catalog.pm writes a temporary file and then moves it to the final file name at the end. DataFile buffers the output and only writes it after all the code generation has succeeded. There is no principled basis for these two modules tackling the same problem in two different ways. Perhaps that's another argument for pulling this kind of functionality out of random places and consolidating it in one or more modules in src/tools.
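For comparison, the Catalog.pm approach amounts to the following pattern. This is a sketch of the idea under assumed names, not the actual Catalog.pm code:

    # Sketch of the write-to-temp-then-rename pattern: an error partway
    # through generation never leaves a truncated output file behind.
    # (Illustrative only; not the actual Catalog.pm implementation.)
    sub write_file_atomically
    {
        my ($final_path, $content) = @_;
        my $tmp_path = $final_path . '.tmp' . $$;    # PID-suffixed temp name
        open(my $fh, '>', $tmp_path)
          or die "could not open \"$tmp_path\": $!";
        print $fh $content
          or die "could not write \"$tmp_path\": $!";
        close($fh)
          or die "could not close \"$tmp_path\": $!";
        rename($tmp_path, $final_path)
          or die "could not rename \"$tmp_path\" to \"$final_path\": $!";
    }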
> Speaking of boilerplate, it's better for readability to separate that
> from actual code such as:
>
> typedef enum CommandTag
> {
> #define FIRST_COMMANDTAG COMMANDTAG_$sorted[0]->{taglabel})

Good idea. While I was doing this, I also consolidated the duplicated boilerplate into just one function. I think this function, too, should go in just one perl module somewhere. See boilerplate_header() for details.

> --
> tcop/dest.c
>
> + * We no longer display LastOid, but to preserve the wire protocol,
> + * we write InvalidOid where the LastOid used to be written. For
> + * efficiency in the snprintf(), hard-code InvalidOid as zero.
>
> Hmm, is hard-coding zero going to make any difference here?

Part of the value of refactoring the commandtag logic is to make it easier to remove the whole ugly mess later. Having snprintf write the Oid into the string obfuscates the stupidity of what is really being done here. Putting the zero directly into the format string makes it clearer, to my eyes, that nothing clever is afoot.

I have removed the sentence about efficiency. Thanks for mentioning it.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company