Thread: remove quoting hacks and simplify bootscanner.l
For the bootstrap data conversion, it was desirable for postgres.bki to remain unchanged, so some ugly quoting hacks were added to genbki.pl to match the quoting conventions in the DATA() lines. At this point, it's possible (and worthwhile, I think) to remove those, and along the way simplify the tokenizing rules in bootscanner.l. This will result in some largish changes to postgres.bki, but they're easy to reason about and have no functional consequence. make check passes.

Patch 0001 removes the special-case rule that dashes, negative numbers, and octals remain unquoted, so handling of these cases can now be removed from bootscanner.l as well.
Change in postgres.bki: Dashes and negative numbers will now be quoted.

Patch 0002 removes type- and attribute-specific ad-hoc quoting rules.
Change in postgres.bki: Array-like types in pg_proc that only have one element will no longer be quoted.

Currently, Catalog.pm, genbki.pl, and bootscanner.l all have different ideas on how to parse and format array types. Patch 0003 rips all that out and does it once and for all in Catalog.pm.
Change in postgres.bki: Array types now look like '_foo'.

-John Naylor

John Naylor <jcnaylor@gmail.com> writes:
> For the bootstrap data conversion, it was desirable for postgres.bki
> to remain unchanged, so some ugly quoting hacks were added to
> genbki.pl to match the quoting conventions in the DATA() lines. At
> this point, it's possible (and worthwhile I think) to remove those,
> and along the way simplify the tokenizing rules in bootscanner.l.

Although we're past feature freeze, this all seems like reasonable code cleanup, and probably best to include it now rather than waiting for v12. Any objections?

regards, tom lane

John Naylor <jcnaylor@gmail.com> writes:
> For the bootstrap data conversion, it was desirable for postgres.bki
> to remain unchanged, so some ugly quoting hacks were added to
> genbki.pl to match the quoting conventions in the DATA() lines. At
> this point, it's possible (and worthwhile I think) to remove those,
> and along the way simplify the tokenizing rules in bootscanner.l. This
> will result in some largish changes to postgres.bki, but they're easy
> to reason about and have no functional consequence. Make check passes.

Forgot to follow up to this last night, but I pushed this with a couple of changes:

* I didn't see a reason to remove '-' from the set of "id" characters. That'd force quoting of data fields that are just "-", which there are a lot of, so it would bulk up the .bki file for no gain.

* I didn't like assuming that Perl's \w exactly matches the set of characters in the "id" production, so I changed that to use a regex character class matching bootscanner.l's.

Also I did a bit of additional work to make single and double quotes less magic. It was kind of tempting to rethink how bootscanner.l parses double-quoted fields, but in the end I just left that as-is and made the Perl code cope with it. I think as long as people can write quotes in the .dat files without thinking too hard, nobody will care how weird it looks in the .bki file.

regards, tom lane
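
To illustrate the two points above, here is a rough sketch of the "quote only when the field is not a plain id" rule. It is written in Python rather than the Perl that genbki.pl actually uses, and it is not the committed code. The character set (ASCII letters, digits, underscore, dash) is an assumption about the "id" production, and the names ID_RE, needs_quoting, and format_field are made up for this example. Escaping of embedded quotes is deliberately left out.

```python
import re

# ASSUMPTION: the scanner's "id" token is taken to be ASCII letters, digits,
# underscore, and dash. Check bootscanner.l for the real definition.
ID_RE = re.compile(r"[-A-Za-z0-9_]+")


def needs_quoting(value: str) -> bool:
    """Return True if the field cannot be written as a bare id token."""
    # fullmatch returns None for the empty string and for any value that
    # contains a character outside the explicit class.
    return ID_RE.fullmatch(value) is None


def format_field(value: str) -> str:
    """Wrap the field in double quotes only when needed.

    Hypothetical helper: embedded quotes are NOT handled here.
    """
    return f'"{value}"' if needs_quoting(value) else value


if __name__ == "__main__":
    for sample in ["-", "-1", "_int4", "", "two words"]:
        print(repr(sample), "->", format_field(sample))
```

The explicit class differs from \w in both directions in this sketch: Python's \w does not match "-", and on str patterns it does match non-ASCII letters such as "é". Spelling the class out keeps the generator and the scanner in agreement regardless of how the regex engine defines \w.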