Thread: remove quoting hacks and simplify bootscanner.l

remove quoting hacks and simplify bootscanner.l

From
John Naylor
Date:
For the bootstrap data conversion, it was desirable for postgres.bki
to remain unchanged, so some ugly quoting hacks were added to
genbki.pl to match the quoting conventions in the DATA() lines. At
this point, it's possible (and worthwhile I think) to remove those,
and along the way simplify the tokenizing rules in bootscanner.l. This
will result in some largish changes to postgres.bki, but they're easy
to reason about and have no functional consequence. Make check passes.

Patch 0001 removes the special case rule that dashes, negative
numbers, and octals remain unquoted, so handling these cases can now
be removed from bootscanner.l as well. Change in postgres.bki: Dashes
and negative numbers will now be quoted.

Patch 0002 removes type- and attribute-specific ad-hoc quoting rules.
Change in postgres.bki: Array-like types in pg_proc that only have one
element will no longer be quoted.

Currently, Catalog.pm, genbki.pl, and bootscanner.l all have different
ideas on how to parse and format array types. Patch 0003 rips all that
out and does it once and for all in Catalog.pm. Change in
postgres.bki: Array types now look like '_foo'.
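
To make the '_foo' convention concrete, here is a minimal sketch in Python
(not the actual Perl in Catalog.pm). It assumes, purely for illustration,
that an array column is declared C-style with brackets on the column name;
normalize_column and the foo/bar names are hypothetical.

```python
import re

# Assumption for illustration: an array column is declared C-style,
# e.g. "foo bar[1]", with the brackets attached to the column name.
_ARRAY_NAME_RE = re.compile(r"^(\w+)\[\d*\]$")


def normalize_column(type_name, column_name):
    """Return (type, name), rewriting array columns to the '_foo' type form."""
    match = _ARRAY_NAME_RE.match(column_name)
    if match:
        # Strip the brackets from the name and mark the type as an array.
        return "_" + type_name, match.group(1)
    return type_name, column_name


print(normalize_column("foo", "bar[1]"))  # ('_foo', 'bar')
print(normalize_column("foo", "bar"))     # ('foo', 'bar')
```

The point of doing this in one place is that every later consumer sees
only the '_foo' spelling and never has to parse brackets itself.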

-John Naylor

Re: remove quoting hacks and simplify bootscanner.l

From
Tom Lane
Date:
John Naylor <jcnaylor@gmail.com> writes:
> For the bootstrap data conversion, it was desirable for postgres.bki
> to remain unchanged, so some ugly quoting hacks were added to
> genbki.pl to match the quoting conventions in the DATA() lines. At
> this point, it's possible (and worthwhile I think) to remove those,
> and along the way simplify the tokenizing rules in bootscanner.l.

Although we're past feature freeze, this all seems like reasonable
code cleanup, and probably best to include it now rather than waiting
for v12.  Any objections?

            regards, tom lane


Re: remove quoting hacks and simplify bootscanner.l

From
Tom Lane
Date:
John Naylor <jcnaylor@gmail.com> writes:
> For the bootstrap data conversion, it was desirable for postgres.bki
> to remain unchanged, so some ugly quoting hacks were added to
> genbki.pl to match the quoting conventions in the DATA() lines. At
> this point, it's possible (and worthwhile I think) to remove those,
> and along the way simplify the tokenizing rules in bootscanner.l. This
> will result in some largish changes to postgres.bki, but they're easy
> to reason about and have no functional consequence. Make check passes.

Forgot to follow up to this last night, but I pushed this with a couple of
changes:

* I didn't see a reason to remove '-' from the set of "id" characters.
That'd force quoting of data fields that are just "-", which there are
a lot of, so it would bulk up the .bki file for no gain.

* I didn't like assuming that Perl's \w exactly matches the set of
characters in the "id" production, so I changed that to use a
regex character class matching bootscanner.l's.
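
The resulting rule can be sketched roughly as follows, in Python rather
than the Perl actually used in genbki.pl. The character class is an
assumption about what the "id" production accepts (letters, digits,
underscore, and '-'); bootscanner.l is the authority. bki_field is a
hypothetical name, and escaping of embedded quotes is left out.

```python
import re

# Assumption: an "id" token is a non-empty run of ASCII letters, digits,
# underscore, and '-'. Check bootscanner.l for the real definition.
_ID_RE = re.compile(r"[-A-Za-z0-9_]+")


def bki_field(value):
    """Emit value bare if it is a single id token, otherwise double-quoted.

    Escaping of quote characters inside value is deliberately omitted.
    """
    if _ID_RE.fullmatch(value):
        return value
    return '"' + value + '"'


print(bki_field("-"))        # -
print(bki_field("-1"))       # -1
print(bki_field("foo bar"))  # "foo bar"
print(bki_field(""))         # ""
```

An explicit class avoids surprises from \w: Python's \w, for instance,
matches non-ASCII letters by default, so a \w-based test would leave such
a value unquoted even though the class above would not accept it.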

Also I did a bit of additional work to make single and double quotes
less magic.  It was kind of tempting to rethink how bootscanner.l
parses double-quoted fields, but in the end I just left that as-is
and made the Perl code cope with it.  I think as long as people can
write quotes in the .dat files without thinking too hard, nobody
will care how weird it looks in the .bki file.

            regards, tom lane