Re: benchmarking Flex practices - Mailing list pgsql-hackers

From John Naylor
Subject Re: benchmarking Flex practices
Msg-id CACPNZCuGfVe+Wp+4N7rVfQ4GAYjuCHzSsx8NR8=2n_OTwU9Eow@mail.gmail.com
In response to Re: benchmarking Flex practices  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: benchmarking Flex practices  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tue, Nov 26, 2019 at 5:51 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> [ My apologies for being so slow to get back to this ]

No worries -- it's a nice-to-have, not something our users are excited about.

> It struck me though that there's another solution we haven't discussed,
> and that's to make the token lookahead filter in parser.c do the work
> of converting UIDENT [UESCAPE SCONST] to IDENT, and similarly for the
> string case.

I recently tried again to get gram.y to handle it without precedence
hacks (or at least hacks with less mystery) and came to the conclusion
that maybe it just doesn't belong in the grammar after all. I hadn't
thought of any alternatives, so thanks for working on that!
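
(For anyone following along without the patch in front of them, the shape of
the idea is roughly: the function sitting between the lexer and gram.y peeks
ahead when it sees a UIDENT, consumes an optional UESCAPE SCONST pair, decodes
the escapes itself, and hands the grammar a plain IDENT. Here's a toy,
self-contained sketch of that shape -- every name below is invented for
illustration, it is not the patch code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum toy_token { TOK_EOF, TOK_IDENT, TOK_UIDENT, TOK_UESCAPE, TOK_SCONST };

typedef struct { enum toy_token type; const char *text; } ToyToken;

/* canned token stream standing in for the real lexer */
static const ToyToken stream[] = {
    { TOK_UIDENT, "d!0061ta" },     /* as if from U&"d!0061ta" UESCAPE '!' */
    { TOK_UESCAPE, "" },
    { TOK_SCONST, "!" },
    { TOK_EOF, "" }
};
static int pos = 0;

static ToyToken raw_lex(void) { return stream[pos++]; }
static void push_back(void) { pos--; }

/* decode "<esc>XXXX" sequences; ASCII-only shortcut, no validation */
static char *
decode_escapes(const char *in, char esc)
{
    char   *out = malloc(strlen(in) + 1);
    char   *o = out;

    while (*in)
    {
        if (*in == esc && strlen(in + 1) >= 4)
        {
            char    hex[5];

            memcpy(hex, in + 1, 4);
            hex[4] = '\0';
            *o++ = (char) strtol(hex, NULL, 16);
            in += 5;
        }
        else
            *o++ = *in++;
    }
    *o = '\0';
    return out;
}

/* the filter: what the grammar would call instead of the raw lexer */
static ToyToken
filtered_lex(void)
{
    ToyToken    tok = raw_lex();

    if (tok.type == TOK_UIDENT)
    {
        char        esc = '\\';         /* default escape character */
        ToyToken    next = raw_lex();

        if (next.type == TOK_UESCAPE)
            esc = raw_lex().text[0];    /* real code verifies a 1-char SCONST */
        else
            push_back();                /* no UESCAPE clause; keep it for later */

        tok.type = TOK_IDENT;
        tok.text = decode_escapes(tok.text, esc);
    }
    return tok;
}

int
main(void)
{
    ToyToken    t = filtered_lex();

    printf("type %d, text \"%s\"\n", t.type, t.text);  /* IDENT (1), "data" */
    return 0;
}

The real filter also has to push back the lookahead token's value and location
and handle multi-byte code points and error reporting, but the point stands:
the grammar never sees the three-token form.)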

Something seems not quite right with the error position reporting in v9, though:

 SELECT U&'wrong: +0061' UESCAPE '+';
 ERROR:  invalid Unicode escape character at or near "'+'"
 LINE 1: SELECT U&'wrong: +0061' UESCAPE '+';
-                                        ^
+                               ^

The caret is not pointing to the third token, or the second for that
matter. What worked for me was un-truncating the current token before
calling yylex again. To see if I'm on the right track, I've included
this in the attached, which applies on top of your v9.
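
(To spell out what I mean by "un-truncating": as I understand it, the filter
terminates the current token's text in place in the scan buffer while it works
on it, and if the overwritten byte isn't restored before the next yylex() call,
positions computed from the buffer -- the error cursor included -- come out
wrong. A minimal standalone sketch of the save/restore, with invented names
rather than the actual scanner state:

#include <stdio.h>

typedef struct
{
    char   *scanbuf;        /* whole query text, scanned in place */
    char    hold_char;      /* byte overwritten when the token was terminated */
    size_t  hold_offset;    /* where that byte lives in scanbuf */
} ToyLookahead;

/* NUL-terminate the current token so its text can be used as a C string */
static void
truncate_current_token(ToyLookahead *st, size_t end_offset)
{
    st->hold_char = st->scanbuf[end_offset];
    st->hold_offset = end_offset;
    st->scanbuf[end_offset] = '\0';
}

/* put the held byte back before handing the buffer to the lexer again */
static void
untruncate_current_token(ToyLookahead *st)
{
    st->scanbuf[st->hold_offset] = st->hold_char;
}

int
main(void)
{
    char            query[] = "SELECT U&'wrong: +0061' UESCAPE '+';";
    ToyLookahead    st = { query, 0, 0 };

    truncate_current_token(&st, 23);    /* offset just past the U& literal */
    printf("current token: %s\n", st.scanbuf + 7);

    untruncate_current_token(&st);      /* restore before the next yylex() */
    printf("full text again: %s\n", st.scanbuf);
    return 0;
}

That's the general shape of it; the details in the attached are tied to the
scanner's own state, of course.)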

> Generally, I'm pretty happy with this approach: it touches gram.y
> hardly at all, and it removes just about all of the complexity from
> scan.l.  I'm happier about dropping the support code into parser.c
> than the other choices we've discussed.

Seems like the best of both worlds. If we ever wanted to ditch the
whole token filter and use Bison's %glr mode, we'd have extra work to
do, but there doesn't seem to be a rush to do so anyway.

> There's still undone work here, though:
>
> * I did not touch psql.  Probably your patch is fine for that.
>
> * I did not do more with ecpg than get it to compile, using the
> same hacks as in your v7.  It still fails its regression tests,
> but now the reason is that what we've done in parser/parser.c
> needs to be transposed into the identical functionality in
> ecpg/preproc/parser.c.  Or at least some kind of functionality
> there.  A problem with this approach is that it presumes we can
> reduce a UIDENT sequence to a plain IDENT, but to do so we need
> assumptions about the target encoding, and I'm not sure that
> ecpg should make any such assumptions.  Maybe ecpg should just
> reject all cases that produce non-ASCII identifiers?  (Probably
> it could be made to do something smarter with more work, but
> it's not clear to me that it's worth the trouble.)

Hmm, I thought we only allowed Unicode escapes in the first place if
the server encoding was UTF-8. Or did you mean something else?
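
(If it does come down to rejecting non-ASCII results in ecpg, the check is at
least trivial -- a sketch, with an invented helper name rather than anything
in ecpg, of the restriction you describe:

#include <stdbool.h>

/*
 * Without knowing the target encoding, only ASCII code points can safely
 * be substituted for a Unicode escape; anything above U+007F would have
 * to raise an error instead.
 */
static bool
escape_is_safe_without_encoding(long codepoint)
{
    return codepoint >= 0 && codepoint <= 0x7F;
}

)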

> If this seems like a reasonable approach to you, please fill in
> the missing psql and ecpg bits.

Will do.

-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

