Re: benchmarking Flex practices - Mailing list pgsql-hackers

From Tom Lane
Subject Re: benchmarking Flex practices
Date
Msg-id 25775.1564414812@sss.pgh.pa.us
Whole thread Raw
In response to Re: benchmarking Flex practices  (John Naylor <john.naylor@2ndquadrant.com>)
Responses Re: benchmarking Flex practices  (John Naylor <john.naylor@2ndquadrant.com>)
List pgsql-hackers
John Naylor <john.naylor@2ndquadrant.com> writes:
> On Sun, Jul 21, 2019 at 3:14 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So I'm feeling like maybe we should experiment to see what that
>> solution looks like, before we commit to going in this direction.
>> What do you think?

> Given the above wrinkles, I thought it was worth trying. Attached is a
> rough patch (don't mind the #include mess yet :-) ) that works like
> this:

> The lexer returns UCONST from xus and UIDENT from xui. The grammar has
> rules that are effectively:

> SCONST { do nothing}
> | UCONST { esc char is backslash }
> | UCONST UESCAPE SCONST { esc char is from $3 }

> ...where UESCAPE is now an unreserved keyword. To prevent shift-reduce
> conflicts, I added UIDENT to the %nonassoc precedence list to match
> IDENT, and for UESCAPE I added a %left precedence declaration. Maybe
> there's a more principled way. I also added an unsigned char type to
> the %union, but it worked fine on my compiler without it.

I think it might be better to drop the separate "Uescape" production and
just inline that into the calling rules, exactly per your sketch above.
You could avoid duplicating the escape-checking logic by moving that into
the str_udeescape support function.  This would avoid the need for the
"uchr" union variant, but more importantly it seems likely to be more
future-proof: IME, any time you can avoid or postpone shift/reduce
decisions, it's better to do so.

I didn't try, but I think this might allow dropping the %left for
UESCAPE.  That bothers me because I don't understand why it's
needed or what precedence level it ought to have.

> litbuf_udeescape() and check_uescapechar() were moved to gram.y. The
> former had be massaged to give error messages similar to HEAD. They're
> not quite identical, but the position info is preserved. Some of the
> functions I moved around don't seem to have any test coverage, so I
> should eventually do some work in that regard.

I don't terribly like the cross-calls you have between gram.y and scan.l
in this formulation.  If we have to make these functions (hexval() etc)
non-static anyway, maybe we should shove them all into scansup.c?

> -Binary size is very close to v6. That is to say the grammar tables
> grew by about the same amount the scanner table shrank, so the binary
> is still about  200kB smaller than HEAD.

OK.

> -Performance is very close to v6 with the information_schema and
> pgbench-like queries with standard strings, which is to say also very
> close to HEAD. When the latter was changed to use Unicode escapes,
> however, it was about 15% slower than HEAD. That's a big regression
> and I haven't tried to pinpoint why.

I don't quite follow what you changed to produce the slower test case?
But that seems to be something we'd better run to ground before
deciding whether to go this way.

> -The ecpg changes here are only the bare minimum from HEAD to get it
> to compile, since I'm borrowing its additional token names (although
> they mean slightly different things). After a bit of experimentation,
> it's clear there's a bit more work needed to get it functional, and
> it's not easy to debug, so I'm putting that off until we decide
> whether this is the way forward.

On the whole I like this approach, modulo the performance question.
Let's try to work that out before worrying about ecpg.

            regards, tom lane



pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: pg_upgrade fails with non-standard ACL
Next
From: vignesh C
Date:
Subject: Re: Is ParsePrepareRecord dead function