Re: remaining sql/json patches - Mailing list pgsql-hackers
| From | Amit Langote |
|---|---|
| Subject | Re: remaining sql/json patches |
| Date | |
| Msg-id | CA+HiwqHGOL8a5MjfOd_jbWMontKizqjoqCf5HtqCXhNjnpUktw@mail.gmail.com Whole thread |
| In response to | Re: remaining sql/json patches (Andres Freund <andres@anarazel.de>) |
| Responses |
Re: remaining sql/json patches
|
| List | pgsql-hackers |
On Thu, Nov 23, 2023 at 4:38 AM Andres Freund <andres@anarazel.de> wrote:
> On 2023-11-21 12:52:35 +0900, Amit Langote wrote:
> > version gram.o text bytes %change gram.c bytes %change
> >
> > 9.6 534010 - 2108984 -
> > 10 582554 9.09 2258313 7.08
> > 11 584596 0.35 2313475 2.44
> > 12 590957 1.08 2341564 1.21
> > 13 590381 -0.09 2357327 0.67
> > 14 600707 1.74 2428841 3.03
> > 15 633180 5.40 2495364 2.73
> > 16 653464 3.20 2575269 3.20
> > 17-sqljson 672800 2.95 2709422 3.97
> >
> > So if we put SQL/JSON (including JSON_TABLE()) into 17, we end up with a gram.o 2.95% larger than v16, which
grantedis a somewhat larger bump, though also smaller than with some of recent releases.
>
> I think it's ok to increase the size if it's necessary increases - but I also
> think we've been a bit careless at times, and that that has made the parser
> slower. There's probably also some "infrastructure" work we could do combat
> some of the growth too.
>
> I know I triggered the use of the .c bytes and text size, but it'd probably
> more sensible to look at the size of the important tables generated by bison.
> I think the most relevant defines are:
>
> #define YYLAST 117115
> #define YYNTOKENS 521
> #define YYNNTS 707
> #define YYNRULES 3300
> #define YYNSTATES 6255
> #define YYMAXUTOK 758
>
>
> I think a lot of the reason we end up with such a big "state transition" space
> is that a single addition to e.g. col_name_keyword or unreserved_keyword
> increases the state space substantially, because it adds new transitions to so
> many places. We're in quadratic territory, I think. We might be able to do
> some lexer hackery to avoid that, but not sure.
One thing I noticed when looking at the raw parsing times across
versions is that they improved a bit around v12 and then some in v13:
9.0 0.000060 s
9.6 0.000061 s
10 0.000061 s
11 0.000063 s
12 0.000055 s
13 0.000054 s
15 0.000057 s
16 0.000059 s
I think they might be due to the following commits in v12 and v13 resp.:
commit c64d0cd5ce24a344798534f1bc5827a9199b7a6e
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Wed Jan 9 19:47:38 2019 -0500
Use perfect hashing, instead of binary search, for keyword lookup.
...
Discussion: https://postgr.es/m/20190103163340.GA15803@britannica.bec.de
commit 7f380c59f800f7e0fb49f45a6ff7787256851a59
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Mon Jan 13 15:04:31 2020 -0500
Reduce size of backend scanner's tables.
...
Discussion:
https://postgr.es/m/CACPNZCvaoa3EgVWm5yZhcSTX6RAtaLgniCPcBVOCwm8h3xpWkw@mail.gmail.com
I haven't read the whole discussions there to see if the target(s)
included the metrics you've mentioned though, either directly or
indirectly.
--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
pgsql-hackers by date: