Home > mailing lists

Re: remaining sql/json patches - Mailing list pgsql-hackers

From	Amit Langote
Subject	Re: remaining sql/json patches
Date	November 24, 2023 09:32:06
Msg-id	CA+HiwqHGOL8a5MjfOd_jbWMontKizqjoqCf5HtqCXhNjnpUktw@mail.gmail.com Whole thread
In response to	Re: remaining sql/json patches (Andres Freund <andres@anarazel.de>)
Responses	Re: remaining sql/json patches
List	pgsql-hackers

Tree view

On Thu, Nov 23, 2023 at 4:38 AM Andres Freund <andres@anarazel.de> wrote:
> On 2023-11-21 12:52:35 +0900, Amit Langote wrote:
> > version     gram.o text bytes  %change  gram.c bytes  %change
> >
> > 9.6         534010             -        2108984       -
> > 10          582554             9.09     2258313       7.08
> > 11          584596             0.35     2313475       2.44
> > 12          590957             1.08     2341564       1.21
> > 13          590381            -0.09     2357327       0.67
> > 14          600707             1.74     2428841       3.03
> > 15          633180             5.40     2495364       2.73
> > 16          653464             3.20     2575269       3.20
> > 17-sqljson  672800             2.95     2709422       3.97
> >
> > So if we put SQL/JSON (including JSON_TABLE()) into 17, we end up with a gram.o 2.95% larger than v16, which
grantedis a somewhat larger bump, though also smaller than with some of recent releases. 
>
> I think it's ok to increase the size if it's necessary increases - but I also
> think we've been a bit careless at times, and that that has made the parser
> slower.  There's probably also some "infrastructure" work we could do combat
> some of the growth too.
>
> I know I triggered the use of the .c bytes and text size, but it'd probably
> more sensible to look at the size of the important tables generated by bison.
> I think the most relevant defines are:
>
> #define YYLAST   117115
> #define YYNTOKENS  521
> #define YYNNTS  707
> #define YYNRULES  3300
> #define YYNSTATES  6255
> #define YYMAXUTOK   758
>
>
> I think a lot of the reason we end up with such a big "state transition" space
> is that a single addition to e.g. col_name_keyword or unreserved_keyword
> increases the state space substantially, because it adds new transitions to so
> many places. We're in quadratic territory, I think.  We might be able to do
> some lexer hackery to avoid that, but not sure.

One thing I noticed when looking at the raw parsing times across
versions is that they improved a bit around v12 and then some in v13:

9.0    0.000060 s
9.6    0.000061 s
10     0.000061 s
11     0.000063 s
12     0.000055 s
13     0.000054 s
15     0.000057 s
16     0.000059 s

I think they might be due to the following commits in v12 and v13 resp.:

commit c64d0cd5ce24a344798534f1bc5827a9199b7a6e
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Wed Jan 9 19:47:38 2019 -0500
    Use perfect hashing, instead of binary search, for keyword lookup.
    ...
    Discussion: https://postgr.es/m/20190103163340.GA15803@britannica.bec.de

commit 7f380c59f800f7e0fb49f45a6ff7787256851a59
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Mon Jan 13 15:04:31 2020 -0500
    Reduce size of backend scanner's tables.
    ...
    Discussion:
https://postgr.es/m/CACPNZCvaoa3EgVWm5yZhcSTX6RAtaLgniCPcBVOCwm8h3xpWkw@mail.gmail.com

I haven't read the whole discussions there to see if the target(s)
included the metrics you've mentioned though, either directly or
indirectly.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

pgsql-hackers by date:

From: "Drouvot, Bertrand"
Date: 24 November 2023, 09:16:41
Subject: Re: Synchronizing slots from primary to standby

From: Amit Kapila
Date: 24 November 2023, 09:45:10
Subject: Re: Synchronizing slots from primary to standby

Re: remaining sql/json patches - Mailing list pgsql-hackers

Previous

Next