Re: Proposal to use JSON for Postgres Parser format - Mailing list pgsql-hackers

From Matthias van de Meent
Subject Re: Proposal to use JSON for Postgres Parser format
Date
Msg-id CAEze2WgeznFH0yQVcLa+-oV6Uiq4MMGsrZ6vmr-Q0nyZg3dn-w@mail.gmail.com
Whole thread Raw
In response to Re: Proposal to use JSON for Postgres Parser format  (Michel Pelletier <pelletier.michel@gmail.com>)
List pgsql-hackers
On Mon, 31 Oct 2022 at 15:56, Michel Pelletier
<pelletier.michel@gmail.com> wrote:
> On Mon, Oct 31, 2022 at 6:15 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
>> On Mon, 31 Oct 2022 at 13:46, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>>> On Fri, Oct 28, 2022 at 4:27 PM Andrew Dunstan <andrew@dunslane.net> wrote:
>>>> On 2022-10-27 Th 19:38, Andres Freund wrote:
>>>>> Hi,
>>>>>
>>>>> On 2022-09-19 22:29:15 -0400, Tom Lane wrote:
>>>>>> Maybe a compromise could be found whereby we provide a conversion function
>>>>>> that converts whatever the catalog storage format is to some JSON
>>>>>> equivalent.  That would address the needs of external code that doesn't want
>>>>>> to write a custom parser, while not tying us directly to JSON.
>>>>> +1
>>>>
>>>> Agreed.
>>>
>>> +1
>>>
>>> Michel, it seems that you now have a green light to implement node to
>>> json function.
>>
>> I think that Tom's proposal that we +1 is on a pg_node_tree to json
>> SQL function / cast; which is tangentially related to the "nodeToJson
>> / changing the storage format of pg_node_tree to json" proposal, but
>> not the same.
>
>
> I agree.
>
>>
>> I will add my +1 to Tom's proposal for that function/cast, but I'm not
>> sure on changing the storage format of pg_node_tree to json.
>
>
> I'm going to spike on this function and will get back to the thread with any updates.

Michel, did you get a result from this spike?

I'm asking, because as I spiked most of my ideas on updating the node
text format, and am working on wrapping it up into a patch (or
patchset) later this week. The ideas for this are:

1. Don't write fields with default values for their types, such as
NULL for Node* fields;
2. Reset location fields before transforming the node tree to text
when we don't have a copy of the original query, which removes
location fields from serialization with step 1;
3. Add default() node labels to struct fields that do not share the
field type's default, allowing more fields to be omitted with step 1;
4. Add special default_ref() pg_node_attr for node fields that default
to other node field's values, used in Var's varnosyn/varattnosyn as
refering to varno/varattno; and
5. Truncate trailing 0s in Const' outDatum notation of by-ref types,
so that e.g. Consts with `name` data don't waste so much space with 0s

Currently, it reduces the pg_total_relation_size metric of pg_rewrite
after TOAST compression by 35% vs pg16, down to 483328 bytes / 59
pages, from 753664 bytes / 92 pages. The raw size of the ev_action
column's data (that is, before compression) is reduced by 55% to
1.18MB (from 2.80MB), and the largest default shipped row (the
information_schema.columns view) in that table is reduced to 'only'
78kB raw, from 193kB.

RW performance hasn't been tested yet, so that is still to be determined...

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)



pgsql-hackers by date:

Previous
From: "Drouvot, Bertrand"
Date:
Subject: Re: Synchronizing slots from primary to standby
Next
From: jian he
Date:
Subject: Re: PATCH: Add REINDEX tag to event triggers