Thanks for the explanation. I keep forgetting, even if I've worked on parsing code many times.
I understand backwards compatibility concerns, but this particular case does seem to show an extreme waste, especially since I don't think it's really necessary here. I don't see the ambiguity of nesting parentheses as (((1))), for example. It appears that the escaping rules have been kept simple in order to make parsing 1-2 edge cases simpler, at the cost of most ordinary cases being way too verbose.
Sure, nesting 32 levels without even actual named composite types is not realistic, as this was just a client side integration test for client side logic. But nesting 5 levels, including named composite types is not unseen, and that already incurs 2^4 = 16 unnecessary double quotes for each data item on each row.
In my opinion, the rich system of user-defined data structures is PostgreSQL's biggest strength, even compared to Oracle, the other popular ORDBMS. If there's such a significant performance penalty to using them, people simply might not.
Of course, there's the workaround of serialising JSON to clients instead of the native text format...
PG Bug reporting form <noreply@postgresql.org> writes:
> I don't understand how such excessive memory consumption is produced by such
> a "simple" query.
Compare
regression=# select row(row(1));
row
---------
("(1)")
(1 row)
regression=# select row(row(row(1)));
row
---------------
("(""(1)"")")
(1 row)
regression=# select row(row(row(row(1))));
row
-------------------------
("(""(""""(1)"""")"")")
(1 row)
Each layer of ROW() has to quote the string representing the next
lower layer, so it doubles all the double-quotes per the quoting
rules at
https://www.postgresql.org/docs/current/rowtypes.html#ROWTYPES-IO-SYNTAX
So the output length nearly doubles for each layer of ROW(), and yes
it doesn't take that many layers to get to something excessive.
Maybe we could redefine the quoting rules in a way that's friendlier
to this sort of thing, but it would surely make them a lot more
complicated, which would likely introduce enough client-side bugs
to make the change unattractive. Not to mention the whole backwards
compatibility question.
On the whole I can't get excited about this, since I see no very
good reason to use such a data structure.
regards, tom lane