Re: array size exceeds the maximum allowed (1073741823) when building a json - Mailing list pgsql-performance

From Nicolas Paris
Subject Re: array size exceeds the maximum allowed (1073741823) when building a json
Date
Msg-id CA+ssMOSuBmUFWvWn_=6vow63Wp7seEhUF_TDCO7Duip3j2GUUw@mail.gmail.com
In response to Re: array size exceeds the maximum allowed (1073741823) when building a json  (Merlin Moncure <mmoncure@gmail.com>)
Responses Re: array size exceeds the maximum allowed (1073741823) when building a json  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-performance


2016-06-09 15:31 GMT+02:00 Merlin Moncure <mmoncure@gmail.com>:
On Wed, Jun 8, 2016 at 1:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> On Tue, Jun 7, 2016 at 10:03 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>> On 06/07/2016 08:42 AM, Nicolas Paris wrote:
>>>> Will this 1GB restriction be increased in the near future?
>
>>> Not planned, no.  Thing is, that's the limit for a field in general, not
>>> just JSON; changing it would be a fairly large patch.  It's desirable,
>>> but AFAIK nobody is working on it.
>
>> And there are other things to consider on top of that, like the
>> maximum allocation size for palloc, the maximum query string size,
>> COPY, etc. This is no small project, and the potential side-effects
>> should not be underestimated.
>
> It's also fair to doubt that client-side code would "just work" with
> no functionality or performance problems for such large values.
>
> I await with interest the OP's results on other JSON processors that
> have no issues with GB-sized JSON strings.

Yup.  Most json libraries and tools are going to be disgusting memory
hogs or have exponential behaviors, especially when you consider you
are doing the transformation as well.  Just prettifying json documents
over 1GB can be a real challenge.

Fortunately the workaround here is pretty easy.  Keep your query
exactly as is, but remove the final aggregation step so that it returns
a set. Next, make a small application that runs this query and does
the array bits around each row (basically prepending the final result
with [, appending it with ], and putting , between rows).
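
For concreteness, a minimal sketch of that workaround in Java/JDBC (the
connection URL, credentials, output file, and query are placeholders; the
real query would be the OP's query with the final aggregate removed):

// Minimal sketch: run the un-aggregated query, which returns one JSON
// document per row, and assemble the surrounding array by hand.
// Connection details, table name, and file name are placeholders.
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class JsonArrayStreamer {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                     "jdbc:postgresql://localhost/mydb", "user", "secret");
             BufferedWriter out = new BufferedWriter(new FileWriter("result.json"))) {

            con.setAutoCommit(false);          // needed so the driver can fetch with a cursor
            try (Statement st = con.createStatement()) {
                st.setFetchSize(1000);         // fetch in batches instead of buffering all rows

                // The original query minus the final aggregation, so it returns
                // a set of JSON documents rather than one huge value.
                try (ResultSet rs = st.executeQuery(
                        "SELECT row_to_json(t) FROM some_table t")) {
                    out.write("[");
                    boolean first = true;
                    while (rs.next()) {
                        if (!first) {
                            out.write(",");            // separator between rows
                        }
                        out.write(rs.getString(1));    // write the row's JSON text as returned by the server
                        first = false;
                    }
                    out.write("]");
                }
            }
            con.commit();
        }
    }
}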

The point is that prepending/appending means dealing with strings.
Transforming each value of the result set into a string implies escaping the double quotes,
so:
row1 contains {"hello":"world"}
step 1 = prepend -> "[{\"hello\":\"world\"}"
step 2 = append -> "[{\"hello\":\"world\"},"
and so on
The json ends up corrupted. Hopefully I am just on the wrong track about that.

 
It's essential that you use a client library that does not buffer the
entire result in memory before emitting results.  This can be done in
psql (FETCH mode), Java, libpq (single row mode), etc.  I suspect the
node.js pg module can do this as well, and there will certainly be
others.

The basic objective is you want the rows to be streamed out of the
database without being buffered.  If you do that, you should be able
to stream arbitrarily large datasets out of the database to a json
document assuming the server can produce the query.
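
For illustration, one way to get that non-buffering behaviour from the
PostgreSQL JDBC driver (a sketch; the helper name and batch size are
arbitrary):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public final class StreamingStatements {

    // Returns a Statement configured so that the driver fetches rows with a
    // cursor in batches of `batchSize`, instead of materializing the whole
    // result set in client memory. The three conditions below are what the
    // driver requires for cursor-based fetching.
    public static Statement streamingStatement(Connection con, int batchSize)
            throws SQLException {
        con.setAutoCommit(false);                   // 1. must run inside a transaction
        Statement st = con.createStatement(
                ResultSet.TYPE_FORWARD_ONLY,        // 2. forward-only result set
                ResultSet.CONCUR_READ_ONLY);
        st.setFetchSize(batchSize);                 // 3. non-zero fetch size
        return st;
    }

    private StreamingStatements() {}
}

A statement obtained this way makes the driver pull rows in batches, so a
loop like the one in the earlier sketch starts writing output before the
whole result set has been transferred.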

merlin
