Re: array size exceeds the maximum allowed (1073741823) when building a json - Mailing list pgsql-performance

From Merlin Moncure
Subject Re: array size exceeds the maximum allowed (1073741823) when building a json
Date
Msg-id CAHyXU0wiGM8VzuRdAwipJ3g1sr6bKt2wtkm_56xF0GU74gf7xw@mail.gmail.com
In response to Re: array size exceeds the maximum allowed (1073741823) when building a json  (Nicolas Paris <niparisco@gmail.com>)
List pgsql-performance
On Thu, Jun 9, 2016 at 8:43 AM, Nicolas Paris <niparisco@gmail.com> wrote:
>
>
> 2016-06-09 15:31 GMT+02:00 Merlin Moncure <mmoncure@gmail.com>:
>>
>> On Wed, Jun 8, 2016 at 1:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> > Michael Paquier <michael.paquier@gmail.com> writes:
>> >> On Tue, Jun 7, 2016 at 10:03 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> >>> On 06/07/2016 08:42 AM, Nicolas Paris wrote:
>> >>>> Is this 1GB restriction expected to increase in the near future?
>> >
>> >>> Not planned, no.  Thing is, that's the limit for a field in general,
>> >>> not just JSON; changing it would be a fairly large patch.  It's
>> >>> desirable, but AFAIK nobody is working on it.
>> >
>> >> And there are other things to consider on top of that, like the
>> >> maximum allocation size for palloc, the maximum query string size,
>> >> COPY, etc. This is no small project, and the potential side-effects
>> >> should not be underestimated.
>> >
>> > It's also fair to doubt that client-side code would "just work" with
>> > no functionality or performance problems for such large values.
>> >
>> > I await with interest the OP's results on other JSON processors that
>> > have no issues with GB-sized JSON strings.
>>
>> Yup.  Most json libraries and tools are going to be disgusting memory
>> hogs or have exponential behaviors, especially when you consider you
>> are doing the transformation as well.  Just prettifying json documents
>> over 1GB can be a real challenge.
>>
>> Fortunately the workaround here is pretty easy.  Keep your query
>> exactly as is, but remove the final aggregation step so that it returns
>> a set. Next, make a small application that runs this query and does
>> the array bits around each row (basically prepending '[' to the final
>> result, appending ']' to it, and putting ',' between rows).
>
>
> The point is that prepending/appending means dealing with strings.
> Transforming each value of the result set to a string implies escaping the
> double quotes.
> Then:
> row 1 contains {"hello":"world"}
> step 1 = prepend -> "[{\"hello\":\"world\"}"
> step 2 = append -> "[{\"hello\":\"world\"},"

Right. Say 3 rows contain {"hello":"world"}:

before iteration: emit '['
before every row except the first, prepend ','
after iteration: emit ']'

you end up with:
[{"hello":"world"}
,{"hello":"world"}
,{"hello":"world"}]

...which is 100% valid json as long as each row of the set is a json object.
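
For illustration, here's a minimal client-side sketch of that bracketing
rule in Python (the function name and the pre-built row strings are just
placeholders; any language that can iterate over the rows works the same way):

def json_array_chunks(rows):
    # rows: any iterable of strings, each already a valid json document
    yield '['
    for i, row in enumerate(rows):
        if i > 0:
            yield ','   # comma before every row except the first
        yield row
    yield ']'

# three rows of {"hello":"world"} give the output shown above
# (the line breaks up there are just optional json whitespace)
print(''.join(json_array_chunks(['{"hello":"world"}'] * 3)))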

In SQL, the technique is like this:

select ('[' || string_agg(j::text, ',') || ']')::json
from (select json_build_object('hello', 'world') j
      from generate_series(1,3)) q;

The difference is that, instead of having the database do the string_agg
step, it's handled on the client during iteration over the query's output.
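
As a rough sketch of that client side (assuming Python with psycopg2; the
connection string and cursor name below are placeholders), you'd run the
inner, set-returning query and stream the array out as you go; the ::text
cast keeps the driver from parsing each row back into a Python object:

import sys
import psycopg2

# same query as above, minus the final aggregation, with rows cast to text
QUERY = """
select json_build_object('hello', 'world')::text as j
from generate_series(1, 3)
"""

def dump_json_array(dsn, out):
    with psycopg2.connect(dsn) as conn:
        # a named (server-side) cursor streams rows instead of buffering
        # the whole result set in client memory
        with conn.cursor(name='json_stream') as cur:
            cur.execute(QUERY)
            out.write('[')
            first = True
            for (j,) in cur:
                if not first:
                    out.write(',')
                out.write(j)
                first = False
            out.write(']')

dump_json_array('dbname=test', sys.stdout)   # placeholder connection string

For a result that is actually approaching the 1GB limit, the named cursor
(and writing to a file rather than stdout) is what keeps the client's memory
use flat.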

merlin

