Thread: array size exceeds the maximum allowed (1073741823) when building a json

array size exceeds the maximum allowed (1073741823) when building a json

From
Nicolas Paris
Date:
Hello,

I run a query that transforms huge tables into a JSON document for a given period.
It works great for a modest period (small dataset).
However, when I increase the period (huge dataset), I get this error:
SQL ERROR[54000]
ERROR: array size exceeds the maximum allowed (1073741823)


Thanks in advance,
Information:
postgresql 9.4
shared_buffers = 55GB
64bit Red Hat Enterprise Linux Server release 6.7
the query:
WITH sel AS
(SELECT ids_pat,
ids_nda
FROM eds.nda
WHERE (dt_deb_nda >= '20150101'
AND dt_deb_nda <= '20150401')),
diag AS
( SELECT ids_nda_rum,
json_agg(diago) AS diago,
count(1) AS total
FROM
(SELECT ids_nda_rum,
json_build_object( 'cd_cim', cd_cim, 'lib_cim',lib_typ_diag_tr, 'dt_cim',dt_exec) AS diago
FROM eds.fait_diag_tr
WHERE ids_nda IN
(SELECT ids_nda
FROM sel)
ORDER BY dt_exec) AS diago2
GROUP BY ids_nda_rum),
act AS
( SELECT ids_nda_rum,
json_agg(acto) AS acto,
count(1) AS total
FROM
( SELECT ids_nda_rum,
json_build_object( 'cd_act',cd_ccam, 'dt_act',dt_exec) AS acto
FROM eds.fait_act_tr
WHERE ids_nda IN
(SELECT ids_nda
FROM sel)
ORDER BY dt_exec) AS acto2
GROUP BY ids_nda_rum ),
ghm AS
( SELECT ids_nda_rum,
json_agg(ghmo) AS ghmo,
count(1) AS total
FROM
( SELECT ids_nda_rum,
json_build_object( 'cd_ghm',cd_ghm, 'cd_ghs',cd_ghs, 'status',lib_statut_tr, 'dt_maj_rum_ghm',dt_maj_rum_ghm) AS ghmo
FROM eds.nda_rum_ghm_tr
LEFT JOIN eds.nda_rum_tr rum USING (ids_nda_rum)
WHERE nda_rum_ghm_tr.ids_nda IN
(SELECT ids_nda
FROM sel)
AND rum.cd_rum = 'RSS'
ORDER BY dt_maj_rum_ghm) AS ghmo
GROUP BY ids_nda_rum ),
lab AS
(SELECT ids_nda,
json_agg(lab) AS labo,
count(1) AS total
FROM
(SELECT ids_nda,
json_build_object( 'valeur_type_tr',valeur_type_tr, 'dt_fait', dt_fait, 'unite',unite, 'cd_test_lab',cd_test_lab, 'valeur_sign_tr',valeur_sign_tr, 'valeur_num_tr',valeur_num_tr, 'valeur_text_tr',valeur_text_tr, 'valeur_abnormal_tr',valeur_abnormal_tr) AS lab
FROM eds.fait_lab_tr
WHERE ids_nda IN
(SELECT ids_nda
FROM sel)
ORDER BY dt_fait) AS labo
GROUP BY ids_nda),
rum AS
( SELECT ids_nda,
json_agg(rum) AS rumo,
count(1) AS total
FROM
( SELECT ids_nda,
json_build_object( 'cd_rum',cd_rum, 'dt_deb_rum', dt_deb_rum, 'dt_fin_rum', dt_fin_rum, 'diag', json_build_object('total',diag.total,'diag',diag.diago), 'act', json_build_object('total',act.total,'act',act.acto) ) AS rum
FROM eds.nda_rum_tr
LEFT JOIN diag USING (ids_nda_rum)
LEFT JOIN act USING (ids_nda_rum)
WHERE ids_nda IN
(SELECT ids_nda
FROM sel)
AND cd_rum = 'RUM' ) AS rumo
GROUP BY ids_nda),
rss AS
( SELECT ids_nda,
json_agg(rss) AS rsso,
count(1) AS total
FROM
( SELECT ids_nda,
json_build_object( 'cd_rum',cd_rum, 'dt_deb_rss', dt_deb_rum, 'dt_fin_rss', dt_fin_rum, 'ghm', json_build_object('total',ghm.total,'ghm',ghm.ghmo), 'rum', json_build_object('total',rum.total, 'rum',rum.rumo) ) AS rss
FROM eds.nda_rum_tr
LEFT JOIN ghm USING (ids_nda_rum)
LEFT JOIN rum USING (ids_nda)
WHERE ids_nda IN
(SELECT ids_nda
FROM sel)
AND cd_rum = 'RSS' ) AS rss
GROUP BY ids_nda),
enc AS
(SELECT 'Encounter' AS "resourceType",
cd_nda AS "identifier",
duree_hospit AS "length",
lib_statut_nda_tr AS "status",
lib_type_nda_tr AS "type",
ids_pat,
json_build_object('start', dt_deb_nda,'end', dt_fin_nda) AS "appointment",
json_build_object('total',lab.total, 'lab',lab.labo) AS lab,
json_build_object('total',rss.total, 'rss',rss.rsso) AS rss
FROM eds.nda_tr
LEFT JOIN lab USING (ids_nda)
LEFT JOIN rss USING (ids_nda)
WHERE ids_nda IN
(SELECT ids_nda
FROM sel)
ORDER BY dt_deb_nda ASC)
SELECT 'Bundle' AS "resourceType",
count(1) AS total,
array_to_json(array_agg(ROW)) AS encounter
FROM
(SELECT 'Patient' AS "resourceType",
ipp AS "identifier",
nom AS "name",
cd_sex_tr AS "gender",
dt_nais AS "birthDate",
json_build_array(enc.*) AS encounters
FROM eds.patient_tr
INNER JOIN enc USING (ids_pat) ) ROW;



Re: array size exceeds the maximum allowed (1073741823) when building a json

From
"David G. Johnston"
Date:
On Tue, Jun 7, 2016 at 7:44 AM, Nicolas Paris <niparisco@gmail.com> wrote:
Hello,

I run a query that transforms huge tables into a JSON document for a given period.
It works great for a modest period (small dataset).
However, when I increase the period (huge dataset), I get this error:
SQL ERROR[54000]
ERROR: array size exceeds the maximum allowed (1073741823)


Maximum Field Size: 1 GB

It doesn't matter that the data is never actually placed into a physical table.

David J.

Re: array size exceeds the maximum allowed (1073741823) when building a json

From
Nicolas Paris
Date:
2016-06-07 14:31 GMT+02:00 David G. Johnston <david.g.johnston@gmail.com>:
On Tue, Jun 7, 2016 at 7:44 AM, Nicolas Paris <niparisco@gmail.com> wrote:
Hello,

I run a query that transforms huge tables into a JSON document for a given period.
It works great for a modest period (small dataset).
However, when I increase the period (huge dataset), I get this error:
SQL ERROR[54000]
ERROR: array size exceeds the maximum allowed (1073741823)


Maximum Field Size: 1 GB

It means a JSON document cannot exceed 1 GB in PostgreSQL, right?
Then I must build it with an external tool?
 

It doesn't matter that the data is never actually placed into a physical table.

David J.


Re: array size exceeds the maximum allowed (1073741823) when building a json

From
"David G. Johnston"
Date:
On Tue, Jun 7, 2016 at 8:36 AM, Nicolas Paris <niparisco@gmail.com> wrote:
2016-06-07 14:31 GMT+02:00 David G. Johnston <david.g.johnston@gmail.com>:
On Tue, Jun 7, 2016 at 7:44 AM, Nicolas Paris <niparisco@gmail.com> wrote:
Hello,

I run a query that transforms huge tables into a JSON document for a given period.
It works great for a modest period (small dataset).
However, when I increase the period (huge dataset), I get this error:
SQL ERROR[54000]
ERROR: array size exceeds the maximum allowed (1073741823)


Maximum Field Size: 1 GB

It means a JSON document cannot exceed 1 GB in PostgreSQL, right?

Yes
 
Then I must build it with an external tool?
 

You have to do something different.  Using multiple columns and/or multiple rows might be workable.

David J.

Re: array size exceeds the maximum allowed (1073741823) when building a json

From
Nicolas Paris
Date:


2016-06-07 14:39 GMT+02:00 David G. Johnston <david.g.johnston@gmail.com>:
On Tue, Jun 7, 2016 at 8:36 AM, Nicolas Paris <niparisco@gmail.com> wrote:
2016-06-07 14:31 GMT+02:00 David G. Johnston <david.g.johnston@gmail.com>:
On Tue, Jun 7, 2016 at 7:44 AM, Nicolas Paris <niparisco@gmail.com> wrote:
Hello,

I run a query that transforms huge tables into a JSON document for a given period.
It works great for a modest period (small dataset).
However, when I increase the period (huge dataset), I get this error:
SQL ERROR[54000]
ERROR: array size exceeds the maximum allowed (1073741823)


Maximum Field Size: 1 GB

It means a JSON document cannot exceed 1 GB in PostgreSQL, right?

Yes
 
Then I must build it with an external tool?
 

You have to do something different.  Using multiple columns and/or multiple rows might be workable.

​Certainly. Kind of disappointing, because I won't find any json builder as performant as postgresql.​
 


Is this 1 GB restriction going to be increased in the near future?


David J.


Re: array size exceeds the maximum allowed (1073741823) when building a json

From
"David G. Johnston"
Date:
On Tue, Jun 7, 2016 at 8:42 AM, Nicolas Paris <niparisco@gmail.com> wrote:


2016-06-07 14:39 GMT+02:00 David G. Johnston <david.g.johnston@gmail.com>:
On Tue, Jun 7, 2016 at 8:36 AM, Nicolas Paris <niparisco@gmail.com> wrote:
2016-06-07 14:31 GMT+02:00 David G. Johnston <david.g.johnston@gmail.com>:
On Tue, Jun 7, 2016 at 7:44 AM, Nicolas Paris <niparisco@gmail.com> wrote:
Hello,

I run a query that transforms huge tables into a JSON document for a given period.
It works great for a modest period (small dataset).
However, when I increase the period (huge dataset), I get this error:
SQL ERROR[54000]
ERROR: array size exceeds the maximum allowed (1073741823)


Maximum Field Size: 1 GB

It means a JSON document cannot exceed 1 GB in PostgreSQL, right?

Yes
 
Then I must build it with an external tool?
 

You have to do something different.  Using multiple columns and/or multiple rows might be workable.

​Certainly. Kind of disappointing, because I won't find any json builder as performant as postgresql.​
 


Is this 1 GB restriction going to be increased in the near future?


There has been zero chatter on the public lists about increasing any of the limits on that page I linked to.

David J.
 
Re: array size exceeds the maximum allowed (1073741823) when building a json

From
Josh Berkus
Date:
On 06/07/2016 08:42 AM, Nicolas Paris wrote:
>     You have to do something different.  Using multiple columns and/or
>     multiple rows might be workable.
>
>
> ​Certainly. Kind of disappointing, because I won't find any json builder
> as performant as postgresql.​

That's nice to hear.

> Is this 1 GB restriction going to be increased in the near future?

Not planned, no.  Thing is, that's the limit for a field in general, not
just JSON; changing it would be a fairly large patch.  It's desirable,
but AFAIK nobody is working on it.

--
Josh Berkus
Red Hat OSAS
(any opinions are my own)


Re: array size exceeds the maximum allowed (1073741823) when building a json

From
Nicolas Paris
Date:


2016-06-07 15:03 GMT+02:00 Josh Berkus <josh@agliodbs.com>:
On 06/07/2016 08:42 AM, Nicolas Paris wrote:
>     You have to do something different.  Using multiple columns and/or
>     multiple rows might be workable.

Getting a single document from multiple rows coming out of PostgreSQL is not that easy... External tools either treat each PostgreSQL JSON field as a string or have to parse it again. Parsing them would add overhead in the external tool, and I'd say it would then be better to build the entire JSON in the external tool. That means not using the PostgreSQL JSON builders at all, and delegating the job to a tool that can deal with documents larger than 1 GB.

 
>
>
> ​Certainly. Kind of disappointing, because I won't find any json builder
> as performant as postgresql.​

That's nice to hear.

> Is this 1 GB restriction going to be increased in the near future?

Not planned, no.  Thing is, that's the limit for a field in general, not
just JSON; changing it would be a fairly large patch.  It's desirable,
but AFAIK nobody is working on it.

Compared to MongoDB's 16 MB document limit, 1 GB is great (http://tech.tulentsev.com/2014/02/limitations-of-mongodb/). But for my use case it is not sufficient.



--
Josh Berkus
Red Hat OSAS
(any opinions are my own)

Re: array size exceeds the maximum allowed (1073741823) when building a json

From
Michael Paquier
Date:
On Tue, Jun 7, 2016 at 10:03 PM, Josh Berkus <josh@agliodbs.com> wrote:
> On 06/07/2016 08:42 AM, Nicolas Paris wrote:
>>     You have to do something different.  Using multiple columns and/or
>>     multiple rows might be workable.
>>
>>
>> Certainly. Kind of disappointing, because I won't find any json builder
>> as performant as postgresql.
>
> That's nice to hear.
>
>> Is this 1 GB restriction going to be increased in the near future?
>
> Not planned, no.  Thing is, that's the limit for a field in general, not
> just JSON; changing it would be a fairly large patch.  It's desirable,
> but AFAIK nobody is working on it.

And there are other things to consider on top of that, like the
maximum allocation size for palloc, the maximum query string size,
COPY, etc. This is no small project, and the potential side-effects
should not be underestimated.
--
Michael


Re: array size exceeds the maximum allowed (1073741823) when building a json

From
Tom Lane
Date:
Michael Paquier <michael.paquier@gmail.com> writes:
> On Tue, Jun 7, 2016 at 10:03 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> On 06/07/2016 08:42 AM, Nicolas Paris wrote:
>>> Is this 1 GB restriction going to be increased in the near future?

>> Not planned, no.  Thing is, that's the limit for a field in general, not
>> just JSON; changing it would be a fairly large patch.  It's desirable,
>> but AFAIK nobody is working on it.

> And there are other things to consider on top of that, like the
> maximum allocation size for palloc, the maximum query string size,
> COPY, etc. This is no small project, and the potential side-effects
> should not be underestimated.

It's also fair to doubt that client-side code would "just work" with
no functionality or performance problems for such large values.

I await with interest the OP's results on other JSON processors that
have no issues with GB-sized JSON strings.

            regards, tom lane


Re: array size exceeds the maximum allowed (1073741823) when building a json

From
Merlin Moncure
Date:
On Wed, Jun 8, 2016 at 1:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> On Tue, Jun 7, 2016 at 10:03 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>> On 06/07/2016 08:42 AM, Nicolas Paris wrote:
>>>> Is this 1 GB restriction going to be increased in the near future?
>
>>> Not planned, no.  Thing is, that's the limit for a field in general, not
>>> just JSON; changing it would be a fairly large patch.  It's desirable,
>>> but AFAIK nobody is working on it.
>
>> And there are other things to consider on top of that, like the
>> maximum allocation size for palloc, the maximum query string size,
>> COPY, etc. This is no small project, and the potential side-effects
>> should not be underestimated.
>
> It's also fair to doubt that client-side code would "just work" with
> no functionality or performance problems for such large values.
>
> I await with interest the OP's results on other JSON processors that
> have no issues with GB-sized JSON strings.

Yup.  Most json libraries and tools are going to be disgusting memory
hogs or have exponential behaviors especially when you consider you
are doing the transformation as well.  Just prettifying json documents
over 1GB can be a real challenge.

Fortunately the workaround here is pretty easy.  Keep your query
exactly as is but remove the final aggregation step so that it returns
a set. Next, make a small application that runs this query and does
the array bits around each row (basically prepending the final result
with '[', appending it with ']', and putting ',' between rows).
It's essential that you use a client library that does not buffer the
entire result in memory before emitting results.   This can be done in
psql (FETCH mode), java, libpq (single row mode), etc.   I suspect
node.js pg module can do this as well, and there certainly will be
others.

The basic objective is you want the rows to be streamed out of the
database without being buffered.  If you do that, you should be able
to stream arbitrarily large datasets out of the database to a json
document assuming the server can produce the query.

merlin
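
A minimal sketch of this streaming approach, assuming Python with psycopg2 (any driver with server-side cursors or single-row mode would do). The query string below is only a placeholder standing in for the thread's encounter query with the final array_agg removed, so the server returns one JSON object per row, cast to text:

import psycopg2

# Placeholder query: stands in for the encounter query from this thread with
# the final aggregation removed, so each row is one JSON object as text.
PER_ROW_QUERY = """
    SELECT json_build_object('hello', 'world')::text
    FROM generate_series(1, 100000)
"""

conn = psycopg2.connect("dbname=eds")       # assumed connection string
cur = conn.cursor(name="enc_stream")        # named cursor => server-side, rows are streamed
cur.itersize = 1000                         # rows fetched per network round trip
cur.execute(PER_ROW_QUERY)

with open("bundle.json", "w") as out:
    out.write("[")
    for i, (row_json,) in enumerate(cur):   # each row_json is already valid JSON text
        if i:
            out.write(",")
        out.write(row_json)                 # written verbatim, no re-escaping
    out.write("]")

cur.close()
conn.close()

The result file is one JSON array of arbitrary size, while neither the server nor the client ever holds more than a batch of rows in memory.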


Re: array size exceeds the maximum allowed (1073741823) when building a json

From
Nicolas Paris
Date:


2016-06-09 15:31 GMT+02:00 Merlin Moncure <mmoncure@gmail.com>:
On Wed, Jun 8, 2016 at 1:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> On Tue, Jun 7, 2016 at 10:03 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>> On 06/07/2016 08:42 AM, Nicolas Paris wrote:
>>>> Is this 1 GB restriction going to be increased in the near future?
>
>>> Not planned, no.  Thing is, that's the limit for a field in general, not
>>> just JSON; changing it would be a fairly large patch.  It's desirable,
>>> but AFAIK nobody is working on it.
>
>> And there are other things to consider on top of that, like the
>> maximum allocation size for palloc, the maximum query string size,
>> COPY, etc. This is no small project, and the potential side-effects
>> should not be underestimated.
>
> It's also fair to doubt that client-side code would "just work" with
> no functionality or performance problems for such large values.
>
> I await with interest the OP's results on other JSON processors that
> have no issues with GB-sized JSON strings.

Yup.  Most json libraries and tools are going to be disgusting memory
hogs or have exponential behaviors especially when you consider you
are doing the transformation as well.  Just prettifying json documents
over 1GB can be a real challenge.

Fortunately the workaround here is pretty easy.  Keep your query
exactly as is but remove the final aggregation step so that it returns
a set. Next, make a small application that runs this query and does
the array bits around each row (basically prepending the final result
with '[', appending it with ']', and putting ',' between rows).

The point is that prepending/appending means dealing with strings.
Transforming each value of the result set into a string implies escaping the double quotes.
then:
row1 contains {"hello":"world"}
step 1 = prepend -> "[{\"hello\":\"world\"}"
step 2 = append -> "[{\"hello\":\"world\"},"
and so on
the JSON is corrupted. Hopefully I am simply wrong about this.

 
It's essential that you use a client library that does not buffer the
entire result in memory before emitting results.   This can be done in
psql (FETCH mode), java, libpq (single row mode), etc.   I suspect
node.js pg module can do this as well, and there certainly will be
others.

The basic objective is you want the rows to be streamed out of the
database without being buffered.  If you do that, you should be able
to stream arbitrarily large datasets out of the database to a json
document assuming the server can produce the query.

merlin

Re: array size exceeds the maximum allowed (1073741823) when building a json

From
Merlin Moncure
Date:
On Thu, Jun 9, 2016 at 8:43 AM, Nicolas Paris <niparisco@gmail.com> wrote:
>
>
> 2016-06-09 15:31 GMT+02:00 Merlin Moncure <mmoncure@gmail.com>:
>>
>> On Wed, Jun 8, 2016 at 1:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> > Michael Paquier <michael.paquier@gmail.com> writes:
>> >> On Tue, Jun 7, 2016 at 10:03 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> >>> On 06/07/2016 08:42 AM, Nicolas Paris wrote:
>> >>>> Is this 1 GB restriction going to be increased in the near future?
>> >
>> >>> Not planned, no.  Thing is, that's the limit for a field in general,
>> >>> not
>> >>> just JSON; changing it would be a fairly large patch.  It's
>> >>> desirable,
>> >>> but AFAIK nobody is working on it.
>> >
>> >> And there are other things to consider on top of that, like the
>> >> maximum allocation size for palloc, the maximum query string size,
>> >> COPY, etc. This is no small project, and the potential side-effects
>> >> should not be underestimated.
>> >
>> > It's also fair to doubt that client-side code would "just work" with
>> > no functionality or performance problems for such large values.
>> >
>> > I await with interest the OP's results on other JSON processors that
>> > have no issues with GB-sized JSON strings.
>>
>> Yup.  Most json libraries and tools are going to be disgusting memory
>> hogs or have exponential behaviors especially when you consider you
>> are doing the transformation as well.  Just prettifying json documents
>> over 1GB can be a real challenge.
>>
>> Fortunately the workaround here is pretty easy.  Keep your query
>> exactly as is but remove the final aggregation step so that it returns
>> a set. Next, make a small application that runs this query and does
>> the array bits around each row (basically prepending the final result
>> with '[', appending it with ']', and putting ',' between rows).
>
>
> The point is that prepending/appending means dealing with strings.
> Transforming each value of the result set into a string implies escaping the
> double quotes.
> then:
> row1 contains {"hello":"world"}
> step 1 = prepend -> "[{\"hello\":\"world\"}"
> step 2 = append -> "[{\"hello\":\"world\"},"

Right. Say 3 rows contain {"hello":"world"}.

before iteration: emit '['
before every row except the first, prepend ','
after iteration: emit ']'

you end up with:
[{"hello":"world"}
,{"hello":"world"}
,{"hello":"world"}]

...which is 100% valid json as long as each row of the set is a json object.

in SQL, the technique is like this:
select ('[' || string_agg(j::text, ',') || ']')::json from (select
json_build_object('hello', 'world') j from generate_series(1,3)) q;

the difference is, instead of having the database do the string_agg
step, it's handled on the client during iteration over the output of
generate_series.

merlin
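
To make the client-side version of that string_agg step concrete, a tiny sketch (assumed, not from the thread): each row arrives as JSON text, so wrapping it with '[', ',' and ']' is plain string concatenation and no re-escaping is involved.

# Rows as they would arrive from the driver, one JSON text value per row.
rows = ['{"hello":"world"}', '{"hello":"world"}', '{"hello":"world"}']

parts = ["["]
for i, row_json in enumerate(rows):
    if i:
        parts.append(",")
    parts.append(row_json)      # appended verbatim; it is already JSON
parts.append("]")
document = "".join(parts)

assert document == '[{"hello":"world"},{"hello":"world"},{"hello":"world"}]'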