Home > mailing lists

Re: [BUGS] json(b)_array_elements use causes very large memory usagewhen also referencing entire json document - Mailing list pgsql-bugs

From	Lucas Fairchild-Madar
Subject	Re: [BUGS] json(b)_array_elements use causes very large memory usagewhen also referencing entire json document
Date	October 7, 2017 02:03:52
Msg-id	CAJmoq7PdrJFteGWtQFyb6fBzbZft2mkG50=RjZNOtGyNRW+=5g@mail.gmail.com Whole thread Raw
In response to	Re: [BUGS] json(b)_array_elements use causes very large memory usage when also referencing entire json document (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: [BUGS] json(b)_array_elements use causes very large memory usage when also referencing entire json document
List	pgsql-bugs

Tree view

On Fri, Oct 6, 2017 at 12:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Lucas Fairchild-Madar <lucas.madar@gmail.com> writes:
> Sure, here's a bash script. Assumes you have PGDATABASE, etc set.
> Verified this also blows up on pg 10. I haven't verified the new patch.

Oh, I see the problem. This is a different animal, because it's actually
an intra-row leak, as it were. What you've got is

select data->'id', jsonb_array_elements(data->'items') from kaboom;

where the SRF jsonb_array_elements() emits a lot of values. For each
of those values, data->'id' gets evaluated over again, and we can't
reclaim memory in the per-tuple context until we've finished the whole
cycle for the current row of "kaboom". So a leak would occur in any
case ... but it's particularly awful in this case, because data->'id'
involves detoasting the rather wide value of "data", which is then
promptly leaked. So the total memory consumption is more or less
proportional to O(N^2) in the length of "data".

This has been like this since forever, and it's probably impractical
to do anything about it pre-v10, given the unstructured way that
targetlist SRFs are handled. You could dodge the problem by moving
the SRF to a lateral FROM item:

select data->'id', ja
from kaboom, lateral jsonb_array_elements(data->'items') as ja;

(The LATERAL keyword is optional here, but I like it because it
makes it clearer what's happening.)

As of v10, it might be possible to fix this for the tlist case
as well, by doing something like using a separate short-lived
context for the non-SRF tlist items.

Is there any sort of setting right now that can defend against this? A way to prevent a query from using 20+GB of memory? I'd prefer the query fail before the database system is kill -9'd.

pgsql-bugs by date:

From: Tom Lane
Date: 07 October 2017, 01:37:03
Subject: Re: [BUGS] json(b)_array_elements use causes very large memory usage when also referencing entire json document

From: Tom Lane
Date: 07 October 2017, 02:13:22
Subject: Re: [BUGS] json(b)_array_elements use causes very large memory usage when also referencing entire json document

Re: [BUGS] json(b)_array_elements use causes very large memory usagewhen also referencing entire json document - Mailing list pgsql-bugs

Previous

Next