pgsql: Fix transient memory leakage in jsonpath evaluation. - Mailing list pgsql-committers

From Tom Lane
Subject pgsql: Fix transient memory leakage in jsonpath evaluation.
Date
Msg-id E1w3FRA-000Vqk-1W@gemulon.postgresql.org
Whole thread Raw
List pgsql-committers
Fix transient memory leakage in jsonpath evaluation.

This patch reimplements JsonValueList to be more space-efficient
and arranges for temporary JsonValueLists created during jsonpath
evaluation to be freed when no longer needed, rather than being
leaked till the end of the function evaluation cycle as before.

The motivation is to prevent indefinite memory bloat while
evaluating jsonpath expressions that traverse a lot of data.
As an example, this query
  SELECT
    jsonb_path_query((SELECT jsonb_agg(i) FROM generate_series(1,10000) i),
                     '$[*] ? (@ < $)');
formerly required about 6GB to execute, with the space required
growing quadratically with the length of the input array.
With this patch the memory consumption stays static.  (The time
required is still quadratic, but we can't do much about that: this
path expression asks to compare each array element to each other one.)

The bloat happens because we construct a JsonValueList containing all
the array elements to represent the second occurrence of "$", and then
just leak it after evaluating the filter expression for any one value
generated from "$[*]".  If I were implementing this functionality from
scratch I'd probably try to avoid materializing that representation at
all, but changing that now looks like more trouble than it's worth.
This patch takes the more conservative approach of just making sure
we free the list after we're done with it.

The existing representation of JsonValueList is neither especially
compact nor especially easy to free: it's a List containing pointers
to separately-palloc'd JsonbValue structs.  We could theoretically
use list_free_deep, but it's not 100% clear that all the JsonbValues
are always safe for us to free.  In any case we are talking about a
lot of palloc/pfree traffic if we keep it like this.  This patch
replaces that with what's essentially an expansible array of
JsonbValues, so that even a long list requires relatively few
palloc requests.  Also, for the very common case that only one or
two elements appear in the list, this representation uses *zero*
pallocs: the elements can be kept in the on-the-stack base struct.

Note that we are only interested in freeing the JsonbValue structs
themselves.  While many types of JsonbValue include pointers to
external data such as strings or numerics, we expect that that data
is part of the original jsonb input Datum(s) and need not (indeed
cannot) be freed here.

In this reimplementation, JsonValueListAppend() always copies the
supplied JsonbValue struct into the JsonValueList data.  This allows
simplifying and regularizing many call sites that sometimes palloc'd
JsonbValues and sometimes passed a local-variable JsonbValue.  Always
doing the latter is simpler, faster, and less bug-prone.

I also removed JsonValueListLength() in favor of constant-time tests
for whether the list has zero, one, or more than one member, which is
what the callers really need to know.  JsonValueListLength() was not
a hot code path, so this aspect of the patch won't move the needle in
the least performance-wise.  But it seems neater.

I've not done any wide-ranging performance testing, but this should
be faster than the old code thanks to reduction of palloc overhead.
On the specific example shown above, it's about twice as fast as
before on not-very-large inputs; and of course it wins big if you
consider an input large enough to drive the old code into swapping.

Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/569394.1773783211@sss.pgh.pa.us

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/5a2043bf713113b0f6e9dbf2046499a5ca67883c

Modified Files
--------------
src/backend/utils/adt/jsonpath_exec.c | 595 +++++++++++++++++++++-------------
1 file changed, 362 insertions(+), 233 deletions(-)


pgsql-committers by date:

Previous
From: Peter Eisentraut
Date:
Subject: pgsql: Add some const qualifiers enabled by typeof_unqual change on cop
Next
From: Robert Haas
Date:
Subject: pgsql: dshash: Make it possible to suppress out of memory errors