Re: Avoiding memory leakage in jsonpath evaluation - Mailing list pgsql-hackers

From Chao Li
Subject Re: Avoiding memory leakage in jsonpath evaluation
Date
Msg-id D311D367-8313-4288-8272-CFAD67ED7CEA@gmail.com
Whole thread Raw
In response to Re: Avoiding memory leakage in jsonpath evaluation  (Chao Li <li.evan.chao@gmail.com>)
List pgsql-hackers

> On Mar 18, 2026, at 15:52, Chao Li <li.evan.chao@gmail.com> wrote:
>
>
>
>> On Mar 18, 2026, at 05:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> I got an off-list report that a query like this consumes
>> an unreasonable amount of memory:
>>
>> SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM generate_series(1,10000) i),
>>                       '$[*] ? (@ < $)');
>>
>> For me, that eats about 6GB by the time it's done executing.
>> If that doesn't seem like a lot to you, just add another zero to the
>> generate_series call, and then it'll be more like 600GB, because the
>> leakage is O(N^2).
>>
>> Admittedly, this isn't an especially useful query: its runtime is
>> also O(N^2), because that path expression basically requires us to
>> compare every element of the input JSON array to every other element.
>> But it's not cool that it leaks so much memory while at it.
>>
>> I poked into this and found that the leakage is entirely composed of
>> "JsonValueList"s that are built during path evaluation and then just
>> left to rot until the end of jsonb_path_query().  We can fix it by
>> being careful to free those lists on the way out of each jsonpath
>> evaluation function that creates one.  However, just doing that would
>> mean adding pfree overhead on top of palloc overhead, so I went a bit
>> further and reimplemented JsonValueList to be more compact and cheaper
>> to allocate/free.  The attached seems to be a bit faster than the
>> existing code as well as not leaking so much memory.  See the draft
>> commit message for more details.
>>
>> regards, tom lane
>
> This patch looks like a big win. It not only saves memory, but also makes the query much faster.
>
> I tested the query on my MacBook M4, increasing the iteration count from 10000 to 50000.
>
> Current master (3b4c2b9db25):
> ```
> evantest=# SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM generate_series(1,50000) i), '$[*] ? (@ < $)');
> Time: 208581.771 ms (03:28.582)
> evantest=# SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM generate_series(1,50000) i), '$[*] ? (@ < $)');
> Time: 217269.595 ms (03:37.270)
> ```
>
> With the patch:
> ```
> evantest=# SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM generate_series(1,50000) i), '$[*] ? (@ < $)');
> Time: 18674.580 ms (00:18.675)
> evantest=# SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM generate_series(1,50000) i), '$[*] ? (@ < $)');
> Time: 18889.329 ms (00:18.889)
> ```
>
> My observations were:
>
> * Before the patch, the backend process memory usage fluctuated between roughly 50GB and 145GB, while CPU usage
stayedaround 30%. 
> * With the patch, the backend process memory usage stayed stable at around 30MB, while CPU usage stayed around 100%.
>
> After reviewing the patch, I thought JsonValueListLength() might be worth optimizing, since it is O(n). I tried
addingan ntotal_items field to JsonValueList to track the total number of items, similar to the last pointer that is
onlymeaningful in the base chunk. But that did not help in my test, and I realized JsonValueListLength() is not on the
hottestpath, so I dropped that idea. 
>
> From the MacOS Instruments tool, the most expensive parts seem to be fillJsonbValue, JsonbIteratorNext,
cmp_var_common,and cmp_numerics. But those look like separate topics. 
>
> Overall, this looks like a solid patch.
>

Forgot to mention that, to run the tests, I turned off debug and assertion, and compiled with -O2.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/







pgsql-hackers by date:

Previous
From: Chao Li
Date:
Subject: Re: Avoiding memory leakage in jsonpath evaluation
Next
From: Peter Smith
Date:
Subject: Re: Skipping schema changes in publication