Re: Avoiding memory leakage in jsonpath evaluation - Mailing list pgsql-hackers

From Chao Li
Subject Re: Avoiding memory leakage in jsonpath evaluation
Date
Msg-id 218409A9-A9C9-47A4-9D3C-4D001DA5670A@gmail.com
Whole thread Raw
In response to Re: Avoiding memory leakage in jsonpath evaluation  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Avoiding memory leakage in jsonpath evaluation
List pgsql-hackers

> On Mar 19, 2026, at 00:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Chao Li <li.evan.chao@gmail.com> writes:
>>> On Mar 18, 2026, at 05:33, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I got an off-list report that a query like this consumes
>>> an unreasonable amount of memory:
>>> SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM generate_series(1,10000) i),
>>> '$[*] ? (@ < $)');
>
>> This patch looks like a big win. It not only saves memory, but also makes the query much faster.
>
> Thanks for looking at it!
>
>> I tested the query on my MacBook M4, increasing the iteration count from 10000 to 50000.
>
> Hmm, at that point you were probably mostly measuring how fast the
> machine could swap :-(.  I had been testing on Linux x86_64, but when
> I tried it on macOS just now, the unpatched memory consumption seemed
> even worse than on Linux.

My MacBook has only 32GB of RAM, while the query needed far more than that, so the system spent a lot of time swapping
andthe CPUs could not run at full speed. That really highlights how valuable the memory savings from this patch are. 

Actually, I also tested with 10000, where memory pressure was not severe enough to overwhelm RAM, and the patched
versionwas still faster there. 

On master:
```
evantest=# SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM generate_series(1,10000) i), '$[*] ? (@ < $)');
Time: 1272.900 ms (00:01.273)
evantest=# SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM generate_series(1,10000) i), '$[*] ? (@ < $)');
Time: 1265.017 ms (00:01.265)
evantest=# SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM generate_series(1,10000) i), '$[*] ? (@ < $)');
Time: 1160.833 ms (00:01.161)
```

Patched:
```
evantest=# SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM generate_series(1,10000) i), '$[*] ? (@ < $)');
Time: 794.270 ms
evantest=# SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM generate_series(1,10000) i), '$[*] ? (@ < $)');
Time: 779.556 ms
evantest=# SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM generate_series(1,10000) i), '$[*] ? (@ < $)');
Time: 769.634 ms
```

>
>> After reviewing the patch, I thought JsonValueListLength() might be
>> worth optimizing, since it is O(n).
>
> Well, it's really O(log N) given that we doubled the chunk size each
> time we added a chunk.  I had also thought about tracking the total
> number of items, but concluded that incrementing an additional counter
> during each item addition couldn't possibly get repaid given that
> amortization and the fact that we don't compute the list length more
> than once in any operation.

When I said O(n), I was treating “n” as the length of the chunk list, whereas you were treating “n” as the total number
ofvalues. Yes, I agree your interpretation makes more sense here. 

>
> However, looking at it again today, I realized that no caller actually
> needs to know the total length of a long JsonValueList.  They only
> really care whether the list has zero, one, or more than one member.
> And we can determine that just by looking at the base chunk.

I had noticed that the total count wasn’t really used either, but since you added it, I assumed you had some possible
futureuse for it, so I didn’t mention it. 

>
> Hence, v2 attached.  0001 is the same as before, and 0002 removes
> JsonValueListLength in favor of some constant-time tests.  (I'd
> plan to squash these to one commit in the end, but it seemed easier
> to review if I present the delta-from-yesterday separately.)
>
> BTW, I don't love the function name JsonValueListIsMultiple, but if
> there's a common term analogous to "singleton" but describing sets
> with more than one member, I don't know it.
>

Maybe JsonValueListHasMultiple?

If it were me, I might name them like this:

* JsonValueListIsEmpty
* JsonValueListHasOneItem
* JsonValueListHasMultipleItems

Those may not look as elegant, but at least to me, as a non-English speaker, they feel more direct and easier to
understand.

One nitpick on 0002 is that, these three functions don’t modify jvl, so the parameter could be made const.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/







pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Feature freeze timezone change request
Next
From: Henson Choi
Date:
Subject: Re: Row pattern recognition