Re: Avoiding memory leakage in jsonpath evaluation - Mailing list pgsql-hackers
| From | Chao Li |
|---|---|
| Subject | Re: Avoiding memory leakage in jsonpath evaluation |
| Date | |
| Msg-id | 218409A9-A9C9-47A4-9D3C-4D001DA5670A@gmail.com Whole thread Raw |
| In response to | Re: Avoiding memory leakage in jsonpath evaluation (Tom Lane <tgl@sss.pgh.pa.us>) |
| Responses |
Re: Avoiding memory leakage in jsonpath evaluation
|
| List | pgsql-hackers |
> On Mar 19, 2026, at 00:01, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Chao Li <li.evan.chao@gmail.com> writes: >>> On Mar 18, 2026, at 05:33, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> I got an off-list report that a query like this consumes >>> an unreasonable amount of memory: >>> SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM generate_series(1,10000) i), >>> '$[*] ? (@ < $)'); > >> This patch looks like a big win. It not only saves memory, but also makes the query much faster. > > Thanks for looking at it! > >> I tested the query on my MacBook M4, increasing the iteration count from 10000 to 50000. > > Hmm, at that point you were probably mostly measuring how fast the > machine could swap :-(. I had been testing on Linux x86_64, but when > I tried it on macOS just now, the unpatched memory consumption seemed > even worse than on Linux. My MacBook has only 32GB of RAM, while the query needed far more than that, so the system spent a lot of time swapping andthe CPUs could not run at full speed. That really highlights how valuable the memory savings from this patch are. Actually, I also tested with 10000, where memory pressure was not severe enough to overwhelm RAM, and the patched versionwas still faster there. On master: ``` evantest=# SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM generate_series(1,10000) i), '$[*] ? (@ < $)'); Time: 1272.900 ms (00:01.273) evantest=# SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM generate_series(1,10000) i), '$[*] ? (@ < $)'); Time: 1265.017 ms (00:01.265) evantest=# SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM generate_series(1,10000) i), '$[*] ? (@ < $)'); Time: 1160.833 ms (00:01.161) ``` Patched: ``` evantest=# SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM generate_series(1,10000) i), '$[*] ? (@ < $)'); Time: 794.270 ms evantest=# SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM generate_series(1,10000) i), '$[*] ? (@ < $)'); Time: 779.556 ms evantest=# SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM generate_series(1,10000) i), '$[*] ? (@ < $)'); Time: 769.634 ms ``` > >> After reviewing the patch, I thought JsonValueListLength() might be >> worth optimizing, since it is O(n). > > Well, it's really O(log N) given that we doubled the chunk size each > time we added a chunk. I had also thought about tracking the total > number of items, but concluded that incrementing an additional counter > during each item addition couldn't possibly get repaid given that > amortization and the fact that we don't compute the list length more > than once in any operation. When I said O(n), I was treating “n” as the length of the chunk list, whereas you were treating “n” as the total number ofvalues. Yes, I agree your interpretation makes more sense here. > > However, looking at it again today, I realized that no caller actually > needs to know the total length of a long JsonValueList. They only > really care whether the list has zero, one, or more than one member. > And we can determine that just by looking at the base chunk. I had noticed that the total count wasn’t really used either, but since you added it, I assumed you had some possible futureuse for it, so I didn’t mention it. > > Hence, v2 attached. 0001 is the same as before, and 0002 removes > JsonValueListLength in favor of some constant-time tests. (I'd > plan to squash these to one commit in the end, but it seemed easier > to review if I present the delta-from-yesterday separately.) > > BTW, I don't love the function name JsonValueListIsMultiple, but if > there's a common term analogous to "singleton" but describing sets > with more than one member, I don't know it. > Maybe JsonValueListHasMultiple? If it were me, I might name them like this: * JsonValueListIsEmpty * JsonValueListHasOneItem * JsonValueListHasMultipleItems Those may not look as elegant, but at least to me, as a non-English speaker, they feel more direct and easier to understand. One nitpick on 0002 is that, these three functions don’t modify jvl, so the parameter could be made const. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/
pgsql-hackers by date: