Re: Optimize planner memory consumption for huge arrays - Mailing list pgsql-hackers
From | Lepikhov Andrei |
---|---|
Subject | Re: Optimize planner memory consumption for huge arrays |
Date | |
Msg-id | 1108a71a-e65e-41a5-81ab-beccc80c3628@app.fastmail.com Whole thread Raw |
In response to | Re: Optimize planner memory consumption for huge arrays (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>) |
Responses |
Re: Optimize planner memory consumption for huge arrays
|
List | pgsql-hackers |
On Wed, Sep 6, 2023, at 8:09 PM, Ashutosh Bapat wrote: > Hi Lepikhov, > > Thanks for using my patch and I am glad that you found it useful. > > On Mon, Sep 4, 2023 at 10:56 AM Lepikhov Andrei > <a.lepikhov@postgrespro.ru> wrote: >> >> Hi, hackers, >> >> Looking at the planner behaviour with the memory consumption patch [1], I figured out that arrays increase memory consumptionby the optimizer significantly. See init.sql in attachment. >> The point here is that the planner does small memory allocations for each element during estimation. As a result, it lookslike the planner consumes about 250 bytes for each integer element. > > I guess the numbers you mentioned in init.sql are total memory used by > the planner (as reported by the patch in the thread) when planning > that query and not memory consumed by Const nodes themselves. Am I > right? I think the measurements need to be explained better and also > the realistic scenario you are trying to oprimize. Yes, it is the total memory consumed by the planner - I used the numbers generated by your patch [1]. I had been increasingthe number of elements in the array to exclude the memory consumed by the planner for other purposes. As you cansee, the array with 1 element consumes 12kB of memory, 1E4 elements - 2.6 MB. All of that memory increment is relatedto the only enlargement of this array. (2600-12)/10 = 260 bytes. So, I make a conclusion: each 4-byte element producesa consumption of 260 bytes of memory. This scenario I obtained from the user complaint - they had strict restrictions on memory usage and were stuck in this unusualmemory usage case. > I guess, the reason you think that partitioning will increase the > memory consumed is because each partition will have the clause > translated for it. Selectivity estimation for each partition will > create those many Const nodes and hence consume memory. Am I right? Yes. > Can you please measure the memory consumed with and without your > patch. Done. See test case and results in 'init_parts.sql' in attachment. Short summary below. I varied a number of elements from1 to 10000 and partitions from 1 to 100. As you can see, partitioning adds a lot of memory consumption by itself. Butwe see an effect from patch also. master: elems 1 1E1 1E2 1E3 1E4 parts 1 28kB 50kB 0.3MB 2.5MB 25MB 10 45kB 143kB 0.6MB 4.8MB 47MB 100 208kB 125kB 3.3MB 27MB 274MB patched: elems 1 1E1 1E2 1E3 1E4 parts 1 28kB 48kB 0.25MB 2.2MB 22.8MB 10 44kB 100kB 313kB 2.4MB 23.7MB 100 208kB 101kB 0.9MB 3.7MB 32.4MB Just for comparison, without partitioning: elems 1 1E1 1E2 1E3 1E4 master: 12kB 14kB 37kB 266kB 2.5MB patched: 12kB 11.5kB 13kB 24kB 141kB >> It is maybe not a problem most of the time. However, in the case of partitions, memory consumption multiplies by eachpartition. Such a corner case looks weird, but the fix is simple. So, why not? > > With vectorized operations becoming a norm these days, it's possible > to have thousands of element in array of an ANY or IN clause. Also > will be common to have thousands of partitions. But I think what we > need to do here is to write a selectivity estimation function which > takes an const array and return selectivity without requiring to > create a Const node for each element. Maybe you're right. Could you show any examples of vectorized usage of postgres to understand your idea more clearly? Here I propose only quick simple solution. I don't think it would change the way of development. >> The diff in the attachment is proof of concept showing how to reduce wasting of memory. Having benchmarked a bit, I didn'tfind any overhead. >> > > You might want to include your benchmarking results as well. Here is nothing interesting. pgbench TPS and planning time for the cases above doesn't change planning time. [1] Report planning memory in EXPLAIN ANALYZE -- Regards, Andrei Lepikhov
Attachment
pgsql-hackers by date: