Thread: Removed extra memory allocations from create_list_bounds
Hi,
While working on [1], I observed that the 'create_list_bounds' function makes
extra memory allocations that can be avoided. The attached patch removes those
extra allocations and also removes the unused variable 'cell'.
In the existing code, in create_list_bounds():
- It iterates through all the partitions and, for each partition, iterates
through the list of datums named 'listdatums'. For each non-null value in
'listdatums', it allocates memory for 'list_value' (of type
'PartitionListValue'), stores the value and index information, and appends
'list_value' to a list named 'non_null_values'.
- It allocates memory for the 'all_values' array, which holds the list bounds
of all the partitions. The size allocated for 'all_values' is the total number
of non-null values counted in the previous step.
- It iterates through each item of the 'non_null_values' list and, for each
one, allocates memory for 'all_values[i]' (of type 'PartitionListValue') and
copies the information from 'list_value'.
The above logic is changed to the following:
- Call the function 'get_non_null_count_list_bounds()', which iterates through
all the partitions and, for each partition, iterates through its list of
datums and counts the non-null bound values.
- Allocate memory for the 'all_values' array, which holds the list bounds of
all the partitions. The size allocated for 'all_values' is the total number of
non-null values counted in the previous step.
- Iterate through all the partitions and, for each partition, iterate through
the list of datums named 'listdatums'. For each non-null value in
'listdatums', allocate memory for 'all_values[i]' (of type
'PartitionListValue') and store the value and index information directly.
The above fix removes the extra memory allocations. Let's consider an example:
10 partitions, each containing 11 bounds including a NULL value.
Parameters                                 | Existing code   | With patch      |
-------------------------------------------|-----------------|-----------------|
Memory allocations of 'PartitionListValue' | 100 + 100 = 200 | 100             |
Total number of iterations                 | 110 + 100 = 210 | 110 + 110 = 220 |
As we can see in the above data, the total number of iterations increases
slightly (only when NULL values are present; otherwise there is no change),
but the number of memory allocations improves. As memory allocations are
costly operations, I feel we should consider changing the existing code.
Please share your thoughts.
Thanks & Regards,
Nitin Jadhav
Attachment
Attachment
- 0001-Removed-extra-memory-allocations-from-create_list_bo.patch
- 0002-allocate-the-PartitionListValue-as-a-single-chunk.patch
- 0003-Same-for-create_hash_bounds-PartitionHashBound.patch
- 0004-Allocate-datum-arrays-in-bulk-to-avoid-palloc-overhe.patch
- 0005-create_range_bounds-pfree-intermediate-results.patch
On Sun, May 16, 2021 at 10:00 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Sat, May 15, 2021 at 02:40:45PM +0530, Nitin Jadhav wrote:
> While working on [1], I observed that extra memory is allocated in
> [1] https://mail.google.com/mail/u/2/#search/multi+column+list/KtbxLxgZZTjRxNrBWvmHzDTHXCHLssSprg?compose=CllgCHrjDqKgWCBNMmLqhzKhmrvHhSRlRVZxPCVcLkLmFQwrccpTpqLNgbWqKkTkTFCHMtZjWnV
This is a link to your gmail, not to anything public.
If it's worth counting list elements in advance, then you can also allocate the
PartitionListValue as a single chunk, rather than palloc in a loop.
This may help large partition hierarchies.
And the same thing in create_hash_bounds with hbounds.
create_range_bounds() already doesn't call palloc in a loop. However, there's
then an asymmetry in create_range_bounds, which still takes a
double-indirect pointer.
I'm not able to detect that this saves more than about 1% of RAM when creating
or selecting from 1000 partitions, probably because other data structures use
much more, and the savings here are relatively small.
I'm going to add to the next CF. You can add yourself as an author, and watch
that the patch passes tests in cfbot.
https://commitfest.postgresql.org/
http://cfbot.cputube.org/
Thanks,
--
Justin
Hi,
For 0001-Removed-extra-memory-allocations-from-create_list_bo.patch :
+static int
+get_non_null_count_list_bounds(PartitionBoundSpec **boundspecs, int nparts)
Since the function returns the total number of non null bound values, should it be named get_non_null_list_bounds_count?
It can also be named get_count_of_... but that's longer.
+ all_values = (PartitionListValue **)
+ palloc(ndatums * sizeof(PartitionListValue *));
The palloc() call would take place even if ndatums is 0. I think in that case, palloc() doesn't need to be called.
Cheers
Attachment
On Mon, May 17, 2021 at 7:52 AM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote:
> > While working on [1], I observed that extra memory is allocated in
> > [1] https://mail.google.com/mail/u/2/#search/multi+column+list/KtbxLxgZZTjRxNrBWvmHzDTHXCHLssSprg?compose=CllgCHrjDqKgWCBNMmLqhzKhmrvHhSRlRVZxPCVcLkLmFQwrccpTpqLNgbWqKkTkTFCHMtZjWnV
I am really sorry for this. I wanted to point to the thread with the subject
'Multi-Column List Partitioning'.
> If it's worth counting list elements in advance, then you can also allocate the
> PartitionListValue as a single chunk, rather than palloc in a loop.
> This may help large partition heirarchies.
>
> And the same thing in create_hash_bounds with hbounds.
I agree, and thanks for creating those patches. However, I am not able to
apply the patches on the latest HEAD. Kindly check and upload the modified
patches.
> I'm not able to detect that this is saving more than about ~1% less RAM, to
> create or select from 1000 partitions, probably because other data structures
> are using much more, and savings here are relatively small.
Yes, it does not save a huge amount of memory, but it's still an improvement.
> I'm going to add to the next CF. You can add yourself as an author, and watch
> that the patch passes tests in cfbot.
> https://commitfest.postgresql.org/
> http://cfbot.cputube.org/
Thanks for creating the commitfest entry.
> Since the function returns the total number of non null bound values, should it be named get_non_null_list_bounds_count ?
> It can also be named get_count_of_... but that's longer.
Changed it to 'get_non_null_list_bounds_count'.
> The palloc() call would take place even if ndatums is 0. I think in that case, palloc() doesn't need to be called.
I believe there is no case where 'ndatums' is 0 because, as shown below, there
is an assert in the 'partition_bounds_create' function, from which the
'create_list_bounds' function is called. Kindly provide such a case if I am
wrong.
PartitionBoundInfo
partition_bounds_create(PartitionBoundSpec **boundspecs, int nparts,
PartitionKey key, int **mapping)
{
int i;
Assert(nparts > 0);
Hi,
Thanks for pointing out the assertion.
My corresponding comment can be dropped.
Cheers
Thanks & Regards,
Nitin Jadhav
On Sun, May 16, 2021 at 10:52 PM Zhihong Yu <zyu@yugabyte.com> wrote:
>
>
>
> On Sun, May 16, 2021 at 10:00 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
>>
>> On Sat, May 15, 2021 at 02:40:45PM +0530, Nitin Jadhav wrote:
>> > While working on [1], I observed that extra memory is allocated in
>> > [1] https://mail.google.com/mail/u/2/#search/multi+column+list/KtbxLxgZZTjRxNrBWvmHzDTHXCHLssSprg?compose=CllgCHrjDqKgWCBNMmLqhzKhmrvHhSRlRVZxPCVcLkLmFQwrccpTpqLNgbWqKkTkTFCHMtZjWnV
>>
>> This is a link to your gmail, not to anything public.
>>
>> If it's worth counting list elements in advance, then you can also allocate the
>> PartitionListValue as a single chunk, rather than palloc in a loop.
>> This may help large partition heirarchies.
>>
>> And the same thing in create_hash_bounds with hbounds.
>>
>> create_range_bounds() already doesn't call palloc in a loop. However, then
>> there's an asymmetry in create_range_bounds, which is still takes a
>> double-indirect pointer.
>>
>> I'm not able to detect that this is saving more than about ~1% less RAM, to
>> create or select from 1000 partitions, probably because other data structures
>> are using much more, and savings here are relatively small.
>>
>> I'm going to add to the next CF. You can add yourself as an author, and watch
>> that the patch passes tests in cfbot.
>> https://commitfest.postgresql.org/
>> http://cfbot.cputube.org/
>>
>> Thanks,
>> --
>> Justin
>
> Hi,
> For 0001-Removed-extra-memory-allocations-from-create_list_bo.patch :
>
> +static int
> +get_non_null_count_list_bounds(PartitionBoundSpec **boundspecs, int nparts)
>
> Since the function returns the total number of non null bound values, should it be named get_non_null_list_bounds_count ?
> It can also be named get_count_of_... but that's longer.
>
> + all_values = (PartitionListValue **)
> + palloc(ndatums * sizeof(PartitionListValue *));
>
> The palloc() call would take place even if ndatums is 0. I think in that case, palloc() doesn't need to be called.
>
> Cheers
>
On Mon, May 17, 2021 at 08:22:25PM +0530, Nitin Jadhav wrote:
> I agree and thanks for creating those patches. I am not able to apply
> the patch on the latest HEAD. Kindly check and upload the modified
> patches.

The CFBOT had no issues with the patches, so I suspect an issue on your side.
http://cfbot.cputube.org/nitin-jadhav.html

--
Justin
> The CFBOT had no issues with the patches, so I suspect an issue on your side.
> http://cfbot.cputube.org/nitin-jadhav.html

I am getting the following error when I try to apply it on my machine.

$ git apply ../patches/0001-Removed-extra-memory-allocations-from-create_list_bo.patch
../patches/0001-Removed-extra-memory-allocations-from-create_list_bo.patch:18: trailing whitespace.
/*
../patches/0001-Removed-extra-memory-allocations-from-create_list_bo.patch:19: trailing whitespace.
* get_non_null_count_list_bounds
../patches/0001-Removed-extra-memory-allocations-from-create_list_bo.patch:20: trailing whitespace.
* Calculates the total number of non null bound values of
../patches/0001-Removed-extra-memory-allocations-from-create_list_bo.patch:21: trailing whitespace.
* all the partitions.
../patches/0001-Removed-extra-memory-allocations-from-create_list_bo.patch:22: trailing whitespace.
*/
error: patch failed: src/backend/partitioning/partbounds.c:432
error: src/backend/partitioning/partbounds.c: patch does not apply

However, I was able to apply it by adding '--reject --whitespace=fix'.

$ git apply --reject --whitespace=fix ../patches/0001-Removed-extra-memory-allocations-from-create_list_bo.patch
../patches/0001-Removed-extra-memory-allocations-from-create_list_bo.patch:18: trailing whitespace.
/*
../patches/0001-Removed-extra-memory-allocations-from-create_list_bo.patch:19: trailing whitespace.
* get_non_null_count_list_bounds
../patches/0001-Removed-extra-memory-allocations-from-create_list_bo.patch:20: trailing whitespace.
* Calculates the total number of non null bound values of
../patches/0001-Removed-extra-memory-allocations-from-create_list_bo.patch:21: trailing whitespace.
* all the partitions.
../patches/0001-Removed-extra-memory-allocations-from-create_list_bo.patch:22: trailing whitespace.
*/
Checking patch src/backend/partitioning/partbounds.c...
Applied patch src/backend/partitioning/partbounds.c cleanly.
warning: squelched 30 whitespace errors
warning: 35 lines add whitespace errors.

I have rebased all the patches on top of
'v2_0001-removed_extra_mem_alloc_from_create_list_bounds.patch'.
Attaching all the patches here.

--
Thanks & Regards,
Nitin Jadhav

On Mon, May 17, 2021 at 8:33 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Mon, May 17, 2021 at 08:22:25PM +0530, Nitin Jadhav wrote:
> > I agree and thanks for creating those patches. I am not able to apply
> > the patch on the latest HEAD. Kindly check and upload the modified
> > patches.
>
> The CFBOT had no issues with the patches, so I suspect an issue on your side.
> http://cfbot.cputube.org/nitin-jadhav.html
>
> --
> Justin
Attachment
- v2_0001-removed_extra_mem_alloc_from_create_list_bounds.patch
- v2_0003-Same-for-create_hash_bounds-PartitionHashBound.patch
- v2_0004-Allocate-datum-arrays-in-bulk-to-avoid-palloc-overhead.patch
- v2_0005-create_range_bounds-pfree-intermediate-results.patch
- v2_0002-allocate-the-PartitionListValue-as-a-single-chunk.patch
On Tue, May 18, 2021 at 1:29 PM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote:
> > The CFBOT had no issues with the patches, so I suspect an issue on your side.
> > http://cfbot.cputube.org/nitin-jadhav.html
>
> I am getting the following error when I try to apply in my machine.
>
> $ git apply ../patches/0001-Removed-extra-memory-allocations-from-create_list_bo.patch
> ../patches/0001-Removed-extra-memory-allocations-from-create_list_bo.patch:18:
> trailing whitespace.

'git apply' is very picky. Use 'patch -p1' to apply your patches instead.

Also, use 'git diff --check' or 'git log --check' before generating
patches to send, and fix any whitespace errors before submitting.

I see that you have made a theoretical argument for why this should be
good for performance, but it would be better to have some test results
that show that it works out in practice. Sometimes things seem like
they ought to be more efficient but turn out to be less efficient when
they are actually tried.

--
Robert Haas
EDB: http://www.enterprisedb.com
> 'git apply' is very picky. Use 'patch -p1' to apply your patches instead.
>
> Also, use 'git diff --check' or 'git log --check' before generating
> patches to send, and fix any whitespace errors before submitting.

Thanks for the suggestion. I will follow these.

> I see that you have made a theoretical argument for why this should be
> good for performance, but it would be better to have some test results
> that show that it works out in practice. Sometimes things seem like
> they ought to be more efficient but turn out to be less efficient when
> they are actually tried.

Created a table with one column of type 'int' and partitioned by that
column. Created 1 million partitions using the following queries.

create table t(a int) partition by list(a);
select 'create table t_' || i || ' partition of t for values in (' || i || ');'
from generate_series(1, 10000) i
\gexec

After this I tried to create 10 more partitions and observed the time taken
to execute. Here is the data for 'with patch'.

postgres@34077=#select 'create table t_' || i || ' partition of t for
postgres'# values in (' || i || ');'
postgres-# from generate_series(10001, 10010) i
postgres-# \gexec
CREATE TABLE
Time: 36.863 ms
CREATE TABLE
Time: 46.645 ms
CREATE TABLE
Time: 44.915 ms
CREATE TABLE
Time: 39.660 ms
CREATE TABLE
Time: 42.188 ms
CREATE TABLE
Time: 43.163 ms
CREATE TABLE
Time: 44.374 ms
CREATE TABLE
Time: 45.117 ms
CREATE TABLE
Time: 40.340 ms
CREATE TABLE
Time: 38.604 ms

The data for 'without patch' looks like this.

postgres@31718=#select 'create table t_' || i || ' partition of t for
postgres'# values in (' || i || ');'
postgres-# from generate_series(10001, 10010) i
postgres-# \gexec
CREATE TABLE
Time: 45.917 ms
CREATE TABLE
Time: 46.815 ms
CREATE TABLE
Time: 44.180 ms
CREATE TABLE
Time: 48.163 ms
CREATE TABLE
Time: 45.884 ms
CREATE TABLE
Time: 48.330 ms
CREATE TABLE
Time: 48.614 ms
CREATE TABLE
Time: 48.376 ms
CREATE TABLE
Time: 46.380 ms
CREATE TABLE
Time: 48.233 ms

If we observe the above data, we can see the improvement with the patch.
The average time taken to create the last 10 partitions is:
With patch - 42.1869 ms
Without patch - 47.0892 ms

With respect to memory usage, AFAIK the allocated memory gets cleaned up
when the surrounding memory context is deallocated. So at the end of the
query we may see no difference in memory usage, but during query execution
less memory is requested than before. In some worst-case scenario where
little memory is available, we might see 'out of memory' errors without the
patch while it works with the patch. I have not done experiments from this
angle; I am happy to do so if required.

Please share your thoughts.

--
Thanks & Regards,
Nitin Jadhav

On Tue, May 18, 2021 at 11:19 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, May 18, 2021 at 1:29 PM Nitin Jadhav
> <nitinjadhavpostgres@gmail.com> wrote:
> > > The CFBOT had no issues with the patches, so I suspect an issue on your side.
> > > http://cfbot.cputube.org/nitin-jadhav.html
> >
> > I am getting the following error when I try to apply in my machine.
> >
> > $ git apply ../patches/0001-Removed-extra-memory-allocations-from-create_list_bo.patch
> > ../patches/0001-Removed-extra-memory-allocations-from-create_list_bo.patch:18:
> > trailing whitespace.
>
> 'git apply' is very picky. Use 'patch -p1' to apply your patches instead.
>
> Also, use 'git diff --check' or 'git log --check' before generating
> patches to send, and fix any whitespace errors before submitting.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com
> Created a table with one column of type 'int' and partitioned by that
> column. Created 1 million partitions using following queries.

Sorry, it's not 1 million. It's 10,000 partitions.

--
Thanks & Regards,
Nitin Jadhav

On Thu, May 20, 2021 at 12:21 AM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote:
> I see this as a code cleanup more than an performance optimization.
I agree with this. It is primarily a code cleanup, but it also improves
performance. I have done the performance testing just to confirm that it
really improves performance.
> I think some of my patches could *increase* memory use, due to power-of-two
> allocation logic. I think it's still a good idea, since it doesn't seem to be
> the dominant memory allocation.
I don't think that it will; rather, it adds to the improvement.
> I think you should be sure to do this within a transaction, without cassert,
> and maybe with fsync=off full_page_writes=off
Thanks for sharing this. I have done the above settings and collected the
below data.
> It'd be interesting to know which patch(es) contributed to the improvement.
> It's possible that (say) patch 0001 alone gives all the gain, or that 0002
> diminishes the gains.
>
> I think there'll be an interest in committing the smallest possible patch to
> realize the gains, to minimize code churn an unrelated changes.
In order to answer the above points, I have divided the patches into 2 sets:
1. Only 0001 and 0002 - these are related to list partitioning and do not
contain changes related to the power-of-two allocation logic.
2. All the 5 patches.
I have used the same testing procedure as explained in the previous mail.
Please find the timing information (in ms) of the last 10 partitioned-table
creations below.
Without patch | With 0001 and 0002 | With all patches |
--------------|--------------------|------------------|
17.105        | 14.722             | 13.878           |
15.897        | 14.427             | 13.493           |
15.991        | 15.424             | 14.751           |
17.965        | 16.487             | 19.491           |
19.704        | 19.042             | 21.278           |
18.98         | 18.949             | 18.123           |
18.986        | 21.713             | 17.585           |
21.273        | 20.596             | 19.623           |
18.839        | 18.521             | 17.605           |
20.724        | 18.774             | 19.242           |
--------------|--------------------|------------------|
Avg: 18.5464  | 17.8655            | 17.5069          |
As we can see in the above data, there is an improvement with both of the
patch sets.
> You can check MAXRSS (at least on linux) if you enable log_executor_stats,
> like:
>
> \set QUIET \\ SET client_min_messages=debug; SET log_executor_stats=on; DROP TABLE p; CREATE TABLE p(i int) PARTITION BY LIST(i); CREATE TABLE pd PARTITION OF p DEFAULT;
> SELECT format('CREATE TABLE p%s PARTITION OF p FOR VALUES IN (%s)', a,a) FROM generate_series(1,999)a;\gexec \\ SELECT;
Thanks a lot for sharing this information. It was really helpful.
I have collected the stat information for creation of 1000
partitions. Please find the stat information for the 'without patch'
case below.
LOG: EXECUTOR STATISTICS
DETAIL: ! system usage stats:
! 0.000012 s user, 0.000000 s system, 0.000011 s elapsed
! [71.599426 s user, 4.362552 s system total]
! 63872 kB max resident size
! 0/0 [0/231096] filesystem blocks in/out
! 0/0 [0/2388074] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
! 0/0 [9982/6709] voluntary/involuntary context switches
Please find the stat information for 'with patch (all 5 patches)' case below.
LOG: EXECUTOR STATISTICS
DETAIL: ! system usage stats:
! 0.000018 s user, 0.000002 s system, 0.000013 s elapsed
! [73.529715 s user, 4.219172 s system total]
! 63152 kB max resident size
! 0/0 [0/204840] filesystem blocks in/out
! 0/0 [0/2066377] page faults/reclaims, 0 [0] swaps
! 0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
! 0/0 [9956/4129] voluntary/involuntary context switches
Please share your thoughts.
--
Thanks & Regards,
Nitin Jadhav
On Thu, May 20, 2021 at 12:47 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Tue, May 18, 2021 at 01:49:12PM -0400, Robert Haas wrote:
> I see that you have made a theoretical argument for why this should be
> good for performance, but it would be better to have some test results
> that show that it works out in practice. Sometimes things seem like
> they ought to be more efficient but turn out to be less efficient when
> they are actually tried.
I see this as a code cleanup more than a performance optimization.
I couldn't see a measurable difference in my tests, involving CREATE TABLE and
SELECT.
I think some of my patches could *increase* memory use, due to power-of-two
allocation logic. I think it's still a good idea, since it doesn't seem to be
the dominant memory allocation.
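To see why combining many small allocations into one larger one can sometimes cost more, note that allocators with power-of-two size classes (PostgreSQL's aset.c behaves this way for small chunks) round each request up. A minimal sketch of that rounding; the helper is ours, purely for illustration, not the real allocator code:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Round a request up to the next power of two, roughly what a
 * size-class allocator does internally for small chunks.
 * Illustrative only -- not PostgreSQL's actual implementation.
 */
static size_t
next_pow2(size_t n)
{
    size_t p = 1;

    while (p < n)
        p <<= 1;                /* double until we cover the request */
    return p;
}
```

For example, 1000 separate 24-byte requests each round up to 32 bytes (32000 bytes total, plus per-chunk overhead), while one combined 24000-byte request rounds up to 32768 bytes; the combined form can waste nearly 2x when the total lands just past a power of two, though as said above neither effect seems to be the dominant allocation here.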
On Thu, May 20, 2021 at 12:21:19AM +0530, Nitin Jadhav wrote:
> > I see that you have made a theoretical argument for why this should be
> > good for performance, but it would be better to have some test results
> > that show that it works out in practice. Sometimes things seem like
> > they ought to be more efficient but turn out to be less efficient when
> > they are actually tried.
>
> After this I tried to create 10 partitions and observed the time taken
> to execute. Here is the data for 'with patch'.
>
> postgres@34077=#select 'create table t_' || i || ' partition of t for
> postgres'# values in (' || i || ');'
> postgres-# from generate_series(10001, 10010) i
> postgres-# \gexec
I think you should be sure to do this within a transaction, without cassert,
and maybe with fsync=off full_page_writes=off.
> If we observe above data, we can see the improvement with the patch.
> The average time taken to execute for the last 10 partitions is.
It'd be interesting to know which patch(es) contributed to the improvement.
It's possible that (say) patch 0001 alone gives all the gain, or that 0002
diminishes the gains.
I think there'll be an interest in committing the smallest possible patch to
realize the gains, to minimize code churn and unrelated changes.
LIST and RANGE might need to be checked separately..
> With respect to memory usage, AFAIK the allocated memory gets cleaned
> during deallocation of the memory used by the memory context. So at
> the end of the query, we may see no difference in the memory usage but
> during the query execution it tries to get less memory than before.
You can check MAXRSS (at least on linux) if you enable log_executor_stats,
like:
\set QUIET \\ SET client_min_messages=debug; SET log_executor_stats=on; DROP TABLE p; CREATE TABLE p(i int) PARTITION BY LIST(i); CREATE TABLE pd PARTITION OF p DEFAULT;
SELECT format('CREATE TABLE p%s PARTITION OF p FOR VALUES IN (%s)', a,a) FROM generate_series(1,999)a;\gexec \\ SELECT;
--
Justin
> > I think some of my patches could *increase* memory use, due to power-of-two
> > allocation logic. I think it's still a good idea, since it doesn't seem to be
> > the dominant memory allocation.
>
> I don't think that it will increase performance; rather, it adds to the improvement.
Sorry. Kindly ignore the above comment. I had misunderstood the statement.
Thanks & Regards,
Nitin Jadhav
On Sun, May 23, 2021 at 10:40 PM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote:
> I see this as a code cleanup more than a performance optimization.
I agree with this. This is like a code cleanup, but it also improves
performance. I have done the performance testing just to confirm whether it
really improves performance.
> I think some of my patches could *increase* memory use, due to power-of-two
> allocation logic. I think it's still a good idea, since it doesn't seem to be
> the dominant memory allocation.
I don't think that it will increase performance; rather, it adds to the improvement.
> I think you should be sure to do this within a transaction, without cassert,
> and maybe with fsync=off full_page_writes=off
Thanks for sharing this. I have applied the above settings and collected the
below data.
> It'd be interesting to know which patch(es) contributed to the improvement.
> It's possible that (say) patch 0001 alone gives all the gain, or that 0002
> diminishes the gains.
>
> I think there'll be an interest in committing the smallest possible patch to
> realize the gains, to minimize code churn and unrelated changes.
In order to answer the above points, I have divided the patches into 2 sets.
1. Only 0001 and 0002 - These are related to list partitioning and do not
contain changes related to the power-of-two allocation logic.
2. This contains all the 5 patches.
I have used the same testing procedure as explained in the previous mail.
Please find the timing information of the last 10 creations of partitioned
tables below.

Without patch    With 0001 and 0002    With all patches
17.105           14.722                13.878
15.897           14.427                13.493
15.991           15.424                14.751
17.965           16.487                19.491
19.704           19.042                21.278
18.98            18.949                18.123
18.986           21.713                17.585
21.273           20.596                19.623
18.839           18.521                17.605
20.724           18.774                19.242
18.5464          17.8655               17.5069   (averages of the rows above)
> You checked LIST but not HASH (patches 3-4) or RANGE (patches 4-5), right?

Yes. I did not check the HASH and RANGE partitioning related patches, as
those changes are mostly similar to the list partitioning related changes.

> Another test is to show the time/memory used by SELECT. That's far more
> important than DDL, but I think the same results would apply here, so I think
> it's not needed to test each of LIST/RANGE/HASH, nor to test every combination
> of patches.

Yes. I also feel that the same result would apply there as well.

> Note that for the MAXRSS test, you must use a different postgres backend
> process for each of the tests (or else each test would never show a lower
> number than the previous test).

I have used different backend processes for each of the tests.

> Mostly it's nice to see if the memory use is more visibly different, or if
> there's an impressive improvement for this case.

I did not get this point. Kindly explain for which scenario the memory
usage test has to be done.

Thanks & Regards,
Nitin Jadhav

On Sun, May 23, 2021 at 11:16 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> On Sun, May 23, 2021 at 10:40:16PM +0530, Nitin Jadhav wrote:
> > I have used the same testing procedure as explained in the previous mail.
> > Please find the timing information of the last 10 creations of partitioned
> > tables as given below.
>
> > Without patch    With 0001 and 0002    With all patches
> > ...
> > 18.5464          17.8655               17.5069
>
> For anyone reading non-HTML email, the last line shows the averages of the
> previous 10 lines.
>
> > > LIST and RANGE might need to be checked separately..
>
> You checked LIST but not HASH (patches 3-4) or RANGE (patches 4-5), right?
>
> Another test is to show the time/memory used by SELECT. That's far more
> important than DDL, but I think the same results would apply here, so I
> think it's not needed to test each of LIST/RANGE/HASH, nor to test every
> combination of patches.
>
> Mostly it's nice to see if the memory use is more visibly different, or if
> there's an impressive improvement for this case.
>
> Note that for the MAXRSS test, you must use a different postgres backend
> process for each of the tests (or else each test would never show a lower
> number than the previous test).
>
> Thanks,
> --
> Justin
On Wed, 19 May 2021 at 05:28, Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote:
> I have rebased all the patches on top of
> 'v2_0001-removed_extra_mem_alloc_from_create_list_bounds.patch'.
> Attaching all the patches here.

I had a look over these and I think what's being done here is fine. I think
this will help speed up building the partition bounds. Unfortunately, it
won't help any for speeding up things like finding the correct partition
during SELECT or DML on partitioned tables. The reason for this is that
RelationBuildPartitionDesc first builds the partition bounds using the
functions being modified here, but it then copies them into the relcache
into a memory context for the partition using partition_bounds_copy(). It
looks like partition_bounds_copy() suffers from the same palloc explosion
type problem as is being fixed in each of the create_*_bounds() functions
here. The good news is, we can just give partition_bounds_copy() the same
treatment. 0004 does that.

I tried to see if the better cache locality of the PartitionBoundInfo's
datum array would help speed up inserts into a partitioned table. I figured
a fairly small binary search in a LIST partitioned table of 10 partitions
might have all the Datums visited on the same cache line. However, I was
unable to see any performance gains. I think the other work being done is
likely just going to drown out any gains in cache efficiency in the binary
search. COPY into a partitioned table might have more chance of becoming a
little faster, but I didn't try.

I've attached another set of patches. I squashed all the changes to each
create_*_bounds function into a patch of their own. Perhaps 0002 and 0003,
which handle range and hash partitioning, can be one patch since Justin
seemed to write that one. I kept 0001 separate since that's Nitin's patch
plus Justin's extra parts. It seems easier to credit properly having the
patches broken out like this. I think it's excessive to break down 0001
into Nitin's and Justin's individual parts.

I did make a few adjustments to the patches, renaming a variable or two,
and I changed how we assign the boundinfo->datums[i] pointers to take the
address of the Nth element rather than incrementing the variable pointing
to the array for each item. I personally like p = &array[i]; more than
p = array; array++; but others may not feel the same.

Nitin and Justin, are you both able to have another look over these and let
me know what you think? If all is well I'd like to push all 4 patches.

David
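The allocation pattern under discussion (one array for all the values, with each datums[i] pointer taking the address of the i'th slot instead of a per-element palloc) can be sketched roughly like this; the types and names are simplified stand-ins for the PostgreSQL ones, not the actual code:

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for PartitionListValue. */
typedef struct ListValue
{
    int         value;
    int         index;
} ListValue;

/*
 * Build an array of pointers into one contiguous allocation, instead
 * of calling the allocator once per element.  This mirrors the shape
 * of the patch: datums[i] = &array[i] rather than a per-element palloc.
 */
static ListValue **
build_bounds(int n)
{
    ListValue  *all_values = malloc(sizeof(ListValue) * n);
    ListValue **datums = malloc(sizeof(ListValue *) * n);

    for (int i = 0; i < n; i++)
    {
        all_values[i].value = i * 10;   /* stand-in datum */
        all_values[i].index = i;
        datums[i] = &all_values[i];     /* p = &array[i], as preferred above */
    }
    return datums;
}
```

Because all_values is contiguous, adjacent datums[i] pointers differ by exactly sizeof(ListValue), which is where the hoped-for cache locality in the binary search would come from.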
On Tue, Jul 06, 2021 at 01:48:52AM +1200, David Rowley wrote:
> On Wed, 19 May 2021 at 05:28, Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote:
> > I have rebased all the patches on top of
> > 'v2_0001-removed_extra_mem_alloc_from_create_list_bounds.patch'.
> > Attaching all the patches here.
>
> I had a look over these and I think what's being done here is fine.

Thanks for looking.

0001 is missing a newline before create_list_bounds().

0003 is missing pfree(all_bounds), which I had as 0005. It 1) allocates
all_bounds; 2) allocates rbounds; 3) copies all_bounds into rbounds;
4) allocates boundDatums; 5) copies rbounds into boundDatums; 6) frees
rbounds; 7) returns boundInfo with boundinfo->datums.

> The good news is, we can just give partition_bounds_copy() the same
> treatment. 0004 does that.

+1

> I've attached another set of patches. I squashed all the changes to
> each create_*_bounds function into a patch of their own. Perhaps 0002
> and 0003 which handle range and hash partitioning can be one patch
> since Justin seemed to write that one. I kept 0001 separate since
> that's Nitin's patch plus Justin's extra parts. It seems easier to
> credit properly having the patches broken out like this. I think it's
> excessive to break down 0001 into Nitin and Justin's individual parts.

If you wanted to further squish the patches together, I don't mind being a
co-author.

Cheers,
--
Justin
Also, if you're going to remove the initializations here, maybe you'd also
change i and j to C99 "for" declarations, like "for (int i = 0, j = 0; ...)":

-       PartitionListValue **all_values = NULL;
-       ListCell   *cell;
-       int         i = 0;
-       int         ndatums = 0;
+       PartitionListValue *all_values;
+       int         i;
+       int         j;
+       int         ndatums;

Same in get_non_null_list_datum_count().

--
Justin
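As a compilable illustration of the suggested style (the function and its logic here are made up for the example, not taken from the patch):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical counting helper using C99 "for" declarations, i.e.
 * declaring the loop counter in the loop header rather than at the
 * top of the function.
 */
static int
count_nonzero(const int *vals, size_t n)
{
    int         ndatums = 0;

    for (size_t i = 0; i < n; i++)  /* C99-style declaration */
    {
        if (vals[i] != 0)
            ndatums++;
    }
    return ndatums;
}
```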
On Tue, 6 Jul 2021 at 04:45, Justin Pryzby <pryzby@telsasoft.com> wrote:
> If you wanted to further squish the patches together, I don't mind being a
> co-author.

Thanks for looking at the patches. I fixed the couple of things that you
mentioned and pushed all 4 patches as a single commit (53d86957e).

David
On Tue, 6 Jul 2021 at 05:03, Justin Pryzby <pryzby@telsasoft.com> wrote:
> Also, if you're going to remove the initializations here, maybe you'd also
> change i and j to C99 "for" declarations like "for (int i=0, j=0; ...)"
>
> -       PartitionListValue **all_values = NULL;
> -       ListCell   *cell;
> -       int         i = 0;
> -       int         ndatums = 0;
> +       PartitionListValue *all_values;
> +       int         i;
> +       int         j;
> +       int         ndatums;
>
> Same in get_non_null_list_datum_count()

I tend to only get motivated to use that for new code that does not exist
in back-branches. I'll maybe stop doing that when we no longer have to
support the pre-C99 versions of the code.

David