Thread: Re: Increase of maintenance_work_mem limit in 64-bit Windows

Re: Increase of maintenance_work_mem limit in 64-bit Windows

From
David Rowley
Date:
On Fri, 20 Sept 2024 at 01:55, Пополитов Владлен
<v.popolitov@postgrespro.ru> wrote:
> Currently PostgreSQL built on 64-bit Windows has a 2 GB limit for
> GUC variables because sizeof(long)==4 with Windows compilers.
> Technically, 64-bit addressing for maintenance_work_mem is possible,
> but the code base historically uses variables and constants of type
> "long" when processing the maintenance_work_mem value.

I agree. Ideally, we shouldn't use longs for anything ever. We should
probably make a habit of removing them wherever possible.
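
To make the limit from the quoted text concrete, here is a small sketch
of the arithmetic. It assumes a 32-bit signed long, as on 64-bit
Windows, and models it with int32_t limits so the result is the same on
any platform. The function names are made up for illustration and do
not exist in PostgreSQL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Would "kb * 1024L" overflow if long were 32 bits wide?
 * The product is computed in int64_t so the check itself cannot overflow.
 */
bool
kb_to_bytes_overflows_long32(int kb)
{
	int64_t		bytes;

	assert(kb >= 0);			/* memory settings are non-negative */
	bytes = (int64_t) kb * 1024;

	return bytes > INT32_MAX;
}

/* Largest kilobyte value whose byte count still fits in a 32-bit long. */
int
max_kb_for_long32(void)
{
	return INT32_MAX / 1024;	/* 2097151 kB, just under 2 GB */
}
```

With these definitions, 2097151 kB is the last value that converts
safely; 2097152 kB (exactly 2 GB) no longer fits.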

I'd like to suggest you go about this patch slightly differently with
the end goal of removing the limitation from maintenance_work_mem,
work_mem, autovacuum_work_mem and logical_decoding_work_mem.

Patch 0001: Add a macro named something like WORK_MEM_KB_TO_BYTES()
and adjust all places where we do <work_mem_var> * 1024L to use this
new macro. Make the macro do the * 1024L as is done today so that this
patch is a simple refactor.
Patch 0002: Convert all places that use long to use Size instead.
Adjust WORK_MEM_KB_TO_BYTES to use a Size type rather than 1024L.

It might be wise to break 0002 down into individual GUCs as the patch
might become large.
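
A minimal sketch of what the two steps could look like. Only the macro
name comes from the suggestion above; the definitions and the
budget_in_bytes() caller are assumptions for illustration, and size_t
stands in for PostgreSQL's Size typedef.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for PostgreSQL's Size, which is a typedef for size_t. */
typedef size_t Size;

/*
 * Patch 0001 (pure refactor) would keep today's arithmetic:
 *   #define WORK_MEM_KB_TO_BYTES(kb)	((kb) * 1024L)
 *
 * Patch 0002 would do the multiplication in Size, so the result no
 * longer depends on the width of long:
 */
#define WORK_MEM_KB_TO_BYTES(kb)	((Size) (kb) * (Size) 1024)

/* Hypothetical caller: a memory budget in bytes from a value in kB. */
Size
budget_in_bytes(int work_mem_kb)
{
	assert(work_mem_kb >= 0);	/* a negative value would wrap when cast */
	return WORK_MEM_KB_TO_BYTES(work_mem_kb);
}
```

Keeping the first definition in 0001 means that patch changes no
behaviour, so any difference introduced by 0002 is confined to the
macro and the variables it feeds.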

I suspect we might have quite a large number of subtle bugs in our
code today due to using longs. 7340d9362 is an example of one that was
fixed recently.

David



Re: Increase of maintenance_work_mem limit in 64-bit Windows

From
Vladlen Popolitov
Date:
David Rowley wrote on 2024-09-23 04:28:
> On Fri, 20 Sept 2024 at 01:55, Пополитов Владлен
> <v.popolitov@postgrespro.ru> wrote:
>> Currently PostgreSQL built on 64-bit Windows has a 2 GB limit for
>> GUC variables because sizeof(long)==4 with Windows compilers.
>> Technically, 64-bit addressing for maintenance_work_mem is possible,
>> but the code base historically uses variables and constants of type
>> "long" when processing the maintenance_work_mem value.
> 
> I agree. Ideally, we shouldn't use longs for anything ever. We should
> probably make a habit of removing them wherever possible.
> 
> I'd like to suggest you go about this patch slightly differently with
> the end goal of removing the limitation from maintenance_work_mem,
> work_mem, autovacuum_work_mem and logical_decoding_work_mem.
> 
> Patch 0001: Add a macro named something like WORK_MEM_KB_TO_BYTES()
> and adjust all places where we do <work_mem_var> * 1024L to use this
> new macro. Make the macro do the * 1024L as is done today so that this
> patch is a simple refactor.
> Patch 0002: Convert all places that use long to use Size instead.
> Adjust WORK_MEM_KB_TO_BYTES to use a Size type rather than 1024L.
> 
> It might be wise to break 0002 down into individual GUCs as the patch
> might become large.
> 
> I suspect we might have quite a large number of subtle bugs in our
> code today due to using longs. 7340d9362 is an example of one that was
> fixed recently.
> 
> David

Hi David,
Thank you for the proposal. I looked at the patch and source code from
this point of view. With this approach we need to change all
<work_mem_var>. I counted the appearances of these variables in the code:
maintenance_work_mem appears 63 times in 20 files
work_mem appears 113 times in 48 files
logical_decoding_work_mem appears 10 times in 2 files
max_stack_depth appears 11 times in 3 files
wal_keep_size_mb appears 5 times in 3 files
min_wal_size_mb appears 5 times in 2 files
max_wal_size_mb appears 10 times in 2 files
wal_skip_threshold appears 5 times in 2 files
max_slot_wal_keep_size_mb appears 6 times in 3 files
wal_sender_timeout appears 23 times in 3 files
autovacuum_work_mem appears 11 times in 4 files
gin_pending_list_limit appears 8 times in 5 files
pendingListCleanupSize appears 2 times in 2 files
GinGetPendingListCleanupSize appears 2 times in 2 files

maintenance_work_mem appears 63 times and has only 4 cases where "long"
is used (I fix them in the patch). I also found that this patch fixes
autovacuum_work_mem, which has only 1 case, in the same place in the
code as maintenance_work_mem.

Currently the <work_mem_vars> in the code are processed depending on the
context: they are assigned to Size, uint64, int64, double, long and int
variables (the last two cases need fixing) or multiplied by
(uint64)1024, (Size)1024 or 1024L (the last case needs fixing). Also, a
signed value is used for max_stack_depth (-1 is used as an error value).
I am not sure that we can solve all these cases with one macro
WORK_MEM_KB_TO_BYTES(). The code needs a case-by-case check.

If I check the rest of the variables, the patch does not need the
MAX_SIZE_T_KILOBYTES constant (I introduced it for variables that are
already checked and fixed); it will contain only fixes to the types of
the variables and the constants.
It requires a lot of time to check all appearances and neighbouring
code, but the final patch will not be large. I do not expect a lot of
"long" in the rest of the code (only 4 cases out of 63 needed fixing
for maintenance_work_mem).
What do you think about this approach?

-- 
Best regards,

Vladlen Popolitov.



Re: Increase of maintenance_work_mem limit in 64-bit Windows

From
David Rowley
Date:
On Mon, 23 Sept 2024 at 21:01, Vladlen Popolitov
<v.popolitov@postgrespro.ru> wrote:
> Thank you for the proposal. I looked at the patch and source code from
> this point of view. With this approach we need to change all
> <work_mem_var>. I counted the appearances of these variables in the code:
> maintenance_work_mem appears 63 times in 20 files
> work_mem appears 113 times in 48 files
> logical_decoding_work_mem appears 10 times in 2 files
> max_stack_depth appears 11 times in 3 files
> wal_keep_size_mb appears 5 times in 3 files
> min_wal_size_mb appears 5 times in 2 files
> max_wal_size_mb appears 10 times in 2 files
> wal_skip_threshold appears 5 times in 2 files
> max_slot_wal_keep_size_mb appears 6 times in 3 files
> wal_sender_timeout appears 23 times in 3 files
> autovacuum_work_mem appears 11 times in 4 files
> gin_pending_list_limit appears 8 times in 5 files
> pendingListCleanupSize appears 2 times in 2 files
> GinGetPendingListCleanupSize appears 2 times in 2 files

Why do you think all of these appearances matter? I imagined all you
care about is where the values are multiplied by 1024.

> If I check the rest of the variables, the patch does not need the
> MAX_SIZE_T_KILOBYTES constant (I introduced it for variables that are
> already checked and fixed); it will contain only fixes to the types of
> the variables and the constants.
> It requires a lot of time to check all appearances and neighbouring
> code, but the final patch will not be large. I do not expect a lot of
> "long" in the rest of the code (only 4 cases out of 63 needed fixing
> for maintenance_work_mem).
> What do you think about this approach?

I don't think you can do maintenance_work_mem without fixing work_mem
too. I don't think the hacks you've put into RI_Initial_Check() to
ensure you don't try to set work_mem beyond its allowed range are very
good. It effectively means that maintenance_work_mem does not do what
it's meant to for the initial validation of referential integrity
checks. If you're not planning on fixing work_mem too, would you just
propose to leave those hacks in there forever?

David



Re: Increase of maintenance_work_mem limit in 64-bit Windows

From
Vladlen Popolitov
Date:
David Rowley wrote on 2024-09-23 15:35:
> On Mon, 23 Sept 2024 at 21:01, Vladlen Popolitov
> <v.popolitov@postgrespro.ru> wrote:
>> Thank you for the proposal. I looked at the patch and source code from
>> this point of view. With this approach we need to change all
>> <work_mem_var>. I counted the appearances of these variables in the code:
>> maintenance_work_mem appears 63 times in 20 files
>> work_mem appears 113 times in 48 files
>> logical_decoding_work_mem appears 10 times in 2 files
>> max_stack_depth appears 11 times in 3 files
>> wal_keep_size_mb appears 5 times in 3 files
>> min_wal_size_mb appears 5 times in 2 files
>> max_wal_size_mb appears 10 times in 2 files
>> wal_skip_threshold appears 5 times in 2 files
>> max_slot_wal_keep_size_mb appears 6 times in 3 files
>> wal_sender_timeout appears 23 times in 3 files
>> autovacuum_work_mem appears 11 times in 4 files
>> gin_pending_list_limit appears 8 times in 5 files
>> pendingListCleanupSize appears 2 times in 2 files
>> GinGetPendingListCleanupSize appears 2 times in 2 files
> 
> Why do you think all of these appearances matter? I imagined all you
> care about is where the values are multiplied by 1024.
A common pattern in the code is to assign <work_mem_var> to a local
variable, pass the local variable as a parameter to a function, then to
a nested function, and somewhere deep down multiply the function
parameter by 1024. That is why I needed to check all appearances; most
of them are correct.
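
The pattern described here might look like the following sketch. All
function and variable names are hypothetical, not taken from the
PostgreSQL source; the point is only that the multiplication happens
far from the GUC, so looking only at the places where the variable is
multiplied directly would miss it.

```c
#include <assert.h>
#include <stddef.h>

typedef size_t Size;			/* stand-in for PostgreSQL's Size */

/* Stand-in for a GUC such as maintenance_work_mem, in kilobytes. */
int			example_work_mem_kb = 65536;

/* Deepest level: this is where the kB value finally becomes bytes. */
Size
inner_compute_limit(int mem_kb)
{
	assert(mem_kb >= 0);

	/*
	 * Cast before multiplying. "mem_kb * 1024L" would be a 32-bit
	 * product wherever long is 32 bits wide.
	 */
	return (Size) mem_kb * 1024;
}

/* Middle level: only forwards the value, no arithmetic here. */
Size
middle_setup(int mem_kb)
{
	return inner_compute_limit(mem_kb);
}

/* Top level: copies the GUC into a local variable and passes it down. */
Size
outer_build(void)
{
	int			budget_kb = example_work_mem_kb;

	return middle_setup(budget_kb);
}
```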
>> If I check the rest of the variables, the patch does not need the
>> MAX_SIZE_T_KILOBYTES constant (I introduced it for variables that are
>> already checked and fixed); it will contain only fixes to the types of
>> the variables and the constants.
>> It requires a lot of time to check all appearances and neighbouring
>> code, but the final patch will not be large. I do not expect a lot of
>> "long" in the rest of the code (only 4 cases out of 63 needed fixing
>> for maintenance_work_mem).
>> What do you think about this approach?
> 
> I don't think you can do maintenance_work_mem without fixing work_mem
> too. I don't think the hacks you've put into RI_Initial_Check() to
> ensure you don't try to set work_mem beyond its allowed range are very
> good. It effectively means that maintenance_work_mem does not do what
> it's meant to for the initial validation of referential integrity
> checks. If you're not planning on fixing work_mem too, would you just
> propose to leave those hacks in there forever?
I agree, it is better to fix all of them together. I also do not like
this hack; it will be removed from the patch if I check and change
all <work_mem_vars> at once.
I think it will take about 1 week to fix and test all changes. I will
estimate the total volume of the changes and think about how to group
them in the patch (I hope it will be only one patch).

-- 
Best regards,

Vladlen Popolitov.



Re: Increase of maintenance_work_mem limit in 64-bit Windows

From
David Rowley
Date:
On Tue, 24 Sept 2024 at 02:47, Vladlen Popolitov
<v.popolitov@postgrespro.ru> wrote:
> I agree, it is better to fix all of them together. I also do not like
> this hack; it will be removed from the patch if I check and change
> all <work_mem_vars> at once.
> I think it will take about 1 week to fix and test all changes. I will
> estimate the total volume of the changes and think about how to group
> them in the patch (I hope it will be only one patch).

There are a few places that do this:

Size maxBlockSize = ALLOCSET_DEFAULT_MAXSIZE;

/* choose the maxBlockSize to be no larger than 1/16 of work_mem */
while (16 * maxBlockSize > work_mem * 1024L)

I think since maxBlockSize is a Size variable, that the above should
probably be:

while (16 * maxBlockSize > (Size) work_mem * 1024)

Maybe there can be a precursor patch to fix all those to get rid of
the 'L' and cast to the type we're comparing to or assigning to rather
than trying to keep the result of the multiplication as a long.
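
For reference, a self-contained sketch of the loop with the cast
applied. The two ALLOCSET_* values are assumed here to match
PostgreSQL's defaults, and the function name, the halving step and the
lower clamp are assumptions added so the fragment compiles and
terminates sensibly; size_t stands in for Size.

```c
#include <assert.h>
#include <stddef.h>

typedef size_t Size;			/* stand-in for PostgreSQL's Size */

/* Assumed values, chosen to mirror PostgreSQL's defaults. */
#define ALLOCSET_DEFAULT_INITSIZE	(8 * 1024)
#define ALLOCSET_DEFAULT_MAXSIZE	(8 * 1024 * 1024)

/*
 * Choose a maximum block size no larger than 1/16 of work_mem,
 * where work_mem is given in kilobytes.
 */
Size
choose_max_block_size(int work_mem)
{
	Size		maxBlockSize = ALLOCSET_DEFAULT_MAXSIZE;

	assert(work_mem >= 0);

	/* Compare Size with Size: cast the GUC first, no 'L' suffix. */
	while (16 * maxBlockSize > (Size) work_mem * 1024)
		maxBlockSize >>= 1;		/* halving step is an assumption */

	/* Lower bound (an assumption) so a tiny work_mem cannot reach zero. */
	if (maxBlockSize < (Size) ALLOCSET_DEFAULT_INITSIZE)
		maxBlockSize = ALLOCSET_DEFAULT_INITSIZE;

	return maxBlockSize;
}
```

Because both sides of the comparison are now Size, the result no longer
depends on whether long is 32 or 64 bits wide on the build platform.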

David



Re: Increase of maintenance_work_mem limit in 64-bit Windows

From
Vladlen Popolitov
Date:
David Rowley wrote on 2024-09-24 01:07:
> On Tue, 24 Sept 2024 at 02:47, Vladlen Popolitov
> <v.popolitov@postgrespro.ru> wrote:
>> I agree, it is better to fix all of them together. I also do not like
>> this hack; it will be removed from the patch if I check and change
>> all <work_mem_vars> at once.
>> I think it will take about 1 week to fix and test all changes. I will
>> estimate the total volume of the changes and think about how to group
>> them in the patch (I hope it will be only one patch).
> 
> There are a few places that do this:
> 
> Size maxBlockSize = ALLOCSET_DEFAULT_MAXSIZE;
> 
> /* choose the maxBlockSize to be no larger than 1/16 of work_mem */
> while (16 * maxBlockSize > work_mem * 1024L)
> 
> I think since maxBlockSize is a Size variable, that the above should
> probably be:
> 
> while (16 * maxBlockSize > (Size) work_mem * 1024)
> 
> Maybe there can be a precursor patch to fix all those to get rid of
> the 'L' and cast to the type we're comparing to or assigning to rather
> than trying to keep the result of the multiplication as a long.
Yes. That is what I meant when I wrote about the context: in this case
the variable is used in a "Size" context, so the cast to the Size type
should be used. That is why I need to check all places in the code. I am
going to do it this week.

-- 
Best regards,

Vladlen Popolitov.