Thread: Reducing Memory Consumption (aset and generation)

Reducing Memory Consumption (aset and generation)

From
Ranier Vilela
Date:
HI hackers,

I thought it would be better to start a new thread to discuss.
While working with sorting patch, and read others threads,
I have some ideas to reduces memory consumption by aset and generation
memory modules.

I have done basic benchmarks, and it seems to improve performance.
I think it's really worth it, if it really is possible to reduce memory consumption.

Linux Ubuntu 64 bits
work_mem = 64MB

set max_parallel_workers_per_gather = 0;
create table t (a bigint not null, b bigint not null, c bigint not
null, d bigint not null, e bigint not null, f bigint not null);

insert into t select x,x,x,x,x,x from generate_Series(1,140247142) x; -- 10GB!
vacuum freeze t;

select * from t order by a offset 140247142;

HEAD:
postgres=# select * from t order by a offset 140247142;
 a | b | c | d | e | f
---+---+---+---+---+---
(0 rows)

work_mem=64MB
Time: 99603,544 ms (01:39,604)
Time: 94000,342 ms (01:34,000)

postgres=# set work_mem="64.2MB";
SET
Time: 0,210 ms
postgres=# select * from t order by a offset 140247142;
 a | b | c | d | e | f
---+---+---+---+---+---
(0 rows)

Time: 95306,254 ms (01:35,306)


PATCHED:
postgres=# explain analyze select * from t order by a offset 140247142;
 a | b | c | d | e | f
---+---+---+---+---+---
(0 rows)

work_mem=64MB
Time: 90946,482 ms (01:30,946)

postgres=# set work_mem="64.2MB";
SET
Time: 0,210 ms
postgres=# select * from t order by a offset 140247142;
 a | b | c | d | e | f
---+---+---+---+---+---
(0 rows)

Time: 91817,533 ms (01:31,818)


There is still room for further improvements, and at this point I need help.

Regarding the patches we have:
1) 001-aset-reduces-memory-consumption.patch
Reduces memory used by struct AllocBlockData by minus 8 bits,
reducing the total size to 32 bits, which leads to "fitting" two structs in a 64bit cache.

Move some stores to fields struct, for the order of declaration, within the structures.

Remove tests elog(ERROR, "could not find block containing chunk %p" and
elog(ERROR, "could not find block containing chunk %p", moving them to
MEMORY_CONTEXT_CHECKING context.

Since 8.2 versions, nobody complains about these tests.
But if is not acceptable, have the option (3) 003-aset-reduces-memory-consumption.patch

2) 002-generation-reduces-memory-consumption.patch
Reduces memory used by struct GenerationBlock, by minus 8 bits,
reducing the total size to 32 bits, which leads to "fitting" two structs in a 64bit cache.

Remove all references to the field *block* used by struct GenerationChunk,
enabling its removal! (not done yet).
What would take the final size to 16 bits, which leads to "fitting" four structs in a 64bit cache.
Unfortunately, everything works only for the size 24, see the (4).

Move some stores to fields struct, for the order of declaration, within the structures.

3) 003-aset-reduces-memory-consumption.patch
Same to the (1), but without remove the tests:
elog(ERROR, "could not find block containing chunk %p" and
elog(ERROR, "could not find block containing chunk %p",
But at the cost of removing a one tiny part of the tests.

Since 8.2 versions, nobody complains about these tests.

4) 004-generation-reduces-memory-consumption-BUG.patch
Same to the (2), but with BUG.
It only takes a few tweaks to completely remove the field block.

@@ -117,9 +116,9 @@ struct GenerationChunk
  /* this is zero in a free chunk */
  Size requested_size;
 
-#define GENERATIONCHUNK_RAWSIZE  (SIZEOF_SIZE_T * 2 + SIZEOF_VOID_P * 2)
+#define GENERATIONCHUNK_RAWSIZE  (SIZEOF_SIZE_T * 2 + SIZEOF_VOID_P)
 #else
-#define GENERATIONCHUNK_RAWSIZE  (SIZEOF_SIZE_T + SIZEOF_VOID_P * 2)
+#define GENERATIONCHUNK_RAWSIZE  (SIZEOF_SIZE_T + SIZEOF_VOID_P)
 #endif /* MEMORY_CONTEXT_CHECKING */
 
  /* ensure proper alignment by adding padding if needed */
@@ -127,7 +126,6 @@ struct GenerationChunk
  char padding[MAXIMUM_ALIGNOF - GENERATIONCHUNK_RAWSIZE % MAXIMUM_ALIGNOF];
 #endif
 
- GenerationBlock *block; /* block owning this chunk */
  GenerationContext *context; /* owning context, or NULL if freed chunk */
  /* there must not be any padding to reach a MAXALIGN boundary here! */
};

This fails with make check.
I couldn't figure out why it doesn't work with 16 bits (struct GenerationChunk).


Attachment

Re: Reducing Memory Consumption (aset and generation)

From
David Rowley
Date:
On Mon, 6 Jun 2022 at 07:28, Ranier Vilela <ranier.vf@gmail.com> wrote:
> 4) 004-generation-reduces-memory-consumption-BUG.patch
> Same to the (2), but with BUG.
> It only takes a few tweaks to completely remove the field block.

> This fails with make check.
> I couldn't figure out why it doesn't work with 16 bits (struct GenerationChunk).

I think you're misunderstanding how blocks and chunks work here.  A
block can have many chunks.  You can't find the block that a chunk is
on by subtracting Generation_BLOCKHDRSZ from the pointer given to
GenerationFree(). That would only work if the chunk happened to be the
first chunk on a block. If it's anything apart from that then you'll
be making adjustments to the memory of some prior chunk on the block.
I imagine this is the reason you can't get the tests to pass.

Can you also explain why you think moving code around randomly or
adding unlikely() macros helps reduce the memory consumption overheads
of generation contexts?  I imagine you think that's helping to further
improve performance, but you've not offered any evidence of that
separately from the other changes you've made. If you think those are
useful changes then I recommend you run individual benchmarks and
offer those as proof that those changes are worthwhile.

David



Re: Reducing Memory Consumption (aset and generation)

From
Ranier Vilela
Date:
Em seg., 6 de jun. de 2022 às 20:37, David Rowley <dgrowleyml@gmail.com> escreveu:
On Mon, 6 Jun 2022 at 07:28, Ranier Vilela <ranier.vf@gmail.com> wrote:
> 4) 004-generation-reduces-memory-consumption-BUG.patch
> Same to the (2), but with BUG.
> It only takes a few tweaks to completely remove the field block.

> This fails with make check.
> I couldn't figure out why it doesn't work with 16 bits (struct GenerationChunk).

Hi David, thanks for taking a look at this.
 
I think you're misunderstanding how blocks and chunks work here.  A
block can have many chunks.  You can't find the block that a chunk is
on by subtracting Generation_BLOCKHDRSZ from the pointer given to
GenerationFree(). That would only work if the chunk happened to be the
first chunk on a block. If it's anything apart from that then you'll
be making adjustments to the memory of some prior chunk on the block.
I imagine this is the reason you can't get the tests to pass.
Ok, I am still learning about this.
Can you explain why subtracting Generation_BLOCKHDRSZ from the pointer,
works for sizeof(struct GenerationChunk) = 24 bits,
When all references for the block field have been removed.
This pass check-world.
 

Can you also explain why you think moving code around randomly or
adding unlikely() macros helps reduce the memory consumption overheads
of generation contexts?
Of course, those changes do not reduce memory consumption.
But, IMO, I think those changes improve the access to memory regions,
because of the locality of the data.

About "unlikely macros", this helps the branchs prediction, when most of the time,
malloc and related functions, will not fail.
 
  I imagine you think that's helping to further
improve performance, but you've not offered any evidence of that
separately from the other changes you've made. If you think those are
useful changes then I recommend you run individual benchmarks and
offer those as proof that those changes are worthwhile.
Ok, I can understand, are changes unrelated.

regards,
Ranier Vilela

Re: Reducing Memory Consumption (aset and generation)

From
Ranier Vilela
Date:
Em seg., 6 de jun. de 2022 às 21:14, Ranier Vilela <ranier.vf@gmail.com> escreveu:
Em seg., 6 de jun. de 2022 às 20:37, David Rowley <dgrowleyml@gmail.com> escreveu:
On Mon, 6 Jun 2022 at 07:28, Ranier Vilela <ranier.vf@gmail.com> wrote:
> 4) 004-generation-reduces-memory-consumption-BUG.patch
> Same to the (2), but with BUG.
> It only takes a few tweaks to completely remove the field block.

> This fails with make check.
> I couldn't figure out why it doesn't work with 16 bits (struct GenerationChunk).

Hi David, thanks for taking a look at this.
 
I think you're misunderstanding how blocks and chunks work here.  A
block can have many chunks.  You can't find the block that a chunk is
on by subtracting Generation_BLOCKHDRSZ from the pointer given to
GenerationFree(). That would only work if the chunk happened to be the
first chunk on a block. If it's anything apart from that then you'll
be making adjustments to the memory of some prior chunk on the block.
I imagine this is the reason you can't get the tests to pass.
Ok, I am still learning about this.
Can you explain why subtracting Generation_BLOCKHDRSZ from the pointer,
works for sizeof(struct GenerationChunk) = 24 bits,
When all references for the block field have been removed.
This pass check-world.
 

Can you also explain why you think moving code around randomly or
adding unlikely() macros helps reduce the memory consumption overheads
of generation contexts?
Of course, those changes do not reduce memory consumption.
But, IMO, I think those changes improve the access to memory regions,
because of the locality of the data.

About "unlikely macros", this helps the branchs prediction, when most of the time,
malloc and related functions, will not fail.
 
  I imagine you think that's helping to further
improve performance, but you've not offered any evidence of that
separately from the other changes you've made. If you think those are
useful changes then I recommend you run individual benchmarks and
offer those as proof that those changes are worthwhile.
Ok, I can understand, are changes unrelated.
Let's restart this, to simplify the review and commit work.

Regarding the patches now, we have:
1) v1-001-aset-reduces-memory-consumption.patch
Reduces memory used by struct AllocBlockData by minus 8 bits,
reducing the total size to 32 bits, which leads to "fitting" two structs in a 64bit cache.

Remove tests elog(ERROR, "could not find block containing chunk %p" and
elog(ERROR, "could not find block containing chunk %p", moving them to
MEMORY_CONTEXT_CHECKING context.

Since 8.2 versions, nobody complains about these tests.

But if is not acceptable, have the option (3) v1-003-aset-reduces-memory-consumption.patch

2) v1-002-generation-reduces-memory-consumption.patch
Reduces memory used by struct GenerationBlock, by minus 8 bits,
reducing the total size to 32 bits, which leads to "fitting" two structs in a 64bit cache.

3) v1-003-aset-reduces-memory-consumption.patch
Same to the (1), but without remove the tests:
elog(ERROR, "could not find block containing chunk %p" and
elog(ERROR, "could not find block containing chunk %p",
But at the cost of removing a one tiny part of the tests and
moving them to MEMORY_CONTEXT_CHECKING context.

Since 8.2 versions, nobody complains about these tests.

regards,
Ranier Vilela
Attachment

Re: Reducing Memory Consumption (aset and generation)

From
Matthias van de Meent
Date:
On Tue, 7 Jun 2022 at 03:09, Ranier Vilela <ranier.vf@gmail.com> wrote:
>
> Let's restart this, to simplify the review and commit work.

The patchset fails to apply. Could you send an updated version that
applies to the current master branch?

> Regarding the patches now, we have:
> 1) v1-001-aset-reduces-memory-consumption.patch
> Reduces memory used by struct AllocBlockData by minus 8 bits,

This seems reasonable, considering we don't generally use the field
for anything but validation.

> reducing the total size to 32 bits, which leads to "fitting" two structs in a 64bit cache.

By bits, you mean bytes, right?

Regarding fitting 2 structs in 64 bytes, that point is moot, as each
of these structs are stored at the front of each malloc-ed block, so
you will never see more than one of these in the same cache line. Less
space used is nice, but not as critical there IMO.

> Remove tests elog(ERROR, "could not find block containing chunk %p" and
> elog(ERROR, "could not find block containing chunk %p", moving them to
> MEMORY_CONTEXT_CHECKING context.
>
> Since 8.2 versions, nobody complains about these tests.
>
> But if is not acceptable, have the option (3) v1-003-aset-reduces-memory-consumption.patch
>
> 2) v1-002-generation-reduces-memory-consumption.patch
> Reduces memory used by struct GenerationBlock, by minus 8 bits,

That seems fairly straight-forward -- 8 bytes saved on each page isn't
a lot, but it's something.

> reducing the total size to 32 bits, which leads to "fitting" two structs in a 64bit cache.

Your size accounting seems wrong. On 64-bit architectures, we have
dlist_node (=16) + Size (=8) + 2*int (=8) + 2 * (char*) (=16) = 48
bytes. Shaving off the Size field reduces that by 8 bytes to 40 bytes.

The argument of fitting 2 of these structures into one cache line is
moot again, because here, too, two of this struct will not share a
cache line (unless somehow we allocate 0-sized blocks, which would be
a bug).

> 3) v1-003-aset-reduces-memory-consumption.patch
> Same to the (1), but without remove the tests:
> elog(ERROR, "could not find block containing chunk %p" and
> elog(ERROR, "could not find block containing chunk %p",
> But at the cost of removing a one tiny part of the tests and
> moving them to MEMORY_CONTEXT_CHECKING context.

I like this patch over 001 due to allowing less corruption to occur in
the memory context code. This allows for detecting some issues in 003,
as opposed to none in 001.

Kind regards,

Matthias van de Meent



Re: Reducing Memory Consumption (aset and generation)

From
Ranier Vilela
Date:
Hi,
Thanks for take a look.

Em seg., 11 de jul. de 2022 às 05:48, Matthias van de Meent <boekewurm+postgres@gmail.com> escreveu:
On Tue, 7 Jun 2022 at 03:09, Ranier Vilela <ranier.vf@gmail.com> wrote:
>
> Let's restart this, to simplify the review and commit work.

The patchset fails to apply. Could you send an updated version that
applies to the current master branch?
Sure.
 

> Regarding the patches now, we have:
> 1) v1-001-aset-reduces-memory-consumption.patch
> Reduces memory used by struct AllocBlockData by minus 8 bits,

This seems reasonable, considering we don't generally use the field
for anything but validation.

> reducing the total size to 32 bits, which leads to "fitting" two structs in a 64bit cache.

By bits, you mean bytes, right?
Correct.
 

Regarding fitting 2 structs in 64 bytes, that point is moot, as each
of these structs are stored at the front of each malloc-ed block, so
you will never see more than one of these in the same cache line. Less
space used is nice, but not as critical there IMO.

> Remove tests elog(ERROR, "could not find block containing chunk %p" and
> elog(ERROR, "could not find block containing chunk %p", moving them to
> MEMORY_CONTEXT_CHECKING context.
>
> Since 8.2 versions, nobody complains about these tests.
>
> But if is not acceptable, have the option (3) v1-003-aset-reduces-memory-consumption.patch
>
> 2) v1-002-generation-reduces-memory-consumption.patch
> Reduces memory used by struct GenerationBlock, by minus 8 bits,

That seems fairly straight-forward -- 8 bytes saved on each page isn't
a lot, but it's something.

> reducing the total size to 32 bits, which leads to "fitting" two structs in a 64bit cache.

Your size accounting seems wrong. On 64-bit architectures, we have
dlist_node (=16) + Size (=8) + 2*int (=8) + 2 * (char*) (=16) = 48
bytes. Shaving off the Size field reduces that by 8 bytes to 40 bytes.

The argument of fitting 2 of these structures into one cache line is
moot again, because here, too, two of this struct will not share a
cache line (unless somehow we allocate 0-sized blocks, which would be
a bug).
Right. I think I was very tired.
 

> 3) v1-003-aset-reduces-memory-consumption.patch
> Same to the (1), but without remove the tests:
> elog(ERROR, "could not find block containing chunk %p" and
> elog(ERROR, "could not find block containing chunk %p",
> But at the cost of removing a one tiny part of the tests and
> moving them to MEMORY_CONTEXT_CHECKING context.

I like this patch over 001 due to allowing less corruption to occur in
the memory context code. This allows for detecting some issues in 003,
as opposed to none in 001.
I understand.

regards,
Ranier Vilela

Re: Reducing Memory Consumption (aset and generation)

From
Ranier Vilela
Date:
Em seg., 11 de jul. de 2022 às 09:25, Ranier Vilela <ranier.vf@gmail.com> escreveu:
Hi,
Thanks for take a look.

Em seg., 11 de jul. de 2022 às 05:48, Matthias van de Meent <boekewurm+postgres@gmail.com> escreveu:
On Tue, 7 Jun 2022 at 03:09, Ranier Vilela <ranier.vf@gmail.com> wrote:
>
> Let's restart this, to simplify the review and commit work.

The patchset fails to apply. Could you send an updated version that
applies to the current master branch?
Sure.
Here the patchs updated.

regards,
Ranier Vilela
Attachment

Re: Reducing Memory Consumption (aset and generation)

From
David Rowley
Date:
On Mon, 11 Jul 2022 at 20:48, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> > 2) v1-002-generation-reduces-memory-consumption.patch
> > Reduces memory used by struct GenerationBlock, by minus 8 bits,
>
> That seems fairly straight-forward -- 8 bytes saved on each page isn't
> a lot, but it's something.

I think 002 is likely the only patch here that has some merit.
However, it's hard to imagine any measurable performance gains from
it.  I think the smallest generation block we have today is 8192
bytes. Saving 8 bytes in that equates to a saving of 0.1% of memory.
For an 8MB page, it's 1024 times less than that.

I imagine Ranier has been working on this due the performance
regression mentioned in [1].  I think it'll be much more worthwhile to
aim to reduce the memory chunk overheads rather than the block
overheads, as Ranier is doing here. I posted a patch in [2] which does
that.  To make that work, I need to have the owning context in the
block. The 001 and 003 patch seems to remove those here.

David

[1] https://www.postgresql.org/message-id/CAApHDvqXpLzav6dUeR5vO_RBh_feHrHMLhigVQXw9jHCyKP9PA@mail.gmail.com
[2] https://www.postgresql.org/message-id/CAApHDvpjauCRXcgcaL6+e3eqecEHoeRm9D-kcbuvBitgPnW=vw@mail.gmail.com



Re: Reducing Memory Consumption (aset and generation)

From
Ranier Vilela
Date:
Em ter., 12 de jul. de 2022 às 02:35, David Rowley <dgrowleyml@gmail.com> escreveu:
On Mon, 11 Jul 2022 at 20:48, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> > 2) v1-002-generation-reduces-memory-consumption.patch
> > Reduces memory used by struct GenerationBlock, by minus 8 bits,
>
> That seems fairly straight-forward -- 8 bytes saved on each page isn't
> a lot, but it's something.

I think 002 is likely the only patch here that has some merit.
However, it's hard to imagine any measurable performance gains from
it.  I think the smallest generation block we have today is 8192
bytes. Saving 8 bytes in that equates to a saving of 0.1% of memory.
For an 8MB page, it's 1024 times less than that.

I imagine Ranier has been working on this due the performance
regression mentioned in [1].  I think it'll be much more worthwhile to
aim to reduce the memory chunk overheads rather than the block
overheads, as Ranier is doing here. I posted a patch in [2] which does
that.  To make that work, I need to have the owning context in the
block. The 001 and 003 patch seems to remove those here.
I saw the numbers at [2], 17% is very impressive.
How you need the context in the block, 001 and 003,
they are more of a hindrance than a help.

So, feel free to incorporate 002 into your patch if you wish.
The best thing to do here is to close and withdraw from commitfest.

regards,
Ranier Vilela