Thread: Clear empty space in a page.

From: Yura Sokolov
Good day.

A long time ago I played with a proprietary "compressed storage" patch
on a heavily updated table, and found that empty pages (i.e. pages
cleaned by vacuum) are not compressed well enough.

When a table is stress-updated, pages for new row versions are allocated
in a round-robin fashion, so some 1GB segments contain almost no live
tuples. Vacuum removes the dead tuples, but the segments remain large
after compression (>400MB), as if they were still full.

After some investigation I found that this is because
PageRepairFragmentation and PageIndex*Delete* don't clear the space that
has just become empty, so it still contains garbage data. Clearing it
with memset greatly increases the compression ratio: some compressed
relation segments shrink to 30-60MB right after vacuum removes the
tuples in them.

While this result does not directly apply to stock PostgreSQL, I believe
page compression is important for full_page_writes with wal_compression
enabled, and probably also when PostgreSQL is used on a filesystem with
compression enabled (ZFS?).

Therefore I propose zeroing the page's newly emptied space in
PageRepairFragmentation, PageIndexMultiDelete, PageIndexTupleDelete and
PageIndexTupleDeleteNoCompact.

Sorry, I haven't measured the impact on raw performance yet.

regards,
Yura Sokolov aka funny_falcon

Re: Clear empty space in a page.

From: Fabien COELHO
Hello Yura,

> didn't measure impact on raw performance yet.

Must be done. There c/should be a guc to control this behavior if the 
performance impact is noticeable.

-- 
Fabien.



Re: Clear empty space in a page.

From: Omar Kilani
Hi,

I happened to be running some Postgres-on-ZFS tests on Linux/aarch64
and tested this patch.

Kernel: 4.18.0-305.el8.aarch64
CPU: 16x3.0GHz Ampere Altra / Arm Neoverse N1 cores

ZFS: 2.1.0-rc6
ZFS options: options spl spl_kmem_cache_slab_limit=65536 (see:
https://github.com/openzfs/zfs/issues/12150)

Postgres: 13.3 with and without the patch
Postgres config:

full_page_writes = on
wal_compression = on

Without patch:

starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: prepared
number of clients: 32
number of threads: 32
duration: 43200 s
number of transactions actually processed: 612557228
latency average = 2.257 ms
tps = 14179.551402 (including connections establishing)
tps = 14179.553286 (excluding connections establishing)

With patch:

starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: prepared
number of clients: 32
number of threads: 32
duration: 43200 s
number of transactions actually processed: 606967295
latency average = 2.278 ms
tps = 14050.164370 (including connections establishing)
tps = 14050.166007 (excluding connections establishing)
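
For a sense of scale, the two runs above can be compared with a small
helper. This is only an illustrative sketch: relative_change is a
hypothetical name, not part of pgbench or the patch. With the reported
figures, the patched run comes out roughly 0.9% lower in tps and roughly
0.9% higher in average latency. With a single run per configuration,
this does not show whether the difference is beyond run-to-run noise.

```c
#include <assert.h>

/*
 * Relative change from a baseline to a new value, as a fraction.
 * A negative result means the new value is lower than the baseline.
 * (Hypothetical helper for illustration only.)
 */
static double
relative_change(double baseline, double value)
{
    assert(baseline != 0.0);
    return (value - baseline) / baseline;
}
```

For example, relative_change(14179.551402, 14050.164370) compares the
two "including connections establishing" tps figures, and
relative_change(2.257, 2.278) compares the average latencies.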

It does seem to help with on-disk compression, but it *might* have
caused more fragmentation.

Regards,
Omar

On Sat, May 29, 2021 at 10:22 PM Fabien COELHO <coelho@cri.ensmp.fr> wrote:
>
>
> Hello Yura,
>
> > didn't measure impact on raw performance yet.
>
> Must be done. There c/should be a guc to control this behavior if the
> performance impact is noticeable.
>
> --
> Fabien.
>
>



Re: Clear empty space in a page.

From: Andres Freund
Hi,

On 2021-05-30 03:10:26 +0300, Yura Sokolov wrote:
> While this result is not directly applied to stock PostgreSQL, I believe
> page compression is important for full_page_writes with wal_compression
> enabled. And probably when PostgreSQL is used on filesystem with
> compression enabled (ZFS?).

I don't think the former is relevant, because the hole is skipped in wal page
compression (at some cost).
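
To illustrate the point about the hole being skipped, here is a minimal
sketch of copying a page image without the bytes between pd_lower and
pd_upper before handing it to a compressor. The names SKETCH_BLCKSZ and
copy_without_hole are hypothetical and the page is treated as a plain
byte array; this is not the actual WAL insertion code. The extra copy is
one reading of the "at some cost" remark.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_BLCKSZ 8192      /* assumed block size for this sketch */

/*
 * Copy a page into dst, leaving out the hole [lower, upper).
 * dst must have room for SKETCH_BLCKSZ bytes.
 * Returns the number of bytes written to dst.
 */
static size_t
copy_without_hole(const unsigned char *page, size_t lower, size_t upper,
                  unsigned char *dst)
{
    assert(lower <= upper && upper <= SKETCH_BLCKSZ);

    /* Everything before the hole: header and line pointers. */
    memcpy(dst, page, lower);
    /* Everything after the hole: tuple data and special space. */
    memcpy(dst + lower, page + upper, SKETCH_BLCKSZ - upper);

    return lower + (SKETCH_BLCKSZ - upper);
}
```

Because the hole never reaches the compressor in this scheme, whatever
garbage it holds cannot affect the compressed size of the page image.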


> Therefore I propose clearing page's empty space with zero in
> PageRepairFragmentation, PageIndexMultiDelete, PageIndexTupleDelete and
> PageIndexTupleDeleteNoCompact.
> 
> Sorry, didn't measure impact on raw performance yet.

I'm worried that this might cause O(n^2) behaviour in some cases, by
repeatedly memset'ing the same mostly already zeroed space to 0. Why do we
ever need to do memset_hole() instead of accurately just zeroing out the space
that was just vacated?
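
The concern can be made concrete by counting bytes written under the two
strategies. The sketch below uses hypothetical helper names and a
simplified model in which pd_lower stays fixed and each deletion moves
pd_upper up by the freed size; it is not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bytes written if the whole hole [lower, upper) is zeroed again after
 * every deletion. freed[i] is the number of bytes released by the i-th
 * deletion. (Simplified model: lower does not move.)
 */
static size_t
bytes_zeroed_whole_hole(size_t lower, size_t upper,
                        const size_t *freed, size_t ndeletes)
{
    size_t total = 0;

    assert(lower <= upper);
    for (size_t i = 0; i < ndeletes; i++)
    {
        upper += freed[i];          /* hole grows by the freed bytes */
        total += upper - lower;     /* the entire hole is rewritten */
    }
    return total;
}

/* Bytes written if only the newly vacated bytes are zeroed each time. */
static size_t
bytes_zeroed_delta(const size_t *freed, size_t ndeletes)
{
    size_t total = 0;

    for (size_t i = 0; i < ndeletes; i++)
        total += freed[i];
    return total;
}
```

In this model, 80 deletions of 100 bytes each on a nearly full page
(lower = 100, upper = 192) write 331360 bytes with whole-hole zeroing
versus 8000 bytes with delta zeroing.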

Greetings,

Andres Freund



Re: Clear empty space in a page.

From: Yura Sokolov
Hi,

Andres Freund wrote 2021-05-31 00:07:
> Hi,
> 
> On 2021-05-30 03:10:26 +0300, Yura Sokolov wrote:
>> While this result is not directly applied to stock PostgreSQL, I believe
>> page compression is important for full_page_writes with wal_compression
>> enabled. And probably when PostgreSQL is used on filesystem with
>> compression enabled (ZFS?).
> 
> I don't think the former is relevant, because the hole is skipped in wal page
> compression (at some cost).

Ah, I forgot about that. Yep, you are right.

>> Therefore I propose clearing page's empty space with zero in
>> PageRepairFragmentation, PageIndexMultiDelete, PageIndexTupleDelete and
>> PageIndexTupleDeleteNoCompact.
>> 
>> Sorry, didn't measure impact on raw performance yet.
> 
> I'm worried that this might cause O(n^2) behaviour in some cases, by
> repeatedly memset'ing the same mostly already zeroed space to 0. Why do we
> ever need to do memset_hole() instead of accurately just zeroing out the space
> that was just vacated?

It is done exactly this way: memset_hole accepts "old_pd_upper" and
clears only the space between the old and the new pd_upper.
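
As a rough illustration of that idea (not the patch's actual code), a
helper along these lines would zero only the bytes vacated by the latest
change. The name sketch_memset_hole, its signature, and the plain byte
array standing in for a page are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_BYTES 8192  /* assumed block size for this sketch */

/*
 * Zero the bytes in [old_upper, new_upper), i.e. the space that became
 * part of the hole when pd_upper moved from old_upper to new_upper.
 * Does nothing if the hole did not grow at its upper end.
 */
static void
sketch_memset_hole(unsigned char *page, size_t old_upper, size_t new_upper)
{
    assert(old_upper <= SKETCH_PAGE_BYTES);
    assert(new_upper <= SKETCH_PAGE_BYTES);

    if (new_upper > old_upper)
        memset(page + old_upper, 0, new_upper - old_upper);
}
```

The cost of each call is then proportional to the bytes just freed, not
to the size of the whole hole.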

regards,
Yura