Thread: Clear empty space in a page.
Good day.

A long time ago I played with a proprietary "compressed storage" patch on a heavily updated table, and found that empty pages (i.e. pages cleaned by vacuum) do not compress well enough.

When a table is stress-updated, pages for new row versions are allocated in a round-robin fashion, so some 1GB segments contain almost no live tuples. Vacuum removes the dead tuples, but the segments remain large after compression (>400MB), as if they were still full.

After some investigation I found the cause: PageRepairFragmentation and PageIndex*Delete* don't clear the space that has just become empty, so it still contains garbage data. Clearing it with memset greatly increases the compression ratio: some compressed relation segments shrink to 30-60MB right after vacuum removes the tuples in them.

While this result does not directly apply to stock PostgreSQL, I believe page compression is important for full_page_writes with wal_compression enabled, and probably also when PostgreSQL runs on a filesystem with compression enabled (ZFS?).

Therefore I propose clearing a page's empty space with zeroes in PageRepairFragmentation, PageIndexMultiDelete, PageIndexTupleDelete and PageIndexTupleDeleteNoCompact.

Sorry, I didn't measure the impact on raw performance yet.

regards,
Yura Sokolov aka funny_falcon
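A minimal sketch of the effect described in this message, assuming a simplified toy page. ToyPage and the toy_page_* helpers are hypothetical names made up for illustration; they are not PostgreSQL's PageHeaderData or the bufpage.c routines, and this is not the attached patch. It only shows that moving the "upper" boundary frees space without changing the bytes in it, so the free space keeps stale tuple data unless it is explicitly zeroed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 8192      /* matches PostgreSQL's default block size */

/* Hypothetical, simplified page; not PostgreSQL's real page header. */
typedef struct ToyPage
{
    uint16_t      lower;        /* end of the (pretend) line pointer array */
    uint16_t      upper;        /* start of tuple data; hole is [lower, upper) */
    unsigned char data[TOY_PAGE_SIZE];
} ToyPage;

/* Reset the page and fill the tuple area [upper, end) with non-zero bytes. */
static void
toy_page_fill(ToyPage *page, uint16_t lower, uint16_t upper)
{
    assert(lower <= upper && upper <= TOY_PAGE_SIZE);
    memset(page->data, 0, TOY_PAGE_SIZE);
    page->lower = lower;
    page->upper = upper;
    for (size_t i = upper; i < TOY_PAGE_SIZE; i++)
        page->data[i] = (unsigned char) (1 + i % 251);  /* never zero */
}

/*
 * Pretend vacuum removed the tuples stored in [upper, new_upper).
 * Only the bookkeeping moves; the old bytes stay in place unless
 * zero_vacated is set.
 */
static void
toy_page_free_tuples(ToyPage *page, uint16_t new_upper, int zero_vacated)
{
    uint16_t old_upper = page->upper;

    assert(new_upper >= old_upper && new_upper <= TOY_PAGE_SIZE);
    page->upper = new_upper;
    if (zero_vacated)
        memset(page->data + old_upper, 0, (size_t) (new_upper - old_upper));
}

/* Count the non-zero bytes left in the hole between lower and upper. */
static size_t
toy_page_garbage_bytes(const ToyPage *page)
{
    size_t n = 0;

    for (size_t i = page->lower; i < page->upper; i++)
        if (page->data[i] != 0)
            n++;
    return n;
}
```

For a page filled with lower = 64 and upper = 1024, freeing everything up to 8192 without zeroing leaves 7168 non-zero bytes in the hole; with zero_vacated set, none remain, which is the state a compressor handles best.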
Hello Yura,

> didn't measure impact on raw performance yet.

Must be done. There c/should be a guc to control this behavior if the performance impact is noticeable.

--
Fabien.
Hi,

I happened to be running some Postgres-on-ZFS tests on Linux/aarch64 and tested this patch.

Kernel: 4.18.0-305.el8.aarch64
CPU: 16x3.0GHz Ampere Altra / Arm Neoverse N1 cores
ZFS: 2.1.0-rc6
ZFS options: options spl spl_kmem_cache_slab_limit=65536
(see: https://github.com/openzfs/zfs/issues/12150)
Postgres: 13.3 with and without the patch
Postgres config:
full_page_writes = on
wal_compression = on

Without patch:

starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: prepared
number of clients: 32
number of threads: 32
duration: 43200 s
number of transactions actually processed: 612557228
latency average = 2.257 ms
tps = 14179.551402 (including connections establishing)
tps = 14179.553286 (excluding connections establishing)

With patch:

starting vacuum...end.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 100
query mode: prepared
number of clients: 32
number of threads: 32
duration: 43200 s
number of transactions actually processed: 606967295
latency average = 2.278 ms
tps = 14050.164370 (including connections establishing)
tps = 14050.166007 (excluding connections establishing)

It does seem to help with on-disk compression, but it *might* have caused more fragmentation.

Regards,
Omar
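To put the two pgbench runs side by side, the relative change in throughput can be computed as below. This is only a sketch: relative_change_percent is a hypothetical helper, and since the thread reports a single run per configuration with no variance, the result says nothing about statistical significance.

```c
#include <assert.h>

/*
 * Relative change of `patched` against `baseline`, in percent.
 * A negative result means the patched run was slower.
 */
static double
relative_change_percent(double baseline, double patched)
{
    assert(baseline > 0.0);
    return (patched - baseline) / baseline * 100.0;
}
```

Called with the reported tps values (14179.551402 without the patch, 14050.164370 with it), this gives a change of roughly -0.91%, which matches the drop in the number of transactions processed over the 12-hour runs.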
Hi,

On 2021-05-30 03:10:26 +0300, Yura Sokolov wrote:
> While this result is not directly applied to stock PostgreSQL, I believe
> page compression is important for full_page_writes with wal_compression
> enabled. And probably when PostgreSQL is used on filesystem with
> compression enabled (ZFS?).

I don't think the former is relevant, because the hole is skipped in WAL page compression (at some cost).

> Therefore I propose clearing page's empty space with zero in
> PageRepairFragmentation, PageIndexMultiDelete, PageIndexTupleDelete and
> PageIndexTupleDeleteNoCompact.
>
> Sorry, didn't measure impact on raw performance yet.

I'm worried that this might cause O(n^2) behaviour in some cases, by repeatedly memset'ing the same mostly already zeroed space to 0. Why do we ever need to do memset_hole() instead of accurately just zeroing out the space that was just vacated?

Greetings,

Andres Freund
Hi,

Andres Freund wrote 2021-05-31 00:07:
> Hi,
>
> On 2021-05-30 03:10:26 +0300, Yura Sokolov wrote:
>> While this result is not directly applied to stock PostgreSQL, I believe
>> page compression is important for full_page_writes with wal_compression
>> enabled. And probably when PostgreSQL is used on filesystem with
>> compression enabled (ZFS?).
>
> I don't think the former is relevant, because the hole is skipped in
> wal page compression (at some cost).

Ah, I forgot about that. Yep, you are right.

>> Therefore I propose clearing page's empty space with zero in
>> PageRepairFragmentation, PageIndexMultiDelete, PageIndexTupleDelete and
>> PageIndexTupleDeleteNoCompact.
>>
>> Sorry, didn't measure impact on raw performance yet.
>
> I'm worried that this might cause O(n^2) behaviour in some cases, by
> repeatedly memset'ing the same mostly already zeroed space to 0. Why do
> we ever need to do memset_hole() instead of accurately just zeroing out
> the space that was just vacated?

It is done exactly this way: memset_hole accepts "old_pd_upper" and clears only the space between the old and the new one.

regards,
Yura
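The difference between the two strategies discussed here can be sketched as follows. The patch itself is not reproduced in this thread, so everything below is an assumption: SketchPage and the sketch_* functions are hypothetical stand-ins on a toy page, with only the pd_lower/pd_upper field names borrowed from PostgreSQL's page header. sketch_memset_hole zeroes just the range between the old and the new pd_upper, as described in the reply, while sketch_memset_whole_hole models the behaviour Andres was worried about.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 8192

/* Hypothetical toy page; the real page header has more fields. */
typedef struct SketchPage
{
    uint16_t      pd_lower;     /* start of the hole */
    uint16_t      pd_upper;     /* end of the hole, start of tuple data */
    unsigned char bytes[SKETCH_PAGE_SIZE];
} SketchPage;

/* Zero only the newly vacated range [old_pd_upper, pd_upper). */
static size_t
sketch_memset_hole(SketchPage *page, uint16_t old_pd_upper)
{
    size_t len;

    if (page->pd_upper <= old_pd_upper)
        return 0;               /* nothing was vacated */
    len = (size_t) (page->pd_upper - old_pd_upper);
    memset(page->bytes + old_pd_upper, 0, len);
    return len;                 /* bytes written */
}

/* Alternative for comparison: re-zero the entire hole every time. */
static size_t
sketch_memset_whole_hole(SketchPage *page)
{
    size_t len;

    assert(page->pd_lower <= page->pd_upper);
    len = (size_t) (page->pd_upper - page->pd_lower);
    memset(page->bytes + page->pd_lower, 0, len);
    return len;
}

/*
 * Start from a full toy page, vacate `steps` chunks of `chunk` bytes one
 * at a time, and return the total number of bytes zeroed along the way.
 */
static size_t
sketch_total_zeroed(uint16_t chunk, unsigned steps, int incremental)
{
    SketchPage page;
    size_t     total = 0;

    memset(page.bytes, 0xAB, SKETCH_PAGE_SIZE);  /* pretend tuple data */
    page.pd_lower = 0;          /* toy layout: no header, no line pointers */
    page.pd_upper = 0;          /* page is full, the hole is empty */

    for (unsigned i = 0; i < steps; i++)
    {
        uint16_t old_upper = page.pd_upper;

        assert((size_t) old_upper + chunk <= SKETCH_PAGE_SIZE);
        page.pd_upper = (uint16_t) (old_upper + chunk);
        total += incremental ? sketch_memset_hole(&page, old_upper)
                             : sketch_memset_whole_hole(&page);
    }
    return total;
}
```

Vacating 128 chunks of 64 bytes writes 8192 bytes in total with the incremental variant, but 528384 bytes when the whole hole is re-zeroed each time, which is the quadratic growth the old_pd_upper argument avoids. The sketch ignores the line pointer array, which can also shrink in the index delete routines.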