Re: problems with making relfilenodes 56-bits - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: problems with making relfilenodes 56-bits |
Date | 2022-10-03 17:01:25 |
Msg-id | 20221003170125.7xd2vdadjak74tnm@awork3.anarazel.de |
In response to | Re: problems with making relfilenodes 56-bits (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: problems with making relfilenodes 56-bits; Re: problems with making relfilenodes 56-bits |
List | pgsql-hackers |
Hi,

On 2022-10-03 08:12:39 -0400, Robert Haas wrote:
> On Fri, Sep 30, 2022 at 8:20 PM Andres Freund <andres@anarazel.de> wrote:
> > I think it'd be interesting to look at per-record-type stats between two
> > equivalent workload, to see where practical workloads suffer the most
> > (possibly with fpw=off, to make things more repeatable).
>
> I would expect, and Dilip's results seem to confirm, the effect to be
> pretty uniform: basically, nearly every record gets bigger by 4 bytes.
> That's because most records contain at least one block reference, and
> if they contain multiple block references, likely all but one will be
> marked BKPBLOCK_SAME_REL, so we pay the cost just once.

But it doesn't really matter that much if an already large record gets a bit
bigger. Whereas it does matter if it's a small record. Focussing on
optimizing the record types where the increase is large seems like a
potential way forward to me, even if we can't find something generic.

> I thought about trying to buy back some space elsewhere, and I think
> that would be a reasonable approach to getting this committed if we
> could find a way to do it. However, I don't see a terribly obvious way
> of making it happen.

I think there's plenty potential...

> Trying to do it by optimizing specific WAL record
> types seems like a real pain in the neck, because there's tons of
> different WAL records that all have the same problem.

I am not so sure about that. Improving a bunch of the most frequent small
records might buy you back enough on just about every workload to be OK.

I put the top record sizes for an installcheck run with full_page_writes off
at the bottom. Certainly our regression tests aren't generally
representative. But I think it still decently highlights how just improving a
few records could buy you back more than enough.

> Trying to do it in a generic way makes more sense, and the fact that we have
> 2 padding bytes available in XLogRecord seems like a place to start looking,
> but the way forward from there is not clear to me.

Random idea: xl_prev is large. Store a full xl_prev in the page header, but
only store a 2 byte offset from the page header xl_prev within each record.

Greetings,

Andres Freund


by total size:

Type                        N      (%)   Record size      (%)   FPI size      (%)   Combined size      (%)
----                        -      ---   -----------      ---   --------      ---   -------------      ---
Heap/INSERT           1041666 ( 50.48)     106565255 ( 50.54)          0 (  0.00)       106565255 ( 43.92)
Btree/INSERT_LEAF      352196 ( 17.07)      24067672 ( 11.41)          0 (  0.00)        24067672 (  9.92)
Heap/DELETE            250852 ( 12.16)      13546008 (  6.42)          0 (  0.00)        13546008 (  5.58)
Hash/INSERT            108499 (  5.26)       7811928 (  3.70)          0 (  0.00)         7811928 (  3.22)
Transaction/COMMIT      16053 (  0.78)       6402657 (  3.04)          0 (  0.00)         6402657 (  2.64)
Gist/PAGE_UPDATE        57225 (  2.77)       5217100 (  2.47)          0 (  0.00)         5217100 (  2.15)
Gin/UPDATE_META_PAGE    23943 (  1.16)       4539970 (  2.15)          0 (  0.00)         4539970 (  1.87)
Gin/INSERT              27004 (  1.31)       3623998 (  1.72)          0 (  0.00)         3623998 (  1.49)
Gist/PAGE_SPLIT           448 (  0.02)       3391244 (  1.61)          0 (  0.00)         3391244 (  1.40)
SPGist/ADD_LEAF         38968 (  1.89)       3341696 (  1.58)          0 (  0.00)         3341696 (  1.38)
...
XLOG/FPI                 7228 (  0.35)        378924 (  0.18)   29788166 ( 93.67)        30167090 ( 12.43)
...
Gin/SPLIT                 141 (  0.01)         13011 (  0.01)    1187588 (  3.73)         1200599 (  0.49)
...
                     --------                --------            --------                --------
Total                 2063609               210848282 [86.89%]   31802766 [13.11%]      242651048 [100%]

(Included XLOG/FPI and Gin/SPLIT to explain why there's FPIs despite running
with fpw=off)

sorted by number of records:

Heap/INSERT           1041666 ( 50.48)     106565255 ( 50.54)          0 (  0.00)       106565255 ( 43.92)
Btree/INSERT_LEAF      352196 ( 17.07)      24067672 ( 11.41)          0 (  0.00)        24067672 (  9.92)
Heap/DELETE            250852 ( 12.16)      13546008 (  6.42)          0 (  0.00)        13546008 (  5.58)
Hash/INSERT            108499 (  5.26)       7811928 (  3.70)          0 (  0.00)         7811928 (  3.22)
Gist/PAGE_UPDATE        57225 (  2.77)       5217100 (  2.47)          0 (  0.00)         5217100 (  2.15)
SPGist/ADD_LEAF         38968 (  1.89)       3341696 (  1.58)          0 (  0.00)         3341696 (  1.38)
Gin/INSERT              27004 (  1.31)       3623998 (  1.72)          0 (  0.00)         3623998 (  1.49)
Gin/UPDATE_META_PAGE    23943 (  1.16)       4539970 (  2.15)          0 (  0.00)         4539970 (  1.87)
Standby/LOCK            18451 (  0.89)        775026 (  0.37)          0 (  0.00)          775026 (  0.32)
Transaction/COMMIT      16053 (  0.78)       6402657 (  3.04)          0 (  0.00)         6402657 (  2.64)
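
To put rough numbers on the per-block-reference growth discussed above: a
block reference in a WAL record carries a relfilelocator unless
BKPBLOCK_SAME_REL marks it as referring to the same relation as the previous
reference, so widening the relfilenumber costs most records the difference in
locator size exactly once. The following is a minimal standalone sketch of
that arithmetic; RelFileLocator32 and RelFileLocator56 are invented names for
this illustration (the real definitions live in relfilelocator.h and
xlogrecord.h), and the widened 56-bit relfilenumber reflects the proposal
under discussion, not committed code.

/*
 * Sketch: why a 56-bit relfilenumber adds ~4 bytes to most WAL records.
 * Loosely modeled on PostgreSQL's RelFileLocator; names are invented here.
 */
#include <stdint.h>
#include <stdio.h>

typedef uint32_t Oid;

/* Today: the relation file number fits in a 32-bit Oid. */
typedef struct RelFileLocator32
{
    Oid spcOid;             /* tablespace */
    Oid dbOid;              /* database */
    Oid relNumber;          /* 32-bit relfilenumber */
} RelFileLocator32;         /* 12 bytes */

/* Proposed: a 64-bit field, of which 56 bits are used. */
typedef struct RelFileLocator56
{
    Oid      spcOid;
    Oid      dbOid;
    uint64_t relNumber;     /* 56 bits used, stored in 8 bytes */
} RelFileLocator56;         /* 16 bytes */

int
main(void)
{
    /*
     * A record with several block references on the same relation pays the
     * wider locator only once, because the later references are flagged
     * BKPBLOCK_SAME_REL and omit the locator entirely.  But nearly every
     * record pays it at least once, hence the fairly uniform growth.
     */
    printf("locator grows from %zu to %zu bytes per record\n",
           sizeof(RelFileLocator32), sizeof(RelFileLocator56));
    return 0;
}

In the WAL stream the locator is copied without alignment padding, so the
in-memory sizeof is only an approximation, but it matches the roughly
4-bytes-per-record growth described above.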
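
The xl_prev idea could look roughly like the sketch below. The XLogRecord
layout is paraphrased from xlogrecord.h; XLogPageHeaderSketch and
XLogRecordSketch are made-up names for one possible encoding, ignoring
details such as how the offset is validated or what happens across page
boundaries.

/*
 * Sketch of the "store a full xl_prev per page, a 2-byte offset per record"
 * idea.  Only the layout is illustrated; names with a "Sketch" suffix are
 * invented, and the real page header fields are elided.
 */
#include <stdint.h>

typedef uint64_t XLogRecPtr;
typedef uint32_t TransactionId;
typedef uint32_t pg_crc32c;
typedef uint8_t  RmgrId;

/* Roughly the current record header: 24 bytes, two of them padding. */
typedef struct XLogRecord
{
    uint32_t      xl_tot_len;   /* total length of the record */
    TransactionId xl_xid;       /* xact id */
    XLogRecPtr    xl_prev;      /* pointer to previous record: 8 bytes */
    uint8_t       xl_info;      /* flag bits */
    RmgrId        xl_rmid;      /* resource manager */
    /* 2 bytes of padding here */
    pg_crc32c     xl_crc;       /* CRC of the record */
} XLogRecord;

/* The page header would carry one full previous-record pointer... */
typedef struct XLogPageHeaderSketch
{
    /* ...existing xlp_* fields elided... */
    XLogRecPtr xlp_prev_base;   /* full xl_prev for records on this page */
} XLogPageHeaderSketch;

/* ...and each record would store only a small delta against it. */
typedef struct XLogRecordSketch
{
    uint32_t      xl_tot_len;
    TransactionId xl_xid;
    uint16_t      xl_prev_off;  /* combined with xlp_prev_base to
                                 * reconstruct the 8-byte xl_prev */
    uint8_t       xl_info;
    RmgrId        xl_rmid;
    pg_crc32c     xl_crc;
} XLogRecordSketch;             /* 16 bytes: saves the 6 bytes of xl_prev
                                 * plus the 2 former padding bytes */

/* The relationship holds on common ABIs; exact sizes are ABI-dependent. */
_Static_assert(sizeof(XLogRecordSketch) < sizeof(XLogRecord),
               "per-record header shrinks");

If something along those lines were workable, saving 8 bytes per record would
more than offset the wider relfilenumber on the small, frequent records
(Heap/INSERT, Btree/INSERT_LEAF, Heap/DELETE) that dominate the counts above.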