Re: problems with making relfilenodes 56-bits - Mailing list pgsql-hackers

From Andres Freund
Subject Re: problems with making relfilenodes 56-bits
Date
Msg-id 20221012211331.nhnhjdwitzfuygjz@awork3.anarazel.de
In response to Re: problems with making relfilenodes 56-bits  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Responses Re: problems with making relfilenodes 56-bits  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Re: problems with making relfilenodes 56-bits  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi,

On 2022-10-12 22:05:30 +0200, Matthias van de Meent wrote:
> On Wed, 5 Oct 2022 at 01:50, Andres Freund <andres@anarazel.de> wrote:
> > On 2022-10-03 10:01:25 -0700, Andres Freund wrote:
> > > On 2022-10-03 08:12:39 -0400, Robert Haas wrote:
> > > > On Fri, Sep 30, 2022 at 8:20 PM Andres Freund <andres@anarazel.de> wrote:
> > > > I thought about trying to buy back some space elsewhere, and I think
> > > > that would be a reasonable approach to getting this committed if we
> > > > could find a way to do it. However, I don't see a terribly obvious way
> > > > of making it happen.
> > >
> > > I think there's plenty potential...
> >
> > I lightly dusted off my old varint implementation from [1] and converted the
> > RelFileLocator and BlockNumber from fixed width integers to varint ones. This
> > isn't meant as a serious patch, but an experiment to see if this is a path
> > worth pursuing.
> >
> > A run of installcheck in a cluster with autovacuum=off, full_page_writes=off
> > (for increased reproducibility) shows a decent saving:
> >
> > master: 241106544 - 230 MB
> > varint: 227858640 - 217 MB
> 
> I think a significant part of this improvement comes from the premise
> of starting with a fresh database. tablespace OID will indeed most
> likely be low, but database OID may very well be linearly distributed
> if concurrent workloads in the cluster include updating (potentially
> unlogged) TOASTed columns and the databases are not created in one
> "big bang" but over the lifetime of the cluster. In that case DBOID
> will consume 5B for a significant fraction of databases (anything with
> OID >=2^28).
> 
> My point being: I don't think that we should have WAL performance
> that differs between databases depending on which OID was assigned
> to that database.

To me this is raising the bar to an absurd level. A minor increase in space
usage after OID wraparound, or for very large block numbers, isn't a huge
issue: if you're in that situation you already have a huge amount of WAL.
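
To make the size argument concrete, here is a minimal sketch of how many bytes
a value needs under an assumed LEB128-style encoding (7 payload bits per byte).
The helper name varint_len_u64 is hypothetical, and the varint implementation
referenced above may use a different byte layout; only the 7-bits-per-byte
assumption matters for the thresholds discussed here.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not taken from any patch: number of bytes needed to
 * store v, assuming each encoded byte carries 7 bits of payload.
 */
static int
varint_len_u64(uint64_t v)
{
    int len = 1;

    /* Every additional 7 bits of magnitude costs one more byte. */
    while (v >= 0x80)
    {
        v >>= 7;
        len++;
    }
    return len;
}
```

Under that assumption, any value below 2^28 fits in at most 4 bytes, the same
as or less than a fixed-width 32-bit field, and only values at or above 2^28
cost a fifth byte.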


> 0002 - Rework XLogRecord
> This makes many fields in the xlog header optional, reducing the size
> of many xlog records by several bytes. This implements the design I
> shared in my earlier message [1].
> 
> 0003 - Rework XLogRecordBlockHeader.
> This patch could be applied on current head, and saves some bytes in
> per-block data. It potentially saves some bytes per registered
> block/buffer in the WAL record (max 2 bytes for the first block, after
> that up to 3). See the patch's commit message for detailed
> information.

The amount of complexity these two introduce seems quite substantial to
me, both from a maintenance and a runtime perspective. I think we'd be better
off using building blocks like variable-length encoded values than open-coding
it in many places.
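
As an illustration of what such a building block could look like, here is a
minimal sketch of a shared encode/decode pair. It assumes a LEB128-style
layout (low 7 bits first, high bit as continuation flag); the function names
are hypothetical and this is not the varint implementation referred to
earlier in the thread.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical: write v to buf in LEB128 form. Returns the number of bytes
 * written; buf must have room for up to 10 bytes.
 */
static size_t
varint_encode_u64(uint64_t v, uint8_t *buf)
{
    size_t n = 0;

    while (v >= 0x80)
    {
        /* low 7 bits of v, with the continuation bit set */
        buf[n++] = (uint8_t) (v | 0x80);
        v >>= 7;
    }
    buf[n++] = (uint8_t) v;     /* final byte, continuation bit clear */
    return n;
}

/*
 * Hypothetical: decode one value from buf. Returns the number of bytes
 * consumed, or 0 if the input is truncated or longer than 10 bytes.
 */
static size_t
varint_decode_u64(const uint8_t *buf, size_t buflen, uint64_t *out)
{
    uint64_t v = 0;
    size_t n = 0;
    int shift = 0;

    while (n < buflen && shift < 64)
    {
        uint8_t b = buf[n++];

        v |= (uint64_t) (b & 0x7F) << shift;
        if ((b & 0x80) == 0)
        {
            *out = v;
            return n;
        }
        shift += 7;
    }
    return 0;
}
```

The point of the sketch is that every variable-width field would go through
the same two routines, rather than each record type carrying its own
field-presence logic.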

Greetings,

Andres Freund
