Re: problems with making relfilenodes 56-bits - Mailing list pgsql-hackers

From: Andres Freund
Subject: Re: problems with making relfilenodes 56-bits
Msg-id: 20221019192130.ebjbycpw6bzjry4v@awork3.anarazel.de
In response to: Re: problems with making relfilenodes 56-bits (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: problems with making relfilenodes 56-bits (Dilip Kumar <dilipbalaut@gmail.com>)
List: pgsql-hackers
Hi,

On 2022-10-17 17:14:21 -0400, Robert Haas wrote:
> I have to admit that I worried about the same thing that Matthias
> raises, more or less. But I don't know whether I'm right to be
> worried. A variable-length representation of any kind is essentially a
> gamble that values requiring fewer bytes will be more common than
> values requiring more bytes, and by enough to justify the overhead
> that the method has. And, you want it to be more common for each
> individual user, not just overall. For example, more people are going
> to have small relations than large ones, but nobody wants performance
> to drop off a cliff when the relation passes a certain size threshold.
> Now, it wouldn't drop off a cliff here, but what about someone with a
> really big, append-only relation? Won't they just end up writing more
> to WAL than with the present system?

Perhaps. But I suspect it'd be a very small increase, because in all likelihood
they'd be using bulk-insert paths anyway if they managed to get to a very large
relation. And even in that case, if we e.g. were to make the record size
variable length, they'd still pretty much never hit the longer encodings, and
it'd be an overall win.

The number of people with relations that large - leaving aside partitioning,
which'd still benefit since each partition is smaller - strikes me as a very
small percentage. And as you say, it's not like there's a cliff where
everything starts to be horrible.
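
To make that concrete, here's a minimal sketch of the kind of variable-length
integer encoding I have in mind - a generic LEB128-style varint using the
usual c.h typedefs, not the format any of the posted patches actually use:

/*
 * Encode 'value' as a variable-length integer, 7 payload bits per output
 * byte, with the high bit set on every byte except the last. Returns the
 * number of bytes written to 'buf' (at most 10 for a 64-bit input).
 */
static int
varint_encode(uint64 value, uint8 *buf)
{
	int			len = 0;

	while (value >= 0x80)
	{
		buf[len++] = (uint8) (value | 0x80);
		value >>= 7;
	}
	buf[len++] = (uint8) value;
	return len;
}

/*
 * Inverse of varint_encode(). Stores the decoded value in '*value' and
 * returns the number of input bytes consumed. No bounds checking - it's
 * just a sketch.
 */
static int
varint_decode(const uint8 *buf, uint64 *value)
{
	uint64		result = 0;
	int			shift = 0;
	int			len = 0;

	while (buf[len] & 0x80)
	{
		result |= (uint64) (buf[len] & 0x7F) << shift;
		shift += 7;
		len++;
	}
	*value = result | ((uint64) buf[len++] << shift);
	return len;
}

With that scheme any block number below 2^28 costs at most four bytes, i.e. no
more than today, and the small values that dominate in practice cost one or
two.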


> Maybe not. They might still have some writes to relations other than
> the very large, append-only relation, and then they could still win.
> Also, if we assume that the overhead of the variable-length
> representation is never more than 1 byte beyond what is needed to
> represent the underlying quantity in the minimal number of bytes, they
> are only going to lose if their relation is already more than half the
> maximum theoretical size, and if that is the case, they are in danger
> of hitting the size limit anyway. You can argue that there's still a
> risk here, but it doesn't seem like that bad of a risk.
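
(To sanity-check that assumption against the sketch above: with 7 payload bits
per byte, a k-bit value takes ceil(k/7) bytes, so even a full-width 56-bit
relfilenode takes ceil(56/7) = 8 bytes against a 7-byte minimal representation
- exactly the one-byte worst case Robert assumes.)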

Another thing here is that I suspect we ought to increase our relation size
limit beyond 4-byte block numbers * blocksize at some point - and then we'll
have to use variable encodings anyway... Admittedly the amount of work needed
to get there is substantial.

Somewhat relatedly, I think we should, very slowly, move towards wider OIDs as
well. Not having to deal with oid wraparound would be a significant win
(particularly for toast), but to keep the overhead reasonable, we're going to
need variable encodings.
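
As a rough illustration (numbers from the sketch above, not from any patch):
storing OIDs as fixed 8 bytes would add four bytes to every reference, whereas
varint-encoded, an early user OID like 16384 costs ceil(15/7) = 3 bytes and
small system OIDs just one or two - wider OIDs wouldn't have to mean wider WAL.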


> But the same thing is not so obvious for, let's say, database OIDs.
> What if you just have one or a few databases, but due to the previous
> history of the cluster, their OIDs just happen to be big? Then you're
> just behind where you would have been without the patch. Granted, if
> this happens to you, you will be in the minority, because most users
> are likely to have small database OIDs, but the fact that other people
> are writing less WAL on average isn't going to make you happy about
> writing more WAL on average. And even for a user for which that
> doesn't happen, it's not at all unlikely that the gains they see will
> be less than what we see on a freshly-initdb'd database.

I agree that going for variable width encodings on the basis of the database
oid field alone would be an unconvincing proposition. But variably encoding
database oids when we already variably encode other fields seems like a decent
bet. If you e.g. think of the 56-bit relfilenode field itself - obviously what
I was thinking about in the first place - it's going to be a win much more
often.

To really lose you'd not just have to have a large database oid, but also a
large tablespace and relation oid and a huge block number...
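
(Back-of-the-envelope, again using the sketch above: with a 56-bit relfilenode
stored fixed width, a full block reference is 4 + 4 + 7 + 4 = 19 bytes for
tablespace oid, database oid, relfilenode and block number. Varint-encoding
typical values - spcOid 1663 at 2 bytes, a small database oid at 1-3, a young
relfilenode at 1-3, a block number under 2^28 at 1-4 - comes to roughly 5-12
bytes, and several of the fields would have to be large at once before the
encoded form even reaches the fixed-width size.)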


> So I don't really know what the answer is here. I don't think this
> technique sucks, but I don't think it's necessarily a categorical win
> for every case, either. And it even seems hard to reason about which
> cases are likely to be wins and which cases are likely to be losses.

True. I'm far less concerned than you or Matthias about increasing the size in
rare cases as long as it wins in the majority of cases. But that doesn't mean
every case is easy to consider.


> > > 0002 - Rework XLogRecord
> > > This makes many fields in the xlog header optional, reducing the size
> > > of many xlog records by several bytes. This implements the design I
> > > shared in my earlier message [1].
> > >
> > > 0003 - Rework XLogRecordBlockHeader.
> > > This patch could be applied on current head, and saves some bytes in
> > > per-block data. It potentially saves some bytes per registered
> > > block/buffer in the WAL record (max 2 bytes for the first block, after
> > > that up to 3). See the patch's commit message for detailed
> > > information.
> >
> > The amount of complexity these two introduce seems quite substantial to
> > me, both from a maintenance and a runtime perspective. I think we'd be
> > better off using building blocks like variable-length encoded values than
> > open-coding it in many places.
> 
> I agree that this looks pretty ornate as written, but I think there
> might be some good ideas in here, too.

Agreed! Several of the ideas seem orthogonal to using variable encodings, so
this isn't really an either/or.


> It is also easy to reason about this kind of thing, at least in terms of
> space consumption.

Hm, not for me, but...

Greetings,

Andres Freund


