Re: XLog size reductions: smaller XLRec block header for PG17 - Mailing list pgsql-hackers

From vignesh C
Subject Re: XLog size reductions: smaller XLRec block header for PG17
Date
Msg-id CALDaNm2Wg9OwUumwd9oPsFEGfF8j_LA3eLjJdUzDNuX9eTMLDA@mail.gmail.com
In response to Re: XLog size reductions: smaller XLRec block header for PG17  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Responses Re: XLog size reductions: smaller XLRec block header for PG17
List pgsql-hackers
On Tue, 26 Sept 2023 at 02:09, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
>
> On Tue, 19 Sept 2023 at 01:03, Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > On 2023-05-18 19:22:26 +0300, Heikki Linnakangas wrote:
> > > On 18/05/2023 17:59, Matthias van de Meent wrote:
> > > > It changes the block IDs used to fit in 6 bits, using the upper 2 bits
> > > > of the block_id field to store how much data is contained in the
> > > > record (0, <=UINT8_MAX, or <=UINT16_MAX bytes).
> > >
> > > Perhaps we should introduce a few generic inline functions to do varint
> > > encoding. That could be useful in many places, while this scheme is very
> > > tailored for XLogRecordBlockHeader.
>
> This scheme is reused later for the XLogRecord xl_tot_len field over
> at [0], so FWIW it is already being reused. Sure, it's tailored to this WAL
> use case, but IMO we're getting good value from it. We don't use
> protobuf or JSON for WAL, we use our own serialization format. Having
> some specialized encoding/decoding in that format for certain fields
> is IMO quite acceptable.
>
> > Yes - I proposed that and wrote an implementation of reasonably efficient
> > varint encoding. Here's my prototype:
> > https://postgr.es/m/20221004234952.anrguppx5owewb6n%40awork3.anarazel.de
>
> As I mentioned on that thread, that prototype has a significant
> probability of doing nothing to improve WAL size, or even increasing
> the WAL size for installations which consume a lot of OIDs.
>
> > I think it's a bad tradeoff to write lots of custom varint encodings, just to
> > eke out a bit more space savings.
>
> This is only a single "custom" varint encoding though, if you can even
> call it that. It makes a field's size depend on flags set in another
> byte, which is not that much different from the existing use of
> XLR_BLOCK_ID_DATA_[LONG, SHORT].
>
> > The increase in code complexity IMO makes it a bad tradeoff.
>
> Pardon me for asking, but what would you consider to be a good
> tradeoff then? I think the code relating to the WAL storage format is
> about as simple as you can get it within the feature set it provides
> and the size of the resulting records. While I think there is still
> much to gain w.r.t. WAL record size, I don't think we can get much of
> those improvements without adding at least some amount of complexity,
> something I think to be true for most components in PostgreSQL.
>
> So, except for redesigning significant parts of the public WAL APIs,
> are we just going to ignore any potential improvements because they
> "increase code complexity"?
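
For anyone picking this thread up later, here is a rough sketch of the
scheme described in the quoted text above: the block ID is limited to the
low 6 bits of the first byte, and the upper 2 bits select whether the data
length that follows takes 0, 1 or 2 bytes. This is only an illustration and
not code from the patch. All names (encode_block_header,
decode_block_header, the LEN_* constants) are hypothetical, and the
little-endian layout of the 2-byte length is an assumption made to keep the
example portable.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants for this sketch; not taken from the patch. */
#define BLKID_MASK       0x3F   /* low 6 bits hold the block ID (0..63) */
#define LEN_CLASS_SHIFT  6      /* upper 2 bits hold the length class */

#define LEN_NONE   0            /* no data, no length bytes follow */
#define LEN_1BYTE  1            /* data length <= UINT8_MAX, 1 length byte */
#define LEN_2BYTE  2            /* data length <= UINT16_MAX, 2 length bytes */

/*
 * Write a block header into buf (which must have room for 3 bytes).
 * Returns the number of bytes written (1..3), or 0 if block_id does not
 * fit in 6 bits.
 */
static size_t
encode_block_header(uint8_t *buf, uint8_t block_id, uint16_t data_len)
{
    unsigned    cls;

    if (block_id > BLKID_MASK)
        return 0;

    if (data_len == 0)
        cls = LEN_NONE;
    else if (data_len <= UINT8_MAX)
        cls = LEN_1BYTE;
    else
        cls = LEN_2BYTE;

    buf[0] = (uint8_t) (block_id | (cls << LEN_CLASS_SHIFT));

    if (cls == LEN_1BYTE)
    {
        buf[1] = (uint8_t) data_len;
        return 2;
    }
    if (cls == LEN_2BYTE)
    {
        /* Assumption: little-endian byte order for the 2-byte length. */
        buf[1] = (uint8_t) (data_len & 0xFF);
        buf[2] = (uint8_t) (data_len >> 8);
        return 3;
    }
    return 1;
}

/*
 * Read a block header from buf.  Returns the number of bytes consumed
 * (1..3), or 0 if the length class is the value this sketch leaves unused.
 */
static size_t
decode_block_header(const uint8_t *buf, uint8_t *block_id, uint16_t *data_len)
{
    unsigned    cls = buf[0] >> LEN_CLASS_SHIFT;

    *block_id = buf[0] & BLKID_MASK;

    switch (cls)
    {
        case LEN_NONE:
            *data_len = 0;
            return 1;
        case LEN_1BYTE:
            *data_len = buf[1];
            return 2;
        case LEN_2BYTE:
            *data_len = (uint16_t) (buf[1] | (buf[2] << 8));
            return 3;
        default:
            return 0;
    }
}
```

In this sketch a block reference without data costs a single header byte,
and one with up to 255 bytes of data costs two, which is where the space
saving in the proposal comes from. The trade-off is that only 64 distinct
block ID values remain available.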

I see that there has been no activity in this thread for nearly 4
months. I'm planning to close this entry in the current commitfest unless
someone plans to take it forward.

Regards,
Vignesh


