Re: problems with making relfilenodes 56-bits - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: problems with making relfilenodes 56-bits |
Date | |
Msg-id | CA+Tgmoa7pNxxe_K=3mTHHZGSmnrc_YgApArx3OFHN2g57nzLNw@mail.gmail.com |
In response to | Re: problems with making relfilenodes 56-bits (Matthias van de Meent <boekewurm+postgres@gmail.com>) |
List | pgsql-hackers |
On Thu, Sep 29, 2022 at 12:24 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
> Currently, our minimal WAL record is exactly 24 bytes: length (4B),
> TransactionId (4B), previous record pointer (8B), flags (1B), redo
> manager (1B), 2 bytes of padding and lastly the 4-byte CRC. Of these
> fields, TransactionId could reasonably be omitted for certain WAL
> records (for example, index insertions don't really need the XID).
> Additionally, the length field could be made to be variable length,
> and any padding is just plain bad (adding 4 bytes to all
> insert/update/delete/lock records was frowned upon).

Right. I was shocked when I realized that we had two bytes of padding in there, considering that numerous rmgrs are stealing bits from the 1-byte field that identifies the record type. My question was: why aren't we exposing those 2 bytes for rmgr-type-specific use? Or for something like xl_xact_commit, we could get rid of xl_xact_info if we had those 2 bytes to work with.

Right now, I see that a bare commit record is 34 bytes, which rounds up to 40. With the trick above, we could shave off 4 bytes, bringing the size to 30, which would round up to 32. That's a pretty significant savings, although it'd be a lot better if we could get some kind of savings for DML records, which can be much more frequent.

> I'm working on a prototype patch for a more bare-bones WAL record
> header of which the only required fields would be prevptr (8B), CRC
> (4B), rmgr (1B) and flags (1B) for a minimal size of 14 bytes. I don't
> yet know the performance of this, but considering that there will
> be a lot more conditionals in header decoding it might be slower for
> any one backend, but faster overall (less overall IOps)
>
> The flags field would be indications for additional information: [flag
> name (bits): explanation (additional xlog header data in bytes)]
> - len_size(0..1): xlog record size is at most xlrec_header_only (0B),
> uint8_max(1B), uint16_max(2B), uint32_max(4B)
> - has_xid (2): contains transaction ID of logging transaction (4B, or
> probably 8B when we introduce 64-bit xids)
> - has_cid (3): contains the command ID of the logging statement (4B)
> (rationale for logging CID in [0], now in record header because XID is
> included there as well, and both are required for consistent
> snapshots).
> - has_rminfo (4): has non-zero redo-manager flags field (1B)
> (rationale for separate field [1], non-zero allows 1B space
> optimization for one of each RMGR's operations)
> - special_rel (5): pre-existing definition
> - check_consistency (6): pre-existing definition
> - unset (7): no meaning defined yet. Could be used for full record
> compression, or other purposes.

Interesting. One fly in the ointment here is that WAL records start on 8-byte boundaries (probably MAXALIGN boundaries, but I didn't check the details). And after the 24-byte header, there's a 2-byte header (or 5-byte header) introducing the payload data (see XLR_BLOCK_ID_DATA_SHORT/LONG). So if the size of the actual payload data is a multiple of 8, and is short enough that we use the short data header, we waste 6 bytes. If the data length is a multiple of 4 but not of 8, we waste 2 bytes. And those are probably really common cases. So the big improvements probably come from saving 2 bytes or 6 bytes or 10 bytes, and saving, say, 3 or 5 is probably not much better than 2. Or at least that's what I'm guessing.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
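
The alignment arithmetic in this message can be checked with a small standalone sketch. It is not PostgreSQL code: the function names and constants below are hypothetical, and the sizes are taken from the figures quoted in the thread (24-byte header, 2-byte or 5-byte data header, 8-byte record alignment). It also assumes that the short data header applies to payloads of up to 255 bytes, that a record with no main data carries no data header, and that the record has no block references.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed constants, based on the sizes quoted in this thread. */
#define WAL_ALIGN            8    /* records start on 8-byte boundaries */
#define WAL_HEADER_SIZE      24   /* current fixed record header */
#define SHORT_DATA_HDR_SIZE  2    /* XLR_BLOCK_ID_DATA_SHORT header */
#define LONG_DATA_HDR_SIZE   5    /* XLR_BLOCK_ID_DATA_LONG header */
#define SHORT_DATA_MAX       255  /* assumption: 1-byte length field */

/* Round len up to the next multiple of WAL_ALIGN (a power of two). */
static size_t
wal_align_up(size_t len)
{
    return (len + WAL_ALIGN - 1) & ~((size_t) (WAL_ALIGN - 1));
}

/*
 * Unpadded size of a record that carries only main data.
 * header_size is a parameter so alternative header layouts can be compared.
 */
static size_t
wal_record_size(size_t header_size, size_t payload_len)
{
    size_t data_hdr;

    if (payload_len == 0)
        data_hdr = 0;           /* assumption: no data header without data */
    else if (payload_len <= SHORT_DATA_MAX)
        data_hdr = SHORT_DATA_HDR_SIZE;
    else
        data_hdr = LONG_DATA_HDR_SIZE;

    return header_size + data_hdr + payload_len;
}

/* Bytes lost to alignment padding after the record. */
static size_t
wal_padding(size_t header_size, size_t payload_len)
{
    size_t raw = wal_record_size(header_size, payload_len);

    return wal_align_up(raw) - raw;
}
```

Under these assumptions, `wal_record_size(WAL_HEADER_SIZE, 8)` gives the 34 bytes of the bare commit record, which `wal_align_up` rounds to 40, and a 4-byte payload gives 30 bytes, which rounds to 32. Passing a different `header_size` (for example the proposed 14-byte minimum) shows how the padding changes with the header layout.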