Re: WAL format and API changes (9.5) - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: WAL format and API changes (9.5)
Date
Msg-id 545A91CF.7030000@vmware.com
In response to Re: WAL format and API changes (9.5)  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: WAL format and API changes (9.5)  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On 10/30/2014 09:19 PM, Andres Freund wrote:
> Some things I noticed while reading the patch:

A lot of good comments, but let me pick up just two that are related:

> * There's a couple record types (e.g. XLOG_SMGR_TRUNCATE) that only
>    refer to the relation, but not to the block number. These still log
>    their rnode manually. Shouldn't we somehow deal with those in a
>    similar way explicit block references are dealt with?
>
> * Hm. At least WriteMZeroPageXlogRec (and probably the same for all the
>    other slru stuff) doesn't include a reference to the page. Isn't that
>    bad? Shouldn't we make XLogRegisterBlock() usable for that case?
>    Otherwise I fail to see how pg_rewind like tools can sanely deal with this?

Yeah, there are still operations that modify relation pages, but don't 
store the information about the modified pages in the standard format. 
That includes XLOG_SMGR_TRUNCATE that you spotted, and XLOG_SMGR_CREATE, 
and also XLOG_DBASE_CREATE/DROP. And then there are updates to 
non-relation files, like all the slru stuff, relcache init files, etc. 
And updates to the FSM and VM bypass the full-page write mechanism too.

To play it safe, pg_rewind copies all non-relation files as is. That 
includes all SLRUs, FSM and VM files, and everything else whose filename 
doesn't match (the main fork of) a relation file. Of course, that's a
fair amount of copying to do, so we might want to optimize that in the 
future, but I want to nail the relation files first. They are usually an 
order of magnitude larger than the other files, after all.
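
As a rough illustration of that filename rule, here is a minimal sketch in C. It is not pg_rewind's actual code: the function names are hypothetical, and it assumes main-fork relation segments are named "<relfilenode>" or "<relfilenode>.<segno>" under "global/" or "base/<dboid>/". Tablespace paths are deliberately left out, so they fall into the conservative "copy the whole file" bucket.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Skip a run of one or more digits; return NULL if there is no digit at p. */
static const char *
skip_digits(const char *p)
{
    if (!isdigit((unsigned char) *p))
        return NULL;
    while (isdigit((unsigned char) *p))
        p++;
    return p;
}

/*
 * Hypothetical helper: does this data-directory-relative path look like the
 * main fork of a relation file?  Anything that does not match (SLRUs, FSM
 * and VM forks, relcache init files, ...) would be copied as is.
 */
bool
looks_like_main_fork(const char *path)
{
    const char *p;

    assert(path != NULL);

    if (strncmp(path, "global/", 7) == 0)
        p = path + 7;
    else if (strncmp(path, "base/", 5) == 0)
    {
        p = skip_digits(path + 5);      /* database OID directory */
        if (p == NULL || *p != '/')
            return false;
        p++;
    }
    else
        return false;                   /* tablespaces omitted: copy whole */

    p = skip_digits(p);                 /* relfilenode */
    if (p == NULL)
        return false;
    if (*p == '\0')
        return true;                    /* first segment, no ".N" suffix */
    if (*p != '.')
        return false;                   /* e.g. "_fsm" or "_vm" fork suffix */
    p = skip_digits(p + 1);             /* segment number */
    return p != NULL && *p == '\0';
}
```

The point of the design is that a false negative only costs extra copying, while a false positive could lose changes, so the matcher errs on the side of rejecting.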

Unfortunately pg_rewind still needs to recognize and parse the special 
WAL records like XLOG_SMGR_CREATE/TRUNCATE that modify relation files
outside the normal block registration system. I've been thinking that we 
should add another flag to the WAL record format to mark such records. 
pg_rewind will still need to understand the record format of such 
records, but the flag will help to catch bugs of omission. If pg_rewind 
or another such tool sees a record that's flagged as "special", but 
doesn't recognize the record type, it can throw an error.
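
To make the idea concrete, here is a small self-contained sketch of how a tool could use such a flag. Everything in it is an assumption for illustration: the flag name and value, the record struct, and the numeric record type IDs are made up and do not reflect the actual WAL record layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit marking records that modify relation files
 * outside the normal block registration system. */
#define DEMO_FLAG_SPECIAL 0x01

/* Simplified stand-in for a WAL record header (illustration only). */
typedef struct DemoRecord
{
    uint8_t     rmid;       /* resource manager id */
    uint8_t     info;       /* record type within the resource manager */
    uint8_t     flags;      /* may contain DEMO_FLAG_SPECIAL */
} DemoRecord;

/* A (rmid, info) pair the tool knows how to parse. */
typedef struct DemoKnownType
{
    uint8_t     rmid;
    uint8_t     info;
} DemoKnownType;

typedef enum DemoAction
{
    REC_USE_BLOCK_REFS,     /* ordinary record: block references suffice */
    REC_HANDLE_SPECIAL,     /* flagged, and the tool has a parser for it */
    REC_UNKNOWN_SPECIAL     /* flagged but unrecognized: throw an error */
} DemoAction;

DemoAction
classify_record(const DemoRecord *rec,
                const DemoKnownType *known, size_t nknown)
{
    size_t      i;

    assert(rec != NULL);

    if ((rec->flags & DEMO_FLAG_SPECIAL) == 0)
        return REC_USE_BLOCK_REFS;

    for (i = 0; i < nknown; i++)
    {
        if (known[i].rmid == rec->rmid && known[i].info == rec->info)
            return REC_HANDLE_SPECIAL;
    }

    /* This is the bug-of-omission case the flag is meant to catch. */
    return REC_UNKNOWN_SPECIAL;
}
```

A caller would treat REC_UNKNOWN_SPECIAL as a hard failure rather than silently skipping the record.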

- Heikki



