Re: WAL format and API changes (9.5) - Mailing list pgsql-hackers

From Andres Freund
Subject Re: WAL format and API changes (9.5)
Date
Msg-id 20140813110746.GB28982@awork2.anarazel.de
Whole thread Raw
In response to Re: WAL format and API changes (9.5)  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: WAL format and API changes (9.5)
List pgsql-hackers
On 2014-08-13 02:36:59 +0300, Heikki Linnakangas wrote:
> On 08/13/2014 01:04 AM, Andres Freund wrote:
> >* I'm distinctly not a fan of passing around both the LSN and the struct
> >   XLogRecord to functions. We should bit the bullet and use something
> >   encapsulating both.
> 
> You mean, the redo functions? Seems fine to me, and always it's been like
> that...

Meh. By that argument we could just not do this patch :)

> >* XLogReplayBuffer imo isn't a very good name - the buffer isn't
> >   replayed. If anything it's sometimes replaced by the backup block, but
> >   the real replay still happens outside of that function.
> 
> I agree, but haven't come up with a better name.

XLogRestoreBuffer(), XLogRecoverBuffer(), XLogReadBufferForReplay(), ...?

> >* Why do we need XLogBeginInsert(). Right now this leads to uglyness
> >   like duplicated if (RelationNeedsWAL()) wal checks and
> >   XLogCancelInsert() of inserts that haven't been started.
> 
> I like clearly marking the beginning of the insert, with XLogBeginInsert().
> We certainly could design it so that it's not needed, and you could just
> start registering stuff without calling XLogBeginInsert() first. But I
> believe it's more robust this way. For example, you will get an error at the
> next XLogBeginInsert(), if a previous operation had already registered some
> data without calling XLogInsert().

I'm coming around to that view - especially as accidentally nesting xlog
insertions is a real concern with the new API.

> >And if we have Begin, why do we also need Cancel?
> 
> We could just automatically "cancel" any previous insertion when
> XLogBeginInsert() is called, but again it seems like bad form to leave
> references to buffers and data just lying there, if an operation bails out
> after registering some stuff and doesn't finish the insertion.
> 
> >* "XXX: do we need to do this for every page?" - yes, and it's every
> >   tuple, not every page... And It needs to be done *after* the tuple has
> >   been put on the page, not before. Otherwise the location will be wrong.
> 
> That comment is in heap_multi_insert(). All the inserted tuples have the
> same command id, seems redundant to log multiple NEW_CID records for them.

Yea, I know it's in heap_multi_insert(). They tuples have the same cid,
but they don't have the same tid. Or well, wouldn't if
log_heap_new_cid() were called after the PutHeapTuple(). Note that this
won't trigger for any normal user defined relations.

> >* The patch mixes the API changes around WAL records with changes of how
> >   individual actions are logged. That makes it rather hard to review -
> >   and it's a 500kb patch already.
> >
> >   I realize it's hard to avoid because the new API changes which
> >   information about blocks is available, but it does make it really hard
> >   to see whether the old/new code is doing something
> >   equivalent. It's hard to find more critical code than this :/
> 
> Yeah, I hear you. I considered doing this piecemeal, just adding the new
> functions first so that you could still use the old XLogRecData API, until
> all the functions have been converted. But in the end, I figured it's not
> worth it, as sooner or later we'd want to convert all the functions anyway.

I think it might be worthwile anyway. I'd be very surprised if there
aren't several significant bugs in the conversion. Your full page
checking tool surely helps to reduce the number, but it's not
foolproof. I can understand not wanting to do it though, it's a
significant amount of work.

Would you ask somebody else to do it in two steps?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Scaling shared buffer eviction
Next
From: "MauMau"
Date:
Subject: Re: proposal for 9.5: monitoring lock time for slow queries