Re: logical decoding and replication of sequences, take 2 - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: logical decoding and replication of sequences, take 2 |
Date | |
Msg-id | 62e6f3c0-2a7f-54ca-7587-6eb40d086653@enterprisedb.com Whole thread Raw |
In response to | Re: logical decoding and replication of sequences, take 2 (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>) |
List | pgsql-hackers |
On 7/24/23 14:53, Ashutosh Bapat wrote: > On Thu, Jul 20, 2023 at 8:22 PM Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: > >>> >>> PFA such edits in 0002 and 0006 patches. Let me know if those look >>> correct. I think we >>> need similar changes to the documentation and comments in other places. >>> >> >> OK, I merged the changes into the patches, with some minor changes to >> the wording etc. > > Thanks. > > >> >>> In sequence_decode() we skip sequence changes when fast forwarding. >>> Given that smgr_decode() is only to supplement sequence_decode(), I >>> think it's correct to do the same in smgr_decode() as well. Simillarly >>> skipping when we don't have full snapshot. >>> >> >> I don't follow, smgr_decode already checks ctx->fast_forward. > > In your earlier email you seemed to expressed some doubts about the > change skipping code in smgr_decode(). To that, I gave my own > perspective of why the change skipping code in smgr_decode() is > correct. I think smgr_decode is doing the right thing, IMO. No change > required there. > I think that was referring to the skipping we do for logical messages: if (message->transactional && !SnapBuildProcessChange(builder, xid, buf->origptr)) return; else if (!message->transactional && (SnapBuildCurrentState(builder) != SNAPBUILD_CONSISTENT || SnapBuildXactNeedsSkip(builder, buf->origptr))) return; I concluded we don't need to do that here. >> >>> Some minor comments on 0006 patch >>> >>> + /* make sure the relfilenode creation is associated with the XID */ >>> + if (XLogLogicalInfoActive()) >>> + GetCurrentTransactionId(); >>> >>> I think this change is correct and is inline with similar changes in 0002. But >>> I looked at other places from where DefineRelation() is called. For regular >>> tables it is called from ProcessUtilitySlow() which in turn does not call >>> GetCurrentTransactionId(). I am wondering whether we are just discovering a >>> class of bugs caused by not associating an xid with a newly created >>> relfilenode. >>> >> >> Not sure. Why would it be a bug? > > This discussion is unrelated to sequence decoding but let me add it > here. If we don't know the transaction ID that created a relfilenode, > we wouldn't know whether to roll back that creation if the transaction > gets rolled back during recovery. But maybe that doesn't matter since > the relfilenode is not visible in any of the catalogs, so it just lies > there unused. > I think that's unrelated to this patch. > >> >>> +void >>> +ReorderBufferAddRelFileLocator(ReorderBuffer *rb, TransactionId xid, >>> + RelFileLocator rlocator) >>> +{ >>> ... snip ... >>> + >>> + /* sequence changes require a transaction */ >>> + if (xid == InvalidTransactionId) >>> + return; >>> >>> IIUC, with your changes in DefineSequence() in this patch, this should not >>> happen. So this condition will never be true. But in case it happens, this code >>> will not add the relfilelocation to the hash table and we will deem the >>> sequence change as non-transactional. Isn't it better to just throw an error >>> and stop replication if that (ever) happens? >>> >> >> It can't happen for sequence, but it may happen when creating a >> non-sequence relfilenode. In a way, it's a way to skip (some) >> unnecessary relfilenodes. > > Ah! The comment is correct but cryptic. I didn't read it to mean this. > OK, I'll improve the comment. >>> + /* >>> + * To support logical decoding of sequences, we require the sequence >>> + * callback. We decide it here, but only check it later in the wrappers. >>> + * >>> + * XXX Isn't it wrong to define only one of those callbacks? Say we >>> + * only define the stream_sequence_cb() - that may get strange results >>> + * depending on what gets streamed. Either none or both? >>> + * >>> + * XXX Shouldn't sequence be defined at slot creation time, similar >>> + * to two_phase? Probably not. >>> + */ >>> >>> Do you intend to keep these XXX's as is? My previous comments on this comment >>> block are in [1]. > > This comment remains unanswered. > I think the conclusion was we don't need to do that. I forgot to remove the comment, though. >>> >>> In fact, given that whether or not sequences are replicated is decided by the >>> protocol version, do we really need LogicalDecodingContext::sequences? Drawing >>> parallel with WAL messages, I don't think it's needed. >>> >> >> Right. We do that for two_phase because you can override that when >> creating the subscription - sequences allowed that too initially, but >> then we ditched that. So I don't think we need this. > > Then we should just remove that member and its references. > The member is still needed - it says whether the plugin has callbacks for sequence decoding or not (just like we have a flag for streaming, for example). I see the XXX comment in sequence_decode() is no longer needed, we rely on protocol versioning. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: