Re: pg_receivewal starting position - Mailing list pgsql-hackers
From | Ronan Dunklau |
---|---|
Subject | Re: pg_receivewal starting position |
Date | |
Msg-id | 2834449.HKl90sH2d3@aivenronan Whole thread Raw |
In response to | Re: pg_receivewal starting position (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) |
List | pgsql-hackers |
Le vendredi 3 septembre 2021 17:49:34 CEST, vous avez écrit : > On Fri, Sep 3, 2021 at 3:28 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote: > > There is still the concern raised by Bharath about being able to select > > the > > way to fetch the replication slot information for the user, and what > > should or should not be included in the documentation. I am in favor of > > documenting the process of selecting the wal start, and maybe include a > > --start-lsn option for the user to override it, but that's maybe for > > another patch. > > Let's hear from others. > > Thanks for the patches. I have some quick comments on the v5 patch-set: > > 0001: > 1) Do you also want to MemSet values too in ReadReplicationSlot? > > 2) When if clause has single statement we don't generally use "{" and "}" > + if (slot == NULL || !slot->in_use) > + { > + LWLockRelease(ReplicationSlotControlLock); > + } > you can just have: > + if (slot == NULL || !slot->in_use) > + LWLockRelease(ReplicationSlotControlLock); > > 3) This is still not copying the slot contents, as I said earlier you > can just take the required info into some local variables instead of > taking the slot pointer to slot_contents pointer. > + /* Copy slot contents while holding spinlock */ > + SpinLockAcquire(&slot->mutex); > + slot_contents = *slot; > + SpinLockRelease(&slot->mutex); > + LWLockRelease(ReplicationSlotControlLock); > > 4) As I said earlier, why can't we have separate tli variables for > more readability instead of just one slots_position_timeline and > timeline_history variable? And you are not even resetting those 2 > variables after if > (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn)), you might end > up sending the restart_lsn timelineid for confirmed_flush. So, better > use two separate variables. In fact you can use block local variables: > + if (!XLogRecPtrIsInvalid(slot_contents.data.restart_lsn)) > + { > + List *tl_history= NIL; > + TimeLineID tli; > + tl_history= readTimeLineHistory(ThisTimeLineID); > + tli = tliOfPointInHistory(slot_contents.data.restart_lsn, > + tl_history); > + values[i] = Int32GetDatum(tli); > + nulls[i] = false; > + } > similarly for confirmed_flush. > > 5) I still don't see the need for below test case: > + "READ_REPLICATION_SLOT does not produce an error with non existent slot"); > when we have > + "READ_REPLICATION_SLOT does not produce an error with dropped slot"); > Because for a user, dropped or non existent slot is same, it's just > that for dropped slot we internally don't delete its entry from the > shared memory. > Thank you for reiterating those concerns. As I said, I haven't touched Michael's version of the patch. I was incorporating those changes in my branch before he sent this, so I'll probably merge both before sending an updated patch. > 0002: > 1) Imagine GetSlotInformation always returns > READ_REPLICATION_SLOT_ERROR, don't you think StreamLog enters an > infinite loop there? Instead, why don't you just exit(1); instead of > return; and retry? Similarly for READ_REPLICATION_SLOT_NONEXISTENT? I > think, you can just do exit(1), no need to retry. > + case READ_REPLICATION_SLOT_ERROR: > + > + /* > + * Error has been logged by GetSlotInformation, return and > + * maybe retry > + */ > + return; This is the same behaviour we had before: if there is an error with pg_receivewal we retry the command. There was no special case for the replication slot not existing before, I don't see why we should change it now ? Eg: 2021-09-06 09:11:07.774 CEST [95853] ERROR: replication slot "nonexistent" does not exist 2021-09-06 09:11:07.774 CEST [95853] STATEMENT: START_REPLICATION SLOT "nonexistent" 0/1000000 TIMELINE 1 pg_receivewal: error: could not send replication command "START_REPLICATION": ERROR: replication slot "nonexistent" does not exist pg_receivewal: disconnected; waiting 5 seconds to try again Users may rely on it to keep retrying in the background until the slot has been created for example. > > 2) Why is it returning READ_REPLICATION_SLOT_OK when slot_info isn't > passed? And why are you having this check after you connect to the > server and fetch the data? > + /* If no slotinformation has been passed, we can return immediately */ > + if (slot_info == NULL) > + { > + PQclear(res); > + return READ_REPLICATION_SLOT_OK; > + } > Instead you can just have a single assert: > > + Assert(slot_name && slot_info ); At first it was so that we didn't have to fill in all required information if we don't need too, but it turns out pg_basebackup also has to check for the slot's type. I agree we should not support the null slot_info case anymore. > > 3) How about > pg_log_error("could not read replication slot: > instead of > pg_log_error("could not fetch replication slot: Ok. > > 4) Why are you having the READ_REPLICATION_SLOT_OK case in default? > + default: > + if (slot_info.is_logical) > + { > + /* > + * If the slot is not physical we can't expect to > + * recover from that > + */ > + pg_log_error("cannot use the slot provided, physical slot expected."); > + exit(1); > + } > + stream.startpos = slot_info.restart_lsn; > + stream.timeline = slot_info.restart_tli; > + } > You can just have another case statement for READ_REPLICATION_SLOT_OK > and in the default you can throw an error "unknown read replication > slot status" or some other better working and exit(1); Ok. > > 5) Do you want initialize slot_info to 0? > + if (replication_slot) > + { > + SlotInformation slot_info; > + MemSet(slot_info, 0, sizeof(SlotInformation)); > > 6) This isn't readable: > + slot_info->is_logical = strcmp(PQgetvalue(res, 0, 0), "logical") == 0; > How about: > if (strcmp(PQgetvalue(res, 0, 0), "logical") == 0) > slot_info->is_logical = true; > You don't need to set it false, because you would have > MemSet(slot_info) in the caller. > Isn't it preferable to fill in the whole struct without any regards to it's prior state ? I will modify the is_logical assignment to make it more readable though. > 7) How about > uint32 hi; > uint32 lo; > instead of > + uint32 hi, > + lo; > Ok. > 8) Move SlotInformation * slot_info) to the next line as it crosses > the 80 char limit. > +GetSlotInformation(PGconn *conn, const char *slot_name, > SlotInformation * slot_info) Ok. > > 9) Instead of a boolean is_logical, I would rather suggest to use an > enum or #define macros the slot types correctly, because someday we > might introduce new type slots and somebody wants is_physical kind of > variable name? > +typedef struct SlotInformation { > + bool is_logical; That's the reason why the READ_REPLICATION_SLOT command returns a slot type as a string, but the intermediate representation on the client doesn't really need to be concerned about this: it's the client after all and it will be very easy to change it locally if we need to. Thinking about it, for our use case we should really use is_physical instead of is_logical. But I guess all of this is moot if we end up deciding not to support the logical case anymore ? Best regards, -- Ronan Dunklau
pgsql-hackers by date: