Re: pg_receivewal starting position - Mailing list pgsql-hackers
From | Ronan Dunklau |
---|---|
Subject | Re: pg_receivewal starting position |
Date | |
Msg-id | 9598692.SQ0H41ba6E@aivenronan Whole thread Raw |
In response to | Re: pg_receivewal starting position (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) |
List | pgsql-hackers |
Le mardi 31 août 2021, 13:17:22 CEST Bharath Rupireddy a écrit : > On Mon, Aug 30, 2021 at 3:26 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote: > > Thank you for this review ! Please see the other side of the thread where > > I > > answered Michael Paquier with a new patchset, which includes some of your > > proposed modifications. > > Thanks for the updated patches. Here are some comments on v3-0001 > patch. I will continue to review 0002 and 0003. Thank you ! I will send a new version shortly, once I address your remarks concerning patch 0002 (and hopefully 0003 :-) ) > > 1) Missing full stops "." at the end. > + <literal>logical</literal> > + when following the current timeline history > + position, when following the current timeline history > Good catch, I will take care of it for the next version. > 2) Can we have the "type" column as "slot_type" just to keep in sync > with the pg_replication_slots view? You're right, it makes more sense like this. > > 3) Can we mention the link to pg_replication_slots view in the columns > - "type", "restart_lsn", "confirmed_flush_lsn"? > Something like: the "slot_type"/"restart_lsn"/"confirmed_flush_lsn" is > same as <link > linkend="view-pg-replication-slots"><structname>pg_replication_slots</struc > tname></link> view. Same as above, thanks. > > 4) Can we use "read_replication_slot" instead of > "identify_replication_slot", just to be in sync with the actual > command? That must have been a leftover from an earlier version of the patch, I will fix it also. > > 5) Are you actually copying the slot contents into the slot_contents > variable here? Isn't just taking the pointer to the shared memory? > + /* Copy slot contents while holding spinlock */ > + SpinLockAcquire(&slot->mutex); > + slot_contents = *slot; > + SpinLockRelease(&slot->mutex); > + LWLockRelease(ReplicationSlotControlLock); > > You could just do: > + Oid dbid; > + XLogRecPtr restart_lsn; > + XLogRecPtr confirmed_flush; > > + /* Copy the required slot contents */ > + SpinLockAcquire(&slot->mutex); > + dbid = slot.data.database; > + restart_lsn = slot.data.restart_lsn; > + confirmed_flush = slot.data.confirmed_flush; > + SpinLockRelease(&slot->mutex); > + LWLockRelease(ReplicationSlotControlLock); It's probably simpler that way. > > 6) It looks like you are not sending anything as a response to the > READ_REPLICATION_SLOT command, if the slot specified doesn't exist. > You are just calling end_tup_output which just calls rShutdown (points > to donothingCleanup of printsimpleDR) > if (has_value) > do_tup_output(tstate, values, nulls); > end_tup_output(tstate); > > Can you test the use case where the pg_receivewal asks > READ_REPLICATION_SLOT with a non-existent replication slot and see > with your v3 patch how it behaves? > > Why don't you remove has_value flag and do this in ReadReplicationSlot: > Datum values[5]; > bool nulls[5]; > MemSet(values, 0, sizeof(values)); > MemSet(nulls, 0, sizeof(nulls)); > > + dest = CreateDestReceiver(DestRemoteSimple); > + tstate = begin_tup_output_tupdesc(dest, tupdesc, &TTSOpsVirtual); > + do_tup_output(tstate, values, nulls); > + end_tup_output(tstate); As for the API, I think it's cleaner to just send an empty result set instead of a null tuple in that case but I won't fight over it if there is consensus on having an all-nulls tuple value instead. There is indeed a bug, but not here, in the second patch: I still test the slot type even when we didn't fetch anything. So I will add a test for that too. > > 7) Why don't we use 2 separate variables "restart_tli", > "confirmed_flush_tli" instead of "slots_position_timeline", just to be > more informative? You're right. > > 8) I still have the question like, how can a client (pg_receivewal for > instance) know that it is the current owner/user of the slot it is > requesting the info? As I said upthread, why don't we send "active" > and "active_pid" fields of the pg_replication_slots view? > Also, it would be good to send the "wal_status" field so that the > client can check if the "wal_status" is not "lost"? As for pg_receivewal, it can only check that it's not active at that time, since we only aquire the replication slot once we know the start_lsn. For the basebackup case it's the same thing as we only want to check if it exists. As such, I didn't add them as I didn't see the need, but if it can be useful why not ? I will do that in the next version. > > 9) There are 2 new lines at the end of ReadReplicationSlot. We give > only one new line after each function definition. > end_tup_output(tstate); > } > <<1stnewline>> > <<2ndnewline>> > /* > * Handle TIMELINE_HISTORY command. > */ > Ok ! > 10) Why do we need to have two test cases for "non-existent" slots? > Isn't the test case after "DROP REPLICATION" enough? > +($ret, $stdout, $stderr) = $node_primary->psql( > + 'postgres', 'READ_REPLICATION_SLOT non_existent_slot;', > + extra_params => [ '-d', $connstr_rep ]); > +ok( $ret == 0, > + "READ_REPLICATION_SLOT does not produce an error with non existent > slot"); +ok( $stdout eq '', > + "READ_REPLICATION_SLOT returns no tuple if a slot is non existent"); > > You can just rename the test case name from: > + "READ_REPLICATION_SLOT returns no tuple if a slot has been dropped"); > to > + "READ_REPLICATION_SLOT returns no tuple if a slot is non existent"); > I wanted to test both the case where no slot by this name exists, and the case where it has been dropped hence still referenced but marked as not "in_use". Maybe it's not worth it and we can remove the first case as you suggest. Best regards, -- Ronan Dunklau
pgsql-hackers by date: