Re: pg_receivewal starting position - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: pg_receivewal starting position
Date
Msg-id CALj2ACXB9iGk=CdLYtaFVwaw316mOWdiF8zNEbnvFxRu2T8bPQ@mail.gmail.com
Whole thread Raw
In response to Re: pg_receivewal starting position  (Ronan Dunklau <ronan.dunklau@aiven.io>)
Responses Re: pg_receivewal starting position
Re: pg_receivewal starting position
List pgsql-hackers
On Mon, Aug 30, 2021 at 3:26 PM Ronan Dunklau <ronan.dunklau@aiven.io> wrote:
> Thank you for this review ! Please see the other side of the thread where I
> answered Michael Paquier with a new patchset, which includes some of your
> proposed modifications.

Thanks for the updated patches. Here are some comments on v3-0001
patch. I will continue to review 0002 and 0003.

1) Missing full stops "." at the end.
+               <literal>logical</literal>
+               when following the current timeline history
+               position, when following the current timeline history

2) Can we have the "type" column as "slot_type" just to keep in sync
with the pg_replication_slots view?

3) Can we mention the link to pg_replication_slots view in the columns
- "type", "restart_lsn", "confirmed_flush_lsn"?
Something like: the "slot_type"/"restart_lsn"/"confirmed_flush_lsn" is
same as  <link linkend="view-pg-replication-slots"><structname>pg_replication_slots</structname></link>
view.

4) Can we use "read_replication_slot" instead of
"identify_replication_slot", just to be in sync with the actual
command?

5) Are you actually copying the slot contents into the slot_contents
variable here? Isn't just taking the pointer to the shared memory?
+ /* Copy slot contents while holding spinlock */
+ SpinLockAcquire(&slot->mutex);
+ slot_contents = *slot;
+ SpinLockRelease(&slot->mutex);
+ LWLockRelease(ReplicationSlotControlLock);

You could just do:
+ Oid dbid;
+ XLogRecPtr restart_lsn;
+ XLogRecPtr confirmed_flush;

+ /* Copy the required slot contents */
+ SpinLockAcquire(&slot->mutex);
+ dbid = slot.data.database;
+ restart_lsn = slot.data.restart_lsn;
+ confirmed_flush = slot.data.confirmed_flush;
+ SpinLockRelease(&slot->mutex);
+ LWLockRelease(ReplicationSlotControlLock);

6) It looks like you are not sending anything as a response to the
READ_REPLICATION_SLOT command, if the slot specified doesn't exist.
You are just calling end_tup_output which just calls rShutdown (points
to donothingCleanup of printsimpleDR)
if (has_value)
do_tup_output(tstate, values, nulls);
end_tup_output(tstate);

Can you test the use case where the pg_receivewal asks
READ_REPLICATION_SLOT with a non-existent replication slot and see
with your v3 patch how it behaves?

Why don't you remove has_value flag and do this in ReadReplicationSlot:
Datum values[5];
bool nulls[5];
MemSet(values, 0, sizeof(values));
MemSet(nulls, 0, sizeof(nulls));

+ dest = CreateDestReceiver(DestRemoteSimple);
+ tstate = begin_tup_output_tupdesc(dest, tupdesc, &TTSOpsVirtual);
+ do_tup_output(tstate, values, nulls);
+ end_tup_output(tstate);

7) Why don't we use 2 separate variables "restart_tli",
"confirmed_flush_tli" instead of "slots_position_timeline", just to be
more informative?

8) I still have the question like, how can a client (pg_receivewal for
instance) know that it is the current owner/user of the slot it is
requesting the info? As I said upthread, why don't we send "active"
and "active_pid" fields of the pg_replication_slots view?
Also, it would be good to send the "wal_status" field so that the
client can check if the "wal_status" is not "lost"?

9) There are 2 new lines at the end of ReadReplicationSlot. We give
only one new line after each function definition.
end_tup_output(tstate);
}
<<1stnewline>>
<<2ndnewline>>
/*
 * Handle TIMELINE_HISTORY command.
 */

10) Why do we need to have two test cases for "non-existent" slots?
Isn't the test case after "DROP REPLICATION" enough?
+($ret, $stdout, $stderr) = $node_primary->psql(
+  'postgres', 'READ_REPLICATION_SLOT non_existent_slot;',
+  extra_params => [ '-d', $connstr_rep ]);
+ok( $ret == 0,
+  "READ_REPLICATION_SLOT does not produce an error with non existent slot");
+ok( $stdout eq '',
+    "READ_REPLICATION_SLOT returns no tuple if a slot is non existent");

You can just rename the test case name from:
+    "READ_REPLICATION_SLOT returns no tuple if a slot has been dropped");
to
+    "READ_REPLICATION_SLOT returns no tuple if a slot is non existent");

Regards,
Bharath Rupireddy.



pgsql-hackers by date:

Previous
From: Nitin Jadhav
Date:
Subject: Re: Multi-Column List Partitioning
Next
From: hubert depesz lubaczewski
Date:
Subject: Re: Pg stuck at 100% cpu, for multiple days