Re: pg_receivewal starting position - Mailing list pgsql-hackers

From Ronan Dunklau
Subject Re: pg_receivewal starting position
Date
Msg-id 9598692.SQ0H41ba6E@aivenronan
Whole thread Raw
In response to Re: pg_receivewal starting position  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
Le mardi 31 août 2021, 13:17:22 CEST Bharath Rupireddy a écrit :
> On Mon, Aug 30, 2021 at 3:26 PM Ronan Dunklau <ronan.dunklau@aiven.io>
wrote:
> > Thank you for this review ! Please see the other side of the thread where
> > I
> > answered Michael Paquier with a new patchset, which includes some of your
> > proposed modifications.
>
> Thanks for the updated patches. Here are some comments on v3-0001
> patch. I will continue to review 0002 and 0003.

Thank you ! I will send a new version shortly, once I address your remarks
concerning patch 0002 (and hopefully 0003 :-) )
>
> 1) Missing full stops "." at the end.
> +               <literal>logical</literal>
> +               when following the current timeline history
> +               position, when following the current timeline history
>

Good catch, I will take care of it for the next version.

> 2) Can we have the "type" column as "slot_type" just to keep in sync
> with the pg_replication_slots view?

You're right, it makes more sense like this.

>
> 3) Can we mention the link to pg_replication_slots view in the columns
> - "type", "restart_lsn", "confirmed_flush_lsn"?
> Something like: the "slot_type"/"restart_lsn"/"confirmed_flush_lsn" is
> same as  <link
> linkend="view-pg-replication-slots"><structname>pg_replication_slots</struc
> tname></link> view.

Same as above, thanks.

>
> 4) Can we use "read_replication_slot" instead of
> "identify_replication_slot", just to be in sync with the actual
> command?

That must have been a leftover from an earlier version of the patch, I will fix
it also.

>
> 5) Are you actually copying the slot contents into the slot_contents
> variable here? Isn't just taking the pointer to the shared memory?
> + /* Copy slot contents while holding spinlock */
> + SpinLockAcquire(&slot->mutex);
> + slot_contents = *slot;
> + SpinLockRelease(&slot->mutex);
> + LWLockRelease(ReplicationSlotControlLock);
>
> You could just do:
> + Oid dbid;
> + XLogRecPtr restart_lsn;
> + XLogRecPtr confirmed_flush;
>
> + /* Copy the required slot contents */
> + SpinLockAcquire(&slot->mutex);
> + dbid = slot.data.database;
> + restart_lsn = slot.data.restart_lsn;
> + confirmed_flush = slot.data.confirmed_flush;
> + SpinLockRelease(&slot->mutex);
> + LWLockRelease(ReplicationSlotControlLock);

It's probably simpler that way.

>
> 6) It looks like you are not sending anything as a response to the
> READ_REPLICATION_SLOT command, if the slot specified doesn't exist.
> You are just calling end_tup_output which just calls rShutdown (points
> to donothingCleanup of printsimpleDR)
> if (has_value)
> do_tup_output(tstate, values, nulls);
> end_tup_output(tstate);

>
> Can you test the use case where the pg_receivewal asks
> READ_REPLICATION_SLOT with a non-existent replication slot and see
> with your v3 patch how it behaves?

>
> Why don't you remove has_value flag and do this in ReadReplicationSlot:
> Datum values[5];
> bool nulls[5];
> MemSet(values, 0, sizeof(values));
> MemSet(nulls, 0, sizeof(nulls));
>
> + dest = CreateDestReceiver(DestRemoteSimple);
> + tstate = begin_tup_output_tupdesc(dest, tupdesc, &TTSOpsVirtual);
> + do_tup_output(tstate, values, nulls);
> + end_tup_output(tstate);

As for the API, I think it's cleaner to just send an empty result set instead
of a null tuple in that case but I won't fight over it if there is consensus on
having an all-nulls tuple value instead.

There is indeed a bug, but not here, in the second patch: I still test the
slot type even when we didn't fetch anything. So I will add a test for that
too.
>
> 7) Why don't we use 2 separate variables "restart_tli",
> "confirmed_flush_tli" instead of "slots_position_timeline", just to be
> more informative?

You're right.

>
> 8) I still have the question like, how can a client (pg_receivewal for
> instance) know that it is the current owner/user of the slot it is
> requesting the info? As I said upthread, why don't we send "active"
> and "active_pid" fields of the pg_replication_slots view?
> Also, it would be good to send the "wal_status" field so that the
> client can check if the "wal_status" is not "lost"?

 As for pg_receivewal, it can only check that it's not active at that time,
since we only aquire the replication slot once we know the start_lsn. For the
basebackup case it's the same thing as we only want to check if it exists.
As such, I didn't add them as I didn't see the need, but if it can be useful
why not ? I will do that in the next version.

>
> 9) There are 2 new lines at the end of ReadReplicationSlot. We give
> only one new line after each function definition.
> end_tup_output(tstate);
> }
> <<1stnewline>>
> <<2ndnewline>>
> /*
>  * Handle TIMELINE_HISTORY command.
>  */
>

Ok !


> 10) Why do we need to have two test cases for "non-existent" slots?
> Isn't the test case after "DROP REPLICATION" enough?
> +($ret, $stdout, $stderr) = $node_primary->psql(
> +  'postgres', 'READ_REPLICATION_SLOT non_existent_slot;',
> +  extra_params => [ '-d', $connstr_rep ]);
> +ok( $ret == 0,
> +  "READ_REPLICATION_SLOT does not produce an error with non existent
> slot"); +ok( $stdout eq '',
> +    "READ_REPLICATION_SLOT returns no tuple if a slot is non existent");
>
> You can just rename the test case name from:
> +    "READ_REPLICATION_SLOT returns no tuple if a slot has been dropped");
> to
> +    "READ_REPLICATION_SLOT returns no tuple if a slot is non existent");
>

I wanted to test both the case where no slot by this name exists, and the case
where it has been dropped hence still referenced but marked as not "in_use".
Maybe it's not worth it and we can remove the first case as you suggest.

Best regards,

--
Ronan Dunklau





pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Added schema level support for publication.
Next
From: Andrew Dunstan
Date:
Subject: Re: pgsql: Avoid using ambiguous word "positive" in error message.