Thread: Introduce XID age and inactive timeout based replication slot invalidation

Introduce XID age and inactive timeout based replication slot invalidation

From
Bharath Rupireddy
Date:
Hi,

Replication slots in postgres hold back the removal of required
resources even when there is no connection using them (i.e., they are
inactive). This consumes storage because neither the required WAL can
be removed nor can VACUUM clean up the required rows in user
tables/system catalogs as long as a replication slot still needs them.
In extreme cases this could lead to transaction ID wraparound.

Currently postgres has the ability to invalidate inactive replication
slots based on the amount of WAL (set via the max_slot_wal_keep_size
GUC) that would be needed by the slots in case they become active again.
However, the wraparound issue isn't effectively covered by
max_slot_wal_keep_size - one can't tell postgres to invalidate a
replication slot if it is blocking VACUUM. Also, it is often tricky to
choose a default value for max_slot_wal_keep_size, because the amount
of WAL that gets generated and the storage allocated to the database
can vary.
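
For context, today one can only reason about this in terms of retained
WAL, with something like the following rough sketch (existing columns
only); it doesn't capture the XID age dimension at all:

-- approximate WAL retained by each slot on the primary
SELECT slot_name, active, wal_status,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;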

Therefore, it is often easier for developers to do the following instead:
a) set an XID age (the age of a slot's xmin or catalog_xmin) of, say, 1
or 1.5 billion, after which the slots get invalidated.
b) set a timeout of, say, 1, 2, or 3 days, after which the inactive
slots get invalidated.

To implement (a), postgres needs a new GUC called max_slot_xid_age.
The checkpointer then invalidates all the slots whose xmin (the oldest
transaction that this slot needs the database to retain) or
catalog_xmin (the oldest transaction affecting the system catalogs
that this slot needs the database to retain) has reached the age
specified by this setting.
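
The age being checked against max_slot_xid_age is essentially what one
can already compute by hand today, for example:

-- XID age of each slot's horizons; the proposed GUC would invalidate a
-- slot once either age crosses the configured threshold
SELECT slot_name,
       age(xmin) AS xmin_age,
       age(catalog_xmin) AS catalog_xmin_age
FROM pg_replication_slots;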

To implement (b), postgres first needs to track per-slot metrics such as
the time at which a slot became inactive (inactive_at timestamptz) and
the total number of times a slot has become inactive in its lifetime
(inactive_count numeric) in the ReplicationSlotPersistentData structure.
It then needs a new timeout GUC called inactive_replication_slot_timeout.
Whenever a slot becomes inactive, the current timestamp and the updated
inactive count are stored in the ReplicationSlotPersistentData structure
and persisted to disk. The checkpointer then invalidates all slots that
have been lying inactive for at least inactive_replication_slot_timeout,
measured from inactive_at.
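
As a rough sketch, with the proposed GUC in place the configuration
would look like this (the GUC doesn't exist until the patch lands):

-- invalidate slots that have been inactive for more than 2 days,
-- checked by the checkpointer
ALTER SYSTEM SET inactive_replication_slot_timeout = '2d';
SELECT pg_reload_conf();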

In addition to implementing (b), these two new metrics let developers
improve their monitoring tools, as the metrics are exposed via the
pg_replication_slots system view. For instance, one can build a
monitoring tool that signals when replication slots have been lying
inactive for a day or so using the inactive_at metric, and/or when a
replication slot is becoming inactive too frequently using the
inactive_count metric.
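
For example, a monitoring check built on the proposed columns could be
as simple as (column names as in this v1 patch set):

-- slots that have been inactive for over a day, or that flap too often
SELECT slot_name, inactive_at, inactive_count
FROM pg_replication_slots
WHERE NOT active
  AND (inactive_at < now() - interval '1 day' OR inactive_count > 100);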

I’m attaching the v1 patch set as described below:
0001 - Tracks invalidation_reason in pg_replication_slots. This is
needed because slots now have multiple reasons for slot invalidation.
0002 - Tracks inactive replication slot information inactive_at and
inactive_count.
0003 - Adds inactive_timeout based replication slot invalidation.
0004 - Adds XID based replication slot invalidation.

Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Introduce XID age and inactive timeout based replication slot invalidation

From
Bharath Rupireddy
Date:
On Thu, Jan 11, 2024 at 10:48 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Hi,
>
> Replication slots in postgres will prevent removal of required
> resources when there is no connection using them (inactive). This
> consumes storage because neither required WAL nor required rows from
> the user tables/system catalogs can be removed by VACUUM as long as
> they are required by a replication slot. In extreme cases this could
> cause the transaction ID wraparound.
>
> Currently postgres has the ability to invalidate inactive replication
> slots based on the amount of WAL (set via max_slot_wal_keep_size GUC)
> that will be needed for the slots in case they become active. However,
> the wraparound issue isn't effectively covered by
> max_slot_wal_keep_size - one can't tell postgres to invalidate a
> replication slot if it is blocking VACUUM. Also, it is often tricky to
> choose a default value for max_slot_wal_keep_size, because the amount
> of WAL that gets generated and allocated storage for the database can
> vary.
>
> Therefore, it is often easy for developers to do the following:
> a) set an XID age (age of slot's xmin or catalog_xmin) of say 1 or 1.5
> billion, after which the slots get invalidated.
> b) set a timeout of say 1 or 2 or 3 days, after which the inactive
> slots get invalidated.
>
> To implement (a), postgres needs a new GUC called max_slot_xid_age.
> The checkpointer then invalidates all the slots whose xmin (the oldest
> transaction that this slot needs the database to retain) or
> catalog_xmin (the oldest transaction affecting the system catalogs
> that this slot needs the database to retain) has reached the age
> specified by this setting.
>
> To implement (b), first postgres needs to track the replication slot
> metrics like the time at which the slot became inactive (inactive_at
> timestamptz) and the total number of times the slot became inactive in
> its lifetime (inactive_count numeric) in ReplicationSlotPersistentData
> structure. And, then it needs a new timeout GUC called
> inactive_replication_slot_timeout. Whenever a slot becomes inactive,
> the current timestamp and inactive count are stored in
> ReplicationSlotPersistentData structure and persisted to disk. The
> checkpointer then invalidates all the slots that are lying inactive
> for about inactive_replication_slot_timeout duration starting from
> inactive_at.
>
> In addition to implementing (b), these two new metrics enable
> developers to improve their monitoring tools as the metrics are
> exposed via pg_replication_slots system view. For instance, one can
> build a monitoring tool that signals when replication slots are lying
> inactive for a day or so using inactive_at metric, and/or when a
> replication slot is becoming inactive too frequently using inactive_at
> metric.
>
> I’m attaching the v1 patch set as described below:
> 0001 - Tracks invalidation_reason in pg_replication_slots. This is
> needed because slots now have multiple reasons for slot invalidation.
> 0002 - Tracks inactive replication slot information inactive_at and
> inactive_timeout.
> 0003 - Adds inactive_timeout based replication slot invalidation.
> 0004 - Adds XID based replication slot invalidation.
>
> Thoughts?

Needed a rebase due to c393308b. Please find the attached v2 patch set.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Introduce XID age and inactive timeout based replication slot invalidation

From
Bharath Rupireddy
Date:
On Sat, Jan 27, 2024 at 1:18 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Jan 11, 2024 at 10:48 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > Hi,
> >
> > Replication slots in postgres will prevent removal of required
> > resources when there is no connection using them (inactive). This
> > consumes storage because neither required WAL nor required rows from
> > the user tables/system catalogs can be removed by VACUUM as long as
> > they are required by a replication slot. In extreme cases this could
> > cause the transaction ID wraparound.
> >
> > Currently postgres has the ability to invalidate inactive replication
> > slots based on the amount of WAL (set via max_slot_wal_keep_size GUC)
> > that will be needed for the slots in case they become active. However,
> > the wraparound issue isn't effectively covered by
> > max_slot_wal_keep_size - one can't tell postgres to invalidate a
> > replication slot if it is blocking VACUUM. Also, it is often tricky to
> > choose a default value for max_slot_wal_keep_size, because the amount
> > of WAL that gets generated and allocated storage for the database can
> > vary.
> >
> > Therefore, it is often easy for developers to do the following:
> > a) set an XID age (age of slot's xmin or catalog_xmin) of say 1 or 1.5
> > billion, after which the slots get invalidated.
> > b) set a timeout of say 1 or 2 or 3 days, after which the inactive
> > slots get invalidated.
> >
> > To implement (a), postgres needs a new GUC called max_slot_xid_age.
> > The checkpointer then invalidates all the slots whose xmin (the oldest
> > transaction that this slot needs the database to retain) or
> > catalog_xmin (the oldest transaction affecting the system catalogs
> > that this slot needs the database to retain) has reached the age
> > specified by this setting.
> >
> > To implement (b), first postgres needs to track the replication slot
> > metrics like the time at which the slot became inactive (inactive_at
> > timestamptz) and the total number of times the slot became inactive in
> > its lifetime (inactive_count numeric) in ReplicationSlotPersistentData
> > structure. And, then it needs a new timeout GUC called
> > inactive_replication_slot_timeout. Whenever a slot becomes inactive,
> > the current timestamp and inactive count are stored in
> > ReplicationSlotPersistentData structure and persisted to disk. The
> > checkpointer then invalidates all the slots that are lying inactive
> > for about inactive_replication_slot_timeout duration starting from
> > inactive_at.
> >
> > In addition to implementing (b), these two new metrics enable
> > developers to improve their monitoring tools as the metrics are
> > exposed via pg_replication_slots system view. For instance, one can
> > build a monitoring tool that signals when replication slots are lying
> > inactive for a day or so using inactive_at metric, and/or when a
> > replication slot is becoming inactive too frequently using inactive_at
> > metric.
> >
> > I’m attaching the v1 patch set as described below:
> > 0001 - Tracks invalidation_reason in pg_replication_slots. This is
> > needed because slots now have multiple reasons for slot invalidation.
> > 0002 - Tracks inactive replication slot information inactive_at and
> > inactive_timeout.
> > 0003 - Adds inactive_timeout based replication slot invalidation.
> > 0004 - Adds XID based replication slot invalidation.
> >
> > Thoughts?
>
> Needed a rebase due to c393308b. Please find the attached v2 patch set.

Needed a rebase due to commit 776621a (a conflict in
src/test/recovery/meson.build over the newly added TAP test file). Please
find the attached v3 patch set.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Introduce XID age and inactive timeout based replication slot invalidation

From
Bertrand Drouvot
Date:
Hi,

On Thu, Jan 11, 2024 at 10:48:13AM +0530, Bharath Rupireddy wrote:
> Hi,
> 
> Therefore, it is often easy for developers to do the following:
> a) set an XID age (age of slot's xmin or catalog_xmin) of say 1 or 1.5
> billion, after which the slots get invalidated.
> b) set a timeout of say 1 or 2 or 3 days, after which the inactive
> slots get invalidated.
> 
> To implement (a), postgres needs a new GUC called max_slot_xid_age.
> The checkpointer then invalidates all the slots whose xmin (the oldest
> transaction that this slot needs the database to retain) or
> catalog_xmin (the oldest transaction affecting the system catalogs
> that this slot needs the database to retain) has reached the age
> specified by this setting.
> 
> To implement (b), first postgres needs to track the replication slot
> metrics like the time at which the slot became inactive (inactive_at
> timestamptz) and the total number of times the slot became inactive in
> its lifetime (inactive_count numeric) in ReplicationSlotPersistentData
> structure. And, then it needs a new timeout GUC called
> inactive_replication_slot_timeout. Whenever a slot becomes inactive,
> the current timestamp and inactive count are stored in
> ReplicationSlotPersistentData structure and persisted to disk. The
> checkpointer then invalidates all the slots that are lying inactive
> for about inactive_replication_slot_timeout duration starting from
> inactive_at.
> 
> In addition to implementing (b), these two new metrics enable
> developers to improve their monitoring tools as the metrics are
> exposed via pg_replication_slots system view. For instance, one can
> build a monitoring tool that signals when replication slots are lying
> inactive for a day or so using inactive_at metric, and/or when a
> replication slot is becoming inactive too frequently using inactive_at
> metric.

Thanks for the patch and +1 for the idea, I think adding those new
"invalidation reasons" makes sense.

> 
> I’m attaching the v1 patch set as described below:
> 0001 - Tracks invalidation_reason in pg_replication_slots. This is
> needed because slots now have multiple reasons for slot invalidation.
> 0002 - Tracks inactive replication slot information inactive_at and
> inactive_timeout.
> 0003 - Adds inactive_timeout based replication slot invalidation.
> 0004 - Adds XID based replication slot invalidation.
>

I think it's better to have the XID one being discussed/implemented before the
inactive_timeout one: what about changing the 0002, 0003 and 0004 ordering?

0004 -> 0002
0002 -> 0003
0003 -> 0004

As far as 0001 is concerned:

"
This commit renames conflict_reason to
invalidation_reason, and adds the support to show invalidation
reasons for both physical and logical slots.
"

I'm not sure I like the fact that "invalidations" and "conflicts" are merged
into a single field. I'd vote to keep conflict_reason as it is and add a new
invalidation_reason (and put "conflict" as the value when that is the case). The
reason is that I think they are 2 different concepts (though they can be linked)
and that it would be easier to check for conflicts (meaning conflict_reason is
not NULL).
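
In other words, a rough sketch of the behavior I have in mind
(invalidation_reason and its values here are the proposal, not what the
current patch does):

-- conflict_reason keeps the recovery-conflict detail for logical slots,
-- while invalidation_reason would show 'conflict' in that case (or e.g.
-- 'wal_removed' for a slot invalidated for a non-conflict reason)
SELECT slot_name, conflict_reason, invalidation_reason
FROM pg_replication_slots
WHERE conflict_reason IS NOT NULL;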

Regards,
 
-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Thu, Jan 11, 2024 at 10:48 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Hi,
>
> Replication slots in postgres will prevent removal of required
> resources when there is no connection using them (inactive). This
> consumes storage because neither required WAL nor required rows from
> the user tables/system catalogs can be removed by VACUUM as long as
> they are required by a replication slot. In extreme cases this could
> cause the transaction ID wraparound.
>
> Currently postgres has the ability to invalidate inactive replication
> slots based on the amount of WAL (set via max_slot_wal_keep_size GUC)
> that will be needed for the slots in case they become active. However,
> the wraparound issue isn't effectively covered by
> max_slot_wal_keep_size - one can't tell postgres to invalidate a
> replication slot if it is blocking VACUUM. Also, it is often tricky to
> choose a default value for max_slot_wal_keep_size, because the amount
> of WAL that gets generated and allocated storage for the database can
> vary.
>
> Therefore, it is often easy for developers to do the following:
> a) set an XID age (age of slot's xmin or catalog_xmin) of say 1 or 1.5
> billion, after which the slots get invalidated.
> b) set a timeout of say 1 or 2 or 3 days, after which the inactive
> slots get invalidated.
>
> To implement (a), postgres needs a new GUC called max_slot_xid_age.
> The checkpointer then invalidates all the slots whose xmin (the oldest
> transaction that this slot needs the database to retain) or
> catalog_xmin (the oldest transaction affecting the system catalogs
> that this slot needs the database to retain) has reached the age
> specified by this setting.
>
> To implement (b), first postgres needs to track the replication slot
> metrics like the time at which the slot became inactive (inactive_at
> timestamptz) and the total number of times the slot became inactive in
> its lifetime (inactive_count numeric) in ReplicationSlotPersistentData
> structure. And, then it needs a new timeout GUC called
> inactive_replication_slot_timeout. Whenever a slot becomes inactive,
> the current timestamp and inactive count are stored in
> ReplicationSlotPersistentData structure and persisted to disk. The
> checkpointer then invalidates all the slots that are lying inactive
> for about inactive_replication_slot_timeout duration starting from
> inactive_at.
>
> In addition to implementing (b), these two new metrics enable
> developers to improve their monitoring tools as the metrics are
> exposed via pg_replication_slots system view. For instance, one can
> build a monitoring tool that signals when replication slots are lying
> inactive for a day or so using inactive_at metric, and/or when a
> replication slot is becoming inactive too frequently using inactive_at
> metric.
>
> I’m attaching the v1 patch set as described below:
> 0001 - Tracks invalidation_reason in pg_replication_slots. This is
> needed because slots now have multiple reasons for slot invalidation.
> 0002 - Tracks inactive replication slot information inactive_at and
> inactive_timeout.
> 0003 - Adds inactive_timeout based replication slot invalidation.
> 0004 - Adds XID based replication slot invalidation.
>
> Thoughts?
>
+1 for the idea. Here are some comments on 0002; I will review the other
patches soon and respond.

1.
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>inactive_at</structfield> <type>timestamptz</type>
+      </para>
+      <para>
+        The time at which the slot became inactive.
+        <literal>NULL</literal> if the slot is currently actively being
+        used.
+      </para></entry>
+     </row>

Maybe we can change the field name to 'last_inactive_at'? Or maybe the
comment can explain that this is the timestamp at which the slot last
became inactive. I think since we are already maintaining inactive_count,
it is better to explicitly say this is the last inactive time.

2.
+ /*
+ * XXX: Can inactive_count of type uint64 ever overflow? It takes
+ * about a half-billion years for inactive_count to overflow even
+ * if slot becomes inactive for every 1 millisecond. So, using
+ * pg_add_u64_overflow might be an overkill.
+ */

Correct, we don't need to use pg_add_u64_overflow for this counter.

3.

+
+ /* Convert to numeric. */
+ snprintf(buf, sizeof buf, UINT64_FORMAT, slot_contents.data.inactive_count);
+ values[i++] = DirectFunctionCall3(numeric_in,
+   CStringGetDatum(buf),
+   ObjectIdGetDatum(0),
+   Int32GetDatum(-1));

What is the purpose of doing this? I mean, inactive_count is an 8-byte
integer and you can define the function's out parameter as 'int8', which is
an 8-byte integer. Then you don't need to convert the integer to a string
and then to numeric, do you?


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Introduce XID age and inactive timeout based replication slot invalidation

From
Bharath Rupireddy
Date:
On Tue, Feb 6, 2024 at 2:16 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> > Thoughts?
> >
> +1 for the idea,  here are some comments on 0002, I will review other
> patches soon and respond.

Thanks for looking at it.

> +       <structfield>inactive_at</structfield> <type>timestamptz</type>
>
> Maybe we can change the field name to 'last_inactive_at'? or maybe the
> comment can explain timestampt at which slot was last inactivated.
> I think since we are already maintaining the inactive_count so better
> to explicitly say this is the last invaliding time.

last_inactive_at looks better, so will use that in the next version of
the patch.

> 2.
> + /*
> + * XXX: Can inactive_count of type uint64 ever overflow? It takes
> + * about a half-billion years for inactive_count to overflow even
> + * if slot becomes inactive for every 1 millisecond. So, using
> + * pg_add_u64_overflow might be an overkill.
> + */
>
> Correct we don't need to use pg_add_u64_overflow for this counter.

Will remove this comment in the next version of the patch.

> + /* Convert to numeric. */
> + snprintf(buf, sizeof buf, UINT64_FORMAT, slot_contents.data.inactive_count);
> + values[i++] = DirectFunctionCall3(numeric_in,
> +   CStringGetDatum(buf),
> +   ObjectIdGetDatum(0),
> +   Int32GetDatum(-1));
>
> What is the purpose of doing this? I mean inactive_count is 8 byte
> integer and you can define function outparameter as 'int8' which is 8
> byte integer.  Then you don't need to convert int to string and then
> to numeric?

Nope, it's of type uint64, which int8 (a signed 8-byte integer) can't fully
represent, so reporting it as numeric is the approach typically used
elsewhere - see the code around other "/* Convert to numeric. */" comments.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Introduce XID age and inactive timeout based replication slot invalidation

From
Bharath Rupireddy
Date:
On Mon, Feb 5, 2024 at 3:15 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Thanks for the patch and +1 for the idea, I think adding those new
> "invalidation reasons" make sense.

Thanks for looking at it.

> I think it's better to have the XID one being discussed/implemented before the
> inactive_timeout one: what about changing the 0002, 0003 and 0004 ordering?
>
> 0004 -> 0002
> 0002 -> 0003
> 0003 -> 0004

Done that way.

> As far 0001:
>
> "
> This commit renames conflict_reason to
> invalidation_reason, and adds the support to show invalidation
> reasons for both physical and logical slots.
> "
>
> I'm not sure I like the fact that "invalidations" and "conflicts" are merged
> into a single field. I'd vote to keep conflict_reason as it is and add a new
> invalidation_reason (and put "conflict" as value when it is the case). The reason
> is that I think they are 2 different concepts (could be linked though) and that
> it would be easier to check for conflicts (means conflict_reason is not NULL).

So, do you want conflict_reason for only logical slots, and a separate
column for invalidation_reason for both logical and physical slots? Is
there any strong reason to have two properties "conflict" and
"invalidated" for slots? They both are the same internally, so why
confuse the users?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Introduce XID age and inactive timeout based replication slot invalidation

From
Bertrand Drouvot
Date:
Hi,

On Wed, Feb 07, 2024 at 12:22:07AM +0530, Bharath Rupireddy wrote:
> On Mon, Feb 5, 2024 at 3:15 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > I'm not sure I like the fact that "invalidations" and "conflicts" are merged
> > into a single field. I'd vote to keep conflict_reason as it is and add a new
> > invalidation_reason (and put "conflict" as value when it is the case). The reason
> > is that I think they are 2 different concepts (could be linked though) and that
> > it would be easier to check for conflicts (means conflict_reason is not NULL).
> 
> So, do you want conflict_reason for only logical slots, and a separate
> column for invalidation_reason for both logical and physical slots?

Yes, with "conflict" as value in case of conflicts (and one would need to refer
to the conflict_reason reason to see the reason).

> Is there any strong reason to have two properties "conflict" and
> "invalidated" for slots?

I think "conflict" is an important topic and does contain several reasons. The
slot "first" conflict and then leads to slot "invalidation". 

> They both are the same internally, so why
> confuse the users?

I don't think that would confuse the users; I do think that would make it
easier to check for conflicting slots.

I did not look closely at the code, just played a bit with the patch and was able
to produce something like:

postgres=# select slot_name,slot_type,active,active_pid,wal_status,invalidation_reason from pg_replication_slots;
  slot_name  | slot_type | active | active_pid | wal_status | invalidation_reason
-------------+-----------+--------+------------+------------+---------------------
 rep1        | physical  | f      |            | reserved   |
 master_slot | physical  | t      |    1482441 | unreserved | wal_removed
(2 rows)

Does it make sense to have an "active/working" slot that is "invalidated"?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Introduce XID age and inactive timeout based replication slot invalidation

From
Bharath Rupireddy
Date:
On Fri, Feb 9, 2024 at 1:12 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> I think "conflict" is an important topic and does contain several reasons. The
> slot "first" conflict and then leads to slot "invalidation".
>
> > They both are the same internally, so why
> > confuse the users?
>
> I don't think that would confuse the users, I do think that would be easier to
> check for conflicting slots.

I've added a separate column for invalidation reasons for now. I'll
see what others think about this as time goes by.

> I did not look closely at the code, just played a bit with the patch and was able
> to produce something like:
>
> postgres=# select slot_name,slot_type,active,active_pid,wal_status,invalidation_reason from pg_replication_slots;
>   slot_name  | slot_type | active | active_pid | wal_status | invalidation_reason
> -------------+-----------+--------+------------+------------+---------------------
>  rep1        | physical  | f      |            | reserved   |
>  master_slot | physical  | t      |    1482441 | unreserved | wal_removed
> (2 rows)
>
> does that make sense to have an "active/working" slot "ivalidated"?

Thanks. Can you please provide the steps to generate this error? Are
you setting max_slot_wal_keep_size on primary to generate
"wal_removed"?

Attached v5 patch set after rebasing and addressing review comments.
Please review it further.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Introduce XID age and inactive timeout based replication slot invalidation

From
Bharath Rupireddy
Date:
On Tue, Feb 20, 2024 at 12:05 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
>> [...] and was able to produce something like:
> >
> > postgres=# select slot_name,slot_type,active,active_pid,wal_status,invalidation_reason from pg_replication_slots;
> >   slot_name  | slot_type | active | active_pid | wal_status | invalidation_reason
> > -------------+-----------+--------+------------+------------+---------------------
> >  rep1        | physical  | f      |            | reserved   |
> >  master_slot | physical  | t      |    1482441 | unreserved | wal_removed
> > (2 rows)
> >
> > does that make sense to have an "active/working" slot "ivalidated"?
>
> Thanks. Can you please provide the steps to generate this error? Are
> you setting max_slot_wal_keep_size on primary to generate
> "wal_removed"?

I'm able to reproduce [1] the state [2] where the slot got invalidated
first and its wal_status became unreserved, yet the slot keeps serving
once the standby comes back online and catches up with the primary by
fetching the WAL files from the archive. There's a good reason for this
state -
https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/replication/slotfuncs.c;h=d2fa5e669a32f19989b0d987d3c7329851a1272e;hb=ff9e1e764fcce9a34467d614611a34d4d2a91b50#l351.
This intermittent state can only happen for physical slots, not for
logical slots because logical subscribers can't get the missing
changes from the WAL stored in the archive.

So the fact appears to be that an invalidated slot can never become
normal again, but it can still serve a standby if the standby is able to
catch up by fetching the required WAL (the WAL the slot couldn't keep
for it) from elsewhere (the archive, via restore_command).

As far as the 0001 patch is concerned, it reports the
invalidation_reason as long as slot_contents.data.invalidated !=
RS_INVAL_NONE. I think this is okay.

Thoughts?

[1]
./initdb -D db17
echo "max_wal_size = 128MB
max_slot_wal_keep_size = 64MB
archive_mode = on
archive_command='cp %p
/home/ubuntu/postgres/pg17/bin/archived_wal/%f'" | tee -a
db17/postgresql.conf

./pg_ctl -D db17 -l logfile17 start

./psql -d postgres -p 5432 -c "SELECT
pg_create_physical_replication_slot('sb_repl_slot', true, false);"

rm -rf sbdata logfilesbdata
./pg_basebackup -D sbdata

echo "port=5433
primary_conninfo='host=localhost port=5432 dbname=postgres user=ubuntu'
primary_slot_name='sb_repl_slot'
restore_command='cp /home/ubuntu/postgres/pg17/bin/archived_wal/%f %p'" | tee -a sbdata/postgresql.conf

touch sbdata/standby.signal

./pg_ctl -D sbdata -l logfilesbdata start
./psql -d postgres -p 5433 -c "SELECT pg_is_in_recovery();"

./pg_ctl -D sbdata -l logfilesbdata stop

./psql -d postgres -p 5432 -c "SELECT pg_logical_emit_message(true,
'mymessage', repeat('aaaa', 10000000));"
./psql -d postgres -p 5432 -c "CHECKPOINT;"
./pg_ctl -D sbdata -l logfilesbdata start
./psql -d postgres -p 5432 -xc "SELECT * FROM pg_replication_slots;"

[2]
postgres=# SELECT * FROM pg_replication_slots;
-[ RECORD 1 ]-------+-------------
slot_name           | sb_repl_slot
plugin              |
slot_type           | physical
datoid              |
database            |
temporary           | f
active              | t
active_pid          | 710667
xmin                |
catalog_xmin        |
restart_lsn         | 0/115D21A0
confirmed_flush_lsn |
wal_status          | unreserved
safe_wal_size       | 77782624
two_phase           | f
conflict_reason     |
failover            | f
synced              | f
invalidation_reason | wal_removed
last_inactive_at    |
inactive_count      | 1

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Introduce XID age and inactive timeout based replication slot invalidation

From
Bertrand Drouvot
Date:
Hi,

On Wed, Feb 21, 2024 at 10:55:00AM +0530, Bharath Rupireddy wrote:
> On Tue, Feb 20, 2024 at 12:05 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> >> [...] and was able to produce something like:
> > >
> > > postgres=# select slot_name,slot_type,active,active_pid,wal_status,invalidation_reason from pg_replication_slots;
> > >   slot_name  | slot_type | active | active_pid | wal_status | invalidation_reason
> > > -------------+-----------+--------+------------+------------+---------------------
> > >  rep1        | physical  | f      |            | reserved   |
> > >  master_slot | physical  | t      |    1482441 | unreserved | wal_removed
> > > (2 rows)
> > >
> > > does that make sense to have an "active/working" slot "ivalidated"?
> >
> > Thanks. Can you please provide the steps to generate this error? Are
> > you setting max_slot_wal_keep_size on primary to generate
> > "wal_removed"?
> 
> I'm able to reproduce [1] the state [2] where the slot got invalidated
> first, then its wal_status became unreserved, but still the slot is
> serving after the standby comes up online after it catches up with the
> primary getting the WAL files from the archive. There's a good reason
> for this state -
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/replication/slotfuncs.c;h=d2fa5e669a32f19989b0d987d3c7329851a1272e;hb=ff9e1e764fcce9a34467d614611a34d4d2a91b50#l351.
> This intermittent state can only happen for physical slots, not for
> logical slots because logical subscribers can't get the missing
> changes from the WAL stored in the archive.
> 
> And, the fact looks to be that an invalidated slot can never become
> normal but still can serve a standby if the standby is able to catch
> up by fetching required WAL (this is the WAL the slot couldn't keep
> for the standby) from elsewhere (archive via restore_command).
> 
> As far as the 0001 patch is concerned, it reports the
> invalidation_reason as long as slot_contents.data.invalidated !=
> RS_INVAL_NONE. I think this is okay.
> 
> Thoughts?

Yeah, looking at the code I agree that looks OK. OTOH, it can look confusing;
maybe we should add a few words about it in the docs?

Looking at v5-0001:

+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>invalidation_reason</structfield> <type>text</type>
+      </para>
+      <para>

My initial thought was to put the value "conflict" in this new field in case of
a conflict (rather than repeating the specific conflict reason in it). With the
current proposal, invalidation_reason could report the same value as
conflict_reason, which sounds weird to me.

Does it make sense to you to use "conflict" as the value in "invalidation_reason"
when the slot has a non-NULL "conflict_reason"?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Introduce XID age and inactive timeout based replication slot invalidation

From
Bharath Rupireddy
Date:
On Wed, Feb 21, 2024 at 5:55 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> > As far as the 0001 patch is concerned, it reports the
> > invalidation_reason as long as slot_contents.data.invalidated !=
> > RS_INVAL_NONE. I think this is okay.
> >
> > Thoughts?
>
> Yeah, looking at the code I agree that looks ok. OTOH, that looks confusing,
> maybe we should add a few words about it in the doc?

I'll think about it.

> Looking at v5-0001:
>
> +      <entry role="catalog_table_entry"><para role="column_definition">
> +       <structfield>invalidation_reason</structfield> <type>text</type>
> +      </para>
> +      <para>
>
> My initial thought was to put "conflict" value in this new field in case of
> conflict (not to mention the conflict reason in it). With the current proposal
> invalidation_reason could report the same as conflict_reason, which sounds weird
> to me.
>
> Does that make sense to you to use "conflict" as value in "invalidation_reason"
> when the slot has "conflict_reason" not NULL?

I'm thinking the other way around - how about we revert
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=007693f2a3ac2ac19affcb03ad43cdb36ccff5b5,
that is, put in place "conflict" as a boolean and introduce
invalidation_reason in text form? So, for logical slots, whenever the
"conflict" column is true, the reason is found in the invalidation_reason
column. How does that sound? Again the debate might be "conflict" vs
"invalidation", but that looks clean IMHO.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Introduce XID age and inactive timeout based replication slot invalidation

From
Bertrand Drouvot
Date:
Hi,

On Wed, Feb 21, 2024 at 08:10:00PM +0530, Bharath Rupireddy wrote:
> On Wed, Feb 21, 2024 at 5:55 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> > My initial thought was to put "conflict" value in this new field in case of
> > conflict (not to mention the conflict reason in it). With the current proposal
> > invalidation_reason could report the same as conflict_reason, which sounds weird
> > to me.
> >
> > Does that make sense to you to use "conflict" as value in "invalidation_reason"
> > when the slot has "conflict_reason" not NULL?
> 
> I'm thinking the other way around - how about we revert
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=007693f2a3ac2ac19affcb03ad43cdb36ccff5b5,
> that is, put in place "conflict" as a boolean and introduce
> invalidation_reason the text form. So, for logical slots, whenever the
> "conflict" column is true, the reason is found in invaldiation_reason
> column? How does it sound?

Yeah, I think that looks fine too. We would need more changes (like taking
care of ddd5f4f54a, for example).

CC'ing Amit, Hou-San and Shveta to get their point of view (as the ones behind
007693f2a3 and ddd5f4f54a).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Thu, Feb 22, 2024 at 1:44 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> > > Does that make sense to you to use "conflict" as value in "invalidation_reason"
> > > when the slot has "conflict_reason" not NULL?
> >
> > I'm thinking the other way around - how about we revert
> > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=007693f2a3ac2ac19affcb03ad43cdb36ccff5b5,
> > that is, put in place "conflict" as a boolean and introduce
> > invalidation_reason the text form. So, for logical slots, whenever the
> > "conflict" column is true, the reason is found in invaldiation_reason
> > column? How does it sound?
>
> Yeah, I think that looks fine too. We would need more change (like take care of
> ddd5f4f54a for example).
>
> CC'ing Amit, Hou-San and Shveta to get their point of view (as the ones behind
> 007693f2a3 and ddd5f4f54a).

Yeah, let's wait and see what others think about it.

FWIW, I've had to rebase the patches due to 943f7ae1c. Please see the
attached v6 patch set.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On Wed, Feb 21, 2024 at 08:10:00PM +0530, Bharath Rupireddy wrote:
> I'm thinking the other way around - how about we revert
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=007693f2a3ac2ac19affcb03ad43cdb36ccff5b5,
> that is, put in place "conflict" as a boolean and introduce
> invalidation_reason the text form. So, for logical slots, whenever the
> "conflict" column is true, the reason is found in invaldiation_reason
> column? How does it sound? Again the debate might be "conflict" vs
> "invalidation", but that looks clean IMHO.

Would you ever see "conflict" as false and "invalidation_reason" as
non-null for a logical slot?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



On Sat, Mar 2, 2024 at 3:41 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> > [....] how about we revert
> > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=007693f2a3ac2ac19affcb03ad43cdb36ccff5b5,
>
> Would you ever see "conflict" as false and "invalidation_reason" as
> non-null for a logical slot?

No. Because both conflict and invalidation_reason are decided based on
the invalidation reason, i.e., the value of slot_contents.data.invalidated.
IOW, a logical slot that reports conflict as true must have been
invalidated.

Do you have any thoughts on reverting 007693f and introducing
invalidation_reason?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Sun, Mar 03, 2024 at 11:40:00PM +0530, Bharath Rupireddy wrote:
> On Sat, Mar 2, 2024 at 3:41 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>> Would you ever see "conflict" as false and "invalidation_reason" as
>> non-null for a logical slot?
> 
> No. Because both conflict and invalidation_reason are decided based on
> the invalidation reason i.e. value of slot_contents.data.invalidated.
> IOW, a logical slot that reports conflict as true must have been
> invalidated.
> 
> Do you have any thoughts on reverting 007693f and introducing
> invalidation_reason?

Unless I am misinterpreting some details, ISTM we could rename this column
to invalidation_reason and use it for both logical and physical slots.  I'm
not seeing a strong need for another column.  Perhaps I am missing
something...

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



On Sun, Mar 03, 2024 at 03:44:34PM -0600, Nathan Bossart wrote:
> On Sun, Mar 03, 2024 at 11:40:00PM +0530, Bharath Rupireddy wrote:
>> Do you have any thoughts on reverting 007693f and introducing
>> invalidation_reason?
>
> Unless I am misinterpreting some details, ISTM we could rename this column
> to invalidation_reason and use it for both logical and physical slots.  I'm
> not seeing a strong need for another column.  Perhaps I am missing
> something...

And also, please don't be hasty in taking a decision that would
involve a revert of 007693f without informing the committer of this
commit about that.  I am adding Amit Kapila in CC of this thread for
awareness.
--
Michael

Attachment
Hi,

On Sun, Mar 03, 2024 at 03:44:34PM -0600, Nathan Bossart wrote:
> On Sun, Mar 03, 2024 at 11:40:00PM +0530, Bharath Rupireddy wrote:
> > On Sat, Mar 2, 2024 at 3:41 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> >> Would you ever see "conflict" as false and "invalidation_reason" as
> >> non-null for a logical slot?
> > 
> > No. Because both conflict and invalidation_reason are decided based on
> > the invalidation reason i.e. value of slot_contents.data.invalidated.
> > IOW, a logical slot that reports conflict as true must have been
> > invalidated.
> > 
> > Do you have any thoughts on reverting 007693f and introducing
> > invalidation_reason?
> 
> Unless I am misinterpreting some details, ISTM we could rename this column
> to invalidation_reason and use it for both logical and physical slots.  I'm
> not seeing a strong need for another column.

Yeah, having two columns was more for convenience purposes. Without the "conflict"
one, a slot conflicting with recovery would be "a logical slot having a non-NULL
invalidation_reason".

I'm also fine with one column if most of you prefer that way.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Mon, Mar 4, 2024 at 2:11 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> On Sun, Mar 03, 2024 at 03:44:34PM -0600, Nathan Bossart wrote:
> > On Sun, Mar 03, 2024 at 11:40:00PM +0530, Bharath Rupireddy wrote:
> > > On Sat, Mar 2, 2024 at 3:41 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> > >> Would you ever see "conflict" as false and "invalidation_reason" as
> > >> non-null for a logical slot?
> > >
> > > No. Because both conflict and invalidation_reason are decided based on
> > > the invalidation reason i.e. value of slot_contents.data.invalidated.
> > > IOW, a logical slot that reports conflict as true must have been
> > > invalidated.
> > >
> > > Do you have any thoughts on reverting 007693f and introducing
> > > invalidation_reason?
> >
> > Unless I am misinterpreting some details, ISTM we could rename this column
> > to invalidation_reason and use it for both logical and physical slots.  I'm
> > not seeing a strong need for another column.
>
> Yeah having two columns was more for convenience purpose. Without the "conflict"
> one, a slot conflicting with recovery would be "a logical slot having a non NULL
> invalidation_reason".
>
> I'm also fine with one column if most of you prefer that way.

While we debate on the above, please find the attached v7 patch set
after rebasing.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On Wed, Mar 06, 2024 at 12:50:38AM +0530, Bharath Rupireddy wrote:
> On Mon, Mar 4, 2024 at 2:11 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
>> On Sun, Mar 03, 2024 at 03:44:34PM -0600, Nathan Bossart wrote:
>> > Unless I am misinterpreting some details, ISTM we could rename this column
>> > to invalidation_reason and use it for both logical and physical slots.  I'm
>> > not seeing a strong need for another column.
>>
>> Yeah having two columns was more for convenience purpose. Without the "conflict"
>> one, a slot conflicting with recovery would be "a logical slot having a non NULL
>> invalidation_reason".
>>
>> I'm also fine with one column if most of you prefer that way.
> 
> While we debate on the above, please find the attached v7 patch set
> after rebasing.

It looks like Bertrand is okay with reusing the same column for both
logical and physical slots, which IIUC is what you initially proposed in v1
of the patch set.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Hi,

On Tue, Mar 05, 2024 at 01:44:43PM -0600, Nathan Bossart wrote:
> On Wed, Mar 06, 2024 at 12:50:38AM +0530, Bharath Rupireddy wrote:
> > On Mon, Mar 4, 2024 at 2:11 PM Bertrand Drouvot
> > <bertranddrouvot.pg@gmail.com> wrote:
> >> On Sun, Mar 03, 2024 at 03:44:34PM -0600, Nathan Bossart wrote:
> >> > Unless I am misinterpreting some details, ISTM we could rename this column
> >> > to invalidation_reason and use it for both logical and physical slots.  I'm
> >> > not seeing a strong need for another column.
> >>
> >> Yeah having two columns was more for convenience purpose. Without the "conflict"
> >> one, a slot conflicting with recovery would be "a logical slot having a non NULL
> >> invalidation_reason".
> >>
> >> I'm also fine with one column if most of you prefer that way.
> > 
> > While we debate on the above, please find the attached v7 patch set
> > after rebasing.
> 
> It looks like Bertrand is okay with reusing the same column for both
> logical and physical slots

Yeah, I'm okay with one column.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Wed, Mar 6, 2024 at 2:42 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Tue, Mar 05, 2024 at 01:44:43PM -0600, Nathan Bossart wrote:
> > On Wed, Mar 06, 2024 at 12:50:38AM +0530, Bharath Rupireddy wrote:
> > > On Mon, Mar 4, 2024 at 2:11 PM Bertrand Drouvot
> > > <bertranddrouvot.pg@gmail.com> wrote:
> > >> On Sun, Mar 03, 2024 at 03:44:34PM -0600, Nathan Bossart wrote:
> > >> > Unless I am misinterpreting some details, ISTM we could rename this column
> > >> > to invalidation_reason and use it for both logical and physical slots.  I'm
> > >> > not seeing a strong need for another column.
> > >>
> > >> Yeah having two columns was more for convenience purpose. Without the "conflict"
> > >> one, a slot conflicting with recovery would be "a logical slot having a non NULL
> > >> invalidation_reason".
> > >>
> > >> I'm also fine with one column if most of you prefer that way.
> > >
> > > While we debate on the above, please find the attached v7 patch set
> > > after rebasing.
> >
> > It looks like Bertrand is okay with reusing the same column for both
> > logical and physical slots
>
> Yeah, I'm okay with one column.

Thanks. v8-0001 is how it looks. Please see the v8 patch set with this change.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
Hi,

On Wed, Mar 06, 2024 at 02:46:57PM +0530, Bharath Rupireddy wrote:
> On Wed, Mar 6, 2024 at 2:42 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> > Yeah, I'm okay with one column.
> 
> Thanks. v8-0001 is how it looks. Please see the v8 patch set with this change.

Thanks!

A few comments:

1 ===

+       The reason for the slot's invalidation. <literal>NULL</literal> if the
+       slot is currently actively being used.

s/currently actively being used/not invalidated/ ? (I mean it could be valid
and not being used).

2 ===

+       the slot is marked as invalidated. In case of logical slots, it
+       represents the reason for the logical slot's conflict with recovery.

s/the reason for the logical slot's conflict with recovery./the recovery conflict reason./ ?

3 ===

@@ -667,13 +667,13 @@ get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check)
         * removed.
         */
        res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, failover, "
-                                                       "%s as caught_up, conflict_reason IS NOT NULL as invalid "
+                                                       "%s as caught_up, invalidation_reason IS NOT NULL as invalid "
                                                        "FROM pg_catalog.pg_replication_slots "
                                                        "WHERE slot_type = 'logical' AND "
                                                        "database = current_database() AND "
                                                        "temporary IS FALSE;",
                                                        live_check ? "FALSE" :
-                                                       "(CASE WHEN conflict_reason IS NOT NULL THEN FALSE "
+                                                       "(CASE WHEN invalidation_reason IS NOT NULL THEN FALSE "

Yeah that's fine because there is logical slot filtering here.

4 ===

-GetSlotInvalidationCause(const char *conflict_reason)
+GetSlotInvalidationCause(const char *invalidation_reason)

Should we change the comment "Maps a conflict reason" above this function?

5 ===

-# Check conflict_reason is NULL for physical slot
+# Check invalidation_reason is NULL for physical slot
 $res = $node_primary->safe_psql(
        'postgres', qq[
-                SELECT conflict_reason is null FROM pg_replication_slots where slot_name = '$primary_slotname';]
+                SELECT invalidation_reason is null FROM pg_replication_slots where slot_name = '$primary_slotname';]
 );


I don't think this test is needed anymore: it does not make that much sense since
it's done after the primary database initialization and startup.

6 ===

@@ -680,7 +680,7 @@ ok( $node_standby->poll_query_until(
 is( $node_standby->safe_psql(
                'postgres',
                q[select bool_or(conflicting) from
-                 (select conflict_reason is not NULL as conflicting
+                 (select invalidation_reason is not NULL as conflicting
                   from pg_replication_slots WHERE slot_type = 'logical')]),
        'f',
        'Logical slots are reported as non conflicting');

What about?

"
# Verify slots are reported as valid in pg_replication_slots
is( $node_standby->safe_psql(
        'postgres',
        q[select bool_or(invalidated) from
          (select invalidation_reason is not NULL as invalidated
           from pg_replication_slots WHERE slot_type = 'logical')]),
    'f',
    'Logical slots are reported as valid');
"

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Mon, Mar 4, 2024 at 3:14 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Sun, Mar 03, 2024 at 11:40:00PM +0530, Bharath Rupireddy wrote:
> > On Sat, Mar 2, 2024 at 3:41 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> >> Would you ever see "conflict" as false and "invalidation_reason" as
> >> non-null for a logical slot?
> >
> > No. Because both conflict and invalidation_reason are decided based on
> > the invalidation reason i.e. value of slot_contents.data.invalidated.
> > IOW, a logical slot that reports conflict as true must have been
> > invalidated.
> >
> > Do you have any thoughts on reverting 007693f and introducing
> > invalidation_reason?
>
> Unless I am misinterpreting some details, ISTM we could rename this column
> to invalidation_reason and use it for both logical and physical slots.  I'm
> not seeing a strong need for another column.  Perhaps I am missing
> something...
>

IIUC, the current conflict_reason is primarily used to determine
logical slots on the standby that got invalidated due to a recovery-time
conflict. On the primary, it will also show logical slots that got
invalidated because the corresponding WAL got removed. Is that
understanding correct? If so, we are already sort of overloading this
column. However, adding more invalidation reasons that won't happen
during recovery conflict handling would entirely change the purpose
(as per the name we use) of this variable. I think invalidation_reason
could describe this column correctly, but OTOH I guess it would lose
its original meaning/purpose.

--
With Regards,
Amit Kapila.



On Wed, Mar 6, 2024 at 2:47 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
>
> Thanks. v8-0001 is how it looks. Please see the v8 patch set with this change.
>

@@ -1629,6 +1634,20 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
                }
            }
            break;
+       case RS_INVAL_INACTIVE_TIMEOUT:
+           if (s->data.last_inactive_at > 0)
+           {
+               TimestampTz now;
+
+               Assert(s->data.persistency == RS_PERSISTENT);
+               Assert(s->active_pid == 0);
+
+               now = GetCurrentTimestamp();
+               if (TimestampDifferenceExceeds(s->data.last_inactive_at, now,
+                                              inactive_replication_slot_timeout * 1000))

You might want to consider its interaction with sync slots on the standby.
Say there is no activity on the slots in terms of processing changes for
them. Now, we won't perform sync of such slots on the standby, showing
them as inactive per your new criteria, whereas the same slots could still
be valid on the primary because the walsender is still active. This may be
more of a theoretical point, as in a running system there will probably be
some activity, but I think this needs some thought.


--
With Regards,
Amit Kapila.



On Wed, Mar 6, 2024 at 4:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> IIUC, the current conflict_reason is primarily used to determine
> logical slots on standby that got invalidated due to recovery time
> conflict. On the primary, it will also show logical slots that got
> invalidated due to the corresponding WAL got removed. Is that
> understanding correct?

That's right.

> If so, we are already sort of overloading this
> column. However, now adding more invalidation reasons that won't
> happen during recovery conflict handling will change entirely the
> purpose (as per the name we use) of this variable. I think
> invalidation_reason could depict this column correctly but OTOH I
> guess it would lose its original meaning/purpose.

Hm. I get the concern. Are you okay with having invalidation_reason
separately for both logical and physical slots? In that case,
logical slots that got invalidated on the standby will have duplicate
info in conflict_reason and invalidation_reason - is this fine?

Another idea is to turn 'conflict_reason text' back into a 'conflicting
boolean' (reverting 007693f2a3), and have 'invalidation_reason
text' for both logical and physical slots. So, whenever 'conflicting'
is true, one can look at invalidation_reason for the reason for the
conflict. How does this sound?
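
To make that concrete, a sketch of how one would consume it (the
'conflicting' boolean and the text 'invalidation_reason' are the proposed
shape, not what's in the tree today):

-- logical slots that conflicted with recovery, and why
SELECT slot_name, invalidation_reason
FROM pg_replication_slots
WHERE conflicting;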

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Wed, Mar 6, 2024 at 4:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> You might want to consider its interaction with sync slots on standby.
> Say, there is no activity on slots in terms of processing the changes
> for slots. Now, we won't perform sync of such slots on standby showing
> them inactive as per your new criteria where as same slots could still
> be valid on primary as the walsender is still active. This may be more
> of a theoretical point as in running system there will probably be
> some activity but I think this needs some thougths.

I believe the xmin and catalog_xmin of the sync slots on the standby
keep advancing depending on the slots on the primary, no? If yes, the
XID age based invalidation shouldn't be a problem.

I believe there are no walsenders started for the sync slots on the
standbys, right? If yes, the inactive timeout based invalidation also
shouldn't be a problem, because a slot's inactive timeout is tracked
only for walsenders - they are the ones that typically hold
replication slots for longer durations and for real replication use.
We did a similar thing in a recent commit [1].

Is my understanding right? Do you still see any problems with it?
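
FWIW, the sync state of such slots is already visible on the standby, so
one could cross-check it there with something like (using the existing
synced column):

-- on the standby: synced slots, their xmin horizons, and whether active
SELECT slot_name, synced, active, xmin, catalog_xmin
FROM pg_replication_slots
WHERE synced;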

[1]
commit 7c3fb505b14e86581b6a052075a294c78c91b123
Author: Amit Kapila <akapila@postgresql.org>
Date:   Tue Nov 21 07:59:53 2023 +0530

    Log messages for replication slot acquisition and release.
.........
    Note that these messages are emitted only for walsenders but not for
    backends. This is because walsenders are the ones that typically hold
    replication slots for longer durations, unlike backends which hold them
    for executing replication related functions.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Fri, Mar 8, 2024 at 8:08 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Mar 6, 2024 at 4:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > IIUC, the current conflict_reason is primarily used to determine
> > logical slots on standby that got invalidated due to recovery time
> > conflict. On the primary, it will also show logical slots that got
> > invalidated due to the corresponding WAL got removed. Is that
> > understanding correct?
>
> That's right.
>
> > If so, we are already sort of overloading this
> > column. However, now adding more invalidation reasons that won't
> > happen during recovery conflict handling will change entirely the
> > purpose (as per the name we use) of this variable. I think
> > invalidation_reason could depict this column correctly but OTOH I
> > guess it would lose its original meaning/purpose.
>
> Hm. I get the concern. Are you okay with having inavlidation_reason
> separately for both logical and physical slots? In such a case,
> logical slots that got invalidated on the standby will have duplicate
> info in conflict_reason and invalidation_reason, is this fine?
>

If we have duplicate information in two columns, that could be
confusing for users. BTW, doesn't the recovery conflict occur only
because of the rows_removed and wal_level_insufficient reasons? The
wal_removed or the new reasons you are proposing can't happen because
of a recovery conflict. Am I missing something here?

> Another idea is to make 'conflict_reason text' as a 'conflicting
> boolean' again (revert 007693f2a3), and have 'invalidation_reason
> text' for both logical and physical slots. So, whenever 'conflicting'
> is true, one can look at invalidation_reason for the reason for
> conflict. How does this sound?
>

So, does this mean that conflicting will only be true for some of the
reasons (say wal_level_insufficient, rows_removed, wal_removed) and
logical slots but not for others? I think that will also not eliminate
the duplicate information, as the user could have deduced that from a
single column.

--
With Regards,
Amit Kapila.



On Fri, Mar 8, 2024 at 10:42 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Mar 6, 2024 at 4:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > You might want to consider its interaction with sync slots on standby.
> > Say, there is no activity on slots in terms of processing the changes
> > for slots. Now, we won't perform sync of such slots on standby showing
> > them inactive as per your new criteria where as same slots could still
> > be valid on primary as the walsender is still active. This may be more
> > of a theoretical point as in running system there will probably be
> > some activity but I think this needs some thougths.
>
> I believe the xmin and catalog_xmin of the sync slots on the standby
> keep advancing depending on the slots on the primary, no? If yes, the
> XID age based invalidation shouldn't be a problem.
>
> I believe there are no walsenders started for the sync slots on the
> standbys, right? If yes, the inactive timeout based invalidation also
> shouldn't be a problem. Because, the inactive timeouts for a slot are
> tracked only for walsenders because they are the ones that typically
> hold replication slots for longer durations and for real replication
> use. We did a similar thing in a recent commit [1].
>
> Is my understanding right?
>

Yes, your understanding is correct. I wanted us to consider having new
parameters like 'inactive_replication_slot_timeout' at the slot level
instead of as a GUC. This new parameter doesn't seem to be similar to
'max_slot_wal_keep_size', which leads to truncation of WAL globally and
then invalidates the appropriate slots. OTOH,
'inactive_replication_slot_timeout' doesn't appear to have a similar
global effect. The other thing we should consider is what happens if
the checkpoint occurs at an interval greater than
'inactive_replication_slot_timeout'? Shall we consider doing it via
some other background process, or do we think the checkpointer is the
best we can have?

>
 Do you still see any problems with it?
>

Sorry, I haven't done any detailed review yet, so I can't say with
confidence whether there is any problem or not w.r.t. sync slots.

--
With Regards,
Amit Kapila.



On Wed, Mar 6, 2024 at 2:47 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Thanks. v8-0001 is how it looks. Please see the v8 patch set with this change.
>

Commit message says: "Currently postgres has the ability to invalidate
inactive replication slots based on the amount of WAL (set via
max_slot_wal_keep_size GUC) that will be needed for the slots in case
they become active. However, choosing a default value for
max_slot_wal_keep_size is tricky. Because the amount of WAL a customer
generates, and their allocated storage will vary greatly in
production, making it difficult to pin down a one-size-fits-all value.
It is often easy for developers to set an XID age (age of slot's xmin
or catalog_xmin) of say 1 or 1.5 billion, after which the slots get
invalidated."

I don't see how it will be easier for the user to choose the default
value of 'max_slot_xid_age' compared to 'max_slot_wal_keep_size'. But I
agree that, similar to 'max_slot_wal_keep_size', 'max_slot_xid_age' can
be another parameter to allow vacuum to proceed with removing rows
which it otherwise couldn't have, as those would be required by some
slot. Now, if this understanding is correct, we should probably make
this invalidation happen in (auto)vacuum after computing the age based
on this new parameter.

--
With Regards,
Amit Kapila.



On Mon, Mar 11, 2024 at 04:09:27PM +0530, Amit Kapila wrote:
> I don't see how it will be easier for the user to choose the default
> value of 'max_slot_xid_age' compared to 'max_slot_wal_keep_size'. But,
> I agree similar to 'max_slot_wal_keep_size', 'max_slot_xid_age' can be
> another parameter to allow vacuum to proceed removing the rows which
> otherwise it wouldn't have been as those would be required by some
> slot.

Yeah, the idea is to help prevent transaction ID wraparound, so I would
expect max_slot_xid_age to ordinarily be set relatively high, i.e., 1.5B+.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Hi,

On Fri, Mar 08, 2024 at 10:42:20PM +0530, Bharath Rupireddy wrote:
> On Wed, Mar 6, 2024 at 4:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > You might want to consider its interaction with sync slots on standby.
> > Say, there is no activity on slots in terms of processing the changes
> > for slots. Now, we won't perform sync of such slots on standby showing
> > them inactive as per your new criteria where as same slots could still
> > be valid on primary as the walsender is still active. This may be more
> > of a theoretical point as in running system there will probably be
> > some activity but I think this needs some thougths.
> 
> I believe the xmin and catalog_xmin of the sync slots on the standby
> keep advancing depending on the slots on the primary, no? If yes, the
> XID age based invalidation shouldn't be a problem.
> 
> I believe there are no walsenders started for the sync slots on the
> standbys, right? If yes, the inactive timeout based invalidation also
> shouldn't be a problem. Because, the inactive timeouts for a slot are
> tracked only for walsenders because they are the ones that typically
> hold replication slots for longer durations and for real replication
> use. We did a similar thing in a recent commit [1].
> 
> Is my understanding right? Do you still see any problems with it?

Would it make sense to "simply" discard/prevent those kinds of invalidations
for "synced" slots on standby? I mean, do they make sense given the fact that
those slots are not usable until the standby is promoted?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Tue, Mar 12, 2024 at 1:24 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> On Fri, Mar 08, 2024 at 10:42:20PM +0530, Bharath Rupireddy wrote:
> > On Wed, Mar 6, 2024 at 4:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > You might want to consider its interaction with sync slots on standby.
> > > Say, there is no activity on slots in terms of processing the changes
> > > for slots. Now, we won't perform sync of such slots on standby showing
> > > them inactive as per your new criteria where as same slots could still
> > > be valid on primary as the walsender is still active. This may be more
> > > of a theoretical point as in running system there will probably be
> > > some activity but I think this needs some thougths.
> >
> > I believe the xmin and catalog_xmin of the sync slots on the standby
> > keep advancing depending on the slots on the primary, no? If yes, the
> > XID age based invalidation shouldn't be a problem.
> >
> > I believe there are no walsenders started for the sync slots on the
> > standbys, right? If yes, the inactive timeout based invalidation also
> > shouldn't be a problem. Because, the inactive timeouts for a slot are
> > tracked only for walsenders because they are the ones that typically
> > hold replication slots for longer durations and for real replication
> > use. We did a similar thing in a recent commit [1].
> >
> > Is my understanding right? Do you still see any problems with it?
>
> Would that make sense to "simply" discard/prevent those kind of invalidations
> for "synced" slot on standby? I mean, do they make sense given the fact that
> those slots are not usable until the standby is promoted?
>

AFAIR, we don't prevent similar invalidations due to
'max_slot_wal_keep_size' for sync slots, so why prevent it for
these new parameters? This will unnecessarily create inconsistency in
the invalidation behavior.

--
With Regards,
Amit Kapila.



On Mon, Mar 11, 2024 at 11:26 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > Hm. I get the concern. Are you okay with having inavlidation_reason
> > separately for both logical and physical slots? In such a case,
> > logical slots that got invalidated on the standby will have duplicate
> > info in conflict_reason and invalidation_reason, is this fine?
> >
>
> If we have duplicate information in two columns that could be
> confusing for users. BTW, isn't the recovery conflict occur only
> because of rows_removed and wal_level_insufficient reasons? The
> wal_removed or the new reasons you are proposing can't happen because
> of recovery conflict. Am, I missing something here?

My understanding aligns with yours that the rows_removed and
wal_level_insufficient invalidations can occur only upon recovery
conflict.

FWIW, a test named 'synchronized slot has been invalidated' in
040_standby_failover_slots_sync.pl inappropriately uses
conflict_reason = 'wal_removed' for a logical slot on the standby. As
per the above understanding, it's inappropriate to use conflict_reason
here because the wal_removed invalidation doesn't conflict with
recovery.

> > Another idea is to make 'conflict_reason text' as a 'conflicting
> > boolean' again (revert 007693f2a3), and have 'invalidation_reason
> > text' for both logical and physical slots. So, whenever 'conflicting'
> > is true, one can look at invalidation_reason for the reason for
> > conflict. How does this sound?
> >
>
> So, does this mean that conflicting will only be true for some of the
> reasons (say wal_level_insufficient, rows_removed, wal_removed) and
> logical slots but not for others? I think that will also not eliminate
> the duplicate information as user could have deduced that from single
> column.

So, how about we make conflict_reason report only the reasons that
actually cause a conflict with recovery for logical slots, something
like below, and then have invalidation_cause as a generic column for
all sorts of invalidation reasons for both logical and physical slots?

ReplicationSlotInvalidationCause cause = slot_contents.data.invalidated;

if (slot_contents.data.database == InvalidOid ||
    cause == RS_INVAL_NONE ||
    (cause != RS_INVAL_HORIZON &&
     cause != RS_INVAL_WAL_LEVEL))
{
    nulls[i++] = true;
}
else
{
    Assert(cause == RS_INVAL_HORIZON || cause == RS_INVAL_WAL_LEVEL);

    values[i++] = CStringGetTextDatum(SlotInvalidationCauses[cause]);
}

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Hi,

On Tue, Mar 12, 2024 at 05:51:43PM +0530, Amit Kapila wrote:
> On Tue, Mar 12, 2024 at 1:24 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > On Fri, Mar 08, 2024 at 10:42:20PM +0530, Bharath Rupireddy wrote:
> > > On Wed, Mar 6, 2024 at 4:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > You might want to consider its interaction with sync slots on standby.
> > > > Say, there is no activity on slots in terms of processing the changes
> > > > for slots. Now, we won't perform sync of such slots on standby showing
> > > > them inactive as per your new criteria where as same slots could still
> > > > be valid on primary as the walsender is still active. This may be more
> > > > of a theoretical point as in running system there will probably be
> > > > some activity but I think this needs some thougths.
> > >
> > > I believe the xmin and catalog_xmin of the sync slots on the standby
> > > keep advancing depending on the slots on the primary, no? If yes, the
> > > XID age based invalidation shouldn't be a problem.
> > >
> > > I believe there are no walsenders started for the sync slots on the
> > > standbys, right? If yes, the inactive timeout based invalidation also
> > > shouldn't be a problem. Because, the inactive timeouts for a slot are
> > > tracked only for walsenders because they are the ones that typically
> > > hold replication slots for longer durations and for real replication
> > > use. We did a similar thing in a recent commit [1].
> > >
> > > Is my understanding right? Do you still see any problems with it?
> >
> > Would that make sense to "simply" discard/prevent those kind of invalidations
> > for "synced" slot on standby? I mean, do they make sense given the fact that
> > those slots are not usable until the standby is promoted?
> >
> 
> AFAIR, we don't prevent similar invalidations due to
> 'max_slot_wal_keep_size' for sync slots,

Right, we'd invalidate them on the standby should the standby sync slot's
restart_lsn exceed the limit.

> so why to prevent it for
> these new parameters? This will unnecessarily create inconsistency in
> the invalidation behavior.

Yeah, but I think wal removal has a direct impact on the slot usability, which
is probably not the case with the new XID and Timeout ones. That's why I thought
about handling them differently (but I'm also fine if that's not the case).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Tue, Mar 12, 2024 at 5:51 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > Would that make sense to "simply" discard/prevent those kind of invalidations
> > for "synced" slot on standby? I mean, do they make sense given the fact that
> > those slots are not usable until the standby is promoted?
>
> AFAIR, we don't prevent similar invalidations due to
> 'max_slot_wal_keep_size' for sync slots, so why to prevent it for
> these new parameters? This will unnecessarily create inconsistency in
> the invalidation behavior.

Right. +1 to keeping the behaviour consistent for all invalidations.
However, an assertion that inactive_timeout isn't set for synced slots
on the standby isn't a bad idea, because we rely on the fact that
walsenders aren't started for synced slots. Then again, I think that
too misses consistency in the invalidation behaviour.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Tue, Mar 12, 2024 at 9:11 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> > AFAIR, we don't prevent similar invalidations due to
> > 'max_slot_wal_keep_size' for sync slots,
>
> Right, we'd invalidate them on the standby should the standby sync slot restart_lsn
> exceeds the limit.

Right. Help me understand this a bit - is the wal_removed invalidation
going to conflict with recovery on the standby?

Per the discussion upthread, I'm trying to understand which
invalidation reasons exactly cause a conflict with recovery. Is it
just the rows_removed and wal_level_insufficient invalidations? My
understanding of the conflict with recovery and the invalidation
reason has been a bit off track. Perhaps we need to clarify these two
things in the docs for the end users as well?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Mon, Mar 11, 2024 at 3:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Yes, your understanding is correct. I wanted us to consider having new
> parameters like 'inactive_replication_slot_timeout' to be at
> slot-level instead of GUC. I think this new parameter doesn't seem to
> be the similar as 'max_slot_wal_keep_size' which leads to truncation
> of WAL at global and then invalidates the appropriate slots. OTOH, the
> 'inactive_replication_slot_timeout' doesn't appear to have a similar
> global effect.

last_inactive_at is tracked for each slot, and slots get invalidated
based on inactive_replication_slot_timeout using it. It's like
max_slot_wal_keep_size invalidating slots based on restart_lsn. In a
way, both are similar, right?

> The other thing we should consider is what if the
> checkpoint happens at a timeout greater than
> 'inactive_replication_slot_timeout'?

In such a case, the slots get invalidated upon the next checkpoint, as
(time of the current checkpoint - last_inactive_at) will by then be
greater than inactive_replication_slot_timeout.
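
For illustration, here's a minimal sketch (not the actual patch) of the
per-slot check the checkpointer could run, assuming the proposed
last_inactive_at field and an inactive_replication_slot_timeout GUC in
seconds:

#include "replication/slot.h"
#include "utils/timestamp.h"

/* Sketch only: has this slot been inactive for longer than the timeout? */
static bool
SlotInactiveTimeoutExceeded(ReplicationSlot *slot, TimestampTz now)
{
    if (inactive_replication_slot_timeout <= 0 ||
        slot->data.last_inactive_at == 0)
        return false;

    /* compare elapsed inactive time against the timeout (in seconds) */
    return TimestampDifferenceExceeds(slot->data.last_inactive_at, now,
                                      inactive_replication_slot_timeout * 1000);
}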

> Shall, we consider doing it via
> some other background process or do we think checkpointer is the best
> we can have?

The same problem exists if we do it with some other background
process. I think the checkpointer is best because it already
invalidates slots for wal_removed cause, and flushes all replication
slots to disk. Moving this new invalidation functionality into some
other background process such as autovacuum will not only burden that
process' work but also mix up the unique functionality of that
background process.

Having said above, I'm open to ideas from others as I'm not so sure if
there's any issue with checkpointer invalidating the slots for new
reasons.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Mon, Mar 11, 2024 at 4:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> I don't see how it will be easier for the user to choose the default
> value of 'max_slot_xid_age' compared to 'max_slot_wal_keep_size'. But,
> I agree similar to 'max_slot_wal_keep_size', 'max_slot_xid_age' can be
> another parameter to allow vacuum to proceed removing the rows which
> otherwise it wouldn't have been as those would be required by some
> slot. Now, if this understanding is correct, we should probably make
> this invalidation happen by (auto)vacuum after computing the age based
> on this new parameter.

Currently, the patch computes the XID age in the checkpointer using
the next XID (obtained from ReadNextFullTransactionId()) and the
slot's xmin and catalog_xmin. I think the checkpointer is best because
it already
invalidates slots for wal_removed cause, and flushes all replication
slots to disk. Moving this new invalidation functionality into some
other background process such as autovacuum will not only burden that
process' work but also mix up the unique functionality of that
background process.
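
A minimal sketch of the age check described above (assuming the
proposed max_slot_xid_age GUC and the RS_INVAL_XID_AGE cause from the
patch; not the actual patch code):

#include "access/transam.h"

/* Sketch only: is the given horizon XID older than max_slot_xid_age? */
static bool
SlotXidAgeExceeded(TransactionId xid)
{
    TransactionId next_xid;

    if (max_slot_xid_age <= 0 || !TransactionIdIsNormal(xid))
        return false;

    next_xid = XidFromFullTransactionId(ReadNextFullTransactionId());

    /*
     * A slot's xmin/catalog_xmin always precedes the next XID, so the
     * unsigned difference is the age in the circular 32-bit XID space.
     */
    return (uint32) (next_xid - xid) > (uint32) max_slot_xid_age;
}

The idea is that the checkpointer calls something like this for both
data.xmin and data.catalog_xmin of each slot and invalidates the slot
with RS_INVAL_XID_AGE if either check returns true.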

Having said above, I'm open to ideas from others as I'm not so sure if
there's any issue with checkpointer invalidating the slots for new
reasons.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Tue, Mar 12, 2024 at 8:55 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, Mar 11, 2024 at 11:26 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > Hm. I get the concern. Are you okay with having inavlidation_reason
> > > separately for both logical and physical slots? In such a case,
> > > logical slots that got invalidated on the standby will have duplicate
> > > info in conflict_reason and invalidation_reason, is this fine?
> > >
> >
> > If we have duplicate information in two columns that could be
> > confusing for users. BTW, isn't the recovery conflict occur only
> > because of rows_removed and wal_level_insufficient reasons? The
> > wal_removed or the new reasons you are proposing can't happen because
> > of recovery conflict. Am, I missing something here?
>
> My understanding aligns with yours that the rows_removed and
> wal_level_insufficient invalidations can occur only upon recovery
> conflict.
>
> FWIW, a test named 'synchronized slot has been invalidated' in
> 040_standby_failover_slots_sync.pl inappropriately uses
> conflict_reason = 'wal_removed' logical slot on standby. As per the
> above understanding, it's inappropriate to use conflict_reason here
> because wal_removed invalidation doesn't conflict with recovery.
>
> > > Another idea is to make 'conflict_reason text' as a 'conflicting
> > > boolean' again (revert 007693f2a3), and have 'invalidation_reason
> > > text' for both logical and physical slots. So, whenever 'conflicting'
> > > is true, one can look at invalidation_reason for the reason for
> > > conflict. How does this sound?
> > >
> >
> > So, does this mean that conflicting will only be true for some of the
> > reasons (say wal_level_insufficient, rows_removed, wal_removed) and
> > logical slots but not for others? I think that will also not eliminate
> > the duplicate information as user could have deduced that from single
> > column.
>
> So, how about we turn conflict_reason to only report the reasons that
> actually cause conflict with recovery for logical slots, something
> like below, and then have invalidation_cause as a generic column for
> all sorts of invalidation reasons for both logical and physical slots?
>

If our above understanding is correct then conflict_reason will be a
subset of invalidation_reason. If so, whichever way we arrange this
information, there will be some sort of duplicity unless we just have
one column 'invalidation_reason' and update the docs to interpret it
correctly for conflicts.

--
With Regards,
Amit Kapila.



On Tue, Mar 12, 2024 at 9:11 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> On Tue, Mar 12, 2024 at 05:51:43PM +0530, Amit Kapila wrote:
> > On Tue, Mar 12, 2024 at 1:24 PM Bertrand Drouvot
> > <bertranddrouvot.pg@gmail.com> wrote:
>
> > so why to prevent it for
> > these new parameters? This will unnecessarily create inconsistency in
> > the invalidation behavior.
>
> Yeah, but I think wal removal has a direct impact on the slot usuability which
> is probably not the case with the new XID and Timeout ones.
>

BTW, doesn't the XID-based parameter 'max_slot_xid_age' have similarity
with 'max_slot_wal_keep_size'? I think it will impact the rows we
remove based on xid horizons. Don't we need to consider it while
vacuum computes the xid horizons in ComputeXidHorizons(), similar to
what we do for WAL w.r.t. 'max_slot_wal_keep_size'?

--
With Regards,
Amit Kapila.



On Tue, Mar 12, 2024 at 10:10 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, Mar 11, 2024 at 3:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > Yes, your understanding is correct. I wanted us to consider having new
> > parameters like 'inactive_replication_slot_timeout' to be at
> > slot-level instead of GUC. I think this new parameter doesn't seem to
> > be the similar as 'max_slot_wal_keep_size' which leads to truncation
> > of WAL at global and then invalidates the appropriate slots. OTOH, the
> > 'inactive_replication_slot_timeout' doesn't appear to have a similar
> > global effect.
>
> last_inactive_at is tracked for each slot using which slots get
> invalidated based on inactive_replication_slot_timeout. It's like
> max_slot_wal_keep_size invalidating slots based on restart_lsn. In a
> way, both are similar, right?
>

There is some similarity, but 'max_slot_wal_keep_size' leads to
truncation of WAL which in turn leads to invalidation of slots. Here,
I am also trying to be cautious about adding a GUC unless it is
required or a slot-level parameter doesn't serve the need. Having said
that, I see that there is an argument that we should follow the path
of the 'max_slot_wal_keep_size' GUC and there is some value to it, but
I still think the benefits of avoiding a new GUC for slot inactivity
would outweigh it.

--
With Regards,
Amit Kapila.



On Wed, Mar 6, 2024 at 2:47 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Mar 6, 2024 at 2:42 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > Hi,
> >
> > On Tue, Mar 05, 2024 at 01:44:43PM -0600, Nathan Bossart wrote:
> > > On Wed, Mar 06, 2024 at 12:50:38AM +0530, Bharath Rupireddy wrote:
> > > > On Mon, Mar 4, 2024 at 2:11 PM Bertrand Drouvot
> > > > <bertranddrouvot.pg@gmail.com> wrote:
> > > >> On Sun, Mar 03, 2024 at 03:44:34PM -0600, Nathan Bossart wrote:
> > > >> > Unless I am misinterpreting some details, ISTM we could rename this column
> > > >> > to invalidation_reason and use it for both logical and physical slots.  I'm
> > > >> > not seeing a strong need for another column.
> > > >>
> > > >> Yeah having two columns was more for convenience purpose. Without the "conflict"
> > > >> one, a slot conflicting with recovery would be "a logical slot having a non NULL
> > > >> invalidation_reason".
> > > >>
> > > >> I'm also fine with one column if most of you prefer that way.
> > > >
> > > > While we debate on the above, please find the attached v7 patch set
> > > > after rebasing.
> > >
> > > It looks like Bertrand is okay with reusing the same column for both
> > > logical and physical slots
> >
> > Yeah, I'm okay with one column.
>
> Thanks. v8-0001 is how it looks. Please see the v8 patch set with this change.

JFYI, the patch does not apply to the head. There is a conflict in
multiple files.

thanks
Shveta



> JFYI, the patch does not apply to the head. There is a conflict in
> multiple files.

For review purposes, I applied v8 to the March 6 code-base. I have yet
to review in detail; please find my initial thoughts:

1)
I found that 'inactive_replication_slot_timeout' works only if a
walsender was ever started for that slot. The logic is under the
'am_walsender' check. Is this intentional?
If I create a slot and use only pg_logical_slot_get_changes or
pg_replication_slot_advance on it, it never gets invalidated due to
timeout. Whereas, when I set 'max_slot_xid_age' or, say,
'max_slot_wal_keep_size' to a lower value, the said slot is
invalidated correctly with the 'xid_aged' and 'wal_removed' reasons
respectively.

Example:
With inactive_replication_slot_timeout=1min, test1_3  is the slot for
which there is no walsender and only advance and get_changes SQL
functions were called; test1_4 is the one for which pg_recvlogical was
run for a second.

 test1_3     |   785 | | reserved   |                  | t |                                 |
 test1_4     |   798 | | lost       | inactive_timeout | t | 2024-03-13 11:52:41.58446+05:30 |

And when inactive_replication_slot_timeout=0  and max_slot_xid_age=10

 test1_3     |   785 | | lost       | xid_aged         | t |                                 |
 test1_4     |   798 | | lost       | inactive_timeout | t | 2024-03-13 11:52:41.58446+05:30 |


2)
The msg for patch 3 says:
--------------
a) when replication slots is lying inactive for a day or so using
last_inactive_at metric,
b) when a replication slot is becoming inactive too frequently using
last_inactive_at metric.
--------------
 I think in b, you want to refer to inactive_count instead of last_inactive_at?

3)
I do not see invalidation_reason updated for 2 new reasons in system-views.sgml


thanks
Shveta



Hi,

On Tue, Mar 12, 2024 at 09:19:35PM +0530, Bharath Rupireddy wrote:
> On Tue, Mar 12, 2024 at 9:11 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > > AFAIR, we don't prevent similar invalidations due to
> > > 'max_slot_wal_keep_size' for sync slots,
> >
> > Right, we'd invalidate them on the standby should the standby sync slot restart_lsn
> > exceeds the limit.
> 
> Right. Help me understand this a bit - is the wal_removed invalidation
> going to conflict with recovery on the standby?

I don't think so, as it's not directly related to recovery. The slot will
be invalidated on the standby though.

> Per the discussion upthread, I'm trying to understand what
> invalidation reasons will exactly cause conflict with recovery? Is it
> just rows_removed and wal_level_insufficient invalidations? 

Yes, that's the ones added in be87200efd.

See the error messages on a standby:

== wal removal

postgres=#  SELECT * FROM pg_logical_slot_get_changes('lsub4_slot', NULL, NULL, 'include-xids', '0');
ERROR:  can no longer get changes from replication slot "lsub4_slot"
DETAIL:  This slot has been invalidated because it exceeded the maximum reserved size.

== wal level

postgres=# select conflict_reason from pg_replication_slots where slot_name = 'lsub5_slot';;
    conflict_reason
------------------------
 wal_level_insufficient
(1 row)

postgres=#  SELECT * FROM pg_logical_slot_get_changes('lsub5_slot', NULL, NULL, 'include-xids', '0');
ERROR:  can no longer get changes from replication slot "lsub5_slot"
DETAIL:  This slot has been invalidated because it was conflicting with recovery.

== rows removal

postgres=# select conflict_reason from pg_replication_slots where slot_name = 'lsub6_slot';;
 conflict_reason
-----------------
 rows_removed
(1 row)

postgres=#  SELECT * FROM pg_logical_slot_get_changes('lsub6_slot', NULL, NULL, 'include-xids', '0');
ERROR:  can no longer get changes from replication slot "lsub6_slot"
DETAIL:  This slot has been invalidated because it was conflicting with recovery.

As you can see, only wal level and rows removal are mentioning conflict with
recovery.

So, are we already "wrong" mentioning "wal_removed" in conflict_reason?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Fri, Mar 8, 2024 at 10:42 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Mar 6, 2024 at 4:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > You might want to consider its interaction with sync slots on standby.
> > Say, there is no activity on slots in terms of processing the changes
> > for slots. Now, we won't perform sync of such slots on standby showing
> > them inactive as per your new criteria where as same slots could still
> > be valid on primary as the walsender is still active. This may be more
> > of a theoretical point as in running system there will probably be
> > some activity but I think this needs some thougths.
>
> I believe the xmin and catalog_xmin of the sync slots on the standby
> keep advancing depending on the slots on the primary, no? If yes, the
> XID age based invalidation shouldn't be a problem.

If the user has not enabled the slot-sync worker and is relying on the
SQL function pg_sync_replication_slots(), then the xmin and
catalog_xmin of synced slots may not keep on advancing. These will be
advanced only on the next run of the function. But meanwhile the synced
slots may be invalidated due to 'xid_aged'. Then the next time the user
runs pg_sync_replication_slots() again, the invalidated slots will be
dropped and recreated by this SQL function (provided they are valid on
the primary and invalidated on the standby alone). I am not stating
that it is a problem, but we need to think about whether this is what
we want. Secondly, the behaviour is not the same as with
'inactive_timeout' invalidation. Synced slots are immune to
'inactive_timeout' invalidation as that invalidation happens only in
the walsender, while they are not immune to 'xid_aged' invalidation. So
again, this needs some thought.

> I believe there are no walsenders started for the sync slots on the
> standbys, right? If yes, the inactive timeout based invalidation also
> shouldn't be a problem. Because, the inactive timeouts for a slot are
> tracked only for walsenders because they are the ones that typically
> hold replication slots for longer durations and for real replication
> use. We did a similar thing in a recent commit [1].
>
> Is my understanding right? Do you still see any problems with it?

I have explained the situation above so that we can think it over better.

thanks
Shveta



On Wed, Mar 13, 2024 at 9:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > So, how about we turn conflict_reason to only report the reasons that
> > actually cause conflict with recovery for logical slots, something
> > like below, and then have invalidation_cause as a generic column for
> > all sorts of invalidation reasons for both logical and physical slots?
>
> If our above understanding is correct then coflict_reason will be a
> subset of invalidation_reason. If so, whatever way we arrange this
> information, there will be some sort of duplicity unless we just have
> one column 'invalidation_reason' and update the docs to interpret it
> correctly for conflicts.

Yes, there will be some sort of duplicity if we emit conflict_reason
as a text field. However, I still think the better way is to turn the
conflict_reason text into a conflict boolean and set it to true only
on rows_removed and wal_level_insufficient invalidations. When the
conflict boolean is true, one (including all the tests that we've
added recently) can look at the invalidation_reason text field for the
reason. This sounds reasonable to me as opposed to just mentioning in
the docs that "if invalidation_reason is rows_removed or
wal_level_insufficient, it's the reason for conflict with recovery".
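
To make it concrete, here is a rough sketch of how
pg_get_replication_slots() could emit the two columns under this idea
(reusing the names from the snippet upthread; not the final patch):

ReplicationSlotInvalidationCause cause = slot_contents.data.invalidated;

/* conflicting: NULL for physical slots, true only for recovery conflicts */
if (slot_contents.data.database == InvalidOid)
    nulls[i++] = true;
else
    values[i++] = BoolGetDatum(cause == RS_INVAL_HORIZON ||
                               cause == RS_INVAL_WAL_LEVEL);

/* invalidation_reason: generic, for both logical and physical slots */
if (cause == RS_INVAL_NONE)
    nulls[i++] = true;
else
    values[i++] = CStringGetTextDatum(SlotInvalidationCauses[cause]);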

Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Wed, Mar 13, 2024 at 12:51 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> See the error messages on a standby:
>
> == wal removal
>
> postgres=#  SELECT * FROM pg_logical_slot_get_changes('lsub4_slot', NULL, NULL, 'include-xids', '0');
> ERROR:  can no longer get changes from replication slot "lsub4_slot"
> DETAIL:  This slot has been invalidated because it exceeded the maximum reserved size.
>
> == wal level
>
> postgres=# select conflict_reason from pg_replication_slots where slot_name = 'lsub5_slot';;
>     conflict_reason
> ------------------------
>  wal_level_insufficient
> (1 row)
>
> postgres=#  SELECT * FROM pg_logical_slot_get_changes('lsub5_slot', NULL, NULL, 'include-xids', '0');
> ERROR:  can no longer get changes from replication slot "lsub5_slot"
> DETAIL:  This slot has been invalidated because it was conflicting with recovery.
>
> == rows removal
>
> postgres=# select conflict_reason from pg_replication_slots where slot_name = 'lsub6_slot';;
>  conflict_reason
> -----------------
>  rows_removed
> (1 row)
>
> postgres=#  SELECT * FROM pg_logical_slot_get_changes('lsub6_slot', NULL, NULL, 'include-xids', '0');
> ERROR:  can no longer get changes from replication slot "lsub6_slot"
> DETAIL:  This slot has been invalidated because it was conflicting with recovery.
>
> As you can see, only wal level and rows removal are mentioning conflict with
> recovery.
>
> So, are we already "wrong" mentioning "wal_removed" in conflict_reason?

It looks like yes. So, how about we fix it the way proposed here -
https://www.postgresql.org/message-id/CALj2ACVd_dizYQiZwwUfsb%2BhG-fhGYo_kEDq0wn_vNwQvOrZHg%40mail.gmail.com?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Wed, Mar 13, 2024 at 11:13 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> > Thanks. v8-0001 is how it looks. Please see the v8 patch set with this change.
>
> JFYI, the patch does not apply to the head. There is a conflict in
> multiple files.

Thanks for looking into this. I noticed that the v8 patches needed a
rebase. Before I do anything with the patches, I'm trying to gain
consensus on the design. The following is a summary of the design
choices we've discussed so far:
1) conflict_reason vs invalidation_reason.
2) When to compute the XID age?
3) Where to do the invalidations? Is it in the checkpointer or
autovacuum or some other process?
4) Interaction of these new invalidations with sync slots on the standby.

I hope to get on to these one after the other.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Wed, Mar 13, 2024 at 9:24 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Mar 13, 2024 at 9:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > So, how about we turn conflict_reason to only report the reasons that
> > > actually cause conflict with recovery for logical slots, something
> > > like below, and then have invalidation_cause as a generic column for
> > > all sorts of invalidation reasons for both logical and physical slots?
> >
> > If our above understanding is correct then coflict_reason will be a
> > subset of invalidation_reason. If so, whatever way we arrange this
> > information, there will be some sort of duplicity unless we just have
> > one column 'invalidation_reason' and update the docs to interpret it
> > correctly for conflicts.
>
> Yes, there will be some sort of duplicity if we emit conflict_reason
> as a text field. However, I still think the better way is to turn
> conflict_reason text to conflict boolean and set it to true only on
> rows_removed and wal_level_insufficient invalidations. When conflict
> boolean is true, one (including all the tests that we've added
> recently) can look for invalidation_reason text field for the reason.
> This sounds reasonable to me as opposed to we just mentioning in the
> docs that "if invalidation_reason is rows_removed or
> wal_level_insufficient it's the reason for conflict with recovery".
>

Fair point. I think we can go either way. Bertrand, Nathan, and
others, do you have an opinion on this matter?

--
With Regards,
Amit Kapila.



On Wed, Mar 13, 2024 at 10:16 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Mar 13, 2024 at 11:13 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > > Thanks. v8-0001 is how it looks. Please see the v8 patch set with this change.
> >
> > JFYI, the patch does not apply to the head. There is a conflict in
> > multiple files.
>
> Thanks for looking into this. I noticed that the v8 patches needed
> rebase. Before I go do anything with the patches, I'm trying to gain
> consensus on the design. Following is the summary of design choices
> we've discussed so far:
> 1) conflict_reason vs invalidation_reason.
> 2) When to compute the XID age?
>

I feel we should focus on two things: (a) introduce a new column
invalidation_reason, and (b) try to first complete the invalidation
due to timeout. We can look into the XID stuff if time permits;
remember, we don't have ample time left.

With Regards,
Amit Kapila.



On Thu, Mar 14, 2024 at 12:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Mar 13, 2024 at 9:24 PM Bharath Rupireddy
> >
> > Yes, there will be some sort of duplicity if we emit conflict_reason
> > as a text field. However, I still think the better way is to turn
> > conflict_reason text to conflict boolean and set it to true only on
> > rows_removed and wal_level_insufficient invalidations. When conflict
> > boolean is true, one (including all the tests that we've added
> > recently) can look for invalidation_reason text field for the reason.
> > This sounds reasonable to me as opposed to we just mentioning in the
> > docs that "if invalidation_reason is rows_removed or
> > wal_level_insufficient it's the reason for conflict with recovery".
> >
> Fair point. I think we can go either way. Bertrand, Nathan, and
> others, do you have an opinion on this matter?

While we wait to hear from others on this, I'm attaching the v9 patch
set implementing the above idea (check 0001 patch). Please have a
look. I'll come back to the other review comments soon.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On Thu, Mar 14, 2024 at 7:58 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Mar 14, 2024 at 12:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Mar 13, 2024 at 9:24 PM Bharath Rupireddy
> > >
> > > Yes, there will be some sort of duplicity if we emit conflict_reason
> > > as a text field. However, I still think the better way is to turn
> > > conflict_reason text to conflict boolean and set it to true only on
> > > rows_removed and wal_level_insufficient invalidations. When conflict
> > > boolean is true, one (including all the tests that we've added
> > > recently) can look for invalidation_reason text field for the reason.
> > > This sounds reasonable to me as opposed to we just mentioning in the
> > > docs that "if invalidation_reason is rows_removed or
> > > wal_level_insufficient it's the reason for conflict with recovery".

+1 on maintaining both conflicting and invalidation_reason

> > Fair point. I think we can go either way. Bertrand, Nathan, and
> > others, do you have an opinion on this matter?
>
> While we wait to hear from others on this, I'm attaching the v9 patch
> set implementing the above idea (check 0001 patch). Please have a
> look. I'll come back to the other review comments soon.

Thanks for the patch. JFYI, patch09 does not apply to HEAD, some
recent commit caused the conflict.

Some trivial comments on patch001 (yet to review other patches)

1)
info.c:

- "%s as caught_up, conflict_reason IS NOT NULL as invalid "
+ "%s as caught_up, invalidation_reason IS NOT NULL as invalid "

Can we revert to 'conflicting as invalid', since it is a query for
logical slots only?

2)
040_standby_failover_slots_sync.pl:

- q{SELECT conflict_reason IS NULL AND synced AND NOT temporary FROM
pg_replication_slots WHERE slot_name = 'lsub1_slot';}
+ q{SELECT invalidation_reason IS NULL AND synced AND NOT temporary
FROM pg_replication_slots WHERE slot_name = 'lsub1_slot';}

Here too, can we have 'NOT conflicting' instead of
'invalidation_reason IS NULL', as it is a logical slot test?

thanks
Shveta



On Wed, Mar 13, 2024 at 9:38 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> BTW, is XID the based parameter 'max_slot_xid_age' not have similarity
> with 'max_slot_wal_keep_size'? I think it will impact the rows we
> removed based on xid horizons. Don't we need to consider it while
> vacuum computing the xid horizons in ComputeXidHorizons() similar to
> what we do for WAL w.r.t 'max_slot_wal_keep_size'?

I'm having a hard time understanding why we'd need something up there
in ComputeXidHorizons(). Can you elaborate a bit, please?

What's proposed with max_slot_xid_age is that during a checkpoint we
look at the slot's xmin and catalog_xmin, and the current system txn
id. Then, if the age of xmin or catalog_xmin relative to the current
XID crosses max_slot_xid_age, we invalidate the slot. Let me illustrate
how all this works:

1. Setup a primary and standby with hot_standby_feedback set to on on
standby. For instance, check my scripts at [1].

2. Stop the standby to make the slot inactive on the primary. Check
the slot is holding xmin of 738.
./pg_ctl -D sbdata -l logfilesbdata stop

postgres=# SELECT * FROM pg_replication_slots;
-[ RECORD 1 ]-------+-------------
slot_name           | sb_repl_slot
plugin              |
slot_type           | physical
datoid              |
database            |
temporary           | f
active              | f
active_pid          |
xmin                | 738
catalog_xmin        |
restart_lsn         | 0/3000000
confirmed_flush_lsn |
wal_status          | reserved
safe_wal_size       |
two_phase           | f
conflict_reason     |
failover            | f
synced              | f

3. Start consuming the XIDs on the primary with the following script
for instance
./psql -d postgres -p 5432
DROP TABLE tab_int;
CREATE TABLE tab_int (a int);

do $$
begin
  for i in 1..268435 loop
    -- use an exception block so that each iteration eats an XID
    begin
      insert into tab_int values (i);
    exception
      when division_by_zero then null;
    end;
  end loop;
end$$;

4. Make some dead rows in the table.
update tab_int set a = a+1;
delete from tab_int where a%4=0;

postgres=# SELECT n_dead_tup, n_tup_ins, n_tup_upd, n_tup_del FROM
pg_stat_user_tables WHERE relname = 'tab_int';
-[ RECORD 1 ]------
n_dead_tup | 335544
n_tup_ins  | 268435
n_tup_upd  | 268435
n_tup_del  | 67109

5. Try vacuuming to delete the dead rows, observe 'tuples: 0 removed,
536870 remain, 335544 are dead but not yet removable'. The dead rows
can't be removed because the inactive slot is holding an xmin, see
'removable cutoff: 738, which was 268441 XIDs old when operation
ended'.

postgres=# vacuum verbose tab_int;
INFO:  vacuuming "postgres.public.tab_int"
INFO:  finished vacuuming "postgres.public.tab_int": index scans: 0
pages: 0 removed, 2376 remain, 2376 scanned (100.00% of total)
tuples: 0 removed, 536870 remain, 335544 are dead but not yet removable
removable cutoff: 738, which was 268441 XIDs old when operation ended
frozen: 0 pages from table (0.00% of total) had 0 tuples frozen
index scan not needed: 0 pages from table (0.00% of total) had 0 dead
item identifiers removed
avg read rate: 0.000 MB/s, avg write rate: 0.000 MB/s
buffer usage: 4759 hits, 0 misses, 0 dirtied
WAL usage: 0 records, 0 full page images, 0 bytes
system usage: CPU: user: 0.07 s, system: 0.00 s, elapsed: 0.07 s
VACUUM

6. Now, repeat the above steps but with setting max_slot_xid_age =
200000 on the primary.

7. Do a checkpoint to invalidate the slot.
postgres=# checkpoint;
CHECKPOINT
postgres=# SELECT * FROM pg_replication_slots;
-[ RECORD 1 ]-------+-------------
slot_name           | sb_repl_slot
plugin              |
slot_type           | physical
datoid              |
database            |
temporary           | f
active              | f
active_pid          |
xmin                | 738
catalog_xmin        |
restart_lsn         | 0/3000000
confirmed_flush_lsn |
wal_status          | lost
safe_wal_size       |
two_phase           | f
conflicting         |
failover            | f
synced              | f
invalidation_reason | xid_aged

8. And, then vacuum the table, observe 'tuples: 335544 removed, 201326
remain, 0 are dead but not yet removable'.

postgres=# vacuum verbose tab_int;
INFO:  vacuuming "postgres.public.tab_int"
INFO:  finished vacuuming "postgres.public.tab_int": index scans: 0
pages: 0 removed, 2376 remain, 2376 scanned (100.00% of total)
tuples: 335544 removed, 201326 remain, 0 are dead but not yet removable
removable cutoff: 269179, which was 0 XIDs old when operation ended
new relfrozenxid: 269179, which is 268441 XIDs ahead of previous value
frozen: 1189 pages from table (50.04% of total) had 201326 tuples frozen
index scan not needed: 0 pages from table (0.00% of total) had 0 dead
item identifiers removed
avg read rate: 0.000 MB/s, avg write rate: 193.100 MB/s
buffer usage: 4760 hits, 0 misses, 2381 dirtied
WAL usage: 5942 records, 2378 full page images, 8343275 bytes
system usage: CPU: user: 0.09 s, system: 0.00 s, elapsed: 0.09 s
VACUUM

[1]
cd /home/ubuntu/postgres/pg17/bin
./pg_ctl -D db17 -l logfile17 stop
rm -rf db17 logfile17
rm -rf /home/ubuntu/postgres/pg17/bin/archived_wal
mkdir /home/ubuntu/postgres/pg17/bin/archived_wal

./initdb -D db17
echo "archive_mode = on
archive_command='cp %p
/home/ubuntu/postgres/pg17/bin/archived_wal/%f'" | tee -a
db17/postgresql.conf

./pg_ctl -D db17 -l logfile17 start
./psql -d postgres -p 5432 -c "SELECT
pg_create_physical_replication_slot('sb_repl_slot', true, false);"

rm -rf sbdata logfilesbdata
./pg_basebackup -D sbdata
echo "port=5433
primary_conninfo='host=localhost port=5432 dbname=postgres user=ubuntu'
primary_slot_name='sb_repl_slot'
restore_command='cp /home/ubuntu/postgres/pg17/bin/archived_wal/%f %p'
hot_standby_feedback = on" | tee -a sbdata/postgresql.conf

touch sbdata/standby.signal

./pg_ctl -D sbdata -l logfilesbdata start
./psql -d postgres -p 5433 -c "SELECT pg_is_in_recovery();"

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Thu, Mar 14, 2024 at 7:58 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> While we wait to hear from others on this, I'm attaching the v9 patch
> set implementing the above idea (check 0001 patch). Please have a
> look. I'll come back to the other review comments soon.
>

patch002:

1)
I would like to understand the purpose of 'inactive_count'. Is it only
for users, for monitoring purposes? We are not using it anywhere
internally.

I shut down the instance 5 times and found that 'inactive_count' became
5 for all the slots created on that instance. Is this intentional? I
mean, we cannot really use the slots if the instance is down. I felt it
should increment inactive_count only if, during the lifespan of the
instance, they were actually inactive, i.e. no streaming or replication
happening through them.


2)
slot.c:
+ case RS_INVAL_XID_AGE:
+ {
+ if (TransactionIdIsNormal(s->data.xmin))
+ {
+                          ..........
+ }
+ if (TransactionIdIsNormal(s->data.catalog_xmin))
+ {
+                          ..........
+ }
+ }

Can we optimize this code? It has duplicate code for processing
s->data.catalog_xmin and s->data.xmin. Can we create a sub-function
for this purpose and call it twice here?

thanks
Shveta



On Fri, Mar 15, 2024 at 10:15 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> > > > wal_level_insufficient it's the reason for conflict with recovery".
>
> +1 on maintaining both conflicting and invalidation_reason

Thanks.

> Thanks for the patch. JFYI, patch09 does not apply to HEAD, some
> recent commit caused the conflict.

Yep, the conflict is in src/test/recovery/meson.build and is because
of e6927270cd18d535b77cbe79c55c6584351524be.

> Some trivial comments on patch001 (yet to review other patches)

Thanks for looking into this.

> 1)
> info.c:
>
> - "%s as caught_up, conflict_reason IS NOT NULL as invalid "
> + "%s as caught_up, invalidation_reason IS NOT NULL as invalid "
>
> Can we revert back to 'conflicting as invalid' since it is a query for
> logical slots only.

I guess, no. There the intention is to check for invalid logical slots
not just for the conflicting ones. The logical slots can get
invalidated due to other reasons as well.

> 2)
> 040_standby_failover_slots_sync.pl:
>
> - q{SELECT conflict_reason IS NULL AND synced AND NOT temporary FROM
> pg_replication_slots WHERE slot_name = 'lsub1_slot';}
> + q{SELECT invalidation_reason IS NULL AND synced AND NOT temporary
> FROM pg_replication_slots WHERE slot_name = 'lsub1_slot';}
>
> Here too, can we have 'NOT conflicting' instead of '
> invalidation_reason IS NULL' as it is a logical slot test.

I guess no. The tests are ensuring the slot on the standby isn't invalidated.

In general, one needs to use the 'conflicting' column from
pg_replication_slots when the intention is to look for reasons for
conflicts, otherwise use the 'invalidation_reason' column for
invalidations.

Please see the attached v10 patch set after resolving the merge
conflict and fixing an indentation warning in the TAP test file.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On Thu, Mar 14, 2024 at 12:24:00PM +0530, Amit Kapila wrote:
> On Wed, Mar 13, 2024 at 9:24 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>> On Wed, Mar 13, 2024 at 9:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>> > > So, how about we turn conflict_reason to only report the reasons that
>> > > actually cause conflict with recovery for logical slots, something
>> > > like below, and then have invalidation_cause as a generic column for
>> > > all sorts of invalidation reasons for both logical and physical slots?
>> >
>> > If our above understanding is correct then coflict_reason will be a
>> > subset of invalidation_reason. If so, whatever way we arrange this
>> > information, there will be some sort of duplicity unless we just have
>> > one column 'invalidation_reason' and update the docs to interpret it
>> > correctly for conflicts.
>>
>> Yes, there will be some sort of duplicity if we emit conflict_reason
>> as a text field. However, I still think the better way is to turn
>> conflict_reason text to conflict boolean and set it to true only on
>> rows_removed and wal_level_insufficient invalidations. When conflict
>> boolean is true, one (including all the tests that we've added
>> recently) can look for invalidation_reason text field for the reason.
>> This sounds reasonable to me as opposed to we just mentioning in the
>> docs that "if invalidation_reason is rows_removed or
>> wal_level_insufficient it's the reason for conflict with recovery".
> 
> Fair point. I think we can go either way. Bertrand, Nathan, and
> others, do you have an opinion on this matter?

WFM

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Hi,

On Thu, Mar 14, 2024 at 12:24:00PM +0530, Amit Kapila wrote:
> On Wed, Mar 13, 2024 at 9:24 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Wed, Mar 13, 2024 at 9:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > > So, how about we turn conflict_reason to only report the reasons that
> > > > actually cause conflict with recovery for logical slots, something
> > > > like below, and then have invalidation_cause as a generic column for
> > > > all sorts of invalidation reasons for both logical and physical slots?
> > >
> > > If our above understanding is correct then coflict_reason will be a
> > > subset of invalidation_reason. If so, whatever way we arrange this
> > > information, there will be some sort of duplicity unless we just have
> > > one column 'invalidation_reason' and update the docs to interpret it
> > > correctly for conflicts.
> >
> > Yes, there will be some sort of duplicity if we emit conflict_reason
> > as a text field. However, I still think the better way is to turn
> > conflict_reason text to conflict boolean and set it to true only on
> > rows_removed and wal_level_insufficient invalidations. When conflict
> > boolean is true, one (including all the tests that we've added
> > recently) can look for invalidation_reason text field for the reason.
> > This sounds reasonable to me as opposed to we just mentioning in the
> > docs that "if invalidation_reason is rows_removed or
> > wal_level_insufficient it's the reason for conflict with recovery".
> >
> 
> Fair point. I think we can go either way. Bertrand, Nathan, and
> others, do you have an opinion on this matter?

Sounds like a good approach to me, and one will be able to quickly identify
whether a conflict occurred.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Fri, Mar 15, 2024 at 12:49 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> patch002:
>
> 1)
> I would like to understand the purpose of 'inactive_count'? Is it only
> for users for monitoring purposes? We are not using it anywhere
> internally.

The inactive_count metric helps detect unstable replication slot
connections that have a lot of disconnections. It's not used for the
inactive_timeout based slot invalidation mechanism.

> I shutdown the instance 5 times and found that 'inactive_count' became
> 5 for all the slots created on that instance. Is this intentional?

Yes, it's incremented on shutdown (and for that matter upon every slot
release) for all the slots that are tied to walsenders.
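
Roughly, the update on slot release could look like the following
sketch (field names assumed from this thread, not the actual patch
code):

if (am_walsender)
{
    TimestampTz now = GetCurrentTimestamp();

    SpinLockAcquire(&slot->mutex);
    slot->data.last_inactive_at = now;
    slot->data.inactive_count++;
    SpinLockRelease(&slot->mutex);

    /* persisted with the next slot flush or on a clean shutdown */
    ReplicationSlotMarkDirty();
}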

> I mean we can not really use them if the instance is down.  I felt it
> should increment the inactive_count only if during the span of
> instance, they were actually inactive i.e. no streaming or replication
> happening through them.

inactive_count is persisted to disk upon clean shutdown, so once the
slots become active again, one can see the metric and deduce some
info about disconnections.

Having said that, I'm open to hearing from others on the inactive_count
metric being added.

> 2)
> slot.c:
> + case RS_INVAL_XID_AGE:
>
> Can we optimize this code? It has duplicate code for processing
> s->data.catalog_xmin and s->data.xmin. Can we create a sub-function
> for this purpose and call it twice here?

Good idea. Done that way.

> 2)
> The msg for patch 3 says:
> --------------
> a) when replication slots is lying inactive for a day or so using
> last_inactive_at metric,
> b) when a replication slot is becoming inactive too frequently using
> last_inactive_at metric.
> --------------
>  I think in b, you want to refer to inactive_count instead of last_inactive_at?

Right. Changed.

> 3)
> I do not see invalidation_reason updated for 2 new reasons in system-views.sgml

Nice catch. Added them now.

I've also responded to Bertrand's comments here.

On Wed, Mar 6, 2024 at 3:56 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> A few comments:
>
> 1 ===
>
> +       The reason for the slot's invalidation. <literal>NULL</literal> if the
> +       slot is currently actively being used.
>
> s/currently actively being used/not invalidated/ ? (I mean it could be valid
> and not being used).

Changed.

> 3 ===
>
>         res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase, failover, "
> -                                                       "%s as caught_up, conflict_reason IS NOT NULL as invalid "
> +                                                       "%s as caught_up, invalidation_reason IS NOT NULL as invalid "
>                                                         "FROM pg_catalog.pg_replication_slots "
> -                                                       "(CASE WHEN conflict_reason IS NOT NULL THEN FALSE "
> +                                                       "(CASE WHEN invalidation_reason IS NOT NULL THEN FALSE "
>
> Yeah that's fine because there is logical slot filtering here.

Right. And, we really are looking for invalid slots there, so use of
invalidation_reason is much more correct than conflicting.

> 4 ===
>
> -GetSlotInvalidationCause(const char *conflict_reason)
> +GetSlotInvalidationCause(const char *invalidation_reason)
>
> Should we change the comment "Maps a conflict reason" above this function?

Changed.

> 5 ===
>
> -# Check conflict_reason is NULL for physical slot
> +# Check invalidation_reason is NULL for physical slot
>  $res = $node_primary->safe_psql(
>         'postgres', qq[
> -                SELECT conflict_reason is null FROM pg_replication_slots where slot_name = '$primary_slotname';]
> +                SELECT invalidation_reason is null FROM pg_replication_slots where slot_name = '$primary_slotname';]
>  );
>
>
> I don't think this test is needed anymore: it does not make that much sense since
> it's done after the primary database initialization and startup.

It is now turned into a test verifying 'conflicting boolean' is null
for the physical slot. Isn't that okay?

> 6 ===
>
>         'Logical slots are reported as non conflicting');
>
> What about?
>
> "
> # Verify slots are reported as valid in pg_replication_slots
>     'Logical slots are reported as valid');
> "

Changed.

Please see the attached v11 patch set with all the above review
comments addressed.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On Fri, Mar 15, 2024 at 10:45 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Mar 13, 2024 at 9:38 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > BTW, doesn't the XID based parameter 'max_slot_xid_age' have similarity
> > with 'max_slot_wal_keep_size'? I think it will impact the rows we
> > removed based on xid horizons. Don't we need to consider it while
> > vacuum computing the xid horizons in ComputeXidHorizons() similar to
> > what we do for WAL w.r.t 'max_slot_wal_keep_size'?
>
> I'm having a hard time understanding why we'd need something up there
> in ComputeXidHorizons(). Can you elaborate it a bit please?
>
> What's proposed with max_slot_xid_age is that during checkpoint we
> look at slot's xmin and catalog_xmin, and the current system txn id.
> Then, if the XID age of (xmin, catalog_xmin) and current_xid crosses
> max_slot_xid_age, we invalidate the slot.
>

I can see that in your patch (in function
InvalidatePossiblyObsoleteSlot()). As per my understanding, we need
something similar for slot xids in ComputeXidHorizons() as we are
doing for WAL in KeepLogSeg(). In KeepLogSeg(), we compute the minimum LSN
location required by slots and then adjust it for
'max_slot_wal_keep_size'. On similar lines, currently in
ComputeXidHorizons(), we compute the minimum xid required by slots
(procArray->replication_slot_xmin and
procArray->replication_slot_catalog_xmin) but then don't adjust it for
'max_slot_xid_age'. I could be missing something here, but it is
better to keep discussing this and, in the meantime, move ahead with the
other parameter 'inactive_replication_slot_timeout', which according to me
can be kept at the slot level instead of as a GUC. OTOH, we need to see the
arguments on both sides and then decide which makes more sense.

--
With Regards,
Amit Kapila.



On Sat, Mar 16, 2024 at 3:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> procArray->replication_slot_catalog_xmin) but then don't adjust it for
> 'max_slot_xid_age'. I could be missing something in this but it is
> better to keep discussing this and try to move with another parameter
> 'inactive_replication_slot_timeout' which according to me can be kept
> at slot level instead of a GUC but OTOH we need to see the arguments
> on both side and then decide which makes more sense.

Hm. Are you suggesting inactive_timeout to be a slot level parameter
similar to 'failover' property added recently by
c393308b69d229b664391ac583b9e07418d411b6 and
73292404370c9900a96e2bebdc7144f7010339cf? With this approach, one can
set inactive_timeout while creating the slot either via
pg_create_physical_replication_slot() or
pg_create_logical_replication_slot() or CREATE_REPLICATION_SLOT or
ALTER_REPLICATION_SLOT command, and postgres tracks the
last_inactive_at for every slot based on which the slot gets
invalidated. If this understanding is right, I can go ahead and work
towards it.

Alternatively, we can go the route of making GUC a list of key-value
pairs of {slot_name, inactive_timeout}, but this kind of GUC for
setting slot level parameters is going to be the first of its kind, so
I'd prefer the above approach.

Thoughts?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Sun, Mar 17, 2024 at 2:03 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Sat, Mar 16, 2024 at 3:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > procArray->replication_slot_catalog_xmin) but then don't adjust it for
> > 'max_slot_xid_age'. I could be missing something in this but it is
> > better to keep discussing this and try to move with another parameter
> > 'inactive_replication_slot_timeout' which according to me can be kept
> > at slot level instead of a GUC but OTOH we need to see the arguments
> > on both side and then decide which makes more sense.
>
> Hm. Are you suggesting inactive_timeout to be a slot level parameter
> similar to 'failover' property added recently by
> c393308b69d229b664391ac583b9e07418d411b6 and
> 73292404370c9900a96e2bebdc7144f7010339cf? With this approach, one can
> set inactive_timeout while creating the slot either via
> pg_create_physical_replication_slot() or
> pg_create_logical_replication_slot() or CREATE_REPLICATION_SLOT or
> ALTER_REPLICATION_SLOT command, and postgres tracks the
> last_inactive_at for every slot based on which the slot gets
> invalidated. If this understanding is right, I can go ahead and work
> towards it.
>

Yeah, I have something like that in mind. You can prepare the patch
but it would be good if others involved in this thread can also share
their opinion.

> Alternatively, we can go the route of making GUC a list of key-value
> pairs of {slot_name, inactive_timeout}, but this kind of GUC for
> setting slot level parameters is going to be the first of its kind, so
> I'd prefer the above approach.
>

I would prefer a slot-level parameter in this case rather than a GUC.

--
With Regards,
Amit Kapila.



On Sat, Mar 16, 2024 at 3:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > What's proposed with max_slot_xid_age is that during checkpoint we
> > look at slot's xmin and catalog_xmin, and the current system txn id.
> > Then, if the XID age of (xmin, catalog_xmin) and current_xid crosses
> > max_slot_xid_age, we invalidate the slot.
> >
>
> I can see that in your patch (in function
> InvalidatePossiblyObsoleteSlot()). As per my understanding, we need
> something similar for slot xids in ComputeXidHorizons() as we are
> doing WAL in KeepLogSeg(). In KeepLogSeg(), we compute the minimum LSN
> location required by slots and then adjust it for
> 'max_slot_wal_keep_size'. On similar lines, currently in
> ComputeXidHorizons(), we compute the minimum xid required by slots
> (procArray->replication_slot_xmin and
> procArray->replication_slot_catalog_xmin) but then don't adjust it for
> 'max_slot_xid_age'. I could be missing something in this but it is
> better to keep discussing this

After invalidating slots because of max_slot_xid_age, the
procArray->replication_slot_xmin and
procArray->replication_slot_catalog_xmin are recomputed immediately in
InvalidateObsoleteReplicationSlots->ReplicationSlotsComputeRequiredXmin->ProcArraySetReplicationSlotXmin.
And, later the XID horizons in ComputeXidHorizons are computed before
the vacuum on each table via GetOldestNonRemovableTransactionId.
Aren't these enough? Do you want the XID horizons recomputed
immediately, something like the below?

/* Invalidate replication slots based on xmin or catalog_xmin age */
if (max_slot_xid_age > 0)
{
    if (InvalidateObsoleteReplicationSlots(RS_INVAL_XID_AGE,
                                           0, InvalidOid,
                                           InvalidTransactionId))
    {
        ComputeXidHorizonsResult horizons;

        /*
         * Some slots have been invalidated; update the XID horizons
         * as a side-effect.
         */
        ComputeXidHorizons(&horizons);
    }
}

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Mon, Mar 18, 2024 at 9:58 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Sat, Mar 16, 2024 at 3:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > What's proposed with max_slot_xid_age is that during checkpoint we
> > > look at slot's xmin and catalog_xmin, and the current system txn id.
> > > Then, if the XID age of (xmin, catalog_xmin) and current_xid crosses
> > > max_slot_xid_age, we invalidate the slot.
> > >
> >
> > I can see that in your patch (in function
> > InvalidatePossiblyObsoleteSlot()). As per my understanding, we need
> > something similar for slot xids in ComputeXidHorizons() as we are
> > doing WAL in KeepLogSeg(). In KeepLogSeg(), we compute the minimum LSN
> > location required by slots and then adjust it for
> > 'max_slot_wal_keep_size'. On similar lines, currently in
> > ComputeXidHorizons(), we compute the minimum xid required by slots
> > (procArray->replication_slot_xmin and
> > procArray->replication_slot_catalog_xmin) but then don't adjust it for
> > 'max_slot_xid_age'. I could be missing something in this but it is
> > better to keep discussing this
>
> After invalidating slots because of max_slot_xid_age, the
> procArray->replication_slot_xmin and
> procArray->replication_slot_catalog_xmin are recomputed immediately in
> InvalidateObsoleteReplicationSlots->ReplicationSlotsComputeRequiredXmin->ProcArraySetReplicationSlotXmin.
> And, later the XID horizons in ComputeXidHorizons are computed before
> the vacuum on each table via GetOldestNonRemovableTransactionId.
> Aren't these enough?
>

IIUC, this will be delayed by one vacuum cycle rather than happening as
soon as the slot's xmin age is crossed and it can be invalidated.

> Do you want the XID horizons recomputed
> immediately, something like the below?
>

I haven't thought of the exact logic, but we can try to mimic the
handling we have for WAL.

--
With Regards,
Amit Kapila.



Hi,

On Sat, Mar 16, 2024 at 09:29:01AM +0530, Bharath Rupireddy wrote:
> I've also responded to Bertrand's comments here.

Thanks!

> 
> On Wed, Mar 6, 2024 at 3:56 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > 5 ===
> >
> > -# Check conflict_reason is NULL for physical slot
> > +# Check invalidation_reason is NULL for physical slot
> >  $res = $node_primary->safe_psql(
> >         'postgres', qq[
> > -                SELECT conflict_reason is null FROM pg_replication_slots where slot_name = '$primary_slotname';]
> > +                SELECT invalidation_reason is null FROM pg_replication_slots where slot_name = '$primary_slotname';]
> >  );
> >
> >
> > I don't think this test is needed anymore: it does not make that much sense since
> > it's done after the primary database initialization and startup.
> 
> It is now turned into a test verifying 'conflicting boolean' is null
> for the physical slot. Isn't that okay?

Yeah makes more sense now, thanks!

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Hi,

On Mon, Mar 18, 2024 at 08:50:56AM +0530, Amit Kapila wrote:
> On Sun, Mar 17, 2024 at 2:03 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Sat, Mar 16, 2024 at 3:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > procArray->replication_slot_catalog_xmin) but then don't adjust it for
> > > 'max_slot_xid_age'. I could be missing something in this but it is
> > > better to keep discussing this and try to move with another parameter
> > > 'inactive_replication_slot_timeout' which according to me can be kept
> > > at slot level instead of a GUC but OTOH we need to see the arguments
> > > on both side and then decide which makes more sense.
> >
> > Hm. Are you suggesting inactive_timeout to be a slot level parameter
> > similar to 'failover' property added recently by
> > c393308b69d229b664391ac583b9e07418d411b6 and
> > 73292404370c9900a96e2bebdc7144f7010339cf? With this approach, one can
> > set inactive_timeout while creating the slot either via
> > pg_create_physical_replication_slot() or
> > pg_create_logical_replication_slot() or CREATE_REPLICATION_SLOT or
> > ALTER_REPLICATION_SLOT command, and postgres tracks the
> > last_inactive_at for every slot based on which the slot gets
> > invalidated. If this understanding is right, I can go ahead and work
> > towards it.
> >
> 
> Yeah, I have something like that in mind. You can prepare the patch
> but it would be good if others involved in this thread can also share
> their opinion.

I think it makes sense to put the inactive_timeout granularity at the slot
level (as the activity could vary a lot, say between one slot linked to a
subscription and one linked to some plugins). As far as max_slot_xid_age goes,
I've the feeling that a new GUC is good enough.

> > Alternatively, we can go the route of making GUC a list of key-value
> > pairs of {slot_name, inactive_timeout}, but this kind of GUC for
> > setting slot level parameters is going to be the first of its kind, so
> > I'd prefer the above approach.
> >
> 
> I would prefer a slot-level parameter in this case rather than a GUC.

Yeah, same here.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Hi,

On Sat, Mar 16, 2024 at 09:29:01AM +0530, Bharath Rupireddy wrote:
> Please see the attached v11 patch set with all the above review
> comments addressed.

Thanks!

Looking at 0001:

1 ===

+       True if this logical slot conflicted with recovery (and so is now
+       invalidated). When this column is true, check

Worth to add back the physical slot mention "Always NULL for physical slots."?

2 ===

@@ -1023,9 +1023,10 @@ CREATE VIEW pg_replication_slots AS
             L.wal_status,
             L.safe_wal_size,
             L.two_phase,
-            L.conflict_reason,
+            L.conflicting,
             L.failover,
-            L.synced
+            L.synced,
+            L.invalidation_reason

What about making invalidation_reason close to conflict_reason?

3 ===

- * Maps a conflict reason for a replication slot to
+ * Maps a invalidation reason for a replication slot to

s/a invalidation/an invalidation/?

4 ===

While at it, shouldn't we also rename "conflict" to say "invalidation_cause" in
InvalidatePossiblyObsoleteSlot()?

5 ===

+ * rows_removed and wal_level_insufficient are only two reasons

s/are only two/are the only two/?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Hi,

On Thu, Mar 14, 2024 at 12:27:26PM +0530, Amit Kapila wrote:
> On Wed, Mar 13, 2024 at 10:16 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Wed, Mar 13, 2024 at 11:13 AM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > > Thanks. v8-0001 is how it looks. Please see the v8 patch set with this change.
> > >
> > > JFYI, the patch does not apply to the head. There is a conflict in
> > > multiple files.
> >
> > Thanks for looking into this. I noticed that the v8 patches needed
> > rebase. Before I go do anything with the patches, I'm trying to gain
> > consensus on the design. Following is the summary of design choices
> > we've discussed so far:
> > 1) conflict_reason vs invalidation_reason.
> > 2) When to compute the XID age?
> >
> 
> I feel we should focus on two things (a) one is to introduce a new
> column invalidation_reason, and (b) let's try to first complete
> invalidation due to timeout. We can look into XID stuff if time
> permits, remember, we don't have ample time left.

Agree. While it makes sense to invalidate slots for WAL removal in
CreateCheckPoint() (because this is the place where WAL is removed), I'm not
sure this is the right place for the 2 new cases.

Let's focus on the timeout one as proposed above (as probably the simplest one):
as this one is purely related to time and activity, what about invalidating them
at the following points:

- when their usage resumes
- in pg_get_replication_slots()

The idea is to invalidate the slot when one resumes activity on it or wants to
get information about it (and among other things wants to know if the slot is
valid or not).

Thoughts?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Mon, Mar 18, 2024 at 8:19 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> On Thu, Mar 14, 2024 at 12:27:26PM +0530, Amit Kapila wrote:
> > On Wed, Mar 13, 2024 at 10:16 PM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >
> > > On Wed, Mar 13, 2024 at 11:13 AM shveta malik <shveta.malik@gmail.com> wrote:
> > > >
> > > > > Thanks. v8-0001 is how it looks. Please see the v8 patch set with this change.
> > > >
> > > > JFYI, the patch does not apply to the head. There is a conflict in
> > > > multiple files.
> > >
> > > Thanks for looking into this. I noticed that the v8 patches needed
> > > rebase. Before I go do anything with the patches, I'm trying to gain
> > > consensus on the design. Following is the summary of design choices
> > > we've discussed so far:
> > > 1) conflict_reason vs invalidation_reason.
> > > 2) When to compute the XID age?
> > >
> >
> > I feel we should focus on two things (a) one is to introduce a new
> > column invalidation_reason, and (b) let's try to first complete
> > invalidation due to timeout. We can look into XID stuff if time
> > permits, remember, we don't have ample time left.
>
> Agree. While it makes sense to invalidate slots for wal removal in
> CreateCheckPoint() (because this is the place where wal is removed), I 'm not
> sure this is the right place for the 2 new cases.
>
> Let's focus on the timeout one as proposed above (as probably the simplest one):
> as this one is purely related to time and activity what about to invalidate them
> when?:
>
> - their usage resume
> - in pg_get_replication_slots()
>
> The idea is to invalidate the slot when one resumes activity on it or wants to
> get information about it (and among other things wants to know if the slot is
> valid or not).
>

Trying to invalidate at those two places makes sense to me but we
still need to cover the cases where it takes very long to resume the
slot activity and the dangling slot cases where the activity is never
resumed. How about, apart from the above two places, also trying to
invalidate in CheckPointReplicationSlots(), where we are traversing all
the slots? This could prevent invalid slots from being marked as
dirty.

BTW, how will the user use 'inactive_count' to know whether a
replication slot is becoming inactive too frequently? The patch just
keeps incrementing this counter; unless there is some monitoring tool
that captures this counter from time to time and calculates the
frequency in some way, one will never know how many times the slot
became inactive in the last 'n' minutes. Even if this is useful, it
is not clear to me whether we need to store 'inactive_count' in the
slot's persistent data. I understand it could be a metric required by
the user, but wouldn't it be better to track this via
pg_stat_replication_slots such that we don't need to store it in the
slot's persistent data? If this understanding is correct, I would say
let's remove 'inactive_count' as well from the main patch and discuss
it separately.

--
With Regards,
Amit Kapila.



Hi,

On Tue, Mar 19, 2024 at 10:56:25AM +0530, Amit Kapila wrote:
> On Mon, Mar 18, 2024 at 8:19 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> > Agree. While it makes sense to invalidate slots for wal removal in
> > CreateCheckPoint() (because this is the place where wal is removed), I 'm not
> > sure this is the right place for the 2 new cases.
> >
> > Let's focus on the timeout one as proposed above (as probably the simplest one):
> > as this one is purely related to time and activity what about to invalidate them
> > when?:
> >
> > - their usage resume
> > - in pg_get_replication_slots()
> >
> > The idea is to invalidate the slot when one resumes activity on it or wants to
> > get information about it (and among other things wants to know if the slot is
> > valid or not).
> >
> 
> Trying to invalidate at those two places makes sense to me but we
> still need to cover the cases where it takes very long to resume the
> slot activity and the dangling slot cases where the activity is never
> resumed.

I understand it's better to have the slot reflect its real status internally,
but is it a real issue if that's not the case until the activity on it is
resumed? (just asking, not saying we should not)

> How about apart from the above two places, trying to
> invalidate in CheckPointReplicationSlots() where we are traversing all
> the slots?

I think that's a good place but there is still a window of time (that could also
be "large" depending on the activity and the checkpoint frequency) during which
the slot is not known as invalid internally. But yeah, at least we know that we'll
mark it as invalid at some point...

BTW:

        if (am_walsender)
        {
+               if (slot->data.persistency == RS_PERSISTENT)
+               {
+                       SpinLockAcquire(&slot->mutex);
+                       slot->data.last_inactive_at = GetCurrentTimestamp();
+                       slot->data.inactive_count++;
+                       SpinLockRelease(&slot->mutex);

I'm also feeling the same concern as Shveta mentioned in [1]: that a "normal"
backend using pg_logical_slot_get_changes() or friends would not set the
last_inactive_at.

[1]: https://www.postgresql.org/message-id/CAJpy0uD64X%3D2ENmbHaRiWTKeQawr-rbGoy_GdhQQLVXzUSKTMg%40mail.gmail.com

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Tue, Mar 19, 2024 at 3:11 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> On Tue, Mar 19, 2024 at 10:56:25AM +0530, Amit Kapila wrote:
> > On Mon, Mar 18, 2024 at 8:19 PM Bertrand Drouvot
> > <bertranddrouvot.pg@gmail.com> wrote:
> > > Agree. While it makes sense to invalidate slots for wal removal in
> > > CreateCheckPoint() (because this is the place where wal is removed), I 'm not
> > > sure this is the right place for the 2 new cases.
> > >
> > > Let's focus on the timeout one as proposed above (as probably the simplest one):
> > > as this one is purely related to time and activity what about to invalidate them
> > > when?:
> > >
> > > - their usage resume
> > > - in pg_get_replication_slots()
> > >
> > > The idea is to invalidate the slot when one resumes activity on it or wants to
> > > get information about it (and among other things wants to know if the slot is
> > > valid or not).
> > >
> >
> > Trying to invalidate at those two places makes sense to me but we
> > still need to cover the cases where it takes very long to resume the
> > slot activity and the dangling slot cases where the activity is never
> > resumed.
>
> I understand it's better to have the slot reflecting its real status internally
> but it is a real issue if that's not the case until the activity on it is resumed?
> (just asking, not saying we should not)
>

Sorry, I didn't understand your point. Can you try to explain by example?

--
With Regards,
Amit Kapila.



Hi,

On Tue, Mar 19, 2024 at 04:20:35PM +0530, Amit Kapila wrote:
> On Tue, Mar 19, 2024 at 3:11 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > On Tue, Mar 19, 2024 at 10:56:25AM +0530, Amit Kapila wrote:
> > > On Mon, Mar 18, 2024 at 8:19 PM Bertrand Drouvot
> > > <bertranddrouvot.pg@gmail.com> wrote:
> > > > Agree. While it makes sense to invalidate slots for wal removal in
> > > > CreateCheckPoint() (because this is the place where wal is removed), I 'm not
> > > > sure this is the right place for the 2 new cases.
> > > >
> > > > Let's focus on the timeout one as proposed above (as probably the simplest one):
> > > > as this one is purely related to time and activity what about to invalidate them
> > > > when?:
> > > >
> > > > - their usage resume
> > > > - in pg_get_replication_slots()
> > > >
> > > > The idea is to invalidate the slot when one resumes activity on it or wants to
> > > > get information about it (and among other things wants to know if the slot is
> > > > valid or not).
> > > >
> > >
> > > Trying to invalidate at those two places makes sense to me but we
> > > still need to cover the cases where it takes very long to resume the
> > > slot activity and the dangling slot cases where the activity is never
> > > resumed.
> >
> > I understand it's better to have the slot reflecting its real status internally
> > but it is a real issue if that's not the case until the activity on it is resumed?
> > (just asking, not saying we should not)
> >
> 
> Sorry, I didn't understand your point. Can you try to explain by example?

Sorry if that was not clear, let me try to rephrase it first: what issue do you
see if the invalidation of such a slot occurs only when its usage resumes or
when pg_get_replication_slots() is triggered? I understand that this could lead
to the slot not being invalidated (maybe forever) but is that an issue for an
inactive slot?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Mon, Mar 18, 2024 at 3:02 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> > > Hm. Are you suggesting inactive_timeout to be a slot level parameter
> > > similar to 'failover' property added recently by
> > > c393308b69d229b664391ac583b9e07418d411b6 and
> > > 73292404370c9900a96e2bebdc7144f7010339cf?
> >
> > Yeah, I have something like that in mind. You can prepare the patch
> > but it would be good if others involved in this thread can also share
> > their opinion.
>
> I think it makes sense to put the inactive_timeout granularity at the slot
> level (as the activity could vary a lot say between one slot linked to a
> subcription and one linked to some plugins). As far max_slot_xid_age I've the
> feeling that a new GUC is good enough.

Well, here I'm implementing the above idea. The attached v12 patches
mainly have the following changes:

1. inactive_timeout is now slot-level, that is, one can set it while
creating the slot either via SQL functions or via replication commands
or via subscription.
2. last_inactive_at and inactive_timeout are now tracked in on-disk
replication slot data structure.
3. last_inactive_at is now set even for non-walsenders whenever the
slot is released as opposed to initial versions of the patches setting
it only for walsenders.
4. slot's inactive_timeout parameter is now migrated to the new
cluster with pg_upgrade.
5. slot's inactive_timeout parameter is now synced to the standby when
failover is enabled for the slot.
6. Test cases are added to cover most of the above cases including new
invalidation mechanisms.

Following are some open points:

1. Where to do inactive_timeout invalidation exactly if not the checkpointer.
2. Where to do XID age invalidation exactly if not the checkpointer.
3. How to go about recomputing XID horizons based on max_slot_xid_age.
Do the slots' horizons need to be adjusted in ComputeXidHorizons()?
4. New invalidation mechanisms interaction with slot sync feature.
5. Review comments on 0001 from Bertrand.

Please see the attached v12 patches.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On Tue, Mar 19, 2024 at 6:12 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> On Tue, Mar 19, 2024 at 04:20:35PM +0530, Amit Kapila wrote:
> > On Tue, Mar 19, 2024 at 3:11 PM Bertrand Drouvot
> > <bertranddrouvot.pg@gmail.com> wrote:
> > >
> > > On Tue, Mar 19, 2024 at 10:56:25AM +0530, Amit Kapila wrote:
> > > > On Mon, Mar 18, 2024 at 8:19 PM Bertrand Drouvot
> > > > <bertranddrouvot.pg@gmail.com> wrote:
> > > > > Agree. While it makes sense to invalidate slots for wal removal in
> > > > > CreateCheckPoint() (because this is the place where wal is removed), I 'm not
> > > > > sure this is the right place for the 2 new cases.
> > > > >
> > > > > Let's focus on the timeout one as proposed above (as probably the simplest one):
> > > > > as this one is purely related to time and activity what about to invalidate them
> > > > > when?:
> > > > >
> > > > > - their usage resume
> > > > > - in pg_get_replication_slots()
> > > > >
> > > > > The idea is to invalidate the slot when one resumes activity on it or wants to
> > > > > get information about it (and among other things wants to know if the slot is
> > > > > valid or not).
> > > > >
> > > >
> > > > Trying to invalidate at those two places makes sense to me but we
> > > > still need to cover the cases where it takes very long to resume the
> > > > slot activity and the dangling slot cases where the activity is never
> > > > resumed.
> > >
> > > I understand it's better to have the slot reflecting its real status internally
> > > but it is a real issue if that's not the case until the activity on it is resumed?
> > > (just asking, not saying we should not)
> > >
> >
> > Sorry, I didn't understand your point. Can you try to explain by example?
>
> Sorry if that was not clear, let me try to rephrase it first: what issue to you
> see if the invalidation of such a slot occurs only when its usage resume or
> when pg_get_replication_slots() is triggered? I understand that this could lead
> to the slot not being invalidated (maybe forever) but is that an issue for an
> inactive slot?
>

It has the risk of preventing WAL and row removal. I think this is the
primary reason we are planning to have such a parameter in the first
place. So, we should have some way to invalidate it even when the
walsender/backend process doesn't use it again.

--
With Regards,
Amit Kapila.



On Wed, Mar 20, 2024 at 12:49 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
>
> Following are some open points:
>
> 1. Where to do inactive_timeout invalidation exactly if not the checkpointer.
>

I have suggested doing it at the time of CheckpointReplicationSlots()
and Bertrand suggested doing it whenever we resume using the slot. I
think we should follow both suggestions.
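
For illustration, the check performed at those places could be as simple as
the following (a sketch only; inactive_timeout, last_inactive_at and
RS_INVAL_INACTIVE_TIMEOUT are names from the proposed patches, not committed
code, and inactive_timeout is assumed to be in seconds here):

/*
 * Sketch: has the slot been inactive for longer than its configured
 * inactive_timeout?
 */
static bool
SlotInactiveTimeoutExpired(ReplicationSlot *s, TimestampTz now)
{
    if (s->data.inactive_timeout <= 0 || s->data.last_inactive_at == 0)
        return false;       /* feature disabled or slot currently in use */

    return TimestampDifferenceExceeds(s->data.last_inactive_at, now,
                                      s->data.inactive_timeout * 1000);
}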

> 2. Where to do XID age invalidation exactly if not the checkpointer.
> 3. How to go about recomputing XID horizons based on max_slot_xid_age.
> Does the slot's horizon's need to be adjusted in ComputeXidHorizons()?
>

I suggest postponing the patch for xid based invalidation for a later
discussion.

> 4. New invalidation mechanisms interaction with slot sync feature.
>

Yeah, this is important. My initial thoughts are that synced slots
shouldn't be invalidated on the standby due to timeout.

> 5. Review comments on 0001 from Bertrand.
>
> Please see the attached v12 patches.
>

Thanks for quickly updating the patches.

--
With Regards,
Amit Kapila.



On Mon, Mar 18, 2024 at 3:42 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> Looking at 0001:

Thanks for reviewing.

> 1 ===
>
> +       True if this logical slot conflicted with recovery (and so is now
> +       invalidated). When this column is true, check
>
> Worth to add back the physical slot mention "Always NULL for physical slots."?

Will change.

> 2 ===
>
> @@ -1023,9 +1023,10 @@ CREATE VIEW pg_replication_slots AS
>              L.wal_status,
>              L.safe_wal_size,
>              L.two_phase,
> -            L.conflict_reason,
> +            L.conflicting,
>              L.failover,
> -            L.synced
> +            L.synced,
> +            L.invalidation_reason
>
> What about making invalidation_reason close to conflict_reason?

Not required, I think. One can pick the required columns in the SELECT
clause anyway.

> 3 ===
>
> - * Maps a conflict reason for a replication slot to
> + * Maps a invalidation reason for a replication slot to
>
> s/a invalidation/an invalidation/?

Will change.

> 4 ===
>
> While at it, shouldn't we also rename "conflict" to say "invalidation_cause" in
> InvalidatePossiblyObsoleteSlot()?

That's in line with our understanding about conflict vs invalidation,
and keeps the function generic. Will change.

> 5 ===
>
> + * rows_removed and wal_level_insufficient are only two reasons
>
> s/are only two/are the only two/?

Will change.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Hi,

On Wed, Mar 20, 2024 at 08:58:05AM +0530, Amit Kapila wrote:
> On Wed, Mar 20, 2024 at 12:49 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> >
> > Following are some open points:
> >
> > 1. Where to do inactive_timeout invalidation exactly if not the checkpointer.
> >
> 
> I have suggested to do it at the time of CheckpointReplicationSlots()
> and Bertrand suggested to do it whenever we resume using the slot. I
> think we should follow both the suggestions.

Agree. I also think that pg_get_replication_slots() would be a good place, so
that queries would return the right invalidation status.

> > 4. New invalidation mechanisms interaction with slot sync feature.
> >
> 
> Yeah, this is important. My initial thoughts are that synced slots
> shouldn't be invalidated on the standby due to timeout.

+1

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Hi,

On Wed, Mar 20, 2024 at 12:48:55AM +0530, Bharath Rupireddy wrote:
> On Mon, Mar 18, 2024 at 3:02 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > > > Hm. Are you suggesting inactive_timeout to be a slot level parameter
> > > > similar to 'failover' property added recently by
> > > > c393308b69d229b664391ac583b9e07418d411b6 and
> > > > 73292404370c9900a96e2bebdc7144f7010339cf?
> > >
> > > Yeah, I have something like that in mind. You can prepare the patch
> > > but it would be good if others involved in this thread can also share
> > > their opinion.
> >
> > I think it makes sense to put the inactive_timeout granularity at the slot
> > level (as the activity could vary a lot say between one slot linked to a
> > subcription and one linked to some plugins). As far max_slot_xid_age I've the
> > feeling that a new GUC is good enough.
> 
> Well, here I'm implementing the above idea.

Thanks!

> The attached v12 patches
> majorly have the following changes:
> 
> 2. last_inactive_at and inactive_timeout are now tracked in on-disk
> replication slot data structure.

Should last_inactive_at be tracked on disk? Say the engine is down for a period
of time > inactive_timeout, then the slot will be invalidated after the engine
restarts (if there is no activity before we invalidate the slot). Should the time
the engine is down be counted as "inactive" time? I've the feeling it should not,
and that we should only take into account inactive time while the engine is up.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Hi,

On Wed, Mar 20, 2024 at 12:48:55AM +0530, Bharath Rupireddy wrote:
> On Mon, Mar 18, 2024 at 3:02 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > > > Hm. Are you suggesting inactive_timeout to be a slot level parameter
> > > > similar to 'failover' property added recently by
> > > > c393308b69d229b664391ac583b9e07418d411b6 and
> > > > 73292404370c9900a96e2bebdc7144f7010339cf?
> > >
> > > Yeah, I have something like that in mind. You can prepare the patch
> > > but it would be good if others involved in this thread can also share
> > > their opinion.
> >
> > I think it makes sense to put the inactive_timeout granularity at the slot
> > level (as the activity could vary a lot say between one slot linked to a
> > subcription and one linked to some plugins). As far max_slot_xid_age I've the
> > feeling that a new GUC is good enough.
> 
> Well, here I'm implementing the above idea. The attached v12 patches
> majorly have the following changes:
> 

Regarding v12-0004: "Allow setting inactive_timeout in the replication command",
shouldn't we also add a new SQL API, say pg_alter_replication_slot(), that would
allow changing the timeout property?

That would allow users to alter this property without the need to make a
replication connection.

But the issue is that it would make it inconsistent with the new inactivetimeout
in the subscription that is added in "v12-0005". But do we need to display
subinactivetimeout in pg_subscription (and even allow it at subscription creation
/ alter) after all? (I've the feeling there is less need for it as compared to
subfailover or subtwophasestate, for example.)

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Wed, Mar 20, 2024 at 1:04 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> On Wed, Mar 20, 2024 at 08:58:05AM +0530, Amit Kapila wrote:
> > On Wed, Mar 20, 2024 at 12:49 AM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >
> > > Following are some open points:
> > >
> > > 1. Where to do inactive_timeout invalidation exactly if not the checkpointer.
> > >
> > I have suggested to do it at the time of CheckpointReplicationSlots()
> > and Bertrand suggested to do it whenever we resume using the slot. I
> > think we should follow both the suggestions.
>
> Agree. I also think that pg_get_replication_slots() would be a good place, so
> that queries would return the right invalidation status.

I've addressed review comments and attaching the v13 patches with the
following changes:

1. Invalidate replication slot due to inactive_timeout:
1.1 In CheckpointReplicationSlots() to help with automatic invalidation.
1.2 In pg_get_replication_slots to help readers see the latest slot information.
1.3 In ReplicationSlotAcquire for walsenders as typically walsenders
are the ones that use slots for longer durations for streaming
standbys and logical subscribers.
1.4 In ReplicationSlotAcquire when called from
pg_logical_slot_get_changes_guts to help with logical decoding clients
to disallow decoding from invalidated slots.
1.5 In ReplicationSlotAcquire when called from
pg_replication_slot_advance to help with disallowing advancing
invalidated slots.
2. Have a new input parameter bool check_for_invalidation for
ReplicationSlotAcquire(). When true, check for the inactive_timeout
invalidation and, if the slot is invalidated, error out (see the sketch
after this list).
3. Have a new function to just do inactive_timeout invalidation.
4. Do not update last_inactive_at for failover slots on standby to not
invalidate failover slots on the standby.
5. In ReplicationSlotAcquire(), invalidate the slot before making it active.
6. Make last_inactive_at a shared-memory parameter as opposed to an
on-disk parameter, so that server downtime is not counted as inactive
time.
7. Let the failover slot on standby and pg_upgraded slots get
inactive_timeout parameter from the primary and old cluster
respectively.
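
To make points 2, 3 and 5 above concrete, the acquire-time check is roughly
of the following shape (a sketch; the check_for_invalidation parameter, the
helper name, and RS_INVAL_INACTIVE_TIMEOUT come from this patch set, not from
committed code):

/* Sketch: in ReplicationSlotAcquire(), after locating the slot. */
if (check_for_invalidation)
{
    /* Invalidate right away if the slot crossed its inactive_timeout. */
    InvalidateReplicationSlotForInactiveTimeout(s); /* name illustrative; point 3 adds such a helper */

    if (s->data.invalidated == RS_INVAL_INACTIVE_TIMEOUT)
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("can no longer access replication slot \"%s\"",
                        NameStr(s->data.name)),
                 errdetail("The slot became invalid because it was inactive longer than its inactive_timeout.")));
}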

Please see the attached v13 patches.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On Wed, Mar 20, 2024 at 7:08 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Regarding v12-0004: "Allow setting inactive_timeout in the replication command",
> shouldn't we also add an new SQL API say: pg_alter_replication_slot() that would
> allow to change the timeout property?
>
> That would allow users to alter this property without the need to make a
> replication connection.

+1 to add a new SQL function pg_alter_replication_slot(). It lets one
first create the slots and then later decide the appropriate
inactive_timeout. It might grow into altering other slot parameters
such as failover (I'm not sure if altering the failover property on the
primary after a while makes the slot the right candidate for syncing on
the standby). Perhaps, we can add it for altering just inactive_timeout
for now and be done with it.

FWIW, ALTER_REPLICATION_SLOT was added keeping in mind just the
failover property for logical slots; that's why it emits the error
"cannot use ALTER_REPLICATION_SLOT with a physical replication slot".

> But the issue is that it would make it inconsistent with the new inactivetimeout
> in the subscription that is added in "v12-0005".

Can you please elaborate on what inconsistency it causes with inactivetimeout?

> But do we need to display
> subinactivetimeout in pg_subscription (and even allow it at subscription creation
> / alter) after all? (I've the feeling there is less such a need as compare to
> subfailover, subtwophasestate for example).

Maybe we don't need to. One can always trace down to the replication
slot associated with the subscription on the publisher, and get to
know what the slot's inactive_timeout setting is. However, having it in
pg_subscription saves one a trip to the publisher to know the
inactive_timeout value for a subscription. Moreover, since we are allowing
inactive_timeout to be set via the CREATE/ALTER SUBSCRIPTION commands,
I believe there's nothing wrong with it also being part of the
pg_subscription catalog.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Wed, Mar 20, 2024 at 1:51 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> On Wed, Mar 20, 2024 at 12:48:55AM +0530, Bharath Rupireddy wrote:
> >
> > 2. last_inactive_at and inactive_timeout are now tracked in on-disk
> > replication slot data structure.
>
> Should last_inactive_at be tracked on disk? Say the engine is down for a period
> of time > inactive_timeout then the slot will be invalidated after the engine
> re-start (if no activity before we invalidate the slot). Should the time the
> engine is down be counted as "inactive" time? I've the feeling it should not, and
> that we should only take into account inactive time while the engine is up.
>

Good point. The question is how do we achieve this without persisting
the 'last_inactive_at'? Say, 'last_inactive_at' for a particular slot
had some valid value before we shut down but it still didn't cross the
configured 'inactive_timeout' value, so we won't be able to
invalidate it. Now, after the restart, as we don't know
last_inactive_at's value before the shutdown, we will initialize it
with 0 (this is what Bharath seems to have done in the latest
v13-0002* patch). After this, even if a walsender or backend never
acquires the slot, we won't invalidate it. OTOH, if we track
'last_inactive_at' on disk, after restart, we could initialize it
to the current time if the value is non-zero. Do you have any better
ideas?

--
With Regards,
Amit Kapila.



On Thu, Mar 21, 2024 at 5:19 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Mar 20, 2024 at 7:08 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > Regarding v12-0004: "Allow setting inactive_timeout in the replication command",
> > shouldn't we also add an new SQL API say: pg_alter_replication_slot() that would
> > allow to change the timeout property?
> >
> > That would allow users to alter this property without the need to make a
> > replication connection.
>
> +1 to add a new SQL function pg_alter_replication_slot().
>

I also don't see any obvious problem with such an API. However, this
is not a good time to invent new APIs. Let's keep the feature simple
and then we can extend it in the next version after more discussion
and probably by that time we will get some feedback from the field as
well.

>
> It helps
> first create the slots and then later decide the appropriate
> inactive_timeout. It might grow into altering other slot parameters
> such as failover (I'm not sure if altering failover property on the
> primary after a while makes it the right candidate for syncing on the
> standby). Perhaps, we can add it for altering just inactive_timeout
> for now and be done with it.
>
> FWIW, ALTER_REPLICATION_SLOT was added keeping in mind just the
> failover property for logical slots, that's why it emits an error
> "cannot use ALTER_REPLICATION_SLOT with a physical replication slot"
>
> > But the issue is that it would make it inconsistent with the new inactivetimeout
> > in the subscription that is added in "v12-0005".
>
> Can you please elaborate what the inconsistency it causes with inactivetimeout?
>

I think the inconsistency can arise from the fact that on the publisher
one can change the inactive_timeout for the slot corresponding to a
subscription but the subscriber won't know, so it will still show the
old value. If we want, we can document this as a limitation and let
users be aware of it. However, I feel at this stage, let's not even
expose this from the subscription, or maybe we can discuss it once/if
we are done with other patches. Anyway, if one wants to use this
feature with a subscription, she can create a slot first on the
publisher with an inactive_timeout value and then associate such a slot
with the required subscription.

--
With Regards,
Amit Kapila.



On Thu, Mar 21, 2024 at 9:07 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> I also don't see any obvious problem with such an API. However, this
> is not a good time to invent new APIs. Let's keep the feature simple
> and then we can extend it in the next version after more discussion
> and probably by that time we will get some feedback from the field as
> well.

I couldn't agree more.

> > > But the issue is that it would make it inconsistent with the new inactivetimeout
> > > in the subscription that is added in "v12-0005".
> >
> > Can you please elaborate what the inconsistency it causes with inactivetimeout?
> >
> I think the inconsistency can arise from the fact that on publisher
> one can change the inactive_timeout for the slot corresponding to a
> subscription but the subscriber won't know, so it will still show the
> old value.

Understood.

> If we want we can document this as a limitation and let
> users be aware of it. However, I feel at this stage, let's not even
> expose this from the subscription or maybe we can discuss it once/if
> we are done with other patches. Anyway, if one wants to use this
> feature with a subscription, she can create a slot first on the
> publisher with inactive_timeout value and then associate such a slot
> with a required subscription.

If we are not exposing it via subscription (meaning, we don't consider
v13-0004 and v13-0005 patches), I feel we can have a new SQL API
pg_alter_replication_slot(int inactive_timeout) for now just altering
the inactive_timeout of a given slot.

With this approach, one can do one of the following:
1) Create a slot with SQL API with inactive_timeout set, and use it
for subscriptions or for streaming standbys.
2) Create a slot with SQL API without inactive_timeout set, use it for
subscriptions or for streaming standbys, and set inactive_timeout
later via pg_alter_replication_slot() depending on how the slot is
consumed.
3) Create a subscription with create_slot=true, and set
inactive_timeout via pg_alter_replication_slot() depending on how the
slot is consumed.

This approach seems consistent and minimal to start with.

If we agree on this, I'll drop both 0004 and 0005, which allow
inactive_timeout to be set via replication commands and via
create/alter subscription respectively, and implement
pg_alter_replication_slot().

FWIW, adding the new SQL API pg_alter_replication_slot() isn't that hard.
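
For what it's worth, a rough sketch of what that function could look like
(assuming the proposed data.inactive_timeout field; the exact signature,
validation and error handling are omitted, so treat this as a shape only):

/* Sketch: SQL-callable pg_alter_replication_slot(name, inactive_timeout). */
Datum
pg_alter_replication_slot(PG_FUNCTION_ARGS)
{
    Name    name = PG_GETARG_NAME(0);
    int     inactive_timeout = PG_GETARG_INT32(1);

    CheckSlotPermissions();

    ReplicationSlotAcquire(NameStr(*name), true);

    SpinLockAcquire(&MyReplicationSlot->mutex);
    MyReplicationSlot->data.inactive_timeout = inactive_timeout;   /* proposed field */
    SpinLockRelease(&MyReplicationSlot->mutex);

    ReplicationSlotMarkDirty();
    ReplicationSlotSave();
    ReplicationSlotRelease();

    PG_RETURN_VOID();
}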

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Thu, Mar 21, 2024 at 8:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Mar 20, 2024 at 1:51 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > On Wed, Mar 20, 2024 at 12:48:55AM +0530, Bharath Rupireddy wrote:
> > >
> > > 2. last_inactive_at and inactive_timeout are now tracked in on-disk
> > > replication slot data structure.
> >
> > Should last_inactive_at be tracked on disk? Say the engine is down for a period
> > of time > inactive_timeout then the slot will be invalidated after the engine
> > re-start (if no activity before we invalidate the slot). Should the time the
> > engine is down be counted as "inactive" time? I've the feeling it should not, and
> > that we should only take into account inactive time while the engine is up.
> >
>
> Good point. The question is how do we achieve this without persisting
> the 'last_inactive_at'? Say, 'last_inactive_at' for a particular slot
> had some valid value before we shut down but it still didn't cross the
> configured 'inactive_timeout' value, so, we won't be able to
> invalidate it. Now, after the restart, as we don't know the
> last_inactive_at's value before the shutdown, we will initialize it
> with 0 (this is what Bharath seems to have done in the latest
> v13-0002* patch). After this, even if walsender or backend never
> acquires the slot, we won't invalidate it. OTOH, if we track
> 'last_inactive_at' on the disk, after, restart, we could initialize it
> to the current time if the value is non-zero. Do you have any better
> ideas?

This sounds reasonable to me at least.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Hi,

On Thu, Mar 21, 2024 at 08:47:18AM +0530, Amit Kapila wrote:
> On Wed, Mar 20, 2024 at 1:51 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > On Wed, Mar 20, 2024 at 12:48:55AM +0530, Bharath Rupireddy wrote:
> > >
> > > 2. last_inactive_at and inactive_timeout are now tracked in on-disk
> > > replication slot data structure.
> >
> > Should last_inactive_at be tracked on disk? Say the engine is down for a period
> > of time > inactive_timeout then the slot will be invalidated after the engine
> > re-start (if no activity before we invalidate the slot). Should the time the
> > engine is down be counted as "inactive" time? I've the feeling it should not, and
> > that we should only take into account inactive time while the engine is up.
> >
> 
> Good point. The question is how do we achieve this without persisting
> the 'last_inactive_at'? Say, 'last_inactive_at' for a particular slot
> had some valid value before we shut down but it still didn't cross the
> configured 'inactive_timeout' value, so, we won't be able to
> invalidate it. Now, after the restart, as we don't know the
> last_inactive_at's value before the shutdown, we will initialize it
> with 0 (this is what Bharath seems to have done in the latest
> v13-0002* patch). After this, even if walsender or backend never
> acquires the slot, we won't invalidate it. OTOH, if we track
> 'last_inactive_at' on the disk, after, restart, we could initialize it
> to the current time if the value is non-zero. Do you have any better
> ideas?
> 

I think that setting last_inactive_at when we restart makes sense if the slot
has been active previously. I think the idea is that it's holding xmin/catalog_xmin
and that we don't want to prevent row removal for longer than the timeout.

So what about relying on xmin/catalog_xmin instead that way?

- For physical slots, if xmin is set, then set last_inactive_at to the current
time at restart (else zero).

- For logical slots, it's not the same, as catalog_xmin is set at slot
creation time. So what about setting last_inactive_at to the current time at
restart but also at creation time for logical slots? (Setting it to zero at
creation time (as we do in v13) does not look right, given the fact that it's
"already" holding a catalog_xmin.)

That way, we'd ensure that we are not holding rows for longer than the timeout
and we don't need to persist last_inactive_at.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Hi,

On Thu, Mar 21, 2024 at 10:53:54AM +0530, Bharath Rupireddy wrote:
> On Thu, Mar 21, 2024 at 9:07 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > But the issue is that it would make it inconsistent with the new inactivetimeout
> > > > in the subscription that is added in "v12-0005".
> > >
> > > Can you please elaborate what the inconsistency it causes with inactivetimeout?
> > >
> > I think the inconsistency can arise from the fact that on publisher
> > one can change the inactive_timeout for the slot corresponding to a
> > subscription but the subscriber won't know, so it will still show the
> > old value.

Yeah, that was what I had in mind.

> > If we want we can document this as a limitation and let
> > users be aware of it. However, I feel at this stage, let's not even
> > expose this from the subscription or maybe we can discuss it once/if
> > we are done with other patches.

I agree, it's important to expose it for things like "failover" but I think we
can get rid of it for the timeout one.

>>  Anyway, if one wants to use this
> > feature with a subscription, she can create a slot first on the
> > publisher with inactive_timeout value and then associate such a slot
> > with a required subscription.

Right.

> 
> If we are not exposing it via subscription (meaning, we don't consider
> v13-0004 and v13-0005 patches), I feel we can have a new SQL API
> pg_alter_replication_slot(int inactive_timeout) for now just altering
> the inactive_timeout of a given slot.

Agree, that seems more "natural" than going through a replication connection.

> With this approach, one can do either of the following:
> 1) Create a slot with SQL API with inactive_timeout set, and use it
> for subscriptions or for streaming standbys.

Yes.

> 2) Create a slot with SQL API without inactive_timeout set, use it for
> subscriptions or for streaming standbys, and set inactive_timeout
> later via pg_alter_replication_slot() depending on how the slot is
> consumed

Yes.

> 3) Create a subscription with create_slot=true, and set
> inactive_timeout via pg_alter_replication_slot() depending on how the
> slot is consumed.

Yes.

We could also do the above 3 and alter the timeout with a replication
connection, but the SQL API seems more natural to me.

> 
> This approach seems consistent and minimal to start with.
> 
> If we agree on this, I'll drop both 0004 and 0005 that are allowing
> inactive_timeout to be set via replication commands and via
> create/alter subscription respectively, and implement
> pg_alter_replication_slot().

+1 on this.

> FWIW, adding the new SQL API pg_alter_replication_slot() isn't that hard.

Also, I think we should ensure that one can "only" alter the timeout property
for the time being (if not, that could lead to the subscription inconsistency
mentioned above).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Thu, Mar 21, 2024 at 11:23 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> On Thu, Mar 21, 2024 at 08:47:18AM +0530, Amit Kapila wrote:
> > On Wed, Mar 20, 2024 at 1:51 PM Bertrand Drouvot
> > <bertranddrouvot.pg@gmail.com> wrote:
> > >
> > > On Wed, Mar 20, 2024 at 12:48:55AM +0530, Bharath Rupireddy wrote:
> > > >
> > > > 2. last_inactive_at and inactive_timeout are now tracked in on-disk
> > > > replication slot data structure.
> > >
> > > Should last_inactive_at be tracked on disk? Say the engine is down for a period
> > > of time > inactive_timeout then the slot will be invalidated after the engine
> > > re-start (if no activity before we invalidate the slot). Should the time the
> > > engine is down be counted as "inactive" time? I've the feeling it should not, and
> > > that we should only take into account inactive time while the engine is up.
> > >
> >
> > Good point. The question is how do we achieve this without persisting
> > the 'last_inactive_at'? Say, 'last_inactive_at' for a particular slot
> > had some valid value before we shut down but it still didn't cross the
> > configured 'inactive_timeout' value, so, we won't be able to
> > invalidate it. Now, after the restart, as we don't know the
> > last_inactive_at's value before the shutdown, we will initialize it
> > with 0 (this is what Bharath seems to have done in the latest
> > v13-0002* patch). After this, even if walsender or backend never
> > acquires the slot, we won't invalidate it. OTOH, if we track
> > 'last_inactive_at' on the disk, after, restart, we could initialize it
> > to the current time if the value is non-zero. Do you have any better
> > ideas?
> >
>
> I think that setting last_inactive_at when we restart makes sense if the slot
> has been active previously. I think the idea is because it's holding xmin/catalog_xmin
> and that we don't want to prevent rows removal longer that the timeout.
>
> So what about relying on xmin/catalog_xmin instead that way?
>

That doesn't sound like a great idea because the xmin/catalog_xmin values
won't tell us whether the slot was active or not before the restart. It could
have been inactive for a long time before the restart but the xmin values
could still be valid. What about always setting 'last_inactive_at' at
restart (if the slot's inactive_timeout has a non-zero value) and resetting
it as soon as someone acquires that slot? Now, if the slot doesn't get
acquired till 'inactive_timeout', the checkpointer will invalidate the
slot.
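
Something like the below is what I have in mind (just a sketch, not the actual
code; where exactly to do each part is still open):

/* hypothetical sketch: at restart, e.g. in RestoreSlotFromDisk() */
if (slot->data.inactive_timeout > 0)
    slot->last_inactive_at = GetCurrentTimestamp();

/* and reset it when the slot gets acquired, e.g. in ReplicationSlotAcquire() */
slot->last_inactive_at = 0;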

--
With Regards,
Amit Kapila.



On Thu, Mar 21, 2024 at 11:37 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> On Thu, Mar 21, 2024 at 10:53:54AM +0530, Bharath Rupireddy wrote:
> > On Thu, Mar 21, 2024 at 9:07 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > But the issue is that it would make it inconsistent with the new inactivetimeout
> > > > > in the subscription that is added in "v12-0005".
> > > >
> > > > Can you please elaborate what the inconsistency it causes with inactivetimeout?
> > > >
> > > I think the inconsistency can arise from the fact that on publisher
> > > one can change the inactive_timeout for the slot corresponding to a
> > > subscription but the subscriber won't know, so it will still show the
> > > old value.
>
> Yeah, that was what I had in mind.
>
> > > If we want we can document this as a limitation and let
> > > users be aware of it. However, I feel at this stage, let's not even
> > > expose this from the subscription or maybe we can discuss it once/if
> > > we are done with other patches.
>
> I agree, it's important to expose it for things like "failover" but I think we
> can get rid of it for the timeout one.
>
> >>  Anyway, if one wants to use this
> > > feature with a subscription, she can create a slot first on the
> > > publisher with inactive_timeout value and then associate such a slot
> > > with a required subscription.
>
> Right.
>
> >
> > If we are not exposing it via subscription (meaning, we don't consider
> > v13-0004 and v13-0005 patches), I feel we can have a new SQL API
> > pg_alter_replication_slot(int inactive_timeout) for now just altering
> > the inactive_timeout of a given slot.
>
> Agree, that seems more "natural" that going through a replication connection.
>
> > With this approach, one can do either of the following:
> > 1) Create a slot with SQL API with inactive_timeout set, and use it
> > for subscriptions or for streaming standbys.
>
> Yes.
>
> > 2) Create a slot with SQL API without inactive_timeout set, use it for
> > subscriptions or for streaming standbys, and set inactive_timeout
> > later via pg_alter_replication_slot() depending on how the slot is
> > consumed
>
> Yes.
>
> > 3) Create a subscription with create_slot=true, and set
> > inactive_timeout via pg_alter_replication_slot() depending on how the
> > slot is consumed.
>
> Yes.
>
> We could also do the above 3 and altering the timeout with a replication
> connection but the SQL API seems more natural to me.
>

If we want to go with this then I think we should at least ensure that
if one specifies a timeout via CREATE_REPLICATION_SLOT or
ALTER_REPLICATION_SLOT, it is honored.

--
With Regards,
Amit Kapila.



Hi,

On Thu, Mar 21, 2024 at 11:43:54AM +0530, Amit Kapila wrote:
> On Thu, Mar 21, 2024 at 11:23 AM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > On Thu, Mar 21, 2024 at 08:47:18AM +0530, Amit Kapila wrote:
> > > On Wed, Mar 20, 2024 at 1:51 PM Bertrand Drouvot
> > > <bertranddrouvot.pg@gmail.com> wrote:
> > > >
> > > > On Wed, Mar 20, 2024 at 12:48:55AM +0530, Bharath Rupireddy wrote:
> > > > >
> > > > > 2. last_inactive_at and inactive_timeout are now tracked in on-disk
> > > > > replication slot data structure.
> > > >
> > > > Should last_inactive_at be tracked on disk? Say the engine is down for a period
> > > > of time > inactive_timeout then the slot will be invalidated after the engine
> > > > re-start (if no activity before we invalidate the slot). Should the time the
> > > > engine is down be counted as "inactive" time? I've the feeling it should not, and
> > > > that we should only take into account inactive time while the engine is up.
> > > >
> > >
> > > Good point. The question is how do we achieve this without persisting
> > > the 'last_inactive_at'? Say, 'last_inactive_at' for a particular slot
> > > had some valid value before we shut down but it still didn't cross the
> > > configured 'inactive_timeout' value, so, we won't be able to
> > > invalidate it. Now, after the restart, as we don't know the
> > > last_inactive_at's value before the shutdown, we will initialize it
> > > with 0 (this is what Bharath seems to have done in the latest
> > > v13-0002* patch). After this, even if walsender or backend never
> > > acquires the slot, we won't invalidate it. OTOH, if we track
> > > 'last_inactive_at' on the disk, after, restart, we could initialize it
> > > to the current time if the value is non-zero. Do you have any better
> > > ideas?
> > >
> >
> > I think that setting last_inactive_at when we restart makes sense if the slot
> > has been active previously. I think the idea is because it's holding xmin/catalog_xmin
> > and that we don't want to prevent rows removal longer that the timeout.
> >
> > So what about relying on xmin/catalog_xmin instead that way?
> >
> 
> That doesn't sound like a great idea because xmin/catalog_xmin values
> won't tell us before restart whether it was active or not. It could
> have been inactive for long time before restart but the xmin values
> could still be valid.

Right, the idea here was more like "don't hold xmin/catalog_xmin" for longer
than the timeout.

My concern was that we set catalog_xmin at logical slot creation time. So if we
set last_inactive_at to zero at creation time and the slot is not used for a long
period of time > timeout, then I think it's not helping there.

> What about we always set 'last_inactive_at' at
> restart (if the slot's inactive_timeout has non-zero value) and reset
> it as soon as someone acquires that slot? Now, if the slot doesn't get
> acquired till 'inactive_timeout', checkpointer will invalidate the
> slot.

Yeah, that sounds good to me, but I think we should set last_inactive_at at creation
time too; if not:

- a physical slot could remain valid for a long time after creation (which is fine)
but the behavior would change at restart.
- a logical slot would have the "issue" reported above (holding catalog_xmin).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Hi,

On Thu, Mar 21, 2024 at 11:53:32AM +0530, Amit Kapila wrote:
> On Thu, Mar 21, 2024 at 11:37 AM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > We could also do the above 3 and altering the timeout with a replication
> > connection but the SQL API seems more natural to me.
> >
> 
> If we want to go with this then I think we should at least ensure that
> if one specified timeout via CREATE_REPLICATION_SLOT or
> ALTER_REPLICATION_SLOT that should be honored.

Yeah, agree.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Hi,

On Thu, Mar 21, 2024 at 05:05:46AM +0530, Bharath Rupireddy wrote:
> On Wed, Mar 20, 2024 at 1:04 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > On Wed, Mar 20, 2024 at 08:58:05AM +0530, Amit Kapila wrote:
> > > On Wed, Mar 20, 2024 at 12:49 AM Bharath Rupireddy
> > > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > > >
> > > > Following are some open points:
> > > >
> > > > 1. Where to do inactive_timeout invalidation exactly if not the checkpointer.
> > > >
> > > I have suggested to do it at the time of CheckpointReplicationSlots()
> > > and Bertrand suggested to do it whenever we resume using the slot. I
> > > think we should follow both the suggestions.
> >
> > Agree. I also think that pg_get_replication_slots() would be a good place, so
> > that queries would return the right invalidation status.
> 
> I've addressed review comments and attaching the v13 patches with the
> following changes:

Thanks!

v13-0001 looks good to me. The only nit (that I've mentioned up-thread) is that
in the pg_replication_slots view, the invalidation_reason is "far away" from the
conflicting field. I understand that one could query the fields individually, but
when describing the view or reading the doc, it seems more appropriate to see
them closer together. Also, as "failover" and "synced" are also new in version 17,
there is no risk of breaking "order by 17,18" kind of queries (which are the
failover and sync positions).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Thu, Mar 21, 2024 at 12:40 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> v13-0001 looks good to me. The only Nit (that I've mentioned up-thread) is that
> in the pg_replication_slots view, the invalidation_reason is "far away" from the
> conflicting field. I understand that one could query the fields individually but
> when describing the view or reading the doc, it seems more appropriate to see
> them closer. Also as "failover" and "synced" are also new in version 17, there
> is no risk to break order by "17,18" kind of queries (which are the failover
> and sync positions).

Hm, yeah, I can change that in the next version of the patches. Thanks.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Thu, Mar 21, 2024 at 12:15 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> On Thu, Mar 21, 2024 at 11:43:54AM +0530, Amit Kapila wrote:
> > On Thu, Mar 21, 2024 at 11:23 AM Bertrand Drouvot
> > <bertranddrouvot.pg@gmail.com> wrote:
> > >
> > > On Thu, Mar 21, 2024 at 08:47:18AM +0530, Amit Kapila wrote:
> > > > On Wed, Mar 20, 2024 at 1:51 PM Bertrand Drouvot
> > > > <bertranddrouvot.pg@gmail.com> wrote:
> > > > >
> > > > > On Wed, Mar 20, 2024 at 12:48:55AM +0530, Bharath Rupireddy wrote:
> > > > > >
> > > > > > 2. last_inactive_at and inactive_timeout are now tracked in on-disk
> > > > > > replication slot data structure.
> > > > >
> > > > > Should last_inactive_at be tracked on disk? Say the engine is down for a period
> > > > > of time > inactive_timeout then the slot will be invalidated after the engine
> > > > > re-start (if no activity before we invalidate the slot). Should the time the
> > > > > engine is down be counted as "inactive" time? I've the feeling it should not, and
> > > > > that we should only take into account inactive time while the engine is up.
> > > > >
> > > >
> > > > Good point. The question is how do we achieve this without persisting
> > > > the 'last_inactive_at'? Say, 'last_inactive_at' for a particular slot
> > > > had some valid value before we shut down but it still didn't cross the
> > > > configured 'inactive_timeout' value, so, we won't be able to
> > > > invalidate it. Now, after the restart, as we don't know the
> > > > last_inactive_at's value before the shutdown, we will initialize it
> > > > with 0 (this is what Bharath seems to have done in the latest
> > > > v13-0002* patch). After this, even if walsender or backend never
> > > > acquires the slot, we won't invalidate it. OTOH, if we track
> > > > 'last_inactive_at' on the disk, after, restart, we could initialize it
> > > > to the current time if the value is non-zero. Do you have any better
> > > > ideas?
> > > >
> > >
> > > I think that setting last_inactive_at when we restart makes sense if the slot
> > > has been active previously. I think the idea is because it's holding xmin/catalog_xmin
> > > and that we don't want to prevent rows removal longer that the timeout.
> > >
> > > So what about relying on xmin/catalog_xmin instead that way?
> > >
> >
> > That doesn't sound like a great idea because xmin/catalog_xmin values
> > won't tell us before restart whether it was active or not. It could
> > have been inactive for long time before restart but the xmin values
> > could still be valid.
>
> Right, the idea here was more like "don't hold xmin/catalog_xmin" for longer
> than timeout.
>
> My concern was that we set catalog_xmin at logical slot creation time. So if we
> set last_inactive_at to zero at creation time and the slot is not used for a long
> period of time > timeout, then I think it's not helping there.
>

But, we do call ReplicationSlotRelease() after slot creation. For
example, see CreateReplicationSlot(). So wouldn't that take care of
the case you are worried about?

--
With Regards,
Amit Kapila.



On Thu, Mar 21, 2024 at 3:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > My concern was that we set catalog_xmin at logical slot creation time. So if we
> > set last_inactive_at to zero at creation time and the slot is not used for a long
> > period of time > timeout, then I think it's not helping there.
>
> But, we do call ReplicationSlotRelease() after slot creation. For
> example, see CreateReplicationSlot(). So wouldn't that take care of
> the case you are worried about?

Right. That's true even for pg_create_physical_replication_slot and
pg_create_logical_replication_slot. AFAICS, setting it to the current
timestamp in ReplicationSlotRelease suffices unless I'm missing
something.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Thu, Mar 21, 2024 at 2:44 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Mar 21, 2024 at 12:40 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > v13-0001 looks good to me. The only Nit (that I've mentioned up-thread) is that
> > in the pg_replication_slots view, the invalidation_reason is "far away" from the
> > conflicting field. I understand that one could query the fields individually but
> > when describing the view or reading the doc, it seems more appropriate to see
> > them closer. Also as "failover" and "synced" are also new in version 17, there
> > is no risk to break order by "17,18" kind of queries (which are the failover
> > and sync positions).
>
> Hm, yeah, I can change that in the next version of the patches. Thanks.
>

This makes sense to me. Apart from this, few more comments on 0001.
1.
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -676,13 +676,13 @@ get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check)
  * removed.
  */
  res = executeQueryOrDie(conn, "SELECT slot_name, plugin, two_phase,
failover, "
- "%s as caught_up, conflict_reason IS NOT NULL as invalid "
+ "%s as caught_up, invalidation_reason IS NOT NULL as invalid "
  "FROM pg_catalog.pg_replication_slots "
  "WHERE slot_type = 'logical' AND "
  "database = current_database() AND "
  "temporary IS FALSE;",
  live_check ? "FALSE" :
- "(CASE WHEN conflict_reason IS NOT NULL THEN FALSE "
+ "(CASE WHEN conflicting THEN FALSE "

I think here at both places we need to change 'conflict_reason' to
'conflicting'.

2.
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>invalidation_reason</structfield> <type>text</type>
+      </para>
+      <para>
+       The reason for the slot's invalidation. It is set for both logical and
+       physical slots. <literal>NULL</literal> if the slot is not invalidated.
+       Possible values are:
+       <itemizedlist spacing="compact">
+        <listitem>
+         <para>
+          <literal>wal_removed</literal> means that the required WAL has been
+          removed.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>rows_removed</literal> means that the required rows have
+          been removed.
+         </para>
+        </listitem>
+        <listitem>
+         <para>
+          <literal>wal_level_insufficient</literal> means that the
+          primary doesn't have a <xref linkend="guc-wal-level"/> sufficient to
+          perform logical decoding.
+         </para>

Can the reasons 'rows_removed' and 'wal_level_insufficient' appear for
physical slots? If not, then it is not clear from the above text.

3.
-# Verify slots are reported as non conflicting in pg_replication_slots
+# Verify slots are reported as valid in pg_replication_slots
 is( $node_standby->safe_psql(
  'postgres',
  q[select bool_or(conflicting) from
-   (select conflict_reason is not NULL as conflicting
-    from pg_replication_slots WHERE slot_type = 'logical')]),
+   (select conflicting from pg_replication_slots
+ where slot_type = 'logical')]),
  'f',
- 'Logical slots are reported as non conflicting');
+ 'Logical slots are reported as valid');

I don't think we need to change the comment or success message in this test.

--
With Regards,
Amit Kapila.



Hi,

On Thu, Mar 21, 2024 at 04:13:31PM +0530, Bharath Rupireddy wrote:
> On Thu, Mar 21, 2024 at 3:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > My concern was that we set catalog_xmin at logical slot creation time. So if we
> > > set last_inactive_at to zero at creation time and the slot is not used for a long
> > > period of time > timeout, then I think it's not helping there.
> >
> > But, we do call ReplicationSlotRelease() after slot creation. For
> > example, see CreateReplicationSlot(). So wouldn't that take care of
> > the case you are worried about?
> 
> Right. That's true even for pg_create_physical_replication_slot and
> pg_create_logical_replication_slot. AFAICS, setting it to the current
> timestamp in ReplicationSlotRelease suffices unless I'm missing
> something.

Right, but we have:

"
    if (set_last_inactive_at &&
        slot->data.persistency == RS_PERSISTENT)
    {
        /*
         * There's no point in allowing failover slots to get invalidated
         * based on slot's inactive_timeout parameter on standby. The failover
         * slots simply get synced from the primary on the standby.
         */
        if (!(RecoveryInProgress() && slot->data.failover))
        {
            SpinLockAcquire(&slot->mutex);
            slot->last_inactive_at = GetCurrentTimestamp();
            SpinLockRelease(&slot->mutex);
        }
    }
"

whereas we set set_last_inactive_at to false at creation time, so last_inactive_at
is not set to GetCurrentTimestamp() there. We should set set_last_inactive_at to
true if a timeout is provided during slot creation.
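
Something like the below at the slot creation call sites maybe (just a sketch,
relying on the ReplicationSlotRelease(bool) signature from the patch):

"
    /* hypothetical sketch: creation call site */
    ReplicationSlotRelease(MyReplicationSlot->data.inactive_timeout > 0);
"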

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Thu, Mar 21, 2024 at 4:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> This makes sense to me. Apart from this, few more comments on 0001.

Thanks for looking into it.

> 1.
> - "%s as caught_up, conflict_reason IS NOT NULL as invalid "
> + "%s as caught_up, invalidation_reason IS NOT NULL as invalid "
>   live_check ? "FALSE" :
> - "(CASE WHEN conflict_reason IS NOT NULL THEN FALSE "
> + "(CASE WHEN conflicting THEN FALSE "
>
> I think here at both places we need to change 'conflict_reason' to
> 'conflicting'.

Basically, the idea there is to not do the live_check for invalidated logical
slots. It has nothing to do with conflicting. Up until now,
conflict_reason has also been reporting wal_removed (wrongly, in addition to
rows_removed and wal_level_insufficient, the two actual reasons for
conflicts). So, I think invalidation_reason is right for the invalid
column. Also, I think we need to change conflicting to
invalidation_reason for live_check. So, I've changed that to use
invalidation_reason for both columns.

> 2.
>
> Can the reasons 'rows_removed' and 'wal_level_insufficient' appear for
> physical slots?

No. They can only occur for logical slots; see
InvalidatePossiblyObsoleteSlot(), where only logical slots get
invalidated for those reasons.

> If not, then it is not clear from above text.

I've stated that "It is set only for logical slots." for rows_removed
and wal_level_insufficient. Other reasons can occur for both slots.

> 3.
> -# Verify slots are reported as non conflicting in pg_replication_slots
> +# Verify slots are reported as valid in pg_replication_slots
>  is( $node_standby->safe_psql(
>   'postgres',
>   q[select bool_or(conflicting) from
> -   (select conflict_reason is not NULL as conflicting
> -    from pg_replication_slots WHERE slot_type = 'logical')]),
> +   (select conflicting from pg_replication_slots
> + where slot_type = 'logical')]),
>   'f',
> - 'Logical slots are reported as non conflicting');
> + 'Logical slots are reported as valid');
>
> I don't think we need to change the comment or success message in this test.

Yes. There the intention of the test case is to verify logical slots
are reported as non conflicting. So, I changed them.

Please find the v14-0001 patch for now. I'll post the other patches soon.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On Thu, Mar 21, 2024 at 11:21 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
>
> Please find the v14-0001 patch for now. I'll post the other patches soon.
>

LGTM. Let's wait for Bertrand to see if he has more comments on 0001
and then I'll push it.

--
With Regards,
Amit Kapila.



Hi,

On Fri, Mar 22, 2024 at 10:49:17AM +0530, Amit Kapila wrote:
> On Thu, Mar 21, 2024 at 11:21 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> >
> > Please find the v14-0001 patch for now.

Thanks!

> LGTM. Let's wait for Bertrand to see if he has more comments on 0001
> and then I'll push it.

LGTM too.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Fri, Mar 22, 2024 at 12:39 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> > > Please find the v14-0001 patch for now.
>
> Thanks!
>
> > LGTM. Let's wait for Bertrand to see if he has more comments on 0001
> > and then I'll push it.
>
> LGTM too.

Thanks. Here I'm implementing the following:

0001 Track invalidation_reason in pg_replication_slots
0002 Track last_inactive_at in pg_replication_slots
0003 Allow setting inactive_timeout for replication slots via SQL API
0004 Introduce new SQL function pg_alter_replication_slot
0005 Allow setting inactive_timeout in the replication command
0006 Add inactive_timeout based replication slot invalidation

1. Keep last_inactive_at as a shared memory variable, but always
set it at restart if the slot's inactive_timeout has a non-zero value,
and reset it as soon as someone acquires that slot, so that if the slot
doesn't get acquired till inactive_timeout, the checkpointer will
invalidate the slot.
2. Ensure with pg_alter_replication_slot one could "only" alter the
timeout property for the time being, if not that could lead to the
subscription inconsistency.
3. Have some notes in the CREATE and ALTER SUBSCRIPTION docs about
using an existing slot to leverage inactive_timeout feature.
4. last_inactive_at should also be set to the current time during slot
creation because if one creates a slot and does nothing with it then
it's the time it starts to be inactive.
5. We don't set last_inactive_at to GetCurrentTimestamp() for failover slots.
6. Leave the patch that added support for inactive_timeout in subscriptions.

Please see the attached v14 patch set. No change in the attached
v14-0001 from the previous patch.
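
To give an idea of the inactive_timeout check in (1) done by the checkpointer,
it is roughly of the following shape (a simplified sketch, not the exact patch
code; inactive_timeout is assumed to be in seconds here):

/* simplified sketch of the checkpointer-side check */
if (slot->data.inactive_timeout > 0 &&
    slot->last_inactive_at > 0 &&
    slot->active_pid == 0 &&
    TimestampDifferenceExceeds(slot->last_inactive_at,
                               GetCurrentTimestamp(),
                               slot->data.inactive_timeout * 1000))
{
    /* invalidate the slot with an inactive_timeout reason */
}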

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
Hi,

On Fri, Mar 22, 2024 at 01:45:01PM +0530, Bharath Rupireddy wrote:
> On Fri, Mar 22, 2024 at 12:39 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > > > Please find the v14-0001 patch for now.
> >
> > Thanks!
> >
> > > LGTM. Let's wait for Bertrand to see if he has more comments on 0001
> > > and then I'll push it.
> >
> > LGTM too.
> 
> Thanks. Here I'm implementing the following:

Thanks!

> 0001 Track invalidation_reason in pg_replication_slots
> 0002 Track last_inactive_at in pg_replication_slots
> 0003 Allow setting inactive_timeout for replication slots via SQL API
> 0004 Introduce new SQL funtion pg_alter_replication_slot
> 0005 Allow setting inactive_timeout in the replication command
> 0006 Add inactive_timeout based replication slot invalidation
> 
> 1. Keep it last_inactive_at as a shared memory variable, but always
> set it at restart if the slot's inactive_timeout has non-zero value
> and reset it as soon as someone acquires that slot so that if the slot
> doesn't get acquired  till inactive_timeout, checkpointer will
> invalidate the slot.
> 4. last_inactive_at should also be set to the current time during slot
> creation because if one creates a slot and does nothing with it then
> it's the time it starts to be inactive.

I did not look at the code yet but just tested the behavior. It works as you
describe it but I think this behavior is weird because:

- when we create a slot without a timeout then last_inactive_at is set. I think
that's fine, but then:
- when we restart the engine, then last_inactive_at is gone (as timeout is not
set).

I think last_inactive_at should also be set at engine restart even if there is
no timeout. I don't think we should link the two. I'm changing my mind here on
this subject due to the testing.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Fri, Mar 22, 2024 at 2:27 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> On Fri, Mar 22, 2024 at 01:45:01PM +0530, Bharath Rupireddy wrote:
>
> > 0001 Track invalidation_reason in pg_replication_slots
> > 0002 Track last_inactive_at in pg_replication_slots
> > 0003 Allow setting inactive_timeout for replication slots via SQL API
> > 0004 Introduce new SQL funtion pg_alter_replication_slot
> > 0005 Allow setting inactive_timeout in the replication command
> > 0006 Add inactive_timeout based replication slot invalidation
> >
> > 1. Keep it last_inactive_at as a shared memory variable, but always
> > set it at restart if the slot's inactive_timeout has non-zero value
> > and reset it as soon as someone acquires that slot so that if the slot
> > doesn't get acquired  till inactive_timeout, checkpointer will
> > invalidate the slot.
> > 4. last_inactive_at should also be set to the current time during slot
> > creation because if one creates a slot and does nothing with it then
> > it's the time it starts to be inactive.
>
> I did not look at the code yet but just tested the behavior. It works as you
> describe it but I think this behavior is weird because:
>
> - when we create a slot without a timeout then last_inactive_at is set. I think
> that's fine, but then:
> - when we restart the engine, then last_inactive_at is gone (as timeout is not
> set).
>
> I think last_inactive_at should be set also at engine restart even if there is
> no timeout.

I think it is the opposite. Why do we need to set 'last_inactive_at'
when inactive_timeout is not set? BTW, haven't we discussed that we
don't need to set 'last_inactive_at' at the time of slot creation as
it is sufficient to set it at the time of ReplicationSlotRelease()?

A few other comments:
==================
1.
@@ -1027,7 +1027,8 @@ CREATE VIEW pg_replication_slots AS
             L.invalidation_reason,
             L.failover,
             L.synced,
-            L.last_inactive_at
+            L.last_inactive_at,
+            L.inactive_timeout

I think it would be better to keep 'inactive_timeout' ahead of
'last_inactive_at' as that is the primary field. In major versions, we
don't have to strictly keep the new fields at the end. In this case,
it seems better to keep these two new fields after two_phase so that
these are before invalidation_reason where we can show the
invalidation due to these fields.

2.
 void
-ReplicationSlotRelease(void)
+ReplicationSlotRelease(bool set_last_inactive_at)

Why do we need a parameter here? Can't we directly check from the slot
whether 'inactive_timeout' has a non-zero value?

3.
+ /*
+ * There's no point in allowing failover slots to get invalidated
+ * based on slot's inactive_timeout parameter on standby. The failover
+ * slots simply get synced from the primary on the standby.
+ */
+ if (!(RecoveryInProgress() && slot->data.failover))

I think you need to check the 'synced' flag instead of 'failover'.
Generally, failover-marked slots should be invalidated on either the
primary or the standby, unless on the standby the failover-marked slot is
synced from the primary.

4. I feel the patches should be arranged like 0003->0001, 0002->0002,
0006->0003. We can leave the remaining ones for the time being till we get
these three patches (all three need to be committed as one but it is
okay to keep them separate for review) committed.

--
With Regards,
Amit Kapila.



Hi,

On Fri, Mar 22, 2024 at 01:45:01PM +0530, Bharath Rupireddy wrote:
> On Fri, Mar 22, 2024 at 12:39 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > > > Please find the v14-0001 patch for now.
> >
> > Thanks!
> >
> > > LGTM. Let's wait for Bertrand to see if he has more comments on 0001
> > > and then I'll push it.
> >
> > LGTM too.
> 
> 
> Please see the attached v14 patch set. No change in the attached
> v14-0001 from the previous patch.

Looking at v14-0002:

1 ===

@@ -691,6 +699,13 @@ ReplicationSlotRelease(void)
                ConditionVariableBroadcast(&slot->active_cv);
        }

+       if (slot->data.persistency == RS_PERSISTENT)
+       {
+               SpinLockAcquire(&slot->mutex);
+               slot->last_inactive_at = GetCurrentTimestamp();
+               SpinLockRelease(&slot->mutex);
+       }

I'm not sure we should do system calls while we're holding a spinlock.
Assign it to a variable beforehand?
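
Something like (untested, just to show what I mean):

"
    TimestampTz now;

    now = GetCurrentTimestamp();

    SpinLockAcquire(&slot->mutex);
    slot->last_inactive_at = now;
    SpinLockRelease(&slot->mutex);
"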

2 ===

Also, what about moving this here?

"
    if (slot->data.persistency == RS_PERSISTENT)
    {
        /*
         * Mark persistent slot inactive.  We're not freeing it, just
         * disconnecting, but wake up others that may be waiting for it.
         */
        SpinLockAcquire(&slot->mutex);
        slot->active_pid = 0;
        SpinLockRelease(&slot->mutex);
        ConditionVariableBroadcast(&slot->active_cv);
    }
"

That would avoid testing twice "slot->data.persistency == RS_PERSISTENT".

3 ===

@@ -2341,6 +2356,7 @@ RestoreSlotFromDisk(const char *name)

                slot->in_use = true;
                slot->active_pid = 0;
+               slot->last_inactive_at = 0;

I think we should put GetCurrentTimestamp() here. It's done in v14-0006 but I
think it's better to do it in 0002 (without taking care of inactive_timeout).

4 ===

    Track last_inactive_at in pg_replication_slots

 doc/src/sgml/system-views.sgml       | 11 +++++++++++
 src/backend/catalog/system_views.sql |  3 ++-
 src/backend/replication/slot.c       | 16 ++++++++++++++++
 src/backend/replication/slotfuncs.c  |  7 ++++++-
 src/include/catalog/pg_proc.dat      |  6 +++---
 src/include/replication/slot.h       |  3 +++
 src/test/regress/expected/rules.out  |  5 +++--
 7 files changed, 44 insertions(+), 7 deletions(-)

Worth adding some tests too (or do we postpone them to future commits because
we're confident enough they will follow soon)?

5 ===

Most of the fields that reflect a time (not duration) in the system views are
xxxx_time, so I'm wondering if instead of "last_inactive_at" we should use
something like "last_inactive_time"?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Hi,

On Fri, Mar 22, 2024 at 02:59:21PM +0530, Amit Kapila wrote:
> On Fri, Mar 22, 2024 at 2:27 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > On Fri, Mar 22, 2024 at 01:45:01PM +0530, Bharath Rupireddy wrote:
> >
> > > 0001 Track invalidation_reason in pg_replication_slots
> > > 0002 Track last_inactive_at in pg_replication_slots
> > > 0003 Allow setting inactive_timeout for replication slots via SQL API
> > > 0004 Introduce new SQL funtion pg_alter_replication_slot
> > > 0005 Allow setting inactive_timeout in the replication command
> > > 0006 Add inactive_timeout based replication slot invalidation
> > >
> > > 1. Keep it last_inactive_at as a shared memory variable, but always
> > > set it at restart if the slot's inactive_timeout has non-zero value
> > > and reset it as soon as someone acquires that slot so that if the slot
> > > doesn't get acquired  till inactive_timeout, checkpointer will
> > > invalidate the slot.
> > > 4. last_inactive_at should also be set to the current time during slot
> > > creation because if one creates a slot and does nothing with it then
> > > it's the time it starts to be inactive.
> >
> > I did not look at the code yet but just tested the behavior. It works as you
> > describe it but I think this behavior is weird because:
> >
> > - when we create a slot without a timeout then last_inactive_at is set. I think
> > that's fine, but then:
> > - when we restart the engine, then last_inactive_at is gone (as timeout is not
> > set).
> >
> > I think last_inactive_at should be set also at engine restart even if there is
> > no timeout.
> 
> I think it is the opposite. Why do we need to set  'last_inactive_at'
> when inactive_timeout is not set?

I think those are unrelated; one could want to know when a slot has been inactive
even if no timeout is set. I understand that for this patch series we have in mind
to use them both to invalidate slots, but I think that there is a use case for not
using both in correlation. Also, not setting last_inactive_at could give the "false"
impression that the slot is active.

> BTW, haven't we discussed that we
> don't need to set 'last_inactive_at' at the time of slot creation as
> it is sufficient to set it at the time ReplicationSlotRelease()?

Right.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Fri, Mar 22, 2024 at 3:15 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> On Fri, Mar 22, 2024 at 01:45:01PM +0530, Bharath Rupireddy wrote:
>
> 1 ===
>
> @@ -691,6 +699,13 @@ ReplicationSlotRelease(void)
>                 ConditionVariableBroadcast(&slot->active_cv);
>         }
>
> +       if (slot->data.persistency == RS_PERSISTENT)
> +       {
> +               SpinLockAcquire(&slot->mutex);
> +               slot->last_inactive_at = GetCurrentTimestamp();
> +               SpinLockRelease(&slot->mutex);
> +       }
>
> I'm not sure we should do system calls while we're holding a spinlock.
> Assign a variable before?
>
> 2 ===
>
> Also, what about moving this here?
>
> "
>     if (slot->data.persistency == RS_PERSISTENT)
>     {
>         /*
>          * Mark persistent slot inactive.  We're not freeing it, just
>          * disconnecting, but wake up others that may be waiting for it.
>          */
>         SpinLockAcquire(&slot->mutex);
>         slot->active_pid = 0;
>         SpinLockRelease(&slot->mutex);
>         ConditionVariableBroadcast(&slot->active_cv);
>     }
> "
>
> That would avoid testing twice "slot->data.persistency == RS_PERSISTENT".
>

That sounds like a good idea. Also, don't we need to consider physical
slots where we don't reserve WAL during slot creation? I don't think
there is a need to set inactive_at for such slots. If we agree,
checking restart_lsn should probably suffice to know whether
the WAL is reserved or not.
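
IOW, something like the below (just a sketch, not the actual code):

if (slot->data.persistency == RS_PERSISTENT &&
    !XLogRecPtrIsInvalid(slot->data.restart_lsn))
    slot->last_inactive_at = GetCurrentTimestamp();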

>
> 5 ===
>
> Most of the fields that reflect a time (not duration) in the system views are
> xxxx_time, so I'm wondering if instead of "last_inactive_at" we should use
> something like "last_inactive_time"?
>

How about naming it as last_active_time? This will indicate the time
at which the slot was last active.

--
With Regards,
Amit Kapila.



On Fri, Mar 22, 2024 at 3:23 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> On Fri, Mar 22, 2024 at 02:59:21PM +0530, Amit Kapila wrote:
> > On Fri, Mar 22, 2024 at 2:27 PM Bertrand Drouvot
> > <bertranddrouvot.pg@gmail.com> wrote:
> > >
> > > On Fri, Mar 22, 2024 at 01:45:01PM +0530, Bharath Rupireddy wrote:
> > >
> > > > 0001 Track invalidation_reason in pg_replication_slots
> > > > 0002 Track last_inactive_at in pg_replication_slots
> > > > 0003 Allow setting inactive_timeout for replication slots via SQL API
> > > > 0004 Introduce new SQL funtion pg_alter_replication_slot
> > > > 0005 Allow setting inactive_timeout in the replication command
> > > > 0006 Add inactive_timeout based replication slot invalidation
> > > >
> > > > 1. Keep it last_inactive_at as a shared memory variable, but always
> > > > set it at restart if the slot's inactive_timeout has non-zero value
> > > > and reset it as soon as someone acquires that slot so that if the slot
> > > > doesn't get acquired  till inactive_timeout, checkpointer will
> > > > invalidate the slot.
> > > > 4. last_inactive_at should also be set to the current time during slot
> > > > creation because if one creates a slot and does nothing with it then
> > > > it's the time it starts to be inactive.
> > >
> > > I did not look at the code yet but just tested the behavior. It works as you
> > > describe it but I think this behavior is weird because:
> > >
> > > - when we create a slot without a timeout then last_inactive_at is set. I think
> > > that's fine, but then:
> > > - when we restart the engine, then last_inactive_at is gone (as timeout is not
> > > set).
> > >
> > > I think last_inactive_at should be set also at engine restart even if there is
> > > no timeout.
> >
> > I think it is the opposite. Why do we need to set  'last_inactive_at'
> > when inactive_timeout is not set?
>
> I think those are unrelated, one could want to know when a slot has been inactive
> even if no timeout is set. I understand that for this patch series we have in mind
> to use them both to invalidate slots but I think that there is use case to not
> use both in correlation. Also not setting last_inactive_at could give the "false"
> impression that the slot is active.
>

I see your point and agree with this. I feel we can commit this part
first then; probably that is the reason Bharath has kept it as a
separate patch. It would be good to add the use case for this patch in
the commit message.

A minor comment:

  if (SlotIsLogical(s))
  pgstat_acquire_replslot(s);

+ if (s->data.persistency == RS_PERSISTENT)
+ {
+ SpinLockAcquire(&s->mutex);
+ s->last_inactive_at = 0;
+ SpinLockRelease(&s->mutex);
+ }
+

I think this part of the change needs a comment.

--
With Regards,
Amit Kapila.



Hi,

On Fri, Mar 22, 2024 at 03:56:23PM +0530, Amit Kapila wrote:
> On Fri, Mar 22, 2024 at 3:15 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > On Fri, Mar 22, 2024 at 01:45:01PM +0530, Bharath Rupireddy wrote:
> >
> > 1 ===
> >
> > @@ -691,6 +699,13 @@ ReplicationSlotRelease(void)
> >                 ConditionVariableBroadcast(&slot->active_cv);
> >         }
> >
> > +       if (slot->data.persistency == RS_PERSISTENT)
> > +       {
> > +               SpinLockAcquire(&slot->mutex);
> > +               slot->last_inactive_at = GetCurrentTimestamp();
> > +               SpinLockRelease(&slot->mutex);
> > +       }
> >
> > I'm not sure we should do system calls while we're holding a spinlock.
> > Assign a variable before?
> >
> > 2 ===
> >
> > Also, what about moving this here?
> >
> > "
> >     if (slot->data.persistency == RS_PERSISTENT)
> >     {
> >         /*
> >          * Mark persistent slot inactive.  We're not freeing it, just
> >          * disconnecting, but wake up others that may be waiting for it.
> >          */
> >         SpinLockAcquire(&slot->mutex);
> >         slot->active_pid = 0;
> >         SpinLockRelease(&slot->mutex);
> >         ConditionVariableBroadcast(&slot->active_cv);
> >     }
> > "
> >
> > That would avoid testing twice "slot->data.persistency == RS_PERSISTENT".
> >
> 
> That sounds like a good idea. Also, don't we need to consider physical
> slots where we don't reserve WAL during slot creation? I don't think
> there is a need to set inactive_at for such slots.

If the slot is not active, why shouldn't we set inactive_at? I can understand
that such slots do not present "any risks", but I think we should still set
inactive_at (also to not give the false impression that the slot is active).

> > 5 ===
> >
> > Most of the fields that reflect a time (not duration) in the system views are
> > xxxx_time, so I'm wondering if instead of "last_inactive_at" we should use
> > something like "last_inactive_time"?
> >
> 
> How about naming it as last_active_time? This will indicate the time
> at which the slot was last active.

I thought about it too but I think it could be misleading as one could think that
it should be updated each time WAL record decoding is happening.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Hi,

On Fri, Mar 22, 2024 at 04:16:19PM +0530, Amit Kapila wrote:
> On Fri, Mar 22, 2024 at 3:23 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > On Fri, Mar 22, 2024 at 02:59:21PM +0530, Amit Kapila wrote:
> > > On Fri, Mar 22, 2024 at 2:27 PM Bertrand Drouvot
> > > <bertranddrouvot.pg@gmail.com> wrote:
> > > >
> > > > On Fri, Mar 22, 2024 at 01:45:01PM +0530, Bharath Rupireddy wrote:
> > > >
> > > > > 0001 Track invalidation_reason in pg_replication_slots
> > > > > 0002 Track last_inactive_at in pg_replication_slots
> > > > > 0003 Allow setting inactive_timeout for replication slots via SQL API
> > > > > 0004 Introduce new SQL funtion pg_alter_replication_slot
> > > > > 0005 Allow setting inactive_timeout in the replication command
> > > > > 0006 Add inactive_timeout based replication slot invalidation
> > > > >
> > > > > 1. Keep it last_inactive_at as a shared memory variable, but always
> > > > > set it at restart if the slot's inactive_timeout has non-zero value
> > > > > and reset it as soon as someone acquires that slot so that if the slot
> > > > > doesn't get acquired  till inactive_timeout, checkpointer will
> > > > > invalidate the slot.
> > > > > 4. last_inactive_at should also be set to the current time during slot
> > > > > creation because if one creates a slot and does nothing with it then
> > > > > it's the time it starts to be inactive.
> > > >
> > > > I did not look at the code yet but just tested the behavior. It works as you
> > > > describe it but I think this behavior is weird because:
> > > >
> > > > - when we create a slot without a timeout then last_inactive_at is set. I think
> > > > that's fine, but then:
> > > > - when we restart the engine, then last_inactive_at is gone (as timeout is not
> > > > set).
> > > >
> > > > I think last_inactive_at should be set also at engine restart even if there is
> > > > no timeout.
> > >
> > > I think it is the opposite. Why do we need to set  'last_inactive_at'
> > > when inactive_timeout is not set?
> >
> > I think those are unrelated, one could want to know when a slot has been inactive
> > even if no timeout is set. I understand that for this patch series we have in mind
> > to use them both to invalidate slots but I think that there is use case to not
> > use both in correlation. Also not setting last_inactive_at could give the "false"
> > impression that the slot is active.
> >
> 
> I see your point and agree with this. I feel we can commit this part
> first then,

Agree that in this case the current ordering makes sense (as setting
last_inactive_at would be completely unrelated to the timeout).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





On Fri, Mar 22, 2024 at 7:15 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
On Fri, Mar 22, 2024 at 12:39 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> > > Please find the v14-0001 patch for now.
>
> Thanks!
>
> > LGTM. Let's wait for Bertrand to see if he has more comments on 0001
> > and then I'll push it.
>
> LGTM too.

Thanks. Here I'm implementing the following:

0001 Track invalidation_reason in pg_replication_slots
0002 Track last_inactive_at in pg_replication_slots
0003 Allow setting inactive_timeout for replication slots via SQL API
0004 Introduce new SQL function pg_alter_replication_slot
0005 Allow setting inactive_timeout in the replication command
0006 Add inactive_timeout based replication slot invalidation

1. Keep it last_inactive_at as a shared memory variable, but always
set it at restart if the slot's inactive_timeout has non-zero value
and reset it as soon as someone acquires that slot so that if the slot
doesn't get acquired  till inactive_timeout, checkpointer will
invalidate the slot.
2. Ensure with pg_alter_replication_slot one could "only" alter the
timeout property for the time being, if not that could lead to the
subscription inconsistency.
3. Have some notes in the CREATE and ALTER SUBSCRIPTION docs about
using an existing slot to leverage inactive_timeout feature.
4. last_inactive_at should also be set to the current time during slot
creation because if one creates a slot and does nothing with it then
it's the time it starts to be inactive.
5. We don't set last_inactive_at to GetCurrentTimestamp() for failover slots.
6. Leave the patch that added support for inactive_timeout in subscriptions.

Please see the attached v14 patch set. No change in the attached
v14-0001 from the previous patch.



Some comments:
1. In patch 0005:
In ReplicationSlotAlter():
+ lock_acquired = false;
  if (MyReplicationSlot->data.failover != failover)
  {
  SpinLockAcquire(&MyReplicationSlot->mutex);
+ lock_acquired = true;
  MyReplicationSlot->data.failover = failover;
+ }
+
+ if (MyReplicationSlot->data.inactive_timeout != inactive_timeout)
+ {
+ if (!lock_acquired)
+ {
+ SpinLockAcquire(&MyReplicationSlot->mutex);
+ lock_acquired = true;
+ }
+
+ MyReplicationSlot->data.inactive_timeout = inactive_timeout;
+ }
+
+ if (lock_acquired)
+ {
  SpinLockRelease(&MyReplicationSlot->mutex);

Can't you make it shorter like below:
lock_acquired = false;

if (MyReplicationSlot->data.failover != failover || MyReplicationSlot->data.inactive_timeout != inactive_timeout) {
    SpinLockAcquire(&MyReplicationSlot->mutex);
    lock_acquired = true;
}

if (MyReplicationSlot->data.failover != failover) {
    MyReplicationSlot->data.failover = failover;
}

if (MyReplicationSlot->data.inactive_timeout != inactive_timeout) {
    MyReplicationSlot->data.inactive_timeout = inactive_timeout;
}

if (lock_acquired) {
    SpinLockRelease(&MyReplicationSlot->mutex);
    ReplicationSlotMarkDirty();
    ReplicationSlotSave();
}

2. In patch 0005: why change the walrcv_alter_slot option? It doesn't seem to be used anywhere; is there any use case for it? If required, would the intention be to add this as a CREATE SUBSCRIPTION option?

regards,
Ajin Cherian
Fujitsu Australia
On Fri, Mar 22, 2024 at 5:30 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> On Fri, Mar 22, 2024 at 03:56:23PM +0530, Amit Kapila wrote:
> > On Fri, Mar 22, 2024 at 3:15 PM Bertrand Drouvot
> > <bertranddrouvot.pg@gmail.com> wrote:
> > >
> > > On Fri, Mar 22, 2024 at 01:45:01PM +0530, Bharath Rupireddy wrote:
> > >
> > > 1 ===
> > >
> > > @@ -691,6 +699,13 @@ ReplicationSlotRelease(void)
> > >                 ConditionVariableBroadcast(&slot->active_cv);
> > >         }
> > >
> > > +       if (slot->data.persistency == RS_PERSISTENT)
> > > +       {
> > > +               SpinLockAcquire(&slot->mutex);
> > > +               slot->last_inactive_at = GetCurrentTimestamp();
> > > +               SpinLockRelease(&slot->mutex);
> > > +       }
> > >
> > > I'm not sure we should do system calls while we're holding a spinlock.
> > > Assign a variable before?
> > >
> > > 2 ===
> > >
> > > Also, what about moving this here?
> > >
> > > "
> > >     if (slot->data.persistency == RS_PERSISTENT)
> > >     {
> > >         /*
> > >          * Mark persistent slot inactive.  We're not freeing it, just
> > >          * disconnecting, but wake up others that may be waiting for it.
> > >          */
> > >         SpinLockAcquire(&slot->mutex);
> > >         slot->active_pid = 0;
> > >         SpinLockRelease(&slot->mutex);
> > >         ConditionVariableBroadcast(&slot->active_cv);
> > >     }
> > > "
> > >
> > > That would avoid testing twice "slot->data.persistency == RS_PERSISTENT".
> > >
> >
> > That sounds like a good idea. Also, don't we need to consider physical
> > slots where we don't reserve WAL during slot creation? I don't think
> > there is a need to set inactive_at for such slots.
>
> If the slot is not active, why shouldn't we set inactive_at? I can understand
> that such a slots do not present "any risks" but I think we should still set
> inactive_at (also to not give the false impression that the slot is active).
>

But OTOH, there is a chance that we will invalidate such slots even
though they have never reserved WAL in the first place, which doesn't
appear to be a good thing.

> > > 5 ===
> > >
> > > Most of the fields that reflect a time (not duration) in the system views are
> > > xxxx_time, so I'm wondering if instead of "last_inactive_at" we should use
> > > something like "last_inactive_time"?
> > >
> >
> > How about naming it as last_active_time? This will indicate the time
> > at which the slot was last active.
>
> I thought about it too but I think it could be missleading as one could think that
> it should be updated each time WAL record decoding is happening.
>

Fair enough.

--
With Regards,
Amit Kapila.



Hi,

On Fri, Mar 22, 2024 at 06:02:11PM +0530, Amit Kapila wrote:
> On Fri, Mar 22, 2024 at 5:30 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> > On Fri, Mar 22, 2024 at 03:56:23PM +0530, Amit Kapila wrote:
> > > >
> > > > That would avoid testing twice "slot->data.persistency == RS_PERSISTENT".
> > > >
> > >
> > > That sounds like a good idea. Also, don't we need to consider physical
> > > slots where we don't reserve WAL during slot creation? I don't think
> > > there is a need to set inactive_at for such slots.
> >
> > If the slot is not active, why shouldn't we set inactive_at? I can understand
> > that such a slots do not present "any risks" but I think we should still set
> > inactive_at (also to not give the false impression that the slot is active).
> >
> 
> But OTOH, there is a chance that we will invalidate such slots even
> though they have never reserved WAL in the first place which doesn't
> appear to be a good thing.

That's right, but I don't see it as a problem. I think we should treat
inactive_at as an independent field (as if the timeout one did not exist at
all) and just focus on its meaning (the slot being inactive). If one sets a
timeout (> 0) and gets an invalidation, then I think it works as designed (even
if the slot does not present any "risk" as it does not hold any rows or WAL).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Fri, Mar 22, 2024 at 3:15 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Looking at v14-0002:

Thanks for reviewing. I agree that 0002 with last_inactive_at can go
independently and be of use on its own in addition to helping
implement inactive_timeout based invalidation.

> 1 ===
>
> @@ -691,6 +699,13 @@ ReplicationSlotRelease(void)
>                 ConditionVariableBroadcast(&slot->active_cv);
>         }
>
> +       if (slot->data.persistency == RS_PERSISTENT)
> +       {
> +               SpinLockAcquire(&slot->mutex);
> +               slot->last_inactive_at = GetCurrentTimestamp();
> +               SpinLockRelease(&slot->mutex);
> +       }
>
> I'm not sure we should do system calls while we're holding a spinlock.
> Assign a variable before?

Can do that. Then, last_inactive_at = current_timestamp + mutex
acquire time. But that's less of a problem than doing system calls
while holding the mutex. So, done that way.

> 2 ===
>
> Also, what about moving this here?
>
> "
>     if (slot->data.persistency == RS_PERSISTENT)
>     {
>         /*
>          * Mark persistent slot inactive.  We're not freeing it, just
>          * disconnecting, but wake up others that may be waiting for it.
>          */
>         SpinLockAcquire(&slot->mutex);
>         slot->active_pid = 0;
>         SpinLockRelease(&slot->mutex);
>         ConditionVariableBroadcast(&slot->active_cv);
>     }
> "
>
> That would avoid testing twice "slot->data.persistency == RS_PERSISTENT".

Ugh. Done that now.

> 3 ===
>
> @@ -2341,6 +2356,7 @@ RestoreSlotFromDisk(const char *name)
>
>                 slot->in_use = true;
>                 slot->active_pid = 0;
> +               slot->last_inactive_at = 0;
>
> I think we should put GetCurrentTimestamp() here. It's done in v14-0006 but I
> think it's better to do it in 0002 (and not taking care of inactive_timeout).

Done.

> 4 ===
>
>     Track last_inactive_at in pg_replication_slots
>
>  doc/src/sgml/system-views.sgml       | 11 +++++++++++
>  src/backend/catalog/system_views.sql |  3 ++-
>  src/backend/replication/slot.c       | 16 ++++++++++++++++
>  src/backend/replication/slotfuncs.c  |  7 ++++++-
>  src/include/catalog/pg_proc.dat      |  6 +++---
>  src/include/replication/slot.h       |  3 +++
>  src/test/regress/expected/rules.out  |  5 +++--
>  7 files changed, 44 insertions(+), 7 deletions(-)
>
> Worth to add some tests too (or we postpone them in future commits because we're
> confident enough they will follow soon)?

Yes. Added some tests in a new TAP test file named
src/test/recovery/t/043_replslot_misc.pl. This new file can be used to
add miscellaneous replication tests in the future as well. I couldn't find
a better place in the existing test files - I tried having the new tests for
physical slots in t/001_stream_rep.pl and didn't find a right place
for logical slots.

> 5 ===
>
> Most of the fields that reflect a time (not duration) in the system views are
> xxxx_time, so I'm wondering if instead of "last_inactive_at" we should use
> something like "last_inactive_time"?

Yeah, I can see that. So, I changed it to last_inactive_time.

I agree with treating last_inactive_time as a separate property of the
slot having its own use in addition to helping implement
inactive_timeout based invalidation. I think it can go separately.

I tried to address the review comments received for this patch alone
and attached v15-0001. I'll post other patches soon.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On Fri, Mar 22, 2024 at 7:17 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> On Fri, Mar 22, 2024 at 06:02:11PM +0530, Amit Kapila wrote:
> > On Fri, Mar 22, 2024 at 5:30 PM Bertrand Drouvot
> > <bertranddrouvot.pg@gmail.com> wrote:
> > > On Fri, Mar 22, 2024 at 03:56:23PM +0530, Amit Kapila wrote:
> > > > >
> > > > > That would avoid testing twice "slot->data.persistency == RS_PERSISTENT".
> > > > >
> > > >
> > > > That sounds like a good idea. Also, don't we need to consider physical
> > > > slots where we don't reserve WAL during slot creation? I don't think
> > > > there is a need to set inactive_at for such slots.
> > >
> > > If the slot is not active, why shouldn't we set inactive_at? I can understand
> > > that such slots do not present "any risks" but I think we should still set
> > > inactive_at (also to not give the false impression that the slot is active).
> > >
> >
> > But OTOH, there is a chance that we will invalidate such slots even
> > though they have never reserved WAL in the first place which doesn't
> > appear to be a good thing.
>
> That's right, but I don't think that's a problem. I think we should treat
> inactive_at as an independent field (as if the timeout one did not exist at
> all) and just focus on its meaning (the slot being inactive). If one sets a timeout
> (> 0) and gets an invalidation then I think it works as designed (even if the
> slot does not present any "risk" as it does not hold any rows or WAL).
>

Fair point.

--
With Regards,
Amit Kapila.



On Sat, Mar 23, 2024 at 3:02 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Fri, Mar 22, 2024 at 3:15 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> >
> > Worth to add some tests too (or we postpone them in future commits because we're
> > confident enough they will follow soon)?
>
> Yes. Added some tests in a new TAP test file named
> src/test/recovery/t/043_replslot_misc.pl. This new file can be used to
> add miscellaneous replication tests in future as well. I couldn't find
> a better place in existing test files - tried having the new tests for
> physical slots in t/001_stream_rep.pl and I didn't find a right place
> for logical slots.
>

How about adding the test in 019_replslot_limit? It is not a direct
fit but I feel later we can even add 'invalid_timeout' related tests
in this file which will use last_inactive_time feature. It is also
possible that some of the tests added by the 'invalid_timeout' feature
will obviate the need for some of these tests.

Review of v15
==============
1.
@@ -1026,7 +1026,8 @@ CREATE VIEW pg_replication_slots AS
             L.conflicting,
             L.invalidation_reason,
             L.failover,
-            L.synced
+            L.synced,
+            L.last_inactive_time
     FROM pg_get_replication_slots() AS L

As mentioned previously, let's keep these new fields before
conflicting and after two_phase.

2.
+# Get last_inactive_time value after slot's creation. Note that the
slot is still
+# inactive unless it's used by the standby below.
+my $last_inactive_time_1 = $primary->safe_psql('postgres',
+ qq(SELECT last_inactive_time FROM pg_replication_slots WHERE
slot_name = '$sb_slot' AND last_inactive_time IS NOT NULL;)
+);

We should check $last_inactive_time_1 to be a valid value and add a
similar check for logical slots.

3. BTW, why don't we set last_inactive_time for temporary slots
(RS_TEMPORARY) as well? Don't we even invalidate temporary slots? If
so, then I think we should set last_inactive_time for those as well
and later allow them to be invalidated based on timeout parameter.

--
With Regards,
Amit Kapila.



On Sat, Mar 23, 2024 at 11:27 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> How about adding the test in 019_replslot_limit? It is not a direct
> fit but I feel later we can even add 'invalid_timeout' related tests
> in this file which will use last_inactive_time feature.

I'm thinking the other way. Now, the new TAP file 043_replslot_misc.pl
can have last_inactive_time tests, and later invalid_timeout ones too.
This way 019_replslot_limit.pl is not cluttered.

> It is also
> possible that some of the tests added by the 'invalid_timeout' feature
> will obviate the need for some of these tests.

Might be. But I prefer to keep both of these tests separate, yet in the
same file 043_replslot_misc.pl, because we cover some corner cases where
last_inactive_time is set upon loading the slot from disk.

> Review of v15
> ==============
> 1.
> @@ -1026,7 +1026,8 @@ CREATE VIEW pg_replication_slots AS
>              L.conflicting,
>              L.invalidation_reason,
>              L.failover,
> -            L.synced
> +            L.synced,
> +            L.last_inactive_time
>      FROM pg_get_replication_slots() AS L
>
> As mentioned previously, let's keep these new fields before
> conflicting and after two_phase.

Sorry, I missed that comment (out of a flood of comments,
really :)). Now, done that way.

> 2.
> +# Get last_inactive_time value after slot's creation. Note that the
> slot is still
> +# inactive unless it's used by the standby below.
> +my $last_inactive_time_1 = $primary->safe_psql('postgres',
> + qq(SELECT last_inactive_time FROM pg_replication_slots WHERE
> slot_name = '$sb_slot' AND last_inactive_time IS NOT NULL;)
> +);
>
> We should check $last_inactive_time_1 to be a valid value and add a
> similar check for logical slots.

That's taken care of by the type cast we do, right? Isn't that enough?

is( $primary->safe_psql(
        'postgres',
        qq[SELECT last_inactive_time >
'$last_inactive_time'::timestamptz FROM pg_replication_slots WHERE
slot_name = '$sb_slot' AND last_inactive_time IS NOT NULL;]
    ),
    't',
    'last inactive time for an inactive physical slot is updated correctly');

For instance, setting last_inactive_time_1 to an invalid value fails
with the following error:

error running SQL: 'psql:<stdin>:1: ERROR:  invalid input syntax for
type timestamp with time zone: "foo"
LINE 1: SELECT last_inactive_time > 'foo'::timestamptz FROM pg_repli...

> 3. BTW, why don't we set last_inactive_time for temporary slots
> (RS_TEMPORARY) as well? Don't we even invalidate temporary slots? If
> so, then I think we should set last_inactive_time for those as well
> and later allow them to be invalidated based on timeout parameter.

WFM. Done that way.
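
So the release path presumably ends up stamping the time for both
persistent and temporary slots, something like the sketch below (field
name as of v15/v16; not the exact committed hunk):

    TimestampTz now = GetCurrentTimestamp();

    if (slot->data.persistency == RS_PERSISTENT)
    {
        /*
         * Mark persistent slot inactive.  We're not freeing it, just
         * disconnecting, but wake up others that may be waiting for it.
         */
        SpinLockAcquire(&slot->mutex);
        slot->active_pid = 0;
        slot->last_inactive_time = now;
        SpinLockRelease(&slot->mutex);
        ConditionVariableBroadcast(&slot->active_cv);
    }
    else
    {
        /* Temporary slots get their last inactive time recorded too. */
        SpinLockAcquire(&slot->mutex);
        slot->last_inactive_time = now;
        SpinLockRelease(&slot->mutex);
    }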

Please see the attached v16 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
Hi,

On Sat, Mar 23, 2024 at 01:11:50PM +0530, Bharath Rupireddy wrote:
> On Sat, Mar 23, 2024 at 11:27 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > How about adding the test in 019_replslot_limit? It is not a direct
> > fit but I feel later we can even add 'invalid_timeout' related tests
> > in this file which will use last_inactive_time feature.
> 
> I'm thinking the other way. Now, the new TAP file 043_replslot_misc.pl
> can have last_inactive_time tests, and later invalid_timeout ones too.
> This way 019_replslot_limit.pl is not cluttered.

I share the same opinion as Amit: I think 019_replslot_limit would be a better
place, because I see the timeout as another kind of limit.

> 
> > It is also
> > possible that some of the tests added by the 'invalid_timeout' feature
> > will obviate the need for some of these tests.
> 
> Might be. But I prefer to keep both of these tests separate, yet in the
> same file 043_replslot_misc.pl, because we cover some corner cases where
> last_inactive_time is set upon loading the slot from disk.

Right, but I think that this test does not necessarily have to be in the same .pl
as the one testing the timeout. It could be added to one of the existing .pl files,
like 001_stream_rep.pl for example.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Sat, Mar 23, 2024 at 2:34 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> > > How about adding the test in 019_replslot_limit? It is not a direct
> > > fit but I feel later we can even add 'invalid_timeout' related tests
> > > in this file which will use last_inactive_time feature.
> >
> > I'm thinking the other way. Now, the new TAP file 043_replslot_misc.pl
> > can have last_inactive_time tests, and later invalid_timeout ones too.
> > This way 019_replslot_limit.pl is not cluttered.
>
> I share the same opinion as Amit: I think 019_replslot_limit would be a better
> place, because I see the timeout as another kind of limit.

Hm. Done that way.

Please see the attached v17 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On Sat, Mar 23, 2024 at 1:12 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Sat, Mar 23, 2024 at 11:27 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
>
> > 2.
> > +# Get last_inactive_time value after slot's creation. Note that the
> > slot is still
> > +# inactive unless it's used by the standby below.
> > +my $last_inactive_time_1 = $primary->safe_psql('postgres',
> > + qq(SELECT last_inactive_time FROM pg_replication_slots WHERE
> > slot_name = '$sb_slot' AND last_inactive_time IS NOT NULL;)
> > +);
> >
> > We should check $last_inactive_time_1 to be a valid value and add a
> > similar check for logical slots.
>
> That's taken care of by the type cast we do, right? Isn't that enough?
>
> is( $primary->safe_psql(
>         'postgres',
>         qq[SELECT last_inactive_time >
> '$last_inactive_time'::timestamptz FROM pg_replication_slots WHERE
> slot_name = '$sb_slot' AND last_inactive_time IS NOT NULL;]
>     ),
>     't',
>     'last inactive time for an inactive physical slot is updated correctly');
>
> For instance, setting last_inactive_time_1 to an invalid value fails
> with the following error:
>
> error running SQL: 'psql:<stdin>:1: ERROR:  invalid input syntax for
> type timestamp with time zone: "foo"
> LINE 1: SELECT last_inactive_time > 'foo'::timestamptz FROM pg_repli...
>

It would be found at a later point. It would be probably better to
verify immediately after the test that fetches the last_inactive_time
value.

--
With Regards,
Amit Kapila.



On Sun, Mar 24, 2024 at 10:40 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > For instance, setting last_inactive_time_1 to an invalid value fails
> > with the following error:
> >
> > error running SQL: 'psql:<stdin>:1: ERROR:  invalid input syntax for
> > type timestamp with time zone: "foo"
> > LINE 1: SELECT last_inactive_time > 'foo'::timestamptz FROM pg_repli...
> >
>
> It would be found at a later point. It would be probably better to
> verify immediately after the test that fetches the last_inactive_time
> value.

Agree. I've added a few more checks explicitly to verify the
last_inactive_time is sane with the following:

        qq[SELECT '$last_inactive_time'::timestamptz > to_timestamp(0)
AND '$last_inactive_time'::timestamptz >
'$slot_creation_time'::timestamptz;]

I've attached the v18 patch set here. I've also addressed earlier
review comments from Amit, Ajin Cherian. Note that I've added new
invalidation mechanism tests in a separate TAP test file just because
I don't want to clutter or bloat any of the existing files and spread
tests for physical slots and logical slots into separate existing TAP
files.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On Sun, Mar 24, 2024 at 3:05 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Sun, Mar 24, 2024 at 10:40 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > For instance, setting last_inactive_time_1 to an invalid value fails
> > > with the following error:
> > >
> > > error running SQL: 'psql:<stdin>:1: ERROR:  invalid input syntax for
> > > type timestamp with time zone: "foo"
> > > LINE 1: SELECT last_inactive_time > 'foo'::timestamptz FROM pg_repli...
> > >
> >
> > It would be found at a later point. It would be probably better to
> > verify immediately after the test that fetches the last_inactive_time
> > value.
>
> Agree. I've added a few more checks explicitly to verify the
> last_inactive_time is sane with the following:
>
>         qq[SELECT '$last_inactive_time'::timestamptz > to_timestamp(0)
> AND '$last_inactive_time'::timestamptz >
> '$slot_creation_time'::timestamptz;]
>

Such a test looks reasonable, but shall we add an equality to the second
part of the test (like '$last_inactive_time'::timestamptz >=
'$slot_creation_time'::timestamptz;)? This is just to be sure that even if
the test ran fast enough to give the same time, the test shouldn't fail. I
think it won't matter for correctness either.


--
With Regards,
Amit Kapila.



On Mon, Mar 25, 2024 at 9:48 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
>
> Such a test looks reasonable, but shall we add an equality to the second
> part of the test (like '$last_inactive_time'::timestamptz >=
> '$slot_creation_time'::timestamptz;)? This is just to be sure that even if
> the test ran fast enough to give the same time, the test shouldn't fail. I
> think it won't matter for correctness either.
>

Apart from this, I have made minor changes in the comments. See and
let me know what you think of attached.

--
With Regards,
Amit Kapila.

Attachment
On Sun, Mar 24, 2024 at 3:06 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> I've attached the v18 patch set here.

Thanks for the patches. Please find few comments:

patch 001:
--------

1)
slot.h:

+ /* The time at which this slot become inactive */
+ TimestampTz last_inactive_time;

become -->became

---------
patch 002:

2)
slotsync.c:

  ReplicationSlotCreate(remote_slot->name, true, RS_TEMPORARY,
    remote_slot->two_phase,
    remote_slot->failover,
-   true);
+   true, 0);

+ slot->data.inactive_timeout = remote_slot->inactive_timeout;

Is there a reason we are not passing 'remote_slot->inactive_timeout'
to ReplicationSlotCreate() directly?

---------

3)
slotfuncs.c
pg_create_logical_replication_slot():
+ int inactive_timeout = PG_GETARG_INT32(5);

Can we mention here that timeout is in seconds either in comment or
rename variable to inactive_timeout_secs?

Please do this for create_physical_replication_slot(),
create_logical_replication_slot(),
pg_create_physical_replication_slot() as well.

---------
4)
+ int inactive_timeout; /* The amount of time in seconds the slot
+ * is allowed to be inactive. */
 } LogicalSlotInfo;

 Do we need to mention "before getting invalidated" like at other places
(in the last patch)?

----------

 5)
Same at these two places: "before getting invalidated" to be added in
the last patch, otherwise the info is incomplete.

+
+ /* The amount of time in seconds the slot is allowed to be inactive */
+ int inactive_timeout;
 } ReplicationSlotPersistentData;


+ * inactive_timeout: The amount of time in seconds the slot is allowed to be
+ *     inactive.
  */
 void
 ReplicationSlotCreate(const char *name, bool db_specific,
 Same here. "before getting invalidated" ?

--------

Reviewing more..

thanks
Shveta



On Mon, Mar 25, 2024 at 10:33 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Sun, Mar 24, 2024 at 3:06 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > I've attached the v18 patch set here.
>

I have a question. When a subscription is created on an existing slot
with a non-null 'inactive_timeout' set, shouldn't the slot's
'inactive_timeout' be retained even after the subscription creation?

I tried this:

===================
--On publisher, create slot with 120sec inactive_timeout:
SELECT * FROM pg_create_logical_replication_slot('logical_slot1',
'pgoutput', false, true, true, 120);

--On subscriber, create sub using logical_slot1
create subscription mysubnew1_1  connection 'dbname=newdb1
host=localhost user=shveta port=5433' publication mypubnew1_1 WITH
(failover = true, create_slot=false, slot_name='logical_slot1');

--Before creating sub, pg_replication_slots output:
   slot_name   | failover | synced | active | temp | conf |               lat                | inactive_timeout
---------------+----------+--------+--------+------+------+----------------------------------+------------------
 logical_slot1 | t        | f      | f      | f    | f    | 2024-03-25 11:11:55.375736+05:30 |              120

--After creating sub, pg_replication_slots output (inactive_timeout is 0 now):
   slot_name   | failover | synced | active | temp | conf | lat | inactive_timeout
---------------+----------+--------+--------+------+------+-----+------------------
 logical_slot1 | t        | f      | t      | f    | f    |     |                0
===================

In CreateSubscription, we call 'walrcv_alter_slot()' /
'ReplicationSlotAlter()' when create_slot is false. This call ends up
resetting inactive_timeout from 120sec to 0. Is that intentional?

thanks
Shveta



On Mon, Mar 25, 2024 at 10:28 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Mar 25, 2024 at 9:48 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> >
> > Such a test looks reasonable, but shall we add an equality to the second
> > part of the test (like '$last_inactive_time'::timestamptz >=
> > '$slot_creation_time'::timestamptz;)? This is just to be sure that even if
> > the test ran fast enough to give the same time, the test shouldn't fail. I
> > think it won't matter for correctness either.

Agree. I added that in the v19 patch. I had the same concern in mind.
That's the reason I wasn't capturing current_timestamp with something
like the query below, worrying that it might be the same (or nearly the
same) as the slot creation time. That's why I ended up capturing
current_timestamp in a separate query rather than clubbing it with
pg_create_physical_replication_slot.

SELECT current_timestamp FROM pg_create_physical_replication_slot('foo');

> Apart from this, I have made minor changes in the comments. See and
> let me know what you think of attached.

LGTM. I've merged the diff into v19 patch.

Please find the attached v19 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On Mon, Mar 25, 2024 at 11:53 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Mon, Mar 25, 2024 at 10:33 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Sun, Mar 24, 2024 at 3:06 PM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >
> > > I've attached the v18 patch set here.

I have one concern, for synced slots on standby, how do we disallow
invalidation due to inactive-timeout immediately after promotion?

For synced slots, last_inactive_time and inactive_timeout are both
set. Let's say I bring down primary for promotion of standby and then
promote standby, there are chances that it may end up invalidating
synced slots (considering standby is not brought down during promotion
and thus inactive_timeout may already be past 'last_inactive_time'). I
tried with smaller unit of inactive_timeout:

--Shutdown primary to prepare for planned promotion.

--On standby, one synced slot with last_inactive_time (lat) as 12:21
   slot_name   | failover | synced | active | temp | conf | res |               lat                | inactive_timeout
---------------+----------+--------+--------+------+------+-----+----------------------------------+------------------
 logical_slot1 | t        | t      | f      | f    | f    |     | 2024-03-25 12:21:09.020757+05:30 |               60

--wait for some time, now the time is 12:24
postgres=# select now();
               now
----------------------------------
 2024-03-25 12:24:17.616716+05:30

-- promote immediately:
./pg_ctl -D ../../standbydb/ promote -w

--on promoted standby:
postgres=# select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 f

--synced slot is invalidated immediately on promotion.
   slot_name   | failover | synced | active | temp | conf |       res        |               lat                | inactive_timeout
---------------+----------+--------+--------+------+------+------------------+----------------------------------+------------------
 logical_slot1 | t        | t      | f      | f    | f    | inactive_timeout | 2024-03-25 12:21:09.020757+05:30 |


thanks
Shveta



On Mon, Mar 25, 2024 at 12:43 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Mon, Mar 25, 2024 at 11:53 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Mon, Mar 25, 2024 at 10:33 AM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > On Sun, Mar 24, 2024 at 3:06 PM Bharath Rupireddy
> > > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > > >
> > > > I've attached the v18 patch set here.
>
> I have one concern, for synced slots on standby, how do we disallow
> invalidation due to inactive-timeout immediately after promotion?
>
> For synced slots, last_inactive_time and inactive_timeout are both
> set. Let's say I bring down primary for promotion of standby and then
> promote standby, there are chances that it may end up invalidating
> synced slots (considering standby is not brought down during promotion
> and thus inactive_timeout may already be past 'last_inactive_time').
>

This raises the question of whether we need to set
'last_inactive_time' synced slots on the standby?

--
With Regards,
Amit Kapila.



Hi,

On Mon, Mar 25, 2024 at 12:25:21PM +0530, Bharath Rupireddy wrote:
> On Mon, Mar 25, 2024 at 10:28 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Mar 25, 2024 at 9:48 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > >
> > > Such a test looks reasonable, but shall we add an equality to the second
> > > part of the test (like '$last_inactive_time'::timestamptz >=
> > > '$slot_creation_time'::timestamptz;)? This is just to be sure that even if
> > > the test ran fast enough to give the same time, the test shouldn't fail. I
> > > think it won't matter for correctness either.
 
> 
> Agree. I added that in the v19 patch. I had the same concern in mind.
> That's the reason I wasn't capturing current_timestamp with something
> like the query below, worrying that it might be the same (or nearly the
> same) as the slot creation time. That's why I ended up capturing
> current_timestamp in a separate query rather than clubbing it with
> pg_create_physical_replication_slot.
> 
> SELECT current_timestamp FROM pg_create_physical_replication_slot('foo');
> 
> > Apart from this, I have made minor changes in the comments. See and
> > let me know what you think of attached.
> 

Thanks!

v19-0001 LGTM, just one Nit comment for 019_replslot_limit.pl:

The code for "Get last_inactive_time value after the slot's creation" and 
"Check that the captured time is sane" is somehow duplicated: is it worth creating
2 functions?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Hi,

On Mon, Mar 25, 2024 at 12:59:52PM +0530, Amit Kapila wrote:
> On Mon, Mar 25, 2024 at 12:43 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Mon, Mar 25, 2024 at 11:53 AM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > On Mon, Mar 25, 2024 at 10:33 AM shveta malik <shveta.malik@gmail.com> wrote:
> > > >
> > > > On Sun, Mar 24, 2024 at 3:06 PM Bharath Rupireddy
> > > > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > > > >
> > > > > I've attached the v18 patch set here.
> >
> > I have one concern, for synced slots on standby, how do we disallow
> > invalidation due to inactive-timeout immediately after promotion?
> >
> > For synced slots, last_inactive_time and inactive_timeout are both
> > set.

Yeah, and I can see last_inactive_time is moving on the standby (while not the
case on the primary), probably due to the sync worker slot acquisition/release
which does not seem right.

> Let's say I bring down primary for promotion of standby and then
> > promote standby, there are chances that it may end up invalidating
> > synced slots (considering standby is not brought down during promotion
> > and thus inactive_timeout may already be past 'last_inactive_time').
> >
> 
> This raises the question of whether we need to set
> 'last_inactive_time' synced slots on the standby?

Yeah, I think that last_inactive_time should stay at 0 on synced slots on the
standby because such slots are not usable anyway (until the standby gets promoted).

So, I think that last_inactive_time does not make sense if the slot never had
the chance to be active.

OTOH I think the timeout invalidation (if any) should be synced from primary.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Mon, Mar 25, 2024 at 1:37 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Mon, Mar 25, 2024 at 12:59:52PM +0530, Amit Kapila wrote:
> > On Mon, Mar 25, 2024 at 12:43 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > On Mon, Mar 25, 2024 at 11:53 AM shveta malik <shveta.malik@gmail.com> wrote:
> > > >
> > > > On Mon, Mar 25, 2024 at 10:33 AM shveta malik <shveta.malik@gmail.com> wrote:
> > > > >
> > > > > On Sun, Mar 24, 2024 at 3:06 PM Bharath Rupireddy
> > > > > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > > > > >
> > > > > > I've attached the v18 patch set here.
> > >
> > > I have one concern, for synced slots on standby, how do we disallow
> > > invalidation due to inactive-timeout immediately after promotion?
> > >
> > > For synced slots, last_inactive_time and inactive_timeout are both
> > > set.
>
> Yeah, and I can see last_inactive_time is moving on the standby (while not the
> case on the primary), probably due to the sync worker slot acquisition/release
> which does not seem right.
>
> > Let's say I bring down primary for promotion of standby and then
> > > promote standby, there are chances that it may end up invalidating
> > > synced slots (considering standby is not brought down during promotion
> > > and thus inactive_timeout may already be past 'last_inactive_time').
> > >
> >
> > This raises the question of whether we need to set
> > 'last_inactive_time' synced slots on the standby?
>
> Yeah, I think that last_inactive_time should stay at 0 on synced slots on the
> standby because such slots are not usable anyway (until the standby gets promoted).
>
> So, I think that last_inactive_time does not make sense if the slot never had
> the chance to be active.
>
> OTOH I think the timeout invalidation (if any) should be synced from primary.

Yes, even I feel that last_inactive_time makes sense only when the
slot is available to be used. Synced slots are not available to be
used until the standby is promoted, and thus setting last_inactive_time
can be skipped for synced slots. But once the slot on the primary is
invalidated due to inactive-timeout, that invalidation should be synced
to the standby (which is happening currently).

thanks
Shveta



Hi,

On Mon, Mar 25, 2024 at 02:07:21PM +0530, shveta malik wrote:
> On Mon, Mar 25, 2024 at 1:37 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > Hi,
> >
> > On Mon, Mar 25, 2024 at 12:59:52PM +0530, Amit Kapila wrote:
> > > On Mon, Mar 25, 2024 at 12:43 PM shveta malik <shveta.malik@gmail.com> wrote:
> > > >
> > > > On Mon, Mar 25, 2024 at 11:53 AM shveta malik <shveta.malik@gmail.com> wrote:
> > > > >
> > > > > On Mon, Mar 25, 2024 at 10:33 AM shveta malik <shveta.malik@gmail.com> wrote:
> > > > > >
> > > > > > On Sun, Mar 24, 2024 at 3:06 PM Bharath Rupireddy
> > > > > > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > > > > > >
> > > > > > > I've attached the v18 patch set here.
> > > >
> > > > I have one concern, for synced slots on standby, how do we disallow
> > > > invalidation due to inactive-timeout immediately after promotion?
> > > >
> > > > For synced slots, last_inactive_time and inactive_timeout are both
> > > > set.
> >
> > Yeah, and I can see last_inactive_time is moving on the standby (while not the
> > case on the primary), probably due to the sync worker slot acquisition/release
> > which does not seem right.
> >
> > > Let's say I bring down primary for promotion of standby and then
> > > > promote standby, there are chances that it may end up invalidating
> > > > synced slots (considering standby is not brought down during promotion
> > > > and thus inactive_timeout may already be past 'last_inactive_time').
> > > >
> > >
> > > This raises the question of whether we need to set
> > > 'last_inactive_time' synced slots on the standby?
> >
> > Yeah, I think that last_inactive_time should stay at 0 on synced slots on the
> > standby because such slots are not usable anyway (until the standby gets promoted).
> >
> > So, I think that last_inactive_time does not make sense if the slot never had
> > the chance to be active.
> >
> > OTOH I think the timeout invalidation (if any) should be synced from primary.
> 
> Yes, even I feel that last_inactive_time makes sense only when the
> slot is available to be used. Synced slots are not available to be
> used until the standby is promoted, and thus setting last_inactive_time
> can be skipped for synced slots. But once the slot on the primary is
> invalidated due to inactive-timeout, that invalidation should be synced
> to the standby (which is happening currently).
> 

yeah, syncing the invalidation and always keeping last_inactive_time to zero 
for synced slots looks right to me.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Mon, Mar 25, 2024 at 1:37 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> Yeah, and I can see last_inactive_time is moving on the standby (while not the
> case on the primary), probably due to the sync worker slot acquisition/release
> which does not seem right.
>

Yes, you are right, last_inactive_time keeps on moving for synced
slots on standby.  Once I disabled slot-sync worker, then it is
constant. Then it only changes if I call pg_sync_replication_slots().

On a  different note, I noticed that we allow altering
inactive_timeout for synced-slots on standby. And again overwrite it
with the primary's value in the next sync cycle. Steps:

====================
--Check pg_replication_slots for synced slot on standby, inactive_timeout is 120
   slot_name   | failover | synced | active | inactive_timeout
---------------+----------+--------+--------+------------------
 logical_slot1 | t        | t      | f      |              120

--Alter on standby
SELECT 'alter' FROM pg_alter_replication_slot('logical_slot1', 900);

--Check pg_replication_slots:
   slot_name   | failover | synced | active | inactive_timeout
---------------+----------+--------+--------+------------------
 logical_slot1 | t        | t      | f      |              900

--Run sync function
SELECT pg_sync_replication_slots();

--check again, inactive_timeout is set back to primary's value.
   slot_name   | failover | synced | active | inactive_timeout
---------------+----------+--------+--------+------------------
 logical_slot1 | t        | t      | f      |              120

 ====================

I feel altering a synced slot's inactive_timeout should be prohibited on
the standby. It should always be in sync with the primary. Thoughts?

I am listing the concerns I have raised so far:
1) create-subscription with create_slot=false overwriting the
inactive_timeout of an existing slot ([1])
2) last_inactive_time set for synced slots may result in invalidation
of slots on promotion ([2])
3) alter replication slot to alter inactive_timeout for synced slots on
standby - should this be allowed?

[1]: https://www.postgresql.org/message-id/CAJpy0uAqBi%2BGbNn2ngJ-A_Z905CD3ss896bqY2ACUjGiF1Gkng%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/CAJpy0uCLu%2BmqAwAMum%3DpXE9YYsy0BE7hOSw_Wno5vjwpFY%3D63g%40mail.gmail.com

thanks
Shveta



Hi,

On Mon, Mar 25, 2024 at 02:39:50PM +0530, shveta malik wrote:
> I am listing the concerns I have raised so far:
> 3) alter replication slot to alter inactive_timeout for synced slots on
> standby - should this be allowed?

I don't think it should be allowed.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Mon, Mar 25, 2024 at 1:37 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> > > I have one concern, for synced slots on standby, how do we disallow
> > > invalidation due to inactive-timeout immediately after promotion?
> > >
> > > For synced slots, last_inactive_time and inactive_timeout are both
> > > set.
>
> Yeah, and I can see last_inactive_time is moving on the standby (while not the
> case on the primary), probably due to the sync worker slot acquisition/release
> which does not seem right.
>
> > Let's say I bring down primary for promotion of standby and then
> > > promote standby, there are chances that it may end up invalidating
> > > synced slots (considering standby is not brought down during promotion
> > > and thus inactive_timeout may already be past 'last_inactive_time').
> > >
> >
> > This raises the question of whether we need to set
> > 'last_inactive_time' synced slots on the standby?
>
> Yeah, I think that last_inactive_time should stay at 0 on synced slots on the
> standby because such slots are not usable anyway (until the standby gets promoted).
>
> So, I think that last_inactive_time does not make sense if the slot never had
> the chance to be active.

Right. Done that way i.e. not setting the last_inactive_time for slots
both while releasing the slot and restoring from the disk.

Also, I've added a TAP function to check if the captured times are
sane per Bertrand's review comment.

Please see the attached v20 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On Mon, Mar 25, 2024 at 3:31 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Right. Done that way i.e. not setting the last_inactive_time for slots
> both while releasing the slot and restoring from the disk.
>
> Also, I've added a TAP function to check if the captured times are
> sane per Bertrand's review comment.
>
> Please see the attached v20 patch.

Thanks for the patch. The issue of unnecessary invalidation of synced
slots on promotion is resolved in this patch.

thanks
Shveta



On Mon, Mar 25, 2024 at 3:31 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Right. Done that way i.e. not setting the last_inactive_time for slots
> both while releasing the slot and restoring from the disk.
>
> Also, I've added a TAP function to check if the captured times are
> sane per Bertrand's review comment.
>
> Please see the attached v20 patch.
>

Pushed, after minor changes.

--
With Regards,
Amit Kapila.



On Mon, Mar 25, 2024 at 2:40 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Mon, Mar 25, 2024 at 1:37 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > Hi,
> >
> > Yeah, and I can see last_inactive_time is moving on the standby (while not the
> > case on the primary), probably due to the sync worker slot acquisition/release
> > which does not seem right.
> >
>
> Yes, you are right, last_inactive_time keeps on moving for synced
> slots on standby.  Once I disabled slot-sync worker, then it is
> constant. Then it only changes if I call pg_sync_replication_slots().
>
> On a  different note, I noticed that we allow altering
> inactive_timeout for synced-slots on standby. And again overwrite it
> with the primary's value in the next sync cycle. Steps:
>
> ====================
> --Check pg_replication_slots for synced slot on standby, inactive_timeout is 120
>    slot_name   | failover | synced | active | inactive_timeout
> ---------------+----------+--------+--------+------------------
>  logical_slot1 | t        | t      | f      |              120
>
> --Alter on standby
> SELECT 'alter' FROM pg_alter_replication_slot('logical_slot1', 900);
>

I think we should keep pg_alter_replication_slot() as the last
priority among the remaining patches for this release. Let's try to
first finish the primary functionality of inactive_timeout patch.
Otherwise, I agree that the problem reported by you should be fixed.

--
With Regards,
Amit Kapila.



On Mon, Mar 25, 2024 at 5:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> I think we should keep pg_alter_replication_slot() as the last
> priority among the remaining patches for this release. Let's try to
> first finish the primary functionality of inactive_timeout patch.
> Otherwise, I agree that the problem reported by you should be fixed.

Noted. Will focus on v18-002 patch now.

I was debugging the flow and just noticed that RecoveryInProgress()
always returns 'true' during
StartupReplicationSlots()-->RestoreSlotFromDisk() (even on primary) as
'xlogctl->SharedRecoveryState' is always 'RECOVERY_STATE_CRASH' at
that time. The 'xlogctl->SharedRecoveryState' is changed  to
'RECOVERY_STATE_DONE' on primary and to 'RECOVERY_STATE_ARCHIVE' on
standby at a later stage in StartupXLOG() (after we are done loading
slots).

The impact of this is, the condition in RestoreSlotFromDisk() in v20-001:

if (!(RecoveryInProgress() && slot->data.synced))
     slot->last_inactive_time = GetCurrentTimestamp();

is merely equivalent to:

if (!slot->data.synced)
    slot->last_inactive_time = GetCurrentTimestamp();

Thus on primary, after restart, last_inactive_at is set correctly,
while on promoted standby (new primary), last_inactive_at is always
NULL after restart for the synced slots.

thanks
Shveta



I apologize that I haven't been able to keep up with this thread for a
while, but I'm happy to see the continued interest in $SUBJECT.

On Sun, Mar 24, 2024 at 03:05:44PM +0530, Bharath Rupireddy wrote:
> This commit particularly lets one specify the inactive_timeout for
> a slot via SQL functions pg_create_physical_replication_slot and
> pg_create_logical_replication_slot.

Off-list, Bharath brought to my attention that the current proposal was to
set the timeout at the slot level.  While I think that is an entirely
reasonable thing to support, the main use-case I have in mind for this
feature is for an administrator that wants to prevent inactive slots from
causing problems (e.g., transaction ID wraparound) on a server or a number
of servers.  For that use-case, I think a GUC would be much more
convenient.  Perhaps there could be a default inactive slot timeout GUC
that would be used in the absence of a slot-level setting.  Thoughts?
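
For illustration only, the fallback could be as simple as something like
this (the GUC and helper names here are made up, not from any posted patch):

    /* GUC: server-wide default, in seconds; 0 disables it. */
    int replication_slot_inactive_timeout = 0;

    /*
     * The slot-level setting wins if present, otherwise fall back to the
     * server-wide default.
     */
    static int
    GetEffectiveInactiveTimeout(ReplicationSlot *slot)
    {
        if (slot->data.inactive_timeout > 0)
            return slot->data.inactive_timeout;

        return replication_slot_inactive_timeout;
    }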

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



On Mon, Mar 25, 2024 at 12:43 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> I have one concern, for synced slots on standby, how do we disallow
> invalidation due to inactive-timeout immediately after promotion?
>
> For synced slots, last_inactive_time and inactive_timeout are both
> set. Let's say I bring down primary for promotion of standby and then
> promote standby, there are chances that it may end up invalidating
> synced slots (considering standby is not brought down during promotion
> and thus inactive_timeout may already be past 'last_inactive_time').
>

On standby, if we decide to maintain valid last_inactive_time for
synced slots, then invalidation is correctly restricted in
InvalidateSlotForInactiveTimeout() for synced slots using the check:

        if (RecoveryInProgress() && slot->data.synced)
                return false;

But immediately after promotion, we can not rely on the above check
and thus possibility of synced slots invalidation is there. To
maintain consistent behavior regarding the setting of
last_inactive_time for synced slots, similar to user slots, one
potential solution to prevent this invalidation issue is to update the
last_inactive_time of all synced slots within the ShutDownSlotSync()
function during FinishWalRecovery(). This approach ensures that
promotion doesn't immediately invalidate slots, and henceforth, we
possess a correct last_inactive_time as a basis for invalidation going
forward. This will be equivalent to updating last_inactive_time during
restart (but without actual restart during promotion).
The plus point of maintaining last_inactive_time for synced slots
could be, this can provide data to the user on when last time the sync
was attempted on that particular slot by background slot sync worker
or SQl function. Thoughts?
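
For illustration, the update in ShutDownSlotSync() could look roughly
like this (the loop shape and locking are assumptions, not the exact patch):

    TimestampTz now = GetCurrentTimestamp();

    LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);

    for (int i = 0; i < max_replication_slots; i++)
    {
        ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];

        /* Only synced slots need a fresh base for the timeout check. */
        if (!s->in_use || !s->data.synced)
            continue;

        SpinLockAcquire(&s->mutex);
        s->last_inactive_time = now;
        SpinLockRelease(&s->mutex);
    }

    LWLockRelease(ReplicationSlotControlLock);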

thanks
Shveta



On Tue, Mar 26, 2024 at 1:24 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
>
> On Sun, Mar 24, 2024 at 03:05:44PM +0530, Bharath Rupireddy wrote:
> > This commit particularly lets one specify the inactive_timeout for
> > a slot via SQL functions pg_create_physical_replication_slot and
> > pg_create_logical_replication_slot.
>
> Off-list, Bharath brought to my attention that the current proposal was to
> set the timeout at the slot level.  While I think that is an entirely
> reasonable thing to support, the main use-case I have in mind for this
> feature is for an administrator that wants to prevent inactive slots from
> causing problems (e.g., transaction ID wraparound) on a server or a number
> of servers.  For that use-case, I think a GUC would be much more
> convenient.  Perhaps there could be a default inactive slot timeout GUC
> that would be used in the absence of a slot-level setting.  Thoughts?
>

Yeah, that is a valid point. One of the reasons for keeping it at slot
level was to allow different subscribers/output plugins to have a
different setting for invalid_timeout for their respective slots based
on their usage. Now, having it as a GUC also has some valid use cases
as pointed out by you but I am not sure having both at slot level and
at GUC level is required. I was a bit inclined to have it at slot
level for now and then based on some field usage report we can later
add GUC as well.

--
With Regards,
Amit Kapila.



On Tue, Mar 26, 2024 at 9:30 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Mon, Mar 25, 2024 at 12:43 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > I have one concern, for synced slots on standby, how do we disallow
> > invalidation due to inactive-timeout immediately after promotion?
> >
> > For synced slots, last_inactive_time and inactive_timeout are both
> > set. Let's say I bring down primary for promotion of standby and then
> > promote standby, there are chances that it may end up invalidating
> > synced slots (considering standby is not brought down during promotion
> > and thus inactive_timeout may already be past 'last_inactive_time').
> >
>
> On standby, if we decide to maintain valid last_inactive_time for
> synced slots, then invalidation is correctly restricted in
> InvalidateSlotForInactiveTimeout() for synced slots using the check:
>
>         if (RecoveryInProgress() && slot->data.synced)
>                 return false;
>
> But immediately after promotion, we can not rely on the above check
> and thus possibility of synced slots invalidation is there. To
> maintain consistent behavior regarding the setting of
> last_inactive_time for synced slots, similar to user slots, one
> potential solution to prevent this invalidation issue is to update the
> last_inactive_time of all synced slots within the ShutDownSlotSync()
> function during FinishWalRecovery(). This approach ensures that
> promotion doesn't immediately invalidate slots, and henceforth, we
> possess a correct last_inactive_time as a basis for invalidation going
> forward. This will be equivalent to updating last_inactive_time during
> restart (but without actual restart during promotion).
> The plus point of maintaining last_inactive_time for synced slots
> could be, this can provide data to the user on when last time the sync
> was attempted on that particular slot by background slot sync worker
> or SQl function. Thoughts?

Please find the attached v21 patch implementing the above idea. It
also has changes for renaming last_inactive_time to inactive_since.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
Hi,

On Tue, Mar 26, 2024 at 09:30:32AM +0530, shveta malik wrote:
> On Mon, Mar 25, 2024 at 12:43 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > I have one concern, for synced slots on standby, how do we disallow
> > invalidation due to inactive-timeout immediately after promotion?
> >
> > For synced slots, last_inactive_time and inactive_timeout are both
> > set. Let's say I bring down primary for promotion of standby and then
> > promote standby, there are chances that it may end up invalidating
> > synced slots (considering standby is not brought down during promotion
> > and thus inactive_timeout may already be past 'last_inactive_time').
> >
> 
> On standby, if we decide to maintain valid last_inactive_time for
> synced slots, then invalidation is correctly restricted in
> InvalidateSlotForInactiveTimeout() for synced slots using the check:
> 
>         if (RecoveryInProgress() && slot->data.synced)
>                 return false;

Right.

> But immediately after promotion, we can not rely on the above check
> and thus possibility of synced slots invalidation is there. To
> maintain consistent behavior regarding the setting of
> last_inactive_time for synced slots, similar to user slots, one
> potential solution to prevent this invalidation issue is to update the
> last_inactive_time of all synced slots within the ShutDownSlotSync()
> function during FinishWalRecovery(). This approach ensures that
> promotion doesn't immediately invalidate slots, and henceforth, we
> possess a correct last_inactive_time as a basis for invalidation going
> forward. This will be equivalent to updating last_inactive_time during
> restart (but without actual restart during promotion).
> The plus point of maintaining last_inactive_time for synced slots
> could be, this can provide data to the user on when last time the sync
> was attempted on that particular slot by background slot sync worker
> or SQl function. Thoughts?

Yeah, another plus point is that if the primary is down then one could look
at the synced "inactive_since" on the standby to get an idea of it (depends on
the last sync though).

The issue that I can see with your proposal is: what if one synced the slots
manually (with pg_sync_replication_slots()) but does not use the sync worker?
Then I think ShutDownSlotSync() is not going to help in that case.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Sun, Mar 24, 2024 at 3:05 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> I've attached the v18 patch set here. I've also addressed earlier
> review comments from Amit, Ajin Cherian. Note that I've added new
> invalidation mechanism tests in a separate TAP test file just because
> I don't want to clutter or bloat any of the existing files and spread
> tests for physical slots and logical slots into separate existing TAP
> files.
>

Review comments on v18_0002 and v18_0005
=======================================
1.
 ReplicationSlotCreate(const char *name, bool db_specific,
    ReplicationSlotPersistency persistency,
-   bool two_phase, bool failover, bool synced)
+   bool two_phase, bool failover, bool synced,
+   int inactive_timeout)
 {
  ReplicationSlot *slot = NULL;
  int i;
@@ -345,6 +348,18 @@ ReplicationSlotCreate(const char *name, bool db_specific,
  errmsg("cannot enable failover for a temporary replication slot"));
  }

+ if (inactive_timeout > 0)
+ {
+ /*
+ * Do not allow users to set inactive_timeout for temporary slots,
+ * because temporary slots will not be saved to the disk.
+ */
+ if (persistency == RS_TEMPORARY)
+ ereport(ERROR,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot set inactive_timeout for a temporary replication slot"));
+ }

We have decided to update inactive_since for temporary slots. So,
unless there is some reason, we should allow inactive_timeout to also
be set for temporary slots.

2.
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1024,6 +1024,7 @@ CREATE VIEW pg_replication_slots AS
             L.safe_wal_size,
             L.two_phase,
             L.last_inactive_time,
+            L.inactive_timeout,

Shall we keep inactive_timeout before
last_inactive_time/inactive_since? I don't have any strong reason to
propose it that way, apart from the fact that the former is provided by the
user.

3.
@@ -287,6 +288,13 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
  slot_contents = *slot;
  SpinLockRelease(&slot->mutex);

+ /*
+ * Here's an opportunity to invalidate inactive replication slots
+ * based on timeout, so let's do it.
+ */
+ if (InvalidateReplicationSlotForInactiveTimeout(slot, false, true, true))
+ invalidated = true;

I don't think we should try to invalidate the slots in
pg_get_replication_slots. This function's purpose is to get the
current information on slots; it is not intended to perform any work
on slots. Any error due to invalidation won't be what the user would
be expecting here.

4.
+static bool
+InvalidateSlotForInactiveTimeout(ReplicationSlot *slot,
+ bool need_control_lock,
+ bool need_mutex)
{
...
...
+ if (need_control_lock)
+ LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+
+ Assert(LWLockHeldByMeInMode(ReplicationSlotControlLock, LW_SHARED));
+
+ /*
+ * Check if the slot needs to be invalidated due to inactive_timeout. We
+ * do this with the spinlock held to avoid race conditions -- for example
+ * the restart_lsn could move forward, or the slot could be dropped.
+ */
+ if (need_mutex)
+ SpinLockAcquire(&slot->mutex);
...

I find this combination of parameters a bit strange, because if, say,
need_mutex is false and need_control_lock is true, then that means this
function will acquire the LWLock after the spinlock has already been
acquired, which is unacceptable. Now, this may not happen in practice as
the callers won't pass such a combination, but still, this functionality
should be improved.

--
With Regards,
Amit Kapila.



Hi,

On Tue, Mar 26, 2024 at 05:55:11AM +0000, Bertrand Drouvot wrote:
> Hi,
> 
> On Tue, Mar 26, 2024 at 09:30:32AM +0530, shveta malik wrote:
> > On Mon, Mar 25, 2024 at 12:43 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > I have one concern, for synced slots on standby, how do we disallow
> > > invalidation due to inactive-timeout immediately after promotion?
> > >
> > > For synced slots, last_inactive_time and inactive_timeout are both
> > > set. Let's say I bring down primary for promotion of standby and then
> > > promote standby, there are chances that it may end up invalidating
> > > synced slots (considering standby is not brought down during promotion
> > > and thus inactive_timeout may already be past 'last_inactive_time').
> > >
> > 
> > On standby, if we decide to maintain valid last_inactive_time for
> > synced slots, then invalidation is correctly restricted in
> > InvalidateSlotForInactiveTimeout() for synced slots using the check:
> > 
> >         if (RecoveryInProgress() && slot->data.synced)
> >                 return false;
> 
> Right.
> 
> > But immediately after promotion, we can not rely on the above check
> > and thus possibility of synced slots invalidation is there. To
> > maintain consistent behavior regarding the setting of
> > last_inactive_time for synced slots, similar to user slots, one
> > potential solution to prevent this invalidation issue is to update the
> > last_inactive_time of all synced slots within the ShutDownSlotSync()
> > function during FinishWalRecovery(). This approach ensures that
> > promotion doesn't immediately invalidate slots, and henceforth, we
> > possess a correct last_inactive_time as a basis for invalidation going
> > forward. This will be equivalent to updating last_inactive_time during
> > restart (but without actual restart during promotion).
> > The plus point of maintaining last_inactive_time for synced slots
> > could be, this can provide data to the user on when last time the sync
> > was attempted on that particular slot by background slot sync worker
> > or SQl function. Thoughts?
> 
> Yeah, another plus point is that if the primary is down then one could look
> at the synced "active_since" on the standby to get an idea of it (depends of the
> last sync though).
> 
> The issue that I can see with your proposal is: what if one synced the slots
> manually (with pg_sync_replication_slots()) but does not use the sync worker?
> Then I think ShutDownSlotSync() is not going to help in that case.

It looks like ShutDownSlotSync() is always called (even if sync_replication_slots = off),
so that sounds ok to me (I should have checked the code, I was under the impression
ShutDownSlotSync() was not called if sync_replication_slots = off).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Tue, Mar 26, 2024 at 11:36 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
> >
> > The issue that I can see with your proposal is: what if one synced the slots
> > manually (with pg_sync_replication_slots()) but does not use the sync worker?
> > Then I think ShutDownSlotSync() is not going to help in that case.
>
> It looks like ShutDownSlotSync() is always called (even if sync_replication_slots = off),
> so that sounds ok to me (I should have checked the code, I was under the impression
> ShutDownSlotSync() was not called if sync_replication_slots = off).

Right, it is called irrespective of sync_replication_slots.

thanks
Shveta



On Tue, Mar 26, 2024 at 11:08 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Tue, Mar 26, 2024 at 9:30 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Mon, Mar 25, 2024 at 12:43 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > I have one concern, for synced slots on standby, how do we disallow
> > > invalidation due to inactive-timeout immediately after promotion?
> > >
> > > For synced slots, last_inactive_time and inactive_timeout are both
> > > set. Let's say I bring down primary for promotion of standby and then
> > > promote standby, there are chances that it may end up invalidating
> > > synced slots (considering standby is not brought down during promotion
> > > and thus inactive_timeout may already be past 'last_inactive_time').
> > >
> >
> > On standby, if we decide to maintain valid last_inactive_time for
> > synced slots, then invalidation is correctly restricted in
> > InvalidateSlotForInactiveTimeout() for synced slots using the check:
> >
> >         if (RecoveryInProgress() && slot->data.synced)
> >                 return false;
> >
> > But immediately after promotion, we can not rely on the above check
> > and thus possibility of synced slots invalidation is there. To
> > maintain consistent behavior regarding the setting of
> > last_inactive_time for synced slots, similar to user slots, one
> > potential solution to prevent this invalidation issue is to update the
> > last_inactive_time of all synced slots within the ShutDownSlotSync()
> > function during FinishWalRecovery(). This approach ensures that
> > promotion doesn't immediately invalidate slots, and henceforth, we
> > possess a correct last_inactive_time as a basis for invalidation going
> > forward. This will be equivalent to updating last_inactive_time during
> > restart (but without actual restart during promotion).
> > The plus point of maintaining last_inactive_time for synced slots
> > could be, this can provide data to the user on when last time the sync
> > was attempted on that particular slot by background slot sync worker
> > or SQl function. Thoughts?
>
> Please find the attached v21 patch implementing the above idea. It
> also has changes for renaming last_inactive_time to inactive_since.
>

Thanks for the patch. I have tested this patch alone, and it does what
it says. One additional thing I noticed is that it now sets
inactive_since for temp slots as well, but that idea looks fine to me.

I could not test the 'invalidation on promotion' bug with this change, as
that needed rebasing of the rest of the patches.

Few trivial things:

1)
Commit msg:

ensures the value is set to current timestamp during the
shutdown to help correctly interpret the time if the standby gets
promoted without a restart.

shutdown --> shutdown of slot sync worker   (as it was not clear if it
is instance shutdown or something else)

2)
'The time since the slot has became inactive'.

has became-->has become
or just became

Please check it in all the files. There are multiple places.

thanks
Shveta



Hi,

On Tue, Mar 26, 2024 at 11:07:51AM +0530, Bharath Rupireddy wrote:
> On Tue, Mar 26, 2024 at 9:30 AM shveta malik <shveta.malik@gmail.com> wrote:
> > But immediately after promotion, we can not rely on the above check
> > and thus possibility of synced slots invalidation is there. To
> > maintain consistent behavior regarding the setting of
> > last_inactive_time for synced slots, similar to user slots, one
> > potential solution to prevent this invalidation issue is to update the
> > last_inactive_time of all synced slots within the ShutDownSlotSync()
> > function during FinishWalRecovery(). This approach ensures that
> > promotion doesn't immediately invalidate slots, and henceforth, we
> > possess a correct last_inactive_time as a basis for invalidation going
> > forward. This will be equivalent to updating last_inactive_time during
> > restart (but without actual restart during promotion).
> > The plus point of maintaining last_inactive_time for synced slots
> > could be, this can provide data to the user on when last time the sync
> > was attempted on that particular slot by background slot sync worker
> > or SQl function. Thoughts?
> 
> Please find the attached v21 patch implementing the above idea. It
> also has changes for renaming last_inactive_time to inactive_since.

Thanks!

A few comments:

1 ===

One trailing whitespace:

Applying: Fix review comments for slot's last_inactive_time property
.git/rebase-apply/patch:433: trailing whitespace.
# got a valid inactive_since value representing the last slot sync time.
warning: 1 line adds whitespace errors.

2 ===

It looks like inactive_since is set to the current timestamp on the standby
each time the sync worker does a cycle:

primary:

postgres=# select slot_name,inactive_since from pg_replication_slots where failover = 't';
  slot_name  |        inactive_since
-------------+-------------------------------
 lsub27_slot | 2024-03-26 07:39:19.745517+00
 lsub28_slot | 2024-03-26 07:40:24.953826+00

standby:

postgres=# select slot_name,inactive_since from pg_replication_slots where failover = 't';
  slot_name  |        inactive_since
-------------+-------------------------------
 lsub27_slot | 2024-03-26 07:43:56.387324+00
 lsub28_slot | 2024-03-26 07:43:56.387338+00

I don't think that should be the case.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Tue, Mar 26, 2024 at 1:15 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> 2 ===
>
> It looks like inactive_since is set to the current timestamp on the standby
> each time the sync worker does a cycle:
>
> primary:
>
> postgres=# select slot_name,inactive_since from pg_replication_slots where failover = 't';
>   slot_name  |        inactive_since
> -------------+-------------------------------
>  lsub27_slot | 2024-03-26 07:39:19.745517+00
>  lsub28_slot | 2024-03-26 07:40:24.953826+00
>
> standby:
>
> postgres=# select slot_name,inactive_since from pg_replication_slots where failover = 't';
>   slot_name  |        inactive_since
> -------------+-------------------------------
>  lsub27_slot | 2024-03-26 07:43:56.387324+00
>  lsub28_slot | 2024-03-26 07:43:56.387338+00
>
> I don't think that should be the case.
>

But why? This is exactly what we discussed in another thread where we
agreed to update inactive_since even for sync slots. In each sync
cycle, we acquire/release the slot, so the inactive_since gets
updated. See synchronize_one_slot().
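
To illustrate, the release path looks roughly like this with the new
field (a simplified sketch, not the exact patch code):

    /* Tail of ReplicationSlotRelease(), simplified sketch. */
    TimestampTz now = GetCurrentTimestamp();

    SpinLockAcquire(&slot->mutex);
    slot->active_pid = 0;          /* mark the slot as released */
    slot->inactive_since = now;    /* remember when it became inactive */
    SpinLockRelease(&slot->mutex);

Since the sync worker goes through this acquire/release path for every
synced slot in every cycle, the timestamp keeps advancing on the standby.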

--
With Regards,
Amit Kapila.



Hi,

On Tue, Mar 26, 2024 at 01:37:21PM +0530, Amit Kapila wrote:
> On Tue, Mar 26, 2024 at 1:15 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > 2 ===
> >
> > It looks like inactive_since is set to the current timestamp on the standby
> > each time the sync worker does a cycle:
> >
> > primary:
> >
> > postgres=# select slot_name,inactive_since from pg_replication_slots where failover = 't';
> >   slot_name  |        inactive_since
> > -------------+-------------------------------
> >  lsub27_slot | 2024-03-26 07:39:19.745517+00
> >  lsub28_slot | 2024-03-26 07:40:24.953826+00
> >
> > standby:
> >
> > postgres=# select slot_name,inactive_since from pg_replication_slots where failover = 't';
> >   slot_name  |        inactive_since
> > -------------+-------------------------------
> >  lsub27_slot | 2024-03-26 07:43:56.387324+00
> >  lsub28_slot | 2024-03-26 07:43:56.387338+00
> >
> > I don't think that should be the case.
> >
> 
> But why? This is exactly what we discussed in another thread where we
> agreed to update inactive_since even for sync slots.

Hum, I thought we agreed to "sync" it and to "update it to current time"
only at promotion time.

I don't think updating inactive_since to current time during each cycle makes
sense (I mean I understand the use case: being able to say when slots have been
sync, but if this is what we want then we should consider an extra view or an
extra field but not relying on the inactive_since one).

If the primary goes down, not updating inactive_since to the current time could
also provide benefit such as knowing the inactive_since of the primary slots
(from the standby) the last time it has been synced. If we update it to the current
time then this information is lost.

> In each sync
> cycle, we acquire/release the slot, so the inactive_since gets
> updated. See synchronize_one_slot().

Right, and I think we should put an extra condition if in recovery.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Tue, Mar 26, 2024 at 11:26 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Review comments on v18_0002 and v18_0005
> =======================================
>
> 1.
> We have decided to update inactive_since for temporary slots. So,
> unless there is some reason, we should allow inactive_timeout to also
> be set for temporary slots.

WFM. A temporary slot that stays inactive for a long time, even before
the server is shut down, can utilize this inactive_timeout based
invalidation mechanism. And, I'd also vote for being consistent between
temporary and synced slots.

>              L.last_inactive_time,
> +            L.inactive_timeout,
>
> Shall we keep inactive_timeout before
> last_inactive_time/inactive_since? I don't have any strong reason to
> propose that way apart from that the former is provided by the user.

Done.

> + if (InvalidateReplicationSlotForInactiveTimeout(slot, false, true, true))
> + invalidated = true;
>
> I don't think we should try to invalidate the slots in
> pg_get_replication_slots. This function's purpose is to get the
> current information on slots and has no intention to perform any work
> for slots. Any error due to invalidation won't be what the user would
> be expecting here.

Agree. Removed.

> 4.
> +static bool
> +InvalidateSlotForInactiveTimeout(ReplicationSlot *slot,
> + bool need_control_lock,
> + bool need_mutex)
> {
> ...
> ...
> + if (need_control_lock)
> + LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
> +
> + Assert(LWLockHeldByMeInMode(ReplicationSlotControlLock, LW_SHARED));
> +
> + /*
> + * Check if the slot needs to be invalidated due to inactive_timeout. We
> + * do this with the spinlock held to avoid race conditions -- for example
> + * the restart_lsn could move forward, or the slot could be dropped.
> + */
> + if (need_mutex)
> + SpinLockAcquire(&slot->mutex);
> ...
>
> I find this combination of parameters a bit strange. Because, say if
> need_mutex is false and need_control_lock is true then that means this
> function will acquire LWlock after acquiring spinlock which is
> unacceptable. Now, this may not happen in practice as the callers
> won't pass such a combination but still, this functionality should be
> improved.

Right. Either we need both locks or neither. So, I changed it to use a
single bool need_locks; when it is set, both the control lock and the
spinlock are acquired and released.
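
For clarity, the reworked function now looks roughly like this (a sketch
of the shape only, not the exact v22 code):

    static bool
    InvalidateSlotForInactiveTimeout(ReplicationSlot *slot, bool need_locks)
    {
        bool    invalidated = false;

        if (need_locks)
        {
            LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
            SpinLockAcquire(&slot->mutex);
        }

        Assert(LWLockHeldByMeInMode(ReplicationSlotControlLock, LW_SHARED));

        /*
         * Check inactive_since against inactive_timeout here, with the
         * spinlock held so that the slot cannot be acquired or dropped
         * underneath us, and set invalidated accordingly.
         */

        if (need_locks)
        {
            SpinLockRelease(&slot->mutex);
            LWLockRelease(ReplicationSlotControlLock);
        }

        return invalidated;
    }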

On Mon, Mar 25, 2024 at 10:33 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> patch 002:
>
> 2)
> slotsync.c:
>
>   ReplicationSlotCreate(remote_slot->name, true, RS_TEMPORARY,
>     remote_slot->two_phase,
>     remote_slot->failover,
> -   true);
> +   true, 0);
>
> + slot->data.inactive_timeout = remote_slot->inactive_timeout;
>
> Is there a reason we are not passing 'remote_slot->inactive_timeout'
> to ReplicationSlotCreate() directly?

The slot there gets created as a temporary slot, for which we were not
supporting setting inactive_timeout. But the latest v22 patch does
support it, so remote_slot->inactive_timeout is now passed directly.
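
That is, roughly (a sketch; the argument order follows the hunk quoted
above):

    /* Sketch: pass the remote slot's timeout straight through at creation. */
    ReplicationSlotCreate(remote_slot->name, true, RS_TEMPORARY,
                          remote_slot->two_phase,
                          remote_slot->failover,
                          true,
                          remote_slot->inactive_timeout);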

> 3)
> slotfuncs.c
> pg_create_logical_replication_slot():
> + int inactive_timeout = PG_GETARG_INT32(5);
>
> Can we mention here that timeout is in seconds either in comment or
> rename variable to inactive_timeout_secs?
>
> Please do this for create_physical_replication_slot(),
> create_logical_replication_slot(),
> pg_create_physical_replication_slot() as well.

Added /* in seconds */ next to the variable declaration.

> ---------
> 4)
> + int inactive_timeout; /* The amount of time in seconds the slot
> + * is allowed to be inactive. */
>  } LogicalSlotInfo;
>
>  Do we need to mention "before getting invalidated" like other places
> (in the last patch)?

Done.

>  5)
> Same at these two places. "before getting invalidated" to be added in
> the last patch, otherwise the info is incomplete.
>
> +
> + /* The amount of time in seconds the slot is allowed to be inactive */
> + int inactive_timeout;
>  } ReplicationSlotPersistentData;
>
>
> + * inactive_timeout: The amount of time in seconds the slot is allowed to be
> + *     inactive.
>   */
>  void
>  ReplicationSlotCreate(const char *name, bool db_specific,
>  Same here. "before getting invalidated" ?

Done.

On Tue, Mar 26, 2024 at 12:04 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> > Please find the attached v21 patch implementing the above idea. It
> > also has changes for renaming last_inactive_time to inactive_since.
>
> Thanks for the patch. I have tested this patch alone, and it does what
> it says. One additional thing which I noticed is that now it sets
> inactive_since for temp slots as well, but that idea looks fine to me.

Right. Let's be consistent by treating all slots the same.

> I could not test 'invalidation on promotion bug' with this change, as
> that needed rebasing of the rest of the patches.

Please use the v22 patch set.

> Few trivial things:
>
> 1)
> Commit msg:
>
> ensures the value is set to current timestamp during the
> shutdown to help correctly interpret the time if the standby gets
> promoted without a restart.
>
> shutdown --> shutdown of slot sync worker   (as it was not clear if it
> is instance shutdown or something else)

Changed it to "shutdown of slot sync machinery" to be consistent with
the comments.

> 2)
> 'The time since the slot has became inactive'.
>
> has became-->has become
> or just became
>
> Please check it in all the files. There are multiple places.

Fixed.

Please see the attached v23 patches. I've addressed all the review
comments received so far from Amit and Shveta.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On Tue, Mar 26, 2024 at 2:27 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> >
> > 1)
> > Commit msg:
> >
> > ensures the value is set to current timestamp during the
> > shutdown to help correctly interpret the time if the standby gets
> > promoted without a restart.
> >
> > shutdown --> shutdown of slot sync worker   (as it was not clear if it
> > is instance shutdown or something else)
>
> Changed it to "shutdown of slot sync machinery" to be consistent with
> the comments.

Thanks for addressing the comments. Just to give more clarity here (so
that you can take an informed decision), I am not sure if we actually shut
down the slot-sync machinery. We only shut down the slot sync worker.
The slot-sync machinery can still be used via the
'pg_sync_replication_slots' SQL function. I can easily reproduce the
scenario where the SQL function and reset_synced_slots_info() run in
parallel and the latter hits 'Assert(s->active_pid == 0)' because the
parallel SQL sync function is active on that slot.

thanks
Shveta



Hi,

On Tue, Mar 26, 2024 at 02:27:17PM +0530, Bharath Rupireddy wrote:
> Please use the v22 patch set.

Thanks!

1 ===

+reset_synced_slots_info(void)

I'm not sure "reset" is the right word, what about slot_sync_shutdown_update()?

2 ===

+       for (int i = 0; i < max_replication_slots; i++)
+       {
+               ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
+
+               /* Check if it is a synchronized slot */
+               if (s->in_use && s->data.synced)
+               {
+                       TimestampTz now;
+
+                       Assert(SlotIsLogical(s));
+                       Assert(s->active_pid == 0);
+
+                       /*
+                        * Set the time since the slot has become inactive after shutting
+                        * down slot sync machinery. This helps correctly interpret the
+                        * time if the standby gets promoted without a restart. We get the
+                        * current time beforehand to avoid a system call while holding
+                        * the lock.
+                        */
+                       now = GetCurrentTimestamp();

What about moving "now = GetCurrentTimestamp()" outside of the for loop? (it
would be less costly and probably good enough).
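
Something along these lines (an untested sketch based on the hunk above):

    TimestampTz now = GetCurrentTimestamp();    /* once, before the loop */

    for (int i = 0; i < max_replication_slots; i++)
    {
        ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];

        /* Check if it is a synchronized slot */
        if (s->in_use && s->data.synced)
        {
            Assert(SlotIsLogical(s));
            Assert(s->active_pid == 0);

            SpinLockAcquire(&s->mutex);
            s->inactive_since = now;
            SpinLockRelease(&s->mutex);
        }
    }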

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Tue, Mar 26, 2024 at 1:54 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Tue, Mar 26, 2024 at 01:37:21PM +0530, Amit Kapila wrote:
> > On Tue, Mar 26, 2024 at 1:15 PM Bertrand Drouvot
> > <bertranddrouvot.pg@gmail.com> wrote:
> > >
> > > 2 ===
> > >
> > > It looks like inactive_since is set to the current timestamp on the standby
> > > each time the sync worker does a cycle:
> > >
> > > primary:
> > >
> > > postgres=# select slot_name,inactive_since from pg_replication_slots where failover = 't';
> > >   slot_name  |        inactive_since
> > > -------------+-------------------------------
> > >  lsub27_slot | 2024-03-26 07:39:19.745517+00
> > >  lsub28_slot | 2024-03-26 07:40:24.953826+00
> > >
> > > standby:
> > >
> > > postgres=# select slot_name,inactive_since from pg_replication_slots where failover = 't';
> > >   slot_name  |        inactive_since
> > > -------------+-------------------------------
> > >  lsub27_slot | 2024-03-26 07:43:56.387324+00
> > >  lsub28_slot | 2024-03-26 07:43:56.387338+00
> > >
> > > I don't think that should be the case.
> > >
> >
> > But why? This is exactly what we discussed in another thread where we
> > agreed to update inactive_since even for sync slots.
>
> Hum, I thought we agreed to "sync" it and to "update it to current time"
> only at promotion time.

I think there may have been some misunderstanding here. But now if I
rethink this, I am fine with 'inactive_since' getting synced from
primary to standby. But if we do that, we need to add docs stating
"inactive_since" represents primary's inactivity and not standby's
slots inactivity for synced slots. The reason for this clarification
is that the synced slot might be generated much later, yet
'inactive_since' is synced from the primary, potentially indicating a
time considerably earlier than when the synced slot was actually
created.

Another approach could be that "inactive_since" for synced slot
actually gives its own inactivity data rather than giving primary's
slot data. We update inactive_since on standby only at 3 occasions:
1) at the time of creation of the synced slot.
2) during standby restart.
3) during promotion of standby.

I have attached a sample patch for this idea as a .txt file.
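
For occasion (1), the sample patch essentially adds one line in
synchronize_one_slot() where the synced slot is created (the same hunk
is quoted later in this thread); a minimal sketch:

    /* Sketch: stamp the synced slot's own inactive_since at creation time
     * instead of copying the value from the primary.  (In real code the
     * timestamp would better be taken before acquiring the spinlock.) */
    SpinLockAcquire(&slot->mutex);
    slot->effective_catalog_xmin = xmin_horizon;
    slot->data.catalog_xmin = xmin_horizon;
    slot->inactive_since = GetCurrentTimestamp();
    SpinLockRelease(&slot->mutex);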

I am fine with any of these approaches.  One gives data synced from
primary for synced slots, while another gives actual inactivity data
of synced slots.

thanks
Shveta

Attachment
Hi,

On Tue, Mar 26, 2024 at 03:17:36PM +0530, shveta malik wrote:
> On Tue, Mar 26, 2024 at 1:54 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > Hi,
> >
> > On Tue, Mar 26, 2024 at 01:37:21PM +0530, Amit Kapila wrote:
> > > On Tue, Mar 26, 2024 at 1:15 PM Bertrand Drouvot
> > > <bertranddrouvot.pg@gmail.com> wrote:
> > > >
> > > > 2 ===
> > > >
> > > > It looks like inactive_since is set to the current timestamp on the standby
> > > > each time the sync worker does a cycle:
> > > >
> > > > primary:
> > > >
> > > > postgres=# select slot_name,inactive_since from pg_replication_slots where failover = 't';
> > > >   slot_name  |        inactive_since
> > > > -------------+-------------------------------
> > > >  lsub27_slot | 2024-03-26 07:39:19.745517+00
> > > >  lsub28_slot | 2024-03-26 07:40:24.953826+00
> > > >
> > > > standby:
> > > >
> > > > postgres=# select slot_name,inactive_since from pg_replication_slots where failover = 't';
> > > >   slot_name  |        inactive_since
> > > > -------------+-------------------------------
> > > >  lsub27_slot | 2024-03-26 07:43:56.387324+00
> > > >  lsub28_slot | 2024-03-26 07:43:56.387338+00
> > > >
> > > > I don't think that should be the case.
> > > >
> > >
> > > But why? This is exactly what we discussed in another thread where we
> > > agreed to update inactive_since even for sync slots.
> >
> > Hum, I thought we agreed to "sync" it and to "update it to current time"
> > only at promotion time.
> 
> I think there may have been some misunderstanding here.

Indeed ;-)

> But now if I
> rethink this, I am fine with 'inactive_since' getting synced from
> primary to standby. But if we do that, we need to add docs stating
> "inactive_since" represents primary's inactivity and not standby's
> slots inactivity for synced slots.

Yeah sure.

> The reason for this clarification
> is that the synced slot might be generated much later, yet
> 'inactive_since' is synced from the primary, potentially indicating a
> time considerably earlier than when the synced slot was actually
> created.

Right.

> Another approach could be that "inactive_since" for synced slot
> actually gives its own inactivity data rather than giving primary's
> slot data. We update inactive_since on standby only at 3 occasions:
> 1) at the time of creation of the synced slot.
> 2) during standby restart.
> 3) during promotion of standby.
> 
> I have attached a sample patch for this idea as.txt file.

Thanks!

> I am fine with any of these approaches.  One gives data synced from
> primary for synced slots, while another gives actual inactivity data
> of synced slots.

What about another approach?: inactive_since gives data synced from primary for
synced slots and another dedicated field (could be added later...) could
represent what you suggest as the other option.

Another con of updating inactive_since to the current time during each slot
sync cycle is that calling GetCurrentTimestamp() very frequently
(during each sync cycle of very active slots) could be too costly.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Tue, Mar 26, 2024 at 3:12 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> On Tue, Mar 26, 2024 at 02:27:17PM +0530, Bharath Rupireddy wrote:
> > Please use the v22 patch set.
>
> Thanks!
>
> 1 ===
>
> +reset_synced_slots_info(void)
>
> I'm not sure "reset" is the right word, what about slot_sync_shutdown_update()?
>

*shutdown_update() sounds generic. How about
update_synced_slots_inactive_time()? I think it is a bit longer but
conveys the meaning.

--
With Regards,
Amit Kapila.





On Tue, Mar 26, 2024 at 7:57 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
Please see the attached v23 patches. I've addressed all the review
comments received so far from Amit and Shveta.


In patch 0003:
+ SpinLockAcquire(&slot->mutex);
+ }
+
+ Assert(LWLockHeldByMeInMode(ReplicationSlotControlLock, LW_SHARED));
+
+ if (slot->inactive_since > 0 &&
+ slot->data.inactive_timeout > 0)
+ {
+ TimestampTz now;
+
+ /* inactive_since is only tracked for inactive slots */
+ Assert(slot->active_pid == 0);
+
+ now = GetCurrentTimestamp();
+ if (TimestampDifferenceExceeds(slot->inactive_since, now,
+   slot->data.inactive_timeout * 1000))
+ inavidation_cause = RS_INVAL_INACTIVE_TIMEOUT;
+ }
+
+ if (need_locks)
+ {
+ SpinLockRelease(&slot->mutex);

Here, GetCurrentTimestamp() is still called with SpinLock held. Maybe do this prior to acquiring the spinlock.
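
i.e. something like this (sketch only; field names follow the hunk above,
variable names simplified):

    /* Sketch: take the timestamp before entering the spinlock-protected section. */
    TimestampTz now = GetCurrentTimestamp();

    if (need_locks)
        SpinLockAcquire(&slot->mutex);

    if (slot->inactive_since > 0 &&
        slot->data.inactive_timeout > 0 &&
        TimestampDifferenceExceeds(slot->inactive_since, now,
                                   slot->data.inactive_timeout * 1000))
        invalidation_cause = RS_INVAL_INACTIVE_TIMEOUT;

    if (need_locks)
        SpinLockRelease(&slot->mutex);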

regards,
Ajin Cherian
Fujitsu Australia
On Tue, Mar 26, 2024 at 3:50 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> > I think there may have been some misunderstanding here.
>
> Indeed ;-)
>
> > But now if I
> > rethink this, I am fine with 'inactive_since' getting synced from
> > primary to standby. But if we do that, we need to add docs stating
> > "inactive_since" represents primary's inactivity and not standby's
> > slots inactivity for synced slots.
>
> Yeah sure.
>
> > The reason for this clarification
> > is that the synced slot might be generated much later, yet
> > 'inactive_since' is synced from the primary, potentially indicating a
> > time considerably earlier than when the synced slot was actually
> > created.
>
> Right.
>
> > Another approach could be that "inactive_since" for synced slot
> > actually gives its own inactivity data rather than giving primary's
> > slot data. We update inactive_since on standby only at 3 occasions:
> > 1) at the time of creation of the synced slot.
> > 2) during standby restart.
> > 3) during promotion of standby.
> >
> > I have attached a sample patch for this idea as.txt file.
>
> Thanks!
>
> > I am fine with any of these approaches.  One gives data synced from
> > primary for synced slots, while another gives actual inactivity data
> > of synced slots.
>
> What about another approach?: inactive_since gives data synced from primary for
> synced slots and another dedicated field (could be added later...) could
> represent what you suggest as the other option.

Yes, okay with me. I think there is some confusion here as well. In my
second approach above, I have not suggested anything related to
sync-worker. We can think on that later if we really need another
field which give us sync time.  In my second approach, I have tried to
avoid updating inactive_since for synced slots during sync process. We
update that field during creation of synced slot so that
inactive_since reflects correct info even for synced slots (rather
than copying from primary). Please have a look at my patch and let me
know your thoughts. I am fine with copying it from primary as well and
documenting this behaviour.

> Another cons of updating inactive_since at the current time during each slot
> sync cycle is that calling GetCurrentTimestamp() very frequently
> (during each sync cycle of very active slots) could be too costly.

Right.

thanks
Shveta



On Tue, Mar 26, 2024 at 4:18 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> > What about another approach?: inactive_since gives data synced from primary for
> > synced slots and another dedicated field (could be added later...) could
> > represent what you suggest as the other option.
>
> Yes, okay with me. I think there is some confusion here as well. In my
> second approach above, I have not suggested anything related to
> sync-worker. We can think on that later if we really need another
> field which give us sync time.  In my second approach, I have tried to
> avoid updating inactive_since for synced slots during sync process. We
> update that field during creation of synced slot so that
> inactive_since reflects correct info even for synced slots (rather
> than copying from primary). Please have a look at my patch and let me
> know your thoughts. I am fine with copying it from primary as well and
> documenting this behaviour.

I took a look at your patch.

--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -628,6 +628,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid
remote_dbid)
         SpinLockAcquire(&slot->mutex);
         slot->effective_catalog_xmin = xmin_horizon;
         slot->data.catalog_xmin = xmin_horizon;
+        slot->inactive_since = GetCurrentTimestamp();
         SpinLockRelease(&slot->mutex);

If we just sync inactive_since value for synced slots while in
recovery from the primary, so be it. Why do we need to update it to
the current time when the slot is being created? We don't expose slot
creation time, no? Aren't we fine if we just sync the value from
primary and document that fact? After the promotion, we can reset it
to the current time so that it gets its own time. Do you see any
issues with it?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Tue, Mar 26, 2024 at 4:35 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Tue, Mar 26, 2024 at 4:18 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > > What about another approach?: inactive_since gives data synced from primary for
> > > synced slots and another dedicated field (could be added later...) could
> > > represent what you suggest as the other option.
> >
> > Yes, okay with me. I think there is some confusion here as well. In my
> > second approach above, I have not suggested anything related to
> > sync-worker. We can think on that later if we really need another
> > field which give us sync time.  In my second approach, I have tried to
> > avoid updating inactive_since for synced slots during sync process. We
> > update that field during creation of synced slot so that
> > inactive_since reflects correct info even for synced slots (rather
> > than copying from primary). Please have a look at my patch and let me
> > know your thoughts. I am fine with copying it from primary as well and
> > documenting this behaviour.
>
> I took a look at your patch.
>
> --- a/src/backend/replication/logical/slotsync.c
> +++ b/src/backend/replication/logical/slotsync.c
> @@ -628,6 +628,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid
> remote_dbid)
>          SpinLockAcquire(&slot->mutex);
>          slot->effective_catalog_xmin = xmin_horizon;
>          slot->data.catalog_xmin = xmin_horizon;
> +        slot->inactive_since = GetCurrentTimestamp();
>          SpinLockRelease(&slot->mutex);
>
> If we just sync inactive_since value for synced slots while in
> recovery from the primary, so be it. Why do we need to update it to
> the current time when the slot is being created?

If we update inactive_since at the synced slot's creation or during
restart (skipping setting it during sync), then this time reflects the
actual 'inactive_since' for that particular synced slot. Isn't that
clearer info for the user and in alignment with what the name
'inactive_since' actually suggests?

> We don't expose slot
> creation time, no?

No, we don't. But for synced slot, that is the time since that slot is
inactive  (unless promoted), so we are exposing inactive_since and not
creation time.

>Aren't we fine if we just sync the value from
> primary and document that fact? After the promotion, we can reset it
> to the current time so that it gets its own time. Do you see any
> issues with it?

Yes, we can do that. But curious to know, do we see any additional
benefit of reflecting primary's inactive_since at standby which I
might be missing?

thanks
Shveta



Hi,

On Tue, Mar 26, 2024 at 04:49:18PM +0530, shveta malik wrote:
> On Tue, Mar 26, 2024 at 4:35 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Tue, Mar 26, 2024 at 4:18 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > > What about another approach?: inactive_since gives data synced from primary for
> > > > synced slots and another dedicated field (could be added later...) could
> > > > represent what you suggest as the other option.
> > >
> > > Yes, okay with me. I think there is some confusion here as well. In my
> > > second approach above, I have not suggested anything related to
> > > sync-worker. We can think on that later if we really need another
> > > field which give us sync time.  In my second approach, I have tried to
> > > avoid updating inactive_since for synced slots during sync process. We
> > > update that field during creation of synced slot so that
> > > inactive_since reflects correct info even for synced slots (rather
> > > than copying from primary). Please have a look at my patch and let me
> > > know your thoughts. I am fine with copying it from primary as well and
> > > documenting this behaviour.
> >
> > I took a look at your patch.
> >
> > --- a/src/backend/replication/logical/slotsync.c
> > +++ b/src/backend/replication/logical/slotsync.c
> > @@ -628,6 +628,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid
> > remote_dbid)
> >          SpinLockAcquire(&slot->mutex);
> >          slot->effective_catalog_xmin = xmin_horizon;
> >          slot->data.catalog_xmin = xmin_horizon;
> > +        slot->inactive_since = GetCurrentTimestamp();
> >          SpinLockRelease(&slot->mutex);
> >
> > If we just sync inactive_since value for synced slots while in
> > recovery from the primary, so be it. Why do we need to update it to
> > the current time when the slot is being created?
> 
> If we update inactive_since  at synced slot's creation or during
> restart (skipping setting it during sync), then this time reflects
> actual 'inactive_since' for that particular synced slot.  Isn't that a
> clear info for the user and in alignment of what the name
> 'inactive_since' actually suggests?
> 
> > We don't expose slot
> > creation time, no?
> 
> No, we don't. But for synced slot, that is the time since that slot is
> inactive  (unless promoted), so we are exposing inactive_since and not
> creation time.
> 
> >Aren't we fine if we just sync the value from
> > primary and document that fact? After the promotion, we can reset it
> > to the current time so that it gets its own time. Do you see any
> > issues with it?
> 
> Yes, we can do that. But curious to know, do we see any additional
> benefit of reflecting primary's inactive_since at standby which I
> might be missing?

In case the primary goes down, one could use the value on the standby
to see what the value was on the primary. I think that could be useful info to
have.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Hi,

On Tue, Mar 26, 2024 at 09:59:23PM +0530, Bharath Rupireddy wrote:
> On Tue, Mar 26, 2024 at 4:35 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > If we just sync inactive_since value for synced slots while in
> > recovery from the primary, so be it. Why do we need to update it to
> > the current time when the slot is being created? We don't expose slot
> > creation time, no? Aren't we fine if we just sync the value from
> > primary and document that fact? After the promotion, we can reset it
> > to the current time so that it gets its own time.
> 
> I'm attaching v24 patches. It implements the above idea proposed
> upthread for synced slots. I've now separated
> s/last_inactive_time/inactive_since and synced slots behaviour. Please
> have a look.

Thanks!

==== v24-0001

It's now pure mechanical changes and it looks good to me.

==== v24-0002

1 ===

    This commit does two things:
    1) Updates inactive_since for sync slots with the value
    received from the primary's slot.

Tested it and it does that.

2 ===

    2) Ensures the value is set to current timestamp during the
    shutdown of slot sync machinery to help correctly interpret the
    time if the standby gets promoted without a restart.

Tested it and it does that.

3 ===

+/*
+ * Reset the synced slots info such as inactive_since after shutting
+ * down the slot sync machinery.
+ */
+static void
+update_synced_slots_inactive_time(void)

Looks like the comment "reset" is not matching the name of the function and
what it does.

4 ===

+                       /*
+                        * We get the current time beforehand and only once to avoid
+                        * system calls overhead while holding the lock.
+                        */
+                       if (now == 0)
+                               now = GetCurrentTimestamp();

Also +1 of having GetCurrentTimestamp() just called one time within the loop.

5 ===

-               if (!(RecoveryInProgress() && slot->data.synced))
+               if (!(InRecovery && slot->data.synced))
                        slot->inactive_since = GetCurrentTimestamp();
                else
                        slot->inactive_since = 0;

Not related to this change but more the way RestoreSlotFromDisk() behaves here:

For a sync slot on standby it will be set to zero and then later will be
synchronized with the one coming from the primary. I think that's fine to have
it to zero for this window of time.

Now, if the standby is down and one sets sync_replication_slots to off,
then inactive_since will be set to zero on the standby at startup and not 
synchronized (unless one triggers a manual sync). I also think that's fine but
it might be worth to document this behavior (that after a standby startup
inactive_since is zero until the next sync...). 

6 ===

+       print "HI  $slot_name $name $inactive_since $slot_creation_time\n";

garbage?

7 ===

+# Capture and validate inactive_since of a given slot.
+sub capture_and_validate_slot_inactive_since
+{
+       my ($node, $slot_name, $slot_creation_time) = @_;
+       my $name = $node->name;

We know have capture_and_validate_slot_inactive_since at 2 places:
040_standby_failover_slots_sync.pl and 019_replslot_limit.pl.

Worth to create a sub in Cluster.pm?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Tue, Mar 26, 2024 at 9:59 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Tue, Mar 26, 2024 at 4:35 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > If we just sync inactive_since value for synced slots while in
> > recovery from the primary, so be it. Why do we need to update it to
> > the current time when the slot is being created? We don't expose slot
> > creation time, no? Aren't we fine if we just sync the value from
> > primary and document that fact? After the promotion, we can reset it
> > to the current time so that it gets its own time.
>
> I'm attaching v24 patches. It implements the above idea proposed
> upthread for synced slots. I've now separated
> s/last_inactive_time/inactive_since and synced slots behaviour. Please
> have a look.

Thanks for the patches. Few trivial comments for v24-002:

1)
slot.c:
+ * data from the remote slot. We use InRecovery flag instead of
+ * RecoveryInProgress() as it always returns true even for normal
+ * server startup.

a) Not clear what 'it' refers to. Better to use 'the latter'
b) Is it better to mention the primary here:
 'as the latter always returns true even on the primary server during startup'.


2)
update_local_synced_slot():

- strcmp(remote_slot->plugin, NameStr(slot->data.plugin)) == 0)
+ strcmp(remote_slot->plugin, NameStr(slot->data.plugin)) == 0 &&
+ remote_slot->inactive_since == slot->inactive_since)

When this code was written initially, the intent was to do strcmp at
the end (only if absolutely needed). It will be good if we maintain
the same and add new checks before strcmp.

3)
update_synced_slots_inactive_time():

This assert is removed, is it intentional?
Assert(s->active_pid == 0);


4)
040_standby_failover_slots_sync.pl:

+# Capture the inactive_since of the slot from the standby the logical failover
+# slots are synced/created on the standby.

The comment is unclear, something seems missing.

thanks
Shveta



On Tue, Mar 26, 2024 at 11:22 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> > I'm attaching v24 patches. It implements the above idea proposed
> > upthread for synced slots.
>
> ==== v24-0002
>
> 1 ===
>
>     This commit does two things:
>     1) Updates inactive_since for sync slots with the value
>     received from the primary's slot.
>
> Tested it and it does that.

Thanks. I've added a test case for this.

> 2 ===
>
>     2) Ensures the value is set to current timestamp during the
>     shutdown of slot sync machinery to help correctly interpret the
>     time if the standby gets promoted without a restart.
>
> Tested it and it does that.

Thanks. I've added a test case for this.

> 3 ===
>
> +/*
> + * Reset the synced slots info such as inactive_since after shutting
> + * down the slot sync machinery.
> + */
> +static void
> +update_synced_slots_inactive_time(void)
>
> Looks like the comment "reset" is not matching the name of the function and
> what it does.

Changed. I've also changed the function name to
update_synced_slots_inactive_since to be precise on what it exactly
does.

> 4 ===
>
> +                       /*
> +                        * We get the current time beforehand and only once to avoid
> +                        * system calls overhead while holding the lock.
> +                        */
> +                       if (now == 0)
> +                               now = GetCurrentTimestamp();
>
> Also +1 of having GetCurrentTimestamp() just called one time within the loop.

Right.

> 5 ===
>
> -               if (!(RecoveryInProgress() && slot->data.synced))
> +               if (!(InRecovery && slot->data.synced))
>                         slot->inactive_since = GetCurrentTimestamp();
>                 else
>                         slot->inactive_since = 0;
>
> Not related to this change but more the way RestoreSlotFromDisk() behaves here:
>
> For a sync slot on standby it will be set to zero and then later will be
> synchronized with the one coming from the primary. I think that's fine to have
> it to zero for this window of time.

Right.

> Now, if the standby is down and one sets sync_replication_slots to off,
> then inactive_since will be set to zero on the standby at startup and not
> synchronized (unless one triggers a manual sync). I also think that's fine but
> it might be worth to document this behavior (that after a standby startup
> inactive_since is zero until the next sync...).

Isn't this behaviour applicable for other slot parameters that the
slot syncs from the remote slot on the primary?

I've added the following note in the comments when we update
inactive_since in RestoreSlotFromDisk.

         * Note that for synced slots after the standby starts up (i.e. after
         * the slots are loaded from the disk), the inactive_since will remain
         * zero until the next slot sync cycle.
         */
        if (!(InRecovery && slot->data.synced))
            slot->inactive_since = GetCurrentTimestamp();
        else
            slot->inactive_since = 0;

> 6 ===
>
> +       print "HI  $slot_name $name $inactive_since $slot_creation_time\n";
>
> garbage?

Removed.

> 7 ===
>
> +# Capture and validate inactive_since of a given slot.
> +sub capture_and_validate_slot_inactive_since
> +{
> +       my ($node, $slot_name, $slot_creation_time) = @_;
> +       my $name = $node->name;
>
> We know have capture_and_validate_slot_inactive_since at 2 places:
> 040_standby_failover_slots_sync.pl and 019_replslot_limit.pl.
>
> Worth to create a sub in Cluster.pm?

I'd second that thought for now. We might have to debate first if it's
useful for all the nodes even without replication, and if yes, the
naming stuff and all that. Historically, we've had such duplicated
functions until recently, for instance advance_wal and log_contains. We
moved them over to a common perl library Cluster.pm very recently. I'm
sure we can come back later to move it to Cluster.pm.

On Wed, Mar 27, 2024 at 9:02 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> 1)
> slot.c:
> + * data from the remote slot. We use InRecovery flag instead of
> + * RecoveryInProgress() as it always returns true even for normal
> + * server startup.
>
> a) Not clear what 'it' refers to. Better to use 'the latter'
> b) Is it better to mention the primary here:
>  'as the latter always returns true even on the primary server during startup'.

Modified.

> 2)
> update_local_synced_slot():
>
> - strcmp(remote_slot->plugin, NameStr(slot->data.plugin)) == 0)
> + strcmp(remote_slot->plugin, NameStr(slot->data.plugin)) == 0 &&
> + remote_slot->inactive_since == slot->inactive_since)
>
> When this code was written initially, the intent was to do strcmp at
> the end (only if absolutely needed). It will be good if we maintain
> the same and add new checks before strcmp.

Done.

> 3)
> update_synced_slots_inactive_time():
>
> This assert is removed, is it intentional?
> Assert(s->active_pid == 0);

Yes, the slot can get acquired in the corner case when someone runs
pg_sync_replication_slots concurrently at this time. I'm referring to
the issue reported upthread. We don't prevent one running
pg_sync_replication_slots in promotion/ShutDownSlotSync phase right?
Maybe we should prevent that otherwise some of the slots are synced
and the standby gets promoted while others are yet-to-be-synced.

> 4)
> 040_standby_failover_slots_sync.pl:
>
> +# Capture the inactive_since of the slot from the standby the logical failover
> +# slots are synced/created on the standby.
>
> The comment is unclear, something seems missing.

Nice catch. Yes, that was wrong. I've modified it now.

Please find the attached v25-0001 (made this 0001 patch now as
inactive_since patch is committed) patch with the above changes.
--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On Wed, Mar 27, 2024 at 10:08 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Tue, Mar 26, 2024 at 11:22 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
>
> > 3)
> > update_synced_slots_inactive_time():
> >
> > This assert is removed, is it intentional?
> > Assert(s->active_pid == 0);
>
> Yes, the slot can get acquired in the corner case when someone runs
> pg_sync_replication_slots concurrently at this time. I'm referring to
> the issue reported upthread. We don't prevent one running
> pg_sync_replication_slots in promotion/ShutDownSlotSync phase right?
> Maybe we should prevent that otherwise some of the slots are synced
> and the standby gets promoted while others are yet-to-be-synced.
>

We should do something about it but that shouldn't be done in this
patch. We can handle it separately and then add such an assert.

--
With Regards,
Amit Kapila.



On Wed, Mar 27, 2024 at 10:22 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Mar 27, 2024 at 10:08 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Tue, Mar 26, 2024 at 11:22 PM Bertrand Drouvot
> > <bertranddrouvot.pg@gmail.com> wrote:
> >
> > > 3)
> > > update_synced_slots_inactive_time():
> > >
> > > This assert is removed, is it intentional?
> > > Assert(s->active_pid == 0);
> >
> > Yes, the slot can get acquired in the corner case when someone runs
> > pg_sync_replication_slots concurrently at this time. I'm referring to
> > the issue reported upthread. We don't prevent one running
> > pg_sync_replication_slots in promotion/ShutDownSlotSync phase right?
> > Maybe we should prevent that otherwise some of the slots are synced
> > and the standby gets promoted while others are yet-to-be-synced.
> >
>
> We should do something about it but that shouldn't be done in this
> patch. We can handle it separately and then add such an assert.

Agreed. Once this patch is concluded, I can fix the slot sync shutdown
issue and will also add this 'assert' back.

thanks
Shveta



On Wed, Mar 27, 2024 at 10:24 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Wed, Mar 27, 2024 at 10:22 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Mar 27, 2024 at 10:08 AM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >
> > > On Tue, Mar 26, 2024 at 11:22 PM Bertrand Drouvot
> > > <bertranddrouvot.pg@gmail.com> wrote:
> > >
> > > > 3)
> > > > update_synced_slots_inactive_time():
> > > >
> > > > This assert is removed, is it intentional?
> > > > Assert(s->active_pid == 0);
> > >
> > > Yes, the slot can get acquired in the corner case when someone runs
> > > pg_sync_replication_slots concurrently at this time. I'm referring to
> > > the issue reported upthread. We don't prevent one running
> > > pg_sync_replication_slots in promotion/ShutDownSlotSync phase right?
> > > Maybe we should prevent that otherwise some of the slots are synced
> > > and the standby gets promoted while others are yet-to-be-synced.
> > >
> >
> > We should do something about it but that shouldn't be done in this
> > patch. We can handle it separately and then add such an assert.
>
> Agreed. Once this patch is concluded, I can fix the slot sync shutdown
> issue and will also add this 'assert' back.

Agreed. Thanks.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Tue, Mar 26, 2024 at 6:05 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
>
> > We can think on that later if we really need another
> > field which give us sync time.
>
> I think that calling GetCurrentTimestamp() so frequently could be too costly, so
> I'm not sure we should.

Agreed.

> > In my second approach, I have tried to
> > avoid updating inactive_since for synced slots during sync process. We
> > update that field during creation of synced slot so that
> > inactive_since reflects correct info even for synced slots (rather
> > than copying from primary).
>
> Yeah, and I think we could create a dedicated field with this information
> if we feel the need.

Okay.

thanks
Shveta



On Wed, Mar 27, 2024 at 10:08 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Please find the attached v25-0001 (made this 0001 patch now as
> inactive_since patch is committed) patch with the above changes.

Fixed an issue in synchronize_slots where DatumGetLSN was being used in
place of DatumGetTimestampTz. Found this via a CF bot member [1], not on
my dev system.

Please find the attached v26 patch.


[1]
[05:14:39.281] #7  DatumGetLSN (X=<optimized out>) at
../src/include/utils/pg_lsn.h:24
[05:14:39.281] No locals.
[05:14:39.281] #8  synchronize_slots (wrconn=wrconn@entry=0x583cd170)
at ../src/backend/replication/logical/slotsync.c:757
[05:14:39.281]         isnull = false
[05:14:39.281]         remote_slot = 0x583ce1a8
[05:14:39.281]         d = <optimized out>
[05:14:39.281]         col = 10
[05:14:39.281]         slotRow = {25, 25, 3220, 3220, 28, 16, 16, 25, 25, 1184}
[05:14:39.281]         res = 0x583cd1b8
[05:14:39.281]         tupslot = 0x583ce11c
[05:14:39.281]         remote_slot_list = 0x0
[05:14:39.281]         some_slot_updated = false
[05:14:39.281]         started_tx = false
[05:14:39.281]         query = 0x57692bc4 "SELECT slot_name, plugin,
confirmed_flush_lsn, restart_lsn, catalog_xmin, two_phase, failover,
database, invalidation_reason, inactive_since FROM
pg_catalog.pg_replication_slots WHERE failover and NOT"...
[05:14:39.281]         __func__ = "synchronize_slots"
[05:14:39.281] #9  0x56ff9d1e in SyncReplicationSlots
(wrconn=0x583cd170) at
../src/backend/replication/logical/slotsync.c:1504

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
Hi,

On Wed, Mar 27, 2024 at 10:08:33AM +0530, Bharath Rupireddy wrote:
> On Tue, Mar 26, 2024 at 11:22 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > -               if (!(RecoveryInProgress() && slot->data.synced))
> > +               if (!(InRecovery && slot->data.synced))
> >                         slot->inactive_since = GetCurrentTimestamp();
> >                 else
> >                         slot->inactive_since = 0;
> >
> > Not related to this change but more the way RestoreSlotFromDisk() behaves here:
> >
> > For a sync slot on standby it will be set to zero and then later will be
> > synchronized with the one coming from the primary. I think that's fine to have
> > it to zero for this window of time.
> 
> Right.
> 
> > Now, if the standby is down and one sets sync_replication_slots to off,
> > then inactive_since will be set to zero on the standby at startup and not
> > synchronized (unless one triggers a manual sync). I also think that's fine but
> > it might be worth to document this behavior (that after a standby startup
> > inactive_since is zero until the next sync...).
> 
> Isn't this behaviour applicable for other slot parameters that the
> slot syncs from the remote slot on the primary?

No, they are persisted on disk. If not, we wouldn't know where to resume the decoding
from on the standby in case the primary is down and/or sync is off.

> I've added the following note in the comments when we update
> inactive_since in RestoreSlotFromDisk.
> 
>          * Note that for synced slots after the standby starts up (i.e. after
>          * the slots are loaded from the disk), the inactive_since will remain
>          * zero until the next slot sync cycle.
>          */
>         if (!(InRecovery && slot->data.synced))
>             slot->inactive_since = GetCurrentTimestamp();
>         else
>             slot->inactive_since = 0;

I think we should add some words in the doc too, also explaining what
inactive_since means on the standby (as suggested by Shveta in [1]).

[1]:
https://www.postgresql.org/message-id/CAJpy0uDkTW%2Bt1k3oPkaipFBzZePfFNB5DmiA%3D%3DpxRGcAdpF%3DPg%40mail.gmail.com

> > 7 ===
> >
> > +# Capture and validate inactive_since of a given slot.
> > +sub capture_and_validate_slot_inactive_since
> > +{
> > +       my ($node, $slot_name, $slot_creation_time) = @_;
> > +       my $name = $node->name;
> >
> > We know have capture_and_validate_slot_inactive_since at 2 places:
> > 040_standby_failover_slots_sync.pl and 019_replslot_limit.pl.
> >
> > Worth to create a sub in Cluster.pm?
> 
> I'd second that thought for now. We might have to debate first if it's
> useful for all the nodes even without replication, and if yes, the
> naming stuff and all that. Historically, we've had such duplicated
> functions until recently, for instance advance_wal and log_contains.
> We
> moved them over to a common perl library Cluster.pm very recently. I'm
> sure we can come back later to move it to Cluster.pm.

I thought that would be the right time not to introduce duplicated code.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Wed, Mar 27, 2024 at 11:05 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Fixed an issue in synchronize_slots where DatumGetLSN is being used in
> place of DatumGetTimestampTz. Found this via CF bot member [1], not on
> my dev system.
>
> Please find the attached v26 patch.

Thanks for the patch. Few trivial things:

----------
1)
system-views.sgml:

a) "Note that the slots" --> "Note that the slots on the standbys,"
--it is good to mention "standbys" as synced could be true on primary
as well (promoted standby)

b) If you plan to add more info which Bertrand suggested, then it will
be better to make a <note> section instead of using "Note"

2)
commit msg:

"The impact of this
on a promoted standby inactive_since is always NULL for all
synced slots even after server restart.
"
Sentence looks broken.
---------

Apart from the above trivial things, v26-001 looks good to me.

thanks
Shveta



On Wed, Mar 27, 2024 at 11:39 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> Thanks for the patch. Few trivial things:

Thanks for reviewing.

> ----------
> 1)
> system-views.sgml:
>
> a) "Note that the slots" --> "Note that the slots on the standbys,"
> --it is good to mention "standbys" as synced could be true on primary
> as well (promoted standby)

Done.

> b) If you plan to add more info which Bertrand suggested, then it will
> be better to make a <note> section instead of using "Note"

I added the note that Bertrand specified upthread. But I couldn't
find an instance of adding <note> ... </note> within a table, hence I
went with "Note that ...." statements just like any other notes in
system-views.sgml. pg_replication_slots in system-views.sgml renders as
a table, so having <note> ... </note> may not be a great idea.

> 2)
> commit msg:
>
> "The impact of this
> on a promoted standby inactive_since is always NULL for all
> synced slots even after server restart.
> "
> Sentence looks broken.
> ---------

Reworded.

> Apart from the above trivial things, v26-001 looks good to me.

Please check the attached v27 patch which also has Bertrand's comment
on deduplicating the TAP function. I've now moved it to Cluster.pm.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
Hi,

On Wed, Mar 27, 2024 at 02:55:17PM +0530, Bharath Rupireddy wrote:
> Please check the attached v27 patch which also has Bertrand's comment
> on deduplicating the TAP function. I've now moved it to Cluster.pm.

Thanks!

1 ===

+        Note that the slots on the standbys that are being synced from a
+        primary server (whose <structfield>synced</structfield> field is
+        <literal>true</literal>), will get the
+        <structfield>inactive_since</structfield> value from the
+        corresponding remote slot on the primary. Also, note that for the
+        synced slots on the standby, after the standby starts up (i.e. after
+        the slots are loaded from the disk), the inactive_since will remain
+        zero until the next slot sync cycle.

Not sure we should mention the "(i.e. after the slots are loaded from the disk)"
and also "cycle" (as that does not sound right in case of manual sync).

My proposal (in text) but feel free to reword it:

Note that the slots on the standbys that are being synced from a
primary server (whose synced field is true), will get the inactive_since value
from the corresponding remote slot on the primary. Also, after the standby starts
up, the inactive_since (for such synced slots) will remain zero until the next
synchronization.

2 ===

+=item $node->create_logical_slot_on_standby(self, primary, slot_name, dbname)

get_slot_inactive_since_value instead?

3 ===

+against given reference time.

s/given reference/optional given reference/?


Apart from the above, LGTM.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Wed, Mar 27, 2024 at 2:55 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Mar 27, 2024 at 11:39 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > Thanks for the patch. Few trivial things:
>
> Thanks for reviewing.
>
> > ----------
> > 1)
> > system-views.sgml:
> >
> > a) "Note that the slots" --> "Note that the slots on the standbys,"
> > --it is good to mention "standbys" as synced could be true on primary
> > as well (promoted standby)
>
> Done.
>
> > b) If you plan to add more info which Bertrand suggested, then it will
> > be better to make a <note> section instead of using "Note"
>
> I added the note that Bertrand specified upthread. But, I couldn't
> find an instance of adding <note> ... </note> within a table. Hence
> with "Note that ...." statments just like any other notes in the
> system-views.sgml. pg_replication_slot in system-vews.sgml renders as
> table, so having <note> ... </note> may not be a great idea.
>
> > 2)
> > commit msg:
> >
> > "The impact of this
> > on a promoted standby inactive_since is always NULL for all
> > synced slots even after server restart.
> > "
> > Sentence looks broken.
> > ---------
>
> Reworded.
>
> > Apart from the above trivial things, v26-001 looks good to me.
>
> Please check the attached v27 patch which also has Bertrand's comment
> on deduplicating the TAP function. I've now moved it to Cluster.pm.
>

Thanks for the patch. Regarding doc, I have few comments.

+        Note that the slots on the standbys that are being synced from a
+        primary server (whose <structfield>synced</structfield> field is
+        <literal>true</literal>), will get the
+        <structfield>inactive_since</structfield> value from the
+        corresponding remote slot on the primary. Also, note that for the
+        synced slots on the standby, after the standby starts up (i.e. after
+        the slots are loaded from the disk), the inactive_since will remain
+        zero until the next slot sync cycle.

a)  "inactive_since will remain  zero"
Since it is user exposed info and the user finds it NULL in
pg_replication_slots, shall we mention NULL instead of 0?

b) Since we are referring to the sync cycle here, I feel it will be
good to give a link to that page.
+        zero until the next slot sync cycle (see
+        <xref linkend="logicaldecoding-replication-slots-synchronization"/> for
+        slot synchronization details).

thanks
Shveta



On Wed, Mar 27, 2024 at 3:42 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> 1 ===
>
> My proposal (in text) but feel free to reword it:
>
> Note that the slots on the standbys that are being synced from a
> primary server (whose synced field is true), will get the inactive_since value
> from the corresponding remote slot on the primary. Also, after the standby starts
> up, the inactive_since (for such synced slots) will remain zero until the next
> synchronization.

WFM.

> 2 ===
>
> +=item $node->create_logical_slot_on_standby(self, primary, slot_name, dbname)
>
> get_slot_inactive_since_value instead?

Ugh. Changed.

> 3 ===
>
> +against given reference time.
>
> s/given reference/optional given reference/?

Done.

> Apart from the above, LGTM.

Thanks for reviewing.

On Wed, Mar 27, 2024 at 3:43 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> Thanks for the patch. Regarding doc, I have few comments.

Thanks for reviewing.

> a)  "inactive_since will remain  zero"
> Since it is user exposed info and the user finds it NULL in
> pg_replication_slots, shall we mention NULL instead of 0?

Right. Changed.

> b) Since we are referring to the sync cycle here, I feel it will be
> good to give a link to that page.
> +        zero until the next slot sync cycle (see
> +        <xref linkend="logicaldecoding-replication-slots-synchronization"/> for
> +        slot synchronization details).

WFM.

Please see the attached v28 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
Hi,

On Wed, Mar 27, 2024 at 05:55:05PM +0530, Bharath Rupireddy wrote:
> On Wed, Mar 27, 2024 at 3:42 PM Bertrand Drouvot
> Please see the attached v28 patch.

Thanks!

1 === sorry I missed it in the previous review

        if (!(RecoveryInProgress() && slot->data.synced))
+       {
                now = GetCurrentTimestamp();
+               update_inactive_since = true;
+       }
+       else
+               update_inactive_since = false;

I think update_inactive_since is not needed, we could rely on (now > 0) instead.
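
Something like this, perhaps (just a sketch of the release path; inactive_since
and data.synced are fields from the in-progress patch set, not from released
PostgreSQL):

TimestampTz now = 0;

/* Synced slots on a standby keep their inactive_since untouched. */
if (!(RecoveryInProgress() && slot->data.synced))
    now = GetCurrentTimestamp();

SpinLockAcquire(&slot->mutex);
slot->active_pid = 0;
if (now > 0)
    slot->inactive_since = now;     /* only when we actually took a timestamp */
SpinLockRelease(&slot->mutex);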

2 ===

+=item $node->get_slot_inactive_since_value(self, primary, slot_name, dbname)
+
+Get inactive_since column value for a given replication slot validating it
+against optional reference time.
+
+=cut
+
+sub get_slot_inactive_since_value
+{

shouldn't be "=item $node->get_slot_inactive_since_value(self, slot_name, reference_time)"
instead?

Apart from the above, LGTM.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Wed, Mar 27, 2024 at 6:54 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Wed, Mar 27, 2024 at 05:55:05PM +0530, Bharath Rupireddy wrote:
> > On Wed, Mar 27, 2024 at 3:42 PM Bertrand Drouvot
> > Please see the attached v28 patch.
>
> Thanks!
>
> 1 === sorry I missed it in the previous review
>
>         if (!(RecoveryInProgress() && slot->data.synced))
> +       {
>                 now = GetCurrentTimestamp();
> +               update_inactive_since = true;
> +       }
> +       else
> +               update_inactive_since = false;
>
> I think update_inactive_since is not needed, we could rely on (now > 0) instead.

I thought of using that, but it comes at the expense of readability, so
I prefer to use a variable. However, I changed the variable name to the
more meaningful is_slot_being_synced.

> 2 ===
>
> +=item $node->get_slot_inactive_since_value(self, primary, slot_name, dbname)
> +
> +Get inactive_since column value for a given replication slot validating it
> +against optional reference time.
> +
> +=cut
> +
> +sub get_slot_inactive_since_value
> +{
>
> shouldn't be "=item $node->get_slot_inactive_since_value(self, slot_name, reference_time)"
> instead?

Ugh. Changed.

> Apart from the above, LGTM.

Thanks. I'm attaching v29 patches. 0001 managing inactive_since on the
standby for sync slots. 0002 implementing inactive timeout GUC based
invalidation mechanism.

Please have a look.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
Hi,

On Wed, Mar 27, 2024 at 09:00:37PM +0530, Bharath Rupireddy wrote:
> On Wed, Mar 27, 2024 at 6:54 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > Hi,
> >
> > On Wed, Mar 27, 2024 at 05:55:05PM +0530, Bharath Rupireddy wrote:
> > > On Wed, Mar 27, 2024 at 3:42 PM Bertrand Drouvot
> > > Please see the attached v28 patch.
> >
> > Thanks!
> >
> > 1 === sorry I missed it in the previous review
> >
> >         if (!(RecoveryInProgress() && slot->data.synced))
> > +       {
> >                 now = GetCurrentTimestamp();
> > +               update_inactive_since = true;
> > +       }
> > +       else
> > +               update_inactive_since = false;
> >
> > I think update_inactive_since is not needed, we could rely on (now > 0) instead.
> 
> Thought of using it, but, at the expense of readability. I prefer to
> use a variable instead.

That's fine too.

> However, I changed the variable to be more meaningful to is_slot_being_synced.

Yeah makes sense and even easier to read.

v29-0001 LGTM.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Wed, Mar 27, 2024 at 9:00 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Thanks. I'm attaching v29 patches. 0001 managing inactive_since on the
> standby for sync slots. 0002 implementing inactive timeout GUC based
> invalidation mechanism.
>
> Please have a look.

Thanks for the patches. v29-001 looks good to me.

thanks
Shveta



Hi,

On Wed, Mar 27, 2024 at 09:00:37PM +0530, Bharath Rupireddy wrote:
> standby for sync slots. 0002 implementing inactive timeout GUC based
> invalidation mechanism.
> 
> Please have a look.

Thanks!

Regarding 0002:

Some testing:

T1 ===

When the slot is invalidated on the primary, then the reason is propagated to
the sync slot (if any). That's fine but we are losing the inactive_since on the
standby:

Primary:

postgres=# select slot_name,inactive_since,conflicting,invalidation_reason from pg_replication_slots where
slot_name='lsub29_slot';
  slot_name  |        inactive_since         | conflicting | invalidation_reason
-------------+-------------------------------+-------------+---------------------
 lsub29_slot | 2024-03-28 08:24:51.672528+00 | f           | inactive_timeout
(1 row)

Standby:

postgres=# select slot_name,inactive_since,conflicting,invalidation_reason from pg_replication_slots where
slot_name='lsub29_slot';
  slot_name  | inactive_since | conflicting | invalidation_reason
-------------+----------------+-------------+---------------------
 lsub29_slot |                | f           | inactive_timeout
(1 row)

I think in this case it should always reflect the value from the primary (so
that one can understand why it is invalidated).

T2 ===

And it is set to a value during promotion:

postgres=# select pg_promote();
 pg_promote
------------
 t
(1 row)

postgres=# select slot_name,inactive_since,conflicting,invalidation_reason from pg_replication_slots where
slot_name='lsub29_slot';
  slot_name  |        inactive_since        | conflicting | invalidation_reason
-------------+------------------------------+-------------+---------------------
 lsub29_slot | 2024-03-28 08:30:11.74505+00 | f           | inactive_timeout
(1 row)

I think when it is invalidated it should always reflect the value from the
primary (so that one can understand why it is invalidated).

T3 ===

As far the slot invalidation on the primary:

postgres=# SELECT * FROM pg_logical_slot_get_changes('lsub29_slot', NULL, NULL, 'include-xids', '0');
ERROR:  cannot acquire invalidated replication slot "lsub29_slot"

Can we make the message more consistent with what can be found in CreateDecodingContext()
for example?
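
Something along these lines would match that style (wording here is purely
illustrative, not taken from the patch or from logical.c):

ereport(ERROR,
        (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
         errmsg("cannot acquire invalidated replication slot \"%s\"",
                NameStr(s->data.name)),
         errdetail("This slot has been invalidated because it was inactive for longer than replication_slot_inactive_timeout.")));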

T4 ===

Also, it looks like querying pg_replication_slots() does not trigger an
invalidation: I think it should if the slot is not invalidated yet (and matches
the invalidation criteria).

Code review:

CR1 ===

+        Invalidate replication slots that are inactive for longer than this
+        amount of time. If this value is specified without units, it is taken

s/Invalidate/Invalidates/?

Should we mention the relationship with inactive_since?

CR2 ===

+ *
+ * If check_for_invalidation is true, the slot is checked for invalidation
+ * based on replication_slot_inactive_timeout GUC and an error is raised after making the slot ours.
  */
 void
-ReplicationSlotAcquire(const char *name, bool nowait)
+ReplicationSlotAcquire(const char *name, bool nowait,
+                                          bool check_for_invalidation)


s/check_for_invalidation/check_for_timeout_invalidation/?

CR3 ===

+       if (slot->inactive_since == 0 ||
+               replication_slot_inactive_timeout == 0)
+               return false;

Better to test replication_slot_inactive_timeout first? (I mean there is no
point in testing inactive_since if replication_slot_inactive_timeout == 0)

CR4 ===

+       if (slot->inactive_since > 0 &&
+               replication_slot_inactive_timeout > 0)
+       {

Same.

So, instead of CR3 === and CR4 ===, I wonder if it wouldn't be better to do
something like:

if (replication_slot_inactive_timeout == 0)
    return false;
else if (slot->inactive_since > 0)
.
.
.
.
else
    return false;

That would avoid checking replication_slot_inactive_timeout and inactive_since
multiple times.
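
In full, the check could look something like this (just a sketch with a made-up
helper name; the GUC and the inactive_since field come from the proposed patch,
and I'm assuming the timeout is in seconds):

static bool
SlotInactiveTimeoutHasElapsed(ReplicationSlot *slot)
{
    if (replication_slot_inactive_timeout == 0)
        return false;
    else if (slot->inactive_since > 0)
        return TimestampDifferenceExceeds(slot->inactive_since,
                                          GetCurrentTimestamp(),
                                          replication_slot_inactive_timeout * 1000);
    else
        return false;
}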

CR5 ===

+        * held to avoid race conditions -- for example the restart_lsn could move
+        * forward, or the slot could be dropped.

Does the restart_lsn example make sense here?

CR6 ===

+static bool
+InvalidateSlotForInactiveTimeout(ReplicationSlot *slot, bool need_locks)
+{

InvalidatePossiblyInactiveSlot() maybe?

CR7 ===

+       /* Make sure the invalidated state persists across server restart */
+       slot->just_dirtied = true;
+       slot->dirty = true;
+       SpinLockRelease(&slot->mutex);

Maybe we could create a new function say MarkGivenReplicationSlotDirty()
with a slot as parameter, that ReplicationSlotMarkDirty could call too?

Then maybe we could set slot->data.invalidated = RS_INVAL_INACTIVE_TIMEOUT in
InvalidateSlotForInactiveTimeout()? (to avoid multiple SpinLockAcquire/SpinLockRelease).
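
In other words, something like this (sketch only, using the proposed name; the
body is just what ReplicationSlotMarkDirty() does today, lifted out):

static void
MarkGivenReplicationSlotDirty(ReplicationSlot *slot)
{
    SpinLockAcquire(&slot->mutex);
    slot->just_dirtied = true;
    slot->dirty = true;
    SpinLockRelease(&slot->mutex);
}

void
ReplicationSlotMarkDirty(void)
{
    Assert(MyReplicationSlot != NULL);
    MarkGivenReplicationSlotDirty(MyReplicationSlot);
}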

CR8 ===

+       if (persist_state)
+       {
+               char            path[MAXPGPATH];
+
+               sprintf(path, "pg_replslot/%s", NameStr(slot->data.name));
+               SaveSlotToPath(slot, path, ERROR);
+       }

Maybe we could create a new function say GivenReplicationSlotSave()
with a slot as parameter, that ReplicationSlotSave() could call too?
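
Sketch of the shape I have in mind (again with the proposed name, reusing the
existing SaveSlotToPath()):

static void
GivenReplicationSlotSave(ReplicationSlot *slot)
{
    char        path[MAXPGPATH];

    sprintf(path, "pg_replslot/%s", NameStr(slot->data.name));
    SaveSlotToPath(slot, path, ERROR);
}

void
ReplicationSlotSave(void)
{
    Assert(MyReplicationSlot != NULL);
    GivenReplicationSlotSave(MyReplicationSlot);
}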

CR9 ===

+       if (check_for_invalidation)
+       {
+               /* The slot is ours by now */
+               Assert(s->active_pid == MyProcPid);
+
+               /*
+                * Well, the slot is not yet ours really unless we check for the
+                * invalidation below.
+                */
+               s->active_pid = 0;
+               if (InvalidateReplicationSlotForInactiveTimeout(s, true, true))
+               {
+                       /*
+                        * If the slot has been invalidated, recalculate the resource
+                        * limits.
+                        */
+                       ReplicationSlotsComputeRequiredXmin(false);
+                       ReplicationSlotsComputeRequiredLSN();
+
+                       /* Might need it for slot clean up on error, so restore it */
+                       s->active_pid = MyProcPid;
+                       ereport(ERROR,
+                                       (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                                        errmsg("cannot acquire invalidated replication slot \"%s\"",
+                                                       NameStr(MyReplicationSlot->data.name))));
+               }
+               s->active_pid = MyProcPid;

Are we not missing some SpinLockAcquire/Release on the slot's mutex here? (the
places where we set the active_pid).
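
I mean something like this around each assignment (illustrative only):

SpinLockAcquire(&s->mutex);
s->active_pid = 0;              /* let the invalidation check see a free slot */
SpinLockRelease(&s->mutex);

/* ... invalidation check and error path as in the hunk above ... */

SpinLockAcquire(&s->mutex);
s->active_pid = MyProcPid;      /* re-take ownership */
SpinLockRelease(&s->mutex);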

CR10 ===

@@ -1628,6 +1674,10 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
                                        if (SlotIsLogical(s))
                                                invalidation_cause = cause;
                                        break;
+                               case RS_INVAL_INACTIVE_TIMEOUT:
+                                       if (InvalidateReplicationSlotForInactiveTimeout(s, false, false))
+                                               invalidation_cause = cause;
+                                       break;

InvalidatePossiblyObsoleteSlot() is not called with such a reason, better to use
an Assert here and in the caller too?
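
That is, something along these lines (illustrative), plus an
Assert(cause != RS_INVAL_INACTIVE_TIMEOUT) at the start of the caller:

case RS_INVAL_INACTIVE_TIMEOUT:
    /* this reason is handled by the dedicated timeout path, not here */
    Assert(false);
    break;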

CR11 ===

+++ b/src/test/recovery/t/050_invalidate_slots.pl

why not using 019_replslot_limit.pl?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Wed, Mar 27, 2024 at 9:00 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
>
> Thanks. I'm attaching v29 patches. 0001 managing inactive_since on the
> standby for sync slots.
>

Commit message states: "why we can't just update inactive_since for
synced slots on the standby with the value received from remote slot
on the primary. This is consistent with any other slot parameter i.e.
all of them are synced from the primary."

The inactive_since is not consistent with other slot parameters which
we copy. We don't perform anything related to those other parameters
like say two_phase phase which can change that property. However, we
do acquire the slot, advance the slot (as per recent discussion [1]),
and release it. Since these operations can impact inactive_since, it
seems to me that inactive_since is not the same as other parameters.
It can have a different value than the primary. Why would anyone want
to know the value of inactive_since from primary after the standby is
promoted? Now, the other concern is that calling GetCurrentTimestamp()
could be costly when the values for the slot are not going to be
updated but if that happens we can optimize such that before acquiring
the slot we can have some minimal pre-checks to ensure whether we need
to update the slot or not.
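
For example, the sync code could bail out early when the remote slot has not
moved at all, something like this (hypothetical shape; the struct and field
names may not match the actual code):

/* Skip acquiring/updating the local slot if the remote slot hasn't moved. */
if (remote_slot->confirmed_lsn == slot->data.confirmed_flush &&
    remote_slot->restart_lsn == slot->data.restart_lsn &&
    remote_slot->catalog_xmin == slot->data.catalog_xmin)
    return;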

[1] -
https://www.postgresql.org/message-id/OS0PR01MB571615D35F486080616CA841943A2%40OS0PR01MB5716.jpnprd01.prod.outlook.com

--
With Regards,
Amit Kapila.



Hi,

On Fri, Mar 29, 2024 at 09:39:31AM +0530, Amit Kapila wrote:
> On Wed, Mar 27, 2024 at 9:00 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> >
> > Thanks. I'm attaching v29 patches. 0001 managing inactive_since on the
> > standby for sync slots.
> >
> 
> Commit message states: "why we can't just update inactive_since for
> synced slots on the standby with the value received from remote slot
> on the primary. This is consistent with any other slot parameter i.e.
> all of them are synced from the primary."
> 
> The inactive_since is not consistent with other slot parameters which
> we copy. We don't perform anything related to those other parameters
> like say two_phase phase which can change that property. However, we
> do acquire the slot, advance the slot (as per recent discussion [1]),
> and release it. Since these operations can impact inactive_since, it
> seems to me that inactive_since is not the same as other parameters.
> It can have a different value than the primary. Why would anyone want
> to know the value of inactive_since from primary after the standby is
> promoted?

I think it can be useful "before" it is promoted and in case the primary is down.
I agree that tracking the activity time of a synced slot can be useful, why
not creating a dedicated field for that purpose (and keep inactive_since a
perfect "copy" of the primary)?

> Now, the other concern is that calling GetCurrentTimestamp()
> could be costly when the values for the slot are not going to be
> updated but if that happens we can optimize such that before acquiring
> the slot we can have some minimal pre-checks to ensure whether we need
> to update the slot or not.

Right, but for a very active slot it is likely that we call GetCurrentTimestamp()
during almost each sync cycle.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Fri, Mar 29, 2024 at 11:49 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> On Fri, Mar 29, 2024 at 09:39:31AM +0530, Amit Kapila wrote:
> >
> > Commit message states: "why we can't just update inactive_since for
> > synced slots on the standby with the value received from remote slot
> > on the primary. This is consistent with any other slot parameter i.e.
> > all of them are synced from the primary."
> >
> > The inactive_since is not consistent with other slot parameters which
> > we copy. We don't perform anything related to those other parameters
> > like say two_phase phase which can change that property. However, we
> > do acquire the slot, advance the slot (as per recent discussion [1]),
> > and release it. Since these operations can impact inactive_since, it
> > seems to me that inactive_since is not the same as other parameters.
> > It can have a different value than the primary. Why would anyone want
> > to know the value of inactive_since from primary after the standby is
> > promoted?
>
> I think it can be useful "before" it is promoted and in case the primary is down.
>

It is not clear to me what a user is going to do by checking the
inactivity time for slots when the corresponding server is down. I
thought the idea was to check such slots and see if they need to be
dropped or enabled again to avoid excessive disk usage, etc.

> I agree that tracking the activity time of a synced slot can be useful, why
> not creating a dedicated field for that purpose (and keep inactive_since a
> perfect "copy" of the primary)?
>

We can have a separate field for this but not sure if it is worth it.

> > Now, the other concern is that calling GetCurrentTimestamp()
> > could be costly when the values for the slot are not going to be
> > updated but if that happens we can optimize such that before acquiring
> > the slot we can have some minimal pre-checks to ensure whether we need
> > to update the slot or not.
>
> Right, but for a very active slot it is likely that we call GetCurrentTimestamp()
> during almost each sync cycle.
>

True, but if we have to save a slot to disk each time to persist the
changes (for an active slot) then probably GetCurrentTimestamp()
shouldn't be costly enough to matter.

--
With Regards,
Amit Kapila.



Hi,

On Fri, Mar 29, 2024 at 03:03:01PM +0530, Amit Kapila wrote:
> On Fri, Mar 29, 2024 at 11:49 AM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > On Fri, Mar 29, 2024 at 09:39:31AM +0530, Amit Kapila wrote:
> > >
> > > Commit message states: "why we can't just update inactive_since for
> > > synced slots on the standby with the value received from remote slot
> > > on the primary. This is consistent with any other slot parameter i.e.
> > > all of them are synced from the primary."
> > >
> > > The inactive_since is not consistent with other slot parameters which
> > > we copy. We don't perform anything related to those other parameters
> > > like say two_phase phase which can change that property. However, we
> > > do acquire the slot, advance the slot (as per recent discussion [1]),
> > > and release it. Since these operations can impact inactive_since, it
> > > seems to me that inactive_since is not the same as other parameters.
> > > It can have a different value than the primary. Why would anyone want
> > > to know the value of inactive_since from primary after the standby is
> > > promoted?
> >
> > I think it can be useful "before" it is promoted and in case the primary is down.
> >
> 
> It is not clear to me what is user going to do by checking the
> inactivity time for slots when the corresponding server is down.

Say a failover needs to be done; then it could be useful to know for which
slots activity needs to be resumed (thinking about an external logical decoding
plugin, not about pub/sub here). If one sees a slot that has been inactive for
long "enough", they can start to reason about what to do with it.

> I thought the idea was to check such slots and see if they need to be
> dropped or enabled again to avoid excessive disk usage, etc.

Yeah that's the case but it does not mean inactive_since can't be useful in other
ways.

Also, say the slot has been invalidated on the primary (due to inactivity timeout),
primary is down and there is a failover. By keeping the inactive_since from
the primary, one could know when the inactivity that led to the timeout started.

Again, more concerned about external logical decoding plugin than pub/sub here.

> > I agree that tracking the activity time of a synced slot can be useful, why
> > not creating a dedicated field for that purpose (and keep inactive_since a
> > perfect "copy" of the primary)?
> >
> 
> We can have a separate field for this but not sure if it is worth it.

OTOH I'm not sure that erasing this information from the primary is useful. I
think that 2 fields would be the best option and would be less subject to
misinterpretation.

> > > Now, the other concern is that calling GetCurrentTimestamp()
> > > could be costly when the values for the slot are not going to be
> > > updated but if that happens we can optimize such that before acquiring
> > > the slot we can have some minimal pre-checks to ensure whether we need
> > > to update the slot or not.
> >
> > Right, but for a very active slot it is likely that we call GetCurrentTimestamp()
> > during almost each sync cycle.
> >
> 
> True, but if we have to save a slot to disk each time to persist the
> changes (for an active slot) then probably GetCurrentTimestamp()
> shouldn't be costly enough to matter.

Right, persisting the changes to disk would be even more costly.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Thu, Mar 28, 2024 at 3:13 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Regarding 0002:

Thanks for reviewing it.

> Some testing:
>
> T1 ===
>
> When the slot is invalidated on the primary, then the reason is propagated to
> the sync slot (if any). That's fine but we are loosing the inactive_since on the
> standby:
>
> Primary:
>
> postgres=# select slot_name,inactive_since,conflicting,invalidation_reason from pg_replication_slots where
slot_name='lsub29_slot';
>   slot_name  |        inactive_since         | conflicting | invalidation_reason
> -------------+-------------------------------+-------------+---------------------
>  lsub29_slot | 2024-03-28 08:24:51.672528+00 | f           | inactive_timeout
> (1 row)
>
> Standby:
>
> postgres=# select slot_name,inactive_since,conflicting,invalidation_reason from pg_replication_slots where
slot_name='lsub29_slot';
>   slot_name  | inactive_since | conflicting | invalidation_reason
> -------------+----------------+-------------+---------------------
>  lsub29_slot |                | f           | inactive_timeout
> (1 row)
>
> I think in this case it should always reflect the value from the primary (so
> that one can understand why it is invalidated).

I'll come back to this as soon as we all agree on inactive_since
behavior for synced slots.

> T2 ===
>
> And it is set to a value during promotion:
>
> postgres=# select pg_promote();
>  pg_promote
> ------------
>  t
> (1 row)
>
> postgres=# select slot_name,inactive_since,conflicting,invalidation_reason from pg_replication_slots where
slot_name='lsub29_slot';
>   slot_name  |        inactive_since        | conflicting | invalidation_reason
> -------------+------------------------------+-------------+---------------------
>  lsub29_slot | 2024-03-28 08:30:11.74505+00 | f           | inactive_timeout
> (1 row)
>
> I think when it is invalidated it should always reflect the value from the
> primary (so that one can understand why it is invalidated).

I'll come back to this as soon as we all agree on inactive_since
behavior for synced slots.

> T3 ===
>
> As far the slot invalidation on the primary:
>
> postgres=# SELECT * FROM pg_logical_slot_get_changes('lsub29_slot', NULL, NULL, 'include-xids', '0');
> ERROR:  cannot acquire invalidated replication slot "lsub29_slot"
>
> Can we make the message more consistent with what can be found in CreateDecodingContext()
> for example?

Hm, that makes sense because slot acquisition and release is something
internal to the server.

> T4 ===
>
> Also, it looks like querying pg_replication_slots() does not trigger an
> invalidation: I think it should if the slot is not invalidated yet (and matches
> the invalidation criteria).

There's a different opinion on this, check comment #3 from
https://www.postgresql.org/message-id/CAA4eK1LLj%2BeaMN-K8oeOjfG%2BUuzTY%3DL5PXbcMJURZbFm%2B_aJSA%40mail.gmail.com.

> Code review:
>
> CR1 ===
>
> +        Invalidate replication slots that are inactive for longer than this
> +        amount of time. If this value is specified without units, it is taken
>
> s/Invalidate/Invalidates/?

Done.

> Should we mention the relationship with inactive_since?

Done.

> CR2 ===
>
> + *
> + * If check_for_invalidation is true, the slot is checked for invalidation
> + * based on replication_slot_inactive_timeout GUC and an error is raised after making the slot ours.
>   */
>  void
> -ReplicationSlotAcquire(const char *name, bool nowait)
> +ReplicationSlotAcquire(const char *name, bool nowait,
> +                                          bool check_for_invalidation)
>
>
> s/check_for_invalidation/check_for_timeout_invalidation/?

Done.

> CR3 ===
>
> +       if (slot->inactive_since == 0 ||
> +               replication_slot_inactive_timeout == 0)
> +               return false;
>
> Better to test replication_slot_inactive_timeout first? (I mean there is no
> point of testing inactive_since if replication_slot_inactive_timeout == 0)
>
> CR4 ===
>
> +       if (slot->inactive_since > 0 &&
> +               replication_slot_inactive_timeout > 0)
> +       {
>
> Same.
>
> So, instead of CR3 === and CR4 ===, I wonder if it wouldn't be better to do
> something like:
>
> if (replication_slot_inactive_timeout == 0)
>         return false;
> else if (slot->inactive_since > 0)
> .
> else
>         return false;
>
> That would avoid checking replication_slot_inactive_timeout and inactive_since
> multiple times.

Done.

> CR5 ===
>
> +        * held to avoid race conditions -- for example the restart_lsn could move
> +        * forward, or the slot could be dropped.
>
> Does the restart_lsn example makes sense here?

No, it doesn't. Modified that.

> CR6 ===
>
> +static bool
> +InvalidateSlotForInactiveTimeout(ReplicationSlot *slot, bool need_locks)
> +{
>
> InvalidatePossiblyInactiveSlot() maybe?

I think we would lose the essence, i.e. the timeout, from the suggested
function name; "inactive" alone doesn't convey the meaning as clearly. I
have kept it as is unless anyone suggests otherwise.

> CR7 ===
>
> +       /* Make sure the invalidated state persists across server restart */
> +       slot->just_dirtied = true;
> +       slot->dirty = true;
> +       SpinLockRelease(&slot->mutex);
>
> Maybe we could create a new function say MarkGivenReplicationSlotDirty()
> with a slot as parameter, that ReplicationSlotMarkDirty could call too?

Done that.

> Then maybe we could set slot->data.invalidated = RS_INVAL_INACTIVE_TIMEOUT in
> InvalidateSlotForInactiveTimeout()? (to avoid multiple SpinLockAcquire/SpinLockRelease).

Done that.

> CR8 ===
>
> +       if (persist_state)
> +       {
> +               char            path[MAXPGPATH];
> +
> +               sprintf(path, "pg_replslot/%s", NameStr(slot->data.name));
> +               SaveSlotToPath(slot, path, ERROR);
> +       }
>
> Maybe we could create a new function say GivenReplicationSlotSave()
> with a slot as parameter, that ReplicationSlotSave() could call too?

Done that.

> CR9 ===
>
> +       if (check_for_invalidation)
> +       {
> +               /* The slot is ours by now */
> +               Assert(s->active_pid == MyProcPid);
> +
> +               /*
> +                * Well, the slot is not yet ours really unless we check for the
> +                * invalidation below.
> +                */
> +               s->active_pid = 0;
> +               if (InvalidateReplicationSlotForInactiveTimeout(s, true, true))
> +               {
> +                       /*
> +                        * If the slot has been invalidated, recalculate the resource
> +                        * limits.
> +                        */
> +                       ReplicationSlotsComputeRequiredXmin(false);
> +                       ReplicationSlotsComputeRequiredLSN();
> +
> +                       /* Might need it for slot clean up on error, so restore it */
> +                       s->active_pid = MyProcPid;
> +                       ereport(ERROR,
> +                                       (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> +                                        errmsg("cannot acquire invalidated replication slot \"%s\"",
> +                                                       NameStr(MyReplicationSlot->data.name))));
> +               }
> +               s->active_pid = MyProcPid;
>
> Are we not missing some SpinLockAcquire/Release on the slot's mutex here? (the
> places where we set the active_pid).

Hm, yes. But shall I acquire the mutex and set active_pid to 0 for a
moment just to satisfy Assert(slot->active_pid == 0); in
InvalidateReplicationSlotForInactiveTimeout and
InvalidateSlotForInactiveTimeout? Instead, I just removed the assertions,
because having replication_slot_inactive_timeout > 0 and inactive_since > 0
is enough for these functions to decide on inactive timeout invalidation.

> CR10 ===
>
> @@ -1628,6 +1674,10 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
>                                         if (SlotIsLogical(s))
>                                                 invalidation_cause = cause;
>                                         break;
> +                               case RS_INVAL_INACTIVE_TIMEOUT:
> +                                       if (InvalidateReplicationSlotForInactiveTimeout(s, false, false))
> +                                               invalidation_cause = cause;
> +                                       break;
>
> InvalidatePossiblyObsoleteSlot() is not called with such a reason, better to use
> an Assert here and in the caller too?

Done.

> CR11 ===
>
> +++ b/src/test/recovery/t/050_invalidate_slots.pl
>
> why not using 019_replslot_limit.pl?

I understand that 019_replslot_limit covers wal_removed-related
invalidations. But I don't want to kludge it with a bunch of other
tests. The new tests anyway need a bunch of new nodes and a couple of
helper functions. Any future invalidation mechanisms can be added
in this new file. Also, having a separate file quickly helps isolate
any test failures that BF animals might report in the future. I don't
think a separate test file here hurts anyone unless there's a strong
reason against it.

Please see the attached v30 patch. 0002 is where all of the above
review comments have been addressed.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On Fri, Mar 29, 2024 at 9:39 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Commit message states: "why we can't just update inactive_since for
> synced slots on the standby with the value received from remote slot
> on the primary. This is consistent with any other slot parameter i.e.
> all of them are synced from the primary."
>
> The inactive_since is not consistent with other slot parameters which
> we copy. We don't perform anything related to those other parameters
> like say two_phase phase which can change that property. However, we
> do acquire the slot, advance the slot (as per recent discussion [1]),
> and release it. Since these operations can impact inactive_since, it
> seems to me that inactive_since is not the same as other parameters.
> It can have a different value than the primary. Why would anyone want
> to know the value of inactive_since from primary after the standby is
> promoted?

After thinking about it for a while now, it feels to me that the
synced slots (slots on the standby that are being synced from the
primary) can have their own inactive_since value. Fundamentally,
inactive_since is set to 0 when a slot is acquired and set to the
current time when the slot is released, no matter who acquires and
releases it - be it walsenders for replication, or backends for slot
advance, or backends for slot sync using pg_sync_replication_slots, or
backends for other slot functions, or the background sync worker.
Remember the earlier patch was updating inactive_since just for
walsenders, but then the suggestion was to update it unconditionally -
https://www.postgresql.org/message-id/CAJpy0uD64X%3D2ENmbHaRiWTKeQawr-rbGoy_GdhQQLVXzUSKTMg%40mail.gmail.com.
Whoever syncs the slot *actually* acquires the slot, i.e. makes it
theirs, syncs it from the primary, and releases it. IMO, no
differentiation is to be made for synced slots.
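
In code terms, the behaviour I'm describing is roughly this (patch-level field,
locking and persistence elided):

/* on acquire, by whoever makes the slot theirs */
slot->inactive_since = 0;

/* ... slot in use: walsender, slot advance, slot sync, etc. ... */

/* on release */
slot->inactive_since = GetCurrentTimestamp();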

There was a suggestion to use inactive_since of the synced slot on
the standby to know the inactivity of the slot on the primary. If one
wants to do that, they had better look at/monitor the primary slot
info/logs/pg_replication_slots/whatever. I really don't see a point in
having two different meanings for a single property of a replication
slot - inactive_since for a regular slot telling since when this slot
has become inactive, and for a synced slot telling since when the
corresponding remote slot has become inactive. I think this will
confuse users for sure.

Also, if inactive_since is changing on the primary so frequently
while none of the other parameters are changing, and we copy
inactive_since to the synced slots, then the standby will just be doing
*sync* work (marking the slots dirty and saving them to disk) only to
update inactive_since. I think this is unnecessary behaviour for sure.

Coming to a future patch for inactive timeout based slot invalidation,
we can either allow invalidation without any differentiation for
synced slots or restrict invalidation to avoid more sync work. For
instance, if the inactive timeout is kept low on the standby, the sync
worker will be doing more work as it drops and recreates a slot
repeatedly if it keeps getting invalidated. Another thing is that the
standby takes independent invalidation decisions for synced slots.
AFAICS, invalidation due to wal_removal is the only reason (out
of all available invalidation reasons) for a synced slot to get
invalidated independently of the primary. Check
https://www.postgresql.org/message-id/CAA4eK1JXBwTaDRD_%3D8t6UB1fhRNjC1C%2BgH4YdDxj_9U6djLnXw%40mail.gmail.com
for the suggestion that we'd better not differentiate invalidation
decisions for synced slots.

The assumption of letting synced slots have their own inactive_since
not only simplifies the code, but also looks less confusing and more
meaningful to the user. The only code that we put in on top of the
committed code is to use InRecovery in place of
RecoveryInProgress() in RestoreSlotFromDisk() to fix the issue raised
by Shveta upthread.

> Now, the other concern is that calling GetCurrentTimestamp()
> could be costly when the values for the slot are not going to be
> updated but if that happens we can optimize such that before acquiring
> the slot we can have some minimal pre-checks to ensure whether we need
> to update the slot or not.
>
> [1] -
https://www.postgresql.org/message-id/OS0PR01MB571615D35F486080616CA841943A2%40OS0PR01MB5716.jpnprd01.prod.outlook.com

A quick test with a function to measure the cost of
GetCurrentTimestamp [1] on my Ubuntu dev system (an AWS EC2 c5.4xlarge
instance) gives me [2]. It took 0.388 ms, 2.269 ms, 21.144 ms,
209.333 ms, 2091.174 ms, and 20908.942 ms for 10K, 100K, 1 million,
10 million, 100 million, and 1 billion calls respectively. Costs might
differ on systems with different hardware and OSes, but it gives us a
rough idea.

If we are too concerned about the cost of GetCurrentTimestamp(),
a possible approach is to just not set inactive_since for slots being
synced on the standby, and let the first acquisition and release
after the promotion do that job. We can always call this out in the
docs saying "replication slots on the streaming standbys which are
being synced from the primary are not inactive in practice, so the
inactive_since is always NULL for them unless the standby is
promoted".

[1]
Datum
pg_get_current_timestamp(PG_FUNCTION_ARGS)
{
    int         loops = PG_GETARG_INT32(0);
    TimestampTz ctime;

    for (int i = 0; i < loops; i++)
        ctime = GetCurrentTimestamp();

    PG_RETURN_TIMESTAMPTZ(ctime);
}

[2]
postgres=# \timing
Timing is on.
postgres=# SELECT pg_get_current_timestamp(1000000000);
   pg_get_current_timestamp
-------------------------------
 2024-03-30 19:07:57.374797+00
(1 row)

Time: 20908.942 ms (00:20.909)
postgres=# SELECT pg_get_current_timestamp(100000000);
   pg_get_current_timestamp
-------------------------------
 2024-03-30 19:08:21.038064+00
(1 row)

Time: 2091.174 ms (00:02.091)
postgres=# SELECT pg_get_current_timestamp(10000000);
   pg_get_current_timestamp
-------------------------------
 2024-03-30 19:08:24.329949+00
(1 row)

Time: 209.333 ms
postgres=# SELECT pg_get_current_timestamp(1000000);
   pg_get_current_timestamp
-------------------------------
 2024-03-30 19:08:26.978016+00
(1 row)

Time: 21.144 ms
postgres=# SELECT pg_get_current_timestamp(100000);
   pg_get_current_timestamp
-------------------------------
 2024-03-30 19:08:29.142248+00
(1 row)

Time: 2.269 ms
postgres=# SELECT pg_get_current_timestamp(10000);
   pg_get_current_timestamp
------------------------------
 2024-03-30 19:08:31.34621+00
(1 row)

Time: 0.388 ms

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Fri, Mar 29, 2024 at 6:17 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> On Fri, Mar 29, 2024 at 03:03:01PM +0530, Amit Kapila wrote:
> > On Fri, Mar 29, 2024 at 11:49 AM Bertrand Drouvot
> > <bertranddrouvot.pg@gmail.com> wrote:
> > >
> > > On Fri, Mar 29, 2024 at 09:39:31AM +0530, Amit Kapila wrote:
> > > >
> > > > Commit message states: "why we can't just update inactive_since for
> > > > synced slots on the standby with the value received from remote slot
> > > > on the primary. This is consistent with any other slot parameter i.e.
> > > > all of them are synced from the primary."
> > > >
> > > > The inactive_since is not consistent with other slot parameters which
> > > > we copy. We don't perform anything related to those other parameters
> > > > like say two_phase phase which can change that property. However, we
> > > > do acquire the slot, advance the slot (as per recent discussion [1]),
> > > > and release it. Since these operations can impact inactive_since, it
> > > > seems to me that inactive_since is not the same as other parameters.
> > > > It can have a different value than the primary. Why would anyone want
> > > > to know the value of inactive_since from primary after the standby is
> > > > promoted?
> > >
> > > I think it can be useful "before" it is promoted and in case the primary is down.
> > >
> >
> > It is not clear to me what is user going to do by checking the
> > inactivity time for slots when the corresponding server is down.
>
> Say a failover needs to be done, then it could be useful to know for which
> slots the activity needs to be resumed (thinking about external logical decoding
> plugin, not about pub/sub here). If one see an inactive slot (since long "enough")
> then he can start to reasonate about what to do with it.
>
> > I thought the idea was to check such slots and see if they need to be
> > dropped or enabled again to avoid excessive disk usage, etc.
>
> Yeah that's the case but it does not mean inactive_since can't be useful in other
> ways.
>
> Also, say the slot has been invalidated on the primary (due to inactivity timeout),
> primary is down and there is a failover. By keeping the inactive_since from
> the primary, one could know when the inactivity that lead to the timeout started.
>

So, this means that at promotion we won't set the current time for
inactive_since, which is not what the currently proposed patch is
doing. Moreover, doing the invalidation on a promoted standby based on
inactive_since of the primary node sounds debatable, because the
inactive_timeout could be different on the new node (promoted
standby).

> Again, more concerned about external logical decoding plugin than pub/sub here.
>
> > > I agree that tracking the activity time of a synced slot can be useful, why
> > > not creating a dedicated field for that purpose (and keep inactive_since a
> > > perfect "copy" of the primary)?
> > >
> >
> > We can have a separate field for this but not sure if it is worth it.
>
> OTOH I'm not sure that erasing this information from the primary is useful. I
> think that 2 fields would be the best option and would be less subject of
> misinterpretation.
>
> > > > Now, the other concern is that calling GetCurrentTimestamp()
> > > > could be costly when the values for the slot are not going to be
> > > > updated but if that happens we can optimize such that before acquiring
> > > > the slot we can have some minimal pre-checks to ensure whether we need
> > > > to update the slot or not.
> > >
> > > Right, but for a very active slot it is likely that we call GetCurrentTimestamp()
> > > during almost each sync cycle.
> > >
> >
> > True, but if we have to save a slot to disk each time to persist the
> > changes (for an active slot) then probably GetCurrentTimestamp()
> > shouldn't be costly enough to matter.
>
> Right, persisting the changes to disk would be even more costly.
>

The point I was making is that currently, after copying the
remote node's values, we always persist the slots to disk, so the cost
of taking the current time shouldn't matter much. Now, if the values
don't change then there is probably some cost, but in most cases
(active slots) the values will always change. Also, if all the slots
are inactive then we will slow down the speed of sync. We also need to
consider whether we want to copy the value of inactive_since from the
primary, and if that is the only value that changed, whether we should
persist the slot or not.

--
With Regards,
Amit Kapila.



Hi,

On Mon, Apr 01, 2024 at 09:04:43AM +0530, Amit Kapila wrote:
> On Fri, Mar 29, 2024 at 6:17 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > On Fri, Mar 29, 2024 at 03:03:01PM +0530, Amit Kapila wrote:
> > > On Fri, Mar 29, 2024 at 11:49 AM Bertrand Drouvot
> > > <bertranddrouvot.pg@gmail.com> wrote:
> > > >
> > > > On Fri, Mar 29, 2024 at 09:39:31AM +0530, Amit Kapila wrote:
> > > > >
> > > > > Commit message states: "why we can't just update inactive_since for
> > > > > synced slots on the standby with the value received from remote slot
> > > > > on the primary. This is consistent with any other slot parameter i.e.
> > > > > all of them are synced from the primary."
> > > > >
> > > > > The inactive_since is not consistent with other slot parameters which
> > > > > we copy. We don't perform anything related to those other parameters
> > > > > like say two_phase phase which can change that property. However, we
> > > > > do acquire the slot, advance the slot (as per recent discussion [1]),
> > > > > and release it. Since these operations can impact inactive_since, it
> > > > > seems to me that inactive_since is not the same as other parameters.
> > > > > It can have a different value than the primary. Why would anyone want
> > > > > to know the value of inactive_since from primary after the standby is
> > > > > promoted?
> > > >
> > > > I think it can be useful "before" it is promoted and in case the primary is down.
> > > >
> > >
> > > It is not clear to me what is user going to do by checking the
> > > inactivity time for slots when the corresponding server is down.
> >
> > Say a failover needs to be done, then it could be useful to know for which
> > slots the activity needs to be resumed (thinking about external logical decoding
> > plugin, not about pub/sub here). If one see an inactive slot (since long "enough")
> > then he can start to reasonate about what to do with it.
> >
> > > I thought the idea was to check such slots and see if they need to be
> > > dropped or enabled again to avoid excessive disk usage, etc.
> >
> > Yeah that's the case but it does not mean inactive_since can't be useful in other
> > ways.
> >
> > Also, say the slot has been invalidated on the primary (due to inactivity timeout),
> > primary is down and there is a failover. By keeping the inactive_since from
> > the primary, one could know when the inactivity that lead to the timeout started.
> >
> 
> So, this means at promotion, we won't set the current_time for
> inactive_since which is not what the currently proposed patch is
> doing.

Yeah, that's why I made the comment T2 in [1].

> Moreover, doing the invalidation on promoted standby based on
> inactive_since of the primary node sounds debatable because the
> inactive_timeout could be different on the new node (promoted
> standby).

I think that if the slot is not invalidated before the promotion then we should
erase the value from the primary and use the promotion time.

> > Again, more concerned about external logical decoding plugin than pub/sub here.
> >
> > > > I agree that tracking the activity time of a synced slot can be useful, why
> > > > not creating a dedicated field for that purpose (and keep inactive_since a
> > > > perfect "copy" of the primary)?
> > > >
> > >
> > > We can have a separate field for this but not sure if it is worth it.
> >
> > OTOH I'm not sure that erasing this information from the primary is useful. I
> > think that 2 fields would be the best option and would be less subject of
> > misinterpretation.
> >
> > > > > Now, the other concern is that calling GetCurrentTimestamp()
> > > > > could be costly when the values for the slot are not going to be
> > > > > updated but if that happens we can optimize such that before acquiring
> > > > > the slot we can have some minimal pre-checks to ensure whether we need
> > > > > to update the slot or not.
> > > >
> > > > Right, but for a very active slot it is likely that we call GetCurrentTimestamp()
> > > > during almost each sync cycle.
> > > >
> > >
> > > True, but if we have to save a slot to disk each time to persist the
> > > changes (for an active slot) then probably GetCurrentTimestamp()
> > > shouldn't be costly enough to matter.
> >
> > Right, persisting the changes to disk would be even more costly.
> >
> 
> The point I was making is that currently after copying the
> remote_node's values, we always persist the slots to disk, so the cost
> of current_time shouldn't be much.

Oh right, I missed this (I was focusing only on inactive_since, which we don't
persist to disk, IIRC).

BTW, if we are going this way, maybe we could accept a bit less accuracy
and use GetCurrentTransactionStopTimestamp() instead?

> Now, if the values won't change
> then probably there is some cost but in most cases (active slots), the
> values will always change.

Right.

> Also, if all the slots are inactive then we
> will slow down the speed of sync.

Yes.

> We also need to consider if we want
> to copy the value of inactive_since from the primary and if that is
> the only value changed then shall we persist the slot or not?

Good point; then I don't think we should, as inactive_since is not persisted on disk.

[1]: https://www.postgresql.org/message-id/ZgU70MjdOfO6l0O0%40ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Hi,

On Mon, Apr 01, 2024 at 08:47:59AM +0530, Bharath Rupireddy wrote:
> On Fri, Mar 29, 2024 at 9:39 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > Commit message states: "why we can't just update inactive_since for
> > synced slots on the standby with the value received from remote slot
> > on the primary. This is consistent with any other slot parameter i.e.
> > all of them are synced from the primary."
> >
> > The inactive_since is not consistent with other slot parameters which
> > we copy. We don't perform anything related to those other parameters
> > like say two_phase phase which can change that property. However, we
> > do acquire the slot, advance the slot (as per recent discussion [1]),
> > and release it. Since these operations can impact inactive_since, it
> > seems to me that inactive_since is not the same as other parameters.
> > It can have a different value than the primary. Why would anyone want
> > to know the value of inactive_since from primary after the standby is
> > promoted?
> 
> After thinking about it for a while now, it feels to me that the
> synced slots (slots on the standby that are being synced from the
> primary) can have their own inactive_sicne value. Fundamentally,
> inactive_sicne is set to 0 when slot is acquired and set to current
> time when slot is released, no matter who acquires and releases it -
> be it walsenders for replication, or backends for slot advance, or
> backends for slot sync using pg_sync_replication_slots, or backends
> for other slot functions, or background sync worker. Remember the
> earlier patch was updating inactive_since just for walsenders, but
> then the suggestion was to update it unconditionally -
> https://www.postgresql.org/message-id/CAJpy0uD64X%3D2ENmbHaRiWTKeQawr-rbGoy_GdhQQLVXzUSKTMg%40mail.gmail.com.
> Whoever syncs the slot, *acutally* acquires the slot i.e. makes it
> theirs, syncs it from the primary, and releases it. IMO, no
> differentiation is to be made for synced slots.
> 
> There was a suggestion on using inactive_since of the synced slot on
> the standby to know the inactivity of the slot on the primary. If one
> wants to do that, they better look at/monitor the primary slot
> info/logs/pg_replication_slot/whatever.

Yeah, but the use case was for when the primary is down for whatever reason.

> I really don't see a point in
> having two different meanings for a single property of a replication
> slot - inactive_since for a regular slot tells since when this slot
> has become inactive, and for a synced slot since when the
> corresponding remote slot has become inactive. I think this will
> confuse users for sure.

I'm not sure as we are speaking about "synced" slots. I can also see some confusion
if this value is not "synced".

> Also, if inactive_since is being changed on the primary so frequently,
> and none of the other parameters are changing, if we copy
> inactive_since to the synced slots, then standby will just be doing
> *sync* work (mark the slots dirty and save to disk) for updating
> inactive_since. I think this is unnecessary behaviour for sure.

Right, I think we should avoid saving the slot to disk in that case (question raised
by Amit in [1]).

> Coming to a future patch for inactive timeout based slot invalidation,
> we can either allow invalidation without any differentiation for
> synced slots or restrict invalidation to avoid more sync work. For
> instance, if inactive timeout is kept low on the standby, the sync
> worker will be doing more work as it drops and recreates a slot
> repeatedly if it keeps getting invalidated. Another thing is that the
> standby takes independent invalidation decisions for synced slots.
> AFAICS, invalidation due to wal_removal is the only sole reason (out
> of all available invalidation reasons) for a synced slot to get
> invalidated independently of the primary. Check
> https://www.postgresql.org/message-id/CAA4eK1JXBwTaDRD_%3D8t6UB1fhRNjC1C%2BgH4YdDxj_9U6djLnXw%40mail.gmail.com
> for the suggestion on we better not differentiaing invalidation
> decisions for synced slots.

Yeah, I think the invalidation decision on the standby is highly linked to
what inactive_since on the standby is: synced from primary or not.

> The assumption of letting synced slots have their own inactive_since
> not only simplifies the code, but also looks less-confusing and more
> meaningful to the user.

I'm not sure at all. But if the majority of us thinks it's the case then let's
go that way.

> > Now, the other concern is that calling GetCurrentTimestamp()
> > could be costly when the values for the slot are not going to be
> > updated but if that happens we can optimize such that before acquiring
> > the slot we can have some minimal pre-checks to ensure whether we need
> > to update the slot or not.

Also maybe we could accept a bit less accuracy and use
GetCurrentTransactionStopTimestamp() instead?

> If we are too much concerned about the cost of GetCurrentTimestamp(),
> a possible approach is just don't set inactive_since for slots being
> synced on the standby.
> Just let the first acquisition and release
> after the promotion do that job. We can always call this out in the
> docs saying "replication slots on the streaming standbys which are
> being synced from the primary are not inactive in practice, so the
> inactive_since is always NULL for them unless the standby is
> promoted".

I think that was the initial behavior that led to Robert's remark (see [2]):

"
And I'm suspicious that having an exception for slots being synced is
a bad idea. That makes too much of a judgement about how the user will
use this field. It's usually better to just expose the data, and if
the user needs helps to make sense of that data, then give them that
help separately.
"

[1]: https://www.postgresql.org/message-id/CAA4eK1JtKieWMivbswYg5FVVB5FugCftLvQKVsxh%3Dm_8nk04vw%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/CA%2BTgmob_Ta-t2ty8QrKHBGnNLrf4ZYcwhGHGFsuUoFrAEDw4sA%40mail.gmail.com

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Hi,

On Sun, Mar 31, 2024 at 10:25:46AM +0530, Bharath Rupireddy wrote:
> On Thu, Mar 28, 2024 at 3:13 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> > I think in this case it should always reflect the value from the primary (so
> > that one can understand why it is invalidated).
> 
> I'll come back to this as soon as we all agree on inactive_since
> behavior for synced slots.

Makes sense. Also if the majority of us thinks it's not needed for inactive_since
to be an exact copy of the primary, then let's go that way.

> > I think when it is invalidated it should always reflect the value from the
> > primary (so that one can understand why it is invalidated).
> 
> I'll come back to this as soon as we all agree on inactive_since
> behavior for synced slots.

Yeah.

> > T4 ===
> >
> > Also, it looks like querying pg_replication_slots() does not trigger an
> > invalidation: I think it should if the slot is not invalidated yet (and matches
> > the invalidation criteria).
> 
> There's a different opinion on this, check comment #3 from
> https://www.postgresql.org/message-id/CAA4eK1LLj%2BeaMN-K8oeOjfG%2BUuzTY%3DL5PXbcMJURZbFm%2B_aJSA%40mail.gmail.com.

Oh right, I can see Amit's point too. Let's put pg_replication_slots() out of
the game then.

> > CR6 ===
> >
> > +static bool
> > +InvalidateSlotForInactiveTimeout(ReplicationSlot *slot, bool need_locks)
> > +{
> >
> > InvalidatePossiblyInactiveSlot() maybe?
> 
> I think we would lose the essence, i.e. the timeout, from the suggested
> function name; just "inactive" alone doesn't convey the meaning clearly.
> I've kept it that way unless anyone suggests otherwise.

Right. OTOH I think that "Possibly" adds some nuance (like InvalidatePossiblyObsoleteSlot()
is already doing).

> Please see the attached v30 patch. 0002 is where all of the above
> review comments have been addressed.

Thanks! FYI, I did not look at the content yet, just replied to the above
comments.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Mon, Apr 1, 2024 at 12:18 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Fri, Mar 29, 2024 at 9:39 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > Commit message states: "why we can't just update inactive_since for
> > synced slots on the standby with the value received from remote slot
> > on the primary. This is consistent with any other slot parameter i.e.
> > all of them are synced from the primary."
> >
> > The inactive_since is not consistent with other slot parameters which
> > we copy. We don't perform anything related to those other parameters
> > like say two_phase which can change that property. However, we
> > do acquire the slot, advance the slot (as per recent discussion [1]),
> > and release it. Since these operations can impact inactive_since, it
> > seems to me that inactive_since is not the same as other parameters.
> > It can have a different value than the primary. Why would anyone want
> > to know the value of inactive_since from primary after the standby is
> > promoted?
>
> After thinking about it for a while now, it feels to me that the
> synced slots (slots on the standby that are being synced from the
> primary) can have their own inactive_since value. Fundamentally,
> inactive_since is set to 0 when the slot is acquired and set to the current
> time when the slot is released, no matter who acquires and releases it -
> be it walsenders for replication, or backends for slot advance, or
> backends for slot sync using pg_sync_replication_slots, or backends
> for other slot functions, or background sync worker. Remember the
> earlier patch was updating inactive_since just for walsenders, but
> then the suggestion was to update it unconditionally -
> https://www.postgresql.org/message-id/CAJpy0uD64X%3D2ENmbHaRiWTKeQawr-rbGoy_GdhQQLVXzUSKTMg%40mail.gmail.com.
> Whoever syncs the slot *actually* acquires the slot, i.e. makes it
> theirs, syncs it from the primary, and releases it. IMO, no
> differentiation is to be made for synced slots.

FWIW, coming to this thread late, I think that the inactive_since
should not be synchronized from the primary. The wall clocks are
different on the primary and the standby so having the primary's
timestamp on the standby can confuse users, especially when there is a
big clock drift. Also, as Amit mentioned, inactive_since seems not to
be consistent with other parameters we copy. The
replication_slot_inactive_timeout feature should work on the standby
independent from the primary, like other slot invalidation mechanisms,
and it should be based on its own local clock.

> Coming to a future patch for inactive timeout based slot invalidation,
> we can either allow invalidation without any differentiation for
> synced slots or restrict invalidation to avoid more sync work. For
> instance, if inactive timeout is kept low on the standby, the sync
> worker will be doing more work as it drops and recreates a slot
> repeatedly if it keeps getting invalidated. Another thing is that the
> standby takes independent invalidation decisions for synced slots.
> AFAICS, invalidation due to wal_removal is the only reason (out
> of all available invalidation reasons) for a synced slot to get
> invalidated independently of the primary. Check
> https://www.postgresql.org/message-id/CAA4eK1JXBwTaDRD_%3D8t6UB1fhRNjC1C%2BgH4YdDxj_9U6djLnXw%40mail.gmail.com
> for the suggestion that we'd better not differentiate invalidation
> decisions for synced slots.
>
> The assumption of letting synced slots have their own inactive_since
> not only simplifies the code, but also looks less-confusing and more
> meaningful to the user. The only code that we put in on top of the
> committed code is to use InRecovery in place of
> RecoveryInProgress() in RestoreSlotFromDisk() to fix the issue raised
> by Shveta upthread.

If we want to invalidate the synced slots due to the timeout, I think
we need to define what is "inactive" for synced slots.

Suppose that the slotsync worker updates the local (synced) slot's
inactive_since whenever releasing the slot, irrespective of the actual
LSNs (or other slot parameters) having been updated. I think that this
idea cannot handle a slot that is not acquired on the primary. In this
case, the remote slot is inactive but the local slot is regarded as
active.  WAL files are piled up on the standby (and on the primary) as
the slot's LSNs don't move forward. I think we want to regard such a
slot as "inactive" also on the standby and invalidate it because of
the timeout.

>
> > Now, the other concern is that calling GetCurrentTimestamp()
> > could be costly when the values for the slot are not going to be
> > updated but if that happens we can optimize such that before acquiring
> > the slot we can have some minimal pre-checks to ensure whether we need
> > to update the slot or not.

If we use such pre-checks, another problem might happen; it cannot
handle a case where the slot is acquired on the primary but its LSNs
don't move forward. Imagine a logical replication conflict happened on
the subscriber, and the logical replication enters the retry loop. In
this case, the remote slot's inactive_since gets updated for every
retry, but it looks inactive from the standby since the slot LSNs
don't change. Therefore, only the local slot could be invalidated due
to the timeout but probably we don't want to regard such a slot as
"inactive".

Another idea I came up with is that the slotsync worker updates the
local slot's inactive_since to the local timestamp only when the
remote slot might have got inactive. If the remote slot is acquired by
someone, the local slot's inactive_since is also NULL. If the remote
slot gets inactive, the slotsync worker sets the local timestamp to
the local slot's inactive_since. Since the remote slot could be
acquired and released before the slotsync worker gets the remote slot
data again, if the remote slot's inactive_since > the local slot's
inactive_since, the slotsync worker updates the local one. IOW, we
detect whether the remote slot was acquired and released since the
last synchronization, by checking the remote slot's inactive_since.
This idea seems to handle these cases I mentioned unless I'm missing
something, but it requires the slotsync worker to update
inactive_since in a different way than other parameters.
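
To sketch that idea in the sync path (hypothetical field and variable names,
not an actual patch):

    /*
     * Mirror the remote slot's state: treat the local slot as active while
     * the remote slot is acquired, and restart the local inactivity clock
     * (using the local wall clock) whenever the remote slot has been
     * released since the last sync cycle.
     */
    if (remote_slot->inactive_since == 0)
        slot->inactive_since = 0;       /* remote slot is acquired */
    else if (remote_slot->inactive_since > slot->inactive_since)
        slot->inactive_since = GetCurrentTimestamp();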

Or a simple solution is that the slotsync worker updates
inactive_since as it does for non-synced slots, and disables
timeout-based slot invalidation for synced slots.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



Hi,

On Tue, Apr 02, 2024 at 12:07:54PM +0900, Masahiko Sawada wrote:
> On Mon, Apr 1, 2024 at 12:18 PM Bharath Rupireddy
> 
> FWIW, coming to this thread late, I think that the inactive_since
> should not be synchronized from the primary. The wall clocks are
> different on the primary and the standby so having the primary's
> timestamp on the standby can confuse users, especially when there is a
> big clock drift. Also, as Amit mentioned, inactive_since seems not to
> be consistent with other parameters we copy. The
> replication_slot_inactive_timeout feature should work on the standby
> independent from the primary, like other slot invalidation mechanisms,
> and it should be based on its own local clock.

Thanks for sharing your thoughts! So, it looks like most of us agree not to
sync inactive_since from the primary; I'm fine with that.

> If we want to invalidate the synced slots due to the timeout, I think
> we need to define what is "inactive" for synced slots.
> 
> Suppose that the slotsync worker updates the local (synced) slot's
> inactive_since whenever releasing the slot, irrespective of the actual
> LSNs (or other slot parameters) having been updated. I think that this
> idea cannot handle a slot that is not acquired on the primary. In this
> case, the remote slot is inactive but the local slot is regarded as
> active.  WAL files are piled up on the standby (and on the primary) as
> the slot's LSNs don't move forward. I think we want to regard such a
> slot as "inactive" also on the standby and invalidate it because of
> the timeout.

I think it makes sense to somehow link inactive_since on the standby to
whether the actual LSNs (or other slot parameters) are being updated or not.

> > > Now, the other concern is that calling GetCurrentTimestamp()
> > > could be costly when the values for the slot are not going to be
> > > updated but if that happens we can optimize such that before acquiring
> > > the slot we can have some minimal pre-checks to ensure whether we need
> > > to update the slot or not.
> 
> If we use such pre-checks, another problem might happen; it cannot
> handle a case where the slot is acquired on the primary but its LSNs
> don't move forward. Imagine a logical replication conflict happened on
> the subscriber, and the logical replication enters the retry loop. In
> this case, the remote slot's inactive_since gets updated for every
> retry, but it looks inactive from the standby since the slot LSNs
> don't change. Therefore, only the local slot could be invalidated due
> to the timeout but probably we don't want to regard such a slot as
> "inactive".
> 
> Another idea I came up with is that the slotsync worker updates the
> local slot's inactive_since to the local timestamp only when the
> remote slot might have got inactive. If the remote slot is acquired by
> someone, the local slot's inactive_since is also NULL. If the remote
> slot gets inactive, the slotsync worker sets the local timestamp to
> the local slot's inactive_since. Since the remote slot could be
> acquired and released before the slotsync worker gets the remote slot
> data again, if the remote slot's inactive_since > the local slot's
> inactive_since, the slotsync worker updates the local one.

Then I think we would need to be careful about time zone comparison.

> IOW, we
> detect whether the remote slot was acquired and released since the
> last synchronization, by checking the remote slot's inactive_since.
> This idea seems to handle these cases I mentioned unless I'm missing
> something, but it requires for the slotsync worker to update
> inactive_since in a different way than other parameters.
> 
> Or a simple solution is that the slotsync worker updates
> inactive_since as it does for non-synced slots, and disables
> timeout-based slot invalidation for synced slots.

Yeah, I think the main question to help us decide is: do we want to invalidate
"inactive" synced slots locally (in addition to synchronizing the invalidation
from the primary)? 

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Tue, Apr 2, 2024 at 11:58 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> > Or a simple solution is that the slotsync worker updates
> > inactive_since as it does for non-synced slots, and disables
> > timeout-based slot invalidation for synced slots.
>
> Yeah, I think the main question to help us decide is: do we want to invalidate
> "inactive" synced slots locally (in addition to synchronizing the invalidation
> from the primary)?

I think this approach looks way simpler than the other one. The other
approach of linking inactive_since on the standby for synced slots to
the actual LSNs (or other slot parameters) being updated or not looks
more complicated, and might not go well with the end user. However,
we need to be able to say why we don't invalidate synced slots due to
inactive timeout, unlike the wal_removed invalidation that can happen
right now on the standby for synced slots. This leads us to actually
define what it means for a slot to be active. Is syncing the data from
the remote slot considered as the slot being active?

On the other hand, it may not sound great if we don't invalidate
synced slots due to inactive timeout even though they hold resources
such as WAL and XIDs.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Hi,

On Tue, Apr 02, 2024 at 12:41:35PM +0530, Bharath Rupireddy wrote:
> On Tue, Apr 2, 2024 at 11:58 AM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > > Or a simple solution is that the slotsync worker updates
> > > inactive_since as it does for non-synced slots, and disables
> > > timeout-based slot invalidation for synced slots.
> >
> > Yeah, I think the main question to help us decide is: do we want to invalidate
> > "inactive" synced slots locally (in addition to synchronizing the invalidation
> > from the primary)?
> 
> I think this approach looks way simpler than the other one. The other
> approach of linking inactive_since on the standby for synced slots to
> the actual LSNs (or other slot parameters) being updated or not looks
> more complicated, and might not go well with the end user.  However,
> we need to be able to say why we don't invalidate synced slots due to
> inactive timeout unlike the wal_removed invalidation that can happen
> right now on the standby for synced slots. This leads us to define
> actually what a slot being active means. Is syncing the data from the
> remote slot considered as the slot being active?
> 
> On the other hand, it may not sound great if we don't invalidate
> synced slots due to inactive timeout even though they hold resources
> such as WAL and XIDs.

Right and the "only" benefit then would be to give an idea as to when the last
sync did occur on the local slot.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Tue, Apr 2, 2024 at 11:58 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Tue, Apr 02, 2024 at 12:07:54PM +0900, Masahiko Sawada wrote:
> > On Mon, Apr 1, 2024 at 12:18 PM Bharath Rupireddy
> >
> > FWIW, coming to this thread late, I think that the inactive_since
> > should not be synchronized from the primary. The wall clocks are
> > different on the primary and the standby so having the primary's
> > timestamp on the standby can confuse users, especially when there is a
> > big clock drift. Also, as Amit mentioned, inactive_since seems not to
> > be consistent with other parameters we copy. The
> > replication_slot_inactive_timeout feature should work on the standby
> > independent from the primary, like other slot invalidation mechanisms,
> > and it should be based on its own local clock.
>
> Thanks for sharing your thoughts! So, it looks like that most of us agree to not
> sync inactive_since from the primary, I'm fine with that.

+1 on not syncing inactive_since from the primary.

> > If we want to invalidate the synced slots due to the timeout, I think
> > we need to define what is "inactive" for synced slots.
> >
> > Suppose that the slotsync worker updates the local (synced) slot's
> > inactive_since whenever releasing the slot, irrespective of the actual
> > LSNs (or other slot parameters) having been updated. I think that this
> > idea cannot handle a slot that is not acquired on the primary. In this
> > case, the remote slot is inactive but the local slot is regarded as
> > active.  WAL files are piled up on the standby (and on the primary) as
> > the slot's LSNs don't move forward. I think we want to regard such a
> > slot as "inactive" also on the standby and invalidate it because of
> > the timeout.
>
> I think that makes sense to somehow link inactive_since on the standby to
> the actual LSNs (or other slot parameters) being updated or not.
>
> > > > Now, the other concern is that calling GetCurrentTimestamp()
> > > > could be costly when the values for the slot are not going to be
> > > > updated but if that happens we can optimize such that before acquiring
> > > > the slot we can have some minimal pre-checks to ensure whether we need
> > > > to update the slot or not.
> >
> > If we use such pre-checks, another problem might happen; it cannot
> > handle a case where the slot is acquired on the primary but its LSNs
> > don't move forward. Imagine a logical replication conflict happened on
> > the subscriber, and the logical replication enters the retry loop. In
> > this case, the remote slot's inactive_since gets updated for every
> > retry, but it looks inactive from the standby since the slot LSNs
> > don't change. Therefore, only the local slot could be invalidated due
> > to the timeout but probably we don't want to regard such a slot as
> > "inactive".
> >
> > Another idea I came up with is that the slotsync worker updates the
> > local slot's inactive_since to the local timestamp only when the
> > remote slot might have got inactive. If the remote slot is acquired by
> > someone, the local slot's inactive_since is also NULL. If the remote
> > slot gets inactive, the slotsync worker sets the local timestamp to
> > the local slot's inactive_since. Since the remote slot could be
> > acquired and released before the slotsync worker gets the remote slot
> > data again, if the remote slot's inactive_since > the local slot's
> > inactive_since, the slotsync worker updates the local one.
>
> Then I think we would need to be careful about time zone comparison.

Yes. Also, we need to consider the case when a user is relying on
pg_sync_replication_slots() and has not enabled the slot-sync worker. In
such a case, if the synced slot's inactive_since is derived from the
remote slot's inactivity, it might not be updated that frequently (it
depends on when the user actually runs the SQL function) and thus may be
misleading. OTOH, if inactive_since of synced slots represents their own
inactivity, then it will give correct info even for the case when the
SQL function is run after a long time and the slot-sync worker is
disabled.

> > IOW, we
> > detect whether the remote slot was acquired and released since the
> > last synchronization, by checking the remote slot's inactive_since.
> > This idea seems to handle these cases I mentioned unless I'm missing
> > something, but it requires for the slotsync worker to update
> > inactive_since in a different way than other parameters.
> >
> > Or a simple solution is that the slotsync worker updates
> > inactive_since as it does for non-synced slots, and disables
> > timeout-based slot invalidation for synced slots.

I like this idea better; it also takes care of the case when the
user is relying on the sync function rather than the worker and does not
want the slots to get invalidated in between two sync function calls.

> Yeah, I think the main question to help us decide is: do we want to invalidate
> "inactive" synced slots locally (in addition to synchronizing the invalidation
> from the primary)?


thanks
Shveta



On Wed, Apr 3, 2024 at 8:38 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> > > Or a simple solution is that the slotsync worker updates
> > > inactive_since as it does for non-synced slots, and disables
> > > timeout-based slot invalidation for synced slots.
>
> I like this idea better, it takes care of such a case too when the
> user is relying on sync-function rather than worker and does not want
> to get the slots invalidated in between 2 sync function calls.

Please find the attached v31 patches implementing the above idea:

- synced slots get their own inactive_since just like any other slot
- synced slots don't get invalidated due to inactive timeout because
such slots are not considered active at all as they don't perform logical
decoding (of course, they will do it in fast_forward mode to fix the
other data loss issue, but they don't generate changes for them to be
called *active* slots); a rough sketch of this check is below
- synced slots' inactive_since is set to the current timestamp after the
standby gets promoted to help interpret inactive_since correctly just
like any other slot.
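
For the second point, the check in the timeout-invalidation path boils down
to something like this (sketch only; the exact function and GUC names may
differ in the attached patch):

    /* Inactive timeout based invalidation is disabled. */
    if (replication_slot_inactive_timeout == 0)
        return false;

    /*
     * Synced slots on a standby are never invalidated due to inactive
     * timeout; they don't perform logical decoding, so they are not
     * "active" in that sense.
     */
    if (RecoveryInProgress() && slot->data.synced)
        return false;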

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
Hi,

On Wed, Apr 03, 2024 at 11:17:41AM +0530, Bharath Rupireddy wrote:
> On Wed, Apr 3, 2024 at 8:38 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > > > Or a simple solution is that the slotsync worker updates
> > > > inactive_since as it does for non-synced slots, and disables
> > > > timeout-based slot invalidation for synced slots.
> >
> > I like this idea better, it takes care of such a case too when the
> > user is relying on sync-function rather than worker and does not want
> > to get the slots invalidated in between 2 sync function calls.
> 
> Please find the attached v31 patches implementing the above idea:

Thanks!

Some comments related to v31-0001:

=== testing the behavior

T1 ===

> - synced slots get their on inactive_since just like any other slot

It behaves as described.

T2 ===

> - synced slots inactive_since is set to current timestamp after the
> standby gets promoted to help inactive_since interpret correctly just
> like any other slot.
 
It behaves as described.

CR1 ===

+        <structfield>inactive_since</structfield> value will get updated
+        after every synchronization

indicates the last synchronization time? (I think that after every synchronization
could lead to confusion).

CR2 ===

+                       /*
+                        * Set the time since the slot has become inactive after shutting
+                        * down slot sync machinery. This helps correctly interpret the
+                        * time if the standby gets promoted without a restart.
+                        */

It looks to me that this comment is not at the right place because there is
nothing after the comment that indicates that we shutdown the "slot sync machinery".

Maybe a better place is before the function definition and mention that this is
currently called when we shutdown the "slot sync machinery"?

CR3 ===

+                        * We get the current time beforehand and only once to avoid
+                        * system calls overhead while holding the lock.

s/avoid system calls overhead while holding the lock/avoid system calls while holding the spinlock/?

CR4 ===

+        * Set the time since the slot has become inactive. We get the current
+        * time beforehand to avoid system call overhead while holding the lock

Same.

CR5 ===

+       # Check that the captured time is sane
+       if (defined $reference_time)
+       {

s/Check that the captured time is sane/Check that the inactive_since is sane/?

Sorry, some of those comments could have been made when I reviewed v29-0001.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Hi,

On Wed, Apr 03, 2024 at 11:17:41AM +0530, Bharath Rupireddy wrote:
> On Wed, Apr 3, 2024 at 8:38 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > > > Or a simple solution is that the slotsync worker updates
> > > > inactive_since as it does for non-synced slots, and disables
> > > > timeout-based slot invalidation for synced slots.
> >
> > I like this idea better, it takes care of such a case too when the
> > user is relying on sync-function rather than worker and does not want
> > to get the slots invalidated in between 2 sync function calls.
> 
> Please find the attached v31 patches implementing the above idea:

Thanks!

Some comments regarding v31-0002:

=== testing the behavior

T1 ===

> - synced slots don't get invalidated due to inactive timeout because
> such slots not considered active at all as they don't perform logical
> decoding (of course, they will perform in fast_forward mode to fix the
> other data loss issue, but they don't generate changes for them to be
> called as *active* slots)

It behaves as described. OTOH, non-synced logical slots on the standby and
physical slots on the standby are invalidated, which is what is expected.

T2 ===

In case the slot is invalidated on the primary,

primary:

postgres=# select slot_name, inactive_since, invalidation_reason from pg_replication_slots where slot_name = 's1';
 slot_name |        inactive_since         | invalidation_reason
-----------+-------------------------------+---------------------
 s1        | 2024-04-03 06:56:28.075637+00 | inactive_timeout

then on the standby we get:

standby:

postgres=# select slot_name, inactive_since, invalidation_reason from pg_replication_slots where slot_name = 's1';
 slot_name |        inactive_since        | invalidation_reason
-----------+------------------------------+---------------------
 s1        | 2024-04-03 07:06:43.37486+00 | inactive_timeout

shouldn't the slot be dropped/recreated instead of updating inactive_since?

=== code

CR1 ===

+        Invalidates replication slots that are inactive for longer the
+        specified amount of time

s/for longer the/for longer than/?

CR2 ===

+        <literal>true</literal>) as such synced slots don't actually perform
+        logical decoding.

We're switching to fast-forward logical decoding due to [1], so I'm not sure that's 100%
accurate here. I'm not sure we need to specify a reason.

CR3 ===

+ errdetail("This slot has been invalidated because it was inactive for more than the time specified by
replication_slot_inactive_timeoutparameter.")));
 

I think we can remove "parameter" (see for example the error message in
validate_remote_info()) and reduce it a bit, something like?

"This slot has been invalidated because it was inactive for more than replication_slot_inactive_timeout"?

CR4 ===

+ appendStringInfoString(&err_detail, _("The slot has been inactive for more than the time specified by replication_slot_inactive_timeout parameter."));
 

Same.

CR5 ===

+       /*
+        * This function isn't expected to be called for inactive timeout based
+        * invalidation. A separate function InvalidateInactiveReplicationSlot is
+        * to be used for that.

Do you think it's worth to explain why?

CR6 ===

+       if (replication_slot_inactive_timeout == 0)
+               return false;
+       else if (slot->inactive_since > 0)

"else" is not needed here.

CR7 ===

+               SpinLockAcquire(&slot->mutex);
+
+               /*
+                * Check if the slot needs to be invalidated due to
+                * replication_slot_inactive_timeout GUC. We do this with the spinlock
+                * held to avoid race conditions -- for example the inactive_since
+                * could change, or the slot could be dropped.
+                */
+               now = GetCurrentTimestamp();

We should not call GetCurrentTimestamp() while holding a spinlock.

CR8 ===

+# Testcase start: Invalidate streaming standby's slot as well as logical
+# failover slot on primary due to inactive timeout GUC. Also, check the logical

s/inactive timeout GUC/replication_slot_inactive_timeout/?

CR9 ===

+# Start: Helper functions used for this test file
+# End: Helper functions used for this test file

I think that's the first TAP test with this comment. Not saying we should not but
why did you feel the need to add those?

[1]:
https://www.postgresql.org/message-id/OS0PR01MB5716B3942AE49F3F725ACA92943B2@OS0PR01MB5716.jpnprd01.prod.outlook.com

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Wed, Apr 3, 2024 at 11:17 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Apr 3, 2024 at 8:38 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > > > Or a simple solution is that the slotsync worker updates
> > > > inactive_since as it does for non-synced slots, and disables
> > > > timeout-based slot invalidation for synced slots.
> >
> > I like this idea better, it takes care of such a case too when the
> > user is relying on sync-function rather than worker and does not want
> > to get the slots invalidated in between 2 sync function calls.
>
> Please find the attached v31 patches implementing the above idea:
>

Thanks for the patches; please find a few comments:

v31-001:

1)
system-views.sgml:
value will get updated  after every synchronization from the
corresponding remote slot on the primary.

--This is confusing. It will be good to rephrase it.

2)
update_synced_slots_inactive_since()

--Maybe we should mention in the header that this function is called
only during promotion.

3) 040_standby_failover_slots_sync.pl:
We capture inactive_since_on_primary when we do this for the first time at #175
ALTER SUBSCRIPTION regress_mysub1 DISABLE"

But we again recreate the sub and disable it at line #280.
Do you think we shall get inactive_since_on_primary again here, to be
compared with inactive_since_on_new_primary later?


v31-002:
(I had reviewed v29-002 but missed to post comments,  I think these
are still applicable)

1) I think replication_slot_inactivity_timeout was recommended here
(instead of replication_slot_inactive_timeout), so please give it a
thought:
https://www.postgresql.org/message-id/202403260739.udlp7lxixktx%40alvherre.pgsql

2) Commit msg:
a)
"It is often easy for developers to set a timeout of say 1
or 2 or 3 days at slot level, after which the inactive slots get
dropped."

Shall we say invalidated rather than dropped?

b)
"To achieve the above, postgres introduces a GUC allowing users
set inactive timeout and then a slot stays inactive for this much
amount of time it invalidates the slot."

Broken sentence.

<have not reviewed 002 patch in detail yet>

thanks
Shveta



On Wed, Apr 3, 2024 at 12:20 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> On Wed, Apr 03, 2024 at 11:17:41AM +0530, Bharath Rupireddy wrote:
> > On Wed, Apr 3, 2024 at 8:38 AM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > > > Or a simple solution is that the slotsync worker updates
> > > > > inactive_since as it does for non-synced slots, and disables
> > > > > timeout-based slot invalidation for synced slots.
> > >
> > > I like this idea better, it takes care of such a case too when the
> > > user is relying on sync-function rather than worker and does not want
> > > to get the slots invalidated in between 2 sync function calls.
> >
> > Please find the attached v31 patches implementing the above idea:
>
> Thanks!
>
> Some comments related to v31-0001:
>
> === testing the behavior
>
> T1 ===
>
> > - synced slots get their on inactive_since just like any other slot
>
> It behaves as described.
>
> T2 ===
>
> > - synced slots inactive_since is set to current timestamp after the
> > standby gets promoted to help inactive_since interpret correctly just
> > like any other slot.
>
> It behaves as described.
>
> CR1 ===
>
> +        <structfield>inactive_since</structfield> value will get updated
> +        after every synchronization
>
> indicates the last synchronization time? (I think that after every synchronization
> could lead to confusion).
>

+1.

> CR2 ===
>
> +                       /*
> +                        * Set the time since the slot has become inactive after shutting
> +                        * down slot sync machinery. This helps correctly interpret the
> +                        * time if the standby gets promoted without a restart.
> +                        */
>
> It looks to me that this comment is not at the right place because there is
> nothing after the comment that indicates that we shutdown the "slot sync machinery".
>
> Maybe a better place is before the function definition and mention that this is
> currently called when we shutdown the "slot sync machinery"?
>

Won't it be better to have an assert for SlotSyncCtx->pid? IIRC, we
have some existing issues where we don't ensure that no one is running
the sync API before shutdown is complete, but I think we can deal with
that separately, and here we can still have an Assert.

> CR3 ===
>
> +                        * We get the current time beforehand and only once to avoid
> +                        * system calls overhead while holding the lock.
>
> s/avoid system calls overhead while holding the lock/avoid system calls while holding the spinlock/?
>

Is it valid to say that there is an overhead of this call while holding
the spinlock? I don't think we expect any other concurrent slot activity
at the time of promotion. The first reason seems good enough.

One other observation:
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -42,6 +42,7 @@
 #include "access/transam.h"
 #include "access/xlog_internal.h"
 #include "access/xlogrecovery.h"
+#include "access/xlogutils.h"

Is there a reason for this inclusion? I don't see any change which
should need this one.

--
With Regards,
Amit Kapila.



On Wed, Apr 3, 2024 at 2:58 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Wed, Apr 3, 2024 at 11:17 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Wed, Apr 3, 2024 at 8:38 AM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > > > Or a simple solution is that the slotsync worker updates
> > > > > inactive_since as it does for non-synced slots, and disables
> > > > > timeout-based slot invalidation for synced slots.
> > >
> > > I like this idea better, it takes care of such a case too when the
> > > user is relying on sync-function rather than worker and does not want
> > > to get the slots invalidated in between 2 sync function calls.
> >
> > Please find the attached v31 patches implementing the above idea:
> >
>
> Thanks for the patches, please find few comments:
>
> v31-001:
>
> 1)
> system-views.sgml:
> value will get updated  after every synchronization from the
> corresponding remote slot on the primary.
>
> --This is confusing. It will be good to rephrase it.
>
> 2)
> update_synced_slots_inactive_since()
>
> --May be, we should mention in the header that this function is called
> only during promotion.
>
> 3) 040_standby_failover_slots_sync.pl:
> We capture inactive_since_on_primary when we do this for the first time at #175
> ALTER SUBSCRIPTION regress_mysub1 DISABLE"
>
> But we again recreate the sub and disable it at line #280.
> Do you think we shall get inactive_since_on_primary again here, to be
> compared with inactive_since_on_new_primary later?
>

I think so.

Few additional comments on tests:
1.
+is( $standby1->safe_psql(
+ 'postgres',
+ "SELECT '$inactive_since_on_primary'::timestamptz <
'$inactive_since_on_standby'::timestamptz AND
+ '$inactive_since_on_standby'::timestamptz < '$slot_sync_time'::timestamptz;"

Shall we do a <= check, as we do in the main function
get_slot_inactive_since_value, since the time duration is small and the
values can be the same as well? Similarly, please check the other tests.

2.
+=item $node->get_slot_inactive_since_value(self, slot_name, reference_time)
+
+Get inactive_since column value for a given replication slot validating it
+against optional reference time.
+
+=cut
+
+sub get_slot_inactive_since_value

I see that all callers validate against reference time. It is better
to name it validate_slot_inactive_since rather than using get_* as the
main purpose is to validate the passed value.

--
With Regards,
Amit Kapila.



On Wed, Apr 3, 2024 at 12:20 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> > Please find the attached v31 patches implementing the above idea:
>
> Some comments related to v31-0001:
>
> === testing the behavior
>
> T1 ===
>
> > - synced slots get their on inactive_since just like any other slot
>
> It behaves as described.
>
> T2 ===
>
> > - synced slots inactive_since is set to current timestamp after the
> > standby gets promoted to help inactive_since interpret correctly just
> > like any other slot.
>
> It behaves as described.

Thanks for testing.

> CR1 ===
>
> +        <structfield>inactive_since</structfield> value will get updated
> +        after every synchronization
>
> indicates the last synchronization time? (I think that after every synchronization
> could lead to confusion).

Done.

> CR2 ===
>
> +                       /*
> +                        * Set the time since the slot has become inactive after shutting
> +                        * down slot sync machinery. This helps correctly interpret the
> +                        * time if the standby gets promoted without a restart.
> +                        */
>
> It looks to me that this comment is not at the right place because there is
> nothing after the comment that indicates that we shutdown the "slot sync machinery".
>
> Maybe a better place is before the function definition and mention that this is
> currently called when we shutdown the "slot sync machinery"?

Done.

> CR3 ===
>
> +                        * We get the current time beforehand and only once to avoid
> +                        * system calls overhead while holding the lock.
>
> s/avoid system calls overhead while holding the lock/avoid system calls while holding the spinlock/?

Done.

> CR4 ===
>
> +        * Set the time since the slot has become inactive. We get the current
> +        * time beforehand to avoid system call overhead while holding the lock
>
> Same.

Done.

> CR5 ===
>
> +       # Check that the captured time is sane
> +       if (defined $reference_time)
> +       {
>
> s/Check that the captured time is sane/Check that the inactive_since is sane/?
>
> Sorry if some of those comments could have been done while I did review v29-0001.

Done.

On Wed, Apr 3, 2024 at 2:58 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> Thanks for the patches, please find few comments:
>
> v31-001:
>
> 1)
> system-views.sgml:
> value will get updated  after every synchronization from the
> corresponding remote slot on the primary.
>
> --This is confusing. It will be good to rephrase it.

Done as per Bertrand's suggestion.

> 2)
> update_synced_slots_inactive_since()
>
> --May be, we should mention in the header that this function is called
> only during promotion.

Done as per Bertrand's suggestion.

> 3) 040_standby_failover_slots_sync.pl:
> We capture inactive_since_on_primary when we do this for the first time at #175
> ALTER SUBSCRIPTION regress_mysub1 DISABLE"
>
> But we again recreate the sub and disable it at line #280.
> Do you think we shall get inactive_since_on_primary again here, to be
> compared with inactive_since_on_new_primary later?

Hm. Done that. I'm now recapturing both slot_creation_time_on_primary and
inactive_since_on_primary before and after the CREATE SUBSCRIPTION that
creates the slot again on the primary/publisher.

On Wed, Apr 3, 2024 at 3:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > CR2 ===
> >
> > +                       /*
> > +                        * Set the time since the slot has become inactive after shutting
> > +                        * down slot sync machinery. This helps correctly interpret the
> > +                        * time if the standby gets promoted without a restart.
> > +                        */
> >
> > It looks to me that this comment is not at the right place because there is
> > nothing after the comment that indicates that we shutdown the "slot sync machinery".
> >
> > Maybe a better place is before the function definition and mention that this is
> > currently called when we shutdown the "slot sync machinery"?
> >
> Won't it be better to have an assert for SlotSyncCtx->pid? IIRC, we
> have some existing issues where we don't ensure that no one is running
> sync API before shutdown is complete but I think we can deal with that
> separately and here we can still have an Assert.

That can work to ensure the slot sync worker isn't running as
SlotSyncCtx->pid gets updated only for the slot sync worker. I added
this assertion for now.
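
For reference, it's just this at the top of
update_synced_slots_inactive_since() (give or take):

    /*
     * The slot sync worker must have exited by now; SlotSyncCtx->pid is
     * reset to InvalidPid when it stops.
     */
    Assert(SlotSyncCtx->pid == InvalidPid);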

We need to ensure (in a separate patch and thread) that there is no
backend acquiring the slot and performing sync while the slot sync
worker is shutting down. Otherwise, some of the slots may get resynced
and some may not while we are shutting down the slot sync worker as part
of the standby promotion, which might leave the slots in an inconsistent
state.

> > CR3 ===
> >
> > +                        * We get the current time beforehand and only once to avoid
> > +                        * system calls overhead while holding the lock.
> >
> > s/avoid system calls overhead while holding the lock/avoid system calls while holding the spinlock/?
> >
> Is it valid to say that there is overhead of this call while holding
> spinlock? Because I don't think at the time of promotion we expect any
> other concurrent slot activity. The first reason seems good enough.

No slot activity, but why does GetCurrentTimestamp() need to be called
every time in the loop?

> One other observation:
> --- a/src/backend/replication/slot.c
> +++ b/src/backend/replication/slot.c
> @@ -42,6 +42,7 @@
>  #include "access/transam.h"
>  #include "access/xlog_internal.h"
>  #include "access/xlogrecovery.h"
> +#include "access/xlogutils.h"
>
> Is there a reason for this inclusion? I don't see any change which
> should need this one.

Not anymore. It was earlier needed for using the InRecovery flag in
the previous approach.

On Wed, Apr 3, 2024 at 4:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > 3) 040_standby_failover_slots_sync.pl:
> > We capture inactive_since_on_primary when we do this for the first time at #175
> > ALTER SUBSCRIPTION regress_mysub1 DISABLE"
> >
> > But we again recreate the sub and disable it at line #280.
> > Do you think we shall get inactive_since_on_primary again here, to be
> > compared with inactive_since_on_new_primary later?
> >
>
> I think so.

Modified this to recapture the times before and after the slot gets recreated.

> Few additional comments on tests:
> 1.
> +is( $standby1->safe_psql(
> + 'postgres',
> + "SELECT '$inactive_since_on_primary'::timestamptz <
> '$inactive_since_on_standby'::timestamptz AND
> + '$inactive_since_on_standby'::timestamptz < '$slot_sync_time'::timestamptz;"
>
> Shall we do <= check as we are doing in the main function
> get_slot_inactive_since_value as the time duration is less so it can
> be the same as well? Similarly, please check other tests.

I get you. The tests can be so fast that losing a bit of precision
might cause them to fail. So, I've added an equality check for all the
tests.

> 2.
> +=item $node->get_slot_inactive_since_value(self, slot_name, reference_time)
> +
> +Get inactive_since column value for a given replication slot validating it
> +against optional reference time.
> +
> +=cut
> +
> +sub get_slot_inactive_since_value
>
> I see that all callers validate against reference time. It is better
> to name it validate_slot_inactive_since rather than using get_* as the
> main purpose is to validate the passed value.

Existing callers, yes. Also, I've removed the reference time as an
optional parameter.

Per an offlist chat with Amit, I've added the following note in
synchronize_one_slot:

@@ -584,6 +585,11 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid
remote_dbid)
          * overwriting 'invalidated' flag to remote_slot's value. See
          * InvalidatePossiblyObsoleteSlot() where it invalidates slot directly
          * if the slot is not acquired by other processes.
+         *
+         * XXX: If it ever turns out that slot acquire/release is costly for
+         * cases when none of the slot property is changed then we can do a
+         * pre-check to ensure that at least one of the slot property is
+         * changed before acquiring the slot.
          */
         ReplicationSlotAcquire(remote_slot->name, true);

Please find the attached v32-0001 patch with the above review comments
addressed. I'm working on review comments for 0002.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
Hi,

On Wed, Apr 03, 2024 at 05:12:12PM +0530, Bharath Rupireddy wrote:
> On Wed, Apr 3, 2024 at 4:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > + 'postgres',
> > + "SELECT '$inactive_since_on_primary'::timestamptz <
> > '$inactive_since_on_standby'::timestamptz AND
> > + '$inactive_since_on_standby'::timestamptz < '$slot_sync_time'::timestamptz;"
> >
> > Shall we do <= check as we are doing in the main function
> > get_slot_inactive_since_value as the time duration is less so it can
> > be the same as well? Similarly, please check other tests.
> 
> I get you. If the tests are so fast that losing a bit of precision
> might cause tests to fail. So, I'v added equality check for all the
> tests.

> Please find the attached v32-0001 patch with the above review comments
> addressed.

Thanks!

Just one comment on v32-0001:

+# Synced slot on the standby must get its own inactive_since.
+is( $standby1->safe_psql(
+               'postgres',
+               "SELECT '$inactive_since_on_primary'::timestamptz <= '$inactive_since_on_standby'::timestamptz AND
+                       '$inactive_since_on_standby'::timestamptz <= '$slot_sync_time'::timestamptz;"
+       ),
+       "t",
+       'synchronized slot has got its own inactive_since');
+

By using <= we are not testing that it must get its own inactive_since (as we
allow them to be equal in the test). I think we should just add some usleep()
where appropriate and deny equality during the tests on inactive_since.

Except for the above, v32-0001 LGTM.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Wed, Apr 3, 2024 at 6:46 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Just one comment on v32-0001:
>
> +# Synced slot on the standby must get its own inactive_since.
> +is( $standby1->safe_psql(
> +               'postgres',
> +               "SELECT '$inactive_since_on_primary'::timestamptz <= '$inactive_since_on_standby'::timestamptz AND
> +                       '$inactive_since_on_standby'::timestamptz <= '$slot_sync_time'::timestamptz;"
> +       ),
> +       "t",
> +       'synchronized slot has got its own inactive_since');
> +
>
> By using <= we are not testing that it must get its own inactive_since (as we
> allow them to be equal in the test). I think we should just add some usleep()
> where appropriate and deny equality during the tests on inactive_since.

Thanks. It looks like we can drop the equality in all of the
inactive_since comparisons. IIUC, all the TAP tests do run with
primary and standbys on a single BF animal. And, it looks like
assigning the inactive_since timestamps to perl variables is giving
microseconds-level precision
(./tmp_check/log/regress_log_040_standby_failover_slots_sync:inactive_since
2024-04-03 14:30:09.691648+00). FWIW, we already have some TAP and SQL
tests relying on stats_reset timestamps without equality. So, I've
left out the equality for the inactive_since tests.

> Except for the above, v32-0001 LGTM.

Thanks. Please see the attached v33-0001 patch after removing equality
on inactive_since TAP tests.

On Wed, Apr 3, 2024 at 1:47 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Some comments regarding v31-0002:
>
> === testing the behavior
>
> T1 ===
>
> > - synced slots don't get invalidated due to inactive timeout because
> > such slots not considered active at all as they don't perform logical
> > decoding (of course, they will perform in fast_forward mode to fix the
> > other data loss issue, but they don't generate changes for them to be
> > called as *active* slots)
>
> It behaves as described. OTOH non synced logical slots on the standby and
> physical slots on the standby are invalidated which is what is expected.

Right.

> T2 ===
>
> In case the slot is invalidated on the primary,
>
> primary:
>
> postgres=# select slot_name, inactive_since, invalidation_reason from pg_replication_slots where slot_name = 's1';
>  slot_name |        inactive_since         | invalidation_reason
> -----------+-------------------------------+---------------------
>  s1        | 2024-04-03 06:56:28.075637+00 | inactive_timeout
>
> then on the standby we get:
>
> standby:
>
> postgres=# select slot_name, inactive_since, invalidation_reason from pg_replication_slots where slot_name = 's1';
>  slot_name |        inactive_since        | invalidation_reason
> -----------+------------------------------+---------------------
>  s1        | 2024-04-03 07:06:43.37486+00 | inactive_timeout
>
> shouldn't the slot be dropped/recreated instead of updating inactive_since?

The sync slots that are invalidated on the primary aren't dropped and
recreated on the standby. There's no point in doing so because
invalidated slots on the primary can't be made useful. However, I
found that the synced slot is acquired and released unnecessarily
after the invalidation_reason is synced from the primary. I added a
skip check in synchronize_one_slot to skip acquiring and releasing the
slot if it's found already invalidated locally. With this, inactive_since
won't get updated for invalidated sync slots on the standby as we don't
acquire and release the slot.

> === code
>
> CR1 ===
>
> +        Invalidates replication slots that are inactive for longer the
> +        specified amount of time
>
> s/for longer the/for longer than/?

Fixed.

> CR2 ===
>
> +        <literal>true</literal>) as such synced slots don't actually perform
> +        logical decoding.
>
> We're switching in fast forward logical due to [1], so I'm not sure that's 100%
> accurate here. I'm not sure we need to specify a reason.

Fixed.

> CR3 ===
>
> + errdetail("This slot has been invalidated because it was inactive for more than the time specified by
replication_slot_inactive_timeoutparameter."))); 
>
> I think we can remove "parameter" (see for example the error message in
> validate_remote_info()) and reduce it a bit, something like?
>
> "This slot has been invalidated because it was inactive for more than replication_slot_inactive_timeout"?

Done.

> CR4 ===
>
> + appendStringInfoString(&err_detail, _("The slot has been inactive for more than the time specified by replication_slot_inactive_timeout parameter."));
>
> Same.

Done. Changed it to "The slot has been inactive for more than
replication_slot_inactive_timeout."

> CR5 ===
>
> +       /*
> +        * This function isn't expected to be called for inactive timeout based
> +        * invalidation. A separate function InvalidateInactiveReplicationSlot is
> +        * to be used for that.
>
> Do you think it's worth to explain why?

Hm, I just wanted to point out the actual function here. I modified it
to something like the following; if others feel we don't need that, I
can remove it.

    /*
     * Use InvalidateInactiveReplicationSlot for inactive timeout based
     * invalidation.
     */

> CR6 ===
>
> +       if (replication_slot_inactive_timeout == 0)
> +               return false;
> +       else if (slot->inactive_since > 0)
>
> "else" is not needed here.

Nothing wrong there, but removed.

> CR7 ===
>
> +               SpinLockAcquire(&slot->mutex);
> +
> +               /*
> +                * Check if the slot needs to be invalidated due to
> +                * replication_slot_inactive_timeout GUC. We do this with the spinlock
> +                * held to avoid race conditions -- for example the inactive_since
> +                * could change, or the slot could be dropped.
> +                */
> +               now = GetCurrentTimestamp();
>
> We should not call GetCurrentTimestamp() while holding a spinlock.

I was wondering why we should add the wait time for
LWLockAcquire(ReplicationSlotControlLock, LW_SHARED) into the captured
timestamp. So I've now moved the GetCurrentTimestamp() call before the
spinlock but after the LWLockAcquire.
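
i.e. roughly this shape (sketch, not the exact hunk; the GUC is assumed to
be in seconds here):

    LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);

    /* Capture the time outside the spinlock; only the LWLock is held. */
    now = GetCurrentTimestamp();

    SpinLockAcquire(&slot->mutex);
    if (slot->inactive_since > 0 &&
        TimestampDifferenceExceeds(slot->inactive_since, now,
                                   replication_slot_inactive_timeout * 1000))
        invalidated = true;
    SpinLockRelease(&slot->mutex);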

> CR8 ===
>
> +# Testcase start: Invalidate streaming standby's slot as well as logical
> +# failover slot on primary due to inactive timeout GUC. Also, check the logical
>
> s/inactive timeout GUC/replication_slot_inactive_timeout/?

Done.

> CR9 ===
>
> +# Start: Helper functions used for this test file
> +# End: Helper functions used for this test file
>
> I think that's the first TAP test with this comment. Not saying we should not but
> why did you feel the need to add those?

Hm. Removed.

> [1]:
https://www.postgresql.org/message-id/OS0PR01MB5716B3942AE49F3F725ACA92943B2@OS0PR01MB5716.jpnprd01.prod.outlook.com


On Wed, Apr 3, 2024 at 2:58 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> v31-002:
> (I had reviewed v29-002 but missed to post comments,  I think these
> are still applicable)
>
> 1) I think replication_slot_inactivity_timeout was recommended here
> (instead of replication_slot_inactive_timeout, so please give it a
> thought):
> https://www.postgresql.org/message-id/202403260739.udlp7lxixktx%40alvherre.pgsql

Yeah. It's synonymous with inactive_since. If others have an opinion
to have replication_slot_inactivity_timeout, I'm fine with it.

> 2) Commit msg:
> a)
> "It is often easy for developers to set a timeout of say 1
> or 2 or 3 days at slot level, after which the inactive slots get
> dropped."
>
> Shall we say invalidated rather than dropped?

Right. Done that.

> b)
> "To achieve the above, postgres introduces a GUC allowing users
> set inactive timeout and then a slot stays inactive for this much
> amount of time it invalidates the slot."
>
> Broken sentence.

Reworded it a bit.

Please find the attached v33 patches.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
Hi,

On Wed, Apr 03, 2024 at 08:28:04PM +0530, Bharath Rupireddy wrote:
> On Wed, Apr 3, 2024 at 6:46 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > Just one comment on v32-0001:
> >
> > +# Synced slot on the standby must get its own inactive_since.
> > +is( $standby1->safe_psql(
> > +               'postgres',
> > +               "SELECT '$inactive_since_on_primary'::timestamptz <= '$inactive_since_on_standby'::timestamptz AND
> > +                       '$inactive_since_on_standby'::timestamptz <= '$slot_sync_time'::timestamptz;"
> > +       ),
> > +       "t",
> > +       'synchronized slot has got its own inactive_since');
> > +
> >
> > By using <= we are not testing that it must get its own inactive_since (as we
> > allow them to be equal in the test). I think we should just add some usleep()
> > where appropriate and deny equality during the tests on inactive_since.
> 
> > Except for the above, v32-0001 LGTM.
> 
> Thanks. Please see the attached v33-0001 patch after removing equality
> on inactive_since TAP tests.

Thanks! v33-0001 LGTM.

> On Wed, Apr 3, 2024 at 1:47 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> > Some comments regarding v31-0002:
> >
> > T2 ===
> >
> > In case the slot is invalidated on the primary,
> >
> > primary:
> >
> > postgres=# select slot_name, inactive_since, invalidation_reason from pg_replication_slots where slot_name = 's1';
> >  slot_name |        inactive_since         | invalidation_reason
> > -----------+-------------------------------+---------------------
> >  s1        | 2024-04-03 06:56:28.075637+00 | inactive_timeout
> >
> > then on the standby we get:
> >
> > standby:
> >
> > postgres=# select slot_name, inactive_since, invalidation_reason from pg_replication_slots where slot_name = 's1';
> >  slot_name |        inactive_since        | invalidation_reason
> > -----------+------------------------------+---------------------
> >  s1        | 2024-04-03 07:06:43.37486+00 | inactive_timeout
> >
> > shouldn't the slot be dropped/recreated instead of updating inactive_since?
> 
> The sync slots that are invalidated on the primary aren't dropped and
> recreated on the standby.

Yeah, right (I was confused with synced slots that are invalidated locally).

> However, I
> found that the synced slot is acquired and released unnecessarily
> after the invalidation_reason is synced from the primary. I added a
> skip check in synchronize_one_slot to skip acquiring and releasing the
> slot if it's locally found inactive. With this, inactive_since won't
> get updated for invalidated sync slots on the standby as we don't
> acquire and release the slot.

CR1 ===

Yeah, I can see:

@@ -575,6 +575,13 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
                                                   " name slot \"%s\" already exists on the standby",
                                                   remote_slot->name));

+               /*
+                * Skip the sync if the local slot is already invalidated. We do this
+                * beforehand to save on slot acquire and release.
+                */
+               if (slot->data.invalidated != RS_INVAL_NONE)
+                       return false;

Thanks to the drop_local_obsolete_slots() call I think we are not missing the case
where the slot has been invalidated on the primary, invalidation reason has been
synced on the standby and later the slot is dropped/recreated manually on the
primary (then it should be dropped/recreated on the standby too).

Also it seems we are not missing the case where a sync slot is invalidated
locally due to wal removal (it should be dropped/recreated).

> 
> > CR5 ===
> >
> > +       /*
> > +        * This function isn't expected to be called for inactive timeout based
> > +        * invalidation. A separate function InvalidateInactiveReplicationSlot is
> > +        * to be used for that.
> >
> > Do you think it's worth to explain why?
> 
> Hm, I just wanted to point out the actual function here. I modified it
> to something like the following, if others feel we don't need that, I
> can remove it.

Sorry if I was not clear, but I meant to say "Do you think it's worth explaining
why we decided to create a dedicated function?" (currently we "just" explain that
we created one).

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Wed, Apr 3, 2024 at 11:58 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
>
> Please find the attached v33 patches.

@@ -1368,6 +1416,7 @@ ShutDownSlotSync(void)
    if (SlotSyncCtx->pid == InvalidPid)
    {
        SpinLockRelease(&SlotSyncCtx->mutex);
+       update_synced_slots_inactive_since();
        return;
    }
    SpinLockRelease(&SlotSyncCtx->mutex);
@@ -1400,6 +1449,8 @@ ShutDownSlotSync(void)
    }

    SpinLockRelease(&SlotSyncCtx->mutex);
+
+   update_synced_slots_inactive_since();
 }

Why do we want to update all synced slots' inactive_since values at
shutdown in spite of updating the value every time when releasing the
slot? It seems to contradict the fact that inactive_since is updated
when releasing or restoring the slot.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



On Thu, Apr 4, 2024 at 9:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> @@ -1368,6 +1416,7 @@ ShutDownSlotSync(void)
>     if (SlotSyncCtx->pid == InvalidPid)
>     {
>         SpinLockRelease(&SlotSyncCtx->mutex);
> +       update_synced_slots_inactive_since();
>         return;
>     }
>     SpinLockRelease(&SlotSyncCtx->mutex);
> @@ -1400,6 +1449,8 @@ ShutDownSlotSync(void)
>     }
>
>     SpinLockRelease(&SlotSyncCtx->mutex);
> +
> +   update_synced_slots_inactive_since();
>  }
>
> Why do we want to update all synced slots' inactive_since values at
> shutdown in spite of updating the value every time when releasing the
> slot? It seems to contradict the fact that inactive_since is updated
> when releasing or restoring the slot.

It is to get the inactive_since right for the cases where the standby
is promoted without a restart, similar to when a standby is promoted
with a restart, in which case inactive_since is set to the current
time in RestoreSlotFromDisk.

Imagine the slot is synced for the last time at time t1 and then a few
hours pass before the standby is promoted without a restart. If we
don't set inactive_since to the current time in this case in
ShutDownSlotSync, the inactive timeout invalidation mechanism can kick
in immediately.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Wed, Apr 3, 2024 at 8:28 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Wed, Apr 3, 2024 at 6:46 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> >
> > Just one comment on v32-0001:
> >
> > +# Synced slot on the standby must get its own inactive_since.
> > +is( $standby1->safe_psql(
> > +               'postgres',
> > +               "SELECT '$inactive_since_on_primary'::timestamptz <= '$inactive_since_on_standby'::timestamptz AND
> > +                       '$inactive_since_on_standby'::timestamptz <= '$slot_sync_time'::timestamptz;"
> > +       ),
> > +       "t",
> > +       'synchronized slot has got its own inactive_since');
> > +
> >
> > By using <= we are not testing that it must get its own inactive_since (as we
> > allow them to be equal in the test). I think we should just add some usleep()
> > where appropriate and deny equality during the tests on inactive_since.
>
> Thanks. It looks like we can ignore the equality in all of the
> inactive_since comparisons. IIUC, all the TAP tests do run with
> primary and standbys on the single BF animals. And, it looks like
> assigning the inactive_since timestamps to perl variables is giving
> the microseconds precision level
> (./tmp_check/log/regress_log_040_standby_failover_slots_sync:inactive_since
> 2024-04-03 14:30:09.691648+00). FWIW, we already have some TAP and SQL
> tests relying on stats_reset timestamps without equality. So, I've
> left the equality for the inactive_since tests.
>
> > Except for the above, v32-0001 LGTM.
>
> Thanks. Please see the attached v33-0001 patch after removing equality
> on inactive_since TAP tests.
>

The v33-0001 looks good to me. I have made minor changes in the
comments/commit message and removed one part of the test which was a
bit confusing and didn't seem to add much value. Let me know what you
think of the attached?

--
With Regards,
Amit Kapila.

Attachment
On Thu, Apr 4, 2024 at 10:48 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> The v33-0001 looks good to me. I have made minor changes in the
> comments/commit message and removed one part of the test which was a
> bit confusing and didn't seem to add much value. Let me know what you
> think of the attached?

Thanks for the changes. v34-0001 LGTM.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Thu, Apr 4, 2024 at 1:34 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Apr 4, 2024 at 9:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > @@ -1368,6 +1416,7 @@ ShutDownSlotSync(void)
> >     if (SlotSyncCtx->pid == InvalidPid)
> >     {
> >         SpinLockRelease(&SlotSyncCtx->mutex);
> > +       update_synced_slots_inactive_since();
> >         return;
> >     }
> >     SpinLockRelease(&SlotSyncCtx->mutex);
> > @@ -1400,6 +1449,8 @@ ShutDownSlotSync(void)
> >     }
> >
> >     SpinLockRelease(&SlotSyncCtx->mutex);
> > +
> > +   update_synced_slots_inactive_since();
> >  }
> >
> > Why do we want to update all synced slots' inactive_since values at
> > shutdown in spite of updating the value every time when releasing the
> > slot? It seems to contradict the fact that inactive_since is updated
> > when releasing or restoring the slot.
>
> It is to get the inactive_since right for the cases where the standby
> is promoted without a restart similar to when a standby is promoted
> with restart in which case the inactive_since is set to current time
> in RestoreSlotFromDisk.
>
> Imagine the slot is synced last time at time t1 and then a few hours
> passed, the standby is promoted without a restart. If we don't set
> inactive_since to current time in this case in ShutDownSlotSync, the
> inactive timeout invalidation mechanism can kick in immediately.
>

Thank you for the explanation! I understood the needs.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



On Thu, Apr 4, 2024 at 1:32 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Apr 4, 2024 at 1:34 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Thu, Apr 4, 2024 at 9:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > @@ -1368,6 +1416,7 @@ ShutDownSlotSync(void)
> > >     if (SlotSyncCtx->pid == InvalidPid)
> > >     {
> > >         SpinLockRelease(&SlotSyncCtx->mutex);
> > > +       update_synced_slots_inactive_since();
> > >         return;
> > >     }
> > >     SpinLockRelease(&SlotSyncCtx->mutex);
> > > @@ -1400,6 +1449,8 @@ ShutDownSlotSync(void)
> > >     }
> > >
> > >     SpinLockRelease(&SlotSyncCtx->mutex);
> > > +
> > > +   update_synced_slots_inactive_since();
> > >  }
> > >
> > > Why do we want to update all synced slots' inactive_since values at
> > > shutdown in spite of updating the value every time when releasing the
> > > slot? It seems to contradict the fact that inactive_since is updated
> > > when releasing or restoring the slot.
> >
> > It is to get the inactive_since right for the cases where the standby
> > is promoted without a restart similar to when a standby is promoted
> > with restart in which case the inactive_since is set to current time
> > in RestoreSlotFromDisk.
> >
> > Imagine the slot is synced last time at time t1 and then a few hours
> > passed, the standby is promoted without a restart. If we don't set
> > inactive_since to current time in this case in ShutDownSlotSync, the
> > inactive timeout invalidation mechanism can kick in immediately.
> >
>
> Thank you for the explanation! I understood the needs.
>

Do you want to review the v34_0001* further or shall I proceed with
the commit of the same?

--
With Regards,
Amit Kapila.



On Thu, Apr 4, 2024 at 5:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Apr 4, 2024 at 1:32 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Thu, Apr 4, 2024 at 1:34 PM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >
> > > On Thu, Apr 4, 2024 at 9:42 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > @@ -1368,6 +1416,7 @@ ShutDownSlotSync(void)
> > > >     if (SlotSyncCtx->pid == InvalidPid)
> > > >     {
> > > >         SpinLockRelease(&SlotSyncCtx->mutex);
> > > > +       update_synced_slots_inactive_since();
> > > >         return;
> > > >     }
> > > >     SpinLockRelease(&SlotSyncCtx->mutex);
> > > > @@ -1400,6 +1449,8 @@ ShutDownSlotSync(void)
> > > >     }
> > > >
> > > >     SpinLockRelease(&SlotSyncCtx->mutex);
> > > > +
> > > > +   update_synced_slots_inactive_since();
> > > >  }
> > > >
> > > > Why do we want to update all synced slots' inactive_since values at
> > > > shutdown in spite of updating the value every time when releasing the
> > > > slot? It seems to contradict the fact that inactive_since is updated
> > > > when releasing or restoring the slot.
> > >
> > > It is to get the inactive_since right for the cases where the standby
> > > is promoted without a restart similar to when a standby is promoted
> > > with restart in which case the inactive_since is set to current time
> > > in RestoreSlotFromDisk.
> > >
> > > Imagine the slot is synced last time at time t1 and then a few hours
> > > passed, the standby is promoted without a restart. If we don't set
> > > inactive_since to current time in this case in ShutDownSlotSync, the
> > > inactive timeout invalidation mechanism can kick in immediately.
> > >
> >
> > Thank you for the explanation! I understood the needs.
> >
>
> Do you want to review the v34_0001* further or shall I proceed with
> the commit of the same?

Thanks for asking. The v34-0001 patch looks good to me.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



On Thu, Apr 4, 2024 at 11:12 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Apr 4, 2024 at 10:48 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > The v33-0001 looks good to me. I have made minor changes in the
> > comments/commit message and removed one part of the test which was a
> > bit confusing and didn't seem to add much value. Let me know what you
> > think of the attached?
>
> Thanks for the changes. v34-0001 LGTM.
>

I was doing a final review before pushing 0001 and found that
'inactive_since' could be set twice during startup after promotion,
once while restoring slots and then via ShutDownSlotSync(). The reason
is that ShutDownSlotSync() will be invoked in normal startup on
primary though it won't do anything apart from setting inactive_since
if we have synced slots. I think you need to check 'StandbyMode' in
update_synced_slots_inactive_since() and return if the same is not
set. We can't use 'InRecovery' flag as that will be set even during
crash recovery.

Can you please test this once unless you don't agree with the above theory?

--
With Regards,
Amit Kapila.



On Thu, Apr 4, 2024 at 4:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > Thanks for the changes. v34-0001 LGTM.
>
> I was doing a final review before pushing 0001 and found that
> 'inactive_since' could be set twice during startup after promotion,
> once while restoring slots and then via ShutDownSlotSync(). The reason
> is that ShutDownSlotSync() will be invoked in normal startup on
> primary though it won't do anything apart from setting inactive_since
> if we have synced slots. I think you need to check 'StandbyMode' in
> update_synced_slots_inactive_since() and return if the same is not
> set. We can't use 'InRecovery' flag as that will be set even during
> crash recovery.
>
> Can you please test this once unless you don't agree with the above theory?

Nice catch. I've verified that update_synced_slots_inactive_since is
called even for normal server startups/crash recovery. I've added a
check to exit if the StandbyMode isn't set.
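
For reference, the function with the new check is roughly of the
following shape (a sketch based on this discussion, not necessarily
the exact code in the attached patch):

static void
update_synced_slots_inactive_since(void)
{
    TimestampTz now = 0;

    /*
     * Only relevant when a standby is being promoted; on a normal
     * startup or during crash recovery there is nothing to do.
     */
    if (!StandbyMode)
        return;

    LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);

    for (int i = 0; i < max_replication_slots; i++)
    {
        ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];

        /* Only synced slots are of interest here */
        if (s->in_use && s->data.synced)
        {
            /* Use the same inactive_since for all the synced slots */
            if (now == 0)
                now = GetCurrentTimestamp();

            SpinLockAcquire(&s->mutex);
            s->inactive_since = now;
            SpinLockRelease(&s->mutex);
        }
    }

    LWLockRelease(ReplicationSlotControlLock);
}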

Please find the attached v35 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On Thu, Apr 4, 2024 at 5:53 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Apr 4, 2024 at 4:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > Thanks for the changes. v34-0001 LGTM.
> >
> > I was doing a final review before pushing 0001 and found that
> > 'inactive_since' could be set twice during startup after promotion,
> > once while restoring slots and then via ShutDownSlotSync(). The reason
> > is that ShutDownSlotSync() will be invoked in normal startup on
> > primary though it won't do anything apart from setting inactive_since
> > if we have synced slots. I think you need to check 'StandbyMode' in
> > update_synced_slots_inactive_since() and return if the same is not
> > set. We can't use 'InRecovery' flag as that will be set even during
> > crash recovery.
> >
> > Can you please test this once unless you don't agree with the above theory?
>
> Nice catch. I've verified that update_synced_slots_inactive_since is
> called even for normal server startups/crash recovery. I've added a
> check to exit if the StandbyMode isn't set.
>
> Please find the attached v35 patch.

Thanks for the patch. Tested it, works well. A few cosmetic changes are needed:

in 040 test file:
1)
# Capture the inactive_since of the slot from the primary. Note that the slot
# will be inactive since the corresponding subscription is disabled..

There are two dots ('..') at the end; replace with one.

2)
# Synced slot on the standby must get its own inactive_since.

The trailing period is not needed in a single-line comment (to be consistent
with neighbouring comments)


3)
update_synced_slots_inactive_since():

if (!StandbyMode)
return;

It would be good to add a comment here.

thanks
Shveta



On Wed, Apr 3, 2024 at 9:57 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> > > shouldn't the slot be dropped/recreated instead of updating inactive_since?
> >
> > The sync slots that are invalidated on the primary aren't dropped and
> > recreated on the standby.
>
> Yeah, right (I was confused with synced slots that are invalidated locally).
>
> > However, I
> > found that the synced slot is acquired and released unnecessarily
> > after the invalidation_reason is synced from the primary. I added a
> > skip check in synchronize_one_slot to skip acquiring and releasing the
> > slot if it's locally found inactive. With this, inactive_since won't
> > get updated for invalidated sync slots on the standby as we don't
> > acquire and release the slot.
>
> CR1 ===
>
> Yeah, I can see:
>
> @@ -575,6 +575,13 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
>                                                    " name slot \"%s\" already exists on the standby",
>                                                    remote_slot->name));
>
> +               /*
> +                * Skip the sync if the local slot is already invalidated. We do this
> +                * beforehand to save on slot acquire and release.
> +                */
> +               if (slot->data.invalidated != RS_INVAL_NONE)
> +                       return false;
>
> Thanks to the drop_local_obsolete_slots() call I think we are not missing the case
> where the slot has been invalidated on the primary, invalidation reason has been
> synced on the standby and later the slot is dropped/ recreated manually on the
> primary (then it should be dropped/recreated on the standby too).
>
> Also it seems we are not missing the case where a sync slot is invalidated
> locally due to wal removal (it should be dropped/recreated).

Right.

> > > CR5 ===
> > >
> > > +       /*
> > > +        * This function isn't expected to be called for inactive timeout based
> > > +        * invalidation. A separate function InvalidateInactiveReplicationSlot is
> > > +        * to be used for that.
> > >
> > > Do you think it's worth to explain why?
> >
> > Hm, I just wanted to point out the actual function here. I modified it
> > to something like the following, if others feel we don't need that, I
> > can remove it.
>
> Sorry If I was not clear but I meant to say "Do you think it's worth to explain
> why we decided to create a dedicated function"? (currently we "just" explain that
> we created one).

We added a new function (InvalidateInactiveReplicationSlot) to
invalidate a slot based on inactive timeout because: 1) we do the
inactive timeout invalidation at the slot level, as opposed to
InvalidateObsoleteReplicationSlots which loops over all the slots;
2) InvalidatePossiblyObsoleteSlot releases the lock in some cases and
has a lot of code that is unneeded for the inactive timeout
invalidation check; 3) we want some control over saving the slot to
disk because we hook the inactive timeout invalidation into the loop
that checkpoints the slot info to disk in CheckPointReplicationSlots.
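
For context, the core of the timeout check is roughly the following (a
sketch only; the helper name SlotInactiveTimeoutExpired and the
assumption that the GUC is in seconds are mine, the actual patch may
structure this differently):

static bool
SlotInactiveTimeoutExpired(ReplicationSlot *s, TimestampTz now)
{
    /* Synced slots on a standby get their invalidation from the primary */
    if (RecoveryInProgress() && s->data.synced)
        return false;

    /* A setting of 0 disables the timeout */
    if (replication_slot_inactive_timeout == 0)
        return false;

    /* An acquired (in-use) slot has inactive_since = 0; never time it out */
    if (s->inactive_since == 0)
        return false;

    /* GUC assumed to be in seconds here */
    return TimestampDifferenceExceeds(s->inactive_since, now,
                                      replication_slot_inactive_timeout * 1000);
}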

I've added a comment atop InvalidateInactiveReplicationSlot.

Please find the attached v36 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
Hi,

On Fri, Apr 05, 2024 at 11:21:43AM +0530, Bharath Rupireddy wrote:
> On Wed, Apr 3, 2024 at 9:57 PM Bertrand Drouvot
> <bertranddrouvot.pg@gmail.com> wrote:
> Please find the attached v36 patch.

Thanks!

A few comments:

1 ===

+       <para>
+        The timeout is measured from the time since the slot has become
+        inactive (known from its
+        <structfield>inactive_since</structfield> value) until it gets
+        used (i.e., its <structfield>active</structfield> is set to true).
+       </para>

That's right except when it's invalidated during the checkpoint (as the slot
is not acquired in CheckPointReplicationSlots()).

So, what about adding: "or a checkpoint occurs"? That would also explain that
the invalidation could occur during checkpoint.

2 ===

+       /* If the slot has been invalidated, recalculate the resource limits */
+       if (invalidated)
+       {

/If the slot/If a slot/?

3 ===

+ * NB - this function also runs as part of checkpoint, so avoid raising errors

s/NB - this/NB: This function/? (that looks more consistent with other comments
in the code)

4 ===

+ * Note that having a new function for RS_INVAL_INACTIVE_TIMEOUT cause instead

I understand it's "the RS_INVAL_INACTIVE_TIMEOUT cause" but reading "cause instead"
looks weird to me. Maybe it would make sense to reword this a bit.

5 ===

+        * considered not active as they don't actually perform logical decoding.

Not sure that's 100% accurate as we switched in fast forward logical
in 2ec005b4e2.

"as they perform only fast forward logical decoding (or not at all)", maybe?

6 ===

+       if (RecoveryInProgress() && slot->data.synced)
+               return false;
+
+       if (replication_slot_inactive_timeout == 0)
+               return false;

What about just using one if? It's more a matter of taste but it also probably
reduces the object file size a bit for non optimized build.

7 ===

+               /*
+                * Do not invalidate the slots which are currently being synced from
+                * the primary to the standby.
+                */
+               if (RecoveryInProgress() && slot->data.synced)
+                       return false;

I think we don't need this check as the exact same one is done just before.

8 ===

+sub check_for_slot_invalidation_in_server_log
+{
+       my ($node, $slot_name, $offset) = @_;
+       my $invalidated = 0;
+
+       for (my $i = 0; $i < 10 * $PostgreSQL::Test::Utils::timeout_default; $i++)
+       {
+               $node->safe_psql('postgres', "CHECKPOINT");

Wouldn't it be better to wait for the replication_slot_inactive_timeout time
instead of triggering all those checkpoints? (it could be passed as an extra arg
to wait_for_slot_invalidation()).

9 ===

# Synced slot mustn't get invalidated on the standby, it must sync invalidation
# from the primary. So, we must not see the slot's invalidation message in server
# log.
ok( !$standby1->log_contains(
        "invalidating obsolete replication slot \"lsub1_sync_slot\"",
        $standby1_logstart),
    'check that syned slot has not been invalidated on the standby');

Would it make sense to trigger a checkpoint on the standby before this test?
I mean, I think that without a checkpoint on the standby we should not see the
invalidation in the log anyway.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Fri, Apr 5, 2024 at 1:14 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> > Please find the attached v36 patch.
>
> A few comments:
>
> 1 ===
>
> +       <para>
> +        The timeout is measured from the time since the slot has become
> +        inactive (known from its
> +        <structfield>inactive_since</structfield> value) until it gets
> +        used (i.e., its <structfield>active</structfield> is set to true).
> +       </para>
>
> That's right except when it's invalidated during the checkpoint (as the slot
> is not acquired in CheckPointReplicationSlots()).
>
> So, what about adding: "or a checkpoint occurs"? That would also explain that
> the invalidation could occur during checkpoint.

Reworded.

> 2 ===
>
> +       /* If the slot has been invalidated, recalculate the resource limits */
> +       if (invalidated)
> +       {
>
> /If the slot/If a slot/?

Modified it to be like elsewhere.

> 3 ===
>
> + * NB - this function also runs as part of checkpoint, so avoid raising errors
>
> s/NB - this/NB: This function/? (that looks more consistent with other comments
> in the code)

Done.

> 4 ===
>
> + * Note that having a new function for RS_INVAL_INACTIVE_TIMEOUT cause instead
>
> I understand it's "the RS_INVAL_INACTIVE_TIMEOUT cause" but reading "cause instead"
> looks weird to me. Maybe it would make sense to reword this a bit.

Reworded.

> 5 ===
>
> +        * considered not active as they don't actually perform logical decoding.
>
> Not sure that's 100% accurate as we switched in fast forward logical
> in 2ec005b4e2.
>
> "as they perform only fast forward logical decoding (or not at all)", maybe?

Changed it to "as they don't perform logical decoding to produce the
changes". In fast_forward mode no changes are produced.

> 6 ===
>
> +       if (RecoveryInProgress() && slot->data.synced)
> +               return false;
> +
> +       if (replication_slot_inactive_timeout == 0)
> +               return false;
>
> What about just using one if? It's more a matter of taste but it also probably
> reduces the object file size a bit for non optimized build.

Changed.

> 7 ===
>
> +               /*
> +                * Do not invalidate the slots which are currently being synced from
> +                * the primary to the standby.
> +                */
> +               if (RecoveryInProgress() && slot->data.synced)
> +                       return false;
>
> I think we don't need this check as the exact same one is done just before.

Right. Removed.

> 8 ===
>
> +sub check_for_slot_invalidation_in_server_log
> +{
> +       my ($node, $slot_name, $offset) = @_;
> +       my $invalidated = 0;
> +
> +       for (my $i = 0; $i < 10 * $PostgreSQL::Test::Utils::timeout_default; $i++)
> +       {
> +               $node->safe_psql('postgres', "CHECKPOINT");
>
> Wouldn't be better to wait for the replication_slot_inactive_timeout time before
> instead of triggering all those checkpoints? (it could be passed as an extra arg
> to wait_for_slot_invalidation()).

Done.

> 9 ===
>
> # Synced slot mustn't get invalidated on the standby, it must sync invalidation
> # from the primary. So, we must not see the slot's invalidation message in server
> # log.
> ok( !$standby1->log_contains(
>         "invalidating obsolete replication slot \"lsub1_sync_slot\"",
>         $standby1_logstart),
>     'check that syned slot has not been invalidated on the standby');
>
> Would that make sense to trigger a checkpoint on the standby before this test?
> I mean I think that without a checkpoint on the standby we should not see the
> invalidation in the log anyway.

Done.

Please find the attached v37 patch for further review.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On Sat, Apr 6, 2024 at 11:55 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>

Why is the handling w.r.t. active_pid in InvalidatePossiblyInactiveSlot()
not similar to InvalidatePossiblyObsoleteSlot()? Won't we need to
ensure that there is no other active user of the slot? Is it sufficient
to check inactive_since for that? If so, we need some comments to
explain it.

Can we avoid introducing the new functions like
SaveGivenReplicationSlot() and MarkGivenReplicationSlotDirty(), if we
do the required work in the caller?

--
With Regards,
Amit Kapila.



On Sat, Apr 6, 2024 at 12:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Why the handling w.r.t active_pid in InvalidatePossiblyInactiveSlot()
> is not similar to InvalidatePossiblyObsoleteSlot(). Won't we need to
> ensure that there is no other active slot user? Is it sufficient to
> check inactive_since for the same? If so, we need some comments to
> explain the same.

I removed the separate functions and with minimal changes, I've now
placed the RS_INVAL_INACTIVE_TIMEOUT logic into
InvalidatePossiblyObsoleteSlot and use that even in
CheckPointReplicationSlots.

> Can we avoid introducing the new functions like
> SaveGivenReplicationSlot() and MarkGivenReplicationSlotDirty(), if we
> do the required work in the caller?

Hm. Removed them now.

Please see the attached v38 patch.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On Sat, Apr 6, 2024 at 5:10 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Please see the attached v38 patch.

Hi, thanks everyone for reviewing the design and patches so far. Here
are the v39 patches implementing the inactive timeout based (0001) and
XID age based (0002) invalidation mechanisms.

I'm quoting the hackers who are okay with inactive timeout based
invalidation mechanism:
Bertrand Drouvot -
https://www.postgresql.org/message-id/ZgL0N%2BxVJNkyqsKL%40ip-10-97-1-34.eu-west-3.compute.internal
and https://www.postgresql.org/message-id/ZgPHDAlM79iLtGIH%40ip-10-97-1-34.eu-west-3.compute.internal
Amit Kapila -
https://www.postgresql.org/message-id/CAA4eK1L3awyzWMuymLJUm8SoFEQe%3DDa9KUwCcAfC31RNJ1xdJA%40mail.gmail.com
Nathan Bossart -
https://www.postgresql.org/message-id/20240325195443.GA2923888%40nathanxps13
Robert Haas -
https://www.postgresql.org/message-id/CA%2BTgmoZTbaaEjSZUG1FL0mzxAdN3qmXksO3O9_PZhEuXTkVnRQ%40mail.gmail.com

I'm quoting the hackers who are okay with XID age based invalidation mechanism:
Nathan Bossart -
https://www.postgresql.org/message-id/20240326150918.GB3181099%40nathanxps13
and https://www.postgresql.org/message-id/20240327150557.GA3994937%40nathanxps13
Alvaro Herrera -
https://www.postgresql.org/message-id/202403261539.xcjfle7sksz7%40alvherre.pgsql
Bertrand Drouvot -
https://www.postgresql.org/message-id/ZgPHDAlM79iLtGIH%40ip-10-97-1-34.eu-west-3.compute.internal
Amit Kapila -
https://www.postgresql.org/message-id/CAA4eK1L3awyzWMuymLJUm8SoFEQe%3DDa9KUwCcAfC31RNJ1xdJA%40mail.gmail.com

There was a point raised by Robert
https://www.postgresql.org/message-id/CA%2BTgmoaRECcnyqxAxUhP5dk2S4HX%3DpGh-p-PkA3uc%2BjG_9hiMw%40mail.gmail.com
for XID age based invalidation. An issue related to
vacuum_defer_cleanup_age
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=be504a3e974d75be6f95c8f9b7367126034f2d12
led to the removal of the GUC
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=1118cd37eb61e6a2428f457a8b2026a7bb3f801a.
The same issue may not happen for the XID age based invalidation. This
is because the XID age is not calculated using FullTransactionId but
using TransactionId as the slot's xmin and catalog_xmin are tracked as
TransactionId.

There was a point raised by Amit
https://www.postgresql.org/message-id/CAA4eK1K8wqLsMw6j0hE_SFoWAeo3Kw8UNnMfhsWaYDF1GWYQ%2Bg%40mail.gmail.com
on when to do the XID age based invalidation - whether in checkpointer
or when vacuum is being run or whenever ComputeXIDHorizons gets called
or in autovacuum process. For now, I've chosen the design to do these
new invalidation checks in two places - 1) whenever the slot is
acquired and the slot acquisition errors out if invalidated, 2) during
checkpoint. However, I'm open to suggestions on this.

I've also verified whether the replication_slot_xid_age setting can
help in case of the server inching towards XID wraparound. I've
created a primary and streaming standby setup with
hot_standby_feedback set to on (so that the slot gets an xmin). Then,
I've set replication_slot_xid_age to 2 billion on the primary, and
used the xid_wraparound extension to reach XID wraparound on the
primary. Once I started receiving the WARNINGs about VACUUM, I did a
checkpoint after which the slot got invalidated, enabling my VACUUM to
freeze XIDs and saving my database from the XID wraparound problem.

Thanks a lot Masahiko Sawada for an offlist chat about the XID age
calculation logic.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
Hi,

On Thu, Apr 4, 2024 at 9:23 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Apr 4, 2024 at 4:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > Thanks for the changes. v34-0001 LGTM.
> >
> > I was doing a final review before pushing 0001 and found that
> > 'inactive_since' could be set twice during startup after promotion,
> > once while restoring slots and then via ShutDownSlotSync(). The reason
> > is that ShutDownSlotSync() will be invoked in normal startup on
> > primary though it won't do anything apart from setting inactive_since
> > if we have synced slots. I think you need to check 'StandbyMode' in
> > update_synced_slots_inactive_since() and return if the same is not
> > set. We can't use 'InRecovery' flag as that will be set even during
> > crash recovery.
> >
> > Can you please test this once unless you don't agree with the above theory?
>
> Nice catch. I've verified that update_synced_slots_inactive_since is
> called even for normal server startups/crash recovery. I've added a
> check to exit if the StandbyMode isn't set.
>
> Please find the attached v35 patch.
>

The documentation about both the 'active' and 'inactive_since'
columns of pg_replication_slots says:

---
active bool
True if this slot is currently actively being used

inactive_since timestamptz
The time since the slot has become inactive. NULL if the slot is
currently being used. Note that for slots on the standby that are
being synced from a primary server (whose synced field is true), the
inactive_since indicates the last synchronization (see Section 47.2.3)
time.
---

When reading the description, I thought that if 'active' is true,
'inactive_since' is NULL, but that doesn't seem to apply to temporary
slots. Since we don't reset the active_pid field of temporary slots
when they are released, 'active' is still true in the view but
'inactive_since' is not NULL. Do you think we need to mention it in
the documentation?

As for the timeout-based slot invalidation feature, we could end up
invalidating the temporary slots even if they are shown as active,
which could confuse users. Do we want to somehow deal with it?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



On Mon, Apr 22, 2024 at 7:21 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > Please find the attached v35 patch.
>
> The documentation says about both 'active' and 'inactive_since'
> columns of pg_replication_slots say:
>
> ---
> active bool
> True if this slot is currently actively being used
>
> inactive_since timestamptz
> The time since the slot has become inactive. NULL if the slot is
> currently being used. Note that for slots on the standby that are
> being synced from a primary server (whose synced field is true), the
> inactive_since indicates the last synchronization (see Section 47.2.3)
> time.
> ---
>
> When reading the description I thought if 'active' is true,
> 'inactive_since' is NULL, but it doesn't seem to apply for temporary
> slots.

Right.

> Since we don't reset the active_pid field of temporary slots
> when the release, the 'active' is still true in the view but
> 'inactive_since' is not NULL.

Right. inactive_since is reset whenever the temporary slot is acquired
again within the same backend that created the temporary slot.

> Do you think we need to mention it in
> the documentation?

I think that's the reason we dropped "active" from the statement. It
was earlier "NULL if the slot is currently actively being used.". But,
per Bertrand's comment
https://www.postgresql.org/message-id/ZehE2IJcsetSJMHC%40ip-10-97-1-34.eu-west-3.compute.internal
we changed it to "NULL if the slot is currently being used.".

Temporary slots retain active = true and active_pid = <pid of the
backend that created it> even when the slot is not being used, for the
lifetime of the backend process. We haven't tied the active or
active_pid flags to inactive_since, and doing so now to represent the
temporary slot behaviour for active and active_pid would confuse users
more. As far as the inactive_since of a slot is concerned, it is set
to 0 when the slot is being used (acquired) and set to the current
timestamp when the slot is not being used (released).
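
In other words, roughly the following happens (a sketch of the idea,
not the exact patch code):

/* On release (ReplicationSlotRelease), for every slot type: */
TimestampTz now = GetCurrentTimestamp();

SpinLockAcquire(&slot->mutex);
if (slot->data.persistency == RS_PERSISTENT)
    slot->active_pid = 0;        /* temporary slots keep their active_pid */
slot->inactive_since = now;      /* slot is idle from now on */
SpinLockRelease(&slot->mutex);

/* On acquire (ReplicationSlotAcquire), once the slot is ours: */
SpinLockAcquire(&slot->mutex);
slot->active_pid = MyProcPid;
slot->inactive_since = 0;        /* slot is in use again */
SpinLockRelease(&slot->mutex);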

> As for the timeout-based slot invalidation feature, we could end up
> invalidating the temporary slots even if they are shown as active,
> which could confuse users. Do we want to somehow deal with it?

Yes. As long as the temporary slot is lying unused, holding up
resources for more than the specified
replication_slot_inactive_timeout, it is bound to get invalidated.
This keeps the behaviour consistent and less confusing to users.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Thu, Apr 25, 2024 at 11:11 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, Apr 22, 2024 at 7:21 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > > Please find the attached v35 patch.
> >
> > The documentation says about both 'active' and 'inactive_since'
> > columns of pg_replication_slots say:
> >
> > ---
> > active bool
> > True if this slot is currently actively being used
> >
> > inactive_since timestamptz
> > The time since the slot has become inactive. NULL if the slot is
> > currently being used. Note that for slots on the standby that are
> > being synced from a primary server (whose synced field is true), the
> > inactive_since indicates the last synchronization (see Section 47.2.3)
> > time.
> > ---
> >
> > When reading the description I thought if 'active' is true,
> > 'inactive_since' is NULL, but it doesn't seem to apply for temporary
> > slots.
>
> Right.
>
> > Since we don't reset the active_pid field of temporary slots
> > when the release, the 'active' is still true in the view but
> > 'inactive_since' is not NULL.
>
> Right. inactive_since is reset whenever the temporary slot is acquired
> again within the same backend that created the temporary slot.
>
> > Do you think we need to mention it in
> > the documentation?
>
> I think that's the reason we dropped "active" from the statement. It
> was earlier "NULL if the slot is currently actively being used.". But,
> per Bertrand's comment
> https://www.postgresql.org/message-id/ZehE2IJcsetSJMHC%40ip-10-97-1-34.eu-west-3.compute.internal
> changed it to ""NULL if the slot is currently being used.".
>
> Temporary slots retain the active = true and active_pid = <pid of the
> backend that created it> even when the slot is not being used until
> the lifetime of the backend process. We haven't tied active or
> active_pid flags to inactive_since, doing so now to represent the
> temporary slot behaviour for active and active_pid will confuse users
> more.
>

This is true, and it's probably easy for us to understand as we
developed this feature, but the same may not be true for others. I
wonder if we can be explicit about the difference between active and
inactive_since by adding something like the following for
inactive_since: Note that this field is not related to the active flag,
as temporary slots can remain active till the session ends even when
they are not being used.

Sawada-San, do you have any suggestions on the wording?

>
> As far as the inactive_since of a slot is concerned, it is set
> to 0 when the slot is being used (acquired) and set to current
> timestamp when the slot is not being used (released).
>
> > As for the timeout-based slot invalidation feature, we could end up
> > invalidating the temporary slots even if they are shown as active,
> > which could confuse users. Do we want to somehow deal with it?
>
> Yes. As long as the temporary slot is lying unused holding up
> resources for more than the specified
> replication_slot_inactive_timeout, it is bound to get invalidated.
> This keeps behaviour consistent and less-confusing to the users.
>

Agreed. We may want to add something in the docs for this to avoid
confusion with the active flag.

--
With Regards,
Amit Kapila.



Hi,

On Sat, Apr 13, 2024 at 9:36 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> There was a point raised by Amit
> https://www.postgresql.org/message-id/CAA4eK1K8wqLsMw6j0hE_SFoWAeo3Kw8UNnMfhsWaYDF1GWYQ%2Bg%40mail.gmail.com
> on when to do the XID age based invalidation - whether in checkpointer
> or when vacuum is being run or whenever ComputeXIDHorizons gets called
> or in autovacuum process. For now, I've chosen the design to do these
> new invalidation checks in two places - 1) whenever the slot is
> acquired and the slot acquisition errors out if invalidated, 2) during
> checkpoint. However, I'm open to suggestions on this.

Here are my thoughts on when to do the XID age invalidation. In all
the patches sent so far, the XID age invalidation happens in two
places - one during the slot acquisition, and another during the
checkpoint. The suggestion is to do it during vacuum (manual and
auto), so that even if a checkpoint isn't happening in the database
for whatever reason, a vacuum command or autovacuum can invalidate the
slots whose XID is aged.

An idea is to check for XID age based invalidation for all the slots
in ComputeXidHorizons() before it reads replication_slot_xmin and
replication_slot_catalog_xmin, and obviously before the proc array
lock is acquired. A potential problem with this approach is that the
invalidation check can become too aggressive as XID horizons are
computed from many places.

Another idea is to check for XID age based invalidation for all the
slots in higher levels than ComputeXidHorizons(), for example in
vacuum() which is an entry point for both vacuum command and
autovacuum. This approach seems similar to vacuum_failsafe_age GUC
which checks each relation for the failsafe age before vacuum gets
triggered on it.

Does anyone see any issues or risks with the above two approaches or
have any other ideas? Thoughts?

I attached v40 patches here. I reworded some of the ERROR messages,
and did some code clean-up. Note that I haven't implemented any of the
above approaches yet.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On Mon, Jun 17, 2024 at 05:55:04PM +0530, Bharath Rupireddy wrote:
> Here are my thoughts on when to do the XID age invalidation. In all
> the patches sent so far, the XID age invalidation happens in two
> places - one during the slot acquisition, and another during the
> checkpoint. As the suggestion is to do it during the vacuum (manual
> and auto), so that even if the checkpoint isn't happening in the
> database for whatever reasons, a vacuum command or autovacuum can
> invalidate the slots whose XID is aged.

+1.  IMHO this is a principled choice.  The similar max_slot_wal_keep_size
parameter is considered where it arguably matters most: when we are trying
to remove/recycle WAL segments.  Since this parameter is intended to
prevent the server from running out of space, it makes sense that we'd
apply it at the point where we are trying to free up space.  The proposed
max_slot_xid_age parameter is intended to prevent the server from running
out of transaction IDs, so it follows that we'd apply it at the point where
we reclaim them, which happens to be vacuum.

> An idea is to check for XID age based invalidation for all the slots
> in ComputeXidHorizons() before it reads replication_slot_xmin and
> replication_slot_catalog_xmin, and obviously before the proc array
> lock is acquired. A potential problem with this approach is that the
> invalidation check can become too aggressive as XID horizons are
> computed from many places.
>
> Another idea is to check for XID age based invalidation for all the
> slots in higher levels than ComputeXidHorizons(), for example in
> vacuum() which is an entry point for both vacuum command and
> autovacuum. This approach seems similar to vacuum_failsafe_age GUC
> which checks each relation for the failsafe age before vacuum gets
> triggered on it.

I don't presently have any strong opinion on where this logic should go,
but in general, I think we should only invalidate slots if invalidating
them would allow us to advance the vacuum cutoff.  If the cutoff is held
back by something else, I don't see a point in invalidating slots because
we'll just be breaking replication in return for no additional reclaimed
transaction IDs.

-- 
nathan



Hi,

On Mon, Jun 17, 2024 at 5:55 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Here are my thoughts on when to do the XID age invalidation. In all
> the patches sent so far, the XID age invalidation happens in two
> places - one during the slot acquisition, and another during the
> checkpoint. As the suggestion is to do it during the vacuum (manual
> and auto), so that even if the checkpoint isn't happening in the
> database for whatever reasons, a vacuum command or autovacuum can
> invalidate the slots whose XID is aged.
>
> An idea is to check for XID age based invalidation for all the slots
> in ComputeXidHorizons() before it reads replication_slot_xmin and
> replication_slot_catalog_xmin, and obviously before the proc array
> lock is acquired. A potential problem with this approach is that the
> invalidation check can become too aggressive as XID horizons are
> computed from many places.
>
> Another idea is to check for XID age based invalidation for all the
> slots in higher levels than ComputeXidHorizons(), for example in
> vacuum() which is an entry point for both vacuum command and
> autovacuum. This approach seems similar to vacuum_failsafe_age GUC
> which checks each relation for the failsafe age before vacuum gets
> triggered on it.

I am attaching the patches implementing the idea of invalidating
replication slots during vacuum when the current slot xmin limits
(procArray->replication_slot_xmin and
procArray->replication_slot_catalog_xmin) are aged as per the new XID
age GUC. When either of these limits is aged, there must be at least
one replication slot that is aged, because the xmin limits, after all,
are the minimum of the xmin or catalog_xmin of all replication slots.
In this approach, the new XID age GUC will help vacuum when needed,
because the current slot xmin limits are recalculated after
invalidating replication slots that are holding xmins for longer than
the age. The code is placed in vacuum(), which is common for both the
vacuum command and autovacuum, and gets executed only once every
vacuum cycle so as not to be too aggressive in invalidating.
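
For reference, the once-per-vacuum-cycle check is roughly of the
following shape (a sketch; the helper SlotXminIsAged, the
RS_INVAL_XID_AGE cause and the max_slot_xid_age GUC name are
placeholders here, not necessarily what the attached patches use):

static void
InvalidateXidAgedSlotsBeforeVacuum(void)
{
    TransactionId   slot_xmin;
    TransactionId   slot_catalog_xmin;

    /* A setting of 0 disables the feature */
    if (max_slot_xid_age == 0)
        return;

    /* Current minima across all slots, maintained by the proc array */
    ProcArrayGetReplicationSlotXmin(&slot_xmin, &slot_catalog_xmin);

    /*
     * If either horizon has aged beyond max_slot_xid_age, at least one
     * slot must be holding it back, so run the invalidation pass. The
     * age test itself (SlotXminIsAged here) would use the usual 32-bit
     * modular XID arithmetic.
     */
    if (SlotXminIsAged(slot_xmin) || SlotXminIsAged(slot_catalog_xmin))
    {
        /*
         * InvalidateObsoleteReplicationSlots() recomputes the required
         * xmin/LSN horizons itself if anything got invalidated.
         */
        (void) InvalidateObsoleteReplicationSlots(RS_INVAL_XID_AGE, 0,
                                                  InvalidOid,
                                                  InvalidTransactionId);
    }
}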

However, there might be some concerns with this approach, like the following:
1) Adding more code to vacuum might not be acceptable.
2) What if invalidation of replication slots emits an error, will it
block vacuum forever? Currently, InvalidateObsoleteReplicationSlots()
is also called as part of the checkpoint, and emitting ERRORs from
within it is already avoided. Therefore, there is no concern here for
now.
3) What if there are many replication slots to be invalidated, will it
delay the vacuum? If yes, by how much? <<TODO>>
4) Will the invalidation based on just the current replication slot
xmin limits suffice irrespective of vacuum cutoffs? IOW, if the
replication slots are invalidated but vacuum isn't going to do any
work because the vacuum cutoffs are not yet met, is the invalidation
work wasteful here?
5) Is it okay to take the proc array lock one more time to get the
current replication slot xmin limits via
ProcArrayGetReplicationSlotXmin() once every vacuum cycle? <<TODO>>
6) The vacuum command can't be run on the standby in recovery. So, to
help invalidate replication slots on the standby, I have for now let
the checkpointer also do the XID age based invalidation. I know
invalidating both in the checkpointer and in vacuum may not be a great
idea, but I'm open to thoughts.

Following are some of the alternative approaches which IMHO don't help
vacuum when needed:
a) Let the checkpointer do the XID age based invalidation, and call it
out in the documentation that if the checkpoint doesn't happen, the
new GUC doesn't help even if the vacuum is run. This has been the
approach until v40 patch.
b) Checkpointer and/or other backends add an autovacuum work item via
AutoVacuumRequestWork(), and autovacuum when it gets to it will
invalidate the replication slots. But, what to do for the vacuum
command here?

Please find the attached v41 patches implementing the idea of vacuum
doing the invalidation.

Thoughts?

Thanks to Sawada-san for a detailed off-list discussion.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment
On Mon, Jun 24, 2024 at 11:30:00AM +0530, Bharath Rupireddy wrote:
> 6) Vacuum command can't be run on the standby in recovery. So, to help
> invalidate replication slots on the standby, I have for now let the
> checkpointer also do the XID age based invalidation. I know
> invalidating both in checkpointer and vacuum may not be a great idea,
> but I'm open to thoughts.

Hm.  I hadn't considered this angle.

> a) Let the checkpointer do the XID age based invalidation, and call it
> out in the documentation that if the checkpoint doesn't happen, the
> new GUC doesn't help even if the vacuum is run. This has been the
> approach until v40 patch.

My first reaction is that this is probably okay.  I guess you might run
into problems if you set max_slot_xid_age to 2B and checkpoint_timeout to 1
day, but even in that case your transaction ID usage rate would need to be
pretty high for wraparound to occur.
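
Back-of-the-envelope: with max_slot_xid_age set to 2 billion, the
headroom left before the ~2^31 wraparound limit is roughly 147 million
XIDs, so with a 1-day checkpoint interval the system would have to
burn XIDs at a sustained rate of about 1,700 per second for the whole
day before the next checkpoint could invalidate the slot.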

-- 
nathan





On Mon, Jun 24, 2024 at 4:01 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Please find the attached v41 patches implementing the idea of vacuum
> doing the invalidation.
>
> Thoughts?

The patch no longer applies on HEAD, please rebase.

regards,
Ajin Cherian
Fujitsu Australia
On Tue, Jul 9, 2024 at 3:01 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Mon, Jun 24, 2024 at 11:30:00AM +0530, Bharath Rupireddy wrote:
> > 6) Vacuum command can't be run on the standby in recovery. So, to help
> > invalidate replication slots on the standby, I have for now let the
> > checkpointer also do the XID age based invalidation. I know
> > invalidating both in checkpointer and vacuum may not be a great idea,
> > but I'm open to thoughts.
>
> Hm.  I hadn't considered this angle.

Another idea would be to let the startup process do slot invalidation
when replaying a RUNNING_XACTS record. Since a RUNNING_XACTS record
has the latest XID on the primary, I think the startup process can
compare it to the slot-xmin, and invalidate slots which are older than
the age limit.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com





On Mon, Jun 24, 2024 at 4:01 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote:
> Hi,
>
> On Mon, Jun 17, 2024 at 5:55 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Please find the attached v41 patches implementing the idea of vacuum
> doing the invalidation.
>
> Thoughts?




Some minor comments on the patch:
1.
+ /*
+ * Release the lock if it's not yet to keep the cleanup path on
+ * error happy.
+ */

I suggest rephrasing to: "Release the lock if it hasn't been already to ensure smooth cleanup on error."


2.

elog(DEBUG1, "performing replication slot invalidation");

Probably change it to "performing replication slot invalidation checks" as we might not actually invalidate any slot here.

3.

In CheckPointReplicationSlots()

+ invalidated = InvalidateObsoleteReplicationSlots(RS_INVAL_INACTIVE_TIMEOUT,
+ 0,
+ InvalidOid,
+ InvalidTransactionId);
+
+ if (invalidated)
+ {
+ /*
+ * If any slots have been invalidated, recalculate the resource
+ * limits.
+ */
+ ReplicationSlotsComputeRequiredXmin(false);
+ ReplicationSlotsComputeRequiredLSN();
+ }

Is this calculation of resource limits really required here when the same is already done inside InvalidateObsoleteReplicationSlots()?


regards,
Ajin Cherian
Fujitsu Australia



On Mon, Aug 26, 2024 at 11:44 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>

Few comments on 0001:
1.
@@ -651,6 +651,13 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
     " name slot \"%s\" already exists on the standby",
     remote_slot->name));

+ /*
+ * Skip the sync if the local slot is already invalidated. We do this
+ * beforehand to avoid slot acquire and release.
+ */
+ if (slot->data.invalidated != RS_INVAL_NONE)
+ return false;
+
  /*
  * The slot has been synchronized before.

I was wondering why you have added this new check as part of this
patch. If you see the following comments in the related code, you will
know why we haven't done this previously.

/*
* The slot has been synchronized before.
*
* It is important to acquire the slot here before checking
* invalidation. If we don't acquire the slot first, there could be a
* race condition that the local slot could be invalidated just after
* checking the 'invalidated' flag here and we could end up
* overwriting 'invalidated' flag to remote_slot's value. See
* InvalidatePossiblyObsoleteSlot() where it invalidates slot directly
* if the slot is not acquired by other processes.
*
* XXX: If it ever turns out that slot acquire/release is costly for
* cases when none of the slot properties is changed then we can do a
* pre-check to ensure that at least one of the slot properties is
* changed before acquiring the slot.
*/
ReplicationSlotAcquire(remote_slot->name, true);

We need some modifications in these comments if you want to add a
pre-check here.

2.
@@ -1907,6 +2033,31 @@ CheckPointReplicationSlots(bool is_shutdown)
  SaveSlotToPath(s, path, LOG);
  }
  LWLockRelease(ReplicationSlotAllocationLock);
+
+ elog(DEBUG1, "performing replication slot invalidation checks");
+
+ /*
+ * Note that we will make another pass over replication slots for
+ * invalidations to keep the code simple. The assumption here is that the
+ * traversal over replication slots isn't that costly even with hundreds
+ * of replication slots. If it ever turns out that this assumption is
+ * wrong, we might have to put the invalidation check logic in the above
+ * loop, for that we might have to do the following:
+ *
+ * - Acquire ControlLock once before the loop.
+ *
+ * - Call InvalidatePossiblyObsoleteSlot for each slot.
+ *
+ * - Handle the cases in which ControlLock gets released just like
+ * InvalidateObsoleteReplicationSlots does.
+ *
+ * - Avoid saving slot info to disk two times for each invalidated slot.
+ *
+ * XXX: Should we move the inactive_timeout invalidation check closer to
+ * wal_removed in CreateCheckPoint and CreateRestartPoint?
+ */
+ InvalidateObsoleteReplicationSlots(RS_INVAL_INACTIVE_TIMEOUT,
+    0, InvalidOid, InvalidTransactionId);

Why do we want to call this for the shutdown case (when is_shutdown is
true)? I understand trying to invalidate slots during a regular
checkpoint, but I am not sure we need it at the time of shutdown. The
other point is: can we check the performance impact with hundreds of
slots, as mentioned in the code comments?

--
With Regards,
Amit Kapila.



On Thu, Aug 29, 2024 at 11:31 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Thanks for looking into this.
>
> On Mon, Aug 26, 2024 at 4:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > Few comments on 0001:
> > 1.
> > @@ -651,6 +651,13 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid
> >
> > + /*
> > + * Skip the sync if the local slot is already invalidated. We do this
> > + * beforehand to avoid slot acquire and release.
> > + */
> >
> > I was wondering why you have added this new check as part of this
> > patch. If you see the following comments in the related code, you will
> > know why we haven't done this previously.
>
> Removed. Can deal with optimization separately.
>
> > 2.
> > + */
> > + InvalidateObsoleteReplicationSlots(RS_INVAL_INACTIVE_TIMEOUT,
> > +    0, InvalidOid, InvalidTransactionId);
> >
> > Why do we want to call this for shutdown case (when is_shutdown is
> > true)? I understand trying to invalidate slots during regular
> > checkpoint but not sure if we need it at the time of shutdown.
>
> Changed it to invalidate only for non-shutdown checkpoints. inactive_timeout invalidation isn't critical for shutdown,
> unlike wal_removed, which can help shutdown by freeing up some disk space.
>
> > The
> > other point is can we try to check the performance impact with 100s of
> > slots as mentioned in the code comments?
>
> I first checked how much the wal_removed invalidation check adds to the checkpoint (see 2nd and 3rd columns). I
> then checked how much the inactive_timeout invalidation check adds to the checkpoint (see 4th column); it is not more
> than the wal_removed invalidation check. I then checked how much the wal_removed invalidation check adds for
> replication slots that have already been invalidated due to inactive_timeout (see 5th column); it looks like not much.
>
> | # of slots | HEAD (no invalidation) ms | HEAD (wal_removed) ms | PATCHED (inactive_timeout) ms | PATCHED (inactive_timeout+wal_removed) ms |
> |------------|---------------------------|-----------------------|-------------------------------|-------------------------------------------|
> | 100        | 18.591                    | 370.586               | 359.299                       | 373.882                                   |
> | 1000       | 15.722                    | 4834.901              | 5081.751                      | 5072.128                                  |
> | 10000      | 19.261                    | 59801.062             | 61270.406                     | 60270.099                                 |
>
> Having said that, I'm okay to implement the optimization specified. Thoughts?
>

The other possibility is to try invalidating due to timeout along with
the wal_removed case during checkpoint. The idea is that if the slot can
be invalidated due to WAL then fine; otherwise, check whether it can be
invalidated due to timeout. This can avoid looping over the slots and
doing similar work multiple times during the checkpoint.
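
Something like the following rough sketch (the two helper checks here are
hypothetical stand-ins for the existing per-cause logic, shown only to
illustrate the single-pass control flow; real code would also need the
slot control lock and the save-to-disk handling):

for (int i = 0; i < max_replication_slots; i++)
{
    ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];

    if (!s->in_use)
        continue;

    /* Prefer the existing WAL-removal criterion. */
    if (SlotNeedsInvalidationForWalRemoval(s, oldestLSN))
        InvalidateSlot(s, RS_INVAL_WAL_REMOVED);
    /* Otherwise, see whether the inactive timeout applies. */
    else if (SlotExceededInactiveTimeout(s))
        InvalidateSlot(s, RS_INVAL_INACTIVE_TIMEOUT);
}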

--
With Regards,
Amit Kapila.



On Sat, Aug 31, 2024 at 1:45 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Please find the attached v44 patch with the above changes. I will
> include the 0002 xid_age based invalidation patch later.
>

It is better to get the 0001 reviewed and committed first. We can
discuss about 0002 afterwards as 0001 is in itself a complete and
separate patch that can be committed.

--
With Regards,
Amit Kapila.



Hi, my previous review posts did not cover the test code.

Here are my review comments for the v44-0001 test code

======
TEST CASE #1

1.
+# Wait for the inactive replication slot to be invalidated.
+$standby1->poll_query_until(
+ 'postgres', qq[
+ SELECT COUNT(slot_name) = 1 FROM pg_replication_slots
+ WHERE slot_name = 'lsub1_sync_slot' AND
+ invalidation_reason = 'inactive_timeout';
+])
+  or die
+  "Timed out while waiting for lsub1_sync_slot invalidation to be
synced on standby";
+

Is that comment correct? IIUC the synced slot should *already* be
invalidated from the primary, so here we are not really "waiting" for
it to be invalidated; instead, we are just "confirming" that the
synchronized slot is already invalidated with the correct reason as
expected.

~~~

2.
+# Synced slot mustn't get invalidated on the standby even after a checkpoint,
+# it must sync invalidation from the primary. So, we must not see the slot's
+# invalidation message in server log.
+$standby1->safe_psql('postgres', "CHECKPOINT");
+ok( !$standby1->log_contains(
+ "invalidating obsolete replication slot \"lsub1_sync_slot\"",
+ $standby1_logstart),
+ 'check that synced lsub1_sync_slot has not been invalidated on the standby'
+);
+

This test case seemed bogus, for a couple of reasons:

2a. IIUC this 'lsub1_sync_slot' is the same one that is already
invalid (from the primary), so nobody should be surprised that an
already invalid slot doesn't get flagged as invalid again. i.e.
Shouldn't your test scenario here be done using a valid synced slot?

2b. AFAICT it was only moments before this CHECKPOINT that you set
the standby inactivity timeout to 2s. So even if there were
some bug invalidating synced slots, I don't think you gave it enough
time to happen -- e.g. I doubt 2s has elapsed yet.

~

3.
+# Stop standby to make the standby's replication slot on the primary inactive
+$standby1->stop;
+
+# Wait for the standby's replication slot to become inactive
+wait_for_slot_invalidation($primary, 'sb1_slot', $logstart,
+ $inactive_timeout);

This seems a bit tricky. Both these (the stop and the wait) seem to
belong together, so I think maybe a single bigger explanatory comment
covering both parts would help for understanding.

======
TEST CASE #2

4.
+# Stop subscriber to make the replication slot on publisher inactive
+$subscriber->stop;
+
+# Wait for the replication slot to become inactive and then invalidated due to
+# timeout.
+wait_for_slot_invalidation($publisher, 'lsub1_slot', $logstart,
+ $inactive_timeout);

IIUC, this is just like comment #3 above. Both these (the stop and the
wait) seem to belong together, so I think maybe a single bigger
explanatory comment covering both parts would help for understanding.

~~~

5.
+# Testcase end: Invalidate logical subscriber's slot due to
+# replication_slot_inactive_timeout.
+# =============================================================================


IMO the rest of the comment after "Testcase end" isn't very useful.

======
sub wait_for_slot_invalidation

6.
+sub wait_for_slot_invalidation
+{

An explanatory header comment for this subroutine would be helpful.

~~~

7.
+ # Wait for the replication slot to become inactive
+ $node->poll_query_until(
+ 'postgres', qq[
+ SELECT COUNT(slot_name) = 1 FROM pg_replication_slots
+ WHERE slot_name = '$slot_name' AND active = 'f';
+ ])
+   or die
+   "Timed out while waiting for slot $slot_name to become inactive on
node $name";
+
+ # Wait for the replication slot info to be updated
+ $node->poll_query_until(
+ 'postgres', qq[
+ SELECT COUNT(slot_name) = 1 FROM pg_replication_slots
+ WHERE inactive_since IS NOT NULL
+ AND slot_name = '$slot_name' AND active = 'f';
+ ])
+   or die
+   "Timed out while waiting for info of slot $slot_name to be updated
on node $name";
+

Why are there 2 separate poll_query_until's here? Can't they be
combined into just one?

~~~

8.
+ # Sleep at least $inactive_timeout duration to avoid multiple checkpoints
+ # for the slot to get invalidated.
+ sleep($inactive_timeout);
+

Maybe this special sleep to prevent too many CHECKPOINTs should be
moved inside the other subroutine, which is actually doing those
CHECKPOINTs.

~~~

9.
+ # Wait for the inactive replication slot to be invalidated
+ $node->poll_query_until(
+ 'postgres', qq[
+ SELECT COUNT(slot_name) = 1 FROM pg_replication_slots
+ WHERE slot_name = '$slot_name' AND
+ invalidation_reason = 'inactive_timeout';
+ ])
+   or die
+   "Timed out while waiting for inactive slot $slot_name to be
invalidated on node $name";
+

The comment seems misleading. IIUC you are not "waiting" for the
invalidation here, because it is the other subroutine doing the
waiting for the invalidation message in the logs. Instead, here I
think you are just confirming the 'invalidation_reason' got set
correctly. The comment should say what it is really doing.

======
sub check_for_slot_invalidation_in_server_log

10.
+# Check for invalidation of slot in server log
+sub check_for_slot_invalidation_in_server_log
+{

I think the main function of this subroutine is the CHECKPOINT and the
waiting for the server log to say invalidation happened. It is doing a
loop of a) CHECKPOINT then b) inspecting the server log for the slot
invalidation, and c) waiting for a bit. Repeat 10 times.

A comment describing the logic for this subroutine would be helpful.

The most important side-effect of this function is the CHECKPOINT
because without that nothing will ever get invalidated due to
inactivity, but this key point is not obvious from the subroutine
name.

IMO it would be better to name this differently to reflect what it is
really doing:
e.g. "CHECKPOINT_and_wait_for_slot_invalidation_in_server_log"

======
Kind Regards,
Peter Smith.
Fujitsu Australia



On Sat, Aug 31, 2024 at 1:45 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Hi,
>
>
> Please find the attached v44 patch with the above changes. I will
> include the 0002 xid_age based invalidation patch later.
>

Thanks for the patch Bharath. My review and testing is WIP, but please
find few comments and queries:

1)
I see that ReplicationSlotAlter() will error out if the slot is
invalidated due to timeout. I have not tested it myself, but do you
know if  slot-alter errors out for other invalidation causes as well?
Just wanted to confirm that the behaviour is consistent for all
invalidation causes.

2)
When a slot is invalidated, and we try to use that slot, it gives this msg:

ERROR:  can no longer get changes from replication slot "mysubnew1_2"
DETAIL:  The slot became invalid because it was inactive since
2024-09-03 14:23:34.094067+05:30, which is more than 600 seconds ago.
HINT:  You might need to increase "replication_slot_inactive_timeout.".

Isn't the HINT misleading? Even if we increase it now, the slot cannot be
reused again.


3)
When the slot is invalidated, 'inactive_since' still keeps on
changing when there is a subscriber continuously trying to start
replication. I think ReplicationSlotAcquire() keeps on failing and
thus Release keeps on setting it again and again. Shouldn't we stop
setting/changing 'inactive_since' once the slot is already
invalidated? Otherwise it will be misleading.

postgres=# select failover,synced,inactive_since,invalidation_reason
from pg_replication_slots;

 failover | synced |          inactive_since          | invalidation_reason
----------+--------+----------------------------------+---------------------
 t        | f      | 2024-09-03 14:23:.. | inactive_timeout

after sometime:
 failover | synced |          inactive_since          | invalidation_reason
----------+--------+----------------------------------+---------------------
 t        | f      | 2024-09-03 14:26:..| inactive_timeout


4)
src/sgml/config.sgml:

4a)
+ A value of zero (which is default) disables the timeout mechanism.

Better would be:
A value of zero (which is the default) disables the inactive timeout
invalidation mechanism.
or
A value of zero (which is the default) disables slot invalidation due
to the inactive timeout mechanism.

i.e. rephrase to indicate that invalidation is disabled.

4b)
'synced' and inactive_since should point to pg_replication_slots:

example:
<link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>synced</structfield>

5)
src/sgml/system-views.sgml:
+ ..the slot has been inactive for longer than the duration specified
by replication_slot_inactive_timeout parameter.

Better to have:
..the slot has been inactive for a time longer than the duration
specified by the replication_slot_inactive_timeout parameter.

thanks
Shveta



On Tue, Sep 3, 2024 at 3:01 PM shveta malik <shveta.malik@gmail.com> wrote:
>
>
> 1)
> I see that ReplicationSlotAlter() will error out if the slot is
> invalidated due to timeout. I have not tested it myself, but do you
> know if  slot-alter errors out for other invalidation causes as well?
> Just wanted to confirm that the behaviour is consistent for all
> invalidation causes.

I was able to test this and, as anticipated, the behavior is different.
When a slot is invalidated due to, say, 'wal_removed', I am still able
to 'alter' that slot.
Please see:

Pub:
  slot_name  | failover | synced |          inactive_since          | invalidation_reason
-------------+----------+--------+----------------------------------+---------------------
 mysubnew1_1 | t        | f      | 2024-09-04 08:58:12.802278+05:30 | wal_removed

Sub:
newdb1=# alter subscription mysubnew1_1 disable;
ALTER SUBSCRIPTION

newdb1=# alter subscription mysubnew1_1 set (failover=false);
ALTER SUBSCRIPTION

Pub: (failover altered)
  slot_name  | failover | synced |          inactive_since          | invalidation_reason
-------------+----------+--------+----------------------------------+---------------------
 mysubnew1_1 | f        | f      | 2024-09-04 08:58:47.824471+05:30 | wal_removed


while when invalidation_reason is 'inactive_timeout', it fails:

Pub:
  slot_name  | failover | synced |          inactive_since          | invalidation_reason
-------------+----------+--------+----------------------------------+---------------------
 mysubnew1_1 | t        | f      | 2024-09-03 14:30:57.532206+05:30 | inactive_timeout

Sub:
newdb1=# alter subscription mysubnew1_1 disable;
ALTER SUBSCRIPTION

newdb1=# alter subscription mysubnew1_1 set (failover=false);
ERROR:  could not alter replication slot "mysubnew1_1": ERROR:  can no
longer get changes from replication slot "mysubnew1_1"
DETAIL:  The slot became invalid because it was inactive since
2024-09-04 08:54:20.308996+05:30, which is more than 0 seconds ago.
HINT:  You might need to increase "replication_slot_inactive_timeout.".

I think the behavior should be same.

thanks
Shveta



On Wed, Sep 4, 2024 at 9:17 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Tue, Sep 3, 2024 at 3:01 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> >


1)
It is related to one of my previous comments (pt 3 in [1]) where I
stated that inactive_since should not keep on changing once a slot is
invalidated.
Below is one side effect if inactive_since keeps on changing:

postgres=# SELECT * FROM pg_replication_slot_advance('mysubnew1_1',
pg_current_wal_lsn());
ERROR:  can no longer get changes from replication slot "mysubnew1_1"
DETAIL:  The slot became invalid because it was inactive since
2024-09-04 10:03:56.68053+05:30, which is more than 10 seconds ago.
HINT:  You might need to increase "replication_slot_inactive_timeout.".

postgres=# select now();
               now
---------------------------------
 2024-09-04 10:04:00.26564+05:30

'DETAIL' gives wrong information; we are not past 10 seconds. This is
because inactive_since got updated even in the ERROR scenario.


2)
One more issue with this message is that, once I set
replication_slot_inactive_timeout to a bigger value, it becomes more
misleading. This is because the invalidation was done in the past using
the previous value, while the message starts showing the new value:

ALTER SYSTEM SET replication_slot_inactive_timeout TO '36h';

--see 129600 secs in DETAIL and the current time.
postgres=# SELECT * FROM pg_replication_slot_advance('mysubnew1_1',
pg_current_wal_lsn());
ERROR:  can no longer get changes from replication slot "mysubnew1_1"
DETAIL:  The slot became invalid because it was inactive since
2024-09-04 10:06:38.980939+05:30, which is more than 129600 seconds
ago.
postgres=# select now();
               now
----------------------------------
 2024-09-04 10:07:35.201894+05:30

I feel we should change this message itself.

~~~~~

When invalidation is due to wal_removed, we get a way simpler message:

newdb1=# SELECT * FROM pg_replication_slot_advance('mysubnew1_2',
pg_current_wal_lsn());
ERROR:  replication slot "mysubnew1_2" cannot be advanced
DETAIL:  This slot has never previously reserved WAL, or it has been
invalidated.

This message does not mention 'max_slot_wal_keep_size'. We should have
a similar message for our case. Thoughts?

[1]:  https://www.postgresql.org/message-id/CAJpy0uC8Dg-0JS3NRUwVUemgz5Ar2v3_EQQFXyAigWSEQ8U47Q%40mail.gmail.com

thanks
Shveta



On Wed, Sep 4, 2024 at 2:49 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Wed, Sep 4, 2024 at 9:17 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Tue, Sep 3, 2024 at 3:01 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > >
>
>
> 1)
> It is related to one of my previous comments (pt 3 in [1]) where I
> stated that inactive_since should not keep on changing once a slot is
> invalidated.
>

Agreed. Updating the inactive_since for a slot that is already invalid
is misleading.

>
>
> 2)
> One more issue in this message is, once I set
> replication_slot_inactive_timeout to a bigger value, it becomes more
> misleading. This is because invalidation was done in the past using
> previous value while message starts showing new value:
>
> ALTER SYSTEM SET replication_slot_inactive_timeout TO '36h';
>
> --see 129600 secs in DETAIL and the current time.
> postgres=# SELECT * FROM pg_replication_slot_advance('mysubnew1_1',
> pg_current_wal_lsn());
> ERROR:  can no longer get changes from replication slot "mysubnew1_1"
> DETAIL:  The slot became invalid because it was inactive since
> 2024-09-04 10:06:38.980939+05:30, which is more than 129600 seconds
> ago.
> postgres=# select now();
>                now
> ----------------------------------
>  2024-09-04 10:07:35.201894+05:30
>
> I feel we should change this message itself.
>

+1.

--
With Regards,
Amit Kapila.



On Wed, Sep 4, 2024 at 9:17 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Tue, Sep 3, 2024 at 3:01 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> >
> > 1)
> > I see that ReplicationSlotAlter() will error out if the slot is
> > invalidated due to timeout. I have not tested it myself, but do you
> > know if  slot-alter errors out for other invalidation causes as well?
> > Just wanted to confirm that the behaviour is consistent for all
> > invalidation causes.
>
> I was able to test this and as anticipated behavior is different. When
> slot is invalidated due to say 'wal_removed', I am still able to do
> 'alter' of that slot.
> Please see:
>
> Pub:
>   slot_name  | failover | synced |          inactive_since          |
> invalidation_reason
> -------------+----------+--------+----------------------------------+---------------------
>  mysubnew1_1 | t        | f      | 2024-09-04 08:58:12.802278+05:30 |
> wal_removed
>
> Sub:
> newdb1=# alter subscription mysubnew1_1 disable;
> ALTER SUBSCRIPTION
>
> newdb1=# alter subscription mysubnew1_1 set (failover=false);
> ALTER SUBSCRIPTION
>
> Pub: (failover altered)
>   slot_name  | failover | synced |          inactive_since          |
> invalidation_reason
> -------------+----------+--------+----------------------------------+---------------------
>  mysubnew1_1 | f        | f      | 2024-09-04 08:58:47.824471+05:30 |
> wal_removed
>
>
> while when invalidation_reason is 'inactive_timeout', it fails:
>
> Pub:
>   slot_name  | failover | synced |          inactive_since          |
> invalidation_reason
> -------------+----------+--------+----------------------------------+---------------------
>  mysubnew1_1 | t        | f      | 2024-09-03 14:30:57.532206+05:30 |
> inactive_timeout
>
> Sub:
> newdb1=# alter subscription mysubnew1_1 disable;
> ALTER SUBSCRIPTION
>
> newdb1=# alter subscription mysubnew1_1 set (failover=false);
> ERROR:  could not alter replication slot "mysubnew1_1": ERROR:  can no
> longer get changes from replication slot "mysubnew1_1"
> DETAIL:  The slot became invalid because it was inactive since
> 2024-09-04 08:54:20.308996+05:30, which is more than 0 seconds ago.
> HINT:  You might need to increase "replication_slot_inactive_timeout.".
>
> I think the behavior should be same.
>

We should not allow the invalid replication slot to be altered
irrespective of the reason unless there is any benefit.

--
With Regards,
Amit Kapila.



Re: Introduce XID age and inactive timeout based replication slot invalidation

From
Bharath Rupireddy
Date:
Hi,

Thanks for reviewing.

On Tue, Sep 3, 2024 at 3:01 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> 1)
> I see that ReplicationSlotAlter() will error out if the slot is
> invalidated due to timeout. I have not tested it myself, but do you
> know if  slot-alter errors out for other invalidation causes as well?
> Just wanted to confirm that the behaviour is consistent for all
> invalidation causes.

Will respond to Amit's comment soon.

> 2)
> When a slot is invalidated, and we try to use that slot, it gives this msg:
>
> ERROR:  can no longer get changes from replication slot "mysubnew1_2"
> DETAIL:  The slot became invalid because it was inactive since
> 2024-09-03 14:23:34.094067+05:30, which is more than 600 seconds ago.
> HINT:  You might need to increase "replication_slot_inactive_timeout.".
>
> Isn't HINT misleading? Even if we increase it now, the slot can not be
> reused again.
>
> Below is one side effect if inactive_since keeps on changing:
>
> postgres=# SELECT * FROM pg_replication_slot_advance('mysubnew1_1',
> pg_current_wal_lsn());
> ERROR:  can no longer get changes from replication slot "mysubnew1_1"
> DETAIL:  The slot became invalid because it was inactive since
> 2024-09-04 10:03:56.68053+05:30, which is more than 10 seconds ago.
> HINT:  You might need to increase "replication_slot_inactive_timeout.".
>
> postgres=# select now();
>                now
> ---------------------------------
>  2024-09-04 10:04:00.26564+05:30
>
> 'DETAIL' gives wrong information, we are not past 10-seconds. This is
> because inactive_since got updated even in ERROR scenario.
>
> ERROR:  can no longer get changes from replication slot "mysubnew1_1"
> DETAIL:  The slot became invalid because it was inactive since
> 2024-09-04 10:06:38.980939+05:30, which is more than 129600 seconds
> ago.
> postgres=# select now();
>                now
> ----------------------------------
>  2024-09-04 10:07:35.201894+05:30
>
> I feel we should change this message itself.

Removed the hint and corrected the detail message as following:

errmsg("can no longer get changes from replication slot \"%s\"",
NameStr(s->data.name)),
errdetail("This slot has been invalidated because it was inactive for
longer than the amount of time specified by \"%s\".",
"replication_slot_inactive_timeout.")));

> 3)
> When the slot is invalidated, the' inactive_since' still keeps on
> changing when there is a subscriber trying to start replication
> continuously. I think ReplicationSlotAcquire() keeps on failing and
> thus Release keeps on setting it again and again. Shouldn't we stop
> setting/chnaging  'inactive_since' once the slot is invalidated
> already, otherwise it will be misleading.
>
> postgres=# select failover,synced,inactive_since,invalidation_reason
> from pg_replication_slots;
>
>  failover | synced |          inactive_since          | invalidation_reason
> ----------+--------+----------------------------------+---------------------
>  t        | f      | 2024-09-03 14:23:.. | inactive_timeout
>
> after sometime:
>  failover | synced |          inactive_since          | invalidation_reason
> ----------+--------+----------------------------------+---------------------
>  t        | f      | 2024-09-03 14:26:..| inactive_timeout

Changed it to not update inactive_since for slots invalidated due to
inactive timeout.

> 4)
> src/sgml/config.sgml:
>
> 4a)
> + A value of zero (which is default) disables the timeout mechanism.
>
> Better will be:
> A value of zero (which is default) disables the inactive timeout
> invalidation mechanism .

Changed.

> 4b)
> 'synced' and inactive_since should point to pg_replication_slots:
>
> example:
> <link linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>synced</structfield>

Modified.

> 5)
> src/sgml/system-views.sgml:
> + ..the slot has been inactive for longer than the duration specified
> by replication_slot_inactive_timeout parameter.
>
> Better to have:
> ..the slot has been inactive for a time longer than the duration
> specified by the replication_slot_inactive_timeout parameter.

Changed it to the following to be consistent with the config.sgml.

          <literal>inactive_timeout</literal> means that the slot has been
          inactive for longer than the amount of time specified by the
          <xref linkend="guc-replication-slot-inactive-timeout"/> parameter.

Please find the v45 patch posted upthread at
https://www.postgresql.org/message-id/CALj2ACWXQT3_HY40ceqKf1DadjLQP6b1r%3D0sZRh-xhAOd-b0pA%40mail.gmail.com
for the changes.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Thu, Sep 5, 2024 at 9:30 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Sep 4, 2024 at 9:17 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Tue, Sep 3, 2024 at 3:01 PM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > >
> > > 1)
> > > I see that ReplicationSlotAlter() will error out if the slot is
> > > invalidated due to timeout. I have not tested it myself, but do you
> > > know if  slot-alter errors out for other invalidation causes as well?
> > > Just wanted to confirm that the behaviour is consistent for all
> > > invalidation causes.
> >
> > I was able to test this and as anticipated behavior is different. When
> > slot is invalidated due to say 'wal_removed', I am still able to do
> > 'alter' of that slot.
> > Please see:
> >
> > Pub:
> >   slot_name  | failover | synced |          inactive_since          |
> > invalidation_reason
> > -------------+----------+--------+----------------------------------+---------------------
> >  mysubnew1_1 | t        | f      | 2024-09-04 08:58:12.802278+05:30 |
> > wal_removed
> >
> > Sub:
> > newdb1=# alter subscription mysubnew1_1 disable;
> > ALTER SUBSCRIPTION
> >
> > newdb1=# alter subscription mysubnew1_1 set (failover=false);
> > ALTER SUBSCRIPTION
> >
> > Pub: (failover altered)
> >   slot_name  | failover | synced |          inactive_since          |
> > invalidation_reason
> > -------------+----------+--------+----------------------------------+---------------------
> >  mysubnew1_1 | f        | f      | 2024-09-04 08:58:47.824471+05:30 |
> > wal_removed
> >
> >
> > while when invalidation_reason is 'inactive_timeout', it fails:
> >
> > Pub:
> >   slot_name  | failover | synced |          inactive_since          |
> > invalidation_reason
> > -------------+----------+--------+----------------------------------+---------------------
> >  mysubnew1_1 | t        | f      | 2024-09-03 14:30:57.532206+05:30 |
> > inactive_timeout
> >
> > Sub:
> > newdb1=# alter subscription mysubnew1_1 disable;
> > ALTER SUBSCRIPTION
> >
> > newdb1=# alter subscription mysubnew1_1 set (failover=false);
> > ERROR:  could not alter replication slot "mysubnew1_1": ERROR:  can no
> > longer get changes from replication slot "mysubnew1_1"
> > DETAIL:  The slot became invalid because it was inactive since
> > 2024-09-04 08:54:20.308996+05:30, which is more than 0 seconds ago.
> > HINT:  You might need to increase "replication_slot_inactive_timeout.".
> >
> > I think the behavior should be same.
> >
>
> We should not allow the invalid replication slot to be altered
> irrespective of the reason unless there is any benefit.
>

Okay, then I think we need to change the existing behaviour of the
other invalidation causes which still allow alter-slot.

thanks
Shveta



Re: Introduce XID age and inactive timeout based replication slot invalidation

From
Bharath Rupireddy
Date:
Hi,

On Mon, Sep 9, 2024 at 9:17 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> > We should not allow the invalid replication slot to be altered
> > irrespective of the reason unless there is any benefit.
>
> Okay, then I think we need to change the existing behaviour of the
> other invalidation causes which still allow alter-slot.

+1. Perhaps, track it in a separate thread?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Mon, Sep 9, 2024 at 10:26 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Hi,
>
> On Mon, Sep 9, 2024 at 9:17 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > > We should not allow the invalid replication slot to be altered
> > > irrespective of the reason unless there is any benefit.
> >
> > Okay, then I think we need to change the existing behaviour of the
> > other invalidation causes which still allow alter-slot.
>
> +1. Perhaps, track it in a separate thread?

I think so. It does not come under the scope of this thread.

thanks
Shveta



On Sun, Sep 8, 2024 at 5:25 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
>
> Please find the v45 patch. Addressed above and Shveta's review comments [1].
>

Thanks for the patch. Please find my comments:

1)
src/sgml/config.sgml:

+  Synced slots are always considered to be inactive because they
don't perform logical decoding to produce changes.

It is better we avoid such a statement, as internally we use logical
decoding to advance the restart_lsn; see
'LogicalSlotAdvanceAndCheckSnapState' called from slotsync.c.
<Also see related comment 6 below>

2)
src/sgml/config.sgml:

+ disables the inactive timeout invalidation mechanism

+ Slot invalidation due to inactivity timeout occurs during checkpoint.

Either use 'inactive' in both places or 'inactivity' in both.


3)
slot.c:
+static bool InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause
cause,
+    ReplicationSlot *s,
+    XLogRecPtr oldestLSN,
+    Oid dboid,
+    TransactionId snapshotConflictHorizon,
+    bool *invalidated);
+static inline bool SlotInactiveTimeoutCheckAllowed(ReplicationSlot *s);

I think we do not need the above 2 declarations. The code compiles fine
without them, as the usage comes after the definition.


4)
+ /*
+ * An error is raised if error_if_invalid is true and the slot has been
+ * invalidated previously.
+ */
+ if (error_if_invalid && s->data.invalidated == RS_INVAL_INACTIVE_TIMEOUT)

The comment is generic while the 'if condition' is specific to one
invalidation cause. Even though I feel it can be made a generic test for
all invalidation causes, that is not in the scope of this thread and
needs more testing/analysis. For the time being, we can make the comment
specific to the concerned invalidation cause. The function header
will also need the same change.

5)
SlotInactiveTimeoutCheckAllowed():

+ * Check if inactive timeout invalidation mechanism is disabled or slot is
+ * currently being used or server is in recovery mode or slot on standby is
+ * currently being synced from the primary.
+ *

These comments say the exact opposite of what we are checking in the code.
Since the function name has 'Allowed' in it, the comments should say
what allows the check rather than what disallows it.


6)

+ * Synced slots are always considered to be inactive because they don't
+ * perform logical decoding to produce changes.
+ */
+static inline bool
+SlotInactiveTimeoutCheckAllowed(ReplicationSlot *s)

Perhaps we should avoid mentioning logical decoding here. When slots
are synced, they are performing decoding and their inactive_since is
changing continuously. A better way to make this statement will be:

We want to ensure that the slots being synchronized are not
invalidated, as they need to be preserved for future use when the
standby server is promoted to the primary. This is necessary for
resuming logical replication from the new primary server.
<Rephrase if needed>

7)

InvalidatePossiblyObsoleteSlot()

we are calling SlotInactiveTimeoutCheckAllowed() twice in this
function. We should optimize this.

At the first usage site, shall we simply get the timestamp when the cause
is RS_INVAL_INACTIVE_TIMEOUT without checking
SlotInactiveTimeoutCheckAllowed(), as IMO this does not seem to be a
performance-critical section? Or, if we retain the check at the first
place, then at the second place we can avoid calling it again based on
whether 'now' is NULL or not.
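
For the second option, the relevant fragment could look roughly like this
(untested; it assumes the GUC is in seconds, hence the conversion to
milliseconds, and reuses the field/GUC names from the patch):

TimestampTz now = 0;

/* First place: fetch the timestamp only when it can actually be needed. */
if (cause == RS_INVAL_INACTIVE_TIMEOUT &&
    SlotInactiveTimeoutCheckAllowed(s))
    now = GetCurrentTimestamp();

/* ... second place, later in the same function ... */
if (now != 0 &&
    TimestampDifferenceExceeds(s->inactive_since, now,
                               replication_slot_inactive_timeout * 1000))
    invalidation_cause = RS_INVAL_INACTIVE_TIMEOUT;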

thanks
Shveta



On Mon, Sep 9, 2024 at 10:28 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Mon, Sep 9, 2024 at 10:26 AM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > Hi,
> >
> > On Mon, Sep 9, 2024 at 9:17 AM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > > We should not allow the invalid replication slot to be altered
> > > > irrespective of the reason unless there is any benefit.
> > >
> > > Okay, then I think we need to change the existing behaviour of the
> > > other invalidation causes which still allow alter-slot.
> >
> > +1. Perhaps, track it in a separate thread?
>
> I think so. It does not come under the scope of this thread.
>

It makes sense to me as well. But let's go ahead and get that sorted out first.

--
With Regards,
Amit Kapila.



Re: Introduce XID age and inactive timeout based replication slot invalidation

From
Bharath Rupireddy
Date:
Hi,

On Mon, Sep 9, 2024 at 3:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > > > > We should not allow the invalid replication slot to be altered
> > > > > irrespective of the reason unless there is any benefit.
> > > >
> > > > Okay, then I think we need to change the existing behaviour of the
> > > > other invalidation causes which still allow alter-slot.
> > >
> > > +1. Perhaps, track it in a separate thread?
> >
> > I think so. It does not come under the scope of this thread.
>
> It makes sense to me as well. But let's go ahead and get that sorted out first.

Moved the discussion to new thread -
https://www.postgresql.org/message-id/CALj2ACW4fSOMiKjQ3%3D2NVBMTZRTG8Ujg6jsK9z3EvOtvA4vzKQ%40mail.gmail.com.
Please have a look.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Tue, Sep 10, 2024 at 12:13 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, Sep 9, 2024 at 3:04 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > > > > We should not allow the invalid replication slot to be altered
> > > > > > irrespective of the reason unless there is any benefit.
> > > > >
> > > > > Okay, then I think we need to change the existing behaviour of the
> > > > > other invalidation causes which still allow alter-slot.
> > > >
> > > > +1. Perhaps, track it in a separate thread?
> > >
> > > I think so. It does not come under the scope of this thread.
> >
> > It makes sense to me as well. But let's go ahead and get that sorted out first.
>
> Moved the discussion to new thread -
> https://www.postgresql.org/message-id/CALj2ACW4fSOMiKjQ3%3D2NVBMTZRTG8Ujg6jsK9z3EvOtvA4vzKQ%40mail.gmail.com.
> Please have a look.
>

That is pushed now. Please send the rebased patch after addressing the
pending comments.

--
With Regards,
Amit Kapila.



Re: Introduce XID age and inactive timeout based replication slot invalidation

From
Bharath Rupireddy
Date:
Hi,

Thanks for reviewing.

On Mon, Sep 9, 2024 at 10:54 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> 2)
> src/sgml/config.sgml:
>
> + disables the inactive timeout invalidation mechanism
>
> + Slot invalidation due to inactivity timeout occurs during checkpoint.
>
> Either have 'inactive' at both the places or 'inactivity'.

Used "inactive timeout".

> 3)
> slot.c:
> +static bool InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause
> cause,
> +    ReplicationSlot *s,
> +    XLogRecPtr oldestLSN,
> +    Oid dboid,
> +    TransactionId snapshotConflictHorizon,
> +    bool *invalidated);
> +static inline bool SlotInactiveTimeoutCheckAllowed(ReplicationSlot *s);
>
> I think, we do not need above 2 declarations. The code compile fine
> without these as the usage is later than the definition.

Hm, it's a usual practice that I follow irrespective of the placement
of function declarations. Since it was brought up, I removed the
declarations.

> 4)
> + /*
> + * An error is raised if error_if_invalid is true and the slot has been
> + * invalidated previously.
> + */
> + if (error_if_invalid && s->data.invalidated == RS_INVAL_INACTIVE_TIMEOUT)
>
> The comment is generic while the 'if condition' is specific to one
> invalidation cause. Even though I feel it can be made generic test for
> all invalidation causes but that is not under scope of this thread and
> needs more testing/analysis.

Right.

> For the time being, we can make comment
> specific to the concerned invalidation cause. The header of function
> will also need the same change.

Adjusted the comment, but left the variable name error_if_invalid as
is. Didn't want to make it long, one can look at the code to
understand what it is used for.

> 5)
> SlotInactiveTimeoutCheckAllowed():
>
> + * Check if inactive timeout invalidation mechanism is disabled or slot is
> + * currently being used or server is in recovery mode or slot on standby is
> + * currently being synced from the primary.
> + *
>
> These comments say exact opposite of what we are checking in code.
> Since the function name has 'Allowed' in it, we should be putting
> comments which say what allows it instead of what disallows it.

Modified.

> 1)
> src/sgml/config.sgml:
>
> +  Synced slots are always considered to be inactive because they
> don't perform logical decoding to produce changes.
>
> It is better we avoid such a statement, as internally we use logical
> decoding to advance restart-lsn, see
> 'LogicalSlotAdvanceAndCheckSnapState' called form slotsync.c.
> <Also see related comment 6 below>
>
> 6)
>
> + * Synced slots are always considered to be inactive because they don't
> + * perform logical decoding to produce changes.
> + */
> +static inline bool
> +SlotInactiveTimeoutCheckAllowed(ReplicationSlot *s)
>
> Perhaps we should avoid mentioning logical decoding here. When slots
> are synced, they are performing decoding and their inactive_since is
> changing continuously. A better way to make this statement will be:
>
> We want to ensure that the slots being synchronized are not
> invalidated, as they need to be preserved for future use when the
> standby server is promoted to the primary. This is necessary for
> resuming logical replication from the new primary server.
> <Rephrase if needed>

They are performing logical decoding, but not producing the changes
for the clients to consume. So, IMO, the accompanying "to produce
changes" next to the "logical decoding" is good here.

> 7)
>
> InvalidatePossiblyObsoleteSlot()
>
> we are calling SlotInactiveTimeoutCheckAllowed() twice in this
> function. We shall optimize.
>
> At the first usage place, shall we simply get timestamp when cause is
> RS_INVAL_INACTIVE_TIMEOUT without checking
> SlotInactiveTimeoutCheckAllowed() as IMO it does not seem a
> performance critical section. Or if we retain check at first place,
> then at the second place we can avoid calling it again based on
> whether 'now' is NULL or not.

Getting a current timestamp can get costlier on platforms that use
various clock sources, so assigning 'now' unconditionally isn't the
way IMO. Using the inline function in two places improves the
readability. Can optimize it if there's any performance impact of
calling the inline function in two places.

Will post the new patch version soon.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



On Mon, Sep 16, 2024 at 3:31 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Please find the attached v46 patch having changes for the above review
> comments and your test review comments and Shveta's review comments.
>

-ReplicationSlotAcquire(const char *name, bool nowait)
+ReplicationSlotAcquire(const char *name, bool nowait, bool error_if_invalid)
 {
  ReplicationSlot *s;
  int active_pid;
@@ -615,6 +620,22 @@ retry:
  /* We made this slot active, so it's ours now. */
  MyReplicationSlot = s;

+ /*
+ * An error is raised if error_if_invalid is true and the slot has been
+ * previously invalidated due to inactive timeout.
+ */
+ if (error_if_invalid &&
+ s->data.invalidated == RS_INVAL_INACTIVE_TIMEOUT)
+ {
+ Assert(s->inactive_since > 0);
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("can no longer get changes from replication slot \"%s\"",
+ NameStr(s->data.name)),
+ errdetail("This slot has been invalidated because it was inactive
for longer than the amount of time specified by \"%s\".",
+    "replication_slot_inactive_timeout")));
+ }

Why raise the ERROR just for timeout invalidation here and why not if
the slot is invalidated for other reasons? This raises the question of
what happens before this patch if the invalid slot is used from places
where we call ReplicationSlotAcquire(). I did a brief code analysis
and found that for StartLogicalReplication(), even if the error won't
occur in ReplicationSlotAcquire(), it would have been caught in
CreateDecodingContext(). I think that is where we should also add this
new error. Similarly, pg_logical_slot_get_changes_guts() and other
logical replication functions should be calling
CreateDecodingContext() which can raise the new ERROR. I am not sure
about how the invalid slots are handled during physical replication,
please check the behavior of that before this patch.

--
With Regards,
Amit Kapila.



Re: Introduce XID age and inactive timeout based replication slot invalidation

From
Bharath Rupireddy
Date:
Hi,

Thanks for looking into this.

On Mon, Sep 16, 2024 at 4:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Why raise the ERROR just for timeout invalidation here and why not if
> the slot is invalidated for other reasons? This raises the question of
> what happens before this patch if the invalid slot is used from places
> where we call ReplicationSlotAcquire(). I did a brief code analysis
> and found that for StartLogicalReplication(), even if the error won't
> occur in ReplicationSlotAcquire(), it would have been caught in
> CreateDecodingContext(). I think that is where we should also add this
> new error. Similarly, pg_logical_slot_get_changes_guts() and other
> logical replication functions should be calling
> CreateDecodingContext() which can raise the new ERROR. I am not sure
> about how the invalid slots are handled during physical replication,
> please check the behavior of that before this patch.

When physical slots are invalidated due to wal_removed reason, the failure happens at a much later point for the streaming standbys while reading the requested WAL files like the following:

2024-09-16 16:29:52.416 UTC [876059] FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment 000000010000000000000005 has already been removed
2024-09-16 16:29:52.416 UTC [872418] LOG:  waiting for WAL to become available at 0/5002000

At this point, despite the slot being invalidated, its wal_status can still come back to 'unreserved' even from 'lost', and the standby can catch up if the removed WAL files are copied, either manually or by a tool/script, to the primary's pg_wal directory. IOW, the physical slots invalidated due to wal_removed are *somehow* recoverable, unlike the logical slots.

IIUC, the invalidation of a slot implies that it is not guaranteed to hold any resources like WAL and XMINs. Does it also imply that the slot must be unusable?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
On Mon, Sep 16, 2024 at 3:31 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Hi,
>
>
> Please find the attached v46 patch having changes for the above review
> comments and your test review comments and Shveta's review comments.
>

Thanks for addressing comments.

Is there a reason that we don't support this invalidation on hot
standby for non-synced slots? Shouldn't we support this time-based
invalidation there too just like other invalidations?

thanks
Shveta



On Wed, Sep 18, 2024 at 12:21 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Mon, Sep 16, 2024 at 3:31 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > Hi,
> >
> >
> > Please find the attached v46 patch having changes for the above review
> > comments and your test review comments and Shveta's review comments.
> >
>
> Thanks for addressing comments.
>
> Is there a reason that we don't support this invalidation on hot
> standby for non-synced slots? Shouldn't we support this time-based
> invalidation there too just like other invalidations?
>

Now, since we are not changing inactive_since once the slot is invalidated,
we are not even initializing it during restart; thus, when someone later
tries to use the slot, it hits the assertion in
ReplicationSlotAcquire() (Assert(s->inactive_since > 0)).

Steps:
--Disable logical subscriber and let the slot on publisher gets
invalidated due to inactive_timeout.
--Enable the logical subscriber again.
--Restart publisher.

a) We should initialize inactive_since when
ReplicationSlotSetInactiveSince() is called from RestoreSlotFromDisk()
even though it is invalidated.
b) And shall we mention in the doc of 'inactive_since' that once the
slot is invalidated, this value will remain unchanged until we
shut down the server? On server restart, it is initialized to the start
time. Thoughts?


thanks
Shveta



On Wed, Sep 18, 2024 at 2:49 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> > > Please find the attached v46 patch having changes for the above review
> > > comments and your test review comments and Shveta's review comments.
> > >

When the synced slot is marked as 'inactive_timeout' invalidated on the
hot standby due to invalidation of the publisher's failover slot, the
former starts showing a NULL 'inactive_since'. Is this intentional
behaviour? I feel inactive_since should be non-NULL here too.
Thoughts?

physical standby:
postgres=# select slot_name, inactive_since, invalidation_reason, failover, synced from pg_replication_slots;
 slot_name |          inactive_since          | invalidation_reason | failover | synced
-----------+----------------------------------+---------------------+----------+--------
 sub2      | 2024-09-18 15:20:04.364998+05:30 |                     | t        | t
 sub3      | 2024-09-18 15:20:04.364953+05:30 |                     | t        | t

After sync of invalidation_reason:

 slot_name |          inactive_since          | invalidation_reason | failover | synced
-----------+----------------------------------+---------------------+----------+--------
 sub2      |                                  | inactive_timeout    | t        | t
 sub3      |                                  | inactive_timeout    | t        | t


thanks
shveta



On Mon, Sep 16, 2024 at 10:41 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Thanks for looking into this.
>
> On Mon, Sep 16, 2024 at 4:54 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > Why raise the ERROR just for timeout invalidation here and why not if
> > the slot is invalidated for other reasons? This raises the question of
> > what happens before this patch if the invalid slot is used from places
> > where we call ReplicationSlotAcquire(). I did a brief code analysis
> > and found that for StartLogicalReplication(), even if the error won't
> > occur in ReplicationSlotAcquire(), it would have been caught in
> > CreateDecodingContext(). I think that is where we should also add this
> > new error. Similarly, pg_logical_slot_get_changes_guts() and other
> > logical replication functions should be calling
> > CreateDecodingContext() which can raise the new ERROR. I am not sure
> > about how the invalid slots are handled during physical replication,
> > please check the behavior of that before this patch.
>
> When physical slots are invalidated due to wal_removed reason, the failure happens at a much later point for the
> streaming standbys while reading the requested WAL files like the following:
>
> 2024-09-16 16:29:52.416 UTC [876059] FATAL:  could not receive data from WAL stream: ERROR:  requested WAL segment
> 000000010000000000000005 has already been removed
> 2024-09-16 16:29:52.416 UTC [872418] LOG:  waiting for WAL to become available at 0/5002000
>
> At this point, despite the slot being invalidated, its wal_status can still come back to 'unreserved' even from
> 'lost', and the standby can catch up if removed WAL files are copied either manually or by a tool/script to the
> primary's pg_wal directory. IOW, the physical slots invalidated due to wal_removed are *somehow* recoverable unlike
> the logical slots.
>
> IIUC, the invalidation of a slot implies that it is not guaranteed to hold any resources like WAL and XMINs. Does it
> also imply that the slot must be unusable?
>

If we can't hold the dead rows against xmin of the invalid slot, then
how can we make it usable even after copying the required WAL?

--
With Regards,
Amit Kapila.



On Wed, Sep 18, 2024 at 3:31 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Wed, Sep 18, 2024 at 2:49 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > > > Please find the attached v46 patch having changes for the above review
> > > > comments and your test review comments and Shveta's review comments.
> > > >
>

When we promote a hot standby with synced logical slots to become the
new primary, the logical slots are never invalidated with
'inactive_timeout' on the new primary. It seems the check in
SlotInactiveTimeoutCheckAllowed() is wrong. We should allow
invalidation of slots on the primary even if they are marked as 'synced'.
Please see [4].
I have raised 4 issues so far on v46; the first 3 are in [1], [2], and [3].
Once all these are addressed, I can continue reviewing further.

[1]: https://www.postgresql.org/message-id/CAJpy0uAwxc49Dz6t%3D-y_-z-MU%2BA4RWX4BR3Zri_jj2qgGMq_8g%40mail.gmail.com
[2]: https://www.postgresql.org/message-id/CAJpy0uC6nN3SLbEuCvz7-CpaPdNdXxH%3DfeW5MhYQch-JWV0tLg%40mail.gmail.com
[3]: https://www.postgresql.org/message-id/CAJpy0uBXXJC6f04%2BFU1axKaU%2Bp78wN0SEhUNE9XoqbjXj%3Dhhgw%40mail.gmail.com

[4]:
--------------------
postgres=#  select pg_is_in_recovery();
--------
 f

postgres=# show replication_slot_inactive_timeout;
 replication_slot_inactive_timeout
-----------------------------------
 10s

postgres=# select slot_name, inactive_since, invalidation_reason,
synced from pg_replication_slots;
  slot_name  |          inactive_since          | invalidation_reason | synced
-------------+----------------------------------+---------------------+--------
 mysubnew1_1 | 2024-09-19 09:04:09.714283+05:30 |                     | t

postgres=# select now();
               now
----------------------------------
 2024-09-19 09:06:28.871354+05:30

postgres=# checkpoint;
CHECKPOINT

postgres=# select slot_name, inactive_since, invalidation_reason,
synced from pg_replication_slots;
  slot_name  |          inactive_since          | invalidation_reason | synced
-------------+----------------------------------+---------------------+--------
 mysubnew1_1 | 2024-09-19 09:04:09.714283+05:30 |                     | t
--------------------

thanks
Shveta



On Wed, Sep 18, 2024 at 3:31 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Wed, Sep 18, 2024 at 2:49 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > > > Please find the attached v46 patch having changes for the above review
> > > > comments and your test review comments and Shveta's review comments.
> > > >
>
> When the synced slot is marked as 'inactive_timeout' invalidated on
> hot standby due to invalidation of publisher 's failover slot, the
> former starts showing NULL' inactive_since'. Is this intentional
> behaviour? I feel inactive_since should be non-NULL here too?
> Thoughts?
>
> physical standby:
> postgres=# select slot_name, inactive_since, invalidation_reason,
> failover, synced from pg_replication_slots;
> slot_name  |          inactive_since                              |
> invalidation_reason | failover | synced
> -------------+----------------------------------+---------------------+----------+--------
> sub2 | 2024-09-18 15:20:04.364998+05:30 |           | t        | t
> sub3 | 2024-09-18 15:20:04.364953+05:30 |           | t        | t
>
> After sync of invalidation_reason:
>
> slot_name  |          inactive_since          | invalidation_reason |
> failover | synced
> -------------+----------------------------------+---------------------+----------+--------
>  sub2 |                               | inactive_timeout    | t        | t
>  sub3 |                               | inactive_timeout    | t        | t
>
>

For synced slots on the standby, inactive_since indicates the last
synchronization time rather than the time the slot became inactive
(see doc - https://www.postgresql.org/docs/devel/view-pg-replication-slots.html).

In the reported case above, once a synced slot is invalidated we don't
even keep the last synchronization time for it. This is because when a
synced slot on the standby is marked invalid, inactive_since is reset
to NULL each time the slot-sync worker acquires a lock on it. This
lock acquisition before checking invalidation is done to avoid certain
race conditions and will activate the slot temporarily, resetting
inactive_since. Later, the slot-sync worker updates inactive_since for
all synced slots to the current synchronization time. However, for
invalid slots, this update is skipped, as per the patch’s design.

If we want to preserve the inactive_since value for the invalid synced
slots on standby, we need to clarify the time it should display. Here
are three possible approaches:

1) Copy the primary's inactive_since upon invalidation: When a slot
becomes invalid on the primary, the slot-sync worker could copy the
primary slot’s inactive_since to the standby slot and retain it, by
preventing future updates on the standby.

2) Use the current time of standby when the synced slot is marked
invalid for the first time and do not update it in subsequent sync
cycles if the slot is invalid.

Approach (2) seems more reasonable to me, however, Both 1) & 2)
approaches contradicts the purpose of inactive_since, as it no longer
represents either the true "last sync time" or the "time slot became
inactive" because the slot-sync worker acquires locks periodically for
syncing, and keeps activating the slot.

3) Continuously update inactive_since for invalid synced slots as
well: Treat invalid synced slots like valid ones by updating
inactive_since with each sync cycle. This way, we can keep the "last
sync time" in the inactive_since. However, this could confuse users
when "invalidation_reason=inactive_timeout" is set for a synced slot
on standby but inactive_since would reflect sync time rather than the
time slot became inactive. IIUC, on the primary, when
invalidation_reason=inactive_timeout for a slot, the inactive_since
represents the actual time the slot became inactive before getting
invalidated, unless the primary is restarted.

Thoughts?

--
Thanks,
Nisha



On Thu, 7 Nov 2024 at 15:33, Nisha Moond <nisha.moond412@gmail.com> wrote:
>
> On Mon, Sep 16, 2024 at 3:31 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > Please find the attached v46 patch having changes for the above review
> > comments and your test review comments and Shveta's review comments.
> >
> Hi,
>
> I’ve reviewed this thread and am interested in working on the
> remaining tasks and comments, as well as the future review comments.
> However, Bharath, please let me know if you'd prefer to continue with
> it.
>
> Attached the rebased v47 patch, which also addresses Peter’s comments
> #2, #3, and #4 at [1]. I will try addressing other comments as well in
> next versions.

The following crash occurs while upgrading:
2024-11-13 14:19:45.955 IST [44539] LOG:  checkpoint starting: time
TRAP: failed Assert("!(*invalidated && SlotIsLogical(s) &&
IsBinaryUpgrade)"), File: "slot.c", Line: 1793, PID: 44539
postgres: checkpointer (ExceptionalCondition+0xbb)[0x555555e305bd]
postgres: checkpointer (+0x63ab04)[0x555555b8eb04]
postgres: checkpointer
(InvalidateObsoleteReplicationSlots+0x149)[0x555555b8ee5f]
postgres: checkpointer (CheckPointReplicationSlots+0x267)[0x555555b8f125]
postgres: checkpointer (+0x1f3ee8)[0x555555747ee8]
postgres: checkpointer (CreateCheckPoint+0x78f)[0x5555557475ee]
postgres: checkpointer (CheckpointerMain+0x632)[0x555555b2f1e7]
postgres: checkpointer (postmaster_child_launch+0x119)[0x555555b30892]
postgres: checkpointer (+0x5e2dc8)[0x555555b36dc8]
postgres: checkpointer (PostmasterMain+0x14bd)[0x555555b33647]
postgres: checkpointer (+0x487f2e)[0x5555559dbf2e]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7ffff6c29d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7ffff6c29e40]
postgres: checkpointer (_start+0x25)[0x555555634c25]
2024-11-13 14:19:45.967 IST [44538] LOG:  checkpointer process (PID
44539) was terminated by signal 6: Aborted

This can happen in the following case:
1) Set up a logical replication cluster with enough data so that it
will take at least a few minutes to upgrade
2) Stop the publisher node
3) Configure replication_slot_inactive_timeout and checkpoint_timeout
to 30 seconds
4) Upgrade the publisher node.

This is happening because logical replication slots are getting
invalidated during the upgrade, and there is an assertion which checks
that the slots are not invalidated.
I feel this can be fixed by having a function similar to
check_max_slot_wal_keep_size which will make sure that
replication_slot_inactive_timeout is 0 during upgrade.
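As a rough, untested sketch of that idea (a GUC check hook in the spirit
of check_max_slot_wal_keep_size(); the function name matches the patch,
the body is only illustrative):

bool
check_replication_slot_inactive_timeout(int *newval, void **extra,
                                        GucSource source)
{
    /*
     * Sketch only: reject any non-zero timeout while running in binary
     * upgrade mode, so the checkpointer cannot invalidate slots that
     * pg_upgrade still needs.
     */
    if (IsBinaryUpgrade && *newval != 0)
    {
        GUC_check_errdetail("\"%s\" must be set to 0 during binary upgrade mode.",
                            "replication_slot_inactive_timeout");
        return false;
    }

    return true;
}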

Regards,
Vignesh



On Wed, Sep 18, 2024 at 12:22 PM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Mon, Sep 16, 2024 at 3:31 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > Hi,
> >
> >
> > Please find the attached v46 patch having changes for the above review
> > comments and your test review comments and Shveta's review comments.
> >
>
> Thanks for addressing comments.
>
> Is there a reason that we don't support this invalidation on hot
> standby for non-synced slots? Shouldn't we support this time-based
> invalidation there too just like other invalidations?
>

I don’t see any reason to *not* support this invalidation on hot
standby for non-synced slots. Therefore, I’ve added the same in v48.

--
Thanks,
Nisha



Hi Nisha.

Thanks for the recent patch updates. Here are my review comments for
the latest patch v48-0001.

======
Commit message

1.
Till now, postgres has the ability to invalidate inactive
replication slots based on the amount of WAL (set via
max_slot_wal_keep_size GUC) that will be needed for the slots in
case they become active. However, choosing a default value for
this GUC is a bit tricky. Because the amount of WAL a database
generates, and the allocated storage for instance will vary
greatly in production, making it difficult to pin down a
one-size-fits-all value.

~

What do the words "for instance" mean here? Did it mean "per instance"
or "(for example)" or something else?

======
doc/src/sgml/system-views.sgml

2.
       <para>
         The time since the slot has become inactive.
-        <literal>NULL</literal> if the slot is currently being used.
-        Note that for slots on the standby that are being synced from a
+        <literal>NULL</literal> if the slot is currently being used. Once the
+        slot is invalidated, this value will remain unchanged until we shutdown
+        the server. Note that for slots on the standby that are being
synced from a
         primary server (whose <structfield>synced</structfield> field is
         <literal>true</literal>), the

Is this change related to the new inactivity timeout feature, or are
you just clarifying the existing behaviour of the 'inactive_since'
field?

Note there is already another thread [1] created to patch/clarify this
same field. So if you are just clarifying existing behavior then IMO
it would be better if you can try to get your desired changes
included there quickly before that other patch gets pushed.

~~~

3.
+         <para>
+          <literal>inactive_timeout</literal> means that the slot has been
+          inactive for longer than the amount of time specified by the
+          <xref linkend="guc-replication-slot-inactive-timeout"/> parameter.
+         </para>

Maybe there is a slightly shorter/simpler way to express this. For example,

BEFORE
inactive_timeout means that the slot has been inactive for longer than
the amount of time specified by the replication_slot_inactive_timeout
parameter.

SUGGESTION
inactive_timeout means that the slot has remained inactive beyond the
duration specified by the replication_slot_inactive_timeout parameter.

======
src/backend/replication/slot.c

4.
+int replication_slot_inactive_timeout = 0;

IMO it would be more informative to give the units in the variable
name (but not in the GUC name). e.g.
'replication_slot_inactive_timeout_secs'.

~~~

ReplicationSlotAcquire:

5.
+ *
+ * An error is raised if error_if_invalid is true and the slot has been
+ * invalidated previously.
  */
 void
-ReplicationSlotAcquire(const char *name, bool nowait)
+ReplicationSlotAcquire(const char *name, bool nowait, bool error_if_invalid)

This function comment makes it seem like "invalidated previously"
might mean *any* kind of invalidation, but later in the body of the
function we find the logic is really only used for inactive timeout.

+ /*
+ * An error is raised if error_if_invalid is true and the slot has been
+ * previously invalidated due to inactive timeout.
+ */

So, I think a better name for that parameter might be
'error_if_inactive_timeout'

OTOH, if it really is supposed to error for *any* kind of invalidation
then there needs to be more ereports.

~~~

6.
+ errdetail("This slot has been invalidated because it was inactive
for longer than the amount of time specified by \"%s\".",

This errdetail message seems quite long. I think it can be shortened
like below and still retain exactly the same meaning:

BEFORE:
This slot has been invalidated because it was inactive for longer than
the amount of time specified by \"%s\".

SUGGESTION:
This slot has been invalidated due to inactivity exceeding the time
limit set by "%s".

~~~

ReportSlotInvalidation:

7.
+ case RS_INVAL_INACTIVE_TIMEOUT:
+ Assert(inactive_since > 0);
+ appendStringInfo(&err_detail,
+ _("The slot has been inactive since %s for longer than the amount of
time specified by \"%s\"."),
+ timestamptz_to_str(inactive_since),
+ "replication_slot_inactive_timeout");
+ break;

Here also as in the above review comment #6 I think the message can be
shorter and still say the same thing

BEFORE:
_("The slot has been inactive since %s for longer than the amount of
time specified by \"%s\"."),

SUGGESTION:
_("The slot has been inactive since %s, exceeding the time limit set
by \"%s\"."),

~~~

SlotInactiveTimeoutCheckAllowed:

8.
+/*
+ * Is this replication slot allowed for inactive timeout invalidation check?
+ *
+ * Inactive timeout invalidation is allowed only when:
+ *
+ * 1. Inactive timeout is set
+ * 2. Slot is inactive
+ * 3. Server is in recovery and slot is not being synced from the primary
+ *
+ * Note that the inactive timeout invalidation mechanism is not
+ * applicable for slots on the standby server that are being synced
+ * from the primary server (i.e., standby slots having 'synced' field 'true').
+ * Synced slots are always considered to be inactive because they don't
+ * perform logical decoding to produce changes.
+ */

8a.
Somehow that first sentence seems strange. Would it be better to write it like:

SUGGESTION
Can this replication slot timeout due to inactivity?

~

8b.
AFAICT that reason 3 ("Server is in recovery and slot is not being
synced from the primary") seems not quite worded right...

Should it say more like:
The slot is not being synced from the primary while the server is in recovery

or maybe like:
The slot is not currently being synced from the primary (i.e. 'synced'
is not true while the server is in recovery)

~

8c.
Similarly, I think something about that "Note that the inactive
timeout invalidation mechanism is not applicable..." paragraph needs
tweaking because IMO that should also now be saying something about
'RecoveryInProgress'.

~~~

9.
+static inline bool
+SlotInactiveTimeoutCheckAllowed(ReplicationSlot *s)

Maybe the function name should be 'IsSlotInactiveTimeoutPossible' or
something better.

~~~

InvalidatePossiblyObsoleteSlot:

10.
  break;
+ case RS_INVAL_INACTIVE_TIMEOUT:
+
+ /*
+ * Check if the slot needs to be invalidated due to
+ * replication_slot_inactive_timeout GUC.
+ */

Since there are no other blank lines anywhere in this switch, the
introduction of this one in v48 looks out of place to me. IMO it would
be more readable if a blank line followed each/every of the breaks,
but then that is not a necessary change for this patch so...

~~~

11.
+ /*
+ * Invalidation due to inactive timeout implies that
+ * no one is using the slot.
+ */
+ Assert(s->active_pid == 0);

Given this assertion, does it mean that "(s->active_pid == 0)" should
have been another condition done up-front in the function
'SlotInactiveTimeoutCheckAllowed'?

~~~

12.
  /*
- * If the slot can be acquired, do so and mark it invalidated
- * immediately.  Otherwise we'll signal the owning process, below, and
- * retry.
+ * If the slot can be acquired, do so and mark it as invalidated. If
+ * the slot is already ours, mark it as invalidated. Otherwise, we'll
+ * signal the owning process below and retry.
  */
- if (active_pid == 0)
+ if (active_pid == 0 ||
+ (MyReplicationSlot == s &&
+ active_pid == MyProcPid))

I wasn't sure how this change belongs to this patch, because the logic
of the previous review comment said for the case of invalidation due
to inactivity that active_pid must be 0. e.g. Assert(s->active_pid ==
0);

~~~

RestoreSlotFromDisk:

13.
- slot->inactive_since = GetCurrentTimestamp();
+ slot->inactive_since = now;

In v47 this assignment used to call the function
'ReplicationSlotSetInactiveSince'. I recognise there is a very subtle
difference between direct assignment and the function, because the
function will skip assignment if the slot is already invalidated.
Anyway, if you are *deliberately* not wanting to call
ReplicationSlotSetInactiveSince here then I think this assignment
should be commented to explain the reason why not, otherwise someone
in the future might be tempted to think it was just an oversight and
add the call back in that you don't want.

======
src/test/recovery/t/050_invalidate_slots.pl

14.
+# Despite inactive timeout being set, the synced slot won't get invalidated on
+# its own on the standby. So, we must not see invalidation message in server
+# log.
+$standby1->safe_psql('postgres', "CHECKPOINT");
+is( $standby1->safe_psql(
+ 'postgres',
+ q{SELECT count(*) = 1 FROM pg_replication_slots
+   WHERE slot_name = 'sync_slot1'
+ AND invalidation_reason IS NULL;}
+ ),
+ "t",
+ 'check that synced slot sync_slot1 has not been invalidated on standby');
+

But now we are confirming this another way -- not by checking the
logs here -- so the comment "So, we must not see invalidation message in
server log." is no longer appropriate here.

======
[1]
https://www.postgresql.org/message-id/flat/CAA4eK1JQFdssaBBh-oQskpKM-UpG8jPyUdtmGWa_0qCDy%2BK7_A%40mail.gmail.com#ab98379f220288ed40d34f8c2a21cf96

Kind Regards,
Peter Smith.
Fujitsu Australia



On Wed, 13 Nov 2024 at 15:00, Nisha Moond <nisha.moond412@gmail.com> wrote:
>
> Please find the v48 patch attached.
>
> On Thu, Sep 19, 2024 at 9:40 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > When we promote hot standby with synced logical slots to become new
> > primary, the logical slots are never invalidated with
> > 'inactive_timeout' on new primary.  It seems the check in
> > SlotInactiveTimeoutCheckAllowed() is wrong. We should allow
> > invalidation of slots on primary even if they are marked as 'synced'.
>
> fixed.
>
> > I have raised 4 issues so far on v46, the first 3 are in [1],[2],[3].
> > Once all these are addressed, I can continue reviewing further.
> >
>
> Fixed issues reported in [1], [2].

Few comments:
1) Since we don't change the value of now in
ReplicationSlotSetInactiveSince, the function parameter can be passed
by value:
+/*
+ * Set slot's inactive_since property unless it was previously invalidated.
+ */
+static inline void
+ReplicationSlotSetInactiveSince(ReplicationSlot *s, TimestampTz *now,
+                                                               bool
acquire_lock)
+{
+       if (s->data.invalidated != RS_INVAL_NONE)
+               return;
+
+       if (acquire_lock)
+               SpinLockAcquire(&s->mutex);
+
+       s->inactive_since = *now;
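Something like this (sketch only; assuming the rest of the function body
stays as in the patch):

static inline void
ReplicationSlotSetInactiveSince(ReplicationSlot *s, TimestampTz now,
                                bool acquire_lock)
{
    if (s->data.invalidated != RS_INVAL_NONE)
        return;

    if (acquire_lock)
        SpinLockAcquire(&s->mutex);

    /* 'now' is taken by value since the function never modifies it */
    s->inactive_since = now;

    if (acquire_lock)
        SpinLockRelease(&s->mutex);
}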

2) Currently it allows a minimum value of less than 1 second (e.g. in
milliseconds); I feel we should have some minimum value, at least
something like checkpoint_timeout:
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 8a67f01200..367f510118 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -3028,6 +3028,18 @@ struct config_int ConfigureNamesInt[] =
                NULL, NULL, NULL
        },

+       {
+               {"replication_slot_inactive_timeout", PGC_SIGHUP, REPLICATION_SENDING,
+                       gettext_noop("Sets the amount of time a replication slot can remain inactive before "
+                                                "it will be invalidated."),
+                       NULL,
+                       GUC_UNIT_S
+               },
+               &replication_slot_inactive_timeout,
+               0, 0, INT_MAX,
+               NULL, NULL, NULL
+       },

3) Since the SlotInactiveTimeoutCheckAllowed() check has just been done
above when retrieving the current time, can we use the "now" variable
instead of calling SlotInactiveTimeoutCheckAllowed() a second time:
@@ -1651,6 +1713,26 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
                                        if (SlotIsLogical(s))
                                                invalidation_cause = cause;
                                        break;
+                               case RS_INVAL_INACTIVE_TIMEOUT:
+
+                                       /*
+                                        * Check if the slot needs to be invalidated due to
+                                        * replication_slot_inactive_timeout GUC.
+                                        */
+                                       if (SlotInactiveTimeoutCheckAllowed(s) &&
+                                           TimestampDifferenceExceeds(s->inactive_since, now,
+                                                                      replication_slot_inactive_timeout * 1000))
+                                       {
+                                               invalidation_cause = cause;
+                                               inactive_since = s->inactive_since;

4) I'm not sure if this change is required by this patch or is a
general optimization; if it is required for this patch we can expand
the comments:
@@ -2208,6 +2328,7 @@ RestoreSlotFromDisk(const char *name)
        bool            restored = false;
        int                     readBytes;
        pg_crc32c       checksum;
+       TimestampTz now;

        /* no need to lock here, no concurrent access allowed yet */

@@ -2368,6 +2489,9 @@ RestoreSlotFromDisk(const char *name)
                                                NameStr(cp.slotdata.name)),
                                 errhint("Change \"wal_level\" to be \"replica\" or higher.")));

+       /* Use same inactive_since time for all slots */
+       now = GetCurrentTimestamp();
+
        /* nothing can be active yet, don't lock anything */
        for (i = 0; i < max_replication_slots; i++)
        {
@@ -2400,7 +2524,7 @@ RestoreSlotFromDisk(const char *name)
                 * slot from the disk into memory. Whoever acquires the slot i.e.
                 * makes the slot active will reset it.
                 */
-               slot->inactive_since = GetCurrentTimestamp();
+               slot->inactive_since = now;

5) Why should an invalidated slot's inactive_since be changed at
shutdown? Shouldn't the inactive_since value remain intact across a
shutdown?
-        <literal>NULL</literal> if the slot is currently being used.
-        Note that for slots on the standby that are being synced from a
+        <literal>NULL</literal> if the slot is currently being used. Once the
+        slot is invalidated, this value will remain unchanged until we shutdown
+        the server. Note that for slots on the standby that are being
synced from a

6) The new style of ereport() does not need the extra parentheses
around errcode etc.; it can be changed like this:
+       if (error_if_invalid &&
+               s->data.invalidated == RS_INVAL_INACTIVE_TIMEOUT)
+       {
+               Assert(s->inactive_since > 0);
+               ereport(ERROR,
+                               (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                                errmsg("can no longer get changes from replication slot \"%s\"",
+                                               NameStr(s->data.name)),
+                                errdetail("This slot has been invalidated because it was inactive for longer than the amount of time specified by \"%s\".",
+                                                  "replication_slot_inactive_timeout")));

Regards,
Vignesh



On Thu, Nov 14, 2024 at 5:29 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Nisha.
>
> Thanks for the recent patch updates. Here are my review comments for
> the latest patch v48-0001.
>

Thank you for the review. Comments are addressed in v49 version.
Below is my response to comments that may require further discussion.

> ======
> doc/src/sgml/system-views.sgml
>
> 2.
>        <para>
>          The time since the slot has become inactive.
> -        <literal>NULL</literal> if the slot is currently being used.
> -        Note that for slots on the standby that are being synced from a
> +        <literal>NULL</literal> if the slot is currently being used. Once the
> +        slot is invalidated, this value will remain unchanged until we shutdown
> +        the server. Note that for slots on the standby that are being
> synced from a
>          primary server (whose <structfield>synced</structfield> field is
>          <literal>true</literal>), the
>
> Is this change related to the new inactivity timeout feature or are
> you just clarifying the existing behaviour of the 'active_since'
> field.
>

Yes, this patch introduces inactive_timeout invalidation and prevents
updates to inactive_since for invalid slots. Only a node restart can
modify it, so I believe we should retain these lines in this patch.

> Note there is already another thread [1] created to patch/clarify this
> same field. So if you are just clarifying existing behavior then IMO
> it would be better if you can to try and get your desired changes
> included there quickly before that other patch gets pushed.
>

Thanks for the reference, I have posted my suggestion on the thread.

>
> ReplicationSlotAcquire:
>
> 5.
> + *
> + * An error is raised if error_if_invalid is true and the slot has been
> + * invalidated previously.
>   */
>  void
> -ReplicationSlotAcquire(const char *name, bool nowait)
> +ReplicationSlotAcquire(const char *name, bool nowait, bool error_if_invalid)
>
> This function comment makes it seem like "invalidated previously"
> might mean *any* kind of invalidation, but later in the body of the
> function we find the logic is really only used for inactive timeout.
>
> + /*
> + * An error is raised if error_if_invalid is true and the slot has been
> + * previously invalidated due to inactive timeout.
> + */
>
> So, I think a better name for that parameter might be
> 'error_if_inactive_timeout'
>
> OTOH, if it really is supposed to erro for *any* kind of invalidation
> then there needs to be more ereports.
>

+1 to the idea.
I have created a separate patch v49-0001 adding more ereports for all
kinds of invalidations.

> ~~~
> SlotInactiveTimeoutCheckAllowed:
>
> 8.
> +/*
> + * Is this replication slot allowed for inactive timeout invalidation check?
> + *
> + * Inactive timeout invalidation is allowed only when:
> + *
> + * 1. Inactive timeout is set
> + * 2. Slot is inactive
> + * 3. Server is in recovery and slot is not being synced from the primary
> + *
> + * Note that the inactive timeout invalidation mechanism is not
> + * applicable for slots on the standby server that are being synced
> + * from the primary server (i.e., standby slots having 'synced' field 'true').
> + * Synced slots are always considered to be inactive because they don't
> + * perform logical decoding to produce changes.
> + */
>
> 8a.
> Somehow that first sentence seems strange. Would it be better to write it like:
>
> SUGGESTION
> Can this replication slot timeout due to inactivity?
>

I feel the suggestion is not very clear on the purpose of the
function. This function doesn't check inactivity or decide on slot
timeout invalidation; it only pre-checks whether the slot qualifies for
an inactivity check, which the caller will perform.
As I have also changed the function name as per comment #9, I used the
following -
"Is inactive timeout invalidation possible for this replication slot?"
Thoughts?

> ~
> 8c.
> Similarly, I think something about that "Note that the inactive
> timeout invalidation mechanism is not applicable..." paragraph needs
> tweaking because IMO that should also now be saying something about
> 'RecoveryInProgress'.
>

'RecoveryInProgress' check indicates that the server is a standby, and
the mentioned paragraph uses the term "standby" to describe the
condition. It seems unnecessary to mention RecoveryInProgress
separately.

> ~~~
>
> InvalidatePossiblyObsoleteSlot:
>
> 10.
>   break;
> + case RS_INVAL_INACTIVE_TIMEOUT:
> +
> + /*
> + * Check if the slot needs to be invalidated due to
> + * replication_slot_inactive_timeout GUC.
> + */
>
> Since there are no other blank lines anywhere in this switch, the
> introduction of this one in v48 looks out of place to me.

pgindent automatically added this blank line after 'case
RS_INVAL_INACTIVE_TIMEOUT'.

> IMO it would
> be more readable if a blank line followed each/every of the breaks,
> but then that is not a necessary change for this patch so...
>

Since it's not directly related to the patch, I feel it might be best
to leave it as is for now.

> ~~~
>
> 11.
> + /*
> + * Invalidation due to inactive timeout implies that
> + * no one is using the slot.
> + */
> + Assert(s->active_pid == 0);
>
> Given this assertion, does it mean that "(s->active_pid == 0)" should
> have been another condition done up-front in the function
> 'SlotInactiveTimeoutCheckAllowed'?
>

I don't think it's a good idea to check (s->active_pid == 0) upfront,
before the timeout-invalidation check. AFAIU, this assertion is meant
to ensure active_pid = 0 only if the slot is going to be invalidated,
i.e., when the following condition is true:

TimestampDifferenceExceeds(s->inactive_since, now,
                           replication_slot_inactive_timeout_sec * 1000)

Thoughts? Open to others' opinions too.

> ~~~
>
> 12.
>   /*
> - * If the slot can be acquired, do so and mark it invalidated
> - * immediately.  Otherwise we'll signal the owning process, below, and
> - * retry.
> + * If the slot can be acquired, do so and mark it as invalidated. If
> + * the slot is already ours, mark it as invalidated. Otherwise, we'll
> + * signal the owning process below and retry.
>   */
> - if (active_pid == 0)
> + if (active_pid == 0 ||
> + (MyReplicationSlot == s &&
> + active_pid == MyProcPid))
>
> I wasn't sure how this change belongs to this patch, because the logic
> of the previous review comment said for the case of invalidation due
> to inactivity that active_id must be 0. e.g. Assert(s->active_pid ==
> 0);
>

I don't fully understand the purpose of this change yet. I'll look
into it further and get back.

> ~~~
>
> RestoreSlotFromDisk:
>
> 13.
> - slot->inactive_since = GetCurrentTimestamp();
> + slot->inactive_since = now;
>
> In v47 this assignment used to call the function
> 'ReplicationSlotSetInactiveSince'. I recognise there is a very subtle
> difference between direct assignment and the function, because the
> function will skip assignment if the slot is already invalidated.
> Anyway, if you are *deliberately* not wanting to call
> ReplicationSlotSetInactiveSince here then I think this assignment
> should be commented to explain the reason why not, otherwise someone
> in the future might be tempted to think it was just an oversight and
> add the call back in that you don't want.
>

Added a comment saying to avoid using ReplicationSlotSetInactiveSince()
here, as it will skip invalid slots.

~~~~

--
Thanks,
Nisha



On Thu, Nov 14, 2024 at 9:14 AM vignesh C <vignesh21@gmail.com> wrote:
>
> On Wed, 13 Nov 2024 at 15:00, Nisha Moond <nisha.moond412@gmail.com> wrote:
> >
> > Please find the v48 patch attached.
> >
> > On Thu, Sep 19, 2024 at 9:40 AM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > When we promote hot standby with synced logical slots to become new
> > > primary, the logical slots are never invalidated with
> > > 'inactive_timeout' on new primary.  It seems the check in
> > > SlotInactiveTimeoutCheckAllowed() is wrong. We should allow
> > > invalidation of slots on primary even if they are marked as 'synced'.
> >
> > fixed.
> >
> > > I have raised 4 issues so far on v46, the first 3 are in [1],[2],[3].
> > > Once all these are addressed, I can continue reviewing further.
> > >
> >
> > Fixed issues reported in [1], [2].
>
> Few comments:

Thanks for the review.

>
> 2) Currently it allows a minimum value of less than 1 second like in
> milliseconds, I feel we can have some minimum value at least something
> like checkpoint_timeout:
> diff --git a/src/backend/utils/misc/guc_tables.c
> b/src/backend/utils/misc/guc_tables.c
> index 8a67f01200..367f510118 100644
> --- a/src/backend/utils/misc/guc_tables.c
> +++ b/src/backend/utils/misc/guc_tables.c
> @@ -3028,6 +3028,18 @@ struct config_int ConfigureNamesInt[] =
>                 NULL, NULL, NULL
>         },
>
> +       {
> +               {"replication_slot_inactive_timeout", PGC_SIGHUP,
> REPLICATION_SENDING,
> +                       gettext_noop("Sets the amount of time a
> replication slot can remain inactive before "
> +                                                "it will be invalidated."),
> +                       NULL,
> +                       GUC_UNIT_S
> +               },
> +               &replication_slot_inactive_timeout,
> +               0, 0, INT_MAX,
> +               NULL, NULL, NULL
> +       },
>

Currently, the feature is disabled by default when
replication_slot_inactive_timeout = 0. However, if we set a minimum
value, the default_val cannot be less than min_val, making it
impossible to use 0 to disable the feature.
Thoughts or any suggestions?

>
> 4) I'm not sure if this change required by this patch or is it a
> general optimization, if it is required for this patch we can detail
> the comments:
> @@ -2208,6 +2328,7 @@ RestoreSlotFromDisk(const char *name)
>         bool            restored = false;
>         int                     readBytes;
>         pg_crc32c       checksum;
> +       TimestampTz now;
>
>         /* no need to lock here, no concurrent access allowed yet */
>
> @@ -2368,6 +2489,9 @@ RestoreSlotFromDisk(const char *name)
>                                                 NameStr(cp.slotdata.name)),
>                                  errhint("Change \"wal_level\" to be
> \"replica\" or higher.")));
>
> +       /* Use same inactive_since time for all slots */
> +       now = GetCurrentTimestamp();
> +
>         /* nothing can be active yet, don't lock anything */
>         for (i = 0; i < max_replication_slots; i++)
>         {
> @@ -2400,7 +2524,7 @@ RestoreSlotFromDisk(const char *name)
>                  * slot from the disk into memory. Whoever acquires
> the slot i.e.
>                  * makes the slot active will reset it.
>                  */
> -               slot->inactive_since = GetCurrentTimestamp();
> +               slot->inactive_since = now;
>

After removing the ReplicationSlotSetInactiveSince() call from here,
this change became unrelated to the patch. It is now a general
optimization to set the same timestamp for all slots while restoring
them from disk. I have added a few comments as per Peter's suggestion.

> 5) Why should the slot invalidation be updated during shutdown,
> shouldn't the inactive_since value be intact during shutdown?
> -        <literal>NULL</literal> if the slot is currently being used.
> -        Note that for slots on the standby that are being synced from a
> +        <literal>NULL</literal> if the slot is currently being used. Once the
> +        slot is invalidated, this value will remain unchanged until we shutdown
> +        the server. Note that for slots on the standby that are being
> synced from a
>

The "inactive_since" data of a slot is not stored on disk, so the
older value cannot be restored after a restart.

--
Thanks,
Nisha



On Tue, 19 Nov 2024 at 12:43, Nisha Moond <nisha.moond412@gmail.com> wrote:
>
> Attached is the v49 patch set:
> - Fixed the bug reported in [1].
> - Addressed comments in [2] and [3].
>
> I've split the patch into two, implementing the suggested idea in
> comment #5 of [2] separately in 001:
>
> Patch-001: Adds additional error reports (for all invalidation types)
> in ReplicationSlotAcquire() for invalid slots when error_if_invalid =
> true.
> Patch-002: The original patch with comments addressed.

Few comments:
1) I felt this check in wait_for_slot_invalidation is not required, as
there is a call to trigger_slot_invalidation which sleeps for
inactive_timeout seconds and ensures a checkpoint is triggered; the
test also passes without this:
+       # Wait for slot to become inactive
+       $node->poll_query_until(
+               'postgres', qq[
+               SELECT COUNT(slot_name) = 1 FROM pg_replication_slots
+                       WHERE slot_name = '$slot' AND active = 'f' AND
+                                 inactive_since IS NOT NULL;
+       ])
+         or die
+         "Timed out while waiting for slot $slot to become inactive on node $node_name";

2) Instead of calling this in a loop, won't it be enough to call
checkpoint only once explicitly:
+       for (my $i = 0; $i < 10 * $PostgreSQL::Test::Utils::timeout_default; $i++)
+       {
+               $node->safe_psql('postgres', "CHECKPOINT");
+               if ($node->log_contains(
+                               "invalidating obsolete replication slot \"$slot\"", $offset))
+               {
+                       $invalidated = 1;
+                       last;
+               }
+               usleep(100_000);
+       }
+       ok($invalidated,
+               "check that slot $slot invalidation has been logged on node $node_name"
+       );

3) Since pg_sync_replication_slots is a sync call, we can directly use
"is( $standby1->safe_psql('postgres', SELECT COUNT(slot_name) = 1 FROM
pg_replication_slots..." instead of poll_query_until:
+$standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
+$standby1->poll_query_until(
+       'postgres', qq[
+       SELECT COUNT(slot_name) = 1 FROM pg_replication_slots
+               WHERE slot_name = 'sync_slot1' AND
+               invalidation_reason = 'inactive_timeout';
+])
+  or die
+  "Timed out while waiting for sync_slot1 invalidation to be synced on standby";

4) Since this variable is referred to in many places, how about
changing it to inactive_timeout_1s so that it is easier to follow while
reviewing:
# Set timeout GUC on the standby to verify that the next checkpoint will not
# invalidate synced slots.
my $inactive_timeout = 1;

5) Since we have already tested invalidation of logical replication
slot 'sync_slot1' above, this test might not be required:
+# =============================================================================
+# Testcase start
+# Invalidate logical subscriber slot due to inactive timeout.
+
+my $publisher = $primary;
+
+# Prepare for test
+$publisher->safe_psql(
+       'postgres', qq[
+    ALTER SYSTEM SET replication_slot_inactive_timeout TO '0';
+]);
+$publisher->reload;

Regards,
Vignesh



On Tue, 19 Nov 2024 at 12:43, Nisha Moond <nisha.moond412@gmail.com> wrote:
>
> Attached is the v49 patch set:
> - Fixed the bug reported in [1].
> - Addressed comments in [2] and [3].
>
> I've split the patch into two, implementing the suggested idea in
> comment #5 of [2] separately in 001:
>
> Patch-001: Adds additional error reports (for all invalidation types)
> in ReplicationSlotAcquire() for invalid slots when error_if_invalid =
> true.
> Patch-002: The original patch with comments addressed.

This Assert can fail:
+                                       /*
+                                        * Check if the slot needs to be invalidated due to
+                                        * replication_slot_inactive_timeout GUC.
+                                        */
+                                       if (now &&
+                                           TimestampDifferenceExceeds(s->inactive_since, now,
+                                                                      replication_slot_inactive_timeout_sec * 1000))
+                                       {
+                                               invalidation_cause = cause;
+                                               inactive_since = s->inactive_since;
+
+                                               /*
+                                                * Invalidation due to inactive timeout implies that
+                                                * no one is using the slot.
+                                                */
+                                               Assert(s->active_pid == 0);

With the following scenario:
Set replication_slot_inactive_timeout to 10 seconds
-- Create a slot
postgres=# select pg_create_logical_replication_slot ('test',
'pgoutput', true, true);
 pg_create_logical_replication_slot
------------------------------------
 (test,0/1748068)
(1 row)

-- Wait for 10 seconds and execute checkpoint
postgres=# checkpoint;
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process
exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
server closed the connection unexpectedly

The assert fails:
#5  0x00005b074f0c922f in ExceptionalCondition (conditionName=0x5b074f2f0b4c "s->active_pid == 0", fileName=0x5b074f2f0010 "slot.c", lineNumber=1762) at assert.c:66
#6  0x00005b074ee26ead in InvalidatePossiblyObsoleteSlot (cause=RS_INVAL_INACTIVE_TIMEOUT, s=0x740925361780, oldestLSN=0, dboid=0, snapshotConflictHorizon=0, invalidated=0x7fffaee87e63) at slot.c:1762
#7  0x00005b074ee273b2 in InvalidateObsoleteReplicationSlots (cause=RS_INVAL_INACTIVE_TIMEOUT, oldestSegno=0, dboid=0, snapshotConflictHorizon=0) at slot.c:1952
#8  0x00005b074ee27678 in CheckPointReplicationSlots (is_shutdown=false) at slot.c:2061
#9  0x00005b074e9dfda7 in CheckPointGuts (checkPointRedo=24412528, flags=108) at xlog.c:7513
#10 0x00005b074e9df4ad in CreateCheckPoint (flags=108) at xlog.c:7179
#11 0x00005b074edc6bfc in CheckpointerMain (startup_data=0x0, startup_data_len=0) at checkpointer.c:463

Regards,
Vignesh



On Tue, 19 Nov 2024 at 12:51, Nisha Moond <nisha.moond412@gmail.com> wrote:
>
> On Thu, Nov 14, 2024 at 9:14 AM vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Wed, 13 Nov 2024 at 15:00, Nisha Moond <nisha.moond412@gmail.com> wrote:
> > >
> > > Please find the v48 patch attached.
> > >
> > 2) Currently it allows a minimum value of less than 1 second like in
> > milliseconds, I feel we can have some minimum value at least something
> > like checkpoint_timeout:
> > diff --git a/src/backend/utils/misc/guc_tables.c
> > b/src/backend/utils/misc/guc_tables.c
> > index 8a67f01200..367f510118 100644
> > --- a/src/backend/utils/misc/guc_tables.c
> > +++ b/src/backend/utils/misc/guc_tables.c
> > @@ -3028,6 +3028,18 @@ struct config_int ConfigureNamesInt[] =
> >                 NULL, NULL, NULL
> >         },
> >
> > +       {
> > +               {"replication_slot_inactive_timeout", PGC_SIGHUP,
> > REPLICATION_SENDING,
> > +                       gettext_noop("Sets the amount of time a
> > replication slot can remain inactive before "
> > +                                                "it will be invalidated."),
> > +                       NULL,
> > +                       GUC_UNIT_S
> > +               },
> > +               &replication_slot_inactive_timeout,
> > +               0, 0, INT_MAX,
> > +               NULL, NULL, NULL
> > +       },
> >
>
> Currently, the feature is disabled by default when
> replication_slot_inactive_timeout = 0. However, if we set a minimum
> value, the default_val cannot be less than min_val, making it
> impossible to use 0 to disable the feature.
> Thoughts or any suggestions?

We could implement this similarly to how the vacuum_buffer_usage_limit
GUC is handled: setting the value to 0 allows the operation to use any
amount of shared_buffers, otherwise valid sizes range from 128 kB to
16 GB. Similarly, we can modify check_replication_slot_inactive_timeout
to behave in the same way as the check_vacuum_buffer_usage_limit
function.
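As a rough, untested sketch of that behaviour (the 60-second floor below
is only an illustrative assumption, not a concrete proposal; it could
also be combined with the binary-upgrade check discussed earlier):

bool
check_replication_slot_inactive_timeout(int *newval, void **extra,
                                        GucSource source)
{
    /* 0 keeps timeout-based invalidation disabled */
    if (*newval == 0)
        return true;

    /* otherwise enforce a hypothetical minimum of 60 seconds */
    if (*newval < 60)
    {
        GUC_check_errdetail("\"%s\" must be 0 or at least 60 seconds.",
                            "replication_slot_inactive_timeout");
        return false;
    }

    return true;
}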

Regards,
Vignesh



On Thu, 21 Nov 2024 at 17:35, Nisha Moond <nisha.moond412@gmail.com> wrote:
>
> On Wed, Nov 20, 2024 at 1:29 PM vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Tue, 19 Nov 2024 at 12:43, Nisha Moond <nisha.moond412@gmail.com> wrote:
> > >
> > > Attached is the v49 patch set:
> > > - Fixed the bug reported in [1].
> > > - Addressed comments in [2] and [3].
> > >
> > > I've split the patch into two, implementing the suggested idea in
> > > comment #5 of [2] separately in 001:
> > >
> > > Patch-001: Adds additional error reports (for all invalidation types)
> > > in ReplicationSlotAcquire() for invalid slots when error_if_invalid =
> > > true.
> > > Patch-002: The original patch with comments addressed.
> >
> > This Assert can fail:
> >
>
> Attached v50 patch-set addressing review comments in [1] and [2].

We are setting inactive_since when the replication slot is released.
We are marking the slot as inactive only if it has been released.
However, there's a scenario where the network connection between the
publisher and subscriber may be lost, in which case the replication
slot is not released, but no changes are replicated due to the network
problem. In this case, no updates would occur in the replication slot
for a period exceeding replication_slot_inactive_timeout.
Should we invalidate these replication slots as well, or is it
intentionally left out?

Regards,
Vignesh



Hi Nisha,

Here are my review comments for the patch v50-0001.

======
Commit message

1.
In ReplicationSlotAcquire(), raise an error for invalid slots if caller
specify error_if_invalid=true.

/caller/the caller/
/specify/specifies/

======
src/backend/replication/slot.c

ReplicationSlotAcquire:

2.
+ *
+ * An error is raised if error_if_invalid is true and the slot has been
+ * invalidated previously.
  */
 void
-ReplicationSlotAcquire(const char *name, bool nowait)
+ReplicationSlotAcquire(const char *name, bool nowait, bool error_if_invalid)

The "has been invalidated previously." sounds a bit tricky. Do you just mean:

"An error is raised if error_if_invalid is true and the slot is found
to be invalid."

~

3.
+ /*
+ * An error is raised if error_if_invalid is true and the slot has been
+ * previously invalidated.
+ */

(ditto previous comment)

~

4.
+ appendStringInfo(&err_detail, _("This slot has been invalidated because "));
+
+ switch (s->data.invalidated)
+ {
+ case RS_INVAL_WAL_REMOVED:
+ appendStringInfo(&err_detail, _("the required WAL has been removed."));
+ break;
+
+ case RS_INVAL_HORIZON:
+ appendStringInfo(&err_detail, _("the required rows have been removed."));
+ break;
+
+ case RS_INVAL_WAL_LEVEL:
+ appendStringInfo(&err_detail, _("wal_level is insufficient for slot."));
+ break;

4a.
I suspect that building the errdetail in 2 parts like this will be
troublesome for the translators of some languages. Probably it is
safer to have the entire errdetail for each case.

~

4b.
By convention, I think the GUC "wal_level" should be double-quoted in
the message.
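For example, something like this (untested sketch, just to show whole
translatable sentences per case, with the GUC name quoted and supplied
separately):

switch (s->data.invalidated)
{
    case RS_INVAL_WAL_REMOVED:
        appendStringInfoString(&err_detail,
                               _("This slot has been invalidated because the required WAL has been removed."));
        break;

    case RS_INVAL_HORIZON:
        appendStringInfoString(&err_detail,
                               _("This slot has been invalidated because the required rows have been removed."));
        break;

    case RS_INVAL_WAL_LEVEL:
        appendStringInfo(&err_detail,
                         _("This slot has been invalidated because \"%s\" is insufficient for the slot."),
                         "wal_level");
        break;

    case RS_INVAL_INACTIVE_TIMEOUT:
        appendStringInfo(&err_detail,
                         _("This slot has been invalidated because it was inactive for longer than the amount of time specified by \"%s\"."),
                         "replication_slot_inactive_timeout");
        break;
}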

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Hi Nisha,

Here are some review comments for the patch v50-0002.

======
src/backend/replication/slot.c

InvalidatePossiblyObsoleteSlot:

1.
+ if (now &&
+ TimestampDifferenceExceeds(s->inactive_since, now,
+    replication_slot_inactive_timeout_sec * 1000))

Previously this was using an additional call to SlotInactiveTimeoutCheckAllowed:

+ if (SlotInactiveTimeoutCheckAllowed(s) &&
+ TimestampDifferenceExceeds(s->inactive_since, now,
+    replication_slot_inactive_timeout * 1000))

Is it OK to skip that call? e.g. can the slot fields possibly change
between assigning the 'now' and acquiring the mutex? If not, then the
current code is fine. The only reason for asking is because it is
slightly suspicious that it was not done this "easy" way in the first
place.

~~~

check_replication_slot_inactive_timeout:

2.
+/*
+ * GUC check_hook for replication_slot_inactive_timeout
+ *
+ * We don't allow the value of replication_slot_inactive_timeout other than 0
+ * during the binary upgrade.
+ */

The "We don't allow..." sentence seems like a backward way of saying:
The value of replication_slot_inactive_timeout must be set to 0 during
the binary upgrade.

======
src/test/recovery/t/050_invalidate_slots.pl

3.
+# Despite inactive timeout being set, the synced slot won't get invalidated on
+# its own on the standby.

What does "on its own" mean here? Do you mean it won't get invalidated
unless the invalidation state is propagated from the primary? Maybe
the comment can be clearer.

~

4.
+# Wait for slot to first become inactive and then get invalidated
+sub wait_for_slot_invalidation
+{
+ my ($node, $slot, $offset, $inactive_timeout_1s) = @_;
+ my $node_name = $node->name;
+

It was OK to change the variable name to 'inactive_timeout_1s' outside
of here, but within the subroutine, I don't think it is appropriate
because this is a parameter that potentially could have any value.

~

5.
+# Trigger slot invalidation and confirm it in the server log
+sub trigger_slot_invalidation
+{
+ my ($node, $slot, $offset, $inactive_timeout_1s) = @_;
+ my $node_name = $node->name;
+ my $invalidated = 0;

It was OK to change the variable name to 'inactive_timeout_1s' outside
of here, but within the subroutine, I don't think it is appropriate
because this is a parameter that potentially could have any value.

~

6.
+ # Give enough time to avoid multiple checkpoints
+ sleep($inactive_timeout_1s + 1);
+
+ # Run a checkpoint
+ $node->safe_psql('postgres', "CHECKPOINT");

Since you are not doing multiple checkpoints anymore, it looks like
that "Give enough time..." comment needs updating.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Hi Nisha, here are my review comments for the patch v51-0001.

======
src/backend/replication/slot.c

ReplicationSlotAcquire:

1.
+ ereport(ERROR,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("can no longer get changes from replication slot \"%s\"",
+    NameStr(s->data.name)),
+ errdetail_internal("%s", err_detail.data));
+
+ pfree(err_detail.data);
+ }
+

Won't the 'pfree' be unreachable due to the prior ereport ERROR?

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Hi Nisha. Here are some review comments for patch v51-0002.

======
doc/src/sgml/system-views.sgml

1.
         The time when the slot became inactive. <literal>NULL</literal> if the
-        slot is currently being streamed.
+        slot is currently being streamed. Once the slot is invalidated, this
+        value will remain unchanged until we shutdown the server.
.

I think "Once the ..." kind of makes it sound like invalidation is
inevitable. Also maybe it's better to remove the "we".

SUGGESTION:
If the slot becomes invalidated, this value will remain unchanged
until server shutdown.

======
src/backend/replication/slot.c

ReplicationSlotAcquire:

2.
GENERAL.

This just is a question/idea. It may not be feasible to change. It
seems like there is a lot of overlap between the error messages in
'ReplicationSlotAcquire' which are saying "This slot has been
invalidated because...", and with the other function
'ReportSlotInvalidation' which is kind of the same but called in
different circumstances and with slightly different message text. I
wondered if there is a way to use common code to unify these messages
instead of having a nearly duplicate set of messages for all the
invalidation causes?

~~~

3.
+ case RS_INVAL_INACTIVE_TIMEOUT:
+ appendStringInfo(&err_detail, _("inactivity exceeded the time limit
set by \"%s\"."),
+ "replication_slot_inactive_timeout");
+ break;

Should this err_detail also say "This slot has been invalidated
because ..." like all the others?

~~~

InvalidatePossiblyObsoleteSlot:

4.
+ case RS_INVAL_INACTIVE_TIMEOUT:
+
+ /*
+ * Check if the slot needs to be invalidated due to
+ * replication_slot_inactive_timeout GUC.
+ */
+ if (IsSlotInactiveTimeoutPossible(s) &&
+ TimestampDifferenceExceeds(s->inactive_since, now,
+    replication_slot_inactive_timeout_sec * 1000))
+ {

Maybe this code should have Assert(now > 0); before the condition, just
as a way to 'document' that it is assumed 'now' was already set outside
the mutex.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



RE: Introduce XID age and inactive timeout based replication slot invalidation

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Nisha,

> 
> Attached v51 patch-set addressing all comments in [1] and [2].
>

Thanks for working on the feature! I've started to review the patch.
Here are my comments - sorry if some of them have already been discussed.
The thread is too long to follow correctly.

Comments for 0001
=============

01. binary_upgrade_logical_slot_has_caught_up

ISTM that error_if_invalid is set to true when the slot can be moved forward, otherwise
it is set to false. Regarding binary_upgrade_logical_slot_has_caught_up, however,
only valid slots will be passed to the function (see pg_upgrade/info.c), so I feel
it is OK to set it to true. Thoughts?

02. ReplicationSlotAcquire

According to other functions, we add a note for the translator when
parameters represent common nouns such as GUC names. I feel we should add such
a comment for the RS_INVAL_WAL_LEVEL part as well.


Comments for 0002
=============

03. check_replication_slot_inactive_timeout

Can we overwrite replication_slot_inactive_timeout to zero when pg_upgrade (and also
pg_createsubscriber?) starts a server process? Several parameters are already
specified via the -c option at that time. This can avoid an error during the upgrade.
Note that the check_hook part is still needed even if you accept this comment, because
users can manually start the server in binary upgrade mode.
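Something like the following could be appended to the options pg_upgrade
already passes when starting the server (rough illustration only - the
buffer name "pgoptions" and the exact place in pg_upgrade/server.c are
assumptions, not the real code):

/* force the timeout off for the duration of the upgrade (sketch) */
appendPQExpBufferStr(&pgoptions,
                     " -c replication_slot_inactive_timeout=0");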

04. ReplicationSlotAcquire

Same comment as 02.

05. ReportSlotInvalidation

Same comment as 02.

06. found bug

While testing the patch, I found that slots can be invalidated too early when
the GUC is quite large. I think this is because an overflow occurs in InvalidatePossiblyObsoleteSlot().

- Reproducer

I set the replication_slot_inactive_timeout to INT_MAX and executed below commands,
and found that the slot is invalidated.

```
postgres=# SHOW replication_slot_inactive_timeout;
 replication_slot_inactive_timeout 
-----------------------------------
 2147483647s
(1 row)
postgres=# SELECT * FROM pg_create_logical_replication_slot('test', 'test_decoding');
 slot_name |    lsn    
-----------+-----------
 test      | 0/18B7F38
(1 row)
postgres=# CHECKPOINT ;
CHECKPOINT
postgres=# SELECT slot_name, inactive_since, invalidation_reason FROM pg_replication_slots ;
 slot_name |        inactive_since         | invalidation_reason 
-----------+-------------------------------+---------------------
 test      | 2024-11-28 07:50:25.927594+00 | inactive_timeout
(1 row)
```

- analysis

In InvalidatePossiblyObsoleteSlot(), replication_slot_inactive_timeout_sec * 1000
is passed as the third argument of TimestampDifferenceExceeds(), which is of
integer datatype. This causes an overflow, and the parameter is then handled as a
small value.

- solution

I think there are two possible solutions. You can choose one of them:

a. Make the maximum INT_MAX/1000, or
b. Change the unit to millisecond.
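For illustration only, another way to avoid the wrap-around would be to do
the arithmetic in 64 bits instead of passing "seconds * 1000" through the
int argument of TimestampDifferenceExceeds(), e.g. (untested sketch):

TimestampTz timeout_at;

/* promote to int64 before scaling, so a huge GUC value cannot overflow */
timeout_at = TimestampTzPlusMilliseconds(s->inactive_since,
                                         (int64) replication_slot_inactive_timeout_sec * 1000);

if (now >= timeout_at)
{
    /* slot has been inactive longer than the configured timeout */
}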

Best regards,
Hayato Kuroda
FUJITSU LIMITED


On Fri, 22 Nov 2024 at 17:43, vignesh C <vignesh21@gmail.com> wrote:
>
> On Thu, 21 Nov 2024 at 17:35, Nisha Moond <nisha.moond412@gmail.com> wrote:
> >
> > On Wed, Nov 20, 2024 at 1:29 PM vignesh C <vignesh21@gmail.com> wrote:
> > >
> > > On Tue, 19 Nov 2024 at 12:43, Nisha Moond <nisha.moond412@gmail.com> wrote:
> > > >
> > > > Attached is the v49 patch set:
> > > > - Fixed the bug reported in [1].
> > > > - Addressed comments in [2] and [3].
> > > >
> > > > I've split the patch into two, implementing the suggested idea in
> > > > comment #5 of [2] separately in 001:
> > > >
> > > > Patch-001: Adds additional error reports (for all invalidation types)
> > > > in ReplicationSlotAcquire() for invalid slots when error_if_invalid =
> > > > true.
> > > > Patch-002: The original patch with comments addressed.
> > >
> > > This Assert can fail:
> > >
> >
> > Attached v50 patch-set addressing review comments in [1] and [2].
>
> We are setting inactive_since when the replication slot is released.
> We are marking the slot as inactive only if it has been released.
> However, there's a scenario where the network connection between the
> publisher and subscriber may be lost where the replication slot is not
> released, but no changes are replicated due to the network problem. In
> this case, no updates would occur in the replication slot for a period
> exceeding the replication_slot_inactive_timeout.
> Should we invalidate these replication slots as well, or is it
> intentionally left out?

On further thinking, I felt we can keep the current implementation as
is and simply add a brief comment in the code to address this.
Additionally, we can mention it in the commit message for clarity.

Regards,
Vignesh



On Wed, 27 Nov 2024 at 16:25, Nisha Moond <nisha.moond412@gmail.com> wrote:
>
> On Wed, Nov 27, 2024 at 8:39 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Hi Nisha,
> >
> > Here are some review comments for the patch v50-0002.
> >
> > ======
> > src/backend/replication/slot.c
> >
> > InvalidatePossiblyObsoleteSlot:
> >
> > 1.
> > + if (now &&
> > + TimestampDifferenceExceeds(s->inactive_since, now,
> > +    replication_slot_inactive_timeout_sec * 1000))
> >
> > Previously this was using an additional call to SlotInactiveTimeoutCheckAllowed:
> >
> > + if (SlotInactiveTimeoutCheckAllowed(s) &&
> > + TimestampDifferenceExceeds(s->inactive_since, now,
> > +    replication_slot_inactive_timeout * 1000))
> >
> > Is it OK to skip that call? e.g. can the slot fields possibly change
> > between assigning the 'now' and acquiring the mutex? If not, then the
> > current code is fine. The only reason for asking is because it is
> > slightly suspicious that it was not done this "easy" way in the first
> > place.
> >
> Good catch! While the mutex was being acquired right after the now
> assignment, there was a rare chance of another process modifying the
> slot in the meantime. So, I reverted the change in v51. To optimize
> the SlotInactiveTimeoutCheckAllowed() call, it's sufficient to check
> it here instead of during the 'now' assignment.
>
> Attached v51 patch-set addressing all comments in [1] and [2].

Few comments:
1) replication_slot_inactive_timeout could be mentioned in the logical
replication configuration documentation [1]; we could say something like:
"Logical replication slots are also affected by replication_slot_inactive_timeout."

2.a) Is this change applicable only to inactive timeout, or is it also
applicable to others like WAL removed, wal_level, etc.? If it is
applicable to all of them, we could move this to the first patch and
update the commit message:
+                * If the slot can be acquired, do so and mark it as invalidated. If
+                * the slot is already ours, mark it as invalidated. Otherwise, we'll
+                * signal the owning process below and retry.
                 */
-               if (active_pid == 0)
+               if (active_pid == 0 ||
+                       (MyReplicationSlot == s &&
+                        active_pid == MyProcPid))

2.b) Also, this MyReplicationSlot and active_pid check can be on the same line:
+                       (MyReplicationSlot == s &&
+                        active_pid == MyProcPid))


3) The error detail should start in upper case here, similar to how the others are done:
+                       case RS_INVAL_INACTIVE_TIMEOUT:
+                               appendStringInfo(&err_detail, _("inactivity exceeded the time limit set by \"%s\"."),
+                                                "replication_slot_inactive_timeout");
+                               break;

4) Since this change is not related to this patch, we can move this to
the first patch and update the commit message:
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -1508,7 +1508,7 @@ ReplSlotSyncWorkerMain(char *startup_data, size_t startup_data_len)
 static void
 update_synced_slots_inactive_since(void)
 {
-       TimestampTz now = 0;
+       TimestampTz now;

        /*
         * We need to update inactive_since only when we are promoting standby to
@@ -1523,6 +1523,9 @@ update_synced_slots_inactive_since(void)
        /* The slot sync worker or SQL function mustn't be running by now */
        Assert((SlotSyncCtx->pid == InvalidPid) && !SlotSyncCtx->syncing);

+       /* Use same inactive_since time for all slots */
+       now = GetCurrentTimestamp();

5) Since this change is not related to this patch, we can move this to
the first patch.
@@ -2250,6 +2350,7 @@ RestoreSlotFromDisk(const char *name)
        bool            restored = false;
        int                     readBytes;
        pg_crc32c       checksum;
+       TimestampTz now;

        /* no need to lock here, no concurrent access allowed yet */

@@ -2410,6 +2511,9 @@ RestoreSlotFromDisk(const char *name)
                                                NameStr(cp.slotdata.name)),
                                 errhint("Change \"wal_level\" to be \"replica\" or higher.")));

+       /* Use same inactive_since time for all slots */
+       now = GetCurrentTimestamp();
+
        /* nothing can be active yet, don't lock anything */
        for (i = 0; i < max_replication_slots; i++)
        {
@@ -2440,9 +2544,11 @@ RestoreSlotFromDisk(const char *name)
                /*
                 * Set the time since the slot has become inactive after loading the
                 * slot from the disk into memory. Whoever acquires the slot i.e.
-                * makes the slot active will reset it.
+                * makes the slot active will reset it. Avoid calling
+                * ReplicationSlotSetInactiveSince() here, as it will not set the time
+                * for invalid slots.
                 */
-               slot->inactive_since = GetCurrentTimestamp();
+               slot->inactive_since = now;

[1] - https://www.postgresql.org/docs/current/logical-replication-config.html

Regards,
Vignesh



On Tue, Nov 19, 2024 at 12:47 PM Nisha Moond <nisha.moond412@gmail.com> wrote:
>
> On Thu, Nov 14, 2024 at 5:29 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> >
> > 12.
> >   /*
> > - * If the slot can be acquired, do so and mark it invalidated
> > - * immediately.  Otherwise we'll signal the owning process, below, and
> > - * retry.
> > + * If the slot can be acquired, do so and mark it as invalidated. If
> > + * the slot is already ours, mark it as invalidated. Otherwise, we'll
> > + * signal the owning process below and retry.
> >   */
> > - if (active_pid == 0)
> > + if (active_pid == 0 ||
> > + (MyReplicationSlot == s &&
> > + active_pid == MyProcPid))
> >
> > I wasn't sure how this change belongs to this patch, because the logic
> > of the previous review comment said for the case of invalidation due
> > to inactivity that active_id must be 0. e.g. Assert(s->active_pid ==
> > 0);
> >
>
> I don't fully understand the purpose of this change yet. I'll look
> into it further and get back.
>

This change applies to all types of invalidation, not just the
inactive_timeout case, so I moved the change to patch-001. It’s a
general optimization for the case where the current process is the
active PID for the slot.
Also, the Assert(s->active_pid == 0); has been removed (in v50) as it
was unnecessary.

--
Thanks,
Nisha



On Thu, Nov 28, 2024 at 1:29 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Nisha,
>
> >
> > Attached v51 patch-set addressing all comments in [1] and [2].
> >
>
> Thanks for working on the feature! I've started to review the patch.
> Here are my comments - sorry if some of them have already been discussed.
> The thread is too long to follow completely.
>
> Comments for 0001
> =============
>
> 01. binary_upgrade_logical_slot_has_caught_up
>
> ISTM that error_if_invalid is set to true when the slot can be moved forward; otherwise,
> it is set to false. Regarding binary_upgrade_logical_slot_has_caught_up, however,
> only valid slots will be passed to the function (see pg_upgrade/info.c), so I feel
> it is OK to set it to true. Thoughts?
>

Right, corrected the call with error_if_invalid as true.

> Comments for 0002
> =============
>
> 03. check_replication_slot_inactive_timeout
>
> Can we override replication_slot_inactive_timeout to zero when pg_upgrade (and also
> pg_createsubscriber?) starts a server process? Several parameters are already
> specified via the -c option at that time. This would avoid an error during the upgrade.
> Note that this part is still needed even if you accept the suggestion, because users can
> manually start the server in upgrade mode.
>

Done.

> 06. found bug
>
> While testing the patch, I found that slots can be invalidated too early when
> the GUC is quite large. I think this is because an overflow occurs in InvalidatePossiblyObsoleteSlot().
>
> - Reproducer
>
> I set the replication_slot_inactive_timeout to INT_MAX and executed below commands,
> and found that the slot is invalidated.
>
> ```
> postgres=# SHOW replication_slot_inactive_timeout;
>  replication_slot_inactive_timeout
> -----------------------------------
>  2147483647s
> (1 row)
> postgres=# SELECT * FROM pg_create_logical_replication_slot('test', 'test_decoding');
>  slot_name |    lsn
> -----------+-----------
>  test      | 0/18B7F38
> (1 row)
> postgres=# CHECKPOINT ;
> CHECKPOINT
> postgres=# SELECT slot_name, inactive_since, invalidation_reason FROM pg_replication_slots ;
>  slot_name |        inactive_since         | invalidation_reason
> -----------+-------------------------------+---------------------
>  test      | 2024-11-28 07:50:25.927594+00 | inactive_timeout
> (1 row)
> ```
>
> - analysis
>
> In InvalidatePossiblyObsoleteSlot(), replication_slot_inactive_timeout_sec * 1000
> is passed as the third argument of TimestampDifferenceExceeds(), which is also of
> integer type. This causes an overflow, and the parameter is handled as a small
> value.
>
> - solution
>
> I think there are two possible solutions. You can choose one of them:
>
> a. Make the maximum INT_MAX/1000, or
> b. Change the unit to milliseconds.
>

Fixed. It is reasonable to align with other timeout parameters by
using milliseconds as the unit.
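
For anyone following along, here is a minimal standalone sketch of the
overflow (illustrative only; the variable names are made up and this is not
the patch code):

```c
#include <limits.h>
#include <stdio.h>

int
main(void)
{
    /* GUC stored as an int number of seconds, set to its INT_MAX maximum */
    int     timeout_sec = INT_MAX;

    /*
     * Converting to milliseconds for TimestampDifferenceExceeds(), whose
     * third argument is an int, overflows: in practice the value wraps to a
     * small or negative number, so the timeout appears to elapse almost
     * immediately.
     */
    int     timeout_ms = timeout_sec * 1000;

    printf("seconds: %d, \"milliseconds\": %d\n", timeout_sec, timeout_ms);

    /*
     * Keeping the GUC in milliseconds avoids the multiplication at the call
     * site; capping the GUC at INT_MAX / 1000 would be the other option.
     */
    return 0;
}
```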

--
Thanks,
Nisha



On Thu, Nov 28, 2024 at 5:20 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Nisha. Here are some review comments for patch v51-0002.
>
> ======
> src/backend/replication/slot.c
>
> ReplicationSlotAcquire:
>
> 2.
> GENERAL.
>
This is just a question/idea. It may not be feasible to change. It
> seems like there is a lot of overlap between the error messages in
> 'ReplicationSlotAcquire' which are saying "This slot has been
> invalidated because...", and with the other function
> 'ReportSlotInvalidation' which is kind of the same but called in
> different circumstances and with slightly different message text. I
> wondered if there is a way to use common code to unify these messages
> instead of having a nearly duplicate set of messages for all the
> invalidation causes?
>

The error handling could be moved to a new function; however, as you
pointed out, the contexts in which these functions are called differ.
IMO, a single error message may not suit both cases. For example,
ReportSlotInvalidation provides additional details and a hint in its
message, which isn’t necessary for ReplicationSlotAcquire.
Thoughts?

--
Thanks,
Nisha



Hi Nisha, here are a couple of review comments for patch v52-0001.

======
Commit Message

Add check if slot is already acquired, then mark it invalidate directly.

~

/slot/the slot/

"mark it invalidate" ?

Maybe you meant:
"then invalidate it directly", or
"then mark it 'invalidated' directly", or
etc.

======
src/backend/replication/logical/slotsync.c

1.
@@ -1508,7 +1508,7 @@ ReplSlotSyncWorkerMain(char *startup_data,
size_t startup_data_len)
 static void
 update_synced_slots_inactive_since(void)
 {
- TimestampTz now = 0;
+ TimestampTz now;

  /*
  * We need to update inactive_since only when we are promoting standby to
@@ -1523,6 +1523,9 @@ update_synced_slots_inactive_since(void)
  /* The slot sync worker or SQL function mustn't be running by now */
  Assert((SlotSyncCtx->pid == InvalidPid) && !SlotSyncCtx->syncing);

+ /* Use same inactive_since time for all slots */
+ now = GetCurrentTimestamp();
+

Something is broken with these changes.

AFAICT, the result after applying patch 0001 still has code:
/* Use the same inactive_since time for all the slots. */
if (now == 0)
  now = GetCurrentTimestamp();

So the end result has multiple/competing assignments to variable 'now'.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



RE: Introduce XID age and inactive timeout based replication slot invalidation

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear Nisha,

Thanks for updating the patch!

> Fixed. It is reasonable to align with other timeout parameters by
> using milliseconds as the unit.

It looks like you just switched to GUC_UNIT_MS, but the documentation and
postgresql.conf.sample have not been updated yet. They should follow the code.
Anyway, here are other comments, mostly cosmetic.

01. slot.c

```
+int         replication_slot_inactive_timeout_ms = 0;
```

Following the style of the other lines, we should add a short comment for the GUC.

02. 050_invalidate_slots.pl

Is there a reason you used the number 050? I feel it could be 043.

03. 050_invalidate_slots.pl

Also, I'm not sure the file name is appropriate. This file only covers slot invalidation due to
replication_slot_inactive_timeout, but I feel the current name is too general.

04. 050_invalidate_slots.pl

```
+use Time::HiRes qw(usleep);
```

This line is not needed because usleep() is not used in this file.

Best regards,
Hayato Kuroda
FUJITSU LIMITED


On Wed, 4 Dec 2024 at 15:01, Nisha Moond <nisha.moond412@gmail.com> wrote:
>
> On Tue, Dec 3, 2024 at 1:09 PM Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> > Dear Nisha,
> >
> > Thanks for updating the patch!
> >
> > > Fixed. It is reasonable to align with other timeout parameters by
> > > using milliseconds as the unit.
> >
> > It looks like you just switched to GUC_UNIT_MS, but the documentation and
> > postgresql.conf.sample have not been updated yet. They should follow the code.
> > Anyway, here are other comments, mostly cosmetic.
> >
>
> Here is v53 patch-set addressing all the comments in [1] and [2].

Currently, replication slots are invalidated based on the
replication_slot_inactive_timeout only during a checkpoint. This means
that if the checkpoint_timeout is set to a higher value than the
replication_slot_inactive_timeout, slot invalidation will occur only
when the checkpoint is triggered. Identifying the invalidation slots
might be slightly delayed in this case. As an alternative, users can
forcefully invalidate inactive slots that have exceeded the
replication_slot_inactive_timeout by forcing a checkpoint. I was
thinking we could suggest this in the documentation.

+       <para>
+        Slot invalidation due to inactive timeout occurs during checkpoint.
+        The duration of slot inactivity is calculated using the slot's
+        <link
linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>inactive_since</structfield>
+        value.
+       </para>
+

We could accurately invalidate the slots using the checkpointer
process by calculating the invalidation time based on the active_since
timestamp and the replication_slot_inactive_timeout, and then set the
checkpointer's main wait-latch accordingly for triggering the next
checkpoint. Ideally, a different process handling this task would be
better, but there is currently no dedicated daemon capable of
identifying and managing slots across streaming replication, logical
replication, and other slots used by plugins. Additionally,
overloading the checkpointer with this responsibility may not be
ideal. As an alternative, we could document about this delay in
identifying and mention that it could be triggered by forceful manual
checkpoint.

Regards,
Vignesh



On Wed, 4 Dec 2024 at 15:01, Nisha Moond <nisha.moond412@gmail.com> wrote:
>
> On Tue, Dec 3, 2024 at 1:09 PM Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> > Dear Nisha,
> >
> > Thanks for updating the patch!
> >
> > > Fixed. It is reasonable to align with other timeout parameters by
> > > using milliseconds as the unit.
> >
> > It looks like you just switched to GUC_UNIT_MS, but the documentation and
> > postgresql.conf.sample have not been updated yet. They should follow the code.
> > Anyway, here are other comments, mostly cosmetic.
> >
>
> Here is v53 patch-set addressing all the comments in [1] and [2].

CFBot is failing at [1] because the file name has been changed to
043_invalidate_inactive_slots; the meson.build file should be updated
accordingly:
diff --git a/src/test/recovery/meson.build b/src/test/recovery/meson.build
index b1eb77b1ec..708a2a3798 100644
--- a/src/test/recovery/meson.build
+++ b/src/test/recovery/meson.build
@@ -51,6 +51,7 @@ tests += {
       't/040_standby_failover_slots_sync.pl',
       't/041_checkpoint_at_promote.pl',
       't/042_low_level_backup.pl',
+      't/050_invalidate_slots.pl',
     ],
   },
 }

[1] - https://cirrus-ci.com/task/6266479424831488

Regards,
Vignesh



Hi Nisha,

Here are my review comments for the v53* patch set

//////////

Patch v53-0001.

======
src/backend/replication/slot.c

1.
+ if (error_if_invalid &&
+ s->data.invalidated != RS_INVAL_NONE)

Looks like some unnecessary wrapping here. I think this condition can
be on one line.

//////////

Patch v53-0002.

======
GENERAL - How about using the term "idle"?

1.
I got to wondering why this new GUC was called
"replication_slot_inactive_timeout", with invalidation_reason =
"inactive_timeout". When I look at similar GUCs I don't see words like
"inactivity" or "inactive" anywhere; Instead, they are using the term
"idle" to refer to when something is inactive:
e.g.
#idle_in_transaction_session_timeout = 0 # in milliseconds, 0 is disabled
#idle_session_timeout = 0 # in milliseconds, 0 is disabled

I know the "inactive" term is used a bit in the slot code but that is
(mostly) not exposed to the user. Therefore, I am beginning to feel it
would be better (e.g. more consistent) to use "idle" for the
user-facing stuff. e.g.
New Slot GUC = "idle_replication_slot_timeout"
Slot invalidation_reason = "idle_timeout"

Of course, changing this will cascade to impact quite a lot of other
things in the patch -- comments, error messages, some function names
etc.

======
doc/src/sgml/logical-replication.sgml

2.
+   <para>
+    Logical replication slot is also affected by
+    <link
linkend="guc-replication-slot-inactive-timeout"><varname>replication_slot_inactive_timeout</varname></link>.
+   </para>
+

/Logical replication slot is also affected by/Logical replication
slots are also affected by/

======
Kind Regards,
Peter Smith.
Fujitsu Australia



On Wed, Dec 4, 2024 at 9:27 PM vignesh C <vignesh21@gmail.com> wrote:
>
...
>
> Currently, replication slots are invalidated based on the
> replication_slot_inactive_timeout only during a checkpoint. This means
> that if the checkpoint_timeout is set to a higher value than the
> replication_slot_inactive_timeout, slot invalidation will occur only
> when the checkpoint is triggered. Identifying the invalidation slots
> might be slightly delayed in this case. As an alternative, users can
> forcefully invalidate inactive slots that have exceeded the
> replication_slot_inactive_timeout by forcing a checkpoint. I was
> thinking we could suggest this in the documentation.
>
> +       <para>
> +        Slot invalidation due to inactive timeout occurs during checkpoint.
> +        The duration of slot inactivity is calculated using the slot's
> +        <link
linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>inactive_since</structfield>
> +        value.
> +       </para>
> +
>
> We could accurately invalidate the slots using the checkpointer
> process by calculating the invalidation time based on the active_since
> timestamp and the replication_slot_inactive_timeout, and then set the
> checkpointer's main wait-latch accordingly for triggering the next
> checkpoint. Ideally, a different process handling this task would be
> better, but there is currently no dedicated daemon capable of
> identifying and managing slots across streaming replication, logical
> replication, and other slots used by plugins. Additionally,
> overloading the checkpointer with this responsibility may not be
> ideal. As an alternative, we could document about this delay in
> identifying and mention that it could be triggered by forceful manual
> checkpoint.
>

Hi Vignesh.

I felt that manipulating the checkpoint timing behind the scenes
without the user's consent might be a bit of an overreach.

But there might still be something else we could do:

1. We can add the documentation note like you suggested ("we could
document about this delay in identifying and mention that it could be
triggered by forceful manual checkpoint").

2. We can also detect such delays in the code. When the invalidation
occurs (e.g. code fragment below) we could check if there was some
excessive lag between the slot becoming idle and it being invalidated.
If the lag is too much (whatever "too much" means) we can log a hint
for the user to increase the checkpoint frequency (or whatever else we
might advise them to do).

+ /*
+ * Check if the slot needs to be invalidated due to
+ * replication_slot_inactive_timeout GUC.
+ */
+ if (IsSlotInactiveTimeoutPossible(s) &&
+ TimestampDifferenceExceeds(s->inactive_since, now,
+    replication_slot_inactive_timeout_ms))
+ {
+ invalidation_cause = cause;
+ inactive_since = s->inactive_since;

pseudo-code:
if (slot invalidation occurred much later after the
replication_slot_inactive_timeout GUC elapsed)
{
  elog(LOG, "This slot was inactive for a period of %s. Slot timeout
invalidation only occurs at a checkpoint so if you want inactive slots
to be invalidated in a more timely manner consider reducing the time
between checkpoints or executing a manual checkpoint.
(replication_slot_inactive_timeout = %s; checkpoint_timeout = %s,
....)"
}

+ }
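
For what it's worth, here is a rough sketch of what such a LOG message might
look like at the invalidation site (the lag threshold, constant name, and
wording below are placeholders, not a concrete proposal):

```c
/* Hypothetical lag threshold before we bother warning the user */
#define IDLE_SLOT_INVALIDATION_LAG_WARN_MS  (60 * 1000)

/*
 * After deciding to invalidate the slot, check whether the invalidation
 * happened long after the timeout had already elapsed, and hint at the
 * checkpoint frequency if so.
 */
if (TimestampDifferenceExceeds(s->inactive_since, now,
                               replication_slot_inactive_timeout_ms +
                               IDLE_SLOT_INVALIDATION_LAG_WARN_MS))
    ereport(LOG,
            errmsg("replication slot \"%s\" remained idle well beyond the configured timeout before being invalidated",
                   NameStr(s->data.name)),
            errhint("Slot timeout invalidation only occurs at a checkpoint, so consider reducing \"checkpoint_timeout\" or running a manual CHECKPOINT."));
```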

======
Kind Regards,
Peter Smith.
Fujitsu Australia



On Thu, 5 Dec 2024 at 06:44, Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Wed, Dec 4, 2024 at 9:27 PM vignesh C <vignesh21@gmail.com> wrote:
> >
> ...
> >
> > Currently, replication slots are invalidated based on the
> > replication_slot_inactive_timeout only during a checkpoint. This means
> > that if the checkpoint_timeout is set to a higher value than the
> > replication_slot_inactive_timeout, slot invalidation will occur only
> > when the checkpoint is triggered. Identifying the invalidation slots
> > might be slightly delayed in this case. As an alternative, users can
> > forcefully invalidate inactive slots that have exceeded the
> > replication_slot_inactive_timeout by forcing a checkpoint. I was
> > thinking we could suggest this in the documentation.
> >
> > +       <para>
> > +        Slot invalidation due to inactive timeout occurs during checkpoint.
> > +        The duration of slot inactivity is calculated using the slot's
> > +        <link
linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>inactive_since</structfield>
> > +        value.
> > +       </para>
> > +
> >
> > We could accurately invalidate the slots using the checkpointer
> > process by calculating the invalidation time based on the active_since
> > timestamp and the replication_slot_inactive_timeout, and then set the
> > checkpointer's main wait-latch accordingly for triggering the next
> > checkpoint. Ideally, a different process handling this task would be
> > better, but there is currently no dedicated daemon capable of
> > identifying and managing slots across streaming replication, logical
> > replication, and other slots used by plugins. Additionally,
> > overloading the checkpointer with this responsibility may not be
> > ideal. As an alternative, we could document about this delay in
> > identifying and mention that it could be triggered by forceful manual
> > checkpoint.
> >
>
> Hi Vignesh.
>
> I felt that manipulating the checkpoint timing behind the scenes
> without the user's consent might be a bit of an overreach.

Agree

> But there might still be something else we could do:
>
> 1. We can add the documentation note like you suggested ("we could
> document about this delay in identifying and mention that it could be
> triggered by forceful manual checkpoint").

Yes, that makes sense

> 2. We can also detect such delays in the code. When the invalidation
> occurs (e.g. code fragment below) we could check if there was some
> excessive lag between the slot becoming idle and it being invalidated.
> If the lag is too much (whatever "too much" means) we can log a hint
> for the user to increase the checkpoint frequency (or whatever else we
> might advise them to do).
>
> + /*
> + * Check if the slot needs to be invalidated due to
> + * replication_slot_inactive_timeout GUC.
> + */
> + if (IsSlotInactiveTimeoutPossible(s) &&
> + TimestampDifferenceExceeds(s->inactive_since, now,
> +    replication_slot_inactive_timeout_ms))
> + {
> + invalidation_cause = cause;
> + inactive_since = s->inactive_since;
>
> pseudo-code:
> if (slot invalidation occurred much later after the
> replication_slot_inactive_timeout GUC elapsed)
> {
>   elog(LOG, "This slot was inactive for a period of %s. Slot timeout
> invalidation only occurs at a checkpoint so if you want inactive slots
> to be invalidated in a more timely manner consider reducing the time
> between checkpoints or executing a manual checkpoint.
> (replication_slot_inactive_timeout = %s; checkpoint_timeout = %s,
> ....)"
> }
>
> + }

Determining the correct time may be challenging for users, as it
depends on when the active_since value is set, as well as when the
checkpoint_timeout occurs and the subsequent checkpoint is triggered.
Even if the user sets it to an appropriate value, there is still a
possibility of delayed identification due to the timing of when the
slot's active_timeout is being set. Including this information in the
documentation should be sufficient.

Regards,
Vignesh



Hi Nisha.

Here are some review comments for patch v54-0002.

(I had also checked patch v54-0001, but have no further review
comments for that one).

======
doc/src/sgml/config.sgml

1.
+       <para>
+        Slot invalidation due to idle timeout occurs during checkpoint.
+        If the <varname>checkpoint_timeout</varname> exceeds
+        <varname>idle_replication_slot_timeout</varname>, the slot
+        invalidation will be delayed until the next checkpoint is triggered.
+        To avoid delays, users can force a checkpoint to promptly invalidate
+        inactive slots. The duration of slot inactivity is calculated
using the slot's
+        <link
linkend="view-pg-replication-slots">pg_replication_slots</link>.<structfield>inactive_since</structfield>
+        value.
+       </para>
+

The wording of "If the checkpoint_timeout exceeds
idle_replication_slot_timeout, the slot invalidation will be delayed
until the next checkpoint is triggered." seems slightly misleading,
because AFAIK it is not conditional on the GUC value differences like
that -- i.e. slot invalidation is *always* delayed until the next
checkpoint occurs.

SUGGESTION:
Slot invalidation due to idle timeout occurs during checkpoint.
Because checkpoints happen at checkpoint_timeout intervals, there can
be some lag between when the idle_replication_slot_timeout was
exceeded and when the slot invalidation is triggered at the next
checkpoint. To avoid such lags, users can force...

=======
src/backend/replication/slot.c

2. GENERAL

+/* Invalidate replication slots idle beyond this time; '0' disables it */
+int idle_replication_slot_timeout_ms = 0;

I noticed this patch is using a variety of ways of describing the same thing:
* guc var: Invalidate replication slots idle beyond this time...
* guc_tables: ... the amount of time a replication slot can remain
idle before it will be invalidated.
* docs: means that the slot has remained idle beyond the duration
specified by the idle_replication_slot_timeout parameter
* errmsg: ... slot has been invalidated because inactivity exceeded
the time limit set by ...
* etc..

They are all the same, but they are all worded slightly differently:
* "idle" vs "inactivity" vs ...
* "time" vs "amount of time" vs "duration" vs "time limit" vs ...

There may not be a one-size-fits-all wording, but still, it might be better to
search for all the different phrasings and use common wording as
much as possible.

~~~

CheckPointReplicationSlots:

3.
+ * XXX: Slot invalidation due to 'idle_timeout' occurs only for
+ * released slots, based on 'idle_replication_slot_timeout'. Active
+ * slots in use for replication are excluded, preventing accidental
+ * invalidation. Slots where communication between the publisher and
+ * subscriber is down are also excluded, as they are managed by the
+ * 'wal_sender_timeout'.

Maybe a slight rewording like below is better. Maybe not. YMMV.

SUGGESTION:
XXX: Slot invalidation due to 'idle_timeout' applies only to released
slots, and is based on the 'idle_replication_slot_timeout' GUC. Active
slots
currently in use for replication are excluded to prevent accidental
invalidation.  Slots...

======
src/bin/pg_upgrade/server.c

4.
+ /*
+ * Use idle_replication_slot_timeout=0 to prevent slot invalidation due to
+ * inactive_timeout by checkpointer process during upgrade.
+ */
+ if (GET_MAJOR_VERSION(cluster->major_version) >= 1800)
+ appendPQExpBufferStr(&pgoptions, " -c idle_replication_slot_timeout=0");
+

/inactive_timeout/idle_timeout/

======
src/test/recovery/t/043_invalidate_inactive_slots.pl

5.
+# Wait for slot to first become idle and then get invalidated
+sub wait_for_slot_invalidation
+{
+ my ($node, $slot, $offset, $idle_timeout) = @_;
+ my $node_name = $node->name;

AFAICT this 'idle_timeout' parameter is passed units of "seconds", so
it would be better to call it something like 'idle_timeout_s' to make
the units clear.

~~~

6.
+# Trigger slot invalidation and confirm it in the server log
+sub trigger_slot_invalidation
+{
+ my ($node, $slot, $offset, $idle_timeout) = @_;
+ my $node_name = $node->name;
+ my $invalidated = 0;

Ditto above review comment #5 -- better to call it something like
'idle_timeout_s' to make the units clear.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



On Tue, 10 Dec 2024 at 17:21, Nisha Moond <nisha.moond412@gmail.com> wrote:
>
> On Fri, Dec 6, 2024 at 11:04 AM vignesh C <vignesh21@gmail.com> wrote:
> >
> >
> > Determining the correct time may be challenging for users, as it
> > depends on when the active_since value is set, as well as when the
> > checkpoint_timeout occurs and the subsequent checkpoint is triggered.
> > Even if the user sets it to an appropriate value, there is still a
> > possibility of delayed identification due to the timing of when the
> > slot's active_timeout is being set. Including this information in the
> > documentation should be sufficient.
> >
>
> +1
> v54 documents this information as suggested.
>
> Attached the v54 patch-set addressing all the comments till now in

Few comments on the test added:
1) Can we remove this and instead set idle_replication_slot_timeout when the
standby node is created, via append_conf:
+# Set timeout GUC on the standby to verify that the next checkpoint will not
+# invalidate synced slots.
+my $idle_timeout_1s = 1;
+$standby1->safe_psql(
+       'postgres', qq[
+    ALTER SYSTEM SET idle_replication_slot_timeout TO '${idle_timeout_1s}s';
+]);
+$standby1->reload;

2) You can move these statements before the standby node is created:
+# Create sync slot on the primary
+$primary->psql('postgres',
+       q{SELECT pg_create_logical_replication_slot('sync_slot1',
'test_decoding', false, false, true);}
+);
+
+# Create standby slot on the primary
+$primary->safe_psql(
+       'postgres', qq[
+    SELECT pg_create_physical_replication_slot(slot_name :=
'sb_slot1', immediately_reserve := true);
+]);

3) Do we need autovacuum off for these tests? Is there any
probability of a test failure without it? I feel it should not
impact these tests; if not, we can remove this:
+# Avoid unpredictability
+$primary->append_conf(
+       'postgresql.conf', qq{
+checkpoint_timeout = 1h
+autovacuum = off
+});

4) Generally we write a single character in single quotes; we can update "t" to 't':
+       ),
+       "t",
+       'logical slot sync_slot1 is synced to standby');
+

5) Similarly here too:
+                 WHERE slot_name = 'sync_slot1'
+                       AND invalidation_reason IS NULL;}
+       ),
+       "t",
+       'check that synced slot sync_slot1 has not been invalidated on
standby');

6) This standby log offset is not used anywhere; it can be removed:
+my $logstart = -s $standby1->logfile;
+
+# Set timeout GUC on the standby to verify that the next checkpoint will not
+# invalidate synced slots.

Regards,
Vignesh



On Tue, 10 Dec 2024 at 17:21, Nisha Moond <nisha.moond412@gmail.com> wrote:
>
> On Fri, Dec 6, 2024 at 11:04 AM vignesh C <vignesh21@gmail.com> wrote:
> >
> >
> > Determining the correct time may be challenging for users, as it
> > depends on when the active_since value is set, as well as when the
> > checkpoint_timeout occurs and the subsequent checkpoint is triggered.
> > Even if the user sets it to an appropriate value, there is still a
> > possibility of delayed identification due to the timing of when the
> > slot's active_timeout is being set. Including this information in the
> > documentation should be sufficient.
> >
>
> +1
> v54 documents this information as suggested.
>
> Attached the v54 patch-set addressing all the comments till now in
> [1], [2] and [3].

Now that we support idle_replication_slot_timeout in milliseconds, we
can reduce this value from 1s to 1ms or 10 milliseconds and change sleep to
usleep; this will bring down the test execution time significantly:
+# Set timeout GUC on the standby to verify that the next checkpoint will not
+# invalidate synced slots.
+my $idle_timeout_1s = 1;
+$standby1->safe_psql(
+       'postgres', qq[
+    ALTER SYSTEM SET idle_replication_slot_timeout TO '${idle_timeout_1s}s';
+]);
+$standby1->reload;
+
+# Sync the primary slots to the standby
+$standby1->safe_psql('postgres', "SELECT pg_sync_replication_slots();");
+
+# Confirm that the logical failover slot is created on the standby
+is( $standby1->safe_psql(
+               'postgres',
+               q{SELECT count(*) = 1 FROM pg_replication_slots
+                 WHERE slot_name = 'sync_slot1' AND synced
+                       AND NOT temporary
+                       AND invalidation_reason IS NULL;}
+       ),
+       "t",
+       'logical slot sync_slot1 is synced to standby');
+
+# Give enough time for inactive_since to exceed the timeout
+sleep($idle_timeout_1s + 1);

Regards,
Vignesh



On Wed, Dec 11, 2024 at 8:14 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Nisha.
>
> Here are some review comments for patch v54-0002.
> ======
> src/test/recovery/t/043_invalidate_inactive_slots.pl
>
> 5.
> +# Wait for slot to first become idle and then get invalidated
> +sub wait_for_slot_invalidation
> +{
> + my ($node, $slot, $offset, $idle_timeout) = @_;
> + my $node_name = $node->name;
>
> AFAICT this 'idle_timeout' parameter is passed units of "seconds", so
> it would be better to call it something like 'idle_timeout_s' to make
> the units clear.
>

As per the suggestion in [1], the test has been updated to use
idle_timeout=1ms. Since the parameter uses the default unit of
"milliseconds," keeping it as 'idle_timeout' seems reasonable to me.

> ~~~
>
> 6.
> +# Trigger slot invalidation and confirm it in the server log
> +sub trigger_slot_invalidation
> +{
> + my ($node, $slot, $offset, $idle_timeout) = @_;
> + my $node_name = $node->name;
> + my $invalidated = 0;
>
> Ditto above review comment #5 -- better to call it something like
> 'idle_timeout_s' to make the units clear.
>

The 'idle_timeout' parameter name remains unchanged as explained above.

[1] https://www.postgresql.org/message-id/CALDaNm1FQS04aG0C0gCRpvi-o-OTdq91y6Az34YKN-dVc9r5Ng%40mail.gmail.com

--
Thanks,
Nisha



Hi Nisha.

Thanks for the v55* patches.

I have no comments for patch v55-0001.

I have only 1 comment for patch v55-0002 regarding some remaining
nitpicks (below) about the consistency of phrases.

======

I scanned again over all the phrases for consistency:

CURRENT PATCH:

Docs (idle_replication_slot_timeout): Invalidate replication slots
that are idle for longer than this amount of time
Docs (idle_timeout): means that the slot has remained idle longer than
the duration specified by the idle_replication_slot_timeout parameter.

Code (guc var comment):  Invalidate replication slots idle longer than this time
Code (guc_tables): Sets the time limit for how long a replication slot
can remain idle before it is invalidated.

Msg (errdetail): This slot has been invalidated because it has
remained idle longer than the configured \"%s\" time.
Msg (errdetail): The slot has been inactive since %s and has remained
idle longer than the configured \"%s\" time.

~

NITPICKS:

nit -- There are still some variations: "amount of time" versus "time"
versus "duration".  I think the term "duration" best describes the
meaning, so we can use that everywhere.

nit - Should consistently say "remained idle" instead of just "idle"
or "are idle".

nit - The last errdetail is also rearranged a bit because IMO we don't
need to say inactive and idle in the same sentence.

nit - Just say "longer than" instead of sometimes saying "for longer than"

~

SUGGESTIONS:

Docs (idle_replication_slot_timeout): Invalidate replication slots
that have remained idle longer than this duration.
Docs (idle_timeout): means that the slot has remained idle longer than
the configured idle_replication_slot_timeout duration.

Code (guc var comment):  Invalidate replication slots that have
remained idle longer than this duration.
Code (guc_tables): Sets the duration a replication slot can remain
idle before it is invalidated.

Msg (errdetail): This slot has been invalidated because it has
remained idle longer than the configured \"%s\" duration.
Msg (errdetail): The slot has remained idle since %s, which is longer
than the configured \"%s\" duration.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



On Mon, Dec 16, 2024 at 9:40 PM Nisha Moond <nisha.moond412@gmail.com> wrote:
>
> On Mon, Dec 16, 2024 at 9:58 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
...
> > SUGGESTIONS:
> >
> > Docs (idle_replication_slot_timeout): Invalidate replication slots
> > that have remained idle longer than this duration.
> > Docs (idle_timeout): means that the slot has remained idle longer than
> > the configured idle_replication_slot_timeout duration.
> >
> > Code (guc var comment):  Invalidate replication slots that have
> > remained idle longer than this duration.
> > Code (guc_tables): Sets the duration a replication slot can remain
> > idle before it is invalidated.
> >
> > Msg (errdetail): This slot has been invalidated because it has
> > remained idle longer than the configured \"%s\" duration.
> > Msg (errdetail): The slot has remained idle since %s, which is longer
> > than the configured \"%s\" duration.
> >
>
> Here is the v56 patch set with the above comments incorporated.
>

Hi Nisha.

Thanks for the updates.

- Both patches could be applied cleanly.
- Tests (make check, TAP subscriber, TAP recovery) are all passing.
- The rendering of the documentation changes from patch 0002 looked good.
- I have no more review comments.

So, the v56* patchset LGTM.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



On Mon, Dec 16, 2024 at 4:10 PM Nisha Moond <nisha.moond412@gmail.com> wrote:
>
> Here is the v56 patch set with the above comments incorporated.
>

Review comments:
===============
1.
+ {
+ {"idle_replication_slot_timeout", PGC_SIGHUP, REPLICATION_SENDING,
+ gettext_noop("Sets the duration a replication slot can remain idle before "
+ "it is invalidated."),
+ NULL,
+ GUC_UNIT_MS
+ },
+ &idle_replication_slot_timeout_ms,

I think users are going to keep the idle slot timeout at least in hours.
So, milliseconds seem the wrong choice to me. I suggest keeping the
unit in minutes. I understand that writing a test would be
challenging, as spending a minute or more on one test is not advisable.
But I don't see any tests for the other GUCs that are in minutes
(wal_summary_keep_time and log_rotation_age). The default value should
be one day.
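
For illustration, the guc_tables.c entry could then look something like this
(a sketch only; the variable name, default, and limits below are assumptions,
not taken from the patch):

```c
{
    {"idle_replication_slot_timeout", PGC_SIGHUP, REPLICATION_SENDING,
        gettext_noop("Sets the duration a replication slot can remain idle before "
                     "it is invalidated."),
        NULL,
        GUC_UNIT_MIN
    },
    &idle_replication_slot_timeout_mins,   /* assumed int variable, in minutes */
    60 * 24,                               /* default: one day */
    0,                                     /* 0 disables the timeout */
    INT_MAX,
    NULL, NULL, NULL
},
```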

2.
+ /*
+ * An error is raised if error_if_invalid is true and the slot is found to
+ * be invalid.
+ */
+ if (error_if_invalid && s->data.invalidated != RS_INVAL_NONE)
+ {
+ StringInfoData err_detail;
+
+ initStringInfo(&err_detail);
+
+ switch (s->data.invalidated)
+ {
+ case RS_INVAL_WAL_REMOVED:
+ appendStringInfo(&err_detail, _("This slot has been invalidated
because the required WAL has been removed."));
+ break;
+
+ case RS_INVAL_HORIZON:
+ appendStringInfo(&err_detail, _("This slot has been invalidated
because the required rows have been removed."));
+ break;
+
+ case RS_INVAL_WAL_LEVEL:
+ /* translator: %s is a GUC variable name */
+ appendStringInfo(&err_detail, _("This slot has been invalidated
because \"%s\" is insufficient for slot."),
+ "wal_level");
+ break;
+
+ case RS_INVAL_NONE:
+ pg_unreachable();
+ }
+
+ ereport(ERROR,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("can no longer get changes from replication slot \"%s\"",
+    NameStr(s->data.name)),
+ errdetail_internal("%s", err_detail.data));
+ }
+

This should be moved to a separate function.
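
For example, something along these lines (a sketch only; the helper's name and
exact placement are assumptions, and the messages are copied from the hunk
above):

```c
/*
 * Raise an ERROR describing why the given invalidated slot can no longer be
 * used (sketch; the real patch may structure or name this differently).
 */
static void
RaiseSlotInvalidationError(ReplicationSlot *s)
{
    StringInfoData err_detail;

    Assert(s->data.invalidated != RS_INVAL_NONE);

    initStringInfo(&err_detail);

    switch (s->data.invalidated)
    {
        case RS_INVAL_WAL_REMOVED:
            appendStringInfo(&err_detail, _("This slot has been invalidated because the required WAL has been removed."));
            break;

        case RS_INVAL_HORIZON:
            appendStringInfo(&err_detail, _("This slot has been invalidated because the required rows have been removed."));
            break;

        case RS_INVAL_WAL_LEVEL:
            /* translator: %s is a GUC variable name */
            appendStringInfo(&err_detail, _("This slot has been invalidated because \"%s\" is insufficient for slot."),
                             "wal_level");
            break;

        case RS_INVAL_NONE:
            pg_unreachable();
    }

    ereport(ERROR,
            errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
            errmsg("can no longer get changes from replication slot \"%s\"",
                   NameStr(s->data.name)),
            errdetail_internal("%s", err_detail.data));
}
```

The call site in ReplicationSlotAcquire() would then reduce to:

```c
if (error_if_invalid && s->data.invalidated != RS_INVAL_NONE)
    RaiseSlotInvalidationError(s);
```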

3.
+static inline bool
+IsSlotIdleTimeoutPossible(ReplicationSlot *s)

Would it be better to name this function as CanInvalidateIdleSlot()?
The current name doesn't seem to match other similar
functions.

--
With Regards,
Amit Kapila.



RE: Introduce XID age and inactive timeout based replication slot invalidation

From
"Zhijie Hou (Fujitsu)"
Date:
On Tuesday, December 24, 2024 8:57 PM Michail Nikolaev <michail.nikolaev@gmail.com>  wrote:

Hi,

> Yesterday I got a strange set of test errors, probably somehow related to
> that patch. It happened on a modified master branch (based on
> d96d1d5152f30d15678e08e75b42756101b7cab6), but I don't think my changes were
> affecting it.
> 
> My setup is a little bit tricky: Windows 11 running WSL2 with Ubuntu, meson.
> 
> So, `recovery ` suite started failing on:
> 
> 1) at /src/test/recovery/t/019_replslot_limit.pl line 530.
> 2) at /src/test/recovery/t/040_standby_failover_slots_sync.pl line 198.
> 
> It was failing almost every run, one test or another. I was lurking around
> for about 10 min, and..... it just stopped failing. And I can't reproduce it
> anymore.
> 
> But I have logs of two fails. I am not sure if it is helpful, but decided to
> mail them here just in case.

Thanks for reporting the issue.

After checking the log, I think the failure is caused by the unexpected
behavior of the local system clock.

It's clear from the '019_replslot_limit_primary4.log'[1] that the clock went
backwards, which makes the slot's inactive_since go backwards as well. That's
why the last test case didn't pass.

And for 040_standby_failover_slots_sync, we can see that the clock of the standby
lags behind that of the primary, which caused the inactive_since of the newly synced
slot on the standby to be earlier than the one on the primary.

So, I think it's not a bug in the committed patch but an issue in the testing
environment. Besides, since we have not seen such failures on the buildfarm, I think it
may not be necessary to improve the test cases.

[1]
2024-12-24 01:37:19.967 CET [161409] sub STATEMENT:  START_REPLICATION SLOT "lsub4_slot" LOGICAL 0/0 (proto_version
'4',streaming 'parallel', origin 'any', publication_names '"pub"')
 
...
2024-12-24 01:37:20.025 CET [161447] 019_replslot_limit.pl LOG:  statement: SELECT '0/30003D8' <= replay_lsn AND state
='streaming'
 
...
2024-12-24 01:37:19.388 CET [161097] LOG:  received fast shutdown request

Best Regards,
Hou zj