Thread: Long running query causing XID limit breach

Long running query causing XID limit breach

From
sud
Date:
Hello,
It's RDS Postgres version 15.4. We suddenly saw "MaximumUsedTransactionIDs" reach ~1.5 billion and got alerted by team members, who mentioned that the database will shut down/hang and stop serving incoming transactions if this value reaches ~2 billion. It was a panic situation.

I had heard of it before: because of the way Postgres works, the XID being a 32-bit integer can only represent (2^32)/2 = ~2 billion transactions. However, as RDS performs autovacuum, we thought we need not worry about this issue. But it seems we were wrong. We found one ad hoc "SELECT ''" query that had been running on the reader instance for the last couple of days, and when that was killed, the max XID (MaximumUsedTransactionIDs) dropped to ~50 million immediately.

So I have a few questions:

1) This system is going to be a 24/7 system which will process ~500 million business transactions/day in future, i.e. ~4-5 billion rows/day inserted across multiple tables. As I understand it, each row will have an XID allocated. Does that mean we will need (5 billion/24) = ~200 million XIDs/hour, and thus, if any legitimate application "SELECT" query keeps running for ~10 hours (and thus keeps the historical XID alive), it can saturate "MaximumUsedTransactionIDs" and bring the database to a standstill in 2 billion/200 million = ~10 hrs? Is this understanding correct? It seems we are prone to hit this limit sooner going forward.

2) We have some legitimate cases where reporting queries can run for 5-6 hrs. In such cases, if such a SELECT query starts at the 100th XID on table TAB1, then whatever transactions happen after that time across all the other tables (table2, table3, etc.) in the database won't get vacuumed until that SELECT query on TAB1 completes (as the database will try to keep that same 100th-XID image), and the XID will just keep incrementing for new transactions, eventually reaching the max limit. Is my understanding correct here?

3) Although RDS does autovacuum by default, should we also consider doing manual vacuums without impacting ongoing transactions? Something like the below options:
VACUUM FREEZE tab1;
VACUUM FREEZE;
VACUUM;
VACUUM ANALYZE tab1;
VACUUM tab1;

4) I had worked in the past on Oracle databases, where the analogous transaction identifier is called the "system change number", but I never encountered it being exhausted. Oracle also keeps UNDO records, and if a SELECT query needs anything beyond a certain limit (the undo_retention parameter), the SELECT fails with a "snapshot too old" error without impacting any write transactions. But in Postgres it seems nothing like that happens: every SELECT query will try to run to completion without any such failure, until it gets killed by someone. Is my understanding correct?

And in that case, it seems we have to mandatorily set "statement_timeout" to some value, e.g. 4 hrs (also, I am not seeing a way to set it at any specific user level, so it will be set for all queries including the application level), and also "idle_in_transaction_session_timeout" to 5 minutes, even on all the prod and non-prod databases, to restrict long-running transactions/queries and avoid such issues in future. Correct me if I'm wrong.
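Something like the below is what I was thinking of (values illustrative; on RDS these would be set through the DB parameter group rather than ALTER SYSTEM):

ALTER SYSTEM SET statement_timeout = '4h';                      -- cap any single statement
ALTER SYSTEM SET idle_in_transaction_session_timeout = '5min';  -- kill sessions sitting idle inside a transaction
SELECT pg_reload_conf();                                        -- both settings take effect on reload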
 
Regards
Sud

Re: Long running query causing XID limit breach

From
Muhammad Salahuddin Manzoor
Date:
Greetings,

In high-transaction environments like yours, it may be necessary to supplement autovacuum with manual vacuuming.

A few recommendations:

Monitor long-running queries and try to optimize them (a sample check is sketched below).
Optimize autovacuum.
Partition large tables.
Schedule manual vacuums after peak hours.
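For example, wraparound headroom can be watched directly in SQL (a rough sketch; the alert threshold is up to you):

SELECT datname,
       age(datfrozenxid) AS xid_age,                -- how far this database's frozen XID lags behind the current XID
       2147483648 - age(datfrozenxid) AS xids_left  -- approximate headroom before wraparound protection kicks in
FROM pg_database
ORDER BY xid_age DESC;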

Salahuddin (살라후딘)





Re: Long running query causing XID limit breach

From
sud
Date:
We have these big tables already partitioned. So will "VACUUM table_name" end up scanning the whole table, or just the latest/live partition which is currently being loaded? And do you mean running "VACUUM table_name;" frequently on selected tables that are experiencing heavy DML? I hope this won't lock the table in any way, because data will be written to and read from these tables 24/7.

When you say "optimize autovacuum", does that mean setting higher values for "autovacuum_max_workers" and "autovacuum_freeze_max_age"?

Considering that we have ~4 billion rows inserted daily into the tables and there is a limit of ~2 billion on "MaximumUsedTransactionIDs", what threshold should we set for alerting so as to have enough time at hand to fix the issue?

Re: Long running query causing XID limit breach

From
Muhammad Salahuddin Manzoor
Date:
Greetings,

Running `VACUUM table_name;` on a partitioned table will vacuum each partition individually, not the whole table as a single unit.

Yes, running `VACUUM table_name;` frequently on tables or partitions with heavy DML is recommended.

Regular `VACUUM` does not lock the table for reads or writes, so it won't disrupt ongoing 24/7 data operations.
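For example (partition name illustrative):

VACUUM (VERBOSE) tab1_p20240525;  -- target only the partition currently receiving writes
VACUUM tab1;                      -- or the parent; each leaf partition is processed individually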

"optimize autovacuum"
Yes. Adjust following parameters as per your system/environment requirement
autovacuum_max_workers, 
autovacuum_freeze_max_age , 
autovacuum_vacuum_cost_delay
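For example (illustrative starting values only; on RDS these are changed through the DB parameter group):

ALTER SYSTEM SET autovacuum_max_workers = 6;             -- more parallel workers (requires a restart)
ALTER SYSTEM SET autovacuum_freeze_max_age = 500000000;  -- XID age that forces anti-wraparound vacuum (default 200 million)
ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '1ms';   -- shorter sleeps, so vacuum finishes faster
SELECT pg_reload_conf();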

The following needs to be tested thoroughly in a test environment first.
Recommended alert threshold
Alert at 50% usage: set the alert threshold at 1 billion used XIDs. This provides a significant buffer, giving you ample time to take corrective action before reaching the critical limit.

Calculation rationale
Daily XID usage: approximately 4 billion rows per day implies high XID consumption.
Buffer time: at 1 billion XIDs used, you would still have 1 billion XIDs remaining, giving you roughly 5 hours to address the issue if your system consumes 200 million XIDs per hour.


Salahuddin (살라후딘)




Re: Long running query causing XID limit breach

From
sud
Date:

Thank you so much. That helps.
So apart from setting these alerts on "MaximumUsedTransactionIDs" and optimizing vacuum by tweaking the above parameters, should we also have monitoring in place to ensure that vacuum is not taking longer than its normal runtime, and to catch it getting blocked by something or failing? For example, in our case where the SELECT query was running for days, vacuum must not have been able to succeed each time it attempted. Is it really worth having that level of alerting, and how can we tell whether vacuum is failing or not succeeding, so as to avoid such issues in the future?

Re: Long running query causing XID limit breach

From
sud
Date:
Also, if I am getting it correctly, it means we should not run any transaction (even a legitimate one, e.g. a big reporting "SELECT" query) beyond ~10 hrs, as 10 hrs * 200 million XIDs/hour = 2 billion XIDs would saturate the limit and thus cause system failure. Hope my understanding is correct here.


Re: Long running query causing XID limit breach

From
Muhammad Salahuddin Manzoor
Date:
Greetings,

Yes, monitoring and alerting for VACUUM operations are crucial.

Track VACUUM Duration and Success:

SELECT pid, state, query_start, now() - query_start AS duration, query
FROM pg_stat_activity
WHERE query ILIKE 'vacuum%'               -- manual VACUUMs
   OR backend_type = 'autovacuum worker'  -- autovacuum workers
ORDER BY duration DESC;

Check Autovacuum Activity:

SELECT relname, last_vacuum, last_autovacuum, vacuum_count, autovacuum_count
FROM pg_stat_all_tables
WHERE relname = 'your_table_name';
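It is also worth watching per-table XID age against autovacuum_freeze_max_age, for example:

SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind IN ('r', 't', 'm')  -- plain tables, TOAST tables, materialized views
ORDER BY xid_age DESC
LIMIT 10;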

Log and Monitor VACUUM Failures:

log_autovacuum_min_duration = 0  # Log all autovacuum actions
log_min_messages = 'WARNING'     # Ensure warnings and above are logged

Use tools like pgBadger to analyze PostgreSQL logs and identify any issues with autovacuum operations.

Set Up Alerts for Long-Running VACUUMs:

Use monitoring tools such as pgMonitor, Nagios, Prometheus with Grafana, or New Relic to set up alerts for long-running VACUUM processes.

Yes, your understanding is correct. In a high-transaction environment like yours, long-running transactions, including legitimate reporting queries, can hold back the advancement of the transaction ID (XID) horizon. This can prevent VACUUM from properly cleaning up old XIDs, leading to the risk of XID wraparound and potential system failure.

Use mitigation strategies to handle long-running queries: set transaction timeouts, monitor and kill long-running queries, optimize query performance, and schedule heavy reporting queries during periods of lower transactional activity.

Salahuddin (살라후딘)




Re: Long running query causing XID limit breach

From
yudhi s
Date:

On Thu, May 23, 2024 at 11:42 AM sud <suds1434@gmail.com> wrote:

Calculation Rationale
Daily XID Usage: Approximately 4 billion rows per day implies high XID consumption.
Buffer Time: At 1 billion XIDs, you would still have 1 billion XIDs remaining, giving you roughly 5 hours to address the issue if your system consumes 200 million XIDs per hour.




The OP mentioned that initially the number of business transactions is around 500 million/day, but the rows inserted across many tables total ~4-5 billion per day. So doesn't that mean XID consumption will be based on transactions rather than on the number of rows? Say, for example, ~4 billion rows are loaded using bulk loads with a batch size of ~1000; that will be ~4 million transactions, so it should use ~4 million XIDs, not 4 billion. And thus processing transactions in batches rather than row by row minimizes XID usage. Correct?
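For illustration (table and batch size hypothetical), the whole batch below consumes a single XID, which you can confirm with pg_current_xact_id():

BEGIN;
SELECT pg_current_xact_id();          -- forces assignment of this transaction's one XID
INSERT INTO tab1 (id, payload)
SELECT g, 'row ' || g
FROM generate_series(1, 1000) AS g;   -- 1000 rows, still the same single XID
COMMIT;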

Re: Long running query causing XID limit breach

From
Laurenz Albe
Date:
On Thu, 2024-05-23 at 02:46 +0530, sud wrote:
> It's RDS postgres version 15.4. We suddenly saw the "MaximumUsedTransactionIDs"
> reach to ~1.5billion and got alerted by team members who mentioned the database
> is going to be in shutdown/hung if this value reaches to ~2billion and won't be
> able to serve any incoming transactions. It was a panic situation.
>
> I have heard of it before , because of the way postgres works and the XID being
> a datatype of length 32 bit integer can only represent (2^32)/2=~2 billion
> transactions. However, as RDS performs the auto vacuum , we thought that we need
> not worry about this issue. But it seems we were wrong. And we found one adhoc
> "SELECT '' query was running on the reader instance since the last couple of
> days and when that was killed, the max xid (MaximumUsedTransactionIDs) dropped
> to 50million immediately.

This has nothing to do with autovacuum running.
PostgreSQL won't freeze any rows above the xmin horizon (see the "backend_xmin"
column in "pg_stat_activity").
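For example, a query like this shows which sessions are holding the xmin horizon back:

SELECT pid, backend_xmin, age(backend_xmin) AS horizon_age, state, query
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;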

> So I have few questions,
>
> 1)This system is going to be a 24/7 up and running system which will process
>   ~500million business transactions/day in future i.e. ~4-5billion rows/day
>   inserted across multiple tables each day. And as I understand each row will
>   have XID allocated. So in that case , does it mean that, we will need
>   (5billion/24)=~200million XID/hour and thus , if any such legitimate
>   application "SELECT" query keeps running for ~10 hours (and thus keep the
>   historical XID alive) , then it can saturate the "MaximumUsedTransactionIDs"
>   and make the database standstill in 2billion/200million=~10hrs. Is this
>   understanding correct? Seems we are prone to hit this limit sooner going forward.

Yes, that is correct.  You cannot run such long-running queries with a
transaction rate like that.

> 2)We have some legitimate cases where the reporting queries can run for 5-6hrs.
>   So in such cases if the start of this SELECT query happen at 100th XID on
>   table TAB1, then whatever transactions happen after that time, across all
>   other tables(table2, table3 etc) in the database won't get vacuum until that
>   SELECT query on table1 get vacuumed(as database will try to keep that same
>   100th XID image) and the XID will just keep incrementing for new transaction,
>   eventually reaching the max limit. Is my understanding correct here?

Again correct.  PostgreSQL cannot tell which tables the query will use in the
future, so VACUUM cannot clean up old row versions in any table in the database.
This is irrelevant for transaction ID wraparound, though: you'll get into trouble
even if only a single table holds an unfrozen row that is old enough.

> 3)Although RDS does the auto vacuum by default. but should we also consider
>   doing manual vacuum without impacting ongoing transactions?

That won't help.  Autovacuum is running, but cannot freeze the rows, and a manual
VACUUM cannot do it either.

> 4)Had worked in past in oracle database [...]

Oracle implements all this radically differently.

>  And in that case, It seems we have to mandatorily set "statement_timeout" to
> some value e.g. 4hrs(also i am not seeing a way to set it for any specific
> user level, so it will be set for all queries including application level)
> and also "idle_in_transaction_session_timeout" to 5minutes, even on all the
> prod and non prod databases, to restrict the long running transactions/queries
> and avoid such issues in future. Correct me if I'm wrong.

That looks right.

One thing you could consider is running the long-running queries on a standby
server.  Replication will get delayed, and you have to keep all the WAL
around for the standby to catch up once the query is done, but it should work.
You'd set "max_standby_streaming_delay" to -1 on the standby.

Yours,
Laurenz Albe



Re: Long running query causing XID limit breach

From
sud
Date:

On Thu, May 23, 2024 at 1:22 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
Yes, that is correct.  You cannot run such long-running queries with a
transaction rate like that.


When you say transaction, does that mean one commit? For example, if we are inserting and committing ~1000 rows in one batch, then all 1000 rows will be marked with one XID rather than 1000 different XIDs, and so we should look at batch processing rather than row-by-row processing. Is that understanding correct?
 
One thing you could consider is running the long-running queries on a standby
server.  Replication will get delayed, and you have to keep all the WAL
around for the standby to catch up once the query is done, but it should work.
You'd set "max_standby_streaming_delay" to -1 on the standby.


We have the "SELECT" query running on a reader instance, but the writer instance was still showing "MaximumUsedTransactionIDs" reaching 1.5 billion, so it means both instances are part of the same cluster and share the same XIDs. Per your suggestion, should we run these queries on a separate standby cluster altogether which does not share the same XIDs? Is this understanding correct, or can it be handled even with another reader instance by just tweaking some other parameter so that they won't share the same XIDs?

Re: Long running query causing XID limit breach

From
Laurenz Albe
Date:
On Thu, 2024-05-23 at 13:41 +0530, sud wrote:
> > Yes, that is correct.  You cannot run such long-running queries with a
> > transaction rate like that.
>
> When you mean transaction ,does it mean one commit ? For example if it's
> inserting+committing ~1000 rows in one batch then all the 1000 rows will
> be marked as one XID rather than 1000 different XID. and so we should look
> for batch processing rather than row by row types processing.
> Is the understanding correct?

Yes, that would help.

> > One thing you could consider is running the long-running queries on a standby
> > server.  Replication will get delayed, and you have to keep all the WAL
> > around for the standby to catch up once the query is done, but it should work.
> > You'd set "max_standby_streaming_delay" to -1 on the standby.
>
> We have the "Select query" running on a reader instance , but still the writer
> instance was showing up "MaximumUsedTransactionIDs" reaching 1.5billion, so it
> means both the instance as part of same cluster so sharing same XIDs

If a long running query on the standby influences the primary, that means that
you have "hot_standby_feedback" set to "on".  Set it to "off".

Yours,
Laurenz Albe



Re: Long running query causing XID limit breach

From
sud
Date:

On Thu, May 23, 2024 at 1:45 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:


If a long running query on the standby influences the primary, that means that
you have "hot_standby_feedback" set to "on".  Set it to "off".


Will setting "hot_standby_feedback" to OFF cause the reader instance to give incorrect query results or unexpected query failures, i.e. a potential inconsistency between the writer and reader instances, because those row versions can be removed/cleaned by the writer node even while they are being read by a query on the reader instance? And it can mean more replication lag. So I'm wondering whether this setup is an advisable one.

Re: Long running query causing XID limit breach

From
Laurenz Albe
Date:
On Thu, 2024-05-23 at 18:15 +0530, sud wrote:
> On Thu, May 23, 2024 at 1:45 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > If a long running query on the standby influences the primary, that means that
> > you have "hot_standby_feedback" set to "on".  Set it to "off".
>
> Will the setting up of "hot_standby_feedback" value to OFF will cause the
> reader instance to give incorrect query results or unexpected query failure
> which will be potential inconsistency between the writer and reader instance,
> as because those XID's can be removed/cleaned by the writer node even if its
> being read by the reader instance query. And it can have more replication lag.

There will never be incorrect query results.

It can happen that a query on the standby gets canceled if you don't set
"max_standby_streaming_delay" to -1, but that can happen even if
"hot_standby_feedback" is "on".  It just happens less often.

The effect of setting "max_standby_streaming_delay" to -1 will often be a
replication delay if you run a long query.  That's what you have to
accept if you want to execute long-running queries.

You will never be able to have both of the following:
- queries never get canceled
- there is no replication delay

> So I'm wondering , if this setup is advisable one?

I'd say yes.  Anyway, it doesn't look like you have an alternative if
you want to run queries that take longer than it takes your transaction
ID counter to wrap around.

Yours,
Laurenz Albe



Re: Long running query causing XID limit breach

From
sud
Date:
On Thu, May 23, 2024 at 8:11 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
There will never be incorrect query results.

It can happen that a query on the standby gets canceled if you don't set
"max_standby_streaming_delay" to -1, but that can happen even if
"hot_standby_feedback" is "on".  It just happens less often.

The effect of setting "max_standby_streaming_delay" to -1 will often be a
replication delay if you run a long query.  That's what you have to
accept if you want to execute long-running queries.

You will never be able to have both of the following:
- queries never get canceled
- there is no replication delay



I am trying to understand these two parameters, and each time it looks a bit confusing to me whether they complement or conflict with each other.

Say, for example, if we set hot_standby_feedback to ON (which is currently set to the default of ON in our case, by the way), it will make the primary wait till query completion on the standby, and can cause such a high bump in XIDs in scenarios where the query on the standby runs for days (as happened in our current scenario). So we were thinking of setting it OFF, to avoid the transaction ID wraparound issue.

But you also mentioned setting "max_standby_streaming_delay" to -1 (it is currently set to 14 seconds in our case), which will make the standby wait indefinitely, till the query completes on it, and not apply the WAL that would overwrite the rows the standby query is reading. But won't this same behaviour happen while we have hot_standby_feedback set to ON?

But again, for HA, in case the primary goes down we should not have a big lag on the standby, so we want the standby with minimal lag too. And as you mentioned, there will never be incorrect results but at max a query cancellation, so I was thinking: is it fine to just keep "hot_standby_feedback" OFF and leave max_standby_streaming_delay as it is, at 14 sec? Let me know your thoughts.

Basically, below are the combinations I am confused between:

hot_standby_feedback ON and max_standby_streaming_delay = -1
or
hot_standby_feedback OFF and max_standby_streaming_delay = -1
or
hot_standby_feedback ON and max_standby_streaming_delay = 14 sec
or
hot_standby_feedback OFF and max_standby_streaming_delay = 14 sec

Re: Long running query causing XID limit breach

From
yudhi s
Date:


As per my understanding, this would be the behaviour. Others may comment.


hot_standby_feedback ON and max_standby_streaming_delay = -1:

Ensures that long-running queries on the standby are not interrupted; the primary holds back vacuuming of rows still needed by standby queries. But it can lead to significant replication lag and to the xmin horizon being held back on the primary, potentially causing transaction ID wraparound issues.

hot_standby_feedback OFF and max_standby_streaming_delay = -1:

Ensures long-running queries on the standby are not interrupted, and no feedback is sent to the primary, reducing the risk of XID wraparound. But the standby may fall significantly behind the primary, resulting in high replication lag.

hot_standby_feedback ON and max_standby_streaming_delay = 14 seconds:

The primary avoids vacuuming rows needed by standby queries, reducing query cancellations on the standby, and replication lag is limited to about 14 seconds. But standby queries that stall replay for more than 14 seconds may be canceled, and the primary can still see its xmin horizon held back.

hot_standby_feedback OFF and max_standby_streaming_delay = 14 seconds:

Limits replication lag to about 14 seconds and avoids holding back the xmin horizon on the primary. But long-running queries on the standby are more likely to be canceled, due to the lack of feedback to the primary.

 

Re: Long running query causing XID limit breach

From
sud
Date:

Thank you so much.
Does it mean we should go for the last one, i.e. hot_standby_feedback OFF and max_standby_streaming_delay = 14 seconds? High availability is also a key requirement in any production environment, so keeping a 14-second lag is kind of okay, and at the same time keeping hot_standby_feedback OFF will make sure the transaction ID wraparound won't happen because of any long-running query on the standby, as the primary won't wait for standby feedback before vacuuming its tables.

But I have one question here: does max_standby_streaming_delay = 14 mean that queries on the standby will get cancelled after 14 seconds?

Re: Long running query causing XID limit breach

From
David HJ
Date:
Anyone know how to unsubscribe from this mailing list?


Re: Long running query causing XID limit breach

From
Torsten Förtsch
Date:
On Sat, May 25, 2024 at 11:00 PM sud <suds1434@gmail.com> wrote:

But I have one question here: does max_standby_streaming_delay = 14 mean that queries on the standby will get cancelled after 14 seconds?

No, your query gets cancelled when it stalls replication for >14 sec. If your master is idle and does not send any WAL and the replica has caught up, the query can take as long as it wants. 

Re: Long running query causing XID limit breach

From
sud
Date:
On Sun, May 26, 2024 at 1:43 PM Torsten Förtsch <tfoertsch123@gmail.com> wrote:
On Sat, May 25, 2024 at 11:00 PM sud <suds1434@gmail.com> wrote:

But I have one question here: does max_standby_streaming_delay = 14 mean that queries on the standby will get cancelled after 14 seconds?

No, your query gets cancelled when it stalls replication for >14 sec. If your master is idle and does not send any WAL and the replica has caught up, the query can take as long as it wants. 

Thank you so much. 
For example, in the below scenario:
If I have an INSERT query running on the primary instance against the 25th-May partition of TABLE1, and at the same time we are selecting data from the 24th-May partition, then with "max_standby_streaming_delay = 14" the SELECT query can run for any duration without restriction, even while WAL keeps getting applied on the standby. The INSERT on the primary also won't cause the standby SELECT to be cancelled, because the WAL records of the INSERT do not conflict with the exact rows being read on the standby. Is my understanding correct here?

However, if I have an UPDATE/DELETE running on the primary against the 25th-May partition of TABLE1, touching the exact same set of rows that are being read by the SELECT query on the standby, then the application of such WAL records on the standby can wait at most 14 seconds, and thus those SELECT queries are prone to being cancelled after 14 seconds. Is this understanding correct?

If the above is true then it doesn't look good, because in an OLTP system there will be a lot of DML happening on the writer instance, and there may be many queries on the reader/standby instances which are meant to run for hours. And if making those SELECT queries run for hours means compromising an hour of high availability/RPO, i.e. a lag of an hour between primary and standby, that doesn't look good either. Please correct me if I am missing something here.

Re: Long running query causing XID limit breach

From
Adrian Klaver
Date:
On 5/25/24 22:56, David HJ wrote:
> Anyone know how to unsubscribe from this mailing list?

See here:

https://lists.postgresql.org/unsubscribe/


-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Long running query causing XID limit breach

From
Torsten Förtsch
Date:
Each query on the replica has a backend_xmin. You can see that in pg_stat_activity. From that backend's perspective, tuples marked as deleted by any transaction greater or equal to backend_xmin are still needed. This does not depend on the table.

Now, vacuum writes into the WAL the point up to which it has vacuumed on the master. In pg_waldump this looks like so:

PRUNE snapshotConflictHorizon: 774, nredirected: 0, ndead: 5, nunused: 0, redirected: [], dead: [2, 4, 6, 8, 10], unused: [], blkref #0: rel 1663/5/16430 blk 0

That snapshotConflictHorizon is also a transaction id. If the backend_xmin of every backend running a transaction in the same database (the 5 in 1663/5/16430) as the vacuum WAL record is greater than vacuum's snapshotConflictHorizon, then there is no conflict. If any of the backend_xmins is less, then there is a conflict.

This type of conflict is determined by just 2 numbers, the conflict horizon sent by the master in the WAL, and the minimum of all backend_xmins. For your case this means a long running transaction querying table t1 might have a backend_xmin of 223. On the master update and delete operations happen on table T2. Since all the transactions on the master are fast, when vacuum hits T2, the minimum of all backend_xmins on the master might already be 425. So, garbage left over by all transactions up to 424 can be cleaned up. Now that cleanup record reaches the replica. It compares 223>425 which is false. So, there is a conflict. Now the replica can wait until its own horizon reaches 425 or it can kill all backends with a lower backend_xmin.

As I understand, hot_standby_feedback does not work for you. Not sure if you can run the query on the master? That would resolve the issues but might generate the same bloat on the master as hot_standby_feedback. Another option I can see is to run long running queries on a dedicated replica with max_standby_streaming_delay set to infinity or something large enough. If you go that way, you could also fetch the WAL from your WAL archive instead of replicating from the master. That way the replica has absolutely no chance to affect the master.

Good Luck!
Torsten

Re: Long running query causing XID limit breach

From
sud
Date:
Thank you so much.

Would you agree that we should have two standbys: one with the default max_standby_streaming_delay (say 10 sec), mainly used for high availability and thus with minimal lag; and another standby with max_standby_streaming_delay = -1, i.e. it will wait indefinitely for SELECT queries to finish without caring about the lag, utilized for the long-running SELECT queries?

And keep hot_standby_feedback ON for the first standby, which is used for HA, and OFF for the second standby, which is utilized for long-running SELECT queries, so that the primary won't hold back vacuuming of old row versions on feedback from that standby. That will keep the transaction ID wraparound issue from happening because of read/SELECT queries on any of the standbys.

Re: Long running query causing XID limit breach

From
Torsten Förtsch
Date:
Sure. That could work. Perhaps also set statement_timeout on the first replica, just in case.

Re: Long running query causing XID limit breach

From
sud
Date:

Thank you so much. Yes, planning to set it up as below. Hope I am doing it correctly.

Parameter                            | Master/Primary | First Replica/Standby (HA) | Second Replica (Reporting)
hot_standby_feedback                 | ON             | ON                         | OFF
max_standby_streaming_delay          | 10 sec         | 10 sec                     | -1 (infinite)
statement_timeout                    | 2 hrs          | 2 hrs                      | none (infinite)
idle_in_transaction_session_timeout  | 10 min         | 10 min                     | none (infinite)
autovacuum_freeze_max_age            | 100M           | 100M                       | 100M
log_autovacuum_min_duration          | 0              | 0                          | 0
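On a self-managed standby, the reporting-replica column would translate to roughly the below (on RDS/Aurora these are set through the instance's parameter group instead):

ALTER SYSTEM SET hot_standby_feedback = off;        -- don't hold back the primary's xmin horizon
ALTER SYSTEM SET max_standby_streaming_delay = -1;  -- never cancel standby queries over replay conflicts
SELECT pg_reload_conf();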
 

Re: Long running query causing XID limit breach

From
Laurenz Albe
Date:
> On Fri, May 24, 2024 at 10:34 AM sud <suds1434@gmail.com> wrote:
> > I am trying to understand these two parameters and each time it looks a bit confusing to me.
> > If These two parameters complement or conflict with each other.
> >
> > Say for example, If we set hot_standby_feedback to ON (which is currently set as
> > default ON by the way), it will make the primary wait till the query completion at
> > standby and can cause such a high bump in XID in scenarios where the query on standby
> > runs for days(like in our current scenario which happens). So we were thinking of
> > setting it as OFF, to avoid the transaction ID wrap around issue..

"hot_standby_feedback" is not "on" by default; you must have changed it.

This parameter will not make anything wait.  The effect is that VACUUM on the
primary server won't remove any old row versions that are still needed by a
query on the standby.  This holds back the "xmin" horizon and can lead to
bloat and, if you consume transaction IDs quickly, to wraparound problems.

> > But as you also mentioned, setting "max_standby_streaming_delay" to -1 (it is
> > currently set to 14 seconds in our case) will make the standby wait indefinitely
> > until the query completes, and it won't apply WAL that would overwrite the row
> > versions the standby query is reading. But won't this same behaviour happen
> > while we have hot_standby_feedback set to "ON"?

"It" is the standby, where you set the parameter.  The primary is not affected by that
at all.  If there is a replication conflict on the standby, replay of the WAL information
is delayed until the query is done.

There is no conflict between these settings; they do something different.
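To see how often standby queries are actually being canceled by such replication conflicts, one option is the standard statistics view, run on the standby:

-- Cumulative counts, per database, of queries canceled by each kind of
-- recovery conflict; confl_snapshot is the kind that a large (or -1)
-- max_standby_streaming_delay avoids.
SELECT datname, confl_snapshot, confl_lock, confl_bufferpin, confl_deadlock
FROM pg_stat_database_conflicts;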

> > But again, for HA, in case the primary goes down we should not have a big lag on
> > the standby, and thus we want that standby also to have minimal lag. And as you
> > mentioned, there will never be incorrect results but at most query cancellation,
> > so I was thinking: is it fine to just keep "hot_standby_feedback" OFF and leave
> > max_standby_streaming_delay as it is, at 14 sec? Let me know your thoughts.
>

You cannot have it.
Let me repeat: you cannot have it.

The only way you can have no delay in replication AND no canceled queries is
if you use two different standby servers with different settings for
"max_standby_streaming_delay".  One of the server is for HA, the other for
your long-running queries.

Yours,
Laurenz Albe



Re: Long running query causing XID limit breach

From
yudhi s
Date:
On Mon, May 27, 2024 at 2:50 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

> > But again, for HA, in case the primary goes down we should not have a big lag on
> > the standby, and thus we want that standby also to have minimal lag. And as you
> > mentioned, there will never be incorrect results but at most query cancellation,
> > so I was thinking: is it fine to just keep "hot_standby_feedback" OFF and leave
> > max_standby_streaming_delay as it is, at 14 sec? Let me know your thoughts.
>
You cannot have it.
Let me repeat: you cannot have it.

The only way you can have no delay in replication AND no canceled queries is
if you use two different standby servers with different settings for
"max_standby_streaming_delay".  One of the server is for HA, the other for
your long-running queries.

When you suggest having different max_standby_streaming_delay values for the first replica (say 10 sec, for high availability) and the second replica (say -1, for long-running queries), do you also suggest keeping "hot_standby_feedback" OFF for all three instances, i.e. the master and both replicas?

Also, the OP has added a few other parameters as below; do you think these are needed?
I think the master and the first replica should have the same setup, because in case of any disaster to the master, the first replica should be able to take its place.
 
Parameter                             Master/Primary   First Replica (HA)   Second Replica (Reporting)
hot_standby_feedback                  ON               ON                   OFF
max_standby_streaming_delay           10 sec           10 sec               -1 (infinite)
statement_timeout                     2 hrs            2 hrs                none (infinite)
idle_in_transaction_session_timeout   10 minutes       10 minutes           none (infinite)
autovacuum_freeze_max_age             100M             100M                 100M
log_autovacuum_min_duration           0                0                    0
 

Re: Long running query causing XID limit breach

From
Laurenz Albe
Date:
On Wed, 2024-05-29 at 01:34 +0530, yudhi s wrote:
> > The only way you can have no delay in replication AND no canceled queries is
> > if you use two different standby servers with different settings for
> > "max_standby_streaming_delay".  One of the server is for HA, the other for
> > your long-running queries.
>
> When you suggest having different max_standby_streaming_delay values for the first replica
> (say 10 sec, for high availability) and the second replica (say -1, for long-running queries),
> do you also suggest keeping "hot_standby_feedback" OFF for all three
> instances, i.e. the master and both replicas?

The parameter is ignored on the master.
It needs to be off on the standby that is running long queries.
For the other standby it probably doesn't matter if you are not running any
queries on it.  I would leave "hot_standby_feedback = off" there as well.

Actually, I would set "hot_standby = off" on the standby that is only used
for HA.

> Also, the OP has added a few other parameters as below; do you think these are needed?
> I think the master and the first replica should have the same setup, because in case
> of any disaster to the master, the first replica should be able to take its place.
>  
> Parameter                             Master/Primary   First Replica (HA)   Second Replica (Reporting)
> hot_standby_feedback                  ON               ON                   OFF
> max_standby_streaming_delay           10 sec           10 sec               -1 (infinite)
> statement_timeout                     2 hrs            2 hrs                none (infinite)
> idle_in_transaction_session_timeout   10 minutes       10 minutes           none (infinite)
> autovacuum_freeze_max_age             100M             100M                 100M
> log_autovacuum_min_duration           0                0                    0

- I would leave "hot_standby_feedback" off everywhere.
- "max_standby_streaming_delay" should be -1 on the reporting standby and very
  low or 0 on the HA standby. It doesn't matter on the primary.
- "statement_timeout" should be way lower on the first two nodes.
- "idle_in_transaction_session_timeout" is good.
- I would leave "autovacuum_freeze_max_age" at the default setting but 100 million
  is ok too.

Yours,
Laurenz Albe
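To double-check what each node actually ends up running with, a simple query against the standard pg_settings view can be run on every instance:

-- Verify the effective value of each parameter, and where it came from,
-- on the node this is run on.
SELECT name, setting, unit, source
FROM pg_settings
WHERE name IN ('hot_standby', 'hot_standby_feedback',
               'max_standby_streaming_delay', 'statement_timeout',
               'idle_in_transaction_session_timeout',
               'autovacuum_freeze_max_age', 'log_autovacuum_min_duration');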



Re: Long running query causing XID limit breach

From
sud
Date:
Hello Laurenz,

Thank you so much. This information was really helpful for us in understanding how these parameters work.

One follow-up question I have: we are setting one of the standbys/replicas with idle_in_transaction_session_timeout=-1, which can cause the WALs to be heavily backlogged in a scenario where a query runs for a very long time on that instance. In that case, is there a chance of an instance restart, and can that be avoided in any way?

And the plan is to set these system parameters to different values on the writer and the read replicas. In that case, if we apply the "alter system" command on the primary, won't the WAL apply those same commands on the reader instance, making its configuration the same as the writer's (while we want the reader replica configuration to differ from the writer's)?

Appreciate your guidance.


Re: Long running query causing XID limit breach

From
Simon Elbaz
Date:
Hi,

I am following this very interesting thread.

From the documentation https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-IDLE-IN-TRANSACTION-SESSION-TIMEOUT, the 0 value will disable the timeout (not -1).
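In other words, "no timeout" would be expressed as below (a small sketch of the documented values; max_standby_streaming_delay is the odd one out in that it does accept -1):

-- 0, not -1, disables these two timeouts:
ALTER SYSTEM SET statement_timeout = 0;
ALTER SYSTEM SET idle_in_transaction_session_timeout = 0;

-- Whereas for this one, -1 means "wait forever":
ALTER SYSTEM SET max_standby_streaming_delay = -1;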



On Wed, Jun 5, 2024 at 8:25 AM sud <suds1434@gmail.com> wrote:

One follow-up question I have: we are setting one of the standbys/replicas with idle_in_transaction_session_timeout=-1, which can cause the WALs to be heavily backlogged in a scenario where a query runs for a very long time on that instance. In that case, is there a chance of an instance restart, and can that be avoided in any way?


Re: Long running query causing XID limit breach

From
sud
Date:
On Wed, 5 Jun, 2024, 12:39 pm Simon Elbaz, <elbazsimon9@gmail.com> wrote:
Hi,

I am following this very interesting thread.

From the documentation https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-IDLE-IN-TRANSACTION-SESSION-TIMEOUT, the 0 value will disable the timeout (not -1).



 

My apologies. I meant to say setting "max_standby_streaming_delay" to -1, which means unlimited lag.

Re: Long running query causing XID limit breach

From
Laurenz Albe
Date:
On Wed, 2024-06-05 at 13:09 +0530, sud wrote:
> > > One follow-up question I have: we are setting one of the standbys/replicas
> > > with idle_in_transaction_session_timeout=-1, which can cause the WALs to be
> > > heavily backlogged in a scenario where a query runs for a very long time on
> > > that instance. In that case, is there a chance of an instance restart, and
> > > can that be avoided in any way?
>
> My apologies. I meant to say setting "max_standby_streaming_delay" to -1,
> which means unlimited lag.

There should never be a restart unless you perform one or the standby crashes.
If you mean that you want to avoid a crash caused by a full disk on the standby,
the answer is probably "no".  Make sure that you have enough disk space and
use monitoring.

Yours,
Laurenz Albe
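For the monitoring side, a couple of standard queries may help (on RDS these complement the FreeStorageSpace CloudWatch metric; note that pg_ls_waldir() requires suitable privileges, e.g. membership in the pg_monitor role):

-- How much WAL is currently sitting on disk on this node:
SELECT count(*) AS wal_segments,
       pg_size_pretty(sum(size)) AS wal_on_disk
FROM pg_ls_waldir();

-- Overall size of the current database:
SELECT pg_size_pretty(pg_database_size(current_database()));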



Re: Long running query causing XID limit breach

From
yudhi s
Date:
On Wed, Jun 5, 2024 at 3:52 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

There should never be a restart unless you perform one or the standby crashes.
If you mean that you want to avoid a crash caused by a full disk on the standby,
the answer is probably "no".  Make sure that you have enough disk space and
use monitoring.

Yours,
Laurenz Albe

Is this because the OP initially mentioned it's RDS Postgres? In that case there is a storage space restriction of 64TB (128TB in the case of Aurora Postgres), and I believe that storage combines data + WAL, so, as you mentioned, appropriate monitoring needs to be put in place.
Or else, in the worst-case scenario, if the storage consumption hits that hard limit, will there be an instance restart or crash?

Re: Long running query causing XID limit breach

From
sud
Date:

On Thu, Jun 6, 2024 at 12:52 AM yudhi s <learnerdatabase99@gmail.com> wrote:

Is this because the OP initially mentioned it's RDS Postgres? In that case there is a storage space restriction of 64TB (128TB in the case of Aurora Postgres), and I believe that storage combines data + WAL, so, as you mentioned, appropriate monitoring needs to be put in place.
Or else, in the worst-case scenario, if the storage consumption hits that hard limit, will there be an instance restart or crash?

Thank you so much, Laurenz and Yudhi.

Yes, it's RDS, and as you mentioned there does exist a space limitation of ~64TB. But as Laurenz mentioned, the only time the second standby may crash would probably be because of storage space saturation, so we need appropriate monitoring in place to catch this and get alerted beforehand, and also monitoring of how much WAL gets generated per hour/day to get an idea of the usage. I am not sure how to do that, but will check on it.
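One simple way to measure WAL generation is to record the current WAL position and diff it later with standard functions (a sketch; the LSN literal below is just a placeholder for whatever value the first query returned):

-- On the primary, note the current write-ahead log position:
SELECT pg_current_wal_lsn();   -- returns e.g. 0/3000060

-- An hour (or a day) later, compute how much WAL was generated since:
SELECT pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), '0/3000060'::pg_lsn)
       ) AS wal_generated_since;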

Re: Long running query causing XID limit breach

From
yudhi s
Date:


 
Not exactly related, but just for our information: while going through the "Aurora Postgres" database docs on concepts similar to the ones being discussed here, I found some interesting stuff.


Cancel the conflicting query on the reader node if the conflict lasts longer than max_standby_streaming_delay (maximum 30 seconds). This is different from Amazon RDS or self-managed PostgreSQL. With Amazon RDS or self-managed PostgreSQL, the instance has its own physical copy of the database, and you’re able to set the parameter max_standby_streaming_delay as high as you want to prevent query cancellation.

If the conflicting query can’t cancel in time, or if multiple long-running queries are causing the replication lag to go beyond 60 seconds, Aurora restarts the reader node to ensure it’s not lagging far behind the primary node.

So if I get it correctly, it means that even if hot_standby_feedback is set to OFF, the constraints of max_standby_streaming_delay (30 seconds maximum) and the 60-second replication lag limit still apply, and thus Aurora may cancel long-running queries or restart reader nodes to maintain synchronization, even if a query runs for just over 60 seconds. It's really odd, but does that mean there is no way to guarantee that a query runs for more than 60 seconds on a read replica in Aurora Postgres?
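Whatever the exact Aurora behavior, replica-side lag can at least be watched with standard functions (a sketch for a streaming replica; Aurora's shared-storage model differs, so these numbers may not be meaningful there):

-- Run on a replica: how far behind the replayed WAL is, time-wise:
SELECT now() - pg_last_xact_replay_timestamp() AS replay_delay;

-- And byte-wise, between received and replayed WAL:
SELECT pg_wal_lsn_diff(pg_last_wal_receive_lsn(),
                       pg_last_wal_replay_lsn()) AS replay_lag_bytes;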