Re: Long running query causing XID limit breach - Mailing list pgsql-general

From sud
Subject Re: Long running query causing XID limit breach
Date
Msg-id CAD=mzVUzDD_XDa+BJ8fH96pEwgTG1NHUvaoyVePLhp3xEN=J9A@mail.gmail.com
Whole thread Raw
In response to Re: Long running query causing XID limit breach  (Torsten Förtsch <tfoertsch123@gmail.com>)
Responses Re: Long running query causing XID limit breach
List pgsql-general
On Sun, May 26, 2024 at 1:43 PM Torsten Förtsch <tfoertsch123@gmail.com> wrote:
On Sat, May 25, 2024 at 11:00 PM sud <suds1434@gmail.com> wrote:

But i have one question here , does max_standby_streaming_delay = 14 , means the queries on the standby will get cancelled after 14 seconds?

No, your query gets cancelled when it stalls replication for >14 sec. If your master is idle and does not send any WAL and the replica has caught up, the query can take as long as it wants. 

Thank you so much. 
For example , in below scenario,
if i have insert query going on on primary instance on table  25th may partition of TABLE1, and at same time we are selecting data from 24th May partition , then with "max_standby_streaming_delay = 14" setup , it just allows the select query to run for any duration without any restriction even if the WAL gets applied on the standby regularly. Also INSERT query in primary won't make the standby SELECT queries to cancel as because the WAL record of INSERT queries on the primary instance is not conflicting to the exact rows those were being read by the standby. Is my understanding correct here?

However, if i have Update/Delete query going on on primary instance on table  25th may partition of TABLE1 and on the exact same set of rows which were being read by the standby instance by the SELECT query, then the application of such WAL record to standby can max wait for 14 seconds and thus those select query are prone to be cancelled after 14 seconds. Is this understanding correct?

If the above is true then it doesn't look good, as because in an OLTP system there will be a lot of DMLS happening on the writer instances and there may be many queries running on the reader/standby instances which are meant to run for hours. And if we say making those SELECT queries run for hours means compromising an hour of "high availability"/RPO or a lag of an hour between primary and standby , that doesn't look good. Please correct me if I am missing something here.

pgsql-general by date:

Previous
From: Torsten Förtsch
Date:
Subject: Re: Long running query causing XID limit breach
Next
From: Victor Dobrovolsky
Date:
Subject: scalar plpgsql functions and their stability flags