Thread: Re: [COMMITTERS] pgsql: Hot Standby feedback for avoidance of cleanup conflicts on stand

On Thu, Feb 17, 2011 at 4:30 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Hot Standby feedback for avoidance of cleanup conflicts on standby.
> Standby optionally sends back information about oldestXmin of queries
> which is then checked and applied to the WALSender's proc->xmin.
> GetOldestXmin() is modified slightly to agree with GetSnapshotData(),
> so that all backends on primary include WALSender within their snapshots.
> Note this does nothing to change the snapshot xmin on either master or
> standby. Feedback piggybacks on the standby reply message.
> vacuum_defer_cleanup_age is no longer used on standby, though parameter
> still exists on primary, since some use cases still exist.

I have another comments about this change.

Something like the following description should be in the doc.
    hot_standby_feedback has no effect if either hot_standby is off or    wal_receiver_status_interval is zero.

+    if (MyProc->xmin != newxmin)
+    {
+        LWLockAcquire(ProcArrayLock, LW_SHARED);
+        MyProc->xmin = newxmin;
+        LWLockRelease(ProcArrayLock);

ProcArrayLock should be taken with LW_EXCLUSIVE since the shared
variable is changed. No?

What about exposing the feedback xid and epoch in pg_stat_replication?
It's useful when we investigate which standby unexpectedly prevents
VACUUM on the primary.

It seems too aggressive to calculate the oldest xmin and return it for
each WAL write and flush on the standby. I think this because calculation
of the oldest xmin is not light operation especially when there are many
concurrent backends. How about feeding back the xmin only when the
interval has passed?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


On Thu, 2011-02-17 at 13:38 +0900, Fujii Masao wrote:
> On Thu, Feb 17, 2011 at 4:30 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > Hot Standby feedback for avoidance of cleanup conflicts on standby.
> > Standby optionally sends back information about oldestXmin of queries
> > which is then checked and applied to the WALSender's proc->xmin.
> > GetOldestXmin() is modified slightly to agree with GetSnapshotData(),
> > so that all backends on primary include WALSender within their snapshots.
> > Note this does nothing to change the snapshot xmin on either master or
> > standby. Feedback piggybacks on the standby reply message.
> > vacuum_defer_cleanup_age is no longer used on standby, though parameter
> > still exists on primary, since some use cases still exist.
> 
> I have another comments about this change.
> 
> Something like the following description should be in the doc.
> 
>      hot_standby_feedback has no effect if either hot_standby is off or
>      wal_receiver_status_interval is zero.

The docs are going to need some work after 3-4 related major changes hit
them. I'm not picking up on individual sentences right now.

> +    if (MyProc->xmin != newxmin)
> +    {
> +        LWLockAcquire(ProcArrayLock, LW_SHARED);
> +        MyProc->xmin = newxmin;
> +        LWLockRelease(ProcArrayLock);
> 
> ProcArrayLock should be taken with LW_EXCLUSIVE since the shared
> variable is changed. No?

No, shared is sufficient for setting xmin, as we do in
GetSnapshotData().

> What about exposing the feedback xid and epoch in pg_stat_replication?
> It's useful when we investigate which standby unexpectedly prevents
> VACUUM on the primary.

This begs the questions "what is the xmin of all the normal backends?"
and "Whats is the xmin of prepared transactions?" as well. I wasn't sure
that we should expose that information for walsenders when we don't do
it for everybody else. If we do it would require major sections in the
docs explaining it all, etc.. 

> It seems too aggressive to calculate the oldest xmin and return it for
> each WAL write and flush on the standby. I think this because calculation
> of the oldest xmin is not light operation especially when there are many
> concurrent backends. How about feeding back the xmin only when the
> interval has passed?

You may be correct. Some rearrangement following performance tuning is
likely, though I've tried not to pre-guess the tuning.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services



On Thu, Feb 17, 2011 at 4:29 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Something like the following description should be in the doc.
>>
>>      hot_standby_feedback has no effect if either hot_standby is off or
>>      wal_receiver_status_interval is zero.
>
> The docs are going to need some work after 3-4 related major changes hit
> them. I'm not picking up on individual sentences right now.

OK.

>> +     if (MyProc->xmin != newxmin)
>> +     {
>> +             LWLockAcquire(ProcArrayLock, LW_SHARED);
>> +             MyProc->xmin = newxmin;
>> +             LWLockRelease(ProcArrayLock);
>>
>> ProcArrayLock should be taken with LW_EXCLUSIVE since the shared
>> variable is changed. No?
>
> No, shared is sufficient for setting xmin, as we do in
> GetSnapshotData().

Hmm.. it's because setting uint32 variable (i.e., xmin) is an atomic operation?
I'd like to know why LW_SHARED is sufficient in that case, just for future
reference.

>> What about exposing the feedback xid and epoch in pg_stat_replication?
>> It's useful when we investigate which standby unexpectedly prevents
>> VACUUM on the primary.
>
> This begs the questions "what is the xmin of all the normal backends?"
> and "Whats is the xmin of prepared transactions?" as well. I wasn't sure
> that we should expose that information for walsenders when we don't do
> it for everybody else. If we do it would require major sections in the
> docs explaining it all, etc..

We can *presume* which backend (or prepared transaction) unexpectedly
prevents VACUUM by seeing pg_stat_activity (or pg_prepared_xacts) and
checking whether there is long-running transaction. But there is no way to
presume which standby does that, I'm concerned.

>> It seems too aggressive to calculate the oldest xmin and return it for
>> each WAL write and flush on the standby. I think this because calculation
>> of the oldest xmin is not light operation especially when there are many
>> concurrent backends. How about feeding back the xmin only when the
>> interval has passed?
>
> You may be correct. Some rearrangement following performance tuning is
> likely, though I've tried not to pre-guess the tuning.

Are you planning to do that in beta phase or another?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


On Fri, 2011-02-18 at 11:44 +0900, Fujii Masao wrote:
> > This begs the questions "what is the xmin of all the normal
> backends?"
> > and "Whats is the xmin of prepared transactions?" as well. I wasn't
> sure > that we should expose that information for walsenders when we
> don't do > it for everybody else. If we do it would require major
> sections in the > docs explaining it all, etc..
> 
> We can *presume* which backend (or prepared transaction) unexpectedly
> prevents VACUUM by seeing pg_stat_activity (or pg_prepared_xacts) and
> checking whether there is long-running transaction. But there is no
> way to presume which standby does that, I'm concerned.

Currently there is no information available to users about xmin.

It isn't any *harder* to work out standby xmins than it is to work out
prepared transaction xmins or other backend xmins. For example, we don't
show which transactions use serializable mode on pg_stat_activity and so
we might make the mistake of reading the last statement start time
rather than the last transaction start time.

I see it as a major piece of work to add xmin in *all* the right places
and to fully document how to use that information as a user. The debug
information at DEBUG2 is sufficient to show the code works and to debug
issues if needed.

I agree with you that it would be nice to have a "which thing is
bloating my tables" device, but I'm not assuming the responsibility to
create such a thing and fully document it, and definitely not Right Now.

-- Simon Riggs           http://www.2ndQuadrant.com/books/PostgreSQL Development, 24x7 Support, Training and Services