Re: Proposal: Expose oldest xmin as SQL function for monitoring - Mailing list pgsql-hackers

From James Coleman
Subject Re: Proposal: Expose oldest xmin as SQL function for monitoring
Date
Msg-id CAAaqYe-AH6Tx55j99RQroc-8DiQOorBXiqwAywX0p6QrESYprw@mail.gmail.com
Whole thread Raw
In response to Re: Proposal: Expose oldest xmin as SQL function for monitoring  (Craig Ringer <craig@2ndquadrant.com>)
List pgsql-hackers
On Thu, Apr 2, 2020 at 12:13 AM Craig Ringer <craig@2ndquadrant.com> wrote:
>
>
>
>
> On Thu, 2 Apr 2020 at 07:57, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> > On 2020-Apr-01, Tom Lane wrote:
>> >> The fact that I had to use max(age(...)) in that sample query
>> >> hints at one reason: it's really hard to do arithmetic correctly
>> >> on raw XIDs.  Dealing with wraparound is a problem, and knowing
>> >> what's past or future is even harder.  What use-case do you
>> >> foresee exactly?
>>
>> > Maybe it would make sense to start exposing fullXids in these views and
>> > functions, for this reason.  There's no good reason to continue to
>> > expose bare Xids to userspace, we should use them only for storage.
>>
>> +1, that would help a lot.
>>
>> > But I think James' point is precisely that it's not easy to know where
>> > to look for things that keep Xmin from advancing.  Currently it's
>> > backends, replication slots, prepared transactions, and replicas with
>> > hot_standby_feedback.  If you forget to monitor just one of these, your
>> > vacuums might be useless and you won't notice until disaster strikes.
>>
>> Agreed, but just knowing what the oldest xmin is doesn't help you
>> find *where* it is.  Maybe what we need is a view showing all of
>> these potential sources of an old xmin.
>
>
>  Strongly agree.
>
> https://www.postgresql.org/message-id/CAMsr+YGSS6JBHmEHbxqMdc1XJ7sobDSq62YwaEkOHN-KBQYr-A@mail.gmail.com
>
> I was aiming to write such a view, but folks seemed opposed. I still think it'd be a very good thing to have built-in
asPg grows more complex.
 

Did you by any chance prototype anything/are you still interested?

This sounds extremely valuable to me, and while I don't want to
resurrect the old thread (it seemed like a bit of a tangent there
anyway), in my view this kind of basic diagnostic capability is
exactly the kind of thing that *has* to be in core, and then other
monitoring packages can take advantage of it.

Finding things holding back xmin from advancing is easily one of the
single biggest operational things we care about. We need to
investigate quickly when an issue occurs, so being able to do so
directly on the server (and having it be up-to-date with any new
features as they're released) is essential. And it's also one of the
areas where in my experience tracking things down is the hardest [with
capabilities in core]; you basically need to have this list in your
head of all of the things you need to check.

James



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Berserk Autovacuum (let's save next Mandrill)
Next
From: Asif Rehman
Date:
Subject: Re: WIP/PoC for parallel backup