Thread: postgresql transaction id monitoring with nagios

postgresql transaction id monitoring with nagios

From
"Tony Wasson"
Date:
Ever since I started working with PostgreSQL I've heard the need to
watch transaction IDs. The phrase "transaction ID wraparound" still
gives me a shiver. Attached it a short script that works with the
monitoring system Nagios to keep an eye on transaction IDs. It should
be easy to adapt to any other monitoring system.

It runs the textbook query below and reports how close you are to wraparound.
          SELECT datname, age(datfrozenxid) FROM pg_database;

The script detects a wrap at 2 billion. It starts warning once one or
more databases show an age over 1 billion transactions. It reports
critical at 1.5B transactions. I hope everyone out there is vacuuming
*all* databases often.

Hope some of you can use this script!
Tony Wasson

Attachment

Re: postgresql transaction id monitoring with nagios

From
Vivek Khera
Date:
On May 2, 2006, at 2:26 PM, Tony Wasson wrote:

> The script detects a wrap at 2 billion. It starts warning once one or
> more databases show an age over 1 billion transactions. It reports
> critical at 1.5B transactions. I hope everyone out there is vacuuming
> *all* databases often.

Something seems wrong... I just ran your script against my
development database server which is vacuumed daily and it said I was
53% of the way to 2B.  Seemed strange to me, so I re-ran "vacuum -a -
z" to vacuum all databases (as superuser), reran the script and got
the same answer.



Attachment

Re: postgresql transaction id monitoring with nagios

From
Alvaro Herrera
Date:
Vivek Khera wrote:
>
> On May 2, 2006, at 2:26 PM, Tony Wasson wrote:
>
> >The script detects a wrap at 2 billion. It starts warning once one or
> >more databases show an age over 1 billion transactions. It reports
> >critical at 1.5B transactions. I hope everyone out there is vacuuming
> >*all* databases often.
>
> Something seems wrong... I just ran your script against my
> development database server which is vacuumed daily and it said I was
> 53% of the way to 2B.  Seemed strange to me, so I re-ran "vacuum -a -
> z" to vacuum all databases (as superuser), reran the script and got
> the same answer.

That's right, because a database's age is only decremented in
database-wide vacuums.  (Wow, who wouldn't want a person-wide vacuum if
it did the same thing ...)

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: postgresql transaction id monitoring with nagios

From
"Tony Wasson"
Date:
On 5/2/06, Vivek Khera <vivek@khera.org> wrote:
>
> On May 2, 2006, at 2:26 PM, Tony Wasson wrote:
>
> > The script detects a wrap at 2 billion. It starts warning once one or
> > more databases show an age over 1 billion transactions. It reports
> > critical at 1.5B transactions. I hope everyone out there is vacuuming
> > *all* databases often.
>
> Something seems wrong... I just ran your script against my
> development database server which is vacuumed daily and it said I was
> 53% of the way to 2B.  Seemed strange to me, so I re-ran "vacuum -a -
> z" to vacuum all databases (as superuser), reran the script and got
> the same answer.

Ah thanks, it's a bug in my understanding of the thresholds.

"With the standard freezing policy, the age column will start at one
billion for a freshly-vacuumed database."

So essentially, 1B is normal, 2B is the max. The logic is now..

The script detects a wrap at 2 billion. It starts warning once one or
more databases show an age over 1.5 billion transactions. It reports
critical at 1.75B transactions.

If anyone else understands differently, hit me with a clue bat.

Attachment

Re: postgresql transaction id monitoring with nagios

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:
> Vivek Khera wrote:
> >
> > On May 2, 2006, at 2:26 PM, Tony Wasson wrote:
> >
> > >The script detects a wrap at 2 billion. It starts warning once one or
> > >more databases show an age over 1 billion transactions. It reports
> > >critical at 1.5B transactions. I hope everyone out there is vacuuming
> > >*all* databases often.
> >
> > Something seems wrong... I just ran your script against my
> > development database server which is vacuumed daily and it said I was
> > 53% of the way to 2B.  Seemed strange to me, so I re-ran "vacuum -a -
> > z" to vacuum all databases (as superuser), reran the script and got
> > the same answer.
>
> That's right, because a database's age is only decremented in
> database-wide vacuums.

Forget it ... I must be blind ...

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: postgresql transaction id monitoring with nagios

From
Vivek Khera
Date:
On May 2, 2006, at 3:03 PM, Alvaro Herrera wrote:

>> Something seems wrong... I just ran your script against my
>> development database server which is vacuumed daily and it said I was
>> 53% of the way to 2B.  Seemed strange to me, so I re-ran "vacuum -a -
>> z" to vacuum all databases (as superuser), reran the script and got
>> the same answer.
>
> That's right, because a database's age is only decremented in
> database-wide vacuums.  (Wow, who wouldn't want a person-wide
> vacuum if
> it did the same thing ...)

and what exactly is "vacuumdb -a -z" doing besides a database wide
vacuum?


Attachment

Re: postgresql transaction id monitoring with nagios

From
Bruno Wolff III
Date:
On Tue, May 02, 2006 at 12:06:30 -0700,
  Tony Wasson <ajwasson@gmail.com> wrote:
>
> Ah thanks, it's a bug in my understanding of the thresholds.
>
> "With the standard freezing policy, the age column will start at one
> billion for a freshly-vacuumed database."
>
> So essentially, 1B is normal, 2B is the max. The logic is now..
>
> The script detects a wrap at 2 billion. It starts warning once one or
> more databases show an age over 1.5 billion transactions. It reports
> critical at 1.75B transactions.
>
> If anyone else understands differently, hit me with a clue bat.

Isn't this obsolete now anyway? I am pretty sure 8.1 has safeguards against
wrap around.

Re: postgresql transaction id monitoring with nagios

From
"Tony Wasson"
Date:
On 5/2/06, Bruno Wolff III <bruno@wolff.to> wrote:
> On Tue, May 02, 2006 at 12:06:30 -0700,
>   Tony Wasson <ajwasson@gmail.com> wrote:
> >
> > Ah thanks, it's a bug in my understanding of the thresholds.
> >
> > "With the standard freezing policy, the age column will start at one
> > billion for a freshly-vacuumed database."
> >
> > So essentially, 1B is normal, 2B is the max. The logic is now..
> >
> > The script detects a wrap at 2 billion. It starts warning once one or
> > more databases show an age over 1.5 billion transactions. It reports
> > critical at 1.75B transactions.
> >
> > If anyone else understands differently, hit me with a clue bat.
>
> Isn't this obsolete now anyway? I am pretty sure 8.1 has safeguards against
> wrap around.

My motivation was primarily to monitor some existing PostgreSQL 8.0
servers. I'm not convinced it is "safe" to stop worrying about
transaction ids even on an 8.1 box.

It is comforting that 8.1 does safeguard against wraparound in at
least 2 ways. First, it emits a warnings during the last 10 million
transactions. If you manage to ignore all those, posgresql will shut
down before a wraparound. I think PostgreSQL does everything correctly
there, but I suspect someone will run into the shut down daemon
problem.

Re: postgresql transaction id monitoring with nagios

From
"Jim C. Nasby"
Date:
On Tue, May 02, 2006 at 03:03:40PM -0400, Alvaro Herrera wrote:
> That's right, because a database's age is only decremented in
> database-wide vacuums.  (Wow, who wouldn't want a person-wide vacuum if
> it did the same thing ...)

The heck with age, I'd take a person-wide vacuum if it just got rid of
all my 'dead rows'...
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: postgresql transaction id monitoring with nagios

From
"Jim C. Nasby"
Date:
On Tue, May 02, 2006 at 12:06:30PM -0700, Tony Wasson wrote:
> Ah thanks, it's a bug in my understanding of the thresholds.
>
> "With the standard freezing policy, the age column will start at one
> billion for a freshly-vacuumed database."
>
> So essentially, 1B is normal, 2B is the max. The logic is now..
>
> The script detects a wrap at 2 billion. It starts warning once one or
> more databases show an age over 1.5 billion transactions. It reports
> critical at 1.75B transactions.
>
> If anyone else understands differently, hit me with a clue bat.

You should take a look at the code in -HEAD that triggers autovacuum to
do a XID-wrap-prevention vacuum, as well as the code that warns that
we're approaching wrap. From memory, the limit for the later is

max_transactions << 3

Where max_transactions should be 4B on most platforms.

I'm intending to submit a patch to clean some of that code up (put all
the thresholds in one .h file rather than how they're spread through
source code right now); if you drop me an email off-list I'll send you
info once I do that.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461