Thread: postgresql transaction id monitoring with nagios
Ever since I started working with PostgreSQL I've heard the need to watch transaction IDs. The phrase "transaction ID wraparound" still gives me a shiver. Attached it a short script that works with the monitoring system Nagios to keep an eye on transaction IDs. It should be easy to adapt to any other monitoring system. It runs the textbook query below and reports how close you are to wraparound. SELECT datname, age(datfrozenxid) FROM pg_database; The script detects a wrap at 2 billion. It starts warning once one or more databases show an age over 1 billion transactions. It reports critical at 1.5B transactions. I hope everyone out there is vacuuming *all* databases often. Hope some of you can use this script! Tony Wasson
Attachment
On May 2, 2006, at 2:26 PM, Tony Wasson wrote: > The script detects a wrap at 2 billion. It starts warning once one or > more databases show an age over 1 billion transactions. It reports > critical at 1.5B transactions. I hope everyone out there is vacuuming > *all* databases often. Something seems wrong... I just ran your script against my development database server which is vacuumed daily and it said I was 53% of the way to 2B. Seemed strange to me, so I re-ran "vacuum -a - z" to vacuum all databases (as superuser), reran the script and got the same answer.
Attachment
Vivek Khera wrote: > > On May 2, 2006, at 2:26 PM, Tony Wasson wrote: > > >The script detects a wrap at 2 billion. It starts warning once one or > >more databases show an age over 1 billion transactions. It reports > >critical at 1.5B transactions. I hope everyone out there is vacuuming > >*all* databases often. > > Something seems wrong... I just ran your script against my > development database server which is vacuumed daily and it said I was > 53% of the way to 2B. Seemed strange to me, so I re-ran "vacuum -a - > z" to vacuum all databases (as superuser), reran the script and got > the same answer. That's right, because a database's age is only decremented in database-wide vacuums. (Wow, who wouldn't want a person-wide vacuum if it did the same thing ...) -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On 5/2/06, Vivek Khera <vivek@khera.org> wrote: > > On May 2, 2006, at 2:26 PM, Tony Wasson wrote: > > > The script detects a wrap at 2 billion. It starts warning once one or > > more databases show an age over 1 billion transactions. It reports > > critical at 1.5B transactions. I hope everyone out there is vacuuming > > *all* databases often. > > Something seems wrong... I just ran your script against my > development database server which is vacuumed daily and it said I was > 53% of the way to 2B. Seemed strange to me, so I re-ran "vacuum -a - > z" to vacuum all databases (as superuser), reran the script and got > the same answer. Ah thanks, it's a bug in my understanding of the thresholds. "With the standard freezing policy, the age column will start at one billion for a freshly-vacuumed database." So essentially, 1B is normal, 2B is the max. The logic is now.. The script detects a wrap at 2 billion. It starts warning once one or more databases show an age over 1.5 billion transactions. It reports critical at 1.75B transactions. If anyone else understands differently, hit me with a clue bat.
Attachment
Alvaro Herrera wrote: > Vivek Khera wrote: > > > > On May 2, 2006, at 2:26 PM, Tony Wasson wrote: > > > > >The script detects a wrap at 2 billion. It starts warning once one or > > >more databases show an age over 1 billion transactions. It reports > > >critical at 1.5B transactions. I hope everyone out there is vacuuming > > >*all* databases often. > > > > Something seems wrong... I just ran your script against my > > development database server which is vacuumed daily and it said I was > > 53% of the way to 2B. Seemed strange to me, so I re-ran "vacuum -a - > > z" to vacuum all databases (as superuser), reran the script and got > > the same answer. > > That's right, because a database's age is only decremented in > database-wide vacuums. Forget it ... I must be blind ... -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On May 2, 2006, at 3:03 PM, Alvaro Herrera wrote: >> Something seems wrong... I just ran your script against my >> development database server which is vacuumed daily and it said I was >> 53% of the way to 2B. Seemed strange to me, so I re-ran "vacuum -a - >> z" to vacuum all databases (as superuser), reran the script and got >> the same answer. > > That's right, because a database's age is only decremented in > database-wide vacuums. (Wow, who wouldn't want a person-wide > vacuum if > it did the same thing ...) and what exactly is "vacuumdb -a -z" doing besides a database wide vacuum?
Attachment
On Tue, May 02, 2006 at 12:06:30 -0700, Tony Wasson <ajwasson@gmail.com> wrote: > > Ah thanks, it's a bug in my understanding of the thresholds. > > "With the standard freezing policy, the age column will start at one > billion for a freshly-vacuumed database." > > So essentially, 1B is normal, 2B is the max. The logic is now.. > > The script detects a wrap at 2 billion. It starts warning once one or > more databases show an age over 1.5 billion transactions. It reports > critical at 1.75B transactions. > > If anyone else understands differently, hit me with a clue bat. Isn't this obsolete now anyway? I am pretty sure 8.1 has safeguards against wrap around.
On 5/2/06, Bruno Wolff III <bruno@wolff.to> wrote: > On Tue, May 02, 2006 at 12:06:30 -0700, > Tony Wasson <ajwasson@gmail.com> wrote: > > > > Ah thanks, it's a bug in my understanding of the thresholds. > > > > "With the standard freezing policy, the age column will start at one > > billion for a freshly-vacuumed database." > > > > So essentially, 1B is normal, 2B is the max. The logic is now.. > > > > The script detects a wrap at 2 billion. It starts warning once one or > > more databases show an age over 1.5 billion transactions. It reports > > critical at 1.75B transactions. > > > > If anyone else understands differently, hit me with a clue bat. > > Isn't this obsolete now anyway? I am pretty sure 8.1 has safeguards against > wrap around. My motivation was primarily to monitor some existing PostgreSQL 8.0 servers. I'm not convinced it is "safe" to stop worrying about transaction ids even on an 8.1 box. It is comforting that 8.1 does safeguard against wraparound in at least 2 ways. First, it emits a warnings during the last 10 million transactions. If you manage to ignore all those, posgresql will shut down before a wraparound. I think PostgreSQL does everything correctly there, but I suspect someone will run into the shut down daemon problem.
On Tue, May 02, 2006 at 03:03:40PM -0400, Alvaro Herrera wrote: > That's right, because a database's age is only decremented in > database-wide vacuums. (Wow, who wouldn't want a person-wide vacuum if > it did the same thing ...) The heck with age, I'd take a person-wide vacuum if it just got rid of all my 'dead rows'... -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Tue, May 02, 2006 at 12:06:30PM -0700, Tony Wasson wrote: > Ah thanks, it's a bug in my understanding of the thresholds. > > "With the standard freezing policy, the age column will start at one > billion for a freshly-vacuumed database." > > So essentially, 1B is normal, 2B is the max. The logic is now.. > > The script detects a wrap at 2 billion. It starts warning once one or > more databases show an age over 1.5 billion transactions. It reports > critical at 1.75B transactions. > > If anyone else understands differently, hit me with a clue bat. You should take a look at the code in -HEAD that triggers autovacuum to do a XID-wrap-prevention vacuum, as well as the code that warns that we're approaching wrap. From memory, the limit for the later is max_transactions << 3 Where max_transactions should be 4B on most platforms. I'm intending to submit a patch to clean some of that code up (put all the thresholds in one .h file rather than how they're spread through source code right now); if you drop me an email off-list I'll send you info once I do that. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461