Thread: 9.6.11- could not truncate directory "pg_serial": apparent wraparound
Hi,
PG: 9.6.11
OS: CentOS
Env: AWS EC2
I've faced the following exceptions in PostgreSQL server log:
> could not truncate directory "pg_serial": apparent wraparound
Sometimes it repeats every 5 min and the longest period was 40 min.
In fact, I can't find any suspicious events happening that periods.
pg_wait_sampling didn't catch any events, no long queries (more than 60s), Autovacuum workers or transactions in "idle in transaction" state were in action at this time.
The only related I could find in archive is: https://www.postgresql.org/message-id/CACjxUsON4Vya3a6r%3DubwmN-4qTDDfZjuwSzjnL1QjdUc8_gzLw%40mail.gmail.com
>You should not see the errors you are reporting nor
>the warning I mentioned unless a serializable transaction remains
>active long enough for about 1 billion transaction IDs to be
>consumed.
Database age now is just 18.5 millions of transactions.
Server has two standbys (sync and async), hot_standby_feedback is off.
Please advice what I can do to find a reason of these exceptions.
Re: 9.6.11- could not truncate directory "pg_serial": apparent wraparound
From
Pavel Suderevsky
Date:
Guys, still need your help.
Previous night:
2019-04-05 00:35:04 UTC LOG: could not truncate directory "pg_serial": apparent wraparound
2019-04-05 00:40:04 UTC LOG: could not truncate directory "pg_serial": apparent wraparound
(2 checkpoints)
It turned that I have some problem with performance related to predicate locking on this platform.
A lot of long prepared statements with the SerializableXactHashLock and predicate_lock_manager wait_events followed by high CPU usage happened during 00:30 and 00:45. During this period there were 55k pred locks granted at max and 30k in average. Probably because of high CPU usage some statements were spending a lot of time in bind/parse steps.
Probably if you advise me what could cause "pg_serial": apparent wraparound messages I would have more chances to handle all the performance issues.
Thank you!
--
Pavel Suderevsky
пн, 11 мар. 2019 г. в 19:09, Pavel Suderevsky <psuderevsky@gmail.com>:
Hi,PG: 9.6.11OS: CentOSEnv: AWS EC2I've faced the following exceptions in PostgreSQL server log:> could not truncate directory "pg_serial": apparent wraparoundSometimes it repeats every 5 min and the longest period was 40 min.In fact, I can't find any suspicious events happening that periods.pg_wait_sampling didn't catch any events, no long queries (more than 60s), Autovacuum workers or transactions in "idle in transaction" state were in action at this time.The only related I could find in archive is: https://www.postgresql.org/message-id/CACjxUsON4Vya3a6r%3DubwmN-4qTDDfZjuwSzjnL1QjdUc8_gzLw%40mail.gmail.com>You should not see the errors you are reporting nor>the warning I mentioned unless a serializable transaction remains>active long enough for about 1 billion transaction IDs to be>consumed.Database age now is just 18.5 millions of transactions.Server has two standbys (sync and async), hot_standby_feedback is off.Please advice what I can do to find a reason of these exceptions.
On Sun, Apr 7, 2019 at 2:31 AM Pavel Suderevsky <psuderevsky@gmail.com> wrote: > Probably if you advise me what could cause "pg_serial": apparent wraparound messages I would have more chances to handleall the performance issues. 9.6 has this code: /* * Give a warning if we're about to run out of SLRU pages. * * slru.c has a maximum of 64k segments, with 32 (SLRU_PAGES_PER_SEGMENT) * pages each. We need to store a 64-bit integer for each Xid, and with * default 8k block size, 65536*32 pages is only enough to cover 2^30 * XIDs. If we're about to hit that limit and wrap around, warn the user. * * To avoid spamming the user, we only give one warning when we've used 1 * billion XIDs, and stay silent until the situation is fixed and the * number of XIDs used falls below 800 million again. * * XXX: We have no safeguard to actually *prevent* the wrap-around, * though. All you get is a warning. */ if (oldSerXidControl->warningIssued) { TransactionId lowWatermark; lowWatermark = tailXid + 800000000; if (lowWatermark < FirstNormalTransactionId) lowWatermark = FirstNormalTransactionId; if (TransactionIdPrecedes(xid, lowWatermark)) oldSerXidControl->warningIssued = false; } else { TransactionId highWatermark; highWatermark = tailXid + 1000000000; if (highWatermark < FirstNormalTransactionId) highWatermark = FirstNormalTransactionId; if (TransactionIdFollows(xid, highWatermark)) { oldSerXidControl->warningIssued = true; ereport(WARNING, (errmsg("memory for serializable conflict tracking is nearly exhausted"), errhint("There might be an idle transaction or a forgotten prepared transaction causing this."))); } } Did you see that warning at some point before the later error? I think if you saw that warning, and then later the error you reported, it's probably just being prudent and avoiding the truncation because it detects a potential wraparound. If it actually does wrap around, then there is a potential for OldSerXidGetMinConflictCommitSeqNo() to report a too-recent minimum conflict CSN for a given XID, and I'm not sure what consequence that would have (without drinking a lot more coffee), but potentially some kind of incorrect answer. On server restart the problem fixes itself because pg_serial is only used to spill state relating to transactions running in this server lifetime. I wonder if this condition required you to have a serializable transaction running (or prepared) while you consume 2^30 AKA ~1 billion xids. I think it is unreachable in v11+ because commit e5eb4fa8 allowed for more SLRU pages to avoid this artificially early wrap. Gee, it'd be nice to use FullTransactionId for SERIALIZABLEXACT and pg_serial in v13 and not to even have to think about wraparound here. It's more doable here than elsewhere because the data on disk isn't persistent across server restart, let alone pg_upgrade. Let's see... each segment file is 256kb and we need to be able to address 2^64 * sizeof(SerCommitSequenceNumber), so you'd have segment files numbered from 0 up to 1ffffffffffff (so you'd need slru.c to support 13 char segment names and 64 bit segment numbers, whereas it currently has a limit of 6 in SlruScanDirectory and uses int for segment number). You'd be addressing them by FullTransactionId, but that's just the index used to find entries -- the actual amount of data stored wouldn't change, you'd just start seeing wider filenames, and all the fragile modulo comparison truncation stuff would disappear from the tree. -- Thomas Munro https://enterprisedb.com
On Tue, Apr 9, 2019 at 12:14 PM Thomas Munro <thomas.munro@gmail.com> wrote: > It's more doable here than elsewhere because the data on disk isn't > persistent across server restart, let alone pg_upgrade. Let's see... > each segment file is 256kb and we need to be able to address 2^64 * > sizeof(SerCommitSequenceNumber), so you'd have segment files numbered > from 0 up to 1ffffffffffff (so you'd need slru.c to support 13 char > segment names and 64 bit segment numbers, whereas it currently has a > limit of 6 in SlruScanDirectory and uses int for segment number). Come to think of it, even for the persistent ones, pg_upgrade could rename them to a new scheme anyway as long as the segment size remains the same. I'd be inclined to ditch the current segment numbering scheme and switch to one where the FullTransactionId of the first entry in the segment is used for its name, so that admins can more easily understand what the files under there correspond to. -- Thomas Munro https://enterprisedb.com
Re: 9.6.11- could not truncate directory "pg_serial": apparent wraparound
From
Pavel Suderevsky
Date:
On Sun, Apr 7, 2019 at 2:31 AM Pavel Suderevsky <psuderevsky@gmail.com> wrote:Thomas,
> Probably if you advise me what could cause "pg_serial": apparent wraparound messages I would have more chances to handle all the performance issues.
Did you see that warning at some point before the later error?
Thank you for your reply!
No, there have never been such warnings.
I wonder if this condition required you to have a serializable
transaction running (or prepared) while you consume 2^30 AKA ~1
billion xids. I think it is unreachable in v11+ because commit
e5eb4fa8 allowed for more SLRU pages to avoid this artificially early
wrap.
Do I understand right that this is about Virtual txids? Have no idea how even a something close to a billion of transaction ids could be consumed on this system.
--
Pavel Suderevsky
Re: 9.6.11- could not truncate directory "pg_serial": apparent wraparound
From
Pavel Suderevsky
Date:
Hi,
Got this issue again.
Settings on the platform (PG 9.6.11):
max_pred_locks_per_transaction = 3000
max_connections = 800
Despite the fact that documentation says:
> with the exception of fast-path locks, each lock manager will deliver a consistent set of results
I've noticed the following:
1. pg_locks showed 2 million SIReadLocks for the pid that has been in the "idle" state for a dozen of seconds already.
2. pg_locks showed count of 5 million SIReadLock granted although I expected a limit of 2.4 million SIReadLocks with settings provided above.
And still no any signs of serializable transactions that could live for a 1 billion xids.
--
Pavel Suderevsky
E: psuderevsky(at)gmail(dot)com
Pavel Suderevsky
E: psuderevsky(at)gmail(dot)com
вт, 9 апр. 2019 г. в 16:00, Pavel Suderevsky <psuderevsky@gmail.com>:
On Sun, Apr 7, 2019 at 2:31 AM Pavel Suderevsky <psuderevsky@gmail.com> wrote:Thomas,
> Probably if you advise me what could cause "pg_serial": apparent wraparound messages I would have more chances to handle all the performance issues.
Did you see that warning at some point before the later error?Thank you for your reply!No, there have never been such warnings.I wonder if this condition required you to have a serializable
transaction running (or prepared) while you consume 2^30 AKA ~1
billion xids. I think it is unreachable in v11+ because commit
e5eb4fa8 allowed for more SLRU pages to avoid this artificially early
wrap.Do I understand right that this is about Virtual txids? Have no idea how even a something close to a billion of transaction ids could be consumed on this system.--Pavel Suderevsky