Thread: ERROR: multixact X from before cutoff Y found to be still running
Hi,

Currently, if you hold a multixact open long enough to generate an "oldest multixact is far in the past" message during VACUUM, you may see the following ERROR:

WARNING: oldest multixact is far in the past
HINT: Close open transactions with multixacts soon to avoid wraparound problems.
ERROR: multixact X from before cutoff Y found to be still running

Upon further inspection, I found that this is because the multixact limit used in this case is the threshold for which we emit the "oldest multixact" message. Instead, I think the multixact limit should be set to the result of GetOldestMultiXactId(), effectively forcing a minimum freeze age of zero. The ERROR itself is emitted by FreezeMultiXactId() and appears to be a safeguard against problems like this.

I've attached a patch to set the limit to the oldest multixact instead of the "safeMxactLimit" in this case. I'd like to credit Jeremy Schneider as the original reporter.

Nathan
Attachment
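For readers without the tree at hand, the limit-setting logic in vacuum.c's vacuum_set_xid_limits() looks roughly like this (a paraphrased sketch, not a verbatim excerpt; variable names follow the real code). The attached v1 patch changes what happens inside the WARNING branch:

    /* Paraphrased sketch of vacuum_set_xid_limits(), not a verbatim excerpt. */

    /* Normal cutoff: oldest multixact anyone might need, minus the freeze min age. */
    mxactLimit = GetOldestMultiXactId() - mxid_freezemin;
    if (mxactLimit < FirstMultiXactId)
        mxactLimit = FirstMultiXactId;

    /* "Safe" limit derived from the (possibly ramped-down) freeze max age. */
    safeMxactLimit =
        ReadNextMultiXactId() - effective_multixact_freeze_max_age;
    if (safeMxactLimit < FirstMultiXactId)
        safeMxactLimit = FirstMultiXactId;

    if (MultiXactIdPrecedes(mxactLimit, safeMxactLimit))
    {
        ereport(WARNING,
                (errmsg("oldest multixact is far in the past"),
                 errhint("Close open transactions with multixacts soon to avoid wraparound problems.")));

        /*
         * Today the code falls back to the warning threshold itself, which
         * can be newer than a multixact that is still running:
         */
        mxactLimit = safeMxactLimit;

        /*
         * The v1 idea: use GetOldestMultiXactId() here instead, i.e. the
         * oldest multixact that could still be running, effectively forcing
         * a minimum multixact freeze age of zero.
         */
    }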
On 9/4/19 17:37, Nathan Bossart wrote:
> Hi, Currently, if you hold a multixact open long enough to generate an "oldest multixact is far in the past" message during VACUUM, you may see the following ERROR:
>
> WARNING: oldest multixact is far in the past
> HINT: Close open transactions with multixacts soon to avoid wraparound problems.
> ERROR: multixact X from before cutoff Y found to be still running
>
> Upon further inspection, I found that this is because the multixact limit used in this case is the threshold for which we emit the "oldest multixact" message. Instead, I think the multixact limit should be set to the result of GetOldestMultiXactId(), effectively forcing a minimum freeze age of zero. The ERROR itself is emitted by FreezeMultiXactId() and appears to be a safeguard against problems like this.
>
> I've attached a patch to set the limit to the oldest multixact instead of the "safeMxactLimit" in this case. I'd like to credit Jeremy Schneider as the original reporter.

This was fun (sort of) - and a good part of the afternoon for Nathan, Nasby and myself today. A rather large PostgreSQL database with default autovacuum settings had a large table that started getting behind on Sunday. The server has a fairly large number of CPUs and a respectable workload. We realized today that with their XID generation they would go read-only to prevent wraparound tomorrow. (And perfectly healthy XID age on Sunday - that's wraparound in four days! Did I mention that I'm excited for the default limit GUC change in pg12?) To make matters more interesting, whenever we attempted to run a VACUUM command we encountered the ERROR message that Nate quoted on every single attempt! There was a momentary mild panic based on the "ERRCODE_DATA_CORRUPTED" message parameter in heapam.c FreezeMultiXactId() ... but as we looked closer we're now thinking there might just be an obscure bug in the code that sets vacuum limits.
Nathan and Nasby and myself have been chatting about this for quite a while but the vacuum code isn't exactly the simplest thing in the world to reason about. :) Anyway, it looks to me like MultiXactMemberFreezeThreshold() is intended to progressively reduce the vacuum multixact limits across multiple vacuum runs on the same table, as pressure on the members space increases. I'm thinking there was just a small oversight in writing the formula where under the most aggressive circumstances, vacuum could actually be instructed to delete multixacts that are still in use by active transactions and trigger the failure we observed.
Nate put together an initial patch (attached to the previous email, which was sent only to the bugs list). We couldn't quite come to a consensus on the best approach, but we decided that he'd kick off the thread and I'd throw out an alternative version of the patch that might be worth discussion. [Attached to this email.] Curious what others think!
-Jeremy
-- Jeremy Schneider Database Engineer Amazon Web Services
Attachment
On Thu, Sep 5, 2019 at 1:01 PM Jeremy Schneider <schnjere@amazon.com> wrote:
> On 9/4/19 17:37, Nathan Bossart wrote:
> Currently, if you hold a multixact open long enough to generate an "oldest multixact is far in the past" message during VACUUM, you may see the following ERROR:
>
> WARNING: oldest multixact is far in the past
> HINT: Close open transactions with multixacts soon to avoid wraparound problems.
> ERROR: multixact X from before cutoff Y found to be still running
>
> Upon further inspection, I found that this is because the multixact limit used in this case is the threshold for which we emit the "oldest multixact" message. Instead, I think the multixact limit should be set to the result of GetOldestMultiXactId(), effectively forcing a minimum freeze age of zero. The ERROR itself is emitted by FreezeMultiXactId() and appears to be a safeguard against problems like this.
>
> I've attached a patch to set the limit to the oldest multixact instead of the "safeMxactLimit" in this case. I'd like to credit Jeremy Schneider as the original reporter.
>
> This was fun (sort of) - and a good part of the afternoon for Nathan, Nasby and myself today. A rather large PostgreSQL database with default autovacuum settings had a large table that started getting behind on Sunday. The server has a fairly large number of CPUs and a respectable workload. We realized today that with their XID generation they would go read-only to prevent wraparound tomorrow. (And perfectly healthy XID age on Sunday - that's wraparound in four days! Did I mention that I'm excited for the default limit GUC change in pg12?) To make matters more interesting, whenever we attempted to run a VACUUM command we encountered the ERROR message that Nate quoted on every single attempt! There was a momentary mild panic based on the "ERRCODE_DATA_CORRUPTED" message parameter in heapam.c FreezeMultiXactId() ... but as we looked closer we're now thinking there might just be an obscure bug in the code that sets vacuum limits.
>
> Nathan and Nasby and myself have been chatting about this for quite a while but the vacuum code isn't exactly the simplest thing in the world to reason about. :) Anyway, it looks to me like MultiXactMemberFreezeThreshold() is intended to progressively reduce the vacuum multixact limits across multiple vacuum runs on the same table, as pressure on the members space increases. I'm thinking there was just a small oversight in writing the formula where under the most aggressive circumstances, vacuum could actually be instructed to delete multixacts that are still in use by active transactions and trigger the failure we observed.
>
> Nate put together an initial patch (attached to the previous email, which was sent only to the bugs list). We couldn't quite come to a consensus on the best approach, but we decided that he'd kick off the thread and I'd throw out an alternative version of the patch that might be worth discussion. [Attached to this email.] Curious what others think!

Hi Jeremy, Nathan, Jim,

Ok, so to recap... since commit 801c2dc7 in 2014, if the limit was before the 'safe' limit, then it would log the warning and start using the safe limit, even if that was newer than a multixact that is *still running*. It's not immediately clear to me if the limits on the relevant GUCs or anything else ever prevented that. Then commit 53bb309d2d5 came along in 2015 (to fix a bug: member's head could overwrite its tail) and created a way for the safe limit to be more aggressive.
When member space is low, we start lowering the effective max freeze age, and as we do so the likelihood of crossing into still-running-multixact territory increases. I suppose this requires you to run out of member space (for example many backends key sharing the same FK) or maybe just set autovacuum_multixact_freeze_max_age quite low, and then prolong the life of a multixact for longer.

Does the problem fix itself once you close the transaction that's in the oldest multixact, ie holding back GetOldestMultiXactId() from advancing? Since VACUUM errors out, we don't corrupt data, right? Everyone else is still going to see the multixact as running and do the right thing because vacuum never manages to (bogusly) freeze the tuple.

Both patches prevent mxactLimit from being newer than the oldest running multixact. The v1 patch uses the most aggressive setting possible: the oldest running multi; the v2 uses the least aggressive of the 'safe' and oldest running multi. At first glance it seems like the second one is better: it only does something different if we're in the dangerous scenario you identified, but otherwise it sticks to the safe limit, which generates less IO.

-- Thomas Munro https://enterprisedb.com
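In code terms, and sketching rather than quoting either attachment, the difference between the two approaches inside the WARNING branch is roughly:

    /* Sketch only; variable names follow vacuum_set_xid_limits(), but neither
     * hunk is quoted verbatim from the attached patches. */

    /* v1: fall all the way back to the oldest multixact that could still be running. */
    mxactLimit = GetOldestMultiXactId();

    /* v2: keep the safe limit, but never let it pass a still-running multixact. */
    oldestMxact = GetOldestMultiXactId();
    if (MultiXactIdPrecedes(oldestMxact, safeMxactLimit))
        mxactLimit = oldestMxact;      /* dangerous case: clamp to the oldest running multi */
    else
        mxactLimit = safeMxactLimit;   /* otherwise: behaviour unchanged */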
On 9/4/19 21:01, Thomas Munro wrote:
> I suppose this requires you to run out of member space (for example many backends key sharing the same FK) or maybe just set autovacuum_multixact_freeze_max_age quite low, and then prolong the life of a multixact for longer.

On this particular production system, autovacuum_multixact_freeze_max_age is the default value of 400 million and it is not overridden for any tables. Looks to me like this was just workload driven. There are a number of FKs and those seem to be a likely candidate to me.
> Does the problem fix itself once you close the transaction that's in the oldest multixact, ie holding back GetOldestMultiXactId() from advancing?

The really interesting thing about this case is that the only long-running connection was the autovacuum that had been running since Sunday. While we were investigating yesterday, the autovacuum process died without advancing relfrozenxid (users configured this system with poor logging, so it's not known whether autovac terminated from error or from a user who logged on to the system). As soon as the autovacuum process died, we stopped getting the "multixact X from before cutoff Y" errors.
It really appears that it was the autovacuum process itself that was providing the oldest running multixact which caused errors on yesterday's attempts to vacuum other tables - even though I thought vacuum processes were ignored by that code. I'll have to take another look at some point.
Vacuum cost parameters had been adjusted after Sunday, so the original autovacuum would have used default settings. Naturally, a new autovacuum process started up right away. This new process - definitely using adjusted cost parameters - completed the vacuum of the large table with 5 passes (index_vacuum_count) in a couple hours. Maintenance work memory was already at the max; there were many hundreds of millions of dead tuples that still remained to be cleaned up.
The size of the large table (heap only) was about 75% of the memory on the server, and the table had three indexes each about half the size of the table. The storage was provisioned at just over 10k IOPS; at this rate you could read all three indexes from the storage one block at a time in about an hour. (And Linux should be reading more than a block at a time.)
It is not known whether the original autovacuum failed to completely vacuum the large table in 3 days because of cost settings alone or because there's another latent bug somewhere in the autovacuum code that put it into some kind of loop (but if autovac hit the error above then the PID would have terminated). We didn't manage to get a pstack.
> Since VACUUM errors out, we don't corrupt data, right? Everyone else is still going to see the multixact as running and do the right thing because vacuum never manages to (bogusly) freeze the tuple.

That's my take as well. I don't think there's any data corruption risk here.
If anyone else ever hits this in the future, I think it's safe to just kill the oldest open session. The error should go away and there shouldn't be any risk of damage to the database.
> Both patches prevent mxactLimit from being newer than the oldest running multixact. The v1 patch uses the most aggressive setting possible: the oldest running multi; the v2 uses the least aggressive of the 'safe' and oldest running multi. At first glance it seems like the second one is better: it only does something different if we're in the dangerous scenario you identified, but otherwise it sticks to the safe limit, which generates less IO.

Thanks for taking a look!
-Jeremy
-- Jeremy Schneider Database Engineer Amazon Web Services
On 9/4/19, 9:03 PM, "Thomas Munro" <thomas.munro@gmail.com> wrote:
> Both patches prevent mxactLimit from being newer than the oldest
> running multixact. The v1 patch uses the most aggressive setting
> possible: the oldest running multi; the v2 uses the least aggressive
> of the 'safe' and oldest running multi. At first glance it seems like
> the second one is better: it only does something different if we're in
> the dangerous scenario you identified, but otherwise it sticks to the
> safe limit, which generates less IO.

Thanks for taking a look! Right, the v2 patch will effectively ramp-down the freezemin as your freeze_max_age gets smaller, while the v1 patch will set the effective freezemin to zero as soon as your multixact age passes the threshold. I think what is unclear to me is whether this ramp-down behavior is the intended functionality or we should be doing something similar to what we do for regular transaction IDs (i.e. force freezemin to zero right after it hits the "oldest xmin is far in the past" threshold). The comment above MultiXactMemberFreezeThreshold() explains things pretty well, but AFAICT it is more geared towards influencing autovacuum scheduling. I agree that v2 is safer from the standpoint that it changes as little as possible, though.

Nathan
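To make the ramp-down idea concrete, here is a toy standalone model - not the MultiXactMemberFreezeThreshold() source; the names and thresholds are invented - showing an effective freeze max age shrinking linearly as member-space consumption grows:

    /*
     * Toy model of a member-space-driven ramp-down (illustrative only; this
     * is not the MultiXactMemberFreezeThreshold() source and the names and
     * thresholds are invented).
     */
    #include <stdio.h>
    #include <stdint.h>

    static uint32_t
    effective_freeze_max_age(uint32_t configured_max_age,   /* e.g. autovacuum_multixact_freeze_max_age */
                             uint64_t members_used,         /* member-space entries consumed (as a percentage here) */
                             uint64_t safe_threshold,       /* below this: no extra pressure */
                             uint64_t danger_threshold)     /* at/above this: maximum pressure */
    {
        if (members_used <= safe_threshold)
            return configured_max_age;                      /* no ramp-down needed */
        if (members_used >= danger_threshold)
            return 0;                                       /* freeze as aggressively as possible */

        /* Scale the effective max age down linearly between the two thresholds. */
        double fraction = (double) (members_used - safe_threshold) /
                          (double) (danger_threshold - safe_threshold);
        return (uint32_t) (configured_max_age * (1.0 - fraction));
    }

    int
    main(void)
    {
        for (uint64_t used = 0; used <= 100; used += 25)
            printf("members used %3llu%% -> effective freeze max age %u\n",
                   (unsigned long long) used,
                   effective_freeze_max_age(400000000, used, 50, 100));
        return 0;
    }

The real MultiXactMemberFreezeThreshold() derives its result from the current multixact count rather than scaling the GUC directly, but the overall shape - unchanged below a safe member-space threshold, then squeezed toward zero - is the ramp-down behavior being discussed.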
On Fri, Sep 6, 2019 at 6:32 AM Jeremy Schneider <schnjere@amazon.com> wrote:
> It really appears that it was the autovacuum process itself that was providing the oldest running multixact which caused errors on yesterday's attempts to vacuum other tables - even though I thought vacuum processes were ignored by that code. I'll have to take another look at some point.

Ah, that seems plausible. If the backend ever called GetMultiXactIdMembers() and thence MultiXactIdSetOldestVisible() at a time when there were live multixacts, it would set its own OldestVisibleMXactId[] slot, and then GetOldestMultiXactId() would return that value for the rest of the transaction (unless there was an even older one to return, but in the case you're describing there wasn't). GetOldestMultiXactId() doesn't have a way to ignore vacuum backends, like GetOldestXmin() does. That doesn't seem to be a problem in itself.

(I am not sure why GetOldestMultiXactId() needs to consider OldestVisibleMXactId[] at all for this purpose, and not just OldestMemberMXactId[], but I suppose it has to do with simultaneously key-share-locked and updated tuples or something, it's too early and I haven't had enough coffee.)

-- Thomas Munro https://enterprisedb.com
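A simplified standalone model of the mechanism described above - not the multixact.c source; array names and sizes are invented, and the real code uses wraparound-aware comparisons rather than plain < - showing how one backend's "oldest visible" slot holds back the computed horizon:

    /* Simplified model (not PostgreSQL source) of per-backend slots and the
     * minimum taken over them.  Names and types are invented for illustration. */
    #include <stdio.h>
    #include <stdint.h>

    #define MAX_BACKENDS 8
    #define INVALID_MXID 0u

    static uint32_t next_mxid = 1000;                /* next multixact to be assigned */
    static uint32_t oldest_member[MAX_BACKENDS];     /* oldest mxid each backend is a member of */
    static uint32_t oldest_visible[MAX_BACKENDS];    /* oldest mxid each backend may still look up */

    /* Minimum over all valid slots, defaulting to the next mxid to be assigned. */
    static uint32_t
    get_oldest_multixact(void)
    {
        uint32_t result = next_mxid;

        for (int i = 0; i < MAX_BACKENDS; i++)
        {
            if (oldest_member[i] != INVALID_MXID && oldest_member[i] < result)
                result = oldest_member[i];
            if (oldest_visible[i] != INVALID_MXID && oldest_visible[i] < result)
                result = oldest_visible[i];
        }
        return result;
    }

    int
    main(void)
    {
        /* Backend 3 (say, a long-running vacuum) once looked up the members of
         * mxid 42 and recorded it as its oldest visible multixact. */
        oldest_visible[3] = 42;
        printf("oldest multixact horizon: %u\n", get_oldest_multixact());

        /* The horizon only advances once that backend's transaction ends. */
        oldest_visible[3] = INVALID_MXID;
        printf("oldest multixact horizon: %u\n", get_oldest_multixact());
        return 0;
    }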
On Thu, Sep 5, 2019 at 4:08 PM Bossart, Nathan <bossartn@amazon.com> wrote:
> Right, the v2 patch will effectively ramp-down the freezemin as your
> freeze_max_age gets smaller, while the v1 patch will set the effective
> freezemin to zero as soon as your multixact age passes the threshold.
> I think what is unclear to me is whether this ramp-down behavior is
> the intended functionality or we should be doing something similar to
> what we do for regular transaction IDs (i.e. force freezemin to zero
> right after it hits the "oldest xmin is far in the past" threshold).
> The comment above MultiXactMemberFreezeThreshold() explains things
> pretty well, but AFAICT it is more geared towards influencing
> autovacuum scheduling. I agree that v2 is safer from the standpoint
> that it changes as little as possible, though.

I don't presently have a view on fixing the actual bug here, but I can certainly confirm that I intended MultiXactMemberFreezeThreshold() to ratchet up the pressure gradually rather than all at once, and my suspicion is that this behavior may be good to retain, but I'm not sure.

One difference between regular XIDs and MultiXacts is that there's only one reason why we can need to vacuum XIDs, but there are two reasons why we can need to vacuum MultiXacts. We can either be running short of members space or we can be running short of offset space, and running out of either one is bad. Regular XIDs have no analogue of this problem: there's only one thing that you can exhaust.

At the time I wrote MultiXactMemberFreezeThreshold(), only the 'offsets' array had any sort of wraparound protection, and it was space in 'offsets' that was measured by relminmxid, datminmxid, etc. You could imagine having separate catalog state to track space in the 'members' SLRU, e.g. relminmxidmembers, datminmxidmembers, etc., but that wasn't really an option for fixing the bug at hand, because it wouldn't have been back-patchable. So the challenge was to find some way of using the existing catalog state to try to provide wraparound protection for a new kind of thing for which wraparound protection had not been previously contemplated. And so MultiXactMemberFreezeThreshold() was born.

(I apologize if any of the above sounds like I'm taking credit for work actually done by Thomas, who I see is listed as the primary author of the commit in question. I feel like I invented MultiXactMemberFreezeThreshold and the big comment at the top of it looks to me like something I wrote, but this was a long time ago and I don't really remember who did what. My intent here is to provide some context that may be useful based on what I remember about that patch, not to steal anybody's thunder.)

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
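A back-of-the-envelope illustration of why the two resources can be exhausted at very different rates (standalone and illustrative only; the numbers are made up): each multixact consumes one 'offsets' entry but one 'members' entry per member transaction, so member space can run short long before the multixact count itself looks alarming.

    /* Illustration only (not PostgreSQL source); the workload numbers below are hypothetical. */
    #include <stdio.h>
    #include <stdint.h>

    int
    main(void)
    {
        uint64_t multixacts  = 50ULL * 1000 * 1000;   /* hypothetical: 50 million multixacts */
        uint64_t avg_members = 40;                    /* hypothetical: 40 lockers per multixact */

        uint64_t offsets_entries = multixacts;                  /* one offsets entry per multixact */
        uint64_t members_entries = multixacts * avg_members;    /* one members entry per member XID */

        printf("offsets entries: %llu\n", (unsigned long long) offsets_entries);
        printf("members entries: %llu\n", (unsigned long long) members_entries);
        return 0;
    }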
On Sat, Sep 7, 2019 at 5:25 AM Robert Haas <robertmhaas@gmail.com> wrote:
> (I apologize if any of the above sounds like I'm taking credit for
> work actually done by Thomas, who I see is listed as the primary
> author of the commit in question. I feel like I invented
> MultiXactMemberFreezeThreshold and the big comment at the top of it
> looks to me like something I wrote, but this was a long time ago
> and I don't really remember who did what. My intent here is to provide
> some context that may be useful based on what I remember about that
> patch, not to steal anybody's thunder.)

I don't recall but it could well have been your idea to do it that way, my code and testing, and your comments and commit. Either way I'm happy for you to steal my bugs.

-- Thomas Munro https://enterprisedb.com
On 9/6/19, 10:26 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:
> On Thu, Sep 5, 2019 at 4:08 PM Bossart, Nathan <bossartn@amazon.com> wrote:
>> Right, the v2 patch will effectively ramp-down the freezemin as your
>> freeze_max_age gets smaller, while the v1 patch will set the effective
>> freezemin to zero as soon as your multixact age passes the threshold.
>> I think what is unclear to me is whether this ramp-down behavior is
>> the intended functionality or we should be doing something similar to
>> what we do for regular transaction IDs (i.e. force freezemin to zero
>> right after it hits the "oldest xmin is far in the past" threshold).
>> The comment above MultiXactMemberFreezeThreshold() explains things
>> pretty well, but AFAICT it is more geared towards influencing
>> autovacuum scheduling. I agree that v2 is safer from the standpoint
>> that it changes as little as possible, though.
>
> I don't presently have a view on fixing the actual bug here, but I can
> certainly confirm that I intended MultiXactMemberFreezeThreshold() to
> ratchet up the pressure gradually rather than all at once, and my
> suspicion is that this behavior may be good to retain, but I'm not
> sure.

Thanks for the detailed background information. FWIW I am now in favor of the v2 patch.

Nathan
On Wed, Sep 18, 2019 at 8:11 AM Bossart, Nathan <bossartn@amazon.com> wrote: > Thanks for the detailed background information. FWIW I am now in > favor of the v2 patch. Here's a version with a proposed commit message and a comment. Please let me know if I credited things to the right people!
Attachment
On 10/15/19, 11:11 PM, "Thomas Munro" <thomas.munro@gmail.com> wrote: > Here's a version with a proposed commit message and a comment. Please > let me know if I credited things to the right people! Looks good to me. Thanks! Nathan
On 10/16/19 10:09, Bossart, Nathan wrote: > On 10/15/19, 11:11 PM, "Thomas Munro" <thomas.munro@gmail.com> wrote: >> Here's a version with a proposed commit message and a comment. Please >> let me know if I credited things to the right people! > > Looks good to me. Thanks! +1
On Thu, Oct 17, 2019 at 6:11 AM Jeremy Schneider <schnjere@amazon.com> wrote: > On 10/16/19 10:09, Bossart, Nathan wrote: > > On 10/15/19, 11:11 PM, "Thomas Munro" <thomas.munro@gmail.com> wrote: > >> Here's a version with a proposed commit message and a comment. Please > >> let me know if I credited things to the right people! > > > > Looks good to me. Thanks! > > +1 Pushed.