Thread: pg_multixact not getting truncated
Hackers, I'm looking at a couple of high-transaction-rate and high-FK-conflict rate servers where pg_multixact has grown to be more than 1GB in size. One such server doesn't appear to be having any notable issues with vacuuming, and the oldest mxid on the system is about 47m old. VACUUM FREEZEing the oldest databases did not cause the pg_multixact dir to get smaller --- it may have even caused it to get larger. Why would pg_multixact not be truncating? Does it never truncate files with aborted multixacts in them? Might we have another multixact bug? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus wrote: > Hackers, > > I'm looking at a couple of high-transaction-rate and high-FK-conflict > rate servers where pg_multixact has grown to be more than 1GB in size. > One such server doesn't appear to be having any notable issues with > vacuuming, and the oldest mxid on the system is about 47m old. VACUUM > FREEZEing the oldest databases did not cause the pg_multixact dir to get > smaller --- it may have even caused it to get larger. > > Why would pg_multixact not be truncating? Does it never truncate files > with aborted multixacts in them? Might we have another multixact bug? Have a look at the MultiXactId values in pg_controldata, datminmxid in pg_database, and relminmxid in pg_class. They should advance as you VACUUM FREEZE. If it's stuck at 1, you might be in pg_upgrade trouble if you used 9.3.4 or earlier to upgrade. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
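A minimal way to watch those values from SQL (a sketch only; catalog columns as they exist in 9.3; compare the output against NextMultiXactId from pg_controldata before and after VACUUM FREEZE):

  -- per-database floors, visible from any database
  SELECT datname, datfrozenxid, datminmxid FROM pg_database;

  -- per-relation floors inside the current database (tables, matviews, toast)
  SELECT relname, relfrozenxid, relminmxid
    FROM pg_class
   WHERE relkind IN ('r', 'm', 't')
   ORDER BY relname;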
On 11/03/2014 05:06 PM, Alvaro Herrera wrote: > Josh Berkus wrote: >> Hackers, >> >> I'm looking at a couple of high-transaction-rate and high-FK-conflict >> rate servers where pg_multixact has grown to be more than 1GB in size. >> One such server doesn't appear to be having any notable issues with >> vacuuming, and the oldest mxid on the system is about 47m old. VACUUM >> FREEZEing the oldest databases did not cause the pg_multixact dir to get >> smaller --- it may have even caused it to get larger. >> >> Why would pg_multixact not be truncating? Does it never truncate files >> with aborted multixacts in them? Might we have another multixact bug? > > Have a look at the MultiXactId values in pg_controldata, datminmxid in > pg_database, and relminmxid in pg_class. They should advance as you > VACUUM FREEZE. If it's stuck at 1, you might be in pg_upgrade trouble > if you used 9.3.4 or earlier to upgrade. They're advancing. I also know for a fact that this system was not pg_upgraded, because I set it up. It's always been on 9.3.5. So after some vacuum freezing and a checkpoint, the oldestMultiXid advanced by 9 million, about 20% of the gap with current multiXid. However, that did not delete 20% of the files; before the Freeze there were 4414 files, and now there are 4195. So advancing 9m mxids resulted in deleting only ~220 files. So I think we definitely have an issue here. BTW, the reason I started poking into this was a report from a user that they have a pg_multixact directory which is 21GB in size, and is 2X the size of the database. Here's XID data: Latest checkpoint's NextXID: 0/1126461940 Latest checkpoint's NextOID: 371135838 Latest checkpoint's NextMultiXactId: 162092874 Latest checkpoint's NextMultiOffset: 778360290 Latest checkpoint's oldestXID: 945761490 Latest checkpoint's oldestXID's DB: 370038709 Latest checkpoint's oldestActiveXID: 1126461940 Latest checkpoint's oldestMultiXid: 123452201 Latest checkpoint's oldestMulti's DB: 370038709 Oldest mxid file is 29B2, newest is 3A13 No tables had a relminmxid of 1 (some of the system tables were 0, though), and the data from pg_class and pg_database is consistent. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 11/03/2014 05:24 PM, Josh Berkus wrote: > BTW, the reason I started poking into this was a report from a user that > they have a pg_multixact directory which is 21GB in size, and is 2X the > size of the database. > > Here's XID data: > > Latest checkpoint's NextXID: 0/1126461940 > Latest checkpoint's NextOID: 371135838 > Latest checkpoint's NextMultiXactId: 162092874 > Latest checkpoint's NextMultiOffset: 778360290 > Latest checkpoint's oldestXID: 945761490 > Latest checkpoint's oldestXID's DB: 370038709 > Latest checkpoint's oldestActiveXID: 1126461940 > Latest checkpoint's oldestMultiXid: 123452201 > Latest checkpoint's oldestMulti's DB: 370038709 > > Oldest mxid file is 29B2, newest is 3A13 > > No tables had a relminmxid of 1 (some of the system tables were 0, > though), and the data from pg_class and pg_database is consistent. More tidbits: I just did a quick check on customer systems (5 of them). This only seems to be happening on customer systems where I specifically know there is a high number of FK lock waits (the system above gets around 1000 per minute that we know of). Other systems with higher transaction rates don't exhibit this issue; I checked a 9.3.5 database which normally needs to do XID wraparound once every 10 days, and its pg_multixact is only 48K (it has one file, 0000). Note that pg_clog on the bad machine is only 64K in size. How many IDs are there per mxid file? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 11/3/14, 7:40 PM, Josh Berkus wrote: > On 11/03/2014 05:24 PM, Josh Berkus wrote: >> BTW, the reason I started poking into this was a report from a user that >> they have a pg_multixact directory which is 21GB in size, and is 2X the >> size of the database. >> >> Here's XID data: >> >> Latest checkpoint's NextXID: 0/1126461940 >> Latest checkpoint's NextOID: 371135838 >> Latest checkpoint's NextMultiXactId: 162092874 >> Latest checkpoint's NextMultiOffset: 778360290 >> Latest checkpoint's oldestXID: 945761490 >> Latest checkpoint's oldestXID's DB: 370038709 >> Latest checkpoint's oldestActiveXID: 1126461940 >> Latest checkpoint's oldestMultiXid: 123452201 >> Latest checkpoint's oldestMulti's DB: 370038709 >> >> Oldest mxid file is 29B2, newest is 3A13 >> >> No tables had a relminmxid of 1 (some of the system tables were 0, >> though), and the data from pg_class and pg_database is consistent. > > More tidbits: > > I just did a quick check on customer systems (5 of them). This only > seems to be happening on customer systems where I specifically know > there is a high number of FK lock waits (the system above gets around > 1000 per minute that we know of). Other systems with higher transaction > rates don't exhibit this issue; I checked a 9.3.5 database which > normally needs to do XID wraparound once every 10 days, and its > pg_multixact is only 48K (it has one file, 0000). > > Note that pg_clog on the bad machine is only 64K in size. > > How many IDs are there per mxid file? #define MULTIXACT_OFFSETS_PER_PAGE (BLCKSZ / sizeof(MultiXactOffset)) So for 8k blocks, there are 2k offsets (really MultiXactIds) per page, 32 pages per SLRU segment. Your file names aren't making sense to me. :( If I'm doing the math correctly, 29B2 is MXID 699 531 264 and 3A13 is 974 323 712. You're only looking in pg_multixact/members/, yes? Relevant code starts in vacuum.c/vac_update_datfrozenxid() If there are any rows in pg_class for tables/matviews/toast with either relfrozenxid > next XID or relminmxid > next MXID then the code *silently* pulls the plug right there. IMO we should at least issue a warning. That you see relminmxid advancing tells me this isn't the case here. ForceTransactionIdLimitUpdate() is a bit suspect in that it only looks at xidVacLimit, but if it were breaking then you wouldn't see pg_database minmxid advancing. Looking through TruncateMultiXact, I don't see anything that could prevent truncation, unless the way we're handling MultiXactId wraparound is broken (which I don't see any indication of). Can you post the contents of pg_multixact/members/? -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
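To make the file-name arithmetic concrete, here is a rough sketch of the mapping being used, assuming the offsets-segment layout quoted above (2048 MultiXactIds per 8 kB page times 32 pages, i.e. 65,536 per segment); note that segments under pg_multixact/members/ hold member entries rather than MultiXactIds, so this mapping only strictly applies to pg_multixact/offsets/:

  -- hex segment name -> first ID covered by that offsets segment
  SELECT x'29B2'::int * 32 * 2048 AS first_id_in_29B2,   -- 699531264
         x'3A13'::int * 32 * 2048 AS first_id_in_3A13;   -- 974323712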
On 11/05/2014 10:40 AM, Jim Nasby wrote: > On 11/3/14, 7:40 PM, Josh Berkus wrote: >> On 11/03/2014 05:24 PM, Josh Berkus wrote: >>> BTW, the reason I started poking into this was a report from a user that >>> they have a pg_multixact directory which is 21GB in size, and is 2X the >>> size of the database. >>> >>> Here's XID data: >>> >>> Latest checkpoint's NextXID: 0/1126461940 >>> Latest checkpoint's NextOID: 371135838 >>> Latest checkpoint's NextMultiXactId: 162092874 >>> Latest checkpoint's NextMultiOffset: 778360290 >>> Latest checkpoint's oldestXID: 945761490 >>> Latest checkpoint's oldestXID's DB: 370038709 >>> Latest checkpoint's oldestActiveXID: 1126461940 >>> Latest checkpoint's oldestMultiXid: 123452201 >>> Latest checkpoint's oldestMulti's DB: 370038709 >>> >>> Oldest mxid file is 29B2, newest is 3A13 >>> >>> No tables had a relminmxid of 1 (some of the system tables were 0, >>> though), and the data from pg_class and pg_database is consistent. >> >> More tidbits: >> >> I just did a quick check on customer systems (5 of them). This only >> seems to be happening on customer systems where I specifically know >> there is a high number of FK lock waits (the system above gets around >> 1000 per minute that we know of). Other systems with higher transaction >> rates don't exhibit this issue; I checked a 9.3.5 database which >> normally needs to do XID wraparound once every 10 days, and it's >> pg_multixact is only 48K (it has one file, 0000). >> >> Note that pg_clog on the bad machine is only 64K in size. >> >> How many IDs are there per mxid file? > #define MULTIXACT_OFFSETS_PER_PAGE (BLCKSZ / sizeof(MultiXactOffset)) > > So for 8k blocks, there are 2k offsets (really MultiXactIds) per page, > 32 pages per SLRU segment. Your file names aren't making sense to me. :( > If I'm doing the math correctly, 29B2 is MXID 699 531 264 and 3A13 is > 974 323 712. You're only looking in pg_multixact/members/, yes? These are members, not offsets. What's stored in members? > Relevant code starts in vacuum.c/vac_update_datfrozenxid() > > If there's any rows in pg_class for tables/matviews/toast with either > relfrozenxid > next XID or relminmxid > next MXID then the code > *silently* pulls the plug right there. IMO we should at least issue a > warning. Wait, how would that situation arise in the first place? Wraparound is supposed to prevent it. Mind you, I checked pg_class, and it matches the minmxid shown by pg_database, so that's not the smoking gun. > Can you post the contents of pg_multixact/members/? Well, not as of last week, obviously. https://gist.github.com/jberkus/d05db3629e8c898664c4 I haven't pasted all the filenames, because, well, look at the count. I also added the contents of the /offsets directory, for full information. Note that we've added 400 multixact files since my first email. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 11/05/2014 11:15 AM, Josh Berkus wrote: > On 11/05/2014 10:40 AM, Jim Nasby wrote: >> Can you post the contents of pg_multixact/members/? > > Well, not as of last week, obviously. > > https://gist.github.com/jberkus/d05db3629e8c898664c4 > > I haven't pasted all the filenames, because, well, look at the count. I > also added the contents of the /offsets directory, for full information. > > Note that we've added 400 multixact files since my first email. So, just did a quick survey of our managed services customers. 4 out of 6 have pg_multixact bloat. Basically it looks like this: User1 (the one above): 1.1GB User2: 64K User3: 929MB User4: 1.3GB User5: 301MB User6: 48K Nobody (out of 6 more) running a version older than 9.3 had a pg_multixact larger than 128K. None of these users used pg_upgrade, so that's not the source of this problem. Instead, as demonstrated earlier, we are simply not truncating pg_multixact sufficiently. So this looks like a pretty serious bug. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus wrote: > So, just did a quick survey of our managed services customers. 4 out of > 6 have pg_multixact bloat. Basically it looks like this: > > User1 (the one above): 1.1GB > User2: 64K > User3: 929MB > User4: 1.3GB > User5: 301MB > User6: 48K > > Nobody (out of 6 more) running a version older than 9.3 had a > pg_multixact larger than 128K. 9.3 multixacts are completely different from pre-9.3. We now depend on the freezing parameters to determine how much is kept; previously they were kept only for live transactions. Quoting the pg_controldata output you pasted previously: >>> Latest checkpoint's NextMultiXactId: 162092874 >>> Latest checkpoint's oldestMultiXid: 123452201 This says that the live multixact range goes from 123 million to 162 million; roughly 40 million values. (The default value for vacuum_multixact_freeze_table_age is 150 million, which is what determines how many values are kept.) Your gist.github paste tells us there are 4598 members files. Each file has 32 pages, and each page hosts 2045 members; so there are 32 * 2045 * 4598 members, or somewhere around 300 million. For 40 million multixacts, this means there are about 7 members per multixact, on average, which seems a reasonable number to me. If you want to have vacuum truncate pg_multixact more aggressively, you need to decrease vacuum_multixact_freeze_table_age and vacuum_multixact_freeze_min_age. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
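Spelled out, that estimate is roughly the following (a back-of-envelope sketch using the 2045-members-per-page figure above; the exact per-page constant depends on the member-group layout in multixact.c):

  SELECT 4598 * 32 * 2045                           AS approx_total_members,   -- ~301 million
         162092874 - 123452201                      AS live_multixacts,        -- ~39 million
         round((4598 * 32 * 2045)::numeric
               / (162092874 - 123452201), 1)        AS avg_members_per_mxid;   -- ~7.8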
On 11/07/2014 04:43 PM, Alvaro Herrera wrote: > This says that the live multixact range goes from 123 million to 162 > million; roughly 40 million values. (The default value for > vacuum_multixact_freeze_table_age is 150 million, which is what > determines how many values are kept.) > > Your gist.github paste tells us there are 4598 members files. Each file > has 32 pages, and each page hosts 2045 members; so there are 32 * 2045 * > 4598 members, or somewhere around 300 million. For 40 million > multixacts, this means there are about 7 members per multixact, on > average, which seems a reasonable number to me. So the basic problem is that multixact files are just huge, with an average of 35 bytes per multixact? > If you want to have vacuum truncate pg_multixact more aggressively, you > need to decrease vacuum_multixact_freeze_table_age and > vacuum_multixact_freeze_min_age. If that's the case, then we need to set the defaults more aggressively. I suggest maybe 10 million. The alternative is allowing it to creep up to 150 million, which would be 5GB. I don't see adding 5GB to user databases without warning them as good behavior. Of course, this will lead to LOTs of additional vacuuming ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
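For illustration only, a hedged sketch of what "more aggressive" settings could look like; the values (including the 10 million figure) are hypothetical rather than recommendations, and the per-table reloptions are assumed to be available in 9.3:

  -- postgresql.conf (cluster-wide):
  --   vacuum_multixact_freeze_min_age     = 1000000
  --   vacuum_multixact_freeze_table_age   = 10000000
  --   autovacuum_multixact_freeze_max_age = 100000000

  -- or only on the tables that actually accumulate FK locks
  -- ("orders" is a hypothetical table name):
  ALTER TABLE orders
    SET (autovacuum_multixact_freeze_min_age   = 1000000,
         autovacuum_multixact_freeze_table_age = 10000000);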
Josh Berkus wrote: > On 11/07/2014 04:43 PM, Alvaro Herrera wrote: > > This says that the live multixact range goes from 123 million to 162 > > million; roughly 40 million values. (The default value for > > vacuum_multixact_freeze_table_age is 150 million, which is what > > determines how many values are kept.) > > > > Your gist.github paste tells us there are 4598 members files. Each file > > has 32 pages, and each page hosts 2045 members; so there are 32 * 2045 * > > 4598 members, or somewhere around 300 million. For 40 million > > multixacts, this means there are about 7 members per multixact, on > > average, which seems a reasonable number to me. > > So the basic problem is that multixact files are just huge, with an > average of 35 bytes per multixact? The more members the multixacts have, the more space they occupy. I would have thought this was obvious enough. > > If you want to have vacuum truncate pg_multixact more aggressively, you > > need to decrease vacuum_multixact_freeze_table_age and > > vacuum_multixact_freeze_min_age. > > If that's the case, then we need to set the defaults more aggressively. > I suggest maybe 10 million. The alternative is allowing it to creep up > to 150 million, which would be 5GB. I don't see adding 5GB to user > databases without warning them as good behavior. > > Of course, this will lead to LOTs of additional vacuuming ... There's a trade-off here: more vacuuming I/O usage for less disk space used. How stressed are your customers, really, about 1 GB of disk space? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2014-11-07 16:09:44 -0800, Josh Berkus wrote: > On 11/05/2014 11:15 AM, Josh Berkus wrote: > > On 11/05/2014 10:40 AM, Jim Nasby wrote: > >> Can you post the contents of pg_multixact/members/? > > > > Well, not as of last week, obviously. > > > > https://gist.github.com/jberkus/d05db3629e8c898664c4 > > > > I haven't pasted all the filenames, because, well, look at the count. I > > also added the contents of the /offsets directory, for full information. > > > > Note that we've added 400 multixact files since my first email. > > So, just did a quick survey of our managed services customers. 4 out of > 6 have pg_multixact bloat. Basically it looks like this: > > User1 (the one above): 1.1GB > User2: 64K > User3: 929MB > User4: 1.3GB > User5: 301MB > User6: 48K > > Nobody (out of 6 more) running a version older than 9.3 had a > pg_multixact larger than 128K. > > None of these users used pg_upgrade, so that's not the source of this > problem. Instead, as demonstrated earlier, we are simply not truncating > pg_multixact sufficiently. > > So this looks like a pretty serious bug. I've not seen actual evidence of a bug here. What are their multixact-related settings? How large is pg_clog? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2014-11-07 17:20:44 -0800, Josh Berkus wrote: > On 11/07/2014 04:43 PM, Alvaro Herrera wrote: > > This says that the live multixact range goes from 123 million to 162 > > million; roughly 40 million values. (The default value for > > vacuum_multixact_freeze_table_age is 150 million, which is what > > determines how many values are kept.) > > > > Your gist.github paste tells us there are 4598 members files. Each file > > has 32 pages, and each page hosts 2045 members; so there are 32 * 2045 * > > 4598 members, or somewhere around 300 million. For 40 million > > multixacts, this means there are about 7 members per multixact, on > > average, which seems a reasonable number to me. > > So the basic problem is that multixact files are just huge, with an > average of 35 bytes per multixact? Depends on the concurrency. The number of members is determined by the number of xacts concurrently locking a row. > > If you want to have vacuum truncate pg_multixact more aggressively, you > > need to decrease vacuum_multixact_freeze_table_age and > > vacuum_multixact_freeze_min_age. > > If that's the case, then we need to set the defaults more > aggressively. Why? If you have that high transaction volume, a few seldom-read files won't hurt you. > Of course, this will lead to LOTs of additional vacuuming ... Yes. And that's likely to cause much, much more grief. Also. Didn't you just *vehemently* oppose making these values tunable at all? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 11/07/2014 05:29 PM, Alvaro Herrera wrote: > Josh Berkus wrote: >> Of course, this will lead to LOTs of additional vacuuming ... > > There's a trade-off here: more vacuuming I/O usage for less disk space > used. How stressed are your customers, really, about 1 GB of disk space? These customers not so much. The users I encountered on chat whose pg_multixact was over 20GB, and larger than their database? Lots. On 11/08/2014 03:54 AM, Andres Freund wrote: > On 2014-11-07 17:20:44 -0800, Josh Berkus wrote: >> So the basic problem is that multixact files are just huge, with an >> average of 35 bytes per multixact? > > Depends on the concurrency. The number of members is determined by the > number of xacts concurrently locking a row. Yeah, that leads to some extreme inflation for databases where FK conflicts are common though. On 11/08/2014 03:54 AM, Andres Freund wrote: > On 2014-11-07 17:20:44 -0800, Josh Berkus wrote: >> Of course, this will lead to LOTs of additional vacuuming ... > > Yes. And that's likely to cause much, much more grief. > > Also. Didn't you just *vehemently* oppose making these values tunable at > all? Yes, I opposed adding a *user* tunable with zero information on how it should be tuned or why. I always do and always will. I also think our defaults for multixact freezing should be tied to the ones for xid freezing, and should not by default be completely independent numbers; I'm still not convinced that it makes sense to have a separate multixact threshold at all **since the same amount of vacuuming needs to be done regardless of whether we're truncating xids or mxids**. Certainly when I play with tuning this for customers, I'm going to lower vacuum_freeze_table_age as well. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 2014-11-08 12:10:48 -0800, Josh Berkus wrote: > On 11/07/2014 05:29 PM, Alvaro Herrera wrote: > > Josh Berkus wrote: > >> Of course, this will lead to LOTs of additional vacuuming ... > > > > There's a trade-off here: more vacuuming I/O usage for less disk space > > used. How stressed are your customers, really, about 1 GB of disk space? > > These customers not so much. The users I encountered on chat whose > pg_multixact was over 20GB, and larger than their database? Lots. > On 11/08/2014 03:54 AM, Andres Freund wrote: > > On 2014-11-07 17:20:44 -0800, Josh Berkus wrote: > >> So the basic problem is that multixact files are just huge, with an > >> average of 35 bytes per multixact? > > > > Depends on the concurrency. The number of members is determined by the > > number of xacts concurrently locking a row. > > Yeah, that leads to some extreme inflation for databases where FK > conflicts are common though. On the other hand, those are the ones benefitting most from the gain in concurrency. > On 11/08/2014 03:54 AM, Andres Freund wrote: > > On 2014-11-07 17:20:44 -0800, Josh Berkus wrote: > >> Of course, this will lead to LOTs of additional vacuuming ... > > > > Yes. And that's likely to cause much, much more grief. > > > > Also. Didn't you just *vehemently* oppose making these values tunable at > > all? > > Yes, I opposed adding a *user* tunable with zero information on how it > should be tuned or why. I always do and always will. I think that's primarily naive. We don't always *have* that knowledge ahead of time. There are interactions in the real world that we are not able to predict. And people are usually happier to find that their problem can be fixed by tuning a somewhat obscure GUC than having to patch their server or, much worse, upgrade to a newer major version that just came out. *No* user knows all our GUCs; even the really experienced ones only know half or so. And that's not because there are too many -- unless there were only three, they never would. > I also think our > defaults for multixact freezing should be tied to the ones for xid > freezing, and should not by default be completely independent numbers; I think it'd be a good idea to tune them more automatically in the future. But I think the current situation, where you can vastly increase autovacuum_freeze_max_age while keeping autovacuum_multixact_freeze_max_age lower, is *much* more useful in practice than when they always were the same. > I'm still not convinced that it makes sense to have a separate multixact > threshold at all **since the same amount of vacuuming needs to be done > regardless of whether we're truncating xids or mxids**. That's just plain wrong. The growth rate of one can be nearly independent of the other. It can e.g. be very sensible to have a huge xid freeze limit, but a much smaller multixact limit. > Certainly when I play with tuning this for customers, I'm going to lower > vacuum_freeze_table_age as well. I'm these days suggesting that people should add manual vacuuming for "older" relations during off peak hours on busy databases. There are too many sites whose service degrades noticeably during a full-table vacuum. If you actually mean autovacuum_freeze_max_age - I don't generally agree. It very often can be a good idea to significantly increase it. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 11/08/2014 01:46 PM, Andres Freund wrote: > I think it'd be a good idea to tune them more automatically in the > future. But I think the current situation, where you can vastly increase > autovacuum_freeze_max_age while keeping > autovacuum_multixact_freeze_max_age lower, is *much* more useful in practice > than when they always were the same. Can you explain that? Because I'm not picturing the situation where that would make sense. >> I'm still not convinced that it makes sense to have a separate multixact >> threshold at all **since the same amount of vacuuming needs to be done >> regardless of whether we're truncating xids or mxids**. > > That's just plain wrong. The growth rate of one can be nearly > independent of the other. It can e.g. be very sensible to have a huge > xid freeze limit, but a much smaller multixact limit. Yah, so who cares? Either way you're in for a full table scan, and if you're doing the full table scan anyway, you might as well freeze the xids. >> Certainly when I play with tuning this for customers, I'm going to lower >> vacuum_freeze_table_age as well. > > I'm these days suggesting that people should add manual vacuuming for > "older" relations during off peak hours on busy databases. There are too > many sites whose service degrades noticeably during a full-table vacuum. Me too: https://github.com/pgexperts/flexible-freeze -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 11/09/2014 08:00 PM, Josh Berkus wrote: > On 11/08/2014 01:46 PM, Andres Freund wrote: >> I'm these days suggesting that people should add manual vacuuming for >> "older" relations during off peak hours on busy databases. There's too >> many sites which service degrades noticeably during a full table vacuum. > Me too: https://github.com/pgexperts/flexible-freeze It turns out that not even a program of preventative scheduled vacuuming helps. This is because the template0 database anchors the minmxid and prevents it from being advanced until autovacuum gets around to that database, at whatever the minmxid threshold is. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
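A quick way to see which database is holding the cluster's multixact floor, shown as a sketch (ordering by the raw value ignores wraparound, so read it alongside oldestMultiXid from pg_controldata):

  SELECT datname, datallowconn, datfrozenxid, datminmxid
    FROM pg_database
   ORDER BY datminmxid::text::bigint;  -- xid has no direct integer cast; the text round-trip is a workaround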
On Sat, Nov 8, 2014 at 3:10 PM, Josh Berkus <josh@agliodbs.com> wrote: > I also think our > defaults for multixact freezing should be tied to the ones for xid > freezing, and should not by default be completely independent numbers; > I'm still not convinced that it makes sense to have a separate multixact > threshold at all **since the same amount of vacuuming needs to be done > regardless of whether we're truncating xids or mxids**. That doesn't make any sense at all. For one thing, it's not as if there is only ONE threshold here. There are three different ones, controlling three different aspects of the behavior: (a) the age at which we begin trying to freeze the pages we are planning to vacuum anyway, (b) the age at which we force a vacuum that we're planning to do anyway to scan the entire table, and (c) the age at which we trigger an autovacuum that we weren't otherwise planning to do. Generally, the reason why I think we need different thresholds for XIDs and MXIDs is that they may be getting consumed at vastly different rates. It may be useful to have a light that comes on in your car when you only have one gallon of fuel left, but you'd want a different threshold for an airplane because it burns fuel at a different rate. If you made that light come on when there's a gallon of fuel left, it would be way too late to do any good. I think a big part of the tuning problem here is that we don't have any way of knowing what the real burn rates will be in a particular customer environment. If you're burning MXIDs very slowly, you probably want to set threshold (a), the age at which we begin freezing pages we are planning to vacuum anyway, quite low, so that the next full-table vacuum triggered by XID consumption freezes all the MXIDs also, and advances relminmxid, thus preventing freezing passes specifically for MXIDs from ever happening. But if the MXID consumption rate is very high, that may result in unnecessary I/O freezing tuples that would have been updated before MXID age became an issue anyway. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 11/10/14, 12:16 AM, Josh Berkus wrote: > On 11/09/2014 08:00 PM, Josh Berkus wrote: >> On 11/08/2014 01:46 PM, Andres Freund wrote: >>> I'm these days suggesting that people should add manual vacuuming for >>> "older" relations during off peak hours on busy databases. There's too >>> many sites which service degrades noticeably during a full table vacuum. >> Me too: https://github.com/pgexperts/flexible-freeze > > It turns out that not even a program of preventative scheduled vacuuming > helps. This is because the template0 database anchors the minmxid and > prevents it from being advanced until autovacuum gets around to that > database, at whatever the minmxid threshold is. How did template0 even get a MultiXact? That sounds like they're really abusing the template databases. :( (Do keep in mind that MXID 1 is a special value.) Regarding linking the two settings, I agree with others that XIDs and MXIDs are basically completely independent (as your customer apparently has discovered). If you set both of the min_age parameters fairly small then it doesn't matter which max limit (the table_age parameters) you hit; you'll get a full scan and the low min_age limits will mean you'll get good freezing of both. The only other thing I can think of would be having yet another set of minimum age limits that come into play when you're doing a full scan as opposed to a partial one, but that seems like overkill to me. I guess another option would be to get more aggressive depending on the size of pg_multixact/... BTW, the only reason I know of not to set both min_age parameters to zero is to prevent loss of forensic information. If that's not a concern you can always just set them to zero. Even if it is a concern, I suspect that the forensic info you could gather from a MultiXact is a lot more limited than for an XID, so it's probably pretty safe setting that to zero. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
Jim Nasby wrote: > On 11/10/14, 12:16 AM, Josh Berkus wrote: > >On 11/09/2014 08:00 PM, Josh Berkus wrote: > >On 11/08/2014 01:46 PM, Andres Freund wrote: > >>>I'm these days suggesting that people should add manual vacuuming for > >>>>"older" relations during off peak hours on busy databases. There's too > >>>>many sites which service degrades noticeably during a full table vacuum. > >>Me too: https://github.com/pgexperts/flexible-freeze > > > >It turns out that not even a program of preventative scheduled vacuuming > >helps. This is because the template0 database anchors the minmxid and > >prevents it from being advanced until autovacuum gets around to that > >database, at whatever the minmxid threshold is. > > How did template0 even get a MultiXact? That sounds like they're really abusing the template databases. :( (Do keep in mind that MXID 1 is a special value.) No, it's normal -- template0 does not have a multixact in any tuple's xmax, but datminmxid is set to the value that is current when it is frozen. > BTW, the only reason I know of not to set both min_age parameters to > zero is to prevent loss of forensic information. If that's not a > concern you can always just set them to zero. Even if it is a concern, > I suspect that the forensic info you could gather from a MultiXact is > a lot more limited than for an XID, so it's probably pretty safe > setting that to zero. Freezing tuples too early could cause useless dirtying of pages; if the tuple is deleted, updated or locked again after being frozen, you end up with more writes. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 11/12/2014 06:57 PM, Alvaro Herrera wrote: >> How did template0 even get a MultiXact? That sounds like they're really abusing the template databases. :( (Do keep in mind that MXID 1 is a special value.) > No, it's normal -- template0 does not have a multixact in any tuple's > xmax, but datminmxid is set to the value that is current when it is > frozen. So, to follow up on this: it seems to me that we shouldn't be requiring freezing for databases where allowconn=false. This seems like a TODO to me, even possibly a backpatchable bug fix. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus wrote: > On 11/12/2014 06:57 PM, Alvaro Herrera wrote: > >> How did template0 even get a MultiXact? That sounds like they're really abusing the template databases. :( (Do keep in mind that MXID 1 is a special value.) > > No, it's normal -- template0 does not have a multixact in any tuple's > > xmax, but datminmxid is set to the value that is current when it is > > frozen. > > So, to follow up on this: it seems to me that we shouldn't be requiring > freezing for databases where allowconn=false. This seems like a TODO to > me, even possibly a backpatchable bug fix. Why do we need this for pg_multixact but not for pg_clog? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 11/19/2014 01:03 PM, Alvaro Herrera wrote: > Josh Berkus wrote: >> On 11/12/2014 06:57 PM, Alvaro Herrera wrote: >>>> How did template0 even get a MultiXact? That sounds like they're really abusing the template databases. :( (Do keep in mind that MXID 1 is a special value.) >>> No, it's normal -- template0 does not have a multixact in any tuple's >>> xmax, but datminmxid is set to the value that is current when it is >>> frozen. >> >> So, to follow up on this: it seems to me that we shouldn't be requiring >> freezing for databases where allowconn=false. This seems like a TODO to >> me, even possibly a backpatchable bug fix. > > Why do we need this for pg_multixact but not for pg_clog? I think we want it for both. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Wed, Nov 19, 2014 at 4:16 PM, Josh Berkus <josh@agliodbs.com> wrote: > On 11/19/2014 01:03 PM, Alvaro Herrera wrote: >> Josh Berkus wrote: >>> On 11/12/2014 06:57 PM, Alvaro Herrera wrote: >>>>> How did template0 even get a MultiXact? That sounds like they're really abusing the template databases. :( (Do keep in mind that MXID 1 is a special value.) >>>> No, it's normal -- template0 does not have a multixact in any tuple's >>>> xmax, but datminmxid is set to the value that is current when it is >>>> frozen. >>> >>> So, to follow up on this: it seems to me that we shouldn't be requiring >>> freezing for databases where allowconn=false. This seems like a TODO to >>> me, even possibly a backpatchable bug fix. >> >> Why do we need this for pg_multixact but not for pg_clog? > > I think we want it for both. So that we can have two ways to lose data? Forbidding connections to a database doesn't prevent XID or MXID wraparound. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> So that we can have two ways to lose data? > > Forbidding connections to a database doesn't prevent XID or MXID wraparound. It does prevent the user from doing anything about it, though, since they can't manually vacuum template0 without knowing unpublished hackery. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
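For reference, the hackery in question usually amounts to something like the following sketch, run as a superuser; it is a workaround rather than an officially documented procedure:

  -- temporarily allow connections to template0
  UPDATE pg_database SET datallowconn = true WHERE datname = 'template0';

  -- then, connected to template0:
  --   VACUUM FREEZE;

  -- and lock it back down afterwards
  UPDATE pg_database SET datallowconn = false WHERE datname = 'template0';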
On Thu, Nov 20, 2014 at 3:44 PM, Josh Berkus <josh@agliodbs.com> wrote: >> So that we can have two ways to lose data? >> >> Forbidding connections to a database doesn't prevent XID or MXID wraparound. > > It does prevent the user from doing anything about it, though, since > they can't manually vacuum template0 without knowing unpublished hackery. True. I don't know what to do about that. Do you? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas wrote: > On Thu, Nov 20, 2014 at 3:44 PM, Josh Berkus <josh@agliodbs.com> wrote: > >> So that we can have two ways to lose data? > >> > >> Forbidding connections to a database doesn't prevent XID or MXID wraparound. > > > > It does prevent the user from doing anything about it, though, since > > they can't manually vacuum template0 without knowing unpublished hackery. > > True. I don't know what to do about that. Do you? Maybe tweak autovacuum so that it vacuum-freezes the non-connectable template databases when they are multixact_freeze_min_age old -- or something similar. That would cause the multixact age to go down to zero for those databases with enough frequency. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 11/20/2014 01:03 PM, Robert Haas wrote: > On Thu, Nov 20, 2014 at 3:44 PM, Josh Berkus <josh@agliodbs.com> wrote: >>> So that we can have two ways to lose data? >>> >>> Forbidding connections to a database doesn't prevent XID or MXID wraparound. >> >> It does prevent the user from doing anything about it, though, since >> they can't manually vacuum template0 without knowing unpublished hackery. > > True. I don't know what to do about that. Do you? Well, the first thing that comes to mind is that template0 should be "permanently frozen". That is, all objects in it should be created with frozen xid and mxids. After all, nobody can modify anything in it. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: > Well, the first thing that comes to mind is that template0 should be > "permanently frozen". That is, all objects in it should be created with > frozen xid and mxids. After all, nobody can modify anything in it. That sounds about as unsafe as can be. You can't stop superusers from connecting to template0 and modifying it if they want to ... and I don't really want to say "ok, the consequence of that is silent disaster many moons later". regards, tom lane
On 11/20/2014 01:47 PM, Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: >> Well, the first thing that comes to mind is that template0 should be >> "permanently frozen". That is, all objects in it should be created with >> frozen xid and mxids. After all, nobody can modify anything in it. > > That sounds about as unsafe as can be. You can't stop superusers from > connecting to template0 and modifying it if they want to ... and I don't > really want to say "ok, the consequence of that is silent disaster many > moons later". So it would get unfrozen when they modify it, and they'd have to deal with it. Right now we're optimizing for something only 0.1% of users ever do. The harder part of this -- the handwavy part -- is the whole idea of a "permanent freeze". Right now there's no way to mark anything as "frozen until next modified"; we're just resetting the clock on it. If there were any such thing, it would solve some of the problems around vacuum freeze. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
<div dir="ltr"><p dir="ltr"><br /> On Thu, Nov 20, 2014 at 9:21 PM, Josh Berkus <<a href="mailto:josh@agliodbs.com" target="_blank">josh@agliodbs.com</a>>wrote:<br /> > Well, the first thing that comes to mind is that template0 shouldbe<br /> > "permanently frozen". That is, all objects in it should be created with<br /> > frozen xid and mxids.After all, nobody can modify anything in it.<p dir="ltr">This is actually the way it used to be. It was changed becauseit was discovered there was some case where an unfrozen xid would end up in template0 anyways and for some reasonit was hard to be sure to avoid it. I don't recall exactly what the situation was that triggered it but the argumentwas made then that it was safest to just include template0 in autovacuum rather than depend on getting this 100%right and risk corruption.<p dir="ltr">--<br /> greg</div>
Greg, > This is actually the way it used to be. It was changed because it was > discovered there was some case where an unfrozen xid would end up in > template0 anyways and for some reason it was hard to be sure to avoid it. I > don't recall exactly what the situation was that triggered it but the > argument was made then that it was safest to just include template0 in > autovacuum rather than depend on getting this 100% right and risk > corruption. Right, and that was fine before pg_multixact, because even with 500m XIDs in the bank, pg_clog is still pretty small. The problem is that with the same number of multixacts, pg_multixact is around *16GB* in size. Thing is, template0 is just there as a check on users messing up template1. Having that kind of precaution cause repeated operational problems for users is not good design. Maybe we should just get rid of template0 and come up with some other mechanism to reset template1 to bare-bones state. Actually, here's a question ... pg_clog is usually smaller than I think it should be (that is, smaller than 4 bytes * XID_age). Why is that? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 11/21/2014 10:44 AM, Josh Berkus wrote: > Greg, > > >> This is actually the way it used to be. It was changed because it was >> discovered there was some case where an unfrozen xid would end up in >> template0 anyways and for some reason it was hard to be sure to avoid it. I >> don't recall exactly what the situation was that triggered it but the >> argument was made then that it was safest to just include template0 in >> autovacuum rather than depend on getting this 100% right and risk >> corruption. > > Right, and that was fine before pg_multixact, because even with 500m > XIDs in the bank, pg_clog is still pretty small. The problem is that > with the same number of multixacts, pg_multixact is around *16GB* in size. > > Thing is, template0 is just there as a check on users messing up > template1. Having that kind of precaution cause repeated operational > problems for users is not good design. Maybe we should just get rid of > template0 and come up with some other mechanism to reset template1 to > bare-bones state. Or an even simpler solution: provide a way for the superuser to manually vacuum template0 *without* needing to update pg_database. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus wrote: > Actually, here's a question ... pg_clog is usually smaller than I think > it should be (that is, smaller than 4bytes * XID_age). Why is that? It's not 4 bytes per Xid, but 2 bits per Xid, so Xid_age / 4. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
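In other words, pg_clog needs roughly one byte per four transactions, so the expected footprint for a given XID span is about span/4 bytes; a quick sanity check:

  -- 2 bits per transaction => 4 transactions per byte
  SELECT pg_size_pretty((180000000 / 4)::bigint) AS clog_for_180m_xids;  -- roughly 43 MB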
On 11/21/14, 12:51 PM, Josh Berkus wrote: > On 11/21/2014 10:44 AM, Josh Berkus wrote: >> Greg, >> >> >>> This is actually the way it used to be. It was changed because it was >>> discovered there was some case where an unfrozen xid would end up in >>> template0 anyways and for some reason it was hard to be sure to avoid it. I >>> don't recall exactly what the situation was that triggered it but the >>> argument was made then that it was safest to just include template0 in >>> autovacuum rather than depend on getting this 100% right and risk >>> corruption. >> >> Right, and that was fine before pg_multixact, because even with 500m >> XIDs in the bank, pg_clog is still pretty small. The problem is that >> with the same number of multixacts, pg_multixact is around *16GB* in size. >> >> Thing is, template0 is just there as a check on users messing up >> template1. Having that kind of precaution cause repeated operational >> problems for users is not good design. Maybe we should just get rid of >> template0 and come up with some other mechanism to reset template1 to >> bare-bones state. > > Or an even simpler solution: provide a way for the superuser to > manually vacuum template0 *without* needing to update pg_database. AIUI, this is only an issue because even if you completely freeze a normal database you can't set frozenxid or minmxid to Frozen, because that will be wrong as soon as any DML happens and we don't want to screw with a real-time update to pg_class and pg_database. But any database you can't connect to is clearly a special case. What if we allowed you to vacuum a database you couldn't connect to, and while scanning such a database tracked whether each rel was completely frozen? Because no one else could have connected we know that no new (M)XIDs could have been created in that database. Is there any fundamental reason a vacuum from one database couldn't vacuum relations in another database, so long as no one could be connected to it? I'm sure there are some sanity checks that would need to be modified... We'd need something more sophisticated than datallowconn for this to work as well, since we'd need to prevent anyone from copying a database while it is being vacuumed, as well as preventing anyone from changing datallowconn while the vacuum is running. -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com