Thread: Minimally avoiding Transaction Wraparound in VLDBs
For a while now, I've been seeking a method of reducing the impact of
VACUUM when run against a database where 80-95% of the data is
effectively read only and will not be deleted/updated again. This is the
situation in most Data Warehouses.

When you get very large databases (VLDB) the execution time of VACUUM
becomes prohibitive. I understand the need to run some form of VACUUM to
avoid transaction id wraparound, but I see that VACUUM does a lot of
other things too.

A new command is proposed - VACUUM MINIMAL. The *sole* purpose of this
command is to do the absolute minimum required to avoid transaction id
wraparound. (Better names welcome....)

This does the same thing as VACUUM except it:

1. does not VACUUM any table younger than 4 billion XIDs old

RATIONALE: If a table was created less than 4 billion XIDs ago, it
clearly can't have any tuples in it with an XID more than 4 billion XIDs
old, so we don't need to VACUUM it to avoid XID wraparound. (Current
VACUUM will scan everything, even if a table was created only a few
transactions ago.)

2. does not VACUUM indexes

RATIONALE: Indexes don't contain XIDs so don't *need* to be vacuumed.

By taking those two steps, VACUUM MINIMAL will execute fairly quickly
even on large Data Warehouses. Those steps are fairly easy to implement
without change to the basic structure of VACUUM.

This command only makes sense when you *know* that tables don't need
vacuuming. That is the case when:

- you're using autovacuum, since all tables have dead-tuples removed
  whenever this is required - and so indexes will have been covered also

- you are using table partitioning and the data retention period of your
  data is less than 4 billion transactions. At 100 requests/sec that is
  a whole year of data - and if you are using COPY to load the data,
  then that comes out at hundreds of billions of rows, or Terabytes of
  data. (Which is also the rationale, since you really *don't* want to
  VACUUM a Terabyte of tables with indexes on them, ever.)

The limit is set at 4 billion because with this command we are trying to
avoid doing work as long as possible. This makes the command faster,
which in turn allows the command to be run more regularly, probably
daily. Of course, you would get a somewhat longer-running command once
table XIDs do have to be frozen, but this is for the user to understand
and avoid, if they have problems with that.

Thus, the user has a choice of two ways of avoiding XID wraparound:
- VACUUM
- VACUUM MINIMAL
Each with their specific strengths and weaknesses.

We've discussed in the past the idea that VACUUM can be sped up by using
a bitmap to track which blocks to clean. That's a good idea, and I see
it as somewhat orthogonal to the reason for this proposal. To be of use
in the circumstances I'm trying to optimise for, the vacuum bitmaps
would need to be non-lossy, persistent and recoverable for xid
wraparound purposes (I know the clog code could be used for that), as
well as only utilised for tables bigger than a certain threshold, say
128 heap blocks or more - to avoid having too many bitmaps when there
are 1000s of tables. They also still don't help much with VACUUMing big
indexes in a DW context, since VACUUM still makes two passes of each
index even when there are no dead rows to remove from the index. That
could be tuned somewhat, for which I also have a design, but why bother
tuning VACUUM when you can just skip it?

Comments?

Best Regards, Simon Riggs
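To make the proposal concrete, the command might look something like
this at the SQL level. The syntax is purely hypothetical (VACUUM MINIMAL
was never implemented) and "sales_history" is an invented table name;
the comment also spells out the back-of-envelope XID budget behind the
retention argument above:

    -- Hypothetical syntax only; "sales_history" is an invented name.
    VACUUM MINIMAL;                 -- whole database: skip young tables and all indexes
    VACUUM MINIMAL sales_history;   -- single-table form

    -- XID budget behind "a whole year of data", assuming one XID per request
    -- at 100 requests/sec:
    --   4,000,000,000 XIDs / 100 XIDs per sec ~ 40,000,000 sec ~ 463 days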
Simon Riggs <simon@2ndquadrant.com> writes:
> A new command is proposed - VACUUM MINIMAL. The *sole* purpose of this
> command is to do the absolute minimum required to avoid transaction id
> wraparound. (Better names welcome....)

I do not see the point. If you only need to run it every billion
transactions, saving a few cycles seems a bit pointless.

> This does the same thing as VACUUM except it:
> 1. does not VACUUM any table younger than 4 billion XIDs old

So? Your large tables are likely to be long-lived, so this isn't
actually going to save a thing in a DB that's been around long enough
to have an impending wrap problem.

> 2. does not VACUUM indexes
> RATIONALE: Indexes don't contain XIDs so don't *need* to be vacuumed.

Nonstarter. If you remove any tuples, you *must* vacuum indexes to
remove the corresponding index entries. If you don't remove any tuples,
you don't scan the indexes anyway IIRC.

> By taking those two steps, VACUUM MINIMAL will execute fairly quickly
> even on large Data Warehouses.

I don't think this'll save a thing.

> This command only makes sense when you *know* that tables don't need
> vacuuming. That is the case when:
> - you're using autovacuum, since all tables have dead-tuples removed
> whenever this is required - and so indexes will have been covered also

If you're using autovacuum then the problem is already taken care of.
It will be taken care of better in 8.2, if we add per-table tracking
of XID wraparound risk, but it's handled now. The only way that this
proposal makes any sense is if you are trying not to vacuum at all, ever.

> - you are using table partitioning and the data retention period of your
> data is less than 4 billion transactions.

Again, per-table tracking of wraparound horizon would take care of this
case, more elegantly and more safely.

> The limit is set at 4 billion because with this command we are trying to
> avoid doing work as long as possible.

You do realize that 2 billion is already the wrap horizon, and you can't
wait that long if you're doing this on a routine basis rather than
immediately-when-needed?

			regards, tom lane
On Wed, 2005-08-31 at 19:24 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > A new command is proposed - VACUUM MINIMAL. The *sole* purpose of this
> > command is to do the absolute minimum required to avoid transaction id
> > wraparound. (Better names welcome....)
>
> I do not see the point. If you only need to run it every billion
> transactions, saving a few cycles seems a bit pointless.

It is not simply a few cycles, it could be days of grinding I/O. The
worse it is, the more it gets put off, which makes it worse etc..

...it kind of prevents embedded systems from working happily.

> > This does the same thing as VACUUM except it:
> >
> > 1. does not VACUUM any table younger than 4 billion XIDs old
>
> So? Your large tables are likely to be long-lived, so this isn't
> actually going to save a thing in a DB that's been around long enough
> to have an impending wrap problem.

My point was, with table partitioning, that the largest tables would
*not* be the longest lived. Any very large table has to be truncated
down to a manageable size at some point. Partitioning completely avoids
the need to DELETE or TRUNCATE, since you use CREATE TABLE to introduce
new partitions and DROP TABLE to remove them.

> > 2. does not VACUUM indexes
> > RATIONALE: Indexes don't contain XIDs so don't *need* to be vacuumed.
>
> Nonstarter. If you remove any tuples, you *must* vacuum indexes to
> remove the corresponding index entries.

Yes, but if no tuples have been deleted or updated, there will be no
tuples removed.

> If you don't remove any tuples,
> you don't scan the indexes anyway IIRC.

No. Even if you remove *zero* tuples, an index is still scanned twice.
Once to not delete the rows and once to not delete the pages.

That isn't as mad as it sounds, since a VACUUM needs to mark pages as
HALF_DEAD on one VACUUM, then come back for a DELETED marker later
because of MVCC. But that behaviour isn't required at all in the
scenario I described, so I was seeking a way to get around that. We
could optimise that, as I said, but I was trying to avoid the problem
entirely. But, I'm easy - if you like the sound of that better, I have
the details for that worked out also.

> > By taking those two steps, VACUUM MINIMAL will execute fairly quickly
> > even on large Data Warehouses.
>
> I don't think this'll save a thing.

I understand that the scenarios I describe seem highly atypical, but
there are a number of people already running this kind of system. I've
worked with a couple, I've seen a half-dozen others on list, and there
are more coming.

> > This command only makes sense when you *know* that tables don't need
> > vacuuming. That is the case when:
> > - you're using autovacuum, since all tables have dead-tuples removed
> > whenever this is required - and so indexes will have been covered also
>
> If you're using autovacuum then the problem is already taken care of.

autovacuum will respond only to UPDATEs and DELETEs. In the scenario I
outline, these will *never* occur on the largest tables. A VACUUM would
still eventually be required to freeze long lived tuples and this would
not be performed by autovacuum.

> It will be taken care of better in 8.2, if we add per-table tracking
> of XID wraparound risk, but it's handled now.

That's a good idea, I must have missed the discussion on that. But what
I am suggesting is that for a certain class of table, which just happens
to be very large, we defer a VACUUM for as long as possible.
The hope is - and I believe this could very likely be the case - that
the table would cease to exist before it became eligible for VACUUM.

The current viewpoint is that "most" rows never live for 1 billion
transactions, so never get frozen. That's completely valid in something
like a Stock table, but it's completely wrong in a TransactionHistory
table. For a historical data table I want to be able to defer the need
for a full-table VACUUM for as long as possible, and should it ever
occur, I want to avoid scanning the indexes pointlessly.

> The only way that this
> proposal makes any sense is if you are trying not to vacuum at all, ever.

Yes, that is exactly what I'm trying to do, for the largest tables only.
I never want to VACUUM them because I know they don't need it to reclaim
rows, and I have a *good* expectation that the table will be dropped,
one day. For more "normal" tables, I'm happy to VACUUM them and don't
want to alter that behaviour at all.

> You do realize that 2 billion is already the wrap horizon, and you can't
> wait that long if you're doing this on a routine basis rather than
> immediately-when-needed?

OK, 4 billion was just a mistake... I just meant avoid VACUUM as long as
possible. I changed the number immediately before posting, to emphasise
the deferral aspect of this proposal.

Best Regards, Simon Riggs
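Simon's partition-rotation argument can be illustrated with the
inheritance-based partitioning available around 8.1. The table and
column names below are invented, and this is a sketch of the pattern
rather than code from the thread:

    -- Sketch only (invented names): new data arrives in a freshly created
    -- child partition and old data leaves by dropping a partition, so the
    -- largest tables never see UPDATE or DELETE and never accumulate dead rows.
    CREATE TABLE sales_2005_09 (
        CHECK (sale_date >= DATE '2005-09-01' AND sale_date < DATE '2005-10-01')
    ) INHERITS (sales);

    COPY sales_2005_09 FROM '/data/loads/sales_2005_09.dat';   -- bulk load only

    DROP TABLE sales_2004_09;   -- retire the oldest month; nothing to VACUUM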
On Thu, Sep 01, 2005 at 01:57:02AM +0100, Simon Riggs wrote:

> > If you're using autovacuum then the problem is already taken care of.
>
> autovacuum will respond only to UPDATEs and DELETEs. In the scenario I
> outline, these will *never* occur on the largest tables. A VACUUM would
> still eventually be required to freeze long lived tuples and this would
> not be performed by autovacuum.

Hum, I don't understand -- if you don't want to vacuum the table, why
run vacuum at all? You can (as of 8.1) disable autovacuum for specific
tables. The exception is that you are forced to run a database-wide
VACUUM once in a while (every billion-and-so), but this will hopefully
disappear in 8.2 too, leaving you effectively with the option of never
vacuuming a table.

-- 
Alvaro Herrera -- Valdivia, Chile           Architect, www.EnterpriseDB.com
You liked Linux a lot when he was just the gawky kid from down the block
mowing your lawn or shoveling the snow. But now that he wants to date
your daughter, you're not so sure he measures up. (Larry Greenemeier)
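For reference, the per-table switch Alvaro mentions is the 8.1
pg_autovacuum catalog. The sketch below is from memory - the exact
column list should be checked against the 8.1 documentation - and
"sales_history" is again an invented table name:

    -- Sketch, assuming the 8.1 pg_autovacuum catalog layout: enabled = false
    -- excludes the table from autovacuum, and negative values mean "use the
    -- system default" for the thresholds and cost settings.
    INSERT INTO pg_autovacuum
            (vacrelid, enabled,
             vac_base_thresh, vac_scale_factor,
             anl_base_thresh, anl_scale_factor,
             vac_cost_delay, vac_cost_limit)
    VALUES ('sales_history'::regclass, false, -1, -0.5, -1, -0.5, -1, -1);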
Simon Riggs <simon@2ndquadrant.com> writes:
> On Wed, 2005-08-31 at 19:24 -0400, Tom Lane wrote:
>> If you don't remove any tuples,
>> you don't scan the indexes anyway IIRC.

> No. Even if you remove *zero* tuples, an index is still scanned twice.
> Once to not delete the rows and once to not delete the pages.

Yeah? Well, that could probably be improved with a less intrusive fix,
that is, one that does it automatically instead of involving the user.

I really really do not like proposals to introduce still another kind
of VACUUM. We have too many already; any casual glance through the
archives will show that most PG users don't have a grip on when to use
VACUUM FULL vs VACUUM. Throwing in some more types will make that
problem exponentially worse.

> autovacuum will respond only to UPDATEs and DELETEs. In the scenario I
> outline, these will *never* occur on the largest tables. A VACUUM would
> still eventually be required to freeze long lived tuples and this would
> not be performed by autovacuum.

Yes, it will, at least as of 8.1.

			regards, tom lane
> I really really do not like proposals to introduce still another kind
> of VACUUM. We have too many already; any casual glance through the
> archives will show that most PG users don't have a grip on when to use
> VACUUM FULL vs VACUUM. Throwing in some more types will make that
> problem exponentially worse.

Yes, but if they're all under the control of autovacuum, then users
don't have to worry...

Chris
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
>> I really really do not like proposals to introduce still another kind
>> of VACUUM. We have too many already; any casual glance through the
>> archives will show that most PG users don't have a grip on when to use
>> VACUUM FULL vs VACUUM. Throwing in some more types will make that
>> problem exponentially worse.

> Yes, but if they're all under the control of autovacuum, then users
> don't have to worry...

Well, if the proposal comes packaged with an algorithm by which
autovacuum will use it, that's a different story. What's sticking in
my craw about this proposal is really that it's assuming detailed manual
management of vacuuming, which is exactly the thing we've been sweating
to get rid of.

BTW ... the original Berkeley papers on Postgres make frequent reference
to a "vacuum daemon", which seems to be essentially what we're trying to
build with autovacuum. Does anyone know if the Berkeley implementation
ever actually had auto vacuuming, or was that all handwaving? If it did
exist, why was it removed?

			regards, tom lane
On Wed, 31 Aug 2005, Tom Lane wrote:

> Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
> >> I really really do not like proposals to introduce still another kind
> >> of VACUUM. We have too many already; any casual glance through the
> >> archives will show that most PG users don't have a grip on when to use
> >> VACUUM FULL vs VACUUM. Throwing in some more types will make that
> >> problem exponentially worse.
>
> > Yes, but if they're all under the control of autovacuum, then users
> > don't have to worry...
>
> Well, if the proposal comes packaged with an algorithm by which
> autovacuum will use it, that's a different story. What's sticking in
> my craw about this proposal is really that it's assuming detailed manual
> management of vacuuming, which is exactly the thing we've been sweating
> to get rid of.
>
> BTW ... the original Berkeley papers on Postgres make frequent reference
> to a "vacuum daemon", which seems to be essentially what we're trying to
> build with autovacuum. Does anyone know if the Berkeley implementation
> ever actually had auto vacuuming, or was that all handwaving? If it did
> exist, why was it removed?

Well, from my reading of some of the early papers, VACUUM was kind of
different to what it is now. The idea was that expired data would be
moved out of the heap and stored elsewhere. A time travel mechanism
could be used to see previous versions of the row. It makes sense that
they would manage this with a daemon, but I never saw one. Mind, I
wasn't looking for one.

Thanks,

Gavin
Gavin, Tom,

> Well, from my reading of some of the early papers, VACUUM was kind of
> different to what it is now. The idea was that expired data would be
> moved out of the heap and stored elsewhere. A time travel mechanism
> could be used to see previous versions of the row.

And from talking to a couple of Stonebraker's former students at
conferences, this mechanism was never built satisfactorily; it always
existed on paper, at least at UCB. Illustra might have finished it.
Elein around?

--
Josh Berkus
Aglio Database Solutions
San Francisco
Tom,

> If you're using autovacuum then the problem is already taken care of.
> It will be taken care of better in 8.2, if we add per-table tracking
> of XID wraparound risk, but it's handled now. The only way that this
> proposal makes any sense is if you are trying not to vacuum at all, ever.

Hmmm ... the potential problem which Simon is mentioning is very real
... in large DWs, there may be tables/partitions which are never, ever
vacuumed. Ever.

For example, at one client's site they load their data via ETL jobs that
insert about 25 million rows a day and update about 100,000. Given that
updates are < 5%, there is no reason from a data efficiency perspective
to ever vacuum. So that's the plan ... the main fact table will never,
ever be vacuumed.

(In that particular case, since the ETL uses large transaction batches,
XID wraparound won't happen for about 20 years. But with a different
data load model, it could be a serious problem.)

So, will per-table XID tracking allow us to avoid *ever* vacuuming some
tables? If not, what could?

--
Josh Berkus
Aglio Database Solutions
San Francisco
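A back-of-envelope check of why the load pattern matters so much (the
daily transaction counts below are assumptions for illustration, not
figures from the site Josh describes): XIDs are consumed per
transaction, not per row, so a batched COPY load barely advances the
wraparound clock, while row-at-a-time loading burns through it quickly.

    -- Assumed numbers, for illustration only; 2 billion XIDs to the wrap horizon.
    SELECT 2000000000 / (300000 * 365.0)   AS years_at_300k_txns_per_day,  -- ~18 years
           2000000000 / (25000000 * 365.0) AS years_at_one_txn_per_row;    -- ~0.2 years (~80 days)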
On 2005-09-01, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> On Thu, Sep 01, 2005 at 01:57:02AM +0100, Simon Riggs wrote:
>
>> > If you're using autovacuum then the problem is already taken care of.
>>
>> autovacuum will respond only to UPDATEs and DELETEs. In the scenario I
>> outline, these will *never* occur on the largest tables. A VACUUM would
>> still eventually be required to freeze long lived tuples and this would
>> not be performed by autovacuum.
>
> Hum, I don't understand -- if you don't want to vacuum the table, why
> run vacuum at all? You can (as of 8.1) disable autovacuum for specific
> tables. The exception is that you are forced to run a database-wide
> VACUUM once in a while (every billion-and-so), but this will hopefully
> disappear in 8.2 too,

Wishful thinking, or do you have a concrete plan to achieve it?

--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services
On Wed, 2005-08-31 at 22:21 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > On Wed, 2005-08-31 at 19:24 -0400, Tom Lane wrote:
> >> If you don't remove any tuples,
> >> you don't scan the indexes anyway IIRC.
>
> > No. Even if you remove *zero* tuples, an index is still scanned twice.
> > Once to not delete the rows and once to not delete the pages.
>
> Yeah? Well, that could probably be improved with a less intrusive fix,
> that is, one that does it automatically instead of involving the user.
>
> I really really do not like proposals to introduce still another kind
> of VACUUM. We have too many already; any casual glance through the
> archives will show that most PG users don't have a grip on when to use
> VACUUM FULL vs VACUUM. Throwing in some more types will make that
> problem exponentially worse.

I'll post my proposal for changing that, so we can see the two
alternatives. I'm easy either way at the moment.

Best Regards, Simon Riggs
On Thu, Sep 01, 2005 at 04:21:58AM -0000, Andrew - Supernews wrote:
> On 2005-09-01, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > On Thu, Sep 01, 2005 at 01:57:02AM +0100, Simon Riggs wrote:
> >
> >> > If you're using autovacuum then the problem is already taken care of.
> >>
> >> autovacuum will respond only to UPDATEs and DELETEs. In the scenario I
> >> outline, these will *never* occur on the largest tables. A VACUUM would
> >> still eventually be required to freeze long lived tuples and this would
> >> not be performed by autovacuum.
> >
> > Hum, I don't understand -- if you don't want to vacuum the table, why
> > run vacuum at all? You can (as of 8.1) disable autovacuum for specific
> > tables. The exception is that you are forced to run a database-wide
> > VACUUM once in a while (every billion-and-so), but this will hopefully
> > disappear in 8.2 too,
>
> Wishful thinking, or do you have a concrete plan to achieve it?

We talked about it during the autovacuum discussions just before feature
freeze. There is a vague plan which I intend to study eventually.

-- 
Alvaro Herrera -- Valdivia, Chile           Architect, www.EnterpriseDB.com
"On the other flipper, one wrong move and we're Fatal Exceptions"
(T.U.X.: Term Unit X - http://www.thelinuxreview.com/TUX/)
Josh Berkus <josh@agliodbs.com> writes:
> So, will per-table XID tracking allow us to avoid *ever* vacuuming some
> tables?

If your definition of "ever" is less than a billion transactions, sure.
(As Simon points out, with time-partitioned data sets that could often
be arranged, so it's not a completely silly response.)

> If not, what could?

The only possibility for more-than-a-billion is widening XIDs to 64 bits,
which would come with its own set of performance penalties.

			regards, tom lane
On Thu, 2005-09-01 at 10:29 +0100, Simon Riggs wrote:
> On Wed, 2005-08-31 at 22:21 -0400, Tom Lane wrote:
> > Simon Riggs <simon@2ndquadrant.com> writes:
> > > On Wed, 2005-08-31 at 19:24 -0400, Tom Lane wrote:
> > >> If you don't remove any tuples,
> > >> you don't scan the indexes anyway IIRC.
> >
> > > No. Even if you remove *zero* tuples, an index is still scanned twice.
> > > Once to not delete the rows and once to not delete the pages.
> >
> > Yeah? Well, that could probably be improved with a less intrusive fix,
> > that is, one that does it automatically instead of involving the user.
> >
> > I really really do not like proposals to introduce still another kind
> > of VACUUM. We have too many already; any casual glance through the
> > archives will show that most PG users don't have a grip on when to use
> > VACUUM FULL vs VACUUM. Throwing in some more types will make that
> > problem exponentially worse.
>
> I'll post my proposal for changing that, so we can see the two
> alternatives. I'm easy either way at the moment.

Currently, VACUUM will always scan each index twice, even if there are
no tuples to remove from the index: once in index_bulk_delete() and once
in index_vacuum_cleanup() (at least for the nbtree and rtree AMs).

My first proposal is to add an extra parameter onto the
index_bulk_delete() call - ntuples. If ntuples == 0 then btbulkdelete()
will avoid scanning and return immediately. If a scan occurs, then we
keep track of how many tuples have been marked deleted and stop the scan
when we have reached this number.

This approach means that two use cases will be optimised:

1) where the VACUUM does not remove any tuples

2) where all the deletions are on the LHS of the index, as would be the
case in a frequently updated table like Shipments where the index has
either a monotonically increasing key or a time-based key.

Optimising index_vacuum_cleanup() is more complex. btvacuumcleanup()
marks pages deleted OR adds pages already deleted onto the freelist.
Non-FULL VACUUM never does both at the same time for MVCC reasons, so it
takes two VACUUMs to actually move a page back onto the freelist. So
performing a scan of the index during index_vacuum_cleanup() does
actually have a purpose even when no tuples were deleted during this
vacuum, because it might be a subsequent VACUUM coming back later to
freelist the pages deleted last time. (This is interesting, because
autovacuum knows nothing of the deleted pages and may not trigger a
second vacuum, even though one would be beneficial.)

index_vacuum_cleanup() already knows how many tuples were removed,
because that count is part of the info structure passed between
index_bulk_delete() and index_vacuum_cleanup(). There are two options:

1) avoid the scan when there are no dead tuples, and ignore the
possibility that a VACUUM might be doing a follow-up scan to remove
previously deleted tuples.

2) close the loophole by recording whether a previous VACUUM had set any
pages as deleted, or not. If the index was "clean", then we would skip
the scan.

1) seems the best option, since 2) is a lot of work for less gain.

If all of that seems OK, I'll code a patch.

Best Regards, Simon Riggs
Simon Riggs <simon@2ndquadrant.com> writes:
> My first proposal is to add an extra parameter onto the
> index_bulk_delete() call - ntuples. If ntuples == 0 then btbulkdelete()
> will avoid scanning and return immediately. If a scan occurs, then we
> keep track of how many tuples have been marked deleted and stop the scan
> when we have reached this number.

This seems reasonable. I think you overstate the value of an early
stop, but it can't hurt given the minimal cost of making the check.

> 1) avoid the scan when there are no dead tuples, and ignore the
> possibility that a VACUUM might be doing a follow-up scan to remove
> previously deleted tuples.

I think this is a pretty bad idea. Your analysis assumes that the two
scans are equally expensive, which is about as far from true as is
possible. In the first place, the first scan is in index order while
the second is in physical order (hence can benefit from readahead). In
the second place, the first scan iterates over each item on each page,
where the second does not. So what you're proposing is to install an
unavoidable disk space bloat mechanism in order to save only a small
part of the cost of VACUUM.

(What I'd really like to see is a way to do the first scan in physical
order, but we'd need a different locking approach AFAICS --- see the
notes in nbtree/README.)

Keeping track of whether there's any work pending for the second pass
doesn't seem too unreasonable to me. It was left undone in the original
coding on the KISS principle, but it could certainly be done. I'd still
like to see some demonstration first that this part of index vacuuming
is a significant cost compared to the heap and first-pass index costs.

			regards, tom lane
On Wed, 31 Aug 2005, Tom Lane wrote:

> BTW ... the original Berkeley papers on Postgres make frequent reference
> to a "vacuum daemon", which seems to be essentially what we're trying to
> build with autovacuum. Does anyone know if the Berkeley implementation
> ever actually had auto vacuuming, or was that all handwaving? If it did
> exist, why was it removed?

Well, I was just poking around the executor and noticed this in
ExecDelete():

	/*
	 * Note: Normally one would think that we have to delete index tuples
	 * associated with the heap tuple now..
	 *
	 * ... but in POSTGRES, we have no need to do this because the vacuum
	 * daemon automatically opens an index scan and deletes index tuples
	 * when it finds deleted heap tuples. -cim 9/27/89
	 */

So, it seems they must have actually written the vacuum daemon.

Gavin
On Thu, Sep 22, 2005 at 03:52:21PM +1000, Gavin Sherry wrote:
> On Wed, 31 Aug 2005, Tom Lane wrote:
>
> > BTW ... the original Berkeley papers on Postgres make frequent reference
> > to a "vacuum daemon", which seems to be essentially what we're trying to
> > build with autovacuum. Does anyone know if the Berkeley implementation
> > ever actually had auto vacuuming, or was that all handwaving? If it did
> > exist, why was it removed?
>
> Well, I was just poking around the executor and noticed this in
> ExecDelete():
>
> 	/*
> 	 * Note: Normally one would think that we have to delete index tuples
> 	 * associated with the heap tuple now..
> 	 *
> 	 * ... but in POSTGRES, we have no need to do this because the vacuum
> 	 * daemon automatically opens an index scan and deletes index tuples
> 	 * when it finds deleted heap tuples. -cim 9/27/89
> 	 */

I have an idea who this might be :)

Cheers,
D

-- 
David Fetter  david@fetter.org  http://fetter.org/
phone: +1 510 893 6100    mobile: +1 415 235 3778

Remember to vote!
On Sep 22, 2005, at 3:55 PM, David Fetter wrote:

> On Thu, Sep 22, 2005 at 03:52:21PM +1000, Gavin Sherry wrote:
>
>> On Wed, 31 Aug 2005, Tom Lane wrote:
>>
>> Well, I was just poking around the executor and noticed this in
>> ExecDelete():
>>
>> 	/*
>> 	 * Note: Normally one would think that we have to delete index tuples
>> 	 * associated with the heap tuple now..
>> 	 *
>> 	 * ... but in POSTGRES, we have no need to do this because the vacuum
>> 	 * daemon automatically opens an index scan and deletes index tuples
>> 	 * when it finds deleted heap tuples. -cim 9/27/89
>> 	 */
>
> I have an idea who this might be :)

http://www.taylors.org/cim/resume/cimarron.html

Michael Glaesemann
grzm myrealbox com