Thread: How to make lazy VACUUM of one table run in several transactions?
Hi Tom,

I've got the impression that you have worked most actively on VACUUM, so I'm asking you this directly instead of addressing the pgsql-hackers list in general. Feel free to correct me :)

I have a problem that I think can be solved by splitting the vacuum up to run in several transactions, each running for no more than X minutes.

The problem itself is having a database with small (5-50k rows), fast-changing tables and huge (5-25M rows), slower-changing tables, all running in a 24/7 setup. The small table needs to be kept small by constant vacuuming (a loop doing a vacuum on this table after each 15 sec interval) to keep up with its traffic. The problem appears when the big table needs to be vacuumed, as the vacuum on the big table prevents the vacuum on the small table from freeing up the space used by dead tuples. And the situation is made *worse* by running the vacuum with vacuum_cost_limit to reduce the I/O impact (a kind of priority inversion), as the vacuum then runs even longer and slows down operations on the small table even more.

The fastest fix seems to be changing the VACUUM command to run in several transactions. The plan: change lazy_vacuum_rel(onerel, vacstmt) to check for some time/page-count limit after each heap page (near the start of the main loop in lazy_scan_heap()) and, if the limit is reached, stop, clean up the indexes, and return the blkno of the next page needing to be vacuumed; then replace the call to lazy_vacuum_rel(onerel, vacstmt) in vacuum.c with the following loop. What else would need to be done?

    next_page_to_vacuum = 0;
    while (next_page_to_vacuum < RelationGetNumberOfBlocks(onerel))
    {
        StartTransactionCommand();
        ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());

        next_page_to_vacuum = lazy_vacuum_rel(onerel, vacstmt);

        CommitTransactionCommand();
    }

Must some locks also be released and reacquired inside this loop, or is there something else I should keep in mind when trying to do this? The operations in this loop need not be cheap: I'm happy if I can keep individual transactions below 5 to 10 minutes, though 1-2 minutes would be best.

P.S. One other typical case where long transactions are disastrous is Slony1's use of LISTEN/NOTIFY (the non-indexed table pg_listener, after having grown to 100MB, is not too responsive), though there the real solution for Slony1 would be switching to polling instead of interrupt (notify) mode for high-volume databases.

-- 
Hannu Krosing <hannu@tm.ee>
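For concreteness, here is a minimal sketch of what that limit check inside lazy_scan_heap() might look like. This is an assumption-laden sketch, not patch code: vacuum_page_limit and first_unvacuumed_blkno are invented names, while lazy_vacuum_index() and lazy_vacuum_heap() are the existing vacuumlazy.c routines that flush the accumulated dead-tuple list.

    /*
     * Hypothetical sketch of a page budget in lazy_scan_heap().
     * "vacuum_page_limit" and "first_unvacuumed_blkno" are invented
     * names; lazy_vacuum_index() and lazy_vacuum_heap() are the real
     * routines that push the remembered dead TIDs into the indexes
     * and mark them unused in the heap.
     */
    for (blkno = first_unvacuumed_blkno; blkno < nblocks; blkno++)
    {
        if (blkno - first_unvacuumed_blkno >= vacuum_page_limit)
        {
            /* flush the dead-TID list into each index ... */
            for (i = 0; i < nindexes; i++)
                lazy_vacuum_index(Irel[i], vacrelstats);
            /* ... and mark the dead tuples unused in the heap */
            lazy_vacuum_heap(onerel, vacrelstats);
            return blkno;           /* caller resumes from this block */
        }

        /* ... existing per-page scan-and-prune work, unchanged ... */
    }

    return InvalidBlockNumber;      /* the whole relation is done */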
On Sun, Apr 24, 2005 at 12:02:37PM +0300, Hannu Krosing wrote:
> change lazy_vacuum_rel(onerel, vacstmt) to check for some
> time/page-count limit after each heap page (near the start of the main
> loop in lazy_scan_heap()) and, if the limit is reached, stop, clean up
> the indexes, and return the blkno of the next page needing to be
> vacuumed; then replace the call to lazy_vacuum_rel(onerel, vacstmt) in
> vacuum.c with the following loop.
>
>     next_page_to_vacuum = 0;
>     while (next_page_to_vacuum < RelationGetNumberOfBlocks(onerel))
>     {
>         StartTransactionCommand();
>         ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
>
>         next_page_to_vacuum = lazy_vacuum_rel(onerel, vacstmt);
>
>         CommitTransactionCommand();
>     }
>
> Must some locks also be released and reacquired inside this loop, or
> is there something else I should keep in mind when trying to do this?

There is a "session lock" on the table. You must release that. However, after releasing and reacquiring that lock, everything you know about the table must be rechecked. In particular, the table can be dropped :-) or truncated, or vacuumed by some other process, etc. So you'd need to start the vacuum "from scratch" (I guess you'd skip the first N pages).

One thing to keep in mind is whether the Xmin calculations are right after such a thing ... e.g. if you truncated the clog with the wrong parameters, you could lose data.

-- 
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"Doing what he did amounts to sticking his fingers under the hood of
the implementation; if he gets his fingers burnt, it's his problem."
(Tom Lane)
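A hedged illustration of the release-and-recheck cycle Alvaro describes. The helper names (UnlockRelationIdForSession / LockRelationIdForSession, try_relation_open) are taken from later PostgreSQL trees and may not match 8.0 exactly; note also that Tom's reply below argues for holding the session lock instead.

    /*
     * Sketch only: give up the session lock between transactions, then
     * revalidate everything after reacquiring it.  Helper names are
     * from newer sources and may differ in 8.0.
     */
    UnlockRelationIdForSession(&onerelid, lmode);
    CommitTransactionCommand();

    StartTransactionCommand();
    LockRelationIdForSession(&onerelid, lmode);

    /* While unlocked, the table may have been dropped, truncated, or
     * vacuumed by someone else, so nothing cached about it is valid. */
    onerel = try_relation_open(onerelid.relId, NoLock);
    if (onerel == NULL)
        return;                      /* table is gone: nothing to do */
    if (next_page_to_vacuum >= RelationGetNumberOfBlocks(onerel))
        next_page_to_vacuum = 0;     /* truncated under us: restart */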
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> On Sun, Apr 24, 2005 at 12:02:37PM +0300, Hannu Krosing wrote:
>> Must some locks also be released and reacquired inside this loop, or
>> is there something else I should keep in mind when trying to do this?

> There is a "session lock" on the table. You must release that.

Actually, the only hope of making this work is NOT to release that. If you hold the appropriate lock at the session level, then it is reasonable to consider successive transactions within the vacuum as being one big operation.

I think the major issue with this would be memory management, ie, how to prevent CommitTransactionCommand from cleaning up all of vacuum's working state.

			regards, tom lane
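On the memory-management point: vacuum.c already keeps cross-transaction state in a private memory context that survives CommitTransactionCommand(), so one plausible, untested approach is to park the per-relation restart state in such a context as well. A sketch using the existing mcxt API, with vacrelstats standing in for whatever state must survive the commits:

    /* A context hung off PortalContext outlives the transactions
     * started and committed inside the vacuum loop. */
    MemoryContext vac_context;
    MemoryContext oldcxt;

    vac_context = AllocSetContextCreate(PortalContext,
                                        "Vacuum",
                                        ALLOCSET_DEFAULT_MINSIZE,
                                        ALLOCSET_DEFAULT_INITSIZE,
                                        ALLOCSET_DEFAULT_MAXSIZE);

    /* anything palloc'd here is not freed at transaction commit */
    oldcxt = MemoryContextSwitchTo(vac_context);
    vacrelstats = (LVRelStats *) palloc0(sizeof(LVRelStats));
    MemoryContextSwitchTo(oldcxt);

    /* ... StartTransactionCommand()/CommitTransactionCommand() loop;
     * vacrelstats stays valid across each commit ... */

    MemoryContextDelete(vac_context);   /* once the relation is done */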
On Mon, 2005-04-25 at 11:11 -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > On Sun, Apr 24, 2005 at 12:02:37PM +0300, Hannu Krosing wrote:
> >> Must some locks also be released and reacquired inside this loop, or
> >> is there something else I should keep in mind when trying to do this?
>
> > There is a "session lock" on the table. You must release that.
>
> Actually, the only hope of making this work is NOT to release that.
> If you hold the appropriate lock at the session level, then it is
> reasonable to consider successive transactions within the vacuum
> as being one big operation.
>
> I think the major issue with this would be memory management, ie,
> how to prevent CommitTransactionCommand from cleaning up all of
> vacuum's working state.

Are there any known (easy :) tricks for achieving this?

Now that I have had time to look a little more at this, I have another idea: could I avoid having a transaction at all?

As VACUUM is not "transactional" in the sense that it never changes anything visible to users, can't be undone by rollback, etc., could it be possible to create enough of a "transaction-like" environment for it to really run outside of transactions? Perhaps just advancing oldestXmin at certain intervals?

-- 
Hannu Krosing <hannu@tm.ee>
Hannu Krosing <hannu@tm.ee> writes:
> Could I avoid having a transaction at all?

Not really; too much of the database access infrastructure is tied to transaction stuff ... even facilities as basic as memory management.

> As VACUUM is not "transactional" in the sense that it never changes
> anything visible to users, can't be undone by rollback, etc., could
> it be possible to create enough of a "transaction-like" environment
> for it to really run outside of transactions? Perhaps just advancing
> oldestXmin at certain intervals?

I wonder whether you could avoid advertising the VACUUM's XID in PGPROC. Not sure that this can work, but it would be a lot simpler than stopping and starting transactions ...

			regards, tom lane
On Tue, 2005-04-26 at 17:54 -0400, Tom Lane wrote:
> Hannu Krosing <hannu@tm.ee> writes:
> > Could I avoid having a transaction at all?
>
> Not really; too much of the database access infrastructure is tied to
> transaction stuff ... even facilities as basic as memory management.
>
> > As VACUUM is not "transactional" in the sense that it never changes
> > anything visible to users, can't be undone by rollback, etc., could
> > it be possible to create enough of a "transaction-like" environment
> > for it to really run outside of transactions? Perhaps just advancing
> > oldestXmin at certain intervals?
>
> I wonder whether you could avoid advertising the VACUUM's XID in PGPROC.
> Not sure that this can work, but it would be a lot simpler than stopping
> and starting transactions ...

What dreadful things will happen if I just set the xid and xmin of the current transaction (proc->xid, proc->xmin) to some value below FirstNormalTransactionId, so that TransactionIdIsNormal(xid) returning false will make GetOldestXmin ignore my transaction? Will this, for example, confuse WAL or cause something else equally grim to happen?

An alternative would be to have some special "ignore-me" flag in PGPROC.

A more general solution to the "VACUUM does not clean dead tuples fast enough due to an old transaction" problem is keeping the OldestXmin for each table separately, as a list of table OIDs in each PGPROC.

This would be automatically extendable to a long COPY, or in fact any single SQL statement running in its implicit transaction, by examining the query plan and reserving all tables touched by the query and its dependencies.

And this could be done as a user-level command for explicit transactions (LOCK TABLE my_table, othertable, yetanothertable FOR/FROM VACUUM) which would tell postgres that the current query will need these (and only these) tables. If LOCK FOR VACUUM is done inside a transaction, then touching any other table should be an error.

-- 
Hannu Krosing <hannu@tm.ee>
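For what it's worth, a rough sketch of what the proposed per-backend table list might look like; all names here are invented, and Tom's reply below explains why this can't work in the general case.

    /*
     * Invented sketch of the proposal: a small fixed-size list in each
     * PGPROC naming the only tables the current transaction may touch.
     * It must be fixed-size because PGPROC lives in shared memory.
     */
    #define MAX_RESERVED_TABLES 16

    /* new fields in struct PGPROC: */
    int     numReservedTables;               /* 0 means "anything" */
    Oid     reservedTables[MAX_RESERVED_TABLES];

    /*
     * GetOldestXmin(rel) would then skip a backend whose list is
     * non-empty and does not mention rel, and touching a table outside
     * the list would raise an error.
     */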
Hannu Krosing <hannu@tm.ee> writes:
> A more general solution to the "VACUUM does not clean dead tuples
> fast enough due to an old transaction" problem is keeping the
> OldestXmin for each table separately, as a list of table OIDs in
> each PGPROC.

> This would be automatically extendable to a long COPY, or in fact any
> single SQL statement running in its implicit transaction, by examining
> the query plan and reserving all tables touched by the query and its
> dependencies.

This is completely unworkable, since it amounts to assuming you know at the start of a serializable transaction which tables it will touch. In point of fact you can't even know that for the current query, let alone future ones --- consider user-defined functions.

(Not to mention that we can't expect to fit that much info into a fixed amount of shared memory.)

			regards, tom lane
On Mon, 2005-05-02 at 10:38 -0400, Tom Lane wrote:

But what about my question about just changing the xid in PGPROC?

> Hannu Krosing <hannu@tm.ee> writes:
> > A more general solution to the "VACUUM does not clean dead tuples
> > fast enough due to an old transaction" problem is keeping the
> > OldestXmin for each table separately, as a list of table OIDs in
> > each PGPROC.
>
> > This would be automatically extendable to a long COPY, or in fact any
> > single SQL statement running in its implicit transaction, by examining
> > the query plan and reserving all tables touched by the query and its
> > dependencies.
>
> This is completely unworkable, since it amounts to assuming you know at
> the start of a serializable transaction which tables it will touch.

I meant it for "simple" things like VACUUM, and maybe also a simple COPY (tables referenced by foreign keys can be found from pg_depend), but yes, we can't automatically know it in the general case. That's why I proposed making it an error for any transaction with such a list to touch any table not in the list, so the assumption about which tables are touched is simple to enforce.

> In point of fact you can't even know that for the current query, let
> alone future ones --- consider user-defined functions.
>
> (Not to mention that we can't expect to fit that much info into a fixed
> amount of shared memory.)

What should be in shared memory is one list of ints per backend (the OIDs of the tables used by the current transaction), mostly just one :) The general case would be to have no such list at all, meaning any table could be touched.

-- 
Hannu Krosing <hannu@tm.ee>
Something weird is going on with some of my emails - I tried to send this from hannu@tm.ee, then from hannu@skype.net, and it did not appear on the list. Another answer sent after this came through in less than 5 min. Now I have removed Tom Lane from the To: field and moved pgsql-hackers@postgresql.org from CC: to To:.

On Tue, 2005-04-26 at 17:54 -0400, Tom Lane wrote:
> Hannu Krosing <hannu@tm.ee> writes:
> > Could I avoid having a transaction at all?
>
> Not really; too much of the database access infrastructure is tied to
> transaction stuff ... even facilities as basic as memory management.
>
> > As VACUUM is not "transactional" in the sense that it never changes
> > anything visible to users, can't be undone by rollback, etc., could
> > it be possible to create enough of a "transaction-like" environment
> > for it to really run outside of transactions? Perhaps just advancing
> > oldestXmin at certain intervals?
>
> I wonder whether you could avoid advertising the VACUUM's XID in PGPROC.
> Not sure that this can work, but it would be a lot simpler than stopping
> and starting transactions ...

What I came up with is adding an extra bool to PGPROC (inVacuum), which is set if the current command is a VACUUM running in its own transaction, and modifying GetOldestXmin() to skip backends where inVacuum == true. I also added this skipping logic where PGPROC->xmin is set.

The resulting code passes 'make check' and also seems to do the right thing with respect to its purpose. I tested it by generating a 2M row table 'bigtable' and then running 'vacuum verbose analyse bigtable;' on it (runtime ~25 sec). At the same time I repeatedly ran, in another connection, 'update smalltable set data=data+1; vacuum verbose smalltable;', which was able to free all 16 old rows the whole time, during both the vacuum and analyse phases.

The only surprise was that it did not work during a plain 'analyse bigtable;'. It seems that plain analyse does not set use_own_xacts and thus can't make use of this optimisation.

Please comment on the attached patch (against version 8.0.2), especially on what could be broken (WAL, some transaction isolation level, ...?) and how to test whether it is. Also, does this have any chance of getting accepted into 8.1?

-- 
Hannu Krosing <hannu@skype.net>
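Since the attachment itself isn't reproduced here, the following is only a sketch of the shape of the change described; the surrounding loop is approximated from memory against the 8.0-era GetOldestXmin(), so the details are assumptions.

    /* new field in struct PGPROC: */
    bool    inVacuum;       /* true while lazy VACUUM runs its own xact */

    /*
     * In GetOldestXmin(), while scanning the backend array, ignore
     * backends flagged as vacuuming: a lazy VACUUM never creates dead
     * tuples that other backends could see, so its xid/xmin need not
     * hold back tuple reclamation elsewhere.
     */
    if (proc->inVacuum)
        continue;           /* skip this backend's xid/xmin entirely */

    /* ... existing xid/xmin comparisons for ordinary backends ... */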