Thread: Non-transactional pg_class, try 2
Here I repost the patch to implement non-transactional catalogs, the first of which is pg_ntclass, intended to hold the non-transactional info about pg_class (reltuples, relpages). pg_ntclass is a relation of a new relkind, RELKIND_NON_TRANSACTIONAL (ideas for shorter names welcome). In pg_class, we store a TID to the corresponding tuple. The tuples are not cached; they are obtained by heap_fetch() each time they are requested. This may be worth reconsideration. heap_update refuses to operate on a non-transactional catalog, because there's no (easy) way to update pg_class accordingly. This normally shouldn't be a problem. vac_update_relstats updates the tuple by using the new heap_inplace_update call. VACUUM FULL also refuses to operate on these tables, and ANALYZE silently skips them. Only plain VACUUM cleans them. Note that you can DELETE from pg_ntclass. Not sure if we should disallow it somehow, because it's not easy to get out from that if you do. (But it's possible -- just insert enough tuples until you reach the needed TID, and then delete the ones that are not pointed by any pg_class row). Regression test pass; I updated the stats test because it was accessing pg_class.relpages. So there's already a test to verify that it's working. There is one caveat that I'm worried about. I had to add a "typedef" to pg_class.h to put ItemPointerData in FormData_pg_class, because the C struct doesn't recognize the "tid" type, but the bootstrap type system does not recognize ItemPointerData as a valid type. I find this mighty ugly because it will have side effects whenever we #include pg_class.h (which is virtually anywhere, since that header is #included in htup.h which in turn is included almost everywhere). Suggestions welcome. Maybe this is not a problem. Other two caveats are: 1. During bootstrap, RelationBuildLocalRelation creates nailed relations with hardcoded TID=(0,1). This is because we don't have access to pg_class yet, so we can't find the real pointer; and furthermore, we are going to fix the entries later in the bootstrapping process. 2. The whole VACUUM/VACUUM FULL/ANALYZE relation list stuff is pretty ugly as well; and autovacuum is skipping pg_ntclass (really all non-transactional catalogs) altogether. We could improve the situation by introducing some sort of struct like {relid, relkind}, so that vacuum_rel could know what relkind to expect, and it could skip non-transactional catalogs cleanly in vacuum full and analyze. I intend to apply this patch by tuesday or wednesday, unless an objection is raised prior to that.
Attachment
Alvaro Herrera wrote: > Here I repost the patch to implement non-transactional catalogs, the > first of which is pg_ntclass, intended to hold the non-transactional > info about pg_class (reltuples, relpages). I forgot to attach the new file pg_ntclass.h (src/include/catalog). Here it is.
Attachment
Hi, This is the relminxid patch corresponding to the pg_ntclass patch I just posted. Obviously, the relminxid and relvacuumxid fields are in pg_ntclass (not pg_class like in the previous incarnations of this patch). This makes the whole business much saner and now I don't need to insert bogus CommandCounterIncrement calls. Regression tests pass. The thing that bothers me most about this is that it turns LockRelation into an operation that needs to heap_fetch() from pg_ntclass in some cases, and possibly update it. I think we should consider some sort of "non-transactional shared cache" for storing RELKIND_NON_TRANSACTIONAL catalog entries. Eventually it may help the sequences stuff as well, if we implement sequences using that kind of catalog. The documentation changes may be a bit off in this patch, since I didn't worry about merging it with the pg_ntclass patch. But it's easy to correct and I'll do it before committing it. My intention is to wait two or three days after committing the pg_ntclass patch, and then commit this one, unless I hear objections before that.
Attachment
Alvaro Herrera <alvherre@commandprompt.com> writes: > I forgot to attach the new file pg_ntclass.h (src/include/catalog). > Here it is. Couple thoughts about this: * I still suggest calling it pg_class_nt not pg_ntclass; that naming convention seems like it will scale better if there are more nontransactional "appendage" relations. I'm surprised you didn't already need to invent pg_database_nt, for instance ... don't datvacuumxid and datfrozenxid need to be nontransactional? * The DATA() entries for the bootstrapped relations ought to be commented as to which rels they belong to (corresponding to the hardwired TIDs in pg-class.h): DATA(insert ( 0 0 )); /* pg_type */ etc regards, tom lane
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > VACUUM FULL also refuses to operate on these tables, and ANALYZE > silently skips them. Only plain VACUUM cleans them. I wonder whether VACUUM FULL applied to an NT table shouldn't just act like plain VACUUM instead. Probably not very important though. > Note that you can DELETE from pg_ntclass. Not sure if we should > disallow it somehow, because it's not easy to get out from that if you > do. No worse than DELETE FROM pg_class ;-). Please verify that the aclcheck prohibitions on changing system catalogs are properly applied to these catalogs too. > There is one caveat that I'm worried about. I had to add a "typedef" to > pg_class.h to put ItemPointerData in FormData_pg_class, because the C > struct doesn't recognize the "tid" type, but the bootstrap type system > does not recognize ItemPointerData as a valid type. I find this mighty > ugly because it will have side effects whenever we #include pg_class.h > (which is virtually anywhere, since that header is #included in htup.h > which in turn is included almost everywhere). Suggestions welcome. > Maybe this is not a problem. Would it work to do #define tid ItemPointerData ... tid relntrans; ... #undef tid ? I'm not sure whether this will cause the right things to appear in the .bki file, but if it does then the #undef would limit the scope of the "tid" name. In any case, the only thing uglier than a hack is an uncommented hack ;-) ... the typedef or macro needs to have a comments explaining what and why. The *real* problem with what you've done is that pg_class.h now depends on having ItemPointerData defined before it's included, which creates an inclusion ordering dependency that should not exist. If you stick with either of these approaches then pg_class.h needs to #include whatever defines ItemPointerData. I notice that postgres.h defines a typedef for aclitem to work around a similar issue. Is it reasonable to move ItemPointerData into postgres.h so we could put the tid typedef beside the aclitem one? > Other two caveats are: > 1. During bootstrap, RelationBuildLocalRelation creates nailed relations > with hardcoded TID=(0,1). This is because we don't have access to > pg_class yet, so we can't find the real pointer; and furthermore, we are > going to fix the entries later in the bootstrapping process. This seems dangerous; can't you set it to InvalidItemPointer instead? If it's not used before fixed, this doesn't matter, and if someone *does* try to use it, that will catch the problem. > 2. The whole VACUUM/VACUUM FULL/ANALYZE relation list stuff is pretty > ugly as well; and autovacuum is skipping pg_ntclass (really all > non-transactional catalogs) altogether. We could improve the situation > by introducing some sort of struct like {relid, relkind}, so that > vacuum_rel could know what relkind to expect, and it could skip > non-transactional catalogs cleanly in vacuum full and analyze. Need to do something about this. pg_ntclass will contain XIDs (of inserting/deleting transactions) so it MUST get vacuumed to be sure we don't expose ourselves to XID wrap problems. regards, tom lane
Alvaro Herrera <alvherre@commandprompt.com> writes: > This is the relminxid patch corresponding to the pg_ntclass patch I just > posted. That disable_heap_unfreeze thing seriously sucks. How bad are the API changes needed to pass that as a parameter instead of having a very-dangerous global variable? The comment at line 328ff in dbcommands.c seems misguided, which makes me doubt the code too. datfrozenxid and datvacuumxid should be considered as indicating what XIDs appear inside the database, not what is in its pg_database row. The changes in vacuum.c are far too extensive to review meaningfully. What did you do, and did it really need to touch so much code? > The thing that bothers me most about this is that it turns LockRelation > into an operation that needs to heap_fetch() from pg_ntclass in some > cases, and possibly update it. Have you done any profiling to see what that actually costs? I think we could possibly dodge the work in the normal case if we are willing to make VACUUM FREEZE take ExclusiveLock and send out a relation cache inval on the relation. Then, we can cache the pg_ntclass tuple in relcache entries (discarding it on cache inval), and if the cached value says it's not frozen then it's not frozen. You couldn't trust the cached value much further than that, but that would at least take the fetch out of the normal path in LockRelation. The trick here is the problem that if VACUUM FREEZE fails after modifying pg_ntclass, its relcache inval won't be sent out. A bigger issue here is that I'm not sure what the locking protocol is for pg_ntclass itself. It looks like you're not consistently taking a RowExclusiveLock when you update it. BTW, I think you have the order of operations wrong in LockRelation; should it not unfreeze only *after* obtaining lock? Consider race condition against relation drop for instance. regards, tom lane
Alvaro Herrera <alvherre@commandprompt.com> writes: > No, actually it's correct. The point of that comment is that if the > source database is frozen, then all XIDs appearing inside both databases > (source and newly created) are frozen. So it's possible that the row in > pg_database is frozen as well. But because we are creating a new row in > pg_database, it's not really frozen any longer; so we change the > pg_database fields in the new row to match. No, this only says that pg_database has to be unfrozen. If the source DB is frozen then the clone is frozen too. >> The changes in vacuum.c are far too extensive to review meaningfully. >> What did you do, and did it really need to touch so much code? > Yeah, they are extensive. ... > Maybe I should take a stab at making incremental patches instead of > doing everything in one patch. This way it would be easier to review > for correctness (and I'd be more confident that it is actually correct > as well). Please. I've got no confidence that I see what's going on there. regards, tom lane
Tom Lane wrote: > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > Would it work to do > > #define tid ItemPointerData > ... > tid relntrans; > ... > #undef tid > ? Yeah, it probably would. I'll try. > The *real* problem with what you've done is that pg_class.h now depends > on having ItemPointerData defined before it's included, which creates an > inclusion ordering dependency that should not exist. If you stick with > either of these approaches then pg_class.h needs to #include whatever > defines ItemPointerData. storage/itemptr.h is #included in pg_class.h (first chunk of the patch). > > Other two caveats are: > > 1. During bootstrap, RelationBuildLocalRelation creates nailed relations > > with hardcoded TID=(0,1). This is because we don't have access to > > pg_class yet, so we can't find the real pointer; and furthermore, we are > > going to fix the entries later in the bootstrapping process. > > This seems dangerous; can't you set it to InvalidItemPointer instead? > If it's not used before fixed, this doesn't matter, and if someone > *does* try to use it, that will catch the problem. Doesn't work because the bootstrap system actually _writes_ there :-( A workaround could be to disable writing in bootstrapping mode, and store InvalidItemPointer. (Actually storing InvalidItemPointer was the first thing I did, but it crashed on bootstrap.) I wasn't worried about bootstrap writing invalid values, because the correct values are written shortly after (at the latest, when vacuum is run by initdb). And if I set it to Invalid and have bootstrap mode skip writing, exactly the same thing will happen ... > > 2. The whole VACUUM/VACUUM FULL/ANALYZE relation list stuff is pretty > > ugly as well; and autovacuum is skipping pg_ntclass (really all > > non-transactional catalogs) altogether. We could improve the situation > > by introducing some sort of struct like {relid, relkind}, so that > > vacuum_rel could know what relkind to expect, and it could skip > > non-transactional catalogs cleanly in vacuum full and analyze. > > Need to do something about this. pg_ntclass will contain XIDs (of > inserting/deleting transactions) so it MUST get vacuumed to be sure > we don't expose ourselves to XID wrap problems. Oh, certainly it does get vacuumed. vacuum.c is modified so that non-transactional catalogs are vacuumed (in database-wide VACUUM for instance). The only thing I was stating above was that the way vacuum.c handles the list of relations, is a bit of a mess, because vacuum_rel wants to check the relkind but get_oids_list forms only a single list and it's assumed that they are all RELKIND_RELATION rels. I had to modify it a bit so that NON_TRANSACTIONAL rels are also included in that list, and therefore the check had to be relaxed. I also made ANALYZE silently skip non-transactional catalogs, in an similarly ugly way. I don't remember what was the rationale for this -- certainly there isn't any harm. But on the other hand, analyzing it serves no purpose since the statistics are not used for anything.
Alvaro Herrera <alvherre@commandprompt.com> writes: > Tom Lane wrote: >> Alvaro Herrera <alvherre@alvh.no-ip.org> writes: >>> Other two caveats are: >>> 1. During bootstrap, RelationBuildLocalRelation creates nailed relations >>> with hardcoded TID=(0,1). >> >> This seems dangerous; can't you set it to InvalidItemPointer instead? >> If it's not used before fixed, this doesn't matter, and if someone >> *does* try to use it, that will catch the problem. > Doesn't work because the bootstrap system actually _writes_ there :-( A > workaround could be to disable writing in bootstrapping mode, and store > InvalidItemPointer. (Actually storing InvalidItemPointer was the first > thing I did, but it crashed on bootstrap.) Or, set it to (0,1) and reserve that TID as a dummy entry. What I'm afraid of here is scribbling on some other relation's entry. I'd like to see some defense against that, don't much care what. We do plenty of disable-this-in-bootstrap-mode checks, so one more doesn't seem like a problem; so the first solution may be better. regards, tom lane
Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > This is the relminxid patch corresponding to the pg_ntclass patch I just > > posted. > > That disable_heap_unfreeze thing seriously sucks. How bad are the API > changes needed to pass that as a parameter instead of having a > very-dangerous global variable? Let's see -- I would need to fix all callers of LockRelation, and the problem I found in an earlier version of the patch (before the invention of the non-transaction stuff) was that some callers needed to pass that information several levels down. It's possible that this was an artifact of the fact that it was using the relcache. I'll experiment with changing stuff so that the global variable is not needed. > The comment at line 328ff in dbcommands.c seems misguided, which makes > me doubt the code too. datfrozenxid and datvacuumxid should be > considered as indicating what XIDs appear inside the database, not what > is in its pg_database row. No, actually it's correct. The point of that comment is that if the source database is frozen, then all XIDs appearing inside both databases (source and newly created) are frozen. So it's possible that the row in pg_database is frozen as well. But because we are creating a new row in pg_database, it's not really frozen any longer; so we change the pg_database fields in the new row to match. Actually, pg_database is going to be unfrozen right after that code, because it's opened with RowExclusiveLock shortly after, precisely to insert that new row we are inserting. So maybe this is not an issue. > The changes in vacuum.c are far too extensive to review meaningfully. > What did you do, and did it really need to touch so much code? Yeah, they are extensive. I did several things there: get rid of a couple of global variables that no longer need to be global; remove the return value from vacuum_rel, since it's no longer needed (it's used to determine whether we can truncate pg_clog, but now we can do it regardless of whether this particular vacuuming took place or not); I changed some variables from the old "frozenXid" name to "minXid"; I put in a hack to make VACUUM FREEZE take a stronger lock; changed the API of vacuum_rel so that instead of taking a specific acceptable relkind, it receives whether TOAST is acceptable or not; and added the code needed to keep track of the earliest Xid found in code. But by far the most extensive change is the melding of vac_update_dbstats into vac_update_dbminxid, and the update of vac_update_relstats to cope with pg_ntclass. Maybe I should take a stab at making incremental patches instead of doing everything in one patch. This way it would be easier to review for correctness (and I'd be more confident that it is actually correct as well). > > The thing that bothers me most about this is that it turns LockRelation > > into an operation that needs to heap_fetch() from pg_ntclass in some > > cases, and possibly update it. > > Have you done any profiling to see what that actually costs? No, but I guess it must be expensive. While relminxid was still in the relcache, it was cheap because we checked the value before having to actually do anything else. That's why I was suggesting having a separate cache for non-transactional stuff. > I think we could possibly dodge the work in the normal case if we are > willing to make VACUUM FREEZE take ExclusiveLock and send out a relation > cache inval on the relation. Well, one problem is that if enough transactions pass after the last update to a table, a normal VACUUM (i.e. not FREEZE) could mark a table as frozen as well; marking frozen is not an exclusive property of VACUUM FREEZE. > BTW, I think you have the order of operations wrong in LockRelation; > should it not unfreeze only *after* obtaining lock? Consider race > condition against relation drop for instance. Hmm, good point. I think it was OK (and actually, it was required) while relminxid was still on pg_class; or rather, there was a race condition the other way around, so it was required to unfreeze the table _before_ obtaining the lock. But it's certainly wrong now. I'll work on pg_class_nt and I'll be back to this soon. Thanks for the review.
Tom Lane wrote: > Or, set it to (0,1) and reserve that TID as a dummy entry. What I'm > afraid of here is scribbling on some other relation's entry. I'd like > to see some defense against that, don't much care what. > > We do plenty of disable-this-in-bootstrap-mode checks, so one more > doesn't seem like a problem; so the first solution may be better. New version of the patch, including fixes to all the feedback you provided. Thanks! I used a dummy entry in (0,1), which seems cleaner to me (the index-creation stuff in bootstrap is apparently still needed to generate sinval messages, so it's not as easy as returning early from the function). Maybe we could include a step in initdb to get rid of it, but it doesn't seem too much of an issue.
Attachment
On Sun, 2006-06-11 at 17:53 -0400, Alvaro Herrera wrote: > Here I repost the patch to implement non-transactional catalogs, the > first of which is pg_ntclass, intended to hold the non-transactional > info about pg_class (reltuples, relpages). Would it be possible to get a summary of what this new feature gives us? I'm trying to follow the implementation but the why of it seems to have been buried in the detail. Will a user be able to update reltuples and relpages manually? Thanks. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Alvaro Herrera wrote: > Tom Lane wrote: > > Alvaro Herrera <alvherre@commandprompt.com> writes: > > > What I'm after is not freezing for read-only media, nor archive, nor > > > read-only tables. What I'm after is removing the requirement that all > > > databases must be vacuumed wholly every 2 billion transactions. > > > > Well, if that's the only goal then I hardly think we need to have a > > discussion, because your alternative #1 is *obviously* the winner: > > > > > 1. Remove the special case, i.e., process frozen databases in VACUUM > > > like every other database. > > > This is the easiest, because no extra logic is needed. Just make > > > sure they are vacuumed in time. The only problem would be that we'd > > > need to uselessly vacuum tables that we know are frozen, from time to > > > time. But then, those tables are probably small, so what's the > > > problem with that? > > I'm happy to do at least this for 8.2. We can still try to do the > non-transactional catalog later, either in this release or the next; the > code is almost there, and it'll be easier to discuss/design because > we'll have taken the relminxid stuff out of the way. I attach a patch pursuant to this idea (lacking doc patches for the maintenance section.) This patch has the nasty side effect mentioned above -- people will have to set template0 as connectable and manually run vacuum on it periodically, unless they run autovacuum. A future improvement in this area would be to allow frozen tables and frozen databases, removing this requirement. But I'm inclined to apply it as is, in the spirit of incremental improvement. Objectors please speak up! -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera wrote: > I attach a patch pursuant to this idea (lacking doc patches for the > maintenance section.) Huh, really attached, sorry :-( -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Attachment
Alvaro Herrera <alvherre@commandprompt.com> writes: > This patch has the nasty side effect mentioned above -- people will have > to set template0 as connectable and manually run vacuum on it > periodically, unless they run autovacuum. That's pretty messy --- making template0 connectable is a great way to shoot yourself in the foot. What I'd propose instead is that even if autovacuum is nominally off, the system forces autovacuum when it notices that a non-connectable database is approaching wraparound. In this mode the autovac daemon would be restricted to processing non-connectable databases. (The subtext here is that autovac is the wave of the future anyway.) In fact, maybe we should just force an autovac cycle for any DB that appears to be approaching wraparound, rather than waiting for the shutdown-before-wraparound code to kick in. Getting into that state amounts to whacking DBAs upside the head for being stupid, which doesn't really win us any friends ... Implementation-wise, I'd propose that we add another PostmasterSignal event type whereby a backend could request the postmaster to launch an autovac process even if autovac is off. The end-of-VACUUM code that scans pg_database.datminxid would issue the signal if it finds anything seriously old. regards, tom lane
On Wed, 2006-06-28 at 20:08 -0400, Tom Lane wrote: > In fact, maybe we should just force an autovac cycle for any DB that > appears to be approaching wraparound, rather than waiting for the > shutdown-before-wraparound code to kick in. Getting into that state > amounts to whacking DBAs upside the head for being stupid, which > doesn't really win us any friends ... Yes, please can we have the auto autovacuum cut in rather than the wraparound message? I'd rather have them complain that we did this, than complain that we didn't. Normally, I wouldn't support automatically starting admin tasks without thr sysadmins knowledge. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Tom Lane wrote: > In fact, maybe we should just force an autovac cycle for any DB that > appears to be approaching wraparound, rather than waiting for the > shutdown-before-wraparound code to kick in. Getting into that state > amounts to whacking DBAs upside the head for being stupid, which > doesn't really win us any friends ... Sounds fine. How far back should we allow databases to go? If we wait too long, pg_clog won't be truncated regularly, so I think we should do it rather early than wait until it's close to wraparound. > Implementation-wise, I'd propose that we add another PostmasterSignal > event type whereby a backend could request the postmaster to launch > an autovac process even if autovac is off. The end-of-VACUUM code that > scans pg_database.datminxid would issue the signal if it finds anything > seriously old. I think we could give autovac a "reason for being started", which would normally be the periodic stuff, but if the postmaster got the signal from a backend, pass that info to autovac and it could use a different database selection algorithm -- say, just select the oldest database, even if it's not in danger of Xid wraparound. So this would allow early database-wide vacuums for non-connectable databases (template0), and normal per-table vacuuming for database that are in actual use. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Thu, Jun 29, 2006 at 09:39:27AM +0100, Simon Riggs wrote: > On Wed, 2006-06-28 at 20:08 -0400, Tom Lane wrote: > > In fact, maybe we should just force an autovac cycle for any DB that > > appears to be approaching wraparound, rather than waiting for the > > shutdown-before-wraparound code to kick in. Getting into that state > > amounts to whacking DBAs upside the head for being stupid, which > > doesn't really win us any friends ... > > Yes, please can we have the auto autovacuum cut in rather than the > wraparound message? I'd rather have them complain that we did this, than > complain that we didn't. > > Normally, I wouldn't support automatically starting admin tasks without > thr sysadmins knowledge. I think it'd be good to put a big, fat WARNING in the log if we fire up an autovac to avoid an XID wrap, since it's an indication that the vacuuming scheme that's in place probably isn't good enough. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
"Jim C. Nasby" <jnasby@pervasive.com> writes: > I think it'd be good to put a big, fat WARNING in the log if we fire up > an autovac to avoid an XID wrap, since it's an indication that the > vacuuming scheme that's in place probably isn't good enough. No, for nonconnectable databases it'd be expected behavior (given the proposed patch). Warn away if the db is connectable, though ... regards, tom lane
Alvaro Herrera <alvherre@commandprompt.com> writes: > I think we could give autovac a "reason for being started", which would > normally be the periodic stuff, but if the postmaster got the signal > from a backend, pass that info to autovac and it could use a different > database selection algorithm -- say, just select the oldest database, > even if it's not in danger of Xid wraparound. I don't think it's worth the trouble to provide such a signaling mechanism (it'd be a bit of a PITA to make it work in both fork and EXEC_BACKEND cases). ISTM it's sufficient for autovac to look at the GUC state and notice whether it's nominally enabled or not. If not, it should only consider anti-wraparound vacuum operations. If the signal is given just when VACUUM sees a datvacuumxid value that's old enough to cause autovac to decide it had better go prevent wraparound, then this will work without any additional data transfer. We could negotiate exactly how old DBs need to be to cause this; if you want to make it less than a billion xacts so that we can keep pg_clog cut down to size, that's fine with me. regards, tom lane
Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > I think we could give autovac a "reason for being started", which would > > normally be the periodic stuff, but if the postmaster got the signal > > from a backend, pass that info to autovac and it could use a different > > database selection algorithm -- say, just select the oldest database, > > even if it's not in danger of Xid wraparound. > > I don't think it's worth the trouble to provide such a signaling > mechanism (it'd be a bit of a PITA to make it work in both fork and > EXEC_BACKEND cases). ISTM it's sufficient for autovac to look at the > GUC state and notice whether it's nominally enabled or not. If not, > it should only consider anti-wraparound vacuum operations. > > If the signal is given just when VACUUM sees a datvacuumxid value that's > old enough to cause autovac to decide it had better go prevent > wraparound, then this will work without any additional data transfer. > We could negotiate exactly how old DBs need to be to cause this; if you > want to make it less than a billion xacts so that we can keep pg_clog > cut down to size, that's fine with me. Committed. I didn't do anything but the simplest stuff; autovacuum checks whether it's enabled in GUC, and if it doesn't then it just picks the oldest database and issues a database-wide vacuum. The vac_truncate_clog routine will send a signal at the same time as the WARNING about nearing Xid wraparound would be emitted. One funny thing I noticed is that if there is more than one database needing db-wide vacuum, vacuum will send a signal; autovacuum will process one database; and autovacuum will send a signal as well when it's done because of the other databases. But autovacuum will get the second signal and do nothing, because of the code to check for frequent autovacuum starts. This kind of suprised me at first but it's really to be expected. I considered disabling that timing code in the case of getting the signal, but I don't think it's worth it. One important improvement we'd may want to do is changing vacuum so that it only calls vac_truncate_clog once if invoked by autovacuum (currently it'll be called every time a table is vacuumed). Also I think we could change the OldestXmin stuff so that it only takes database-local transactions into account for non-shared tables. But in the spirit of incremental improvement I think we're much better now. Happy sprinting to everyone, -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.