Thread: An alternative to vacuuming for queue-like tables
Hi all,

Short background: Postgres does not handle queue-type tables very well in an environment where the queue tables are small but heavily inserted/updated/deleted, while other activity in the system causes long-running transactions. The reason is that the queue table cannot be vacuumed properly because of the long-running transactions, so it suffers bloat.

This was quite a big problem for our Postgres systems, until I discovered that there is an alternative to vacuuming, provided the transactions on the queue table are always short. Besides vacuuming the queue table aggressively (which works fine most of the time, when there is currently no long-running transaction), I placed a cron job to cluster the queue table on its PK index. That seems to work well: the table is kept small even in the presence of long-running transactions. The only time this does NOT work is when some long-running transaction actually accesses the queue table itself; a full DB backup is one example. I know this change made the load on one of my systems drop significantly.

HTH, Csaba.
On 2006-04-28 (Fri) at 14:40 +0200, Csaba Nagy wrote:
> I placed a cron job to cluster the queue table on its PK index.

What does that mean?

--
My site myowndictionary (http://www.myowndictionary.com) was made to help students of many languages learn them faster.
On Fri, 2006-04-28 at 15:20, kmh496 wrote:
> On 2006-04-28 (Fri) at 14:40 +0200, Csaba Nagy wrote:
> > I placed a cron job to cluster the queue table on its PK index.
> What does that mean?

It means execute:

    CLUSTER pk_queue_table ON queue_table;

See http://www.postgresql.org/docs/8.1/static/sql-cluster.html for more details on CLUSTER.

This will actually rebuild the table, I guess, once it can lock it exclusively. Because the queue transactions are very short-lived, the exclusive lock is acquired quickly, and the clustering operation is also fast, as the table has only a few live entries.

I'm not sure how this operation can work in the presence of other long-running transactions which have not yet touched the queue table, but it actually does work, I can confirm that. Is it perhaps violating MVCC?

Cheers, Csaba.
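For concreteness, a minimal sketch of the setup being described (the table definition and index name are assumptions for illustration; the indexname-first CLUSTER syntax is the 8.1-era form):

    -- A hypothetical queue table; the serial PK creates the index
    -- queue_table_pkey.
    CREATE TABLE queue_table (
        id      serial PRIMARY KEY,
        payload text NOT NULL
    );

    -- What the cron job runs: take an exclusive lock, rewrite the table
    -- in PK order (rebuilding its indexes), and discard the dead tuples.
    CLUSTER queue_table_pkey ON queue_table;

    -- Once a clustering index has been recorded this way, a bare CLUSTER
    -- re-clusters every previously clustered table in the database.
    CLUSTER;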
On Apr 28, 2006, at 9:32 AM, Csaba Nagy wrote:
> I'm not sure how this operation can work in the presence of other
> long-running transactions which have not yet touched the queue table,
> but it actually does work, I can confirm that. Is it perhaps violating MVCC?

It sounds like it does potentially violate MVCC, but I haven't thought hard about it. It also has the advantage that it rebuilds your indexes during the clustering. And a convenient feature is that you define those CLUSTER statements only once; then, whenever you need it, just run "CLUSTER;" all by itself, and all tables previously defined as clustered will be re-clustered on the specified index.
Csaba Nagy <nagy@ecircle-ag.com> writes:
> I'm not sure how this operation can work in the presence of other
> long-running transactions which have not yet touched the queue table,
> but it actually does work, I can confirm that. Is it perhaps violating MVCC?

Yes :-(. I think you can get away with it if all your transactions that use the queue table run in READ COMMITTED (not serializable) mode, and if they work like

    BEGIN;
    LOCK queue_table IN some-suitable-mode;
    process queue table;
    COMMIT;

Grabbing the lock will ensure that CLUSTER finishes before any snapshot is taken.

			regards, tom lane
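A concrete rendering of that pattern, as a sketch (the lock mode and the queue operation are assumptions; any explicit table-level lock serves the purpose here, since every lock mode conflicts with the ACCESS EXCLUSIVE lock CLUSTER takes):

    BEGIN;  -- must run in READ COMMITTED, per the above
    -- Blocks until any in-progress CLUSTER commits; only afterwards does
    -- the first real statement take its snapshot, so the transaction
    -- never works from a snapshot that predates the rebuild.
    LOCK TABLE queue_table IN ROW EXCLUSIVE MODE;
    -- Hypothetical queue processing: pop the oldest entry.
    DELETE FROM queue_table
     WHERE id = (SELECT min(id) FROM queue_table);
    COMMIT;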
> > but it actually does work, I can confirm that. Is it perhaps violating MVCC?
>
> Yes :-(. I think you can get away with it if all your transactions that
[snip]

Well, I actually don't want to get away with it this time :-) This table is only processed by the queue manager, and that uses very short transactions, basically only atomic inserts/updates/deletes. No long-running transaction will ever touch that table, so this fits my queue cleanup needs perfectly.

Cheers, Csaba.
nagy@ecircle-ag.com (Csaba Nagy) writes:
> On Fri, 2006-04-28 at 15:20, kmh496 wrote:
>> On 2006-04-28 (Fri) at 14:40 +0200, Csaba Nagy wrote:
>> > I placed a cron job to cluster the queue table on its PK index.
>> What does that mean?
>
> It means execute:
>
>     CLUSTER pk_queue_table ON queue_table;
>
> See http://www.postgresql.org/docs/8.1/static/sql-cluster.html for more
> details on CLUSTER.
>
> This will actually rebuild the table, I guess, once it can lock it
> exclusively. Because the queue transactions are very short-lived, the
> exclusive lock is acquired quickly, and the clustering operation is
> also fast, as the table has only a few live entries.
>
> I'm not sure how this operation can work in the presence of other
> long-running transactions which have not yet touched the queue table,
> but it actually does work, I can confirm that. Is it perhaps violating MVCC?

There is, I believe, a problem there: there is a scenario where data can get "dropped out from under" those old connections.

This has been added to the TODO...

http://www.postgresql.org/docs/faqs.TODO.html

* Make CLUSTER preserve recently-dead tuples per MVCC requirements

--
(format nil "~S@~S" "cbbrowne" "ntlug.org")
http://cbbrowne.com/info/sgml.html
Should vegetarians eat animal crackers?
> There is, I believe, a problem there: there is a scenario where data
> can get "dropped out from under" those old connections.
>
> This has been added to the TODO...
>
> http://www.postgresql.org/docs/faqs.TODO.html
>
> * Make CLUSTER preserve recently-dead tuples per MVCC requirements

OK, I can see this being a problem in the general case.

However, for my queue table the current behavior is a life-saver. Would it be possible to still provide a variant of rebuild which is documented as not being MVCC-compliant? Something like: I don't care whether old transactions which have not yet touched this table can see its old data; I just want to rebuild it.

I actually don't care about clustering in this case, only about rebuilding the table once I can get an exclusive lock on it. I guess this would be possible with the "switch to a copy" method, except that the new table is a different entity (as mentioned in another post) and it will not preserve the dependencies of the original table.

I guess what I'm asking for is a kind of "REBUILD TABLE" which is not MVCC-compliant by definition, but which would be useful in the queue table case described above.

Cheers, Csaba.
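For reference, a sketch of the "switch to a copy" method mentioned above, under assumed names; it also illustrates the objection: on 8.1, LIKE copies only the column definitions (plus defaults, if asked), so indexes, constraints, triggers, grants and other dependent objects must be recreated by hand:

    BEGIN;
    -- Keep everyone out while the tables are swapped.
    LOCK TABLE queue_table IN ACCESS EXCLUSIVE MODE;
    CREATE TABLE queue_table_new (LIKE queue_table INCLUDING DEFAULTS);
    -- Copies only the currently visible rows; dead tuples stay behind
    -- and vanish with the DROP.
    INSERT INTO queue_table_new SELECT * FROM queue_table;
    DROP TABLE queue_table;
    ALTER TABLE queue_table_new RENAME TO queue_table;
    -- Dependent objects must be rebuilt manually, e.g. the PK:
    ALTER TABLE queue_table ADD PRIMARY KEY (id);
    COMMIT;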
On Friday 28 April 2006 12:20, Csaba Nagy wrote:
> > There is, I believe, a problem there: there is a scenario where data
> > can get "dropped out from under" those old connections.
> >
> > This has been added to the TODO...
> >
> > http://www.postgresql.org/docs/faqs.TODO.html
> >
> > * Make CLUSTER preserve recently-dead tuples per MVCC requirements
>
> OK, I can see this being a problem in the general case.
>
> However, for my queue table the current behavior is a life-saver. Would
> it be possible to still provide a variant of rebuild which is documented
> as not being MVCC-compliant? Something like: I don't care whether old
> transactions which have not yet touched this table can see its old
> data; I just want to rebuild it.
>
> I actually don't care about clustering in this case, only about
> rebuilding the table once I can get an exclusive lock on it. I guess
> this would be possible with the "switch to a copy" method, except that
> the new table is a different entity (as mentioned in another post) and
> it will not preserve the dependencies of the original table.
>
> I guess what I'm asking for is a kind of "REBUILD TABLE" which is not
> MVCC-compliant by definition, but which would be useful in the queue
> table case described above.

vacuum full?

--
Robert Treat
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
Csaba Nagy wrote:
>> There is, I believe, a problem there: there is a scenario where data
>> can get "dropped out from under" those old connections.
>>
>> This has been added to the TODO...
>>
>> http://www.postgresql.org/docs/faqs.TODO.html
>>
>> * Make CLUSTER preserve recently-dead tuples per MVCC requirements
>
> OK, I can see this being a problem in the general case.
>
> However, for my queue table the current behavior is a life-saver. Would
> it be possible to still provide a variant of rebuild which is documented
> as not being MVCC-compliant? Something like: I don't care whether old
> transactions which have not yet touched this table can see its old
> data; I just want to rebuild it.
>
> I actually don't care about clustering in this case, only about
> rebuilding the table once I can get an exclusive lock on it. I guess
> this would be possible with the "switch to a copy" method, except that
> the new table is a different entity (as mentioned in another post) and
> it will not preserve the dependencies of the original table.
>
> I guess what I'm asking for is a kind of "REBUILD TABLE" which is not
> MVCC-compliant by definition, but which would be useful in the queue
> table case described above.

The general problem seems to be that a transaction has no way to promise never to touch a specific table. Maybe some kind of "negative lock" would help here - you'd do "exclude table foo from transaction" at the start of your transaction, which would cause Postgres to raise an error if you indeed tried to access that table. Vacuum could then ignore your transaction when deciding which tuples it can safely remove from table foo.

This would be a nice feature IMHO. Would it be possible to do something like that, or am I overlooking something?

greetings, Florian Pflug
"Florian G. Pflug" <fgp@phlo.org> writes: > The general problem seems to be that a transaction has no way to promise > never to touch a specific table. Maybe some kind of "negative lock" > would help here - you'd do "exclude table foo from transaction" at the > start of your transaction, which would cause postgres to raise an error > if you indeed tried to access that table. Vacuum could then ignore your > transaction when deciding which tuples it can safely remove from the > table foo. Unfortunately that really wouldn't help VACUUM at all. The nasty problem for VACUUM is that what it has to figure out is not the oldest transaction that it thinks is running, but the oldest transaction that anyone else thinks is running. So what it looks through PGPROC for is not the oldest XID, but the oldest XMIN. And even if it excluded procs that had promised not to touch the target table, it would find that their XIDs had been factored into other processes' XMINs, resulting in no improvement. As a comparison point, VACUUM already includes code to ignore backends in other databases (if it's vacuuming a non-shared table), but it turns out that that code is almost entirely useless :-(, because those other backends still get factored into the XMINs computed by backends that are in the same database as VACUUM. We've speculated about fixing this by having each backend compute and advertise both "global" and "database local" XMINs, but the extra cycles that'd need to be spent in *every* snapshot computation seem like a pretty nasty penalty. And the approach certainly does not scale to anything like per-table exclusions. regards, tom lane
> > I guess what I'm asking for is a kind of "REBUILD TABLE" which is not
> > MVCC-compliant by definition, but which would be useful in the queue
> > table case described above.
>
> vacuum full?

Nope, it won't work: it will still leave in place all the dead tuples that are potentially visible to old transactions, even if those transactions have not yet touched the vacuumed table. In my case that means, for some tables, sometimes 1000x bloat, given that the table always has few active rows and high insert/update/delete rates...

CLUSTER locks the table exclusively, which means it waits until all transactions which actually touched the table are finished, and then rebuilds the table, discarding the dead rows without caring about old transactions which could still see them (at least this is how I think it works, based on my observations). Perfect for my purpose, but not MVCC...

Cheers, Csaba.
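The scenario can be sketched in two psql sessions (object names are assumptions; the bare-table CLUSTER call relies on a clustering index having been recorded earlier, as in the setup sketch above):

    -- Session 1: a long-running transaction that never touches the
    -- queue, e.g. a backup. Its snapshot holds back the XMIN horizon
    -- for every table.
    BEGIN;
    SELECT count(*) FROM some_big_table;
    -- ...left open...

    -- Session 2: queue churn meanwhile creates dead tuples that VACUUM,
    -- plain or full, must retain because session 1 might still see them.
    INSERT INTO queue_table (payload) VALUES ('job 1');
    DELETE FROM queue_table;
    VACUUM FULL queue_table;  -- the dead tuples survive; bloat remains
    CLUSTER queue_table;      -- rebuilds anyway, discarding them: the
                              -- non-MVCC-compliant behavior at issue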
On Sat, Apr 29, 2006 at 06:39:21PM -0400, Tom Lane wrote:
> "Florian G. Pflug" <fgp@phlo.org> writes:
> > The general problem seems to be that a transaction has no way to
> > promise never to touch a specific table. Maybe some kind of "negative
> > lock" would help here - you'd do "exclude table foo from transaction"
> > at the start of your transaction, which would cause Postgres to raise
> > an error if you indeed tried to access that table. Vacuum could then
> > ignore your transaction when deciding which tuples it can safely
> > remove from table foo.
>
> Unfortunately that really wouldn't help VACUUM at all. The nasty
> problem for VACUUM is that what it has to figure out is not the oldest
> transaction that it thinks is running, but the oldest transaction that
> anyone else thinks is running. So what it looks through PGPROC for is
> not the oldest XID, but the oldest XMIN. And even if it excluded procs
> that had promised not to touch the target table, it would find that
> their XIDs had been factored into other processes' XMINs, resulting
> in no improvement.
>
> As a comparison point, VACUUM already includes code to ignore backends
> in other databases (if it's vacuuming a non-shared table), but it turns
> out that that code is almost entirely useless :-(, because those other
> backends still get factored into the XMINs computed by backends that are
> in the same database as VACUUM. We've speculated about fixing this by
> having each backend compute and advertise both "global" and
> "database-local" XMINs, but the extra cycles that'd need to be spent in
> *every* snapshot computation seem like a pretty nasty penalty. And the
> approach certainly does not scale to anything like per-table exclusions.

I'd actually been thinking about this recently, and had come up with the following half-baked ideas:

Allow a transaction to specify exactly which tables it will be touching, perhaps as an extension to BEGIN. Should any action that transaction takes attempt to access a table not specified, throw an error. A possible variant would be to automatically determine, at transaction start, all the tables that would be accessed by that transaction.

Once that list is available, vacuum should be able to use it to ignore any transactions that have promised not to touch whatever table it's vacuuming. While this doesn't help with transactions that don't make any promises, for short-running transactions we don't really care. As long as all long-running transactions state their intentions, it should allow for useful vacuuming of queue tables and the like.

Given that we don't care which tables a short-running transaction will access, we could delay the (probably expensive) determination of which tables a backend will access until the transaction is over a specific age. At that point the list of tables could be built and (theoretically) the transaction's XMIN adjusted accordingly.

Unfortunately this won't help with pg_dump right now. But an extension would be to allow passing a table order into pg_dump so that it dumps queue tables first. Once a table is dumped, pg_dump shouldn't need to access it again, so it's theoretically possible to make the "I promise not to access these tables" list dynamic during the life of a transaction; as pg_dump finishes with tables, it could then promise not to touch them again.

Or maybe a better idea would just be to come up with some other means for people to do things like queue tables and session tables...

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
"Jim C. Nasby" <jnasby@pervasive.com> writes: > I'd actually been thinking about this recently, and had come up with the > following half-baked ideas: > Allow a transaction to specify exactly what tables it will be touching, > perhaps as an extension to BEGIN. Should any action that transaction > takes attempt to access a table not specified, throw an error. > A possible variant on that would be to automatically determine at > transaction start all the tables that would be accessed by that > transaction. > Once that list is available, vacuum should be able to use it to ignore > any transactions that have promised not to touch whatever table it's > vacuuming. No, you missed my point entirely. The above would help not at all, unless the restrictions were somehow propagated through XMIN calculations, which seems impracticable. (Every backend calculate a separate XMIN with respect to every table that's being mentioned by any other backend? I don't think so...) regards, tom lane
Tom Lane wrote:
> "Jim C. Nasby" <jnasby@pervasive.com> writes:
>
>> I'd actually been thinking about this recently, and had come up with
>> the following half-baked ideas:
>
>> Allow a transaction to specify exactly which tables it will be
>> touching, perhaps as an extension to BEGIN. Should any action that
>> transaction takes attempt to access a table not specified, throw an
>> error.
>
>> A possible variant would be to automatically determine, at transaction
>> start, all the tables that would be accessed by that transaction.
>
>> Once that list is available, vacuum should be able to use it to ignore
>> any transactions that have promised not to touch whatever table it's
>> vacuuming.
>
> No, you missed my point entirely. The above would help not at all,
> unless the restrictions were somehow propagated through XMIN
> calculations, which seems impracticable. (Every backend calculating a
> separate XMIN with respect to every table that's being mentioned by any
> other backend? I don't think so...)

So basically the problem is that calculating a cluster-wide XMIN is fine from a performance point of view, but too coarse from a vacuum point of view. OTOH, calculating a table-specific XMIN is fine-grained enough for vacuum to clean queue-like tables efficiently, but bad from a performance point of view.

What about doing the XMIN calculation and exporting per tablespace instead? This would be more generic than the local/global split, because all global objects are in the pg_global tablespace, I believe. OTOH, most people probably don't have more than a handful of tablespaces, which seems a manageable amount...

greetings, Florian Pflug
On Thu, May 04, 2006 at 03:30:34PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <jnasby@pervasive.com> writes:
> > I'd actually been thinking about this recently, and had come up with
> > the following half-baked ideas:
> >
> > Allow a transaction to specify exactly which tables it will be
> > touching, perhaps as an extension to BEGIN. Should any action that
> > transaction takes attempt to access a table not specified, throw an
> > error.
> >
> > A possible variant would be to automatically determine, at transaction
> > start, all the tables that would be accessed by that transaction.
> >
> > Once that list is available, vacuum should be able to use it to ignore
> > any transactions that have promised not to touch whatever table it's
> > vacuuming.
>
> No, you missed my point entirely. The above would help not at all,
> unless the restrictions were somehow propagated through XMIN
> calculations, which seems impracticable. (Every backend calculating a
> separate XMIN with respect to every table that's being mentioned by any
> other backend? I don't think so...)

I mentioned it further down the post, along with the idea that we really don't care about short-lived transactions, so theoretically we could compute this information only for transactions that have been running for more than some period of time. Presumably the overhead of calculating a separate XMIN for each table wouldn't be that great for a transaction that's already been running 15 seconds...

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461