Thread: synchronized scans for VACUUM
Previous thread for reference:
http://archives.postgresql.org/pgsql-patches/2007-06/msg00096.php

The objections to synchronized scans for VACUUM as listed in that thread
(summary):

1. vacuum sometimes progresses faster than a regular heapscan, because
it doesn't need to check WHERE clauses, etc.
2. vacuum takes breaks from the scan to clean up the indexes when it
runs out of maintenance_work_mem.
3. vacuum takes breaks for the cost delay
4. vacuum will dirty a lot of the blocks as it goes, and that will cause
some kind of interaction with the ring buffer

I'd like to address these one by one to see what problems are really in
our way:

1. This would mean that it's not an I/O limited scan. I think as long as
we're talking about regular table scans that can benefit from
synchronized scanning, a vacuum of the same table would also benefit. A
microbenchmark could show whether some benefit exists or not.
2. There have been suggestions about a more compact representation for
the tuple id list. If this works, it will solve this problem.
3. Offering synchronized vacuums could reduce the need for these
elective pauses.
4. This probably has more to do with the buffer ring than synchronized
scans. There could be some bad interaction there, but I don't see that
it's clearly bad.

Additionally, with the possible exception of #4, I don't see the
situation being worse than it is currently.

Thoughts?

Regards,
	Jeff Davis

Jeff Davis <pgsql@j-davis.com> writes:
> The objections to synchronized scans for VACUUM as listed in that thread
> (summary):

> 2. vacuum takes breaks from the scan to clean up the indexes when it
> runs out of maintenance_work_mem.

> 2. There have been suggestions about a more compact representation for
> the tuple id list. If this works, it will solve this problem.

It will certainly not "solve" the problem. What it will do is mean that
the breaks are further apart and longer, which seems to me to make the
conflict with syncscan behavior worse not better.

> 3. vacuum takes breaks for the cost delay

> 3. Offering synchronized vacuums could reduce the need for these
> elective pauses.

How so? A vacuum that happens not to be part of a syncscan herd is
going to be just as bad for system performance as ever.

It still seems to me that vacuum is unlikely to be a productive member
of a syncscan herd --- it just isn't going to have similar scan-speed
behavior to typical queries.

			regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > Jeff Davis <pgsql@j-davis.com> writes: >> The objections to synchronized scans for VACUUM as listed in that thread >> (summary): > >> 2. vacuum takes breaks from the scan to clean up the indexes when it >> runs out of maintenance_work_mem. > >> 2. There have been suggestions about a more compact representation for >> the tuple id list. If this works, it will solve this problem. > > It will certainly not "solve" the problem. What it will do is mean that > the breaks are further apart and longer, which seems to me to make the > conflict with syncscan behavior worse not better. How would it make them longer? They still have the same amount of i/o to do scanning the indexes. I suppose they would dirty more pages which might slow them down? In any case I think the representation you proposed back when this idea last came up was so compact that pretty much any size table ought to be representable in a reasonable work_mem -- at least for the kind of machine which would normally be dealing with that size table. > It still seems to me that vacuum is unlikely to be a productive member > of a syncscan herd --- it just isn't going to have similar scan-speed > behavior to typical queries. That's my thinking too. Our general direction has been toward reducing vacuum's i/o bandwidth requirements, not worrying about making it run as fast as possible. That said if it happened to latch on to a sync scan herd it would have very few cache misses which would cause it to rack up very few vacuum cost delay points. Perhaps the vacuum cost delay for a cache hit ought to be 0? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Get trained by Bruce Momjian - ask me about EnterpriseDB'sPostgreSQL training!

Gregory Stark <stark@enterprisedb.com> writes:
>> It will certainly not "solve" the problem. What it will do is mean that
>> the breaks are further apart and longer, which seems to me to make the
>> conflict with syncscan behavior worse not better.

> How would it make them longer? They still have the same amount of i/o to do
> scanning the indexes. I suppose they would dirty more pages which might slow
> them down?

More tuples to delete = more writes (in WAL, if not immediately in the
index itself) = longer to complete the indexscan. It's still cheaper
than doing multiple indexscans, of course, but my point is that the
index-fixing work gets concentrated.

			regards, tom lane
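
The "further apart and longer" argument can be made concrete with some
back-of-the-envelope arithmetic. The sketch below assumes that each
dead tuple id costs 6 bytes in the plain array representation, and it
uses a purely hypothetical 1 byte per tuple id to stand in for a more
compact scheme. The function name index_passes, the 64 MB memory budget
and the dead-tuple count are illustrative assumptions, not measurements.

```python
def index_passes(dead_tuples: int, mem_bytes: int, bytes_per_tid: int):
    """Return (number of index-cleanup passes, max tuples deleted in one pass).

    The heap scan pauses for an index pass each time the tuple id list
    fills the memory budget, plus once at the end for any remainder.
    """
    capacity = mem_bytes // bytes_per_tid      # tuple ids that fit in memory
    passes = -(-dead_tuples // capacity)       # ceiling division
    return passes, min(capacity, dead_tuples)


MEM = 64 * 1024 * 1024       # assumed maintenance_work_mem of 64 MB
DEAD = 100_000_000           # assumed number of dead tuples in the table

plain = index_passes(DEAD, MEM, 6)     # 6-byte tuple ids
compact = index_passes(DEAD, MEM, 1)   # hypothetical 1 byte per tuple id
```

With these numbers the plain array needs 9 passes of at most 11,184,810
deletions each, while the hypothetical compact form needs 2 passes of up
to 67,108,864 deletions: fewer breaks in the heap scan, but each one
carries several times as much index-fixing work.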