Thread: Some thoughts about i/o priorities and throttling vacuum
So when I suggested on linux-kernel that vacuum could benefit from some way to prioritize i/o resources, someone suggested vacuum could just throttle its own disk accesses. While I think their conception of vacuum is still broken and the throttling methods they described are the wrong direction, on further thought I think they actually have the right idea.

pg_autovacuum knows at what rate free space has been accumulating. It knows how large the fsm is. It can therefore calculate exactly how much time it has available to complete the next vacuum run before the fsm runs out (assuming free space continues accumulating at a constant rate). If it passed that information on to vacuum, then vacuum could throttle its own disk accesses by, say, reading 64k at a time and then sleeping for a fraction of a second. The time spent sleeping would be calculated to make the vacuum take the required total time.

This would produce a more even and less resource-hogging duty cycle, where vacuum runs continuously at a low level, rather than a duty cycle where it doesn't run at all until it's needed but then floods the disk controllers with continuous sequential reads.

(There are a few details, of course. You would need to leave a safety margin in case free space accumulation speeds up. And accounting for the actual time spent doing the vacuum would make calculating the sleep time tricky. But those seem fairly tractable problems.)

Personally I think i/o priorities give much better leverage. They would let vacuum run as fast as the disk subsystem can handle during idle times, and then fade away as soon as any heavy transaction load appears. But the flip side is that with i/o prioritization vacuum might not actually finish in time.

--
greg
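The arithmetic Greg describes can be sketched roughly as follows. This is a hypothetical illustration, not PostgreSQL code; the function name and parameters are invented. Given the time budget pg_autovacuum derives from the fsm fill rate, the sleep after each 64k read is just the remaining budget spread over the remaining chunks:

```c
#include <assert.h>

/*
 * Hypothetical sketch of the proposed throttling arithmetic.  Given the
 * time budget before the fsm would run out and the amount of table left
 * to scan, spread the remaining budget evenly over the remaining 64k
 * reads.  All names here are invented for illustration.
 */
static double
sleep_per_chunk(double bytes_left, double chunk_bytes,
                double budget_secs, double elapsed_secs)
{
    double chunks_left = bytes_left / chunk_bytes;
    double time_left = budget_secs - elapsed_secs;

    if (chunks_left < 1.0 || time_left <= 0.0)
        return 0.0;             /* behind schedule: stop throttling */
    return time_left / chunks_left;
}
```

The caveats in the message still apply: budget_secs would need a safety margin, and elapsed_secs must include the time spent actually reading, which is what makes the bookkeeping tricky.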
Greg Stark <gsstark@mit.edu> writes:
> ... vacuum could throttle its own disk accesses by, say, reading 64k
> at a time then sleeping for a fraction of a second.
> ...
> Personally I think i/o priorities give much better leverage.

Pie in the sky is great too ;-). But there is no such thing as i/o priorities, at least not in any portable sense.

OTOH I was just musing to myself earlier today that putting a tunable delay into VACUUM's per-page loop might make it more friendly to competing processes. I dunno if it'd work or just be a waste of time, but it does seem worth experimenting with.

Want to try it out and report back?

			regards, tom lane
I think adding a tunable delay to VACUUM's per-page loop will help keep the system responsive at all times. In many cases, especially for mostly read-only tables, plain VACUUM does not need to complete immediately (VACUUM FULL should complete immediately). I would prefer that VACUUM take its sweet time to run as long as it doesn't disrupt other queries. See my other post, "VACUUM degrades performance significantly. Database becomes unusable!", on the pgsql-general mailing list.

Regards, Stephen

"Tom Lane" <tgl@sss.pgh.pa.us> wrote in message news:16818.1066282922@sss.pgh.pa.us...
> Greg Stark <gsstark@mit.edu> writes:
> > ... vacuum could throttle its own disk accesses by, say, reading 64k
> > at a time then sleeping for a fraction of a second.
> > ...
> > Personally I think i/o priorities give much better leverage.
>
> Pie in the sky is great too ;-). But there is no such thing as i/o
> priorities, at least not in any portable sense.
>
> OTOH I was just musing to myself earlier today that putting a tunable
> delay into VACUUM's per-page loop might make it more friendly to
> competing processes. I dunno if it'd work or just be a waste of time,
> but it does seem worth experimenting with.
>
> Want to try it out and report back?
>
> 			regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
>    http://archives.postgresql.org
Stephen wrote:
> I think adding a tunable delay to VACUUM's per-page loop will help keep
> the system responsive at all times. In many cases, especially for
> mostly read-only tables, plain VACUUM does not need to complete
> immediately (VACUUM FULL should complete immediately). I prefer that
> VACUUM takes its sweet time to run as long as it doesn't disrupt other
> queries. See my other post on "VACUUM degrades performance
> significantly. Database becomes unusable!" on the pgsql-general
> mailing list.

Of course, this makes VACUUM run longer, and if you are waiting for it to finish, it would be worse, like if you are running it at night or something. I think the delay has to take into account the number of active transactions or something.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Of course, this makes VACUUM run longer, and if you are waiting for it
> to finish, it would be worse, like if you are running it at night or
> something.

My plan was that the time delay would be a parameter and pg_autovacuum would set it based on the observed rate at which free space is accumulating. Someone could manually specify a delay, but by default it would run with no delay when run on the command line.

> I think the delay has to take into account the number of active
> transactions or something.

That's a possibility. That's actually what the linux-kernel folk suggested. Someone there suggested using aio to carefully schedule i/o only when no i/o was pending from transactions.

But vacuum has no way to judge whether those transactions are really doing much disk i/o or only reading cached blocks, or even whether the disk i/o they're doing is on the same disk. They could also be waiting on the client or on locks from other transactions.

--
greg
On Thu, 2003-10-16 at 16:16, Greg Stark wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Of course, this makes VACUUM run longer, and if you are waiting for it
> > to finish, it would be worse, like if you are running it at night or
> > something.
>
> My plan was that the time delay would be a parameter and pg_autovacuum
> would set it based on the observed rate at which free space is
> accumulating.

I don't know that pg_autovacuum is smart enough to make a good guess as to an appropriate parameter.

> > I think the delay has to take into account the number of active
> > transactions or something.

I think this is a better plan than the pg_autovacuum one; it would also allow vacuum to have a different delay for each loop depending on the current number of transactions.

> But vacuum has no way to judge whether those transactions are really
> doing much disk i/o or only reading cached blocks, or even whether the
> disk i/o they're doing is on the same disk. They could also be waiting
> on the client or on locks from other transactions.

True, it would be a rough estimate, but at least one based on something representative of system I/O load at that moment.
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Of course, this makes VACUUM run longer, and if you are waiting for it
> to finish, it would be worse, like if you are running it at night or
> something.
> I think the delay has to take into account the number of active
> transactions or something.

I was just thinking of a GUC parameter: wait N milliseconds between pages, where N defaults to zero probably. A user who wants to run his vacuum as a background process could set N larger than zero.

I don't believe we are anywhere near being able to automatically adjust the delay based on load, and even if we could, this would ignore the point you make above --- the user's intent has to matter as much as anything else.

			regards, tom lane
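In outline, Tom's suggestion would look something like this. This is only a sketch; vacuum_page_delay and the page-scan placeholder are invented stand-ins for the proposed GUC variable and the real per-page work in VACUUM's loop:

```c
#include <assert.h>
#include <unistd.h>

/* Invented stand-in for the proposed GUC: milliseconds to sleep between
 * pages; zero (the suggested default) means no throttling at all. */
static int vacuum_page_delay = 0;

/* Sketch of a vacuum per-page loop with the tunable delay added.
 * Returns the number of pages processed. */
static long
vacuum_relation(long npages)
{
    long processed = 0;

    for (long page = 0; page < npages; page++)
    {
        /* the real per-page work would go here: find and reclaim
         * dead tuples on this page */
        processed++;

        if (vacuum_page_delay > 0)
            usleep((unsigned int) vacuum_page_delay * 1000);  /* yield the disk */
    }
    return processed;
}
```

With the delay at zero the loop behaves exactly as before; a user running vacuum as a background process would set it higher, trading total runtime for less I/O contention with competing queries.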
Is it possible to have an optional delay in plain VACUUM for each invocation rather than database-wide? Something along the lines of an optional THROTTLE or DELAY parameter for the VACUUM command, where THROTTLE is ignored when FULL or FREEZE is selected:

    VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ THROTTLE ] [ ANALYZE ] [ table [ (column [, ...] ) ] ]

This way autovacuum can still throttle VACUUM as needed in the future (either in contrib or the backend), and administrators can decide to apply different delays to different tables depending on usage.

Regards, Stephen

"Tom Lane" <tgl@sss.pgh.pa.us> wrote in message news:16916.1066349859@sss.pgh.pa.us...
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Of course, this makes VACUUM run longer, and if you are waiting for it
> > to finish, it would be worse, like if you are running it at night or
> > something.
> > I think the delay has to take into account the number of active
> > transactions or something.
>
> I was just thinking of a GUC parameter: wait N milliseconds between
> pages, where N defaults to zero probably. A user who wants to run his
> vacuum as a background process could set N larger than zero. I don't
> believe we are anywhere near being able to automatically adjust the
> delay based on load, and even if we could, this would ignore the point
> you make above --- the user's intent has to matter as much as anything
> else.
>
> 			regards, tom lane
Tom Lane wrote:
> I was just thinking of a GUC parameter: wait N milliseconds between
> pages, where N defaults to zero probably. A user who wants to run his
> vacuum as a background process could set N larger than zero. I don't
> believe we are anywhere near being able to automatically adjust the
> delay based on load, and even if we could, this would ignore the point
> you make above --- the user's intent has to matter as much as anything
> else.

I am slightly confused here. IIRC pg_autovacuum never does a VACUUM FULL. At most it does vacuum / vacuum analyse, neither of which chews disk bandwidth. And if pg_autovacuum is running along with the postmaster all the time, with aggressive polling like every 5 seconds, the database should not accumulate any dead tuples, nor would it suffer xid wraparound, as vacuums are happening constantly.

What's left in the above scenario? As long as all the requirements for pg_autovacuum are met (namely setting it up, setting it up aggressively, and tuning postgresql.conf correctly), vacuum and related problems should be a thing of the past, at least as far as 7.4 and onwards is concerned.

Of course an RSM implementation for vacuum would still be much needed, but right now vacuum does not affect disk IO directly (except for tossing the buffer cache out of whack, that is).

What am I missing?

Shridhar
On Fri, Oct 17, 2003 at 07:04:45PM +0530, Shridhar Daithankar wrote:
> I am slightly confused here. IIRC pg_autovacuum never did a vacuum
> full. At the most it does vacuum / vacuum analyse, none of which chew
> disk bandwidth.

The latter is false. VACUUM FULL certainly uses _more_ disk bandwidth than VACUUM, but it's just false that plain VACUUM doesn't contend for disk. And if you're already maxed, then that extra bandwidth you cannot afford.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                        Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110
Andrew Sullivan wrote:
> The latter is false. VACUUM FULL certainly uses _more_ disk bandwidth
> than VACUUM, but it's just false that plain VACUUM doesn't contend for
> disk. And if you're already maxed, then that extra bandwidth you
> cannot afford.

What part of plain vacuum takes disk bandwidth? WAL? Clog? Certainly not the data files themselves, right?

OK, I understand some systems can be saturated enough to feel the additional WAL/clog burden, but I am genuinely curious: how much disk bandwidth does plain vacuum require, and what are the factors it depends upon?

Shridhar
On Fri, Oct 17, 2003 at 07:04:45PM +0530, Shridhar Daithankar wrote:
> And if pg_autovacuum is running along with postmaster all the time,
> with aggressive polling like 5 sec, the database should not accumulate
> any dead tuples nor would it suffer xid wraparound as vacuums are
> happening constantly.

The database can suffer XID wraparound anyway if there's at least one table without updates, because the autovacuum daemon will never vacuum it (correct me if I'm wrong).

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"He who admits that he is a coward has courage" (Fernandel)
Alvaro Herrera wrote:
> The database can suffer XID wraparound anyway if there's at least one
> table without updates, because the autovacuum daemon will never vacuum
> it (correct me if I'm wrong).

If a table is never updated and hence not vacuumed at all, why would it be involved in a transaction that would suffer xid wraparound? pg_autovacuum takes care of inserts/updates/deletes. If a table never participates in the above three, and hence escapes pg_autovacuum, doesn't it also escape xid wraparound?

Shridhar
On Fri, Oct 17, 2003 at 07:41:38PM +0530, Shridhar Daithankar wrote:
> Alvaro Herrera wrote:
> > The database can suffer XID wraparound anyway if there's at least one
> > table without updates, because the autovacuum daemon will never
> > vacuum it (correct me if I'm wrong).
>
> If a table is never updated and hence not vacuumed at all, why would it
> be involved in a transaction that would suffer xid wraparound?

Because the tuples in it were involved in some insert operation at some time (else the table would not have any tuples). So it _has_ to be vacuumed, else you run the risk of losing the tuples when the wraparound happens. (Sorry, I don't know how to explain this better.)

Maybe in this case it's best to do a VACUUM FREEZE; that'd ensure that the table would never ever need a vacuum again until it suffers an insert, delete or update. Perhaps the autovacuum daemon could detect the case where a table has only very old tuples and freeze it.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"The number of UNIX installations has risen to 10, and this number is expected to grow" (UPM, 1972)
Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes:
> What part of plain vacuum takes disk bandwidth?

Reading (and possibly rewriting) all the pages.

			regards, tom lane
Alvaro Herrera wrote:
> Because the tuples in it were involved in some insert operation at some
> time (else the table would not have any tuples). So it _has_ to be
> vacuumed, else you run the risk of losing the tuples when the
> wraparound happens. (Sorry, I don't know how to explain this better.)

OK. So here is what I understand. I have a table which contains 100 rows which appeared there due to some insert operation. Then I vacuum it. And it sits there for eternity while the rest of the database approaches the singularity (the xid wraparound.. :-) nice term, isn't it?).

So this static table is vulnerable to xid wraparound? I doubt it.

Did I miss something?

Shridhar
Tom Lane wrote:
> Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes:
> > What part of plain vacuum takes disk bandwidth?
>
> Reading (and possibly rewriting) all the pages.

I was under the impression that was for shared-memory pages only and not for disk pages. OK, I can see the difference of understanding here. Plain vacuum goes around the table/database and makes space (in shared buffers and on disk) reusable wherever possible, but *does not* free any space.

Would it be possible to have a vacuum variant that would just shuffle through the shared buffers and not touch disk at all? pg_autovacuum could probably be ultra-aggressive with such a shared-buffers-only scan. Is it possible or feasible?

IMO that could be a cleverer solution than throttling IO for vacuum. For one thing, getting that throttling right would be extremely difficult, and it would vary from site to site. If it is going to be tough to tune, then it will be underutilised and will lose its value rather rapidly.

Just a thought..

Shridhar
Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes:
> Would it be possible to have a vacuum variant that would just shuffle
> through the shared buffers and not touch disk at all?

What would be the use of that? You couldn't predict *anything* about the coverage. Maybe you find all the free space in a particular table, but most likely you don't.

In any case an I/O-free vacuum is impossible, since once you have decided to recycle a particular tuple, you don't have any option about removing the corresponding index entries first. So unless both the table and all its indexes are in RAM, you will be incurring I/O.

			regards, tom lane
On Fri, Oct 17, 2003 at 07:55:44PM +0530, Shridhar Daithankar wrote:
> OK. So here is what I understand. I have a table which contains 100
> rows which appeared there due to some insert operation. Then I vacuum
> it. And it sits there for eternity while the rest of the database
> approaches the singularity (the xid wraparound.. :-) nice term, isn't
> it?).
>
> So this static table is vulnerable to xid wraparound? I doubt it.
>
> Did I miss something?

You are missing the part where the XID that was formerly a "committed transaction" becomes an uncommitted transaction when the wraparound occurs... so the tuples will have a creation XID from an uncommitted transaction, and current transactions will not see them. Voila, your table is empty.

The trick to keep in mind is that the XID comparison functions use "modulo" operations, _but_ there are special "frozen" XIDs that are always "committed" -- that's why a VACUUM FREEZE would relieve the table forever from this problem.

(At least this is how I understand it -- I could be totally wrong here)

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"The gods do not protect fools. Fools are protected by better-endowed fools" (Luis Wu, Ringworld)
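The modulo comparison Alvaro mentions can be illustrated like this. It mirrors the idea behind the backend's TransactionIdPrecedes logic (signed wraparound arithmetic on 32-bit XIDs), though the real code also special-cases the reserved frozen XIDs; this is only a sketch:

```c
#include <assert.h>
#include <stdint.h>

/*
 * "a precedes b" under modulo-2^32 arithmetic: a is in the past of b
 * if it is less than 2^31 transactions behind it.  Once the XID counter
 * advances far enough, a formerly old XID flips to looking "in the
 * future", which is exactly how committed tuples become invisible.
 */
static int
xid_precedes(uint32_t a, uint32_t b)
{
    return (int32_t) (a - b) < 0;
}
```

So an XID near the top of the range correctly precedes a small post-wraparound XID, but a tuple XID left unfrozen for roughly two billion transactions ends up on the wrong side of the comparison.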
On Fri, 2003-10-17 at 09:34, Shridhar Daithankar wrote:
> I am slightly confused here. IIRC pg_autovacuum never did a vacuum full.

Correct.

> At the most it does vacuum / vacuum analyse,

Incorrect, it either does vacuum analyse, or just analyse.

> none of which chew disk bandwidth.

Incorrect, vacuum can have lots of disk I/O; analyze has considerably less, but still some.

> And if pg_autovacuum is running along with postmaster all the time,
> with aggressive polling like 5 sec, the database should not accumulate
> any dead tuples

True, however, I think such aggressive polling would be a net loss in efficiency.

> nor would it suffer xid wraparound as vacuums are happening constantly.

Wrong, pg_autovacuum typically just does vacuum [table name], which does not affect the xid wraparound issue; one has to issue a vacuum against an entire database to affect that.

> What's left in above scenario? As long as all the requirements for
> pg_autovacuum are met, namely setting it up, setting it up aggressively
> and tuning postgresql.conf correctly, vacuum and related problems
> should be a thing of the past, at least as far as 7.4 and onwards is
> concerned.

Well, it still remains to be seen whether the client-side implementation of pg_autovacuum is sufficient. Also, we will see whether index bloat is handled (less an autovac issue, but semi-related). Ideally, autovac should make better decisions based on the FSM and perhaps even the RSM (is that what it was called?) that people have talked about setting up.

With all that said, hopefully pg_autovacuum proves to be a successful experiment, and if so, then it needs to be integrated into core somehow.

Matthew
On Fri, 2003-10-17 at 10:25, Shridhar Daithankar wrote:
> OK. So here is what I understand. I have a table which contains 100
> rows which appeared there due to some insert operation. Then I vacuum
> it. And it sits there for eternity while the rest of the database
> approaches the singularity (the xid wraparound.. :-) nice term, isn't
> it?).
>
> So this static table is vulnerable to xid wraparound? I doubt it.

No, that table would probably be OK, because you did a vacuum on it after the inserts. The problem is that pg_autovacuum may choose not to do a vacuum if you didn't cross a threshold, or someone outside of pg_autovacuum may have done the vacuum and autovac doesn't know about it, so it can't guarantee that all tables in the database are safe from xid wraparound.

One additional thing: some of this might be possible if pg_autovacuum saved its data between restarts. Right now it restarts with no memory of what happened before.
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> Maybe in this case it's best to do a VACUUM FREEZE; that'd ensure that
> the table would never ever need a vacuum again until it suffers
> an insert, delete or update.

But how would you keep track of that? Certainly an external autovacuum daemon couldn't know for sure that the table had never been modified since it was frozen. I suppose you could think about altering the backend to mark a table "dirty" whenever an insert/update/delete is done, but I'd have to think this would be a net waste of cycles in the vast majority of cases. How many people have tables that are *really* read-only over the long haul (billions of transactions)?

I think the existing approach of forcing a database-wide vacuum every billion or so transactions is probably the most efficient way of dealing with the issue. It's almost certainly cheaper, net, than any scheme that adds even a tiny overhead to each individual insert/update/delete.

			regards, tom lane
Matthew T. O'Connor wrote:
> One additional thing: some of this might be possible if pg_autovacuum
> saved its data between restarts. Right now it restarts with no memory
> of what happened before.

Well, the unmaintained gborg version adopted the approach of storing such info in a table, so that it survives a restart of postgresql, pg_autovacuum or both.

That was considered tablespace pollution back then. But personally I think it should be OK. If it ever goes into the catalogs, I would rather add a few columns to pg_class for such stats. But again, that's not my call to make.

Shridhar
Tom Lane wrote:
> Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes:
> > Would it be possible to have a vacuum variant that would just shuffle
> > through the shared buffers and not touch disk at all?
>
> What would be the use of that? You couldn't predict *anything* about
> the coverage. Maybe you find all the free space in a particular table,
> but most likely you don't.
>
> In any case an I/O-free vacuum is impossible since once you have
> decided to recycle a particular tuple, you don't have any option about
> removing the corresponding index entries first. So unless both the
> table and all its indexes are in RAM, you will be incurring I/O.

I am just suggesting it as a variant and not a replacement for the existing vacuum options. Knowing that it does no IO, it could be triggered a lot more aggressively. Furthermore, if we assume pg_autovacuum is an integral part of database operation from before a single database object is created, I think it could cover many or most database usage patterns, barring multiple indexes, for which the normal vacuum variants could be used.

Furthermore, when a tuple is updated, all the relevant indexes are updated too, right? So if such a vacuum is aggressive enough, it could catch the index entries as well, in RAM.

Think of it like catching hens. It is easier to do in a cage than over a whole farm. So catch as many of them in the cage as you can. If they escape, or spill out of the cage due to over-population, you have to tread the farm anyway...

Just a thought.

Shridhar
On Fri, Oct 17, 2003 at 07:25:13PM +0530, Shridhar Daithankar wrote:
> What part of plain vacuum takes disk bandwidth? WAL? Clog? Certainly
> not data files themselves, right?

Sure, the data files. The data files still have to be completely read from beginning to end by VACUUM.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Afilias Canada                        Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110
On Fri, 2003-10-17 at 10:22, Tom Lane wrote:
> Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes:
> > What part of plain vacuum takes disk bandwidth?
>
> Reading (and possibly rewriting) all the pages.

Would it be possible for the backend to keep a list of the first N pages it has deleted tuples out of (N being a large number, but not significant in memory usage) and a second list of N pages it has inserted tuples into?

After the transaction has completed and there is an idle period (say 1/4 second between transactions), it could pass the insert information on a rollback, and the delete information on a commit, to a separate backend.

This 'vacuum' backend could then prioritize garbage collection for the pages it knows have been changed, performing a single-page vacuum when a specific page has seen a high level of reported activity.

If this daemon could also get hold of information about the idleness of I/O in general, the decision about what to vacuum and when may be better (heavily hit pages during peak periods, all reported pages on medium load). When completely idle, run through the entire system to get back as much as possible.
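A minimal sketch of the bookkeeping Rod describes might look like the following. The structure and names are all invented for illustration; the point is just that the per-backend list is small and fixed-size, with an overflow flag for when more than N pages are touched:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TRACKED_PAGES 64    /* "N": large, but bounded */

/* Invented per-backend record of the first N pages with reclaimable
 * tuples, to be handed to a vacuum daemon at commit/rollback time. */
typedef struct
{
    uint32_t pages[MAX_TRACKED_PAGES];
    int      count;         /* pages recorded so far */
    int      overflowed;    /* set once the list fills up */
} DirtyPageList;

static void
report_dirty_page(DirtyPageList *list, uint32_t pageno)
{
    if (list->count < MAX_TRACKED_PAGES)
        list->pages[list->count++] = pageno;
    else
        list->overflowed = 1;   /* daemon must fall back to a full scan */
}
```

The daemon would then rank the reported pages by how often they show up and vacuum the hottest ones first, falling back to a whole-table pass when a list overflowed.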
Rod Taylor wrote:
> Would it be possible for the backend to keep a list of the first N
> pages it has deleted tuples out of (N being a large number, but not
> significant in memory usage) and a second list of N pages it has
> inserted tuples into?

That is the RSM, the reclaimable space map. It is on the TODO list.

> This 'vacuum' backend could then prioritize garbage collection for the
> pages it knows have been changed, performing a single-page vacuum when
> a specific page has seen a high level of reported activity.
>
> If this daemon could also get hold of information about the idleness of
> I/O in general, the decision about what to vacuum and when may be
> better (heavily hit pages during peak periods, all reported pages on
> medium load). When completely idle, run through the entire system to
> get back as much as possible.

I agree. This seems to be the best way of dealing with things. Of course, there are probably details we are missing here, but in general it's good.

Shridhar
Shridhar Daithankar <shridhar_daithankar@persistent.co.in> writes:
> I agree. This seems to be the best way of dealing with things. Of
> course, there are probably details we are missing here, but in general
> it's good.

Actually, this is all pure handwaving, because you are ignoring the need to remove index tuples. The existing VACUUM code amortizes index cleanup over as many tuples as it can. If you do partial vacuuming of tables then you are necessarily going to be expending more cycles (and I/O) per tuple, on average, to get rid of the index entries.

It's not at all clear that there's any real win to be had in that direction. Perhaps it's a win, but you have no evidence on which to assert so.

			regards, tom lane
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> The trick to keep in mind is that the XID comparison functions use
> "modulo" operations, _but_ there are special "frozen" XIDs that are
> always "committed" -- that's why a VACUUM FREEZE would relieve the
> table forever from this problem.
> (At least this is how I understand it -- I could be totally wrong here)

No, that's exactly correct.

			regards, tom lane
On Fri, 2003-10-17 at 10:53, Shridhar Daithankar wrote:
> Matthew T. O'Connor wrote:
> > One additional thing: some of this might be possible if pg_autovacuum
> > saved its data between restarts. Right now it restarts with no
> > memory of what happened before.
>
> Well, the unmaintained gborg version adopted the approach of storing
> such info in a table, so that it survives a restart of postgresql,
> pg_autovacuum or both.
>
> That was considered tablespace pollution back then. But personally I
> think it should be OK. If it ever goes into the catalogs, I would
> rather add a few columns to pg_class for such stats. But again, that's
> not my call to make.

I still consider it tablespace pollution. When / if it gets integrated into the backend and it uses system tables, that is a different story; then you are not modifying a user's database. What should happen is that on exit pg_autovacuum writes its data to a file that it rereads on startup, or something like that....
shridhar_daithankar@persistent.co.in (Shridhar Daithankar) writes:
> Tom Lane wrote:
> > I was just thinking of a GUC parameter: wait N milliseconds between
> > pages, where N defaults to zero probably. A user who wants to run
> > his vacuum as a background process could set N larger than zero.
>
> I am slightly confused here. IIRC pg_autovacuum never did a vacuum
> full. At the most it does vacuum / vacuum analyse, none of which chew
> disk bandwidth.

[remainder elided; your second sentence is the vital bit...]

> What am I missing?

You are missing that VACUUM most certainly _does_ chew up disk bandwidth, because it must load the pages of the table into memory. If the system is busy doing other I/O, then that other I/O has to compete with the I/O initiated by VACUUM.

VACUUM FULL is certainly more expensive than VACUUM/VACUUM ANALYZE; the point is that even the latter is NOT free on big tables when there is a lot of "traffic."

VACUUM is like putting a few extra transport trucks onto the highway. It may only go from one highway junction to the next, and be fairly brief, if traffic is moving well. But if traffic is heavy, it adds to the congestion. (And that's as far as the analogy can go; I can't imagine a way of drawing the GUC parameter into this...)

--
(format nil "~S@~S" "cbbrowne" "libertyrms.info") <http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 646 3304 x124 (land)
shridhar_daithankar@persistent.co.in (Shridhar Daithankar) writes:
> Andrew Sullivan wrote:
>> On Fri, Oct 17, 2003 at 07:04:45PM +0530, Shridhar Daithankar wrote:
>>
>>> I am slightly confused here.  IIRC pg_autovacuum never did a vacuum
>>> full.  At most it does VACUUM / VACUUM ANALYZE, neither of which
>>> chews disk bandwidth.
>>
>> The latter is false.  VACUUM FULL certainly uses _more_ disk
>> bandwidth than VACUUM, but it's just false that plain VACUUM doesn't
>> contend for disk.  And if you're already maxed out, that is extra
>> bandwidth you cannot afford.
>
> What part of plain vacuum takes disk bandwidth?  WAL?  CLOG?  Certainly
> not the data files themselves, right?

Certainly YES, the data files themselves.

VACUUM has to read through the pages to assess which tuples are to expire.
So if the data file is 8GB long, VACUUM has to read through 8GB of data.

Compared to VACUUM FULL, it is certainly cheaper, as it is not rummaging
around to reorder pages, but rather walking through, single page by single
page.  Thus, where VACUUM FULL might involve (in effect) reading through
the file several times (as it shifts data between pages), VACUUM only reads
through it once.  That's (for this instance) 8GB of reads.
-- 
"cbbrowne","@","libertyrms.info"
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 646 3304 x124 (land)
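Some back-of-the-envelope arithmetic puts numbers on the 8GB example: how many pages a plain VACUUM must read, how long the single sequential pass takes flat out, and how much a per-page delay stretches it.  The disk throughput and delay setting are assumptions chosen for illustration, not measurements:

```python
PAGE_SIZE = 8192                   # PostgreSQL block size, bytes
table_bytes = 8 * 1024**3          # the 8GB data file from the example
pages = table_bytes // PAGE_SIZE   # pages VACUUM must read once each

disk_mb_per_sec = 40               # assumed sequential throughput
scan_seconds = table_bytes / (disk_mb_per_sec * 1024**2)

delay_ms = 10                      # hypothetical per-page GUC setting
sleep_seconds = pages * delay_ms / 1000.0

print(pages)          # 1048576 pages
print(scan_seconds)   # 204.8 s at full speed
print(sleep_seconds)  # 10485.76 s of added sleep -- roughly three hours
```

This also illustrates why such a delay would have to be tuned carefully: even a 10 ms per-page sleep turns a scan of a few minutes into a vacuum lasting hours, which is exactly the trade-off between disk friendliness and finishing before the FSM fills up.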
Christopher Browne <cbbrowne@libertyrms.info> writes:
> VACUUM is like putting an extra few transport trucks onto the highway.
> It may only go from one highway junction to the next, and be fairly
> brief, if traffic is moving well.  But if traffic is heavy, it adds to
> the congestion.  (And that's as far as the analogy can go; I can't
> imagine a way of drawing the GUC parameter into this...)

Ooh, strained metaphors.  This game is always fun.

So I think of it the other way around.  A busy database is like downtown
traffic, with everyone going every which way for short trips.  Running
vacuum is like having a few trucks driving through your city streets as
through traffic.

Having a parameter to slow down the through traffic is like, uh, having
express lanes for local traffic.  Er, yeah, that's the ticket.  Except who
ever heard of having express lanes for local traffic?  Hm.

-- 
greg
Greg Stark wrote:
> Christopher Browne <cbbrowne@libertyrms.info> writes:
>
>> VACUUM is like putting an extra few transport trucks onto the highway.
>> It may only go from one highway junction to the next, and be fairly
>> brief, if traffic is moving well.  But if traffic is heavy, it adds to
>> the congestion.  (And that's as far as the analogy can go; I can't
>> imagine a way of drawing the GUC parameter into this...)
>
> Ooh, strained metaphors.  This game is always fun.
>
> So I think of it the other way around.  A busy database is like downtown
> traffic, with everyone going every which way for short trips.  Running
> vacuum is like having a few trucks driving through your city streets as
> through traffic.
>
> Having a parameter to slow down the through traffic is like, uh, having
> express lanes for local traffic.  Er, yeah, that's the ticket.  Except
> who ever heard of having express lanes for local traffic?  Hm.

All I know is that Jan Wieck would have each car filled to the brim
with spikes....

Mike Mascari
mascarm@mascari.com
On Fri, 2003-10-17 at 14:56, Mike Mascari wrote:
> Greg Stark wrote:
> > Ooh, strained metaphors.  This game is always fun.
> >
> > So I think of it the other way around.  A busy database is like
> > downtown traffic, with everyone going every which way for short
> > trips.  Running vacuum is like having a few trucks driving through
> > your city streets as through traffic.
> >
> > Having a parameter to slow down the through traffic is like, uh,
> > having express lanes for local traffic.  Er, yeah, that's the ticket.
> > Except who ever heard of having express lanes for local traffic?  Hm.
>
> All I know is that Jan Wieck would have each car filled to the brim
> with spikes....

ROTFLMAO
mascarm@mascari.com (Mike Mascari) writes:
> Greg Stark wrote:
>> Christopher Browne <cbbrowne@libertyrms.info> writes:
>>
>>> VACUUM is like putting an extra few transport trucks onto the
>>> highway.  It may only go from one highway junction to the next, and
>>> be fairly brief, if traffic is moving well.  But if traffic is
>>> heavy, it adds to the congestion.  (And that's as far as the
>>> analogy can go; I can't imagine a way of drawing the GUC parameter
>>> into this...)
>>
>> Ooh, strained metaphors.  This game is always fun.
>>
>> So I think of it the other way around.  A busy database is like
>> downtown traffic, with everyone going every which way for short
>> trips.  Running vacuum is like having a few trucks driving through
>> your city streets as through traffic.
>>
>> Having a parameter to slow down the through traffic is like, uh,
>> having express lanes for local traffic.  Er, yeah, that's the
>> ticket.  Except who ever heard of having express lanes for local
>> traffic?  Hm.
>
> All I know is that Jan Wieck would have each car filled to the brim
> with spikes....

No, you just need _one_ spike.  _One_ spike in the centre of the
steering wheel.

There would be _so_ much less tailgating if they had those spikes...
-- 
"cbbrowne","@","libertyrms.info"
<http://dev6.int.libertyrms.com/>
Christopher Browne
(416) 646 3304 x124 (land)