Thread: pg_avd
Here is the code for the auto vacuum daemon I'm working on. I know some type of auto vacuum is highly desired by the project, I just don't know if this implementation of auto vacuum is desired as it's not built into the backend, rather it's a client app. Attached is tar.gz with all the files, expand it into the src/bin directory in the source tree and apply the Makefile.diff patch to src/bin/Makefile and then pg_avd will be built as an executable and installed in the bin directory along with psql and others. I have done some benchmarking (I posted some of it to hackers several weeks back) it always helps with file sizes, but doesn't always help performance since it might kick off a vacuum in the middle of your transaction. However it does prevent postgres from performing badly due to excessive file size and distorted statistics. Please try it out and give me feedback. Included in the tar is a README that describes what I have done, and how to use pg_avd. Matthew
Attachment
This looks very good. It is something I have been hoping to have for 7.4. It does what I think everyone wanted, that is, uses the stats collector to determine what tables to vacuum. Just a few comments: The copyright line in the text should probably be removed. pg_avd seems pretty cryptic. Perhaps pg_autovacuum. I like the fact it is a client app. Now, where do we put it? /bin, /contrib, gborg? --------------------------------------------------------------------------- Matthew T. O'Connor wrote: > Here is the code for the auto vacuum daemon I'm working on. > > I know some type of auto vacuum is highly desired by the project, I just > don't know if this implementation of auto vacuum is desired as it's not > built into the backend, rather it's a client app. > > Attached is tar.gz with all the files, expand it into the src/bin > directory in the source tree and apply the Makefile.diff patch to > src/bin/Makefile and then pg_avd will be built as an executable and > installed in the bin directory along with psql and others. > > I have done some benchmarking (I posted some of it to hackers several > weeks back) it always helps with file sizes, but doesn't always help > performance since it might kick off a vacuum in the middle of your > transaction. However it does prevent postgres from performing badly due > to excessive file size and distorted statistics. > > Please try it out and give me feedback. Included in the tar is a README > that describes what I have done, and how to use pg_avd. > > Matthew > Content-Description: [ Attachment, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
I think this is an important feature, especially for newbies, etc. I vote for /bin. That way it will grow and improve and it will soon become a power user feature. Chris ----- Original Message ----- From: "Bruce Momjian" <pgman@candle.pha.pa.us> To: "Matthew T. O'Connor" <matthew@zeut.net> Cc: "Patches" <pgsql-patches@postgresql.org> Sent: Tuesday, February 18, 2003 11:25 AM Subject: Re: [PATCHES] pg_avd > > This looks very good. It is something I have been hoping to have for > 7.4. It does what I think everyone wanted, that is, uses the stats > collector to determine what tables to vacuum. > > Just a few comments: > > The copyright line in the text should probably be removed. > pg_avd seems pretty cryptic. Perhaps pg_autovacuum. > I like the fact it is a client app. > > Now, where do we put it? /bin, /contrib, gborg? > > -------------------------------------------------------------------------- - > > Matthew T. O'Connor wrote: > > Here is the code for the auto vacuum daemon I'm working on. > > > > I know some type of auto vacuum is highly desired by the project, I just > > don't know if this implementation of auto vacuum is desired as it's not > > built into the backend, rather it's a client app. > > > > Attached is tar.gz with all the files, expand it into the src/bin > > directory in the source tree and apply the Makefile.diff patch to > > src/bin/Makefile and then pg_avd will be built as an executable and > > installed in the bin directory along with psql and others. > > > > I have done some benchmarking (I posted some of it to hackers several > > weeks back) it always helps with file sizes, but doesn't always help > > performance since it might kick off a vacuum in the middle of your > > transaction. However it does prevent postgres from performing badly due > > to excessive file size and distorted statistics. > > > > Please try it out and give me feedback. Included in the tar is a README > > that describes what I have done, and how to use pg_avd. > > > > Matthew > > > > Content-Description: > > [ Attachment, skipping... ] > > > > > ---------------------------(end of broadcast)--------------------------- > > TIP 3: if posting/reading through Usenet, please send an appropriate > > subscribe-nomail command to majordomo@postgresql.org so that your > > message can get through to the mailing list cleanly > > -- > Bruce Momjian | http://candle.pha.pa.us > pgman@candle.pha.pa.us | (610) 359-1001 > + If your life is a hard drive, | 13 Roberts Road > + Christ can be your backup. | Newtown Square, Pennsylvania 19073 > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) >
On Mon, 2003-02-17 at 22:25, Bruce Momjian wrote: > This looks very good. It is something I have been hoping to have for > 7.4. It does what I think everyone wanted, that is, uses the stats > collector to determine what tables to vacuum. > > Just a few comments: > > The copyright line in the text should probably be removed. > pg_avd seems pretty cryptic. Perhaps pg_autovacuum. > I like the fact it is a client app. The copyright line can be removed, I think it's there as part of a cut and paste from kdevelop. Also, changing the name is fine, if people like pg_autovacuum that is fine with me. Being a client app has advantages and disadvantages. It's simple, and an avd crash won't effect the postmaster. However, there are bad things also, it's not tightly coupled with postmaster startup and shutdown for one. Thanks for the feedback.
I think it's a question of, is this solution one that we want to keep for a while, or do we want a different implementation of AVD, perhaps something built into the backend that could take advantage of the FSM also. I would like to see it in bin, but thats just me :-) I would prefer contrib over gborg. What do people think of using pg_ctl to start and stop the AVD? Perhaps an new "with AVD" option when starting the postmaster. There are missing features still, but I will continue to work on this and fill in the gaps if people think it's worth it. On Mon, 2003-02-17 at 22:52, Christopher Kings-Lynne wrote: > I think this is an important feature, especially for newbies, etc. I vote > for /bin. That way it will grow and improve and it will soon become a power > user feature. > > Chris > > ----- Original Message ----- > From: "Bruce Momjian" <pgman@candle.pha.pa.us> > To: "Matthew T. O'Connor" <matthew@zeut.net> > Cc: "Patches" <pgsql-patches@postgresql.org> > Sent: Tuesday, February 18, 2003 11:25 AM > Subject: Re: [PATCHES] pg_avd > > > > > > This looks very good. It is something I have been hoping to have for > > 7.4. It does what I think everyone wanted, that is, uses the stats > > collector to determine what tables to vacuum. > > > > Just a few comments: > > > > The copyright line in the text should probably be removed. > > pg_avd seems pretty cryptic. Perhaps pg_autovacuum. > > I like the fact it is a client app. > > > > Now, where do we put it? /bin, /contrib, gborg?
On Tue, 2003-02-11 at 21:48, Matthew T. O'Connor wrote: > Here is the code for the auto vacuum daemon I'm working on. Some minor nit-picking follows below. I tend to be a bit of a style-nazi, don't mind me :-) - the length argument to snprintf() includes the terminating NUL byte, so code like pg_avd.c line 334 is off-by-one: char buf[256]; /* ... */ snprintf(buf,255,"..."); - AFAICS, there is no reason for update_db_list() and append_new_dbs() to use more than 1 database connection and 1 database query between them (just fetch the list of all DBs and the appropriate stats, then loop through the in-memory list of DB information, removing no-longer-present DBs and adding newly-created DBs). - if you're going to use the 'constant == variable' technique for comparisons, it would be nice to use it consistently - the usage info for "-h" is incorrect, as is the name of the app mentioned in the usage info ("pg_avd", not "pg_avd_c") - the return value of init_tables() is not what the comment states it should be - if all the pg_avd functions are only used by pg_avd itself, why not make them static? More significant changes I think would be useful: - separating the logic for ANALYZE and VACUUM seems like a good idea, IMHO. For example, INSERT doesn't create any dead tuples, so it shouldn't effect the need to VACUUM in any way -- but enough INSERTs will eventually warrant an ANALYZE. Also, I'd think most installations will need to VACUUM more often than they'll need to ANALYZE, so doing both at the same time doesn't seem optimal. - using some of the ideas from contrib/pgstattuple (i.e. looking at the amount of "dead space" in the relation) could be an interesting enhancement. As far as where this belongs, I vote against it going into bin/. It isn't polished enough, either in concept or in implementation, to deserve that kind of endorsement. But I think putting it into contrib/ for the next release would be a good idea: if people like it, we can take a look at seeing what other features / fine-tuning it needs to warrant being part of the official package. Cheers, Neil -- Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC
> - separating the logic for ANALYZE and VACUUM seems like a good idea, > IMHO. For example, INSERT doesn't create any dead tuples, so it > shouldn't effect the need to VACUUM in any way -- but enough INSERTs You mean "affect" :) Sorry - just a grammar nazi :) > As far as where this belongs, I vote against it going into bin/. It > isn't polished enough, either in concept or in implementation, to > deserve that kind of endorsement. But I think putting it into contrib/ > for the next release would be a good idea: if people like it, we can > take a look at seeing what other features / fine-tuning it needs to > warrant being part of the official package. True - I change my vote to /contrib :) Chris
Neil Conway <neilc@samurai.com> writes: > Some minor nit-picking follows below. I tend to be a bit of a > style-nazi, don't mind me :-) > - the length argument to snprintf() includes the terminating NUL byte, > so code like pg_avd.c line 334 is off-by-one: > char buf[256]; > /* ... */ > snprintf(buf,255,"..."); Actually the preferred coding of this is snprintf(buf, sizeof(buf), ...); which is both correct and impervious to subsequent alteration of the declared size of buf. I get antsy whenever I see a snprintf with a literal-constant size argument, because it's a mistake waiting to happen. Use sizeof() when you can, or at least a #define. regards, tom lane
Hi all, On one hand pg_autovacuum must give opportunity to launch it without sleep loop, for instance for launch it at night by cron, then it vacuumed only tables which needed it. On other, if pg_autovacuum running as daemon, IMHO better launch it from postmaster, in the same manner, as launched statistic collector. Must pg_autovacuum also check transaction ID (XID) numbers to prevent transaction ID wraparound? (PostgreSQL v.7.3.1 docs 8.2.3. Preventing transaction ID wraparound failures) And must it reindex in the same manner as contrib/reindexdb? (PostgreSQL v.7.3.1 docs 8.3. Routine Reindexing) -- Olleg Samoylov
On Tue, 2003-02-18 at 09:04, Olleg Samoylov wrote: > Hi all, > > On one hand pg_autovacuum must give opportunity to launch it without sleep > loop, for instance for launch it at night by cron, then it vacuumed only > tables which needed it. pg_autovacuum does not preclude you from doing normal vacuuming from cron. > On other, if pg_autovacuum running as daemon, IMHO better launch it from > postmaster, in the same manner, as launched statistic collector. The problem is that it's not a backend process, it's a libpq based client app. I'm not sure it's a good idea to have the postmaster launch client external applications, I can't explain why, just seems that way to me. Any one else have any thoughts? > Must pg_autovacuum also check transaction ID (XID) numbers to prevent > transaction ID wraparound? (PostgreSQL v.7.3.1 docs 8.2.3. Preventing > transaction ID wraparound failures) Yes, it must, this is one of the "missing features" that I will complete if people like the overall pg_autovacuum concept. > And must it reindex in the same manner as contrib/reindexdb? (PostgreSQL > v.7.3.1 docs 8.3. Routine Reindexing) Have never thought of it as a reindexing tool. I think that is outside the scope of pg_autovacuum since reindexing is not a simple thing, not always wanted, and has concurrency issues (I think).
Thank you much for the feedback / code review. On Tue, 2003-02-18 at 03:24, Neil Conway wrote: > On Tue, 2003-02-11 at 21:48, Matthew T. O'Connor wrote: > > Here is the code for the auto vacuum daemon I'm working on. > > Some minor nit-picking follows below. I tend to be a bit of a > style-nazi, don't mind me :-) > > - the length argument to snprintf() includes the terminating NUL byte, > so code like pg_avd.c line 334 is off-by-one: > > char buf[256]; > /* ... */ > snprintf(buf,255,"..."); Will Fix, (actuall I will fix it as Tom suggested in the subsequent email) > - AFAICS, there is no reason for update_db_list() and append_new_dbs() > to use more than 1 database connection and 1 database query between them > (just fetch the list of all DBs and the appropriate stats, then loop > through the in-memory list of DB information, removing no-longer-present > DBs and adding newly-created DBs). Agreed, will do. > - if you're going to use the 'constant == variable' technique for > comparisons, it would be nice to use it consistently Ok, I'll try to find all the places where it's not consistent. The next version I submit should be better. > - the usage info for "-h" is incorrect, as is the name of the app > mentioned in the usage info ("pg_avd", not "pg_avd_c") Opps.... > - the return value of init_tables() is not what the comment states it > should be OK. > - if all the pg_avd functions are only used by pg_avd itself, why not > make them static? Ok will do. > More significant changes I think would be useful: > > - separating the logic for ANALYZE and VACUUM seems like a good idea, > IMHO. For example, INSERT doesn't create any dead tuples, so it > shouldn't effect the need to VACUUM in any way -- but enough INSERTs > will eventually warrant an ANALYZE. Also, I'd think most installations > will need to VACUUM more often than they'll need to ANALYZE, so doing > both at the same time doesn't seem optimal. Agreed, I have some of the infrastructure for this inplace (it tracks inserts and deletes separately (update counts as one of each), and I have a different threshold for both inserts and deletes, I'll finish this off for the next version. > - using some of the ideas from contrib/pgstattuple (i.e. looking at the > amount of "dead space" in the relation) could be an interesting > enhancement. I did a little looking at this a while ago, it wasn't clear to me that pgstattuple was any faster than just doing a vacuum. > As far as where this belongs, I vote against it going into bin/. It > isn't polished enough, either in concept or in implementation, to > deserve that kind of endorsement. But I think putting it into contrib/ > for the next release would be a good idea: if people like it, we can > take a look at seeing what other features / fine-tuning it needs to > warrant being part of the official package. Well I agree the implementation is not polished enough, but I can keep working on that till everyone is happy. The question I really have is the concept. However I feel I'm getting enough positive feedback to keep working on it for inclusion into contrib, possibly bin but I agree that one cycle in contrib might be a good idea.
On Tue, 2003-02-18 at 10:07, Tom Lane wrote: > Neil Conway <neilc@samurai.com> writes: > > Some minor nit-picking follows below. I tend to be a bit of a > > style-nazi, don't mind me :-) > > > - the length argument to snprintf() includes the terminating NUL byte, > > so code like pg_avd.c line 334 is off-by-one: > > > char buf[256]; > > /* ... */ > > snprintf(buf,255,"..."); > > Actually the preferred coding of this is > > snprintf(buf, sizeof(buf), ...); > > which is both correct and impervious to subsequent alteration of the > declared size of buf. I get antsy whenever I see a snprintf with a > literal-constant size argument, because it's a mistake waiting to > happen. Use sizeof() when you can, or at least a #define. OK, I will make those changes in the next version along with all the changes mentioned in the other emails I have received. I will try to have it in the next two days or so. Matthew
Matthew T. O'Connor wrote: > Well I agree the implementation is not polished enough, but I can keep > working on that till everyone is happy. The question I really have is > the concept. However I feel I'm getting enough positive feedback to > keep working on it for inclusion into contrib, possibly bin but I agree > that one cycle in contrib might be a good idea. Yep, certainly very good feedback on this. The only question I have is whether there is any way to see how much free space is available for each relation, and whether this is an issue. Is there code that sees lots of inserts into a table and starts a vacuum on it to try to fill up the free space list? My concern is that without knowing how much free space is recorded, it might be hard to know if a vacuum is required. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Matthew T. O'Connor wrote: > > - using some of the ideas from contrib/pgstattuple (i.e. looking at the > > amount of "dead space" in the relation) could be an interesting > > enhancement. > > I did a little looking at this a while ago, it wasn't clear to me that > pgstattuple was any faster than just doing a vacuum. Someone reported that pg_stattuple is faster than COUNT(*), so it might be faster than VACUUM. Actually, it is a loadable function, so it has direct access to the backend area, while libpq does not. It might also be interesting to create a function that can return the free space map size for a relation. That would allow you to know when there is little free space available for a table getting a lot of inserts. Of course, I am just throwing out ideas, and they may not be good ones. :-) -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
From: "Bruce Momjian" <pgman@candle.pha.pa.us> > The only question I have is whether there is any way to see how much > free space is available for each relation, and whether this is an issue. > Is there code that sees lots of inserts into a table and starts a > vacuum on it to try to fill up the free space list? My concern is that > without knowing how much free space is recorded, it might be hard to > know if a vacuum is required. Since there is currently no way to know about the status from the FSM from the client side I didn't encorporate anything related to it. I had thought about extending the stats system to show FSM information, but I wasn't sure it's a good idea and I wasn't sure how to do it (not that good when dealing with code in the backend yet...) . Thoughts anyone? More generally speaking, everytime there is an update or a delete, free space was created, so just by looking at the activity on the table we have a rough idea if we need to vacuum. What we don't know is if a highly updated table has exhausted it's FSM entries and would benifit from a vacuum to repopulate the FSM.
From: "Bruce Momjian" <pgman@candle.pha.pa.us> > It might also be interesting to create a function that can return the free space map > size for a relation. That would allow you to know when there is little > free space available for a table getting a lot of inserts. I had thought about extending the stats system to report the FSM info related to a particular table. It might be better as a function, not sure.
Matthew T. O'Connor writes: > I think it's a question of, is this solution one that we want to keep > for a while, or do we want a different implementation of AVD, perhaps > something built into the backend that could take advantage of the FSM > also. To me it seems that this would be much better if kept inside the server. We already have machinery for launching periodic processes that do stuff in the data files (the checkpoint process). -- Peter Eisentraut peter_e@gmx.net
Peter Eisentraut <peter_e@gmx.net> writes: > Matthew T. O'Connor writes: >> I think it's a question of, is this solution one that we want to keep >> for a while, or do we want a different implementation of AVD, perhaps >> something built into the backend that could take advantage of the FSM >> also. > To me it seems that this would be much better if kept inside the server. I agree, it seems like a server-side implementation would be the only credible way to go for a production-grade version of this feature. But I don't see anything wrong with building a client-side prototype, which is what pg_avd looks like from here. (Unless the client is contorted by not being able to get at things it needs.) regards, tom lane
On Wed, 2003-02-19 at 10:11, Tom Lane wrote: > Peter Eisentraut <peter_e@gmx.net> writes: > > Matthew T. O'Connor writes: > >> I think it's a question of, is this solution one that we want to keep > >> for a while, or do we want a different implementation of AVD, perhaps > >> something built into the backend that could take advantage of the FSM > >> also. > > > To me it seems that this would be much better if kept inside the server. > > I agree, it seems like a server-side implementation would be the only > credible way to go for a production-grade version of this feature. > > But I don't see anything wrong with building a client-side prototype, > which is what pg_avd looks like from here. (Unless the client is > contorted by not being able to get at things it needs.) I don't think pg_avd is contorted, but it is limited to the data published by the stats system, so there is no FSM etc... I also think it would probably be better in the backend, I just wasn't sure if the additional complexity was worth it. The primary advantage of a client side implementation is simplicity. That said, I originally tried to do this in the backend, but found the task too daunting for me and gave up. When Shridhar started some work on a client side version I decided to run with that and see how far I could get. Question: Should I keep working on pg_avd for contrib inclusion in 7.4, or should I try again on a backend implementation that might be less likely to get into 7.4?
Matthew T. O'Connor wrote: > I don't think pg_avd is contorted, but it is limited to the data > published by the stats system, so there is no FSM etc... > > I also think it would probably be better in the backend, I just wasn't > sure if the additional complexity was worth it. The primary advantage > of a client side implementation is simplicity. > > That said, I originally tried to do this in the backend, but found the > task too daunting for me and gave up. When Shridhar started some work > on a client side version I decided to run with that and see how far I > could get. > > Question: Should I keep working on pg_avd for contrib inclusion in 7.4, > or should I try again on a backend implementation that might be less > likely to get into 7.4? Well, one idea would be to write a loadable function that dumps out FSM information --- then you could improve your implementation and eventually put it in the backend. I would like to have something for 7.4. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Maybe i can provide a datapoint on this matter. I've been using pgavd from over at gborg (Shridhar Daithankar's (i think?) code) in production for some months now. I love it. I was some hazzle to get it up, but once installed everything is smooth and nice. The vacuum + analyze seems to affect my system in an only postive way: 1) Everything seems snappier. 2) Weekly vacuum full is faster now. 3) The updates on a table with counters is updated faster nowadays. 4) pgavd is totally automatic, so just install it and run it. Well i just wanted to say that i don't care if it's an external/client process, it makes a huge differance to us lazy sob's that always forgets to vacuum all the time. It's even better that it's using pgstats to determine when to run vacuum, so after big inserts it gets vacuumed promptly. Magnus