Thread: pg_autovacuum integration attempt #2
I have make the changes suggested after I posted my last patch, I have also make several additional improvements. it needs to be tested more, but since the deadline is coming up so soon, I wanted to post an update just to keep everyone abreast of what I'm doing. Please review and let me know what I need to change to get it applied to CVS. As before, this patch requires that pg_autovacuum.c and .h are moved from contrib to src/backend/postmaster and src/include/postmaster respectively. I have also attached the pg_autovacuum.h file that needs to be put in src/include/catelog/ for the new pg_autovacuum system table. I assume I will have to write the docs for this, but right now I'm focusing on the code, I get get the docs written after the feature freeze. Matthew O'Connor
Attachment
FYI, I am out of town from Friday 7/2 till Monday evening 7/5. I probably won't have email while I'm gone. I will continue working on this when I get back. Also, I have made a few minor tweaks since submitting this patch namely changing autovac_enabled to default to false. I am also trying to use a GUC assign hook function to require pgstat_collect_tuplelevel = true when autovac_enabled = true. The problem I'm running into is that the hook function seems to be getting called before the pgstat variables is set, thus the hook complains even when pgstat_collect_tuplelevel is set to true. Is there someway to force the order in which variables are read from GUC? Thanks, Matthew On Tue, 2004-06-29 at 10:44, Matthew T. O'Connor wrote: > I have make the changes suggested after I posted my last patch, I have > also make several additional improvements. it needs to be tested more, > but since the deadline is coming up so soon, I wanted to post an update > just to keep everyone abreast of what I'm doing. Please review and let > me know what I need to change to get it applied to CVS. > > As before, this patch requires that pg_autovacuum.c and .h are moved > from contrib to src/backend/postmaster and src/include/postmaster > respectively. I have also attached the pg_autovacuum.h file that needs > to be put in src/include/catelog/ for the new pg_autovacuum system > table. > > I assume I will have to write the docs for this, but right now I'm > focusing on the code, I get get the docs written after the feature > freeze. > > Matthew O'Connor > > > > > ______________________________________________________________________ > ---------------------------(end of broadcast)--------------------------- > TIP 8: explain analyze is your friend
"Matthew T. O'Connor" <matthew@zeut.net> writes: > Is there someway to force the order in which variables are > read from GUC? No. regards, tom lane
I have added this patch plus your later comments to the patch queue. Your patch has been added to the PostgreSQL unapplied patches list at: http://momjian.postgresql.org/cgi-bin/pgpatches I will try to apply it within the next 48 hours. --------------------------------------------------------------------------- Matthew T. O'Connor wrote: > I have make the changes suggested after I posted my last patch, I have > also make several additional improvements. it needs to be tested more, > but since the deadline is coming up so soon, I wanted to post an update > just to keep everyone abreast of what I'm doing. Please review and let > me know what I need to change to get it applied to CVS. > > As before, this patch requires that pg_autovacuum.c and .h are moved > from contrib to src/backend/postmaster and src/include/postmaster > respectively. I have also attached the pg_autovacuum.h file that needs > to be put in src/include/catelog/ for the new pg_autovacuum system > table. > > I assume I will have to write the docs for this, but right now I'm > focusing on the code, I get get the docs written after the feature > freeze. > > Matthew O'Connor > > > [ Attachment, skipping... ] [ Attachment, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 8: explain analyze is your friend -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Bruce Momjian wrote: > I have added this patch plus your later comments to the patch queue. The autovacuum process still uses libpq to send its queries, which is not the idea behind backend integration. > Your patch has been added to the PostgreSQL unapplied patches list > at: > > http://momjian.postgresql.org/cgi-bin/pgpatches > > I will try to apply it within the next 48 hours. > > --------------------------------------------------------------------- >------ > > Matthew T. O'Connor wrote: > > I have make the changes suggested after I posted my last patch, I > > have also make several additional improvements. it needs to be > > tested more, but since the deadline is coming up so soon, I wanted > > to post an update just to keep everyone abreast of what I'm doing. > > Please review and let me know what I need to change to get it > > applied to CVS. > > > > As before, this patch requires that pg_autovacuum.c and .h are > > moved from contrib to src/backend/postmaster and > > src/include/postmaster respectively. I have also attached the > > pg_autovacuum.h file that needs to be put in src/include/catelog/ > > for the new pg_autovacuum system table. > > > > I assume I will have to write the docs for this, but right now I'm > > focusing on the code, I get get the docs written after the feature > > freeze. > > > > Matthew O'Connor > > [ Attachment, skipping... ] > > [ Attachment, skipping... ] > > > ---------------------------(end of > > broadcast)--------------------------- TIP 8: explain analyze is > > your friend
Peter Eisentraut wrote: >Bruce Momjian wrote: > > >>I have added this patch plus your later comments to the patch queue. >> >> > >The autovacuum process still uses libpq to send its queries, which is >not the idea behind backend integration. > > Can we consider this a non-fatal bug that has to be solved soon after the patch is applied? Regards, Andreas
Peter Eisentraut <peter_e@gmx.net> writes: > The autovacuum process still uses libpq to send its queries, which is > not the idea behind backend integration. Sure, but one step at a time. Getting it auto-launched from the postmaster is already a good increment in usability, and offhand I don't see that there's any part of that work that we'd have to throw away later. (This is not to say that I like the patch; I haven't reviewed it yet. But I don't want to reject it just because it's not the final form of autovacuum.) regards, tom lane
Peter Eisentraut wrote: > Bruce Momjian wrote: >>I have added this patch plus your later comments to the patch queue. > > The autovacuum process still uses libpq to send its queries, which is > not the idea behind backend integration. Actually, I intentionally had pg_autovacuum continue to use libpq based Tom's advise. Please see this thread: http://archives.postgresql.org/pgsql-hackers/2004-03/msg00931.php And more specifically, these follow ups: http://archives.postgresql.org/pgsql-hackers/2004-03/msg00989.php http://archives.postgresql.org/pgsql-hackers/2004-03/msg00992.php The short of it is that Tom felt having the autovac daemon continue to use libpq keeps "autovac control code still at arms length from the backend" To me the main benifit of having pg_autovacuum integrated in as a backend process is not eliminating libpq from the process, but rather: * access to GUC, elog (and other things) * allows autovac to be started and shutdown by the backend based on a GUC variable * allows autovac to have it's own system table to maintain it's data across restarts * eventually should allow vacuum decisions to be based on factors beyond the stats system (FSM etc...) IMHO, the use of libpq is not a bug, it's a feature. Thoughts? Matthew
"Matthew T. O'Connor" <matthew@zeut.net> writes: > Actually, I intentionally had pg_autovacuum continue to use libpq based > Tom's advise. Please see this thread: > http://archives.postgresql.org/pgsql-hackers/2004-03/msg00931.php > And more specifically, these follow ups: > http://archives.postgresql.org/pgsql-hackers/2004-03/msg00989.php > http://archives.postgresql.org/pgsql-hackers/2004-03/msg00992.php Something seems to have truncated msg00987 in the archives, but I looked it up in my own mail folder ... regards, tom lane ------- Forwarded Message Date: Mon, 22 Mar 2004 16:38:43 -0500 From: Tom Lane <tgl@sss.pgh.pa.us> To: Gavin Sherry <swm@linuxworld.com.au> cc: "Matthew T. O'Connor" <matthew@zeut.net>, Bruce Momjian <pgman@candle.pha.pa.us>, Christopher Kings-Lynne <chriskl@familyhealth.com.au>, Peter Eisentraut <peter_e@gmx.net>, pgsql-hackers@postgresql.org Subject: Re: [HACKERS] pg_autovacuum next steps Gavin Sherry <swm@linuxworld.com.au> writes: > So, do we want a static time, a GUC controlled time or some time which is > modified by pg_autovacuum's/stat's collector's knowledge of the amount of > work which goes on in any given database? From the point of view of the postmaster a GUC-controlled delay would seem like the best thing. We could discuss having the autovacuum code try to feed back adjustments in the delay, but remember that one of the golden virtues for the postmaster proper is simplicity; that translates directly to robustness. We don't want the postmaster engaging in anything complicated that could potentially lock it up or crash it due to a bug. Possibly this point could be finessed by a two-level structure, that is, postmaster launches autovacuum daemon (which is not itself a backend) and that in turn launches backends to do the real per-database work. The postmaster's only subsequent involvement is restarting the autovac daemon if it dies. The autovac daemon can be as complex as you want. This nice-sounding arrangement is probably not directly workable because of the fact that the postmaster has no good way to know about or control backends if they aren't its direct children. Perhaps the autovac daemon *should* use libpq, that is, not fork backends but connect via the postmaster each time it wants to run a backend. Then the backends are ordinary children of the postmaster and everything acts normally. (This could amount to using the existing autovac code, and simply adding a frammish to the postmaster to autospawn the autovac daemon as a non-backend child process.) regards, tom lane ------- End of Forwarded Message
Matthew, were are we on this patch? --------------------------------------------------------------------------- Matthew T. O'Connor wrote: > FYI, I am out of town from Friday 7/2 till Monday evening 7/5. I > probably won't have email while I'm gone. I will continue working on > this when I get back. > > Also, I have made a few minor tweaks since submitting this patch namely > changing autovac_enabled to default to false. I am also trying to use a > GUC assign hook function to require pgstat_collect_tuplelevel = true > when autovac_enabled = true. The problem I'm running into is that the > hook function seems to be getting called before the pgstat variables is > set, thus the hook complains even when pgstat_collect_tuplelevel is set > to true. Is there someway to force the order in which variables are > read from GUC? > > Thanks, > > Matthew > > > On Tue, 2004-06-29 at 10:44, Matthew T. O'Connor wrote: > > I have make the changes suggested after I posted my last patch, I have > > also make several additional improvements. it needs to be tested more, > > but since the deadline is coming up so soon, I wanted to post an update > > just to keep everyone abreast of what I'm doing. Please review and let > > me know what I need to change to get it applied to CVS. > > > > As before, this patch requires that pg_autovacuum.c and .h are moved > > from contrib to src/backend/postmaster and src/include/postmaster > > respectively. I have also attached the pg_autovacuum.h file that needs > > to be put in src/include/catelog/ for the new pg_autovacuum system > > table. > > > > I assume I will have to write the docs for this, but right now I'm > > focusing on the code, I get get the docs written after the feature > > freeze. > > > > Matthew O'Connor > > > > > > > > > > ______________________________________________________________________ > > ---------------------------(end of broadcast)--------------------------- > > TIP 8: explain analyze is your friend > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faqs/FAQ.html > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
I think it's ready to apply barring any feedback to the contrary. Actually I do have a small improvement I will send in tomorrow (sorry can't do it any sooner) that defaulted autovac_enabled to false, and makes autovac fail more gracefully when the stats system is not enabled, but that shouldn't stop you from applying this patch. The thing I was trying to do was use the GUC hook function to make sure that the required GUC variables are also set before GUC reports autovac as enabled. This seemed cleaner to me, but apparently it won't work since it seems that autovac_enabled is read from GUC before the stats variables, and there is no way to force the order in which they are read. Am I missing something? Matthew > Matthew, were are we on this patch? > > --------------------------------------------------------------------------- > > Matthew T. O'Connor wrote: >> FYI, I am out of town from Friday 7/2 till Monday evening 7/5. I >> probably won't have email while I'm gone. I will continue working on >> this when I get back. >> >> Also, I have made a few minor tweaks since submitting this patch namely >> changing autovac_enabled to default to false. I am also trying to use a >> GUC assign hook function to require pgstat_collect_tuplelevel = true >> when autovac_enabled = true. The problem I'm running into is that the >> hook function seems to be getting called before the pgstat variables is >> set, thus the hook complains even when pgstat_collect_tuplelevel is set >> to true. Is there someway to force the order in which variables are >> read from GUC? >> >> Thanks, >> >> Matthew >> >> >> On Tue, 2004-06-29 at 10:44, Matthew T. O'Connor wrote: >> > I have make the changes suggested after I posted my last patch, I have >> > also make several additional improvements. it needs to be tested >> more, >> > but since the deadline is coming up so soon, I wanted to post an >> update >> > just to keep everyone abreast of what I'm doing. Please review and let >> > me know what I need to change to get it applied to CVS. >> > >> > As before, this patch requires that pg_autovacuum.c and .h are moved >> > from contrib to src/backend/postmaster and src/include/postmaster >> > respectively. I have also attached the pg_autovacuum.h file that needs >> > to be put in src/include/catelog/ for the new pg_autovacuum system >> > table. >> > >> > I assume I will have to write the docs for this, but right now I'm >> > focusing on the code, I get get the docs written after the feature >> > freeze. >> > >> > Matthew O'Connor >> > >> > >> > >> > >> > ______________________________________________________________________ >> > ---------------------------(end of >> broadcast)--------------------------- >> > TIP 8: explain analyze is your friend >> >> >> ---------------------------(end of broadcast)--------------------------- >> TIP 5: Have you checked our extensive FAQ? >> >> http://www.postgresql.org/docs/faqs/FAQ.html >> > > -- > Bruce Momjian | http://candle.pha.pa.us > pgman@candle.pha.pa.us | (610) 359-1001 > + If your life is a hard drive, | 13 Roberts Road > + Christ can be your backup. | Newtown Square, Pennsylvania > 19073 > > ---------------------------(end of broadcast)--------------------------- > TIP 8: explain analyze is your friend > >
Matthew T. O'Connor wrote: > I think it's ready to apply barring any feedback to the contrary. > Actually I do have a small improvement I will send in tomorrow (sorry > can't do it any sooner) that defaulted autovac_enabled to false, and makes > autovac fail more gracefully when the stats system is not enabled, but > that shouldn't stop you from applying this patch. Great! > The thing I was trying to do was use the GUC hook function to make sure > that the required GUC variables are also set before GUC reports autovac as > enabled. This seemed cleaner to me, but apparently it won't work since it > seems that autovac_enabled is read from GUC before the stats variables, > and there is no way to force the order in which they are read. Am I > missing something? Oh, so that is the reason for asking about ordering. The only other case I have seen like this is for log_statement_stats: ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), errmsg("cannot enable \"log_statement_stats\" when \"log_parser_stats\",\n" "\"log_planner_stats\", or \"log_executor_stats\" is true."))); The code works most of the time because it is checking to see if two values are set to non-defaults. In your case, you want to have both set to non-defaults for it to work. I can see that requiring ordering and I can't think of a way to fix that. If you put something in the code after the config file was read, how do you check later for cases where the user is using SET. Ah, I got it. The 'source' tells you if it is coming from a config file or from a user SET or something. You could do that check during the assignment when it is coming from SET, and add a function call after the config file is loaded for more global checks of multiple variables. Anyway, we can do that later. Let's set it to false and get it in. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Matthew T. O'Connor wrote: > I think it's ready to apply barring any feedback to the contrary. > Actually I do have a small improvement I will send in tomorrow (sorry > can't do it any sooner) that defaulted autovac_enabled to false, and > makes autovac fail more gracefully when the stats system is not > enabled, but that shouldn't stop you from applying this patch. A nitpick on that parameter name: There is not reason to abbreviate "autovacuum": there is enough space. And the "_enabled" is redundant. > The thing I was trying to do was use the GUC hook function to make > sure that the required GUC variables are also set before GUC reports > autovac as enabled. This seemed cleaner to me, but apparently it > won't work since it seems that autovac_enabled is read from GUC > before the stats variables, and there is no way to force the order in > which they are read. Am I missing something? What I am missing at least is what you really mean by "the required GUC variables are also set". GUC variables are always set to something (at least after the initialization phase, but you don't get to do anything with GUC before the initialization, obviously). -- Peter Eisentraut http://developer.postgresql.org/~petere/
Peter Eisentraut wrote: > Matthew T. O'Connor wrote: > > I think it's ready to apply barring any feedback to the contrary. > > Actually I do have a small improvement I will send in tomorrow (sorry > > can't do it any sooner) that defaulted autovac_enabled to false, and > > makes autovac fail more gracefully when the stats system is not > > enabled, but that shouldn't stop you from applying this patch. > > A nitpick on that parameter name: There is not reason to abbreviate > "autovacuum": there is enough space. And the "_enabled" is redundant. Agreed. Nitpicks are important too because it makes us so beautiful. :-) > > The thing I was trying to do was use the GUC hook function to make > > sure that the required GUC variables are also set before GUC reports > > autovac as enabled. This seemed cleaner to me, but apparently it > > won't work since it seems that autovac_enabled is read from GUC > > before the stats variables, and there is no way to force the order in > > which they are read. Am I missing something? > > What I am missing at least is what you really mean by "the required GUC > variables are also set". GUC variables are always set to something (at > least after the initialization phase, but you don't get to do anything > with GUC before the initialization, obviously). He means he has to have the stats collector enabled along with autovacuum, and he doesn't have a way to effectively test that stats are enabled when enabling autovacuum because the ordering of postgresql.conf variable loads isn't predefined. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Fri, 2004-07-16 at 17:13, Bruce Momjian wrote: > Peter Eisentraut wrote: > > A nitpick on that parameter name: There is not reason to abbreviate > > "autovacuum": there is enough space. And the "_enabled" is redundant. > > Agreed. Nitpicks are important too because it makes us so beautiful. :-) :-) I'll make that change. > > What I am missing at least is what you really mean by "the required GUC > > variables are also set". GUC variables are always set to something (at > > least after the initialization phase, but you don't get to do anything > > with GUC before the initialization, obviously). > > He means he has to have the stats collector enabled along with > autovacuum, and he doesn't have a way to effectively test that stats are > enabled when enabling autovacuum because the ordering of postgresql.conf > variable loads isn't predefined. Exactly. However, I can work around the GUC ordering issue just by having the postmaster check all the required variables before launching autovac. What I wanted to do was have GUC prevent autovacuum = true when the stats collector is not enabled, reguardless of what is in postgresql.conf.
>>The thing I was trying to do was use the GUC hook function to make >>sure that the required GUC variables are also set before GUC reports >>autovac as enabled. This seemed cleaner to me, but apparently it >>won't work since it seems that autovac_enabled is read from GUC >>before the stats variables, and there is no way to force the order in >>which they are read. Am I missing something? Can we please have it default to enabled :)
Matthew T. O'Connor wrote: > On Fri, 2004-07-16 at 17:13, Bruce Momjian wrote: > > Peter Eisentraut wrote: > > > A nitpick on that parameter name: There is not reason to abbreviate > > > "autovacuum": there is enough space. And the "_enabled" is redundant. > > > > Agreed. Nitpicks are important too because it makes us so beautiful. :-) > > :-) I'll make that change. > > > > What I am missing at least is what you really mean by "the required GUC > > > variables are also set". GUC variables are always set to something (at > > > least after the initialization phase, but you don't get to do anything > > > with GUC before the initialization, obviously). > > > > He means he has to have the stats collector enabled along with > > autovacuum, and he doesn't have a way to effectively test that stats are > > enabled when enabling autovacuum because the ordering of postgresql.conf > > variable loads isn't predefined. > > Exactly. However, I can work around the GUC ordering issue just by > having the postmaster check all the required variables before launching > autovac. What I wanted to do was have GUC prevent autovacuum = true > when the stats collector is not enabled, reguardless of what is in > postgresql.conf. Yes, that's what I would do. Agreed it would be nice to report the failure during SET though, but I can't think of a way to do that. You are actually showing a general limitation of checking GUC variable interactions. For example, I now see that my assign_log_stats() is wrong because if you enable one of the parameters in the postgresql.conf file, and then disable one and enable another, the reload might not work depending on the order they appear in the file. I wonder if we need to have some function to check the state of variable interactions after the file is processed and after other groupings like user/database settings. We can deal with this later, of course, but it is clearly a limitation of our current system. Added to TODO: o Enforce rules for setting combinations -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Christopher Kings-Lynne wrote: > >>The thing I was trying to do was use the GUC hook function to make > >>sure that the required GUC variables are also set before GUC reports > >>autovac as enabled. This seemed cleaner to me, but apparently it > >>won't work since it seems that autovac_enabled is read from GUC > >>before the stats variables, and there is no way to force the order in > >>which they are read. Am I missing something? > > Can we please have it default to enabled :) We can but without also enabling statistics it will not work. Do we want to enable both by default? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
>>Can we please have it default to enabled :) > > > We can but without also enabling statistics it will not work. Do we > want to enable both by default? Weeell...it just seemed to me that we won't cut down on the support mails unless it's on by default... I mean at some point in the future, we WILL have to have it on by default, surely? Chris
Christopher Kings-Lynne wrote: > >>Can we please have it default to enabled :) > > > > > > We can but without also enabling statistics it will not work. Do we > > want to enable both by default? > > Weeell...it just seemed to me that we won't cut down on the support > mails unless it's on by default... I mean at some point in the future, > we WILL have to have it on by default, surely? Yes, I would think it would become default enabled someday. Should we wait one release or do it for 7.5? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
"Matthew T. O'Connor" <matthew@zeut.net> writes: > What I wanted to do was have GUC prevent autovacuum = true > when the stats collector is not enabled, reguardless of what is in > postgresql.conf. GUC can't do that, but I think you can just hack the boolean variable during startup --- compare what pgstat.c does with enable_stats_collector. (Not very clean, but I think it is okay within the context of postmaster start.) regards, tom lane
> Exactly. However, I can work around the GUC ordering issue just by > having the postmaster check all the required variables before launching > autovac. What I wanted to do was have GUC prevent autovacuum = true > when the stats collector is not enabled, reguardless of what is in > postgresql.conf. If the user has requested autovacuum, then why not just enable it? The user shouldn't need to know that autovacuum is implemented via widget X -- enable widget X for the user. Are you also going to complain about a default statement_timeout that doesn't allow vacuum to finish, or will you work around it by setting statement_timeout = 0 on your connection?
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Yes, I would think it would become default enabled someday. Should we > wait one release or do it for 7.5? I think it's a bit premature to have it on by default, seeing that it's still far from being in final form. In my mind it's still a pretty experimental feature. For one thing, I would expect the final form to work, to some extent, with or without stats; and therefore the current thrashing about forcing it off without stats is just wasted effort. regards, tom lane