Thread: autovacuum launcher using InitPostgres
Hi, This seems to be the last holdup for flatfiles.c. This patch removes usage of that code in autovacuum launcher, instead making it go through the most basic relcache initialization so that it is able to read pg_database. To this end, InitPostgres has been split in two. The launcher only calls the first half, the rest of the callers have been patched to invoke the second half. The only thing I'm aware is missing from this patch is fixing up avlauncher's signal handling, and adding a bit more commentary; also I haven't tested it under EXEC_BACKEND yet. Comments? -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Attachment
Alvaro Herrera <alvherre@commandprompt.com> writes: > To this end, InitPostgres has been split in two. The launcher only > calls the first half, the rest of the callers have been patched to > invoke the second half. This just seems truly messy :-(. Let me see if I can find something cleaner. BTW, is it *really* the case that the AV launcher won't need RecentGlobalXmin? The way the HOT stuff works, I think anything that examines heap pages at all had better have that set. regards, tom lane
Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > To this end, InitPostgres has been split in two. The launcher only > > calls the first half, the rest of the callers have been patched to > > invoke the second half. > > This just seems truly messy :-(. Let me see if I can find something > cleaner. I was considering having InitPostgres be an umbrella function, so that extant callers stay as today, but the various underlying pieces are skipped depending on who's calling. For example I didn't like the bit about starting a transaction or not depending on whether it was the launcher. > BTW, is it *really* the case that the AV launcher won't need > RecentGlobalXmin? The way the HOT stuff works, I think anything that > examines heap pages at all had better have that set. Ugh. I forgot about that. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes: > Tom Lane wrote: >> This just seems truly messy :-(. Let me see if I can find something >> cleaner. > I was considering having InitPostgres be an umbrella function, so that > extant callers stay as today, but the various underlying pieces are > skipped depending on who's calling. For example I didn't like the bit > about starting a transaction or not depending on whether it was the > launcher. Yeah. If you have InitPostgres know that much about the AV launcher's requirements, it's not clear why it shouldn't just know everything. Having it return with the initial transaction still open just seems completely horrid. >> BTW, is it *really* the case that the AV launcher won't need >> RecentGlobalXmin? The way the HOT stuff works, I think anything that >> examines heap pages at all had better have that set. > Ugh. I forgot about that. We could possibly put if (IsAutovacuumLauncher()){ CommitTransactionCommand(); return;} right after the RelationCacheInitializePhase2 call. No uglier than what's here, for sure. While I was looking at this I wondered whether RelationCacheInitializePhase2 really needs to be inside the startup transaction at all. I think it could probably be moved up before that. However, if the AV launcher has to do GetTransactionSnapshot then it's not clear that improves matters anyway. regards, tom lane
Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > Tom Lane wrote: > >> This just seems truly messy :-(. Let me see if I can find something > >> cleaner. > > > I was considering having InitPostgres be an umbrella function, so that > > extant callers stay as today, but the various underlying pieces are > > skipped depending on who's calling. For example I didn't like the bit > > about starting a transaction or not depending on whether it was the > > launcher. > > Yeah. If you have InitPostgres know that much about the AV launcher's > requirements, it's not clear why it shouldn't just know everything. > Having it return with the initial transaction still open just seems > completely horrid. How about this? > While I was looking at this I wondered whether > RelationCacheInitializePhase2 really needs to be inside the startup > transaction at all. I think it could probably be moved up before > that. However, if the AV launcher has to do GetTransactionSnapshot > then it's not clear that improves matters anyway. Well, the difference is that the initial transaction would be a few microsec shorter ... not sure if that matters. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Attachment
Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: >> Tom Lane wrote: >>> This just seems truly messy :-(. Let me see if I can find something >>> cleaner. I quite like the idea of splitting initialization into two phases, one that let's you access shared catalogs, and one to bind to a database. I didn't look into the details, though. >> I was considering having InitPostgres be an umbrella function, so that >> extant callers stay as today, but the various underlying pieces are >> skipped depending on who's calling. For example I didn't like the bit >> about starting a transaction or not depending on whether it was the >> launcher. > > Yeah. If you have InitPostgres know that much about the AV launcher's > requirements, it's not clear why it shouldn't just know everything. > Having it return with the initial transaction still open just seems > completely horrid. Yeah, that sounds messy. Can AV launcher simply open a 2nd initial transaction? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote: > Tom Lane wrote: > > Alvaro Herrera <alvherre@commandprompt.com> writes: > >> Tom Lane wrote: > >>> This just seems truly messy :-(. Let me see if I can find something > >>> cleaner. > > I quite like the idea of splitting initialization into two phases, one > that let's you access shared catalogs, and one to bind to a database. I > didn't look into the details, though. The problem is that it only gives you access to pg_database, because the other shared catalogs require more relcache initialization than we do. So I'm not sure it'd be useful for anything else. > >> I was considering having InitPostgres be an umbrella function, so that > >> extant callers stay as today, but the various underlying pieces are > >> skipped depending on who's calling. For example I didn't like the bit > >> about starting a transaction or not depending on whether it was the > >> launcher. > > > > Yeah. If you have InitPostgres know that much about the AV launcher's > > requirements, it's not clear why it shouldn't just know everything. > > Having it return with the initial transaction still open just seems > > completely horrid. > > Yeah, that sounds messy. Can AV launcher simply open a 2nd initial > transaction? The main body of avlauncher opens a new transaction whenever it needs one. The problem is that the transaction in InitPostgres is closed in the second half -- the code I had was skipping StartTransactionCommand in the launcher case, but as Tom says that was wrong because it was failing to set RecentGlobalXmin. If we're looking at simplifing InitPostgres, one thing we could do is separate the part for the bootstrap process, which is like 10% of the work for a regular backend ... -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes: > How about this? I think the accounting for the AV launcher in shmem allocation is a bit confused yet --- for instance, isn't MaxBackends already including the launcher? I wonder if it would be cleaner to include the launcher in the autovacuum_max_workers parameter, and increase the min/default values of that by one. >> While I was looking at this I wondered whether >> RelationCacheInitializePhase2 really needs to be inside the startup >> transaction at all. I think it could probably be moved up before >> that. However, if the AV launcher has to do GetTransactionSnapshot >> then it's not clear that improves matters anyway. > Well, the difference is that the initial transaction would be a few > microsec shorter ... not sure if that matters. I can't see how it would. At the point where that runs we'd not be holding any locks of interest, so there's no concurrency benefit to be gained. With this setup it wouldn't make InitPostgres any cleaner anyway, so I'd leave it alone. regards, tom lane
Alvaro Herrera <alvherre@commandprompt.com> writes: > Heikki Linnakangas wrote: >> I quite like the idea of splitting initialization into two phases, one >> that let's you access shared catalogs, and one to bind to a database. I >> didn't look into the details, though. > The problem is that it only gives you access to pg_database, because the > other shared catalogs require more relcache initialization than we do. > So I'm not sure it'd be useful for anything else. The part I was unhappy about was falling out with the startup transaction still open (which the patch as written didn't do, but would have needed to given the snapshot issue). I don't object to trying to restructure InitPostgres along the above lines if it can be done cleanly. But on the third hand, Alvaro's right that there isn't any other obvious use for it. We've also got the client_encoding (GUC initialization timing) issue to fix in this area. I suggest that any major restructuring-for-beauty wait till after the dust settles. I think what we have here (in the revised patch) is ugly but not too ugly to live with. regards, tom lane
Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > How about this? > > I think the accounting for the AV launcher in shmem allocation is a bit > confused yet --- for instance, isn't MaxBackends already including the > launcher? I wonder if it would be cleaner to include the launcher in > the autovacuum_max_workers parameter, and increase the min/default > values of that by one. Huh, yeah, sorry about that -- fixed here. I think the name of the param, which includes "worker", precludes from raising the values. Changes between v2 and v3: diff -u src/backend/storage/lmgr/proc.c src/backend/storage/lmgr/proc.c --- src/backend/storage/lmgr/proc.c 31 Aug 2009 13:36:56 -0000 +++ src/backend/storage/lmgr/proc.c 31 Aug 2009 16:14:08 -0000 @@ -103,7 +103,7 @@ /* AuxiliaryProcs */ size = add_size(size, mul_size(NUM_AUXILIARY_PROCS, sizeof(PGPROC))); /* MyProcs, including autovacuum workers and launcher */ - size = add_size(size, mul_size(MaxBackends + 1, sizeof(PGPROC))); + size = add_size(size, mul_size(MaxBackends, sizeof(PGPROC))); /* ProcStructLock */ size = add_size(size, sizeof(slock_t)); @@ -192,6 +192,7 @@ ProcGlobal->freeProcs = &procs[i]; } + /* note: the "+1" here accounts for the autovac launcher */ procs = (PGPROC *) ShmemAlloc((autovacuum_max_workers + 1) * sizeof(PGPROC)); if (!procs) ereport(FATAL, diff -u src/backend/utils/misc/guc.c src/backend/utils/misc/guc.c --- src/backend/utils/misc/guc.c 31 Aug 2009 03:07:47 -0000 +++ src/backend/utils/misc/guc.c 31 Aug 2009 16:12:56 -0000 @@ -7570,7 +7570,7 @@ static bool assign_maxconnections(int newval, bool doit, GucSource source) { - if (newval + autovacuum_max_workers > INT_MAX / 4) + if (newval + autovacuum_max_workers + 1 > INT_MAX / 4) return false; if (doit) -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Attachment
Alvaro Herrera <alvherre@commandprompt.com> writes: > Tom Lane wrote: >> I wonder if it would be cleaner to include the launcher in >> the autovacuum_max_workers parameter, and increase the min/default >> values of that by one. > Huh, yeah, sorry about that -- fixed here. I think the name of the > param, which includes "worker", precludes from raising the values. Well, I'm not sure the average user knows or cares about the difference between the launcher and the workers. The thing that was in the back of my mind was that we would now have the option to have the launcher show up in pg_stat_activity. If we were to do that then the case for counting it in the user-visible number-of-processes parameter would get a lot stronger (enough to justify renaming the parameter, if you insist that the launcher isn't a worker). I don't however have any strong opinion on whether we *should* include it in pg_stat_activity --- comments? In the meantime, this looks reasonably sane in a fast read-through, but I saw a few comments that could use improvement, and I have not tried to actually review it (like look for missed places to change). Do you mind if I work it over for an hour or two? regards, tom lane
Alvaro Herrera <alvherre@commandprompt.com> writes: > Tom Lane wrote: >> While I was looking at this I wondered whether >> RelationCacheInitializePhase2 really needs to be inside the startup >> transaction at all. I think it could probably be moved up before >> that. However, if the AV launcher has to do GetTransactionSnapshot >> then it's not clear that improves matters anyway. > Well, the difference is that the initial transaction would be a few > microsec shorter ... not sure if that matters. Actually, there is a better way to do this: if we move up the RelationCacheInitializePhase2 call, then we can have the AV launcher case just fall out *before* the transaction start. It can do GetTransactionSnapshot inside its own transaction that reads pg_database. This is a better solution because it'll have a more up-to-date version of RecentGlobalXmin while scanning pg_database. (Indeed, I think this might be *necessary* over the very long haul.) I think I've got the signal handling cleaned up, but need to test. Is there any really good reason why autovacuum has its own avl_quickdie instead of using quickdie() for SIGQUIT? regards, tom lane
Tom Lane wrote: > Actually, there is a better way to do this: if we move up the > RelationCacheInitializePhase2 call, then we can have the AV launcher > case just fall out *before* the transaction start. It can do > GetTransactionSnapshot inside its own transaction that reads > pg_database. This is a better solution because it'll have a more > up-to-date version of RecentGlobalXmin while scanning pg_database. > (Indeed, I think this might be *necessary* over the very long haul.) Hmm, good idea. > I think I've got the signal handling cleaned up, but need to test. > Is there any really good reason why autovacuum has its own avl_quickdie > instead of using quickdie() for SIGQUIT? No, probably I just copied it because the others were static. Not sure about quickdie() itself. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Tom Lane wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: > > Tom Lane wrote: > >> I wonder if it would be cleaner to include the launcher in > >> the autovacuum_max_workers parameter, and increase the min/default > >> values of that by one. > > > Huh, yeah, sorry about that -- fixed here. I think the name of the > > param, which includes "worker", precludes from raising the values. > > Well, I'm not sure the average user knows or cares about the difference > between the launcher and the workers. The thing that was in the back of > my mind was that we would now have the option to have the launcher show > up in pg_stat_activity. If we were to do that then the case for > counting it in the user-visible number-of-processes parameter would get > a lot stronger (enough to justify renaming the parameter, if you insist > that the launcher isn't a worker). I don't however have any strong > opinion on whether we *should* include it in pg_stat_activity --- > comments? The user may not care about the difference, but there's a point in having the limit be the simpler concept of "this is the maximum amount of processes running vacuum at any time". The launcher is very uninteresting to users. > In the meantime, this looks reasonably sane in a fast read-through, > but I saw a few comments that could use improvement, and I have not > tried to actually review it (like look for missed places to change). > Do you mind if I work it over for an hour or two? Please go ahead. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes: > The only thing I'm aware is missing from this patch is fixing up > avlauncher's signal handling, and adding a bit more commentary; also I > haven't tested it under EXEC_BACKEND yet. I did the signal handling work and fixed a couple of small oversights, and applied it. I'm not sure what other commentary you had in mind, but feel free to add. regards, tom lane
Alvaro Herrera <alvherre@commandprompt.com> writes: > Tom Lane wrote: >> Well, I'm not sure the average user knows or cares about the difference >> between the launcher and the workers. The thing that was in the back of >> my mind was that we would now have the option to have the launcher show >> up in pg_stat_activity. If we were to do that then the case for >> counting it in the user-visible number-of-processes parameter would get >> a lot stronger (enough to justify renaming the parameter, if you insist >> that the launcher isn't a worker). I don't however have any strong >> opinion on whether we *should* include it in pg_stat_activity --- >> comments? > The user may not care about the difference, but there's a point in > having the limit be the simpler concept of "this is the maximum amount > of processes running vacuum at any time". The launcher is very > uninteresting to users. I committed things that way, but I'm still not convinced that we shouldn't expose the launcher in pg_stat_activity. The thing that is bothering me is that it is now able to take locks and potentially could block some other process or even participate in a deadlock. Do we really want to have entries in pg_locks that don't match any entry in pg_stat_activity? regards, tom lane
Hi, Tom Lane <tgl@sss.pgh.pa.us> writes: >> The user may not care about the difference, but there's a point in >> having the limit be the simpler concept of "this is the maximum amount >> of processes running vacuum at any time". The launcher is very >> uninteresting to users. Adding to this, the launcher will not consume maintenance_work_mem whereas each worker is able to allocate that much, IIUC. > I committed things that way, but I'm still not convinced that we > shouldn't expose the launcher in pg_stat_activity. The thing that > is bothering me is that it is now able to take locks and potentially > could block some other process or even participate in a deadlock. > Do we really want to have entries in pg_locks that don't match any > entry in pg_stat_activity? Having the launcher locks show as such gets my vote too, but then I'm following on your opinion that a launcher ain't a worker and that users need to know about it. Let's keep the autovacuum_max_workers GUC naming, not counting the "there can be only one" launcher so that we're able to size maintenance_work_mem. Regards, -- dim
On Mon, Aug 31, 2009 at 21:49, Tom Lane<tgl@sss.pgh.pa.us> wrote: > Alvaro Herrera <alvherre@commandprompt.com> writes: >> Tom Lane wrote: >>> Well, I'm not sure the average user knows or cares about the difference >>> between the launcher and the workers. The thing that was in the back of >>> my mind was that we would now have the option to have the launcher show >>> up in pg_stat_activity. If we were to do that then the case for >>> counting it in the user-visible number-of-processes parameter would get >>> a lot stronger (enough to justify renaming the parameter, if you insist >>> that the launcher isn't a worker). I don't however have any strong >>> opinion on whether we *should* include it in pg_stat_activity --- >>> comments? > >> The user may not care about the difference, but there's a point in >> having the limit be the simpler concept of "this is the maximum amount >> of processes running vacuum at any time". The launcher is very >> uninteresting to users. > > I committed things that way, but I'm still not convinced that we > shouldn't expose the launcher in pg_stat_activity. The thing that > is bothering me is that it is now able to take locks and potentially > could block some other process or even participate in a deadlock. > Do we really want to have entries in pg_locks that don't match any > entry in pg_stat_activity? At first I'd have voted that we don't have it in pg_stat_activity, but the argument about pg_locks really is the killer on. IMHO, it *has* to go there then. I would also suggest that we add a separate column to pg_stat_activity indicating what type of process it is, so that monitoring tools can figure that out without having to resort to string matching on the current query field. This field would indicate if it's a backend, autovac worker, autovac launcher. And perhaps we should even add the other utility processes to pg_stat_activity, for completeness? Another thought along a similar line is some way to detect the idle and idle-in-transaction states without doing string matching as well. Perhaps this could be overloaded on said extra field, or should it have a separate one? -- Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/