Re: autovauum integration patch: Attempt #4 - Mailing list pgsql-patches
From | Tom Lane |
---|---|
Subject | Re: autovauum integration patch: Attempt #4 |
Date | |
Msg-id | 16462.1091489171@sss.pgh.pa.us Whole thread Raw |
In response to | Re: autovauum integration patch: Attempt #4 ("Matthew T. O'Connor" <matthew@zeut.net>) |
Responses |
Re: autovauum integration patch: Attempt #4
Re: autovauum integration patch: Attempt #4 |
List | pgsql-patches |
"Matthew T. O'Connor" <matthew@zeut.net> writes: > Please apply to CVS or tell me what I need to change to get it applied. I looked over this patch (sorry for the delay), and found a number of problems. Bigger problems: * I don't think you've thought through system shutdown at all. The postmaster code as given will not try to shutdown the autovac daemon until the same time it shuts down the bgwriter, which is no good for a couple of reasons: (a) None of this code will trigger as long as there is any ordinary backend running; like say the one(s) invoked by autovac itself. This means that autovac-driven vacuuming is considered just as good a reason to keep the system going as an ordinary user query, which I think is not cool. IMHO a SIGTERM to the postmaster should result in canceling whatever autovacuum operation is currently running. (b) It's really not cool that autovac isn't shut down *before* we start shutting down bgwriter. I think that it might not make too much difference right at the moment, since the autovac daemon isn't actually making any use of its connection to shared memory, but the moment that autovac tries to do anything on its own behalf rather than via a backend this is going to be a serious risk. There can't be anything going on in parallel with the shutdown checkpoint. I think really we want autovac to shut down at the beginning of the shutdown cycle, not the end, and not to start bgwriter shutdown until autovac is gone. * I hadn't quite focused before on the fact that this patch requires statically binding frontend libpq into the backend. There are a number of issues with that, the most risky being that if libpq.so is compiled thread-aware then it's going to create problems on platforms where there's a difference between thread-aware and non-thread-aware C library code. Even without threading worries there are conflicts: if someone calls a dllist.c routine, which instance will they get? I'm also concerned about the implications for modules like contrib/dblink, which expect to load libpq.so dynamically. There could be conflicts between the dynamically linked libpq and the inbuilt one. * AFAICS there is no provision for setting pg_user or pg_user_password, which means that the daemon won't actually be able to connect in non-TRUST environments. I don't know what we do about this: putting such a password in a GUC variable is right out (because any user could read it) and putting it on the postmaster command line is no better (anyone on the same machine can see it). Right at the moment we do not have any secure place for postmaster parameters. Smaller problems: * It still contains too much code copied-and-pasted from bgwriter, such as ShutdownXLOG(0, 0); DumpFreeSpaceMap(0, 0); autovac has *no* business doing that. I don't have time to go through it line-by-line to see what else shouldn't have been copied. * The patch reverts some recent changes in postmaster.c (write_stderr changes for instance). * Emitting this warning on every single iteration of the postmaster idle loop is excessive: elog(WARNING, "pg_autovacuum: autovac is enabled, but requires stats_row_level which is not enabled"); and this one even more so: if (!autovacuum_start_daemon) elog(DEBUG1, "pg_autovacuum: not enabled"); * Any message that's actually likely to be seen by users should be an ereport, not elog. elog is okay for debugging messages and can't-happen errors, but not for messages that will be seen by ordinary users, because there's no support for message translation in elog. * I don't think you've actually thought very carefully about elog-based error handling; for instance this bit: dbs->conn = db_connect(dbs); if (dbs->conn == NULL) { /* Serious problem: We can't connect to * template1 */ elog(ERROR, "pg_autovacuum: Cannot connect to template1, exiting."); exit(1); } Control won't make it to the exit(1) (which is a darn good thing in itself, seeing that you are connected to shared memory and hence had better be using proc_exit()). What *will* happen in the code as given is that control will bounce back to the sigsetjmp, which will recover and re-call AutoVacLoop, which would be okay except you just forgot any open backend connections you have (plus leaked all the memory in your datastructures). Maybe it would be better if you did not have an error catcher but just aborted the process on any serious error, letting the postmaster start a new one. (This ties into your FIXME note about how the postmaster should react to autovac crashes...) * autovacuum.h exports way too much stuff that is not relevant to other modules. (Rule #1: you don't declare static functions in a header file; these *will* provoke warnings.) I'm not sure what we do now. I can't apply this in its current state, and I do not have time to fix it. I don't really want to push it in and assume we can fix the problems during beta ... regards, tom lane
pgsql-patches by date: