Bringing some sanity to RestoreGUCState() - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Bringing some sanity to RestoreGUCState() |
Date | |
Msg-id | 4105247.1616174862@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: Bringing some sanity to RestoreGUCState()
|
List | pgsql-hackers |
In the thread about valgrind leak detection [1], we noticed that RestoreGUCState(), which is intended to load the leader process's GUC settings into a parallel worker, was causing visible memory leaks by invoking InitializeOneGUCOption() on already-set-up GUCs. I noted that simply removing that call made the leaks go away with no obvious ill effects, but didn't stop to look closer. I've now looked closer, and I see that the reason that removing that call has no ill effects is in fact that it's a complete no-op. Every GUC that this chooses to target: if (!can_skip_gucvar(guc_variables[i])) InitializeOneGUCOption(guc_variables[i]); is one that the leader backend will send a value for, so that the reinitialized value will certainly be overwritten in the next loop. (Actually, the set of forcibly-reinited GUCs is a strict subset of those that the leader will send, since GUCs that have source PGC_S_DEFAULT in the newly-started worker might have other sources in the leader.) I wonder whether the intent was to do the negation of this test, ie reset GUCs that the leader *isn't* going to send. But that's very obviously the wrong thing, because it would lose the values of (at least) PGC_POSTMASTER variables. So we can remove the code that does this, and I intend to go do so. However, given the unmistakable evidence of sloppy thinking here, I looked closer at exactly what can_skip_gucvar() is doing, and I think we're either sending too much or too little. The argument for "sending too little" comes from the race condition that's described in the function's comments: a variable that has source PGC_S_DEFAULT (ie, has never moved off its compiled-in default) in the leader could have just been updated in the postmaster, due to re-reading postgresql.conf after SIGHUP. In that case, when the postmaster forks the worker it will inherit the new setting from postgresql.conf, and will run with that because the leader didn't send its value. So we risk having a situation where parallel workers are using a setting that the leader won't adopt until it next goes idle. Now, this shouldn't cause any really fundamental problems (if it could, the variable shouldn't have been marked as safe to change at SIGHUP). But you could imagine some minor query weirdness being traceable to that. I think that the authors of this code judged that the cost of sending default GUC values was more than preventing such weirdness is worth, and I can see the point. Neglecting the PGC_POSTMASTER and PGC_INTERNAL variables, which seem safe to not send, I see this in a regression test install: =# select source,count(*) from pg_settings where context not in ('postmaster', 'internal') group by 1; source | count ----------------------+------- client | 2 environment variable | 1 configuration file | 6 default | 246 database | 6 override | 3 command line | 1 (7 rows) Sending 265 values to a new parallel worker instead of 19 would be a pretty large cost to avoid a race condition that probably wouldn't have significant ill effects anyway. However, if you are willing to accept that tradeoff, then this code is leaving a lot on the table, because there is no more reason for it to send values with any of these sources than there is to send PGC_S_DEFAULT ones: PGC_S_DYNAMIC_DEFAULT, /* default computed during initialization */ PGC_S_ENV_VAR, /* postmaster environment variable */ PGC_S_FILE, /* postgresql.conf */ PGC_S_ARGV, /* postmaster command line */ PGC_S_GLOBAL, /* global in-database setting */ PGC_S_DATABASE, /* per-database setting */ PGC_S_USER, /* per-user setting */ PGC_S_DATABASE_USER, /* per-user-and-database setting */ The new worker will have absorbed all such values already during its regular InitPostgres() call. I suppose there's an argument to be made that skipping such values widens the scope of the race hazard a bit, since a GUC that the DBA has already chosen to move off of default (or set via ALTER USER/ DATABASE) might be one she's more likely to change later. But that argument seems pretty tissue-thin to me. In short, I think we really only need to transmit GUCs with sources of PGC_S_CLIENT and higher. In the regression environment that would cut us down from sending 19 values to sending 5. In production the win would likely be substantially more, since it's more likely that the DBA would have tweaked more things in postgresql.conf; which are variables we are sending today and don't have to. Thoughts? regards, tom lane [1] https://www.postgresql.org/message-id/flat/3471359.1615937770%40sss.pgh.pa.us
pgsql-hackers by date: