Re: Modernizing our GUC infrastructure - Mailing list pgsql-hackers
From | Junwang Zhao |
---|---|
Subject | Re: Modernizing our GUC infrastructure |
Date | |
Msg-id | CAEG8a3+Qf+iaqdtZoLqaLv7i1HYtR0ofn_6vhWDg4ciVhrJvLA@mail.gmail.com Whole thread Raw |
In response to | Modernizing our GUC infrastructure (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Modernizing our GUC infrastructure
|
List | pgsql-hackers |
Hi Tom, @@ -5836,74 +5865,106 @@ build_guc_variables(void) } /* - * Create table with 20% slack + * Create hash table with 20% slack */ size_vars = num_vars + num_vars / 4; Should we change 20% to 25%, I thought that might be a typo. On Tue, Sep 6, 2022 at 6:28 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Attached is a patch series that attempts to modernize our GUC > infrastructure, in particular removing the performance bottlenecks > it has when there are lots of GUC variables. I wrote this because > I am starting to question the schema-variables patch [1] --- that's > getting to be quite a large patch and I grow less and less sure > that it's solving a problem our users want solved. I think what > people actually want is better support of the existing mechanism > for ad-hoc session variables, namely abusing custom GUCs for that > purpose. One of the big reasons we have been resistant to formally > supporting that is fear of the poor performance guc.c would have > with lots of variables. But we already have quite a lot of them: > > regression=# select count(*) from pg_settings; > count > ------- > 353 > (1 row) > > and more are getting added all the time. I think this patch series > could likely be justified just in terms of positive effect on core > performance, never mind user-added GUCs. > > 0001 and 0002 below are concerned with converting guc.c to store its > data in a dedicated memory context (GUCMemoryContext) instead of using > raw malloc(). This is not directly a performance improvement, and > I'm prepared to drop the idea if there's a lot of pushback, but I > think it would be a good thing to do. The only hard reason for using > malloc() there was the lack of ability to avoid throwing elog(ERROR) > on out-of-memory in palloc(). But mcxt.c grew that ability years ago. > Switching to a dedicated context would greatly improve visibility and > accountability of GUC-related allocations. Also, the 0003 patch will > switch guc.c to relying on a palloc-based hashtable, and it seems a > bit silly to have part of the data structure in palloc and part in > malloc. However 0002 is a bit invasive, in that it forces code > changes in GUC check callbacks, if they want to reallocate the new > value or create an "extra" data structure. My feeling is that not > enough external modules use those facilities for this to pose a big > problem. However, the ones that are subject to it will have a > non-fun time tracking down why their code is crashing. (The recent > context-header changes mean that you get a very obscure failure when > trying to pfree() a malloc'd chunk -- for me, that typically ends > in an assertion failure in generation.c. Can we make that less > confusing?) > > 0003 replaces guc.c's bsearch-a-sorted-array lookup infrastructure > with a dynahash hash table. (I also looked at simplehash, but it > has no support for not elog'ing on OOM, and it increases the size > of guc.o by 10KB or so.) This fixes the worse-than-O(N^2) time > needed to create N new GUCs, as in > > do $$ > begin > for i in 1..10000 loop > perform set_config('foo.bar' || i::text, i::text, false); > end loop; > end $$; > > On my machine, this takes about 4700 ms in HEAD, versus 23 ms > with this patch. However, the places that were linearly scanning > the array now need to use hash_seq_search, so some other things > like transaction shutdown (AtEOXact_GUC) get slower. > > To address that, 0004 adds some auxiliary lists that link together > just the variables that are interesting for specific purposes. > This is helpful even without considering the possibility of a > lot of user-added GUCs: in a typical session, for example, not > many of those 353 GUCs have non-default values, and even fewer > get modified in any one transaction (typically, anyway). > > As an example of the speedup from 0004, these DO loops: > > create or replace function func_with_set(int) returns int > strict immutable language plpgsql as > $$ begin return $1; end $$ > set enable_seqscan = false; > > do $$ > begin > for i in 1..100000 loop > perform func_with_set(i); > end loop; > end $$; > > do $$ > begin > for i in 1..100000 loop > begin > perform func_with_set(i); > exception when others then raise; > end; > end loop; > end $$; > > take about 260 and 320 ms respectively for me, in HEAD with > just the stock set of variables. But after creating 10000 > GUCs with the previous DO loop, they're up to about 3200 ms. > 0004 brings that back down to being indistinguishable from the > speed with few GUCs. > > So I think this is good cleanup in its own right, plus it > removes one major objection to considering user-defined GUCs > as a supported feature. > > regards, tom lane > > [1] https://www.postgresql.org/message-id/flat/CAFj8pRD053CY_N4%3D6SvPe7ke6xPbh%3DK50LUAOwjC3jm8Me9Obg%40mail.gmail.com > -- Regards Junwang Zhao
pgsql-hackers by date: