Re: Modernizing our GUC infrastructure - Mailing list pgsql-hackers

From Junwang Zhao
Subject Re: Modernizing our GUC infrastructure
Date
Msg-id CAEG8a3+Qf+iaqdtZoLqaLv7i1HYtR0ofn_6vhWDg4ciVhrJvLA@mail.gmail.com
Whole thread Raw
In response to Modernizing our GUC infrastructure  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Modernizing our GUC infrastructure
List pgsql-hackers
Hi Tom,

@@ -5836,74 +5865,106 @@ build_guc_variables(void)
  }

  /*
- * Create table with 20% slack
+ * Create hash table with 20% slack
  */
  size_vars = num_vars + num_vars / 4;

Should we change 20% to 25%, I thought that might be
a typo.

On Tue, Sep 6, 2022 at 6:28 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Attached is a patch series that attempts to modernize our GUC
> infrastructure, in particular removing the performance bottlenecks
> it has when there are lots of GUC variables.  I wrote this because
> I am starting to question the schema-variables patch [1] --- that's
> getting to be quite a large patch and I grow less and less sure
> that it's solving a problem our users want solved.  I think what
> people actually want is better support of the existing mechanism
> for ad-hoc session variables, namely abusing custom GUCs for that
> purpose.  One of the big reasons we have been resistant to formally
> supporting that is fear of the poor performance guc.c would have
> with lots of variables.  But we already have quite a lot of them:
>
> regression=# select count(*) from pg_settings;
>  count
> -------
>    353
> (1 row)
>
> and more are getting added all the time.  I think this patch series
> could likely be justified just in terms of positive effect on core
> performance, never mind user-added GUCs.
>
> 0001 and 0002 below are concerned with converting guc.c to store its
> data in a dedicated memory context (GUCMemoryContext) instead of using
> raw malloc().  This is not directly a performance improvement, and
> I'm prepared to drop the idea if there's a lot of pushback, but I
> think it would be a good thing to do.  The only hard reason for using
> malloc() there was the lack of ability to avoid throwing elog(ERROR)
> on out-of-memory in palloc().  But mcxt.c grew that ability years ago.
> Switching to a dedicated context would greatly improve visibility and
> accountability of GUC-related allocations.  Also, the 0003 patch will
> switch guc.c to relying on a palloc-based hashtable, and it seems a
> bit silly to have part of the data structure in palloc and part in
> malloc.  However 0002 is a bit invasive, in that it forces code
> changes in GUC check callbacks, if they want to reallocate the new
> value or create an "extra" data structure.  My feeling is that not
> enough external modules use those facilities for this to pose a big
> problem.  However, the ones that are subject to it will have a
> non-fun time tracking down why their code is crashing.  (The recent
> context-header changes mean that you get a very obscure failure when
> trying to pfree() a malloc'd chunk -- for me, that typically ends
> in an assertion failure in generation.c.  Can we make that less
> confusing?)
>
> 0003 replaces guc.c's bsearch-a-sorted-array lookup infrastructure
> with a dynahash hash table.  (I also looked at simplehash, but it
> has no support for not elog'ing on OOM, and it increases the size
> of guc.o by 10KB or so.)  This fixes the worse-than-O(N^2) time
> needed to create N new GUCs, as in
>
> do $$
> begin
>   for i in 1..10000 loop
>     perform set_config('foo.bar' || i::text, i::text, false);
>   end loop;
> end $$;
>
> On my machine, this takes about 4700 ms in HEAD, versus 23 ms
> with this patch.  However, the places that were linearly scanning
> the array now need to use hash_seq_search, so some other things
> like transaction shutdown (AtEOXact_GUC) get slower.
>
> To address that, 0004 adds some auxiliary lists that link together
> just the variables that are interesting for specific purposes.
> This is helpful even without considering the possibility of a
> lot of user-added GUCs: in a typical session, for example, not
> many of those 353 GUCs have non-default values, and even fewer
> get modified in any one transaction (typically, anyway).
>
> As an example of the speedup from 0004, these DO loops:
>
> create or replace function func_with_set(int) returns int
> strict immutable language plpgsql as
> $$ begin return $1; end $$
> set enable_seqscan = false;
>
> do $$
> begin
>   for i in 1..100000 loop
>     perform func_with_set(i);
>   end loop;
> end $$;
>
> do $$
> begin
>   for i in 1..100000 loop
>     begin
>       perform func_with_set(i);
>     exception when others then raise;
>     end;
>   end loop;
> end $$;
>
> take about 260 and 320 ms respectively for me, in HEAD with
> just the stock set of variables.  But after creating 10000
> GUCs with the previous DO loop, they're up to about 3200 ms.
> 0004 brings that back down to being indistinguishable from the
> speed with few GUCs.
>
> So I think this is good cleanup in its own right, plus it
> removes one major objection to considering user-defined GUCs
> as a supported feature.
>
>                         regards, tom lane
>
> [1]
https://www.postgresql.org/message-id/flat/CAFj8pRD053CY_N4%3D6SvPe7ke6xPbh%3DK50LUAOwjC3jm8Me9Obg%40mail.gmail.com
>


-- 
Regards
Junwang Zhao



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Reducing the chunk header sizes on all memory context types
Next
From: Tom Lane
Date:
Subject: Re: Modernizing our GUC infrastructure