Re: Avoid overhead open-close indexes (catalog updates) - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Avoid overhead open-close indexes (catalog updates)
Date
Msg-id Y3Lj9rnamMG3fugy@paquier.xyz
Whole thread Raw
In response to Re: Avoid overhead open-close indexes (catalog updates)  (Ranier Vilela <ranier.vf@gmail.com>)
Responses Re: Avoid overhead open-close indexes (catalog updates)
List pgsql-hackers
On Sat, Nov 12, 2022 at 11:03:46AM -0300, Ranier Vilela wrote:
> I think complexity doesn't pay off.
> For example, CopyStatistics not knowing how many tuples will be processed.
> IMHO, this step is right now.
> CatalogTupleInsertWithInfo offers considerable improvement without
> introducing bugs and maintenance issues.

Considerable may be a bit an overstatement?  I can see a difference in
profiles when switching from one to the other in some extreme cases,
but for the REINDEX CONCURRENTLY case most of the runtime is going to
be eaten in the wait phases, the index build and its validation.

Anyway, multi-inserts are going to be solution better than
CatalogTupleInsertWithInfo() in some cases, because we would just
generate one WAL record of N inserts rather than N records with one
INSERT each.

Looking closely, EnumValuesCreate() is a DDL path but I'd like to
think that two enum values are at least present at creation in most
cases.  AddRoleMems() becomes relevant when using more than one role,
which is a less common pattern, so I'd be fine with switching to a
single index-opening approach with CatalogTupleUpdateWithInfo() as you
suggest without the tuple slot management.  CopyStatistics() does not
know in advance the number of tuples it would insert, and it would be
a gain when there are more than 2 expressions with entries in
pg_statistic as of HEAD.  Perhaps you're right with your simple
suggestion to stick with CatalogTupleUpdateWithInfo() in this case.
Maybe there is some external code calling this routine for tables, who
knows.

update_attstats() is actually an area that cannot be changed now that
I look at it, as we could finish to update some entries, so the slot
approach will not be relevant, but using CatalogTupleUpdateWithInfo()
is.  (As a matter of fact, the regression test suite is reporting that
update_attstats() is called for one attribute 10% of the time, did not
check the insert/update rate though).

Would you like to give a try with the tuple slot management in
EnumValuesCreate()?
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Jeff Davis
Date:
Subject: Re: Collation version tracking for macOS
Next
From: Richard Guo
Date:
Subject: Re: A new strategy for pull-up correlated ANY_SUBLINK