Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions? - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?
Date
Msg-id 20200729083253.GG28700@paquier.xyz
Whole thread Raw
In response to Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?  (michael@paquier.xyz)
Responses Re: Ought to use heap_multi_insert() for pg_attribute/depend insertions?
List pgsql-hackers
On Wed, Jul 01, 2020 at 06:24:18PM +0900, Michael Paquier wrote:
> I am not sure either, still it looks worth thinking about it.
> Attached is a rebased version of the last patch.  What this currently
> holds is just the switch to heap_multi_insert() for the three catalogs
> pg_attribute, pg_depend and pg_shdepend.  One point that looks worth
> debating about is to how much to cap the data inserted at once.  This
> uses 64kB for all three, with a number of slots chosen based on the
> size of each record, similarly to what we do for COPY.

I got an extra round of review done for this patch.  One spot was
missed in heap_multi_insert() for a comment telling catalogs not using
multi inserts.  After some consideration, I think that using 64kB as a
base number to calculate the number of slots should be fine, similarly
to COPY.

While on it, I have done some measurements to see the difference in
WAL produced and get an idea of the gain.  For example, this function
would create one table with a wanted number of attributes:
CREATE OR REPLACE FUNCTION create_cols(tabname text, num_cols int)
RETURNS VOID AS
$func$
DECLARE
  query text;
BEGIN
  query := 'CREATE TABLE ' || tabname || ' (';
  FOR i IN 1..num_cols LOOP
    query := query || 'a_' || i::text || ' int';
    IF i != num_cols THEN
      query := query || ', ';
    END IF;
  END LOOP;
  query := query || ')';
  EXECUTE format(query);
END
$func$ LANGUAGE plpgsql;

On HEAD, with a table that has 1300 attributes, this leads to 563kB of
WAL produced.  With the patch, we get down to 505kB.  That's an
extreme case of course, but that's nice a nice gain.

A second test, after creating a database from a template that has
roughly 10k entries in pg_shdepend (10k empty tables actually), showed
a reduction from 2158kB to 1762kB in WAL.

Finally comes the third catalog, pg_depend, and there is one thing
that makes me itching about this part.  We do a lot of useless work
for the allocation and destruction of the slots when there are pinned
dependencies, and there can be a lot of them.  Just by running the
main regression tests, it is easy to see that in 0003 we still do a
lot of calls of recordMultipleDependencies() for one single
dependency, and that most of these are actually pinned.  So we finish
by doing a lot of slot manipulation to insert nothing at the end,
contrary to the counterparts with pg_shdepend and pg_attribute.  In
short, I think that for now it would be fine to commit a patch that
does the multi-INSERT optimization for pg_attribute and pg_shdepend,
but that we need more careful work for pg_depend.  For example we
could go through all the dependencies first and recalculate the number
of slots to use depending on what is pinned or not, but this would
make sense actually when more dependencies are inserted at once in
more code paths, mostly for ALTER TABLE.  So this needs more
consideration IMO.

Thoughts?
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: [POC] Fast COPY FROM command for the table with foreign partitions
Next
From: "Andrey V. Lepikhov"
Date:
Subject: Re: [POC] Fast COPY FROM command for the table with foreign partitions