Re: Switch to multi-inserts for pg_depend - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Switch to multi-inserts for pg_depend
Date
Msg-id 20200811003221.iuawregryk6z2wjd@alap3.anarazel.de
Whole thread Raw
In response to Switch to multi-inserts for pg_depend  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Switch to multi-inserts for pg_depend  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
Hi,

On 2020-08-07 15:16:19 +0900, Michael Paquier wrote:
> From cd117fa88938c89ac953a5e3c036828337150b07 Mon Sep 17 00:00:00 2001
> From: Michael Paquier <michael@paquier.xyz>
> Date: Fri, 7 Aug 2020 10:57:40 +0900
> Subject: [PATCH 1/2] Use multi-inserts for pg_depend
> 
> This is a follow-up of the work done in e3931d01.  This case is a bit
> different than pg_attribute and pg_shdepend: the maximum number of items
> to insert is known in advance, but there is no need to handle pinned
> dependencies.  Hence, the base allocation for slots is done based on the
> number of items and the maximum allowed with a cap at 64kB, and items
> are initialized once used to minimize the overhead of the operation.
> 
> Author: Daniel Gustafsson, Michael Paquier
> Discussion: https://postgr.es/m/XXX
> ---
>  src/backend/catalog/pg_depend.c | 95 ++++++++++++++++++++++++---------
>  1 file changed, 69 insertions(+), 26 deletions(-)
> 
> diff --git a/src/backend/catalog/pg_depend.c b/src/backend/catalog/pg_depend.c
> index 70baf03178..596f0c5e29 100644
> --- a/src/backend/catalog/pg_depend.c
> +++ b/src/backend/catalog/pg_depend.c
> @@ -47,6 +47,12 @@ recordDependencyOn(const ObjectAddress *depender,
>      recordMultipleDependencies(depender, referenced, 1, behavior);
>  }
>  
> +/*
> + * Cap the maximum amount of bytes allocated for recordMultipleDependencies()
> + * slots.
> + */
> +#define MAX_PGDEPEND_INSERT_BYTES    65535
> +

Do we really want to end up with several separate defines for different
type of catalog batch inserts? That doesn't seem like a good
thing. Think there should be a single define for all catalog bulk
inserts.


>  /*
>   * Record multiple dependencies (of the same kind) for a single dependent
>   * object.  This has a little less overhead than recording each separately.
> @@ -59,10 +65,10 @@ recordMultipleDependencies(const ObjectAddress *depender,
>  {
>      Relation    dependDesc;
>      CatalogIndexState indstate;
> -    HeapTuple    tup;
> -    int            i;
> -    bool        nulls[Natts_pg_depend];
> -    Datum        values[Natts_pg_depend];
> +    int            slotCount, i;
> +    TupleTableSlot **slot;
> +    int            nslots, max_slots;
> +    bool        slot_init = true;
>  
>      if (nreferenced <= 0)
>          return;                    /* nothing to do */
> @@ -76,11 +82,18 @@ recordMultipleDependencies(const ObjectAddress *depender,
>  
>      dependDesc = table_open(DependRelationId, RowExclusiveLock);
>  
> +    /*
> +     * Allocate the slots to use, but delay initialization until we know that
> +     * they will be used.
> +     */
> +    max_slots = Min(nreferenced,
> +                    MAX_PGDEPEND_INSERT_BYTES / sizeof(FormData_pg_depend));
> +    slot = palloc(sizeof(TupleTableSlot *) * max_slots);
> +
>      /* Don't open indexes unless we need to make an update */
>      indstate = NULL;
>  
> -    memset(nulls, false, sizeof(nulls));
> -
> +    slotCount = 0;
>      for (i = 0; i < nreferenced; i++, referenced++)
>      {
>          /*
> @@ -88,38 +101,68 @@ recordMultipleDependencies(const ObjectAddress *depender,
>           * need to record dependencies on it.  This saves lots of space in
>           * pg_depend, so it's worth the time taken to check.
>           */
> -        if (!isObjectPinned(referenced, dependDesc))
> +        if (isObjectPinned(referenced, dependDesc))
> +            continue;
> +

Hm, would it be better to first iterate over the dependencies, compute
the number of dependencies to be inserted, and then go ahead and create
the right number of slots?


> From fcc0a11e9fc94d2fedc71dd10ba2a23713225963 Mon Sep 17 00:00:00 2001
> From: Michael Paquier <michael@paquier.xyz>
> Date: Fri, 7 Aug 2020 15:14:51 +0900
> Subject: [PATCH 2/2] Switch to multi-insert dependencies for many code paths
> 
> This makes use of the new APIs to insert dependencies in groups, instead
> of doing the operation one-by-one.

Seems several places have been modified to new APIs despite only
covering a single dependency. Perhaps worth mentioning?

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Next
From: Andres Freund
Date:
Subject: Re: walsender waiting_for_ping spuriously set