Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM - Mailing list pgsql-hackers

From Daniil Davydov
Subject Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date
Msg-id CAJDiXgiD4G0rBwWk6=0TTy9AYHi4LmPL6AS5VpMLf8OEbAxEhw@mail.gmail.com
Whole thread Raw
In response to Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM  (Jingtang Zhang <mrdrivingduck@gmail.com>)
List pgsql-hackers
Hi,

On Sun, Apr 6, 2025 at 8:55 PM Jingtang Zhang <mrdrivingduck@gmail.com> wrote:
>
> It was quite a while since I last looked at the patch. I've tested it again,
> and still get regression on patched version where a table has many columns.
> And it is totally CPU-bounded on tts_virtual_copyslot.
>
> Unpatched version:
> 1 col:
> Time: 8909.714 ms (00:08.910)
> Time: 8803.579 ms (00:08.804)
> Time: 8600.415 ms (00:08.600)
> 32 cols:
> Time: 12911.699 ms (00:12.912)
> Time: 13543.491 ms (00:13.543)
> Time: 13325.368 ms (00:13.325)
>
> Patched version:
> 1 col:
> Time: 3532.841 ms (00:03.533)
> Time: 3598.223 ms (00:03.598)
> Time: 3515.858 ms (00:03.516)
> 32 cols:
> Time: 35647.724 ms (00:35.648)
> Time: 35596.233 ms (00:35.596)
> Time: 35669.106 ms (00:35.669)
>

Hm, maybe I didn't choose the best way to measure performance. Can you
please share how you do it?

> I've tested your patch with tuplestore and found the regression does not exist
> anymore, but I haven't look deep inside it.
>
> Patched version (with tuplestore):
> 1 col:
> Time: 3500.502 ms (00:03.501)
> Time: 3486.886 ms (00:03.487)
> Time: 3514.233 ms (00:03.514)
> 32 cols:
> Time: 10375.391 ms (00:10.375)
> Time: 10248.256 ms (00:10.248)
> Time: 10248.289 ms (00:10.248)
>
> It seems to be a good idea if there is no other issue with your patch.

As far as I understand, the use of multi inserts for queries like
"INSERT INTO ... SELECT FROM" is not discussed here anymore due to the
fact that in such cases we will have to take into account the volatile
functions and ROW triggers.
I've been thinking about this for a while and made a patch as an
experiment. The principles that the patch works on are listed below.
1)
Since performance decreases for single INSERTs (within a multi inserts
mechanism), I designed this feature as an option for the table. Thus,
if the user knows that he will perform a lot of inserts on the table,
he can specify "WITH (append_optimized=true)".
2)
The availability of volatile functions is monitored during the
construction of a subtree for a ModifyTable node. I'm not that
familiar with the query plan construction mechanism, but it seems to
me that this way we can track any occurrence of volatile functions.
Of course, most volatile functions don't stop us from using multi
inserts, but checking each such function would take a very long time,
so the very fact of having a volatile function is enough for us to
abandon multi-inserts.
3)
Default expressions of the target table are also checked for volatile
functions. The same rules apply to them as in (2). As an exception, I
allowed the use of SERIAL in the column data type, since this is a
fairly common use case.
4)
If the target table contains any ROW triggers, we don't use multi insert.
5)
Patch also contains a regression test. This is a "sandbox" where you
can do some experiments with append-optimized tables.

I hope that patch (targeted on 'master' branch,
2c7bd2ba507e273f2d7fe1b2f6d30775ed4f3c09) will be useful for this
thread.

--
Best regards,
Daniil Davydov

Attachment

pgsql-hackers by date:

Previous
From: Ashutosh Bapat
Date:
Subject: Re: Changing shared_buffers without restart
Next
From: Kyotaro Horiguchi
Date:
Subject: Unquoted file name in an error message