Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM - Mailing list pgsql-hackers

From Daniil Davydov
Subject Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
Date
Msg-id CAJDiXggcx+v7eKruvvBK-mpyf3Y3e8vgBJhcZwhkm4p6907edw@mail.gmail.com
Whole thread Raw
In response to Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM  (Jingtang Zhang <mrdrivingduck@gmail.com>)
Responses Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM
List pgsql-hackers
Hi,
A few days ago I came up with an idea to implement multi insert
optimization wherever possible. I prepared a raw patch
and it showed a great performance gain (up to 4 times during INSERT
... INTO ... in the best case).
Then I was very happy to find this thread. You did a great job and I
want to help you to bring the matter to an end.

On Thu, Oct 31, 2024 at 11:17 AM Jingtang Zhang <mrdrivingduck@gmail.com> wrote:
> I did some performance test these days, and I have some findings.
> HEAD:
>   12.29%  postgres            [.] pg_checksum_block
>    6.33%  postgres            [.] GetPrivateRefCountEntry
>    5.40%  postgres            [.] pg_comp_crc32c_sse42
>    4.54%  [kernel]            [k] copy_user_enhanced_fast_string
>    2.69%  postgres            [.] BufferIsValid
>    1.52%  postgres            [.] XLogRecordAssemble
>
> Patched:
>   11.75%  postgres            [.] tts_virtual_materialize
>    8.87%  postgres            [.] pg_checksum_block
>    8.17%  postgres            [.] slot_deform_heap_tuple
>    8.09%  postgres            [.] heap_compute_data_size
>    6.17%  postgres            [.] fill_val
>    3.81%  postgres            [.] heap_fill_tuple
>    3.37%  postgres            [.] tts_virtual_copyslot
>    2.62%  [kernel]            [k] copy_user_enhanced_fast_string

I applied v25 patches on the master branch and made some measurements
to find out what is the bottleneck in this case. The 'time' utility
showed that without a patch, this query will run 1.5 times slower. I
also made a few flamegraphs for this test. Most of the time is spent
calling
these two functions : tts_virtual_copyslot and heap_form_tuple.
All tests were run in virtual machine with these CPU characteristics:
Architecture:             x86_64
CPU(s):                   2
  On-line CPU(s) list:    0,1
Virtualization features:
  Virtualization:         AMD-V
  Hypervisor vendor:      KVM
  Virtualization type:    full
Caches (sum of all):
  L1d:                    128 KiB (2 instances)
  L1i:                    128 KiB (2 instances)
  L2:                     1 MiB (2 instances)
  L3:                     32 MiB (2 instances)
NUMA:
  NUMA node(s):           1
  NUMA node0 CPU(s):      0,1

In my implementation, I used Tuplestore functionality to store tuples.
In order to get rid of getting stuck in the above mentioned functions,
I crossed it with the current implementation (v25 patches) and got a
10% increase in performance (for the test above). I also set up v22
patches to
compare performance (with/without tuplestore) for INSERT ... INTO ...
queries (with -j 4 -c 10 parameters for pgbech), and there also was an
increase in TPS (about 3-4%).

I attach a patch that adds Tuplestore to v25. What do you think about this idea?

--
Best regards,
Daniil Davydov

Attachment

pgsql-hackers by date:

Previous
From: Jacob Brazeal
Date:
Subject: Incorrect assert in libpqwalreceiver
Next
From: vignesh C
Date:
Subject: Re: Enhance file_fdw to report processed and skipped tuples in COPY progress