Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM - Mailing list pgsql-hackers
| From | Bharath Rupireddy |
| --- | --- |
| Subject | Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM |
| Date | |
| Msg-id | CALj2ACUz5+_YNEa4ZY-XG960_oXefM50MjD71VgSCAVDkF3bzQ@mail.gmail.com |
| In response to | Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) |
| Responses | Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM; Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM |
| List | pgsql-hackers |
On Mon, Apr 29, 2024 at 11:36 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Please see the attached v20 patch set.

It looks like pgbench regressed by about 35% with the use of the new multi insert table access method (TAM) for COPY (v20-0005) [1]. The reason is that the memory-based flushing decision the new TAM makes [2] differs from what COPY does today with table_multi_insert. COPY with table_multi_insert maintains the exact size of the tuples in CopyFromState after it does the line parsing. For instance, the tuple size for a table with two integer columns is 8 (4+4) bytes there. The new TAM relies on the memory occupied by the slot's memory context, which holds the actual tuple, as an approximation of the tuple size. However, this memory context size also includes the tuple header, so the size here is more than just 8 (4+4) bytes (an illustrative sketch follows the results below). Because of this, the buffers get flushed sooner than with the existing COPY using table_multi_insert, causing the regression in pgbench, which uses COPY extensively. The new TAM isn't designed to receive tuple sizes from the callers, and even if we did that, the API wouldn't look generic.

Here are a couple of ideas to deal with this:

1. Try to get the actual tuple sizes, excluding header sizes, for each column in the new TAM.
2. Don't use the new TAM for COPY, in which case table_multi_insert stays forever.
3. Pass a flag to tell the new TAM that the caller does the flushing, so no internal flushing is needed.

I haven't explored idea (1) in depth yet. Even if we find a way to do it, it looks to me like we'd be going backwards, since we'd need to strip off headers for each column of every row. I suspect this would cost a bit more and may not solve the regression. With the eventual goal of getting rid of table_multi_insert, (2) may not be a good choice. (3) seems reasonable to implement and reduces the regression. I did so in the attached v21 patches. A new flag, TM_SKIP_INTERNAL_BUFFER_FLUSH, is introduced in the v21 patch; when it is specified, no internal flushing is done and the caller has to flush the buffered tuples using table_modify_buffer_flush() (a rough usage sketch also follows the results below). See the test results [3]: HEAD 2.948 s, PATCHED 2.946 s.

v21 also adds code to maintain the tuple size for virtual tuple slots. This helps make better memory-based flushing decisions in the new TAM (a possible way to compute that size is sketched below as well).

Thoughts?

[1] HEAD:
done in 2.84 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 1.99 s, vacuum 0.21 s, primary keys 0.62 s).
done in 2.78 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 1.88 s, vacuum 0.21 s, primary keys 0.69 s).
done in 2.97 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 2.07 s, vacuum 0.21 s, primary keys 0.69 s).
done in 2.86 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 1.96 s, vacuum 0.21 s, primary keys 0.69 s).
done in 2.90 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 2.05 s, vacuum 0.21 s, primary keys 0.64 s).
done in 2.83 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 1.96 s, vacuum 0.21 s, primary keys 0.66 s).
done in 2.80 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 1.95 s, vacuum 0.20 s, primary keys 0.63 s).
done in 2.79 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 1.89 s, vacuum 0.21 s, primary keys 0.69 s).
done in 3.75 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 2.17 s, vacuum 0.32 s, primary keys 1.25 s).
done in 3.86 s (drop tables 0.00 s, create tables 0.08 s, client-side generate 2.97 s, vacuum 0.21 s, primary keys 0.59 s).
AVG done in 2.948 s

v20 PATCHED:
done in 3.94 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 3.12 s, vacuum 0.19 s, primary keys 0.62 s).
done in 4.04 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 3.22 s, vacuum 0.20 s, primary keys 0.61 s).
done in 3.98 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 3.16 s, vacuum 0.20 s, primary keys 0.61 s).
done in 4.04 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 3.16 s, vacuum 0.20 s, primary keys 0.67 s).
done in 3.98 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 3.16 s, vacuum 0.20 s, primary keys 0.61 s).
done in 4.00 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 3.17 s, vacuum 0.20 s, primary keys 0.63 s).
done in 4.43 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 3.24 s, vacuum 0.21 s, primary keys 0.98 s).
done in 4.16 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 3.36 s, vacuum 0.20 s, primary keys 0.59 s).
done in 3.62 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 2.83 s, vacuum 0.20 s, primary keys 0.58 s).
done in 3.67 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 2.84 s, vacuum 0.21 s, primary keys 0.61 s).
AVG done in 3.986 s

[2]
+    /*
+     * Memory allocated for the whole tuple is in the slot's memory context,
+     * so use it to keep track of the total space occupied by all buffered
+     * tuples.
+     */
+    if (TTS_SHOULDFREE(slot))
+        mistate->cur_size += MemoryContextMemAllocated(slot->tts_mcxt, false);
+
+    if (mistate->cur_slots >= HEAP_MAX_BUFFERED_SLOTS ||
+        mistate->cur_size >= HEAP_MAX_BUFFERED_BYTES)
+        heap_modify_buffer_flush(state);

[3] v21 PATCHED:
done in 2.92 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 2.12 s, vacuum 0.21 s, primary keys 0.59 s).
done in 2.89 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 2.07 s, vacuum 0.21 s, primary keys 0.61 s).
done in 2.89 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 2.05 s, vacuum 0.21 s, primary keys 0.62 s).
done in 2.90 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 2.07 s, vacuum 0.21 s, primary keys 0.62 s).
done in 2.80 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 2.00 s, vacuum 0.21 s, primary keys 0.59 s).
done in 2.84 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 2.04 s, vacuum 0.20 s, primary keys 0.60 s).
done in 2.84 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 2.03 s, vacuum 0.20 s, primary keys 0.59 s).
done in 2.85 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 2.04 s, vacuum 0.20 s, primary keys 0.60 s).
done in 3.48 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 2.44 s, vacuum 0.23 s, primary keys 0.80 s).
done in 3.05 s (drop tables 0.00 s, create tables 0.01 s, client-side generate 2.28 s, vacuum 0.21 s, primary keys 0.55 s).
AVG done in 2.946 s

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
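To make the size gap described in the mail concrete, here is a rough, hypothetical sketch (not the patch's code) comparing the two measures: the exact per-tuple data size COPY tracks today (8 bytes for a two-int4 row) versus the slot-memory-context figure used in [2], which also covers the materialized tuple's header and palloc chunk overhead, so HEAP_MAX_BUFFERED_BYTES is reached after fewer tuples. The helper name is made up; the macros and functions it calls are existing PostgreSQL ones.

    #include "postgres.h"
    #include "access/htup_details.h"
    #include "executor/tuptable.h"
    #include "utils/memutils.h"

    /*
     * Hypothetical helper, not part of the patch: log how the two size
     * measures compare for one buffered slot.  "exact_size" is the raw
     * attribute data size the caller (e.g. COPY) computed itself, while the
     * memory-context figure also includes the materialized tuple's header
     * and allocator overhead, which is why it overestimates.
     */
    static void
    compare_size_measures(TupleTableSlot *slot, Size exact_size)
    {
        Size        ctx_size = 0;

        if (TTS_SHOULDFREE(slot))
            ctx_size = MemoryContextMemAllocated(slot->tts_mcxt, false);

        elog(DEBUG1,
             "exact data size: %zu bytes, slot context size: %zu bytes "
             "(heap tuple header alone is %zu bytes)",
             exact_size, ctx_size, (Size) SizeofHeapTupleHeader);
    }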
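And here is a hedged sketch of how a caller such as COPY might drive the flushing itself under idea (3). Only TM_SKIP_INTERNAL_BUFFER_FLUSH and table_modify_buffer_flush() are taken from the mail above; the TableModifyState type, the other table_modify_*() signatures, TM_FLAG_MULTI_INSERTS, and the 1000-tuple/64 kB limits (modelled on COPY's existing MAX_BUFFERED_TUPLES/MAX_BUFFERED_BYTES) are assumptions, not the patch's actual API.

    #include "postgres.h"
    #include "access/tableam.h"
    #include "access/xact.h"
    #include "executor/tuptable.h"
    #include "utils/rel.h"

    /*
     * Hedged sketch of caller-controlled flushing.  The table_modify_*()
     * signatures and TM_FLAG_MULTI_INSERTS are assumptions; only
     * TM_SKIP_INTERNAL_BUFFER_FLUSH and table_modify_buffer_flush() are
     * named in the mail.
     */
    static void
    copy_like_bulk_insert(Relation rel, TupleTableSlot **slots, Size *sizes,
                          int ntuples, int ti_options)
    {
        TableModifyState *mstate;
        Size        bufferedBytes = 0;
        int         bufferedTuples = 0;

        mstate = table_modify_begin(rel,
                                    TM_FLAG_MULTI_INSERTS |
                                    TM_SKIP_INTERNAL_BUFFER_FLUSH,
                                    GetCurrentCommandId(true),
                                    ti_options);

        for (int i = 0; i < ntuples; i++)
        {
            table_modify_buffer_insert(mstate, slots[i]);

            /* the caller knows the exact data size of each parsed tuple */
            bufferedTuples++;
            bufferedBytes += sizes[i];

            /* flush on the caller's own limits, as CopyFrom() does today */
            if (bufferedTuples >= 1000 || bufferedBytes >= 65535)
            {
                table_modify_buffer_flush(mstate);
                bufferedTuples = 0;
                bufferedBytes = 0;
            }
        }

        table_modify_buffer_flush(mstate);      /* flush any leftovers */
        table_modify_end(mstate);
    }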
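Finally, on maintaining the tuple size for virtual tuple slots: one plausible way to get the raw data size of a virtual slot, shown purely as an illustration and not necessarily how v21 implements it, is to sum the per-attribute Datum sizes, which keeps header and allocator overhead out of the flushing decision. The function name is made up; the routines it uses are existing PostgreSQL ones.

    #include "postgres.h"
    #include "access/tupdesc.h"
    #include "executor/tuptable.h"
    #include "utils/datum.h"

    /*
     * Illustration only (not necessarily the v21 approach): compute the raw
     * data size of a virtual tuple slot by summing its non-null attribute
     * sizes, so the flushing decision is based on data bytes rather than on
     * memory-context bookkeeping.
     */
    static Size
    virtual_slot_data_size(TupleTableSlot *slot)
    {
        TupleDesc   tupdesc = slot->tts_tupleDescriptor;
        Size        sz = 0;

        slot_getallattrs(slot);     /* ensure tts_values[]/tts_isnull[] are set */

        for (int i = 0; i < tupdesc->natts; i++)
        {
            Form_pg_attribute att = TupleDescAttr(tupdesc, i);

            if (slot->tts_isnull[i])
                continue;

            sz += datumGetSize(slot->tts_values[i], att->attbyval, att->attlen);
        }

        return sz;
    }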
Attachment
- v21-0001-Introduce-new-Table-Access-Methods-for-single-an.patch
- v21-0002-Optimize-CTAS-CMV-RMV-and-TABLE-REWRITES-with-mu.patch
- v21-0003-Optimize-INSERT-INTO-.-SELECT-with-multi-inserts.patch
- v21-0004-Optimize-Logical-Replication-apply-with-multi-in.patch
- v21-0005-Use-new-multi-insert-Table-AM-for-COPY-FROM.patch